
Oct 12 2020

vlorentz added a parent task for T2686: Use hashes for all kafka keys: T2668: Package loaders should write extrinsic metadata on directories instead of revisions/releases.
Oct 12 2020, 1:06 PM · Data Model, Storage manager
vlorentz updated the task description for T2686: Use hashes for all kafka keys.
Oct 12 2020, 1:05 PM · Data Model, Storage manager
vlorentz added a parent task for T2686: Use hashes for all kafka keys: T2520: Setup dedicated kafka cluster on new rocquencourt hardware.
Oct 12 2020, 1:04 PM · Data Model, Storage manager
vlorentz triaged T2686: Use hashes for all kafka keys as Normal priority.
Oct 12 2020, 1:04 PM · Data Model, Storage manager
vlorentz added a comment to T2668: Package loaders should write extrinsic metadata on directories instead of revisions/releases.

@rdicosmo a full example of what?

Oct 12 2020, 10:57 AM · Package Loader, Storage manager, Extrinsic metadata
rdicosmo added a comment to T2668: Package loaders should write extrinsic metadata on directories instead of revisions/releases.

The suggestion was to have extrinsic metadata on directories that come from a deposit of a bundle (e.g. .tar.gz or .zip file coming from HAL), instead of on a synthetic revision as is currently the case, so they can be accessed knowing the hash of the directory (which is an intrinsic id).

Oct 12 2020, 10:44 AM · Package Loader, Storage manager, Extrinsic metadata

Oct 8 2020

vlorentz added a comment to T2668: Package loaders should write extrinsic metadata on directories instead of revisions/releases.

Alternatively, we could keep writing the metadata on revisions/releases, and use the provenance service (when it's ready) to find them from a directory SWHID. What do you think?

Oct 8 2020, 11:47 AM · Package Loader, Storage manager, Extrinsic metadata

Oct 6 2020

vlorentz updated the task description for T2668: Package loaders should write extrinsic metadata on directories instead of revisions/releases.
Oct 6 2020, 10:45 AM · Package Loader, Storage manager, Extrinsic metadata
rdicosmo updated subscribers of T2668: Package loaders should write extrinsic metadata on directories instead of revisions/releases.
Oct 6 2020, 10:37 AM · Package Loader, Storage manager, Extrinsic metadata
vlorentz renamed T2668: Package loaders should write extrinsic metadata on directories instead of revisions/releases from Package loaders write extrinsic metadata on directories instead of revisions/releases to Package loaders should write extrinsic metadata on directories instead of revisions/releases.
Oct 6 2020, 10:30 AM · Package Loader, Storage manager, Extrinsic metadata
vlorentz triaged T2668: Package loaders should write extrinsic metadata on directories instead of revisions/releases as Normal priority.
Oct 6 2020, 10:30 AM · Package Loader, Storage manager, Extrinsic metadata

Oct 1 2020

zack added a subtask for T1926: FUSE filesystem to navigate the archive: T2654: modprobe fuse on the CI build machine.
Oct 1 2020, 2:12 PM · Software Heritage filesystem

Sep 29 2020

zack added a revision to T1926: FUSE filesystem to navigate the archive: D4064: Early FUSE implementation, with support for blob and directory objects.
Sep 29 2020, 9:38 AM · Software Heritage filesystem
zack changed the status of T1926: FUSE filesystem to navigate the archive from Open to Work in Progress.
Sep 29 2020, 9:38 AM · Software Heritage filesystem

Sep 25 2020

tenma closed T2287: Improve code in BufferingProxyStorage as Resolved.
Sep 25 2020, 4:47 PM · Easy hack, Storage manager
zack added a revision to T1926: FUSE filesystem to navigate the archive: D4042: docs: add design notes.
Sep 25 2020, 4:03 PM · Software Heritage filesystem

Sep 24 2020

vlorentz added a parent task for T1910: Redesign origin search using a dedicated component (swh-search): T1117: Origin search is *slow* when you look for very common words.
Sep 24 2020, 11:05 AM · Archive search, Storage manager
vlorentz added a subtask for T1117: Origin search is *slow* when you look for very common words: T1910: Redesign origin search using a dedicated component (swh-search).
Sep 24 2020, 11:05 AM · Web app, Storage manager
vlorentz claimed T1117: Origin search is *slow* when you look for very common words.
Sep 24 2020, 11:05 AM · Web app, Storage manager
vlorentz added a comment to T1117: Origin search is *slow* when you look for very common words.

aka T1910

Sep 24 2020, 11:05 AM · Web app, Storage manager

Sep 23 2020

ardumont closed D4014: pytest_plugin: Change dbname to storage to avoid clash in tests.
Sep 23 2020, 1:58 PM · Storage manager
douardda accepted D4014: pytest_plugin: Change dbname to storage to avoid clash in tests.
Sep 23 2020, 1:11 PM · Storage manager
tenma added a revision to T2287: Improve code in BufferingProxyStorage: D4017: Improve code quality and doc in BufferedProxyStorage.
Sep 23 2020, 1:08 PM · Easy hack, Storage manager
swh-public-ci added a comment to D4014: pytest_plugin: Change dbname to storage to avoid clash in tests.

Build is green

Sep 23 2020, 12:35 PM · Storage manager
ardumont updated the diff for D4014: pytest_plugin: Change dbname to storage to avoid clash in tests.

Rename tests db to "storage"

Sep 23 2020, 12:28 PM · Storage manager
ardumont added inline comments to D4014: pytest_plugin: Change dbname to storage to avoid clash in tests.
Sep 23 2020, 12:06 PM · Storage manager
ardumont added inline comments to D4014: pytest_plugin: Change dbname to storage to avoid clash in tests.
Sep 23 2020, 12:02 PM · Storage manager
ardumont added inline comments to D4014: pytest_plugin: Change dbname to storage to avoid clash in tests.
Sep 23 2020, 12:00 PM · Storage manager
ardumont added a comment to D4014: pytest_plugin: Change dbname to storage to avoid clash in tests.

it's unclear to me if the hunk in swh_storage_backend_config is related to the fix or not.

Sep 23 2020, 12:00 PM · Storage manager
swh-public-ci added a comment to D4014: pytest_plugin: Change dbname to storage to avoid clash in tests.

Build is green

Sep 23 2020, 11:58 AM · Storage manager
douardda added a comment to D4014: pytest_plugin: Change dbname to storage to avoid clash in tests.

it's unclear to me if the hunk in swh_storage_backend_config is related to the fix or not.
isn't the very last diff hunk enough for the job?

Sep 23 2020, 11:56 AM · Storage manager
ardumont updated the diff for D4014: pytest_plugin: Change dbname to storage to avoid clash in tests.
  • Drop unneeded postgresql_proc
Sep 23 2020, 11:54 AM · Storage manager
ardumont updated the summary of D4014: pytest_plugin: Change dbname to storage to avoid clash in tests.
Sep 23 2020, 11:52 AM · Storage manager
ardumont added a comment to P773 storage check at startup in deposit tests: dbversion discrepancy make the tests refuse to start....

Related to D4013

Sep 23 2020, 10:55 AM · Storage manager
ardumont updated subscribers of P773 storage check at startup in deposit tests: dbversion discrepancy make the tests refuse to start....

More information:

(Pdb) cfg['storage']
{'check_config': {'check_write': True}, 'cls': 'local', 'db': 'postgresql://postgres@127.0.0.1:16607/tests', 'objstorage': {'args': {}, 'cls': 'memory'}}
(Pdb) cfg['scheduler']
{'args': {'db': "user=postgres password=xxx dbname=tests host=127.0.0.1 port=16607 options=''"}, 'cls': 'local'}
Sep 23 2020, 10:55 AM · Storage manager
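
Both configurations in the Pdb output above point at a database named `tests` on the same PostgreSQL instance (127.0.0.1:16607), which appears to be the clash that D4014 avoids by renaming the storage test database. A minimal sketch of how such a collision could be spotted, assuming a hypothetical helper `dbname_of` (not part of swh-storage) and using only the two connection strings shown above:

```python
from urllib.parse import urlparse


def dbname_of(dsn: str) -> str:
    """Hypothetical helper: extract the database name from a DSN.

    Handles the two forms seen in the Pdb output: a URL-style DSN
    and a libpq-style "key=value" string.
    """
    if "://" in dsn:
        # URL form: the database name is the path component.
        return urlparse(dsn).path.lstrip("/")
    # key=value form: split on whitespace, then on the first "=".
    pairs = dict(item.split("=", 1) for item in dsn.split() if "=" in item)
    return pairs.get("dbname", "")


# Connection strings copied from the Pdb output above.
storage_dsn = "postgresql://postgres@127.0.0.1:16607/tests"
scheduler_dsn = (
    "user=postgres password=xxx dbname=tests host=127.0.0.1 port=16607 options=''"
)

# True when storage and scheduler would share one database.
clash = dbname_of(storage_dsn) == dbname_of(scheduler_dsn)
```

With the values above, both DSNs resolve to `tests`, so `clash` is true; once the storage database is named `storage`, the two no longer collide.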
ardumont created P773 storage check at startup in deposit tests: dbversion discrepancy make the tests refuse to start....
Sep 23 2020, 10:08 AM · Storage manager

Sep 22 2020

olasd placed T1117: Origin search is *slow* when you look for very common words up for grabs.

This is very probably superseded by @vlorentz's work on swh.search.

Sep 22 2020, 4:47 PM · Web app, Storage manager
ardumont triaged T2622: Proxy storages: Split storage.'*_missing' calls in chunks as Normal priority.
Sep 22 2020, 9:45 AM · Storage manager

Sep 21 2020

vlorentz closed T2053: support graph export for the cassandra backend as Resolved.
Sep 21 2020, 3:39 PM · Compressed graph service, Storage manager

Sep 18 2020

tenma claimed T2287: Improve code in BufferingProxyStorage.
Sep 18 2020, 11:59 AM · Easy hack, Storage manager

Sep 17 2020

olasd added a comment to T2604: Handle multiple "database profiles" in the swh-storage (/...) SQL configurations.

Btw, sqitch uses the native scripting for the database engine (in case of postgres, psql scripts), so if we ever end up using it, we can keep this approach.

Sep 17 2020, 8:34 PM · Storage manager
olasd added a comment to T2604: Handle multiple "database profiles" in the swh-storage (/...) SQL configurations.

So, I've first attempted something along the lines of https://www.depesz.com/2008/06/18/conditional-ddl/, which uses a function to execute DDL commands stored in a string.

Sep 17 2020, 8:26 PM · Storage manager
olasd added a revision to T2604: Handle multiple "database profiles" in the swh-storage (/...) SQL configurations: D3981: Support different database flavors in the SQL scripts.
Sep 17 2020, 8:05 PM · Storage manager

Sep 16 2020

olasd triaged T2604: Handle multiple "database profiles" in the swh-storage (/...) SQL configurations as Normal priority.
Sep 16 2020, 5:32 PM · Storage manager

Sep 15 2020

vlorentz triaged T2602: Investigate how to upgrade the schema of the Cassandra storage as Normal priority.
Sep 15 2020, 1:56 PM · Storage manager
seirl added a comment to T2600: SQL storage: experiment with flattened layouts for directory nodes.

We considered three possibilities for the schema (assuming that we want to get rid of the three separate tables for dir_entries, rev_entries and file_entries -- otherwise, there are 6 possibilities).

Sep 15 2020, 1:40 PM · Storage manager
zack triaged T2600: SQL storage: experiment with flattened layouts for directory nodes as Normal priority.
Sep 15 2020, 12:53 PM · Storage manager

Sep 14 2020

vlorentz triaged T2590: Finish the indexer -> swh-search pipeline as Normal priority.
Sep 14 2020, 5:39 PM · Journal, Archive search
ardumont closed T2547: pgstorage: Migrate db to storage 0.13.2 (db versions 160, 161) as Resolved.
Sep 14 2020, 1:09 PM · System administration, Storage manager
ardumont added a comment to T2547: pgstorage: Migrate db to storage 0.13.2 (db versions 160, 161).

everything has now run, up to and including D3936.

Sep 14 2020, 12:24 PM · System administration, Storage manager
ardumont added a revision to T2547: pgstorage: Migrate db to storage 0.13.2 (db versions 160, 161): D3936: sql: Make the extra_headers not null a constraint.
Sep 14 2020, 11:00 AM · System administration, Storage manager
zack assigned T1926: FUSE filesystem to navigate the archive to haltode.
Sep 14 2020, 9:59 AM · Software Heritage filesystem

Sep 10 2020

ardumont changed the status of T2547: pgstorage: Migrate db to storage 0.13.2 (db versions 160, 161) from Open to Work in Progress.

restarted the migration, this time running from a shared tmux session on belvedere (before, it was running under my user):

Sep 10 2020, 4:58 PM · System administration, Storage manager

Sep 9 2020

ardumont added a comment to T2547: pgstorage: Migrate db to storage 0.13.2 (db versions 160, 161).

revisions done so far (as of yesterday, when it got stopped):

Sep 9 2020, 5:39 PM · System administration, Storage manager
ardumont changed the status of T2547: pgstorage: Migrate db to storage 0.13.2 (db versions 160, 161) from Work in Progress to Open.
Sep 9 2020, 11:20 AM · System administration, Storage manager
ardumont added a project to T2547: pgstorage: Migrate db to storage 0.13.2 (db versions 160, 161): System administration.
Sep 9 2020, 11:18 AM · System administration, Storage manager
ardumont updated the task description for T2547: pgstorage: Migrate db to storage 0.13.2 (db versions 160, 161).
Sep 9 2020, 11:17 AM · System administration, Storage manager
ardumont added a comment to T2547: pgstorage: Migrate db to storage 0.13.2 (db versions 160, 161).

stand-by related to T2561

Sep 9 2020, 11:13 AM · System administration, Storage manager

Sep 8 2020

vlorentz closed T2556: using InMemoryStorage.directory_ls may return `status: None` as Resolved.
Sep 8 2020, 2:14 PM · Storage manager

Sep 7 2020

zack raised the priority of T1926: FUSE filesystem to navigate the archive from Wishlist to Normal.
Sep 7 2020, 10:59 AM · Software Heritage filesystem

Sep 6 2020

zack updated subscribers of T1926: FUSE filesystem to navigate the archive.

Noting down that I had a tentative very preliminary implementation in the feature/fuse branch of swh-graph; see in particular fuse.py there.
It's probably not worth picking up and we should restart from scratch at this point, but it might still contain useful material.
(The webclient in there has since become a proper thing, see T2279. So that part is definitely obsolete.)

Sep 6 2020, 4:48 PM · Software Heritage filesystem

Sep 4 2020

ardumont added a revision to T645: Type swh-storage endpoints with swh.model objects: D3883: algos.diff: Add missed revision_get conversion.
Sep 4 2020, 3:37 PM · Data Model, Storage manager
vlorentz removed a project from T2564: migrate existing revisions metadata extra_headers to actual extra_headers field: Datasets.
Sep 4 2020, 11:34 AM · System administration, Storage manager
ardumont added projects to T2564: migrate existing revisions metadata extra_headers to actual extra_headers field: Storage manager, Datasets.
Sep 4 2020, 11:30 AM · System administration, Storage manager

Sep 3 2020

ardumont added a revision to T645: Type swh-storage endpoints with swh.model objects: D3877: Adapt storage.revision_get calls according to latest api change.
Sep 3 2020, 5:48 PM · Data Model, Storage manager
vlorentz added a revision to T2556: using InMemoryStorage.directory_ls may return `status: None` : D3874: directory_ls: Don't return None for status/length/sha1/... if the content is known but skipped..
Sep 3 2020, 3:24 PM · Storage manager
ardumont added a revision to T645: Type swh-storage endpoints with swh.model objects: D3870: Adapt storage.revision_get calls according to latest api change.
Sep 3 2020, 1:26 PM · Data Model, Storage manager
ardumont added a revision to T645: Type swh-storage endpoints with swh.model objects: D3869: test_loader: Adapt to latest storage revision_get change.
Sep 3 2020, 1:20 PM · Data Model, Storage manager

Sep 2 2020

ardumont added a revision to T645: Type swh-storage endpoints with swh.model objects: D3868: loader: Adapt to latest storage revision_get change.
Sep 2 2020, 6:39 PM · Data Model, Storage manager
ardumont added a revision to T645: Type swh-storage endpoints with swh.model objects: D3865: metadata: Adapt to latest storage revision_get change.
Sep 2 2020, 4:08 PM · Data Model, Storage manager
ardumont added a revision to T645: Type swh-storage endpoints with swh.model objects: D3864: migrations: Adapt according to latest storage revision_get api change.
Sep 2 2020, 3:49 PM · Data Model, Storage manager
ardumont added a revision to T645: Type swh-storage endpoints with swh.model objects: D3863: Refactor revision_get storage API to return Revision objects.
Sep 2 2020, 3:25 PM · Data Model, Storage manager
vlorentz triaged T2556: using InMemoryStorage.directory_ls may return `status: None` as Normal priority.
Sep 2 2020, 2:53 PM · Storage manager

Aug 31 2020

ardumont added a revision to T645: Type swh-storage endpoints with swh.model objects: D3854: swh.web: Adapt to latest storage release_get api change.
Aug 31 2020, 4:44 PM · Data Model, Storage manager
vlorentz raised the priority of T2548: Restore CRAN visits deleted in january 2020 from backups from Normal to High.
Aug 31 2020, 4:37 PM · Storage manager, System administration, Origin-CRAN
ardumont added a revision to T645: Type swh-storage endpoints with swh.model objects: D3853: test_loader: Adapt to latest storage release_get change.
Aug 31 2020, 3:49 PM · Data Model, Storage manager
ardumont added a revision to T645: Type swh-storage endpoints with swh.model objects: D3852: storage*: release_get(...) -> List[Optional[Release]].
Aug 31 2020, 3:42 PM · Data Model, Storage manager

Aug 28 2020

vlorentz triaged T2549: Restore Mercurial visits deleted in august 2018 from backups as High priority.
Aug 28 2020, 4:09 PM · Storage manager, System administration, Mercurial loader
vlorentz updated subscribers of T2549: Restore Mercurial visits deleted in august 2018 from backups.
Aug 28 2020, 4:09 PM · Storage manager, System administration, Mercurial loader
vlorentz added a project to T2548: Restore CRAN visits deleted in january 2020 from backups: Storage manager.
Aug 28 2020, 4:09 PM · Storage manager, System administration, Origin-CRAN
vlorentz created T2549: Restore Mercurial visits deleted in august 2018 from backups.
Aug 28 2020, 4:09 PM · Storage manager, System administration, Mercurial loader
ardumont added a comment to T2547: pgstorage: Migrate db to storage 0.13.2 (db versions 160, 161).

ETA as of 28/08/2020, with 2 update processes (and workers) running concurrently: ~55 days (actual speed around 1.5M revisions per hour, which is not fast but steady and replication compliant).
I have started a 3rd update process; I'll update the task with a revised ETA on Monday.

Aug 28 2020, 4:06 PM · System administration, Storage manager
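
As a rough cross-check of the figures in the comment above, the ETA and the throughput together imply a remaining workload. The sketch below assumes that ~1.5M revisions per hour is the aggregate rate across the running processes and that it stays constant; the comment states neither, so the result is only an order-of-magnitude estimate.

```python
# Figures quoted in the comment above.
eta_days = 55                   # estimated time remaining, in days
revisions_per_hour = 1_500_000  # observed speed (assumed aggregate and constant)

# Implied number of revisions left to migrate.
hours_remaining = eta_days * 24
implied_remaining = hours_remaining * revisions_per_hour
```

Under those assumptions, `implied_remaining` comes to roughly 2 billion revisions.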
ardumont updated the task description for T2547: pgstorage: Migrate db to storage 0.13.2 (db versions 160, 161).
Aug 28 2020, 2:13 PM · System administration, Storage manager
ardumont renamed T2547: pgstorage: Migrate db to storage 0.13.2 (db versions 160, 161) from pgstorage: Migrate db to storage 0.13.2 (db version 160 + 161) to pgstorage: Migrate db to storage 0.13.2 (db versions 160, 161).
Aug 28 2020, 11:23 AM · System administration, Storage manager
ardumont updated the task description for T2547: pgstorage: Migrate db to storage 0.13.2 (db versions 160, 161).
Aug 28 2020, 11:23 AM · System administration, Storage manager
ardumont updated the task description for T2547: pgstorage: Migrate db to storage 0.13.2 (db versions 160, 161).
Aug 28 2020, 11:21 AM · System administration, Storage manager
ardumont changed the status of T2547: pgstorage: Migrate db to storage 0.13.2 (db versions 160, 161) from Open to Work in Progress.
Aug 28 2020, 11:20 AM · System administration, Storage manager

Aug 27 2020

ardumont added a comment to T2478: backfill origin-visit and origin-visit-status topics.

i guess so, yes.

Aug 27 2020, 5:18 PM · Storage manager, Data Model
ardumont added a comment to T645: Type swh-storage endpoints with swh.model objects.

well, storage is typed now but some endpoints remain inconsistent (as T645#47156 explains, with a unification proposal which is not done yet, aside from content_get_data and content_get_metadata)

Aug 27 2020, 5:11 PM · Data Model, Storage manager
douardda closed T2478: backfill origin-visit and origin-visit-status topics, a subtask of T2310: Make origin visits immutable, as Wontfix.
Aug 27 2020, 4:43 PM · Storage manager, Data Model
douardda closed T2478: backfill origin-visit and origin-visit-status topics as Wontfix.

I guess this task can be closed, since this backfilling process will be part of the one we will run soon to fill the new kafka cluster

Aug 27 2020, 4:43 PM · Storage manager, Data Model
douardda added a comment to T645: Type swh-storage endpoints with swh.model objects.

what's missing for this task to be closed?

Aug 27 2020, 4:32 PM · Data Model, Storage manager
ardumont edited P747 160-161-bis.sql.
Aug 27 2020, 11:34 AM · Storage manager

Aug 26 2020

vlorentz moved T2514: Add raw_extrinsic_metadata to the journal backfiller from Backlog to Done on the Roadmap 2020 board.
Aug 26 2020, 5:00 PM · Journal, Storage manager, Roadmap 2020
ardumont updated the title for P747 160-161-bis.sql from 160-bis.sql to 160-161-bis.sql.
Aug 26 2020, 12:25 PM · Storage manager
ardumont edited P747 160-161-bis.sql.
Aug 26 2020, 12:20 PM · Storage manager
ardumont added a comment to P747 160-161-bis.sql.

actually again modified so we do the 3 update queries (160, 161) next to each other to avoid caveats [1].

Aug 26 2020, 12:10 PM · Storage manager

Aug 25 2020

ardumont added a comment to T2524: Storage database migration tooling.

I don't know where to add the link about this. I started WIP documentation
with pieces of what happens for worker or database (data) upgrades on the
intranet [1]

Aug 25 2020, 4:37 PM · Storage manager
ardumont edited P747 160-161-bis.sql.
Aug 25 2020, 4:18 PM · Storage manager
ardumont edited P747 160-161-bis.sql.
Aug 25 2020, 4:18 PM · Storage manager