Do you actually want to keep these objects? This would be inconsistent with the fixed loader behavior that would just reject those objects, and not load the repository at all.

Oct 25 2022, 6:06 PM · Archive integrity, Object storage, Data Model

vlorentz changed the visibility for T4644: replayer crashes on invalid directory entry name (b'gitter/gitter.xml').

Oct 25 2022, 5:57 PM · Archive integrity, Object storage, Data Model

vlorentz updated subscribers of T4644: replayer crashes on invalid directory entry name (b'gitter/gitter.xml').

I tried to add a workaround in the backfiller, but it is incredibly hard to do properly, especially as entries are disordered, so raw_manifest needs to be fixed in two different ways.

Oct 25 2022, 5:41 PM · Archive integrity, Object storage, Data Model
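
As a rough illustration of the two problems mentioned in this task (a directory entry name containing a path separator, and entries that are out of order), here is a minimal sketch. It assumes that "disordered" refers to git's tree entry order, and the `Entry` shape and helper names are hypothetical; this is not the actual backfiller or swh-model code.

```python
from typing import Iterable, List, Tuple

# Hypothetical entry shape for illustration: (name, mode) pairs.
Entry = Tuple[bytes, int]

DIR_MODE = 0o040000  # git mode for a sub-tree (directory)


def invalid_names(entries: Iterable[Entry]) -> List[bytes]:
    """Return the entry names that contain a path separator."""
    return [name for name, _mode in entries if b"/" in name]


def git_sort_key(entry: Entry) -> bytes:
    """Git compares tree entries by name, treating directory names
    as if they ended with a trailing slash."""
    name, mode = entry
    return name + b"/" if mode == DIR_MODE else name


def is_git_ordered(entries: List[Entry]) -> bool:
    """Check whether the entries are already in git's tree order."""
    return entries == sorted(entries, key=git_sort_key)


entries = [(b"gitter/gitter.xml", 0o100644), (b"README", 0o100644)]
print(invalid_names(entries))   # names that would be rejected
print(is_git_ordered(entries))  # whether the stored order matches git's
```

A manifest that fails either check cannot be regenerated from the parsed entries alone, which is presumably why the original bytes have to be kept in raw_manifest.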

Oct 24 2022

vlorentz added a revision to T2309: Add support for other hash algo than sha1 in current objstorage implementation: D8756: azure: Add tests based on Azurite in addition to mocks.

Oct 24 2022, 2:45 PM · Object storage

Oct 19 2022

gitlab-migration closed T3703: [objstorage] Be able to configure to access permissions through keycloak as Migrated.

This task has been migrated to GitLab.

Oct 19 2022, 6:04 PM · System administration, Object storage

gitlab-migration changed the status of T3702: [objstorage] Support a basic authentication configuration from Resolved to Migrated.

This task has been migrated to GitLab.

Oct 19 2022, 6:04 PM · System administration, Object storage

gitlab-migration closed T3477: Add alerting when the copy to S3 starts lagging, a subtask of T3085: Complete and updated copy of the archive on S3 (objects+graph), as Migrated.

Oct 19 2022, 6:03 PM · Roadmap 2022, meta-task, Roadmap 2021, System administration, Object storage

gitlab-migration closed T3430: check_config is not supported by the azure backend as Migrated.

This task has been migrated to GitLab.

Oct 19 2022, 6:03 PM · System administration, Object storage

gitlab-migration closed T3085: Complete and updated copy of the archive on S3 (objects+graph) as Migrated.

This task has been migrated to GitLab.

Oct 19 2022, 6:01 PM · Roadmap 2022, meta-task, Roadmap 2021, System administration, Object storage

gitlab-migration changed the status of T1954: Up-to-date objstorage mirror on S3 from Resolved to Migrated.

This task has been migrated to GitLab.

Oct 19 2022, 5:56 PM · System administration, Object storage

gitlab-migration changed the status of T1954: Up-to-date objstorage mirror on S3, a subtask of T3085: Complete and updated copy of the archive on S3 (objects+graph), from Resolved to Migrated.

Oct 19 2022, 5:56 PM · Roadmap 2022, meta-task, Roadmap 2021, System administration, Object storage

Oct 18 2022

vlorentz added a parent task for T4644: replayer crashes on invalid directory entry name (b'gitter/gitter.xml'): T2033: Run Cassandra storage backend with production data.

Oct 18 2022, 3:40 PM · Archive integrity, Object storage, Data Model

vlorentz triaged T4644: replayer crashes on invalid directory entry name (b'gitter/gitter.xml') as High priority.

Oct 18 2022, 3:39 PM · Archive integrity, Object storage, Data Model

swh-sentry-integration assigned T4644: replayer crashes on invalid directory entry name (b'gitter/gitter.xml') to vlorentz.

Oct 18 2022, 3:39 PM · Archive integrity, Object storage, Data Model

Aug 23 2022

vlorentz added a revision to T4402: Pass dict of hashes instead of single sha1 to objstorage.get(): D8286: Pass 'obj_id' argument to objstorage.add().

Aug 23 2022, 10:34 AM · Object storage

Jul 19 2022

vlorentz added revisions to T4403: Update objstorage interface to return dicts of hashes instead of single sha1: D8009: Make obj_id argument of ObjStorage.add() required, D8013: Drop the now unused add_stream and get_stream methods, D8017: Make add() and restore() return None instead of ObjId, D8026: Remove get_random(), D8029: Start introducing composite ObjId in the interface, D8074: Remove ID-based filters, D8076: Make __iter__ actually return composite objids.

Jul 19 2022, 3:19 PM · Object storage

vlorentz triaged T4403: Update objstorage interface to return dicts of hashes instead of single sha1 as Normal priority.

Jul 19 2022, 3:18 PM · Object storage

vlorentz added a revision to T4402: Pass dict of hashes instead of single sha1 to objstorage.get(): D8029: Start introducing composite ObjId in the interface.

Jul 19 2022, 3:16 PM · Object storage

vlorentz added revisions to T4402: Pass dict of hashes instead of single sha1 to objstorage.get(): D8138: Update for swh-objstorage >= 2.0.0, D8137: Call objstorage.get() with a HashDict instead of single hash, D8135: rehash: Call objstorage.content_get() with a HashDict instead of single hash, D8127: Call objstorage.content_get() with a HashDict instead of single hash, D8126: Replace Dict[str, bytes] with a TypedDict to represent dicts of hashes, D8122: Fix crash when calling __contains__/get/check/delete with composite obj ids.

Jul 19 2022, 3:16 PM · Object storage
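
To make the idea behind these revisions concrete, here is a minimal sketch of a TypedDict of hashes used as a composite object id. The `HashDict` fields, the `InMemoryObjStorage` class and `compute_hashes` are assumptions for illustration only, not the actual swh-objstorage interface.

```python
import hashlib
from typing import Dict, TypedDict


class HashDict(TypedDict, total=False):
    """Composite object id: one entry per hash algorithm (illustrative)."""
    sha1: bytes
    sha256: bytes


def compute_hashes(content: bytes) -> HashDict:
    """Compute every hash of the composite id for the given content."""
    return {
        "sha1": hashlib.sha1(content).digest(),
        "sha256": hashlib.sha256(content).digest(),
    }


class InMemoryObjStorage:
    """Toy store keyed on sha1; hypothetical, for illustration only."""

    def __init__(self) -> None:
        self._objects: Dict[bytes, bytes] = {}

    def add(self, content: bytes, obj_id: HashDict) -> None:
        # obj_id is required and nothing is returned, mirroring the
        # direction described in the revision titles above.
        self._objects[obj_id["sha1"]] = content

    def get(self, obj_id: HashDict) -> bytes:
        # Callers pass the whole dict; the backend picks the key it needs.
        return self._objects[obj_id["sha1"]]


storage = InMemoryObjStorage()
oid = compute_hashes(b"hello")
storage.add(b"hello", obj_id=oid)
print(storage.get(oid))
```

Passing the full dict lets each backend choose which hash it uses as its key without changing the callers.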

vlorentz triaged T4402: Pass dict of hashes instead of single sha1 to objstorage.get() as Normal priority.

Jul 19 2022, 3:16 PM · Object storage

Jul 1 2022

douardda added a comment to T2309: Add support for other hash algo than sha1 in current objstorage implementation.

In T2309#87779, @douardda wrote:

Do you have in mind to make the actual hash used as primary key in an objstorage a configuration of said storage instance? E.g. creating a pathslicer or S3 objstorage using sha256 would just be a matter of configuring the objstorage?

Jul 1 2022, 10:38 AM · Object storage

douardda added a comment to T2309: Add support for other hash algo than sha1 in current objstorage implementation.

Do you have in mind to make the actual hash used as primary key in an objstorage a configuration of said storage instance? E.g. creating a pathslicer or S3 objstorage using sha256 would just be a matter of configuring the objstorage?

Jul 1 2022, 10:34 AM · Object storage
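
One way to picture the question: the primary hash becomes a per-instance setting, and the storage key is derived from it. The sketch below is only an assumption of what that could look like; the `primary_hash` config key and both helpers are hypothetical and do not come from swh-objstorage.

```python
import hashlib
from typing import Dict, List


def primary_key(content: bytes, config: Dict[str, str]) -> str:
    """Hex id of the content under the configured hash (sha1 by default)."""
    algo = config.get("primary_hash", "sha1")  # hypothetical config key
    return hashlib.new(algo, content).hexdigest()


def sliced_path(hex_id: str, widths: List[int]) -> str:
    """Build a pathslicer-like relative path from leading slices of the id."""
    parts = []
    pos = 0
    for width in widths:
        parts.append(hex_id[pos:pos + width])
        pos += width
    return "/".join(parts + [hex_id])


config = {"primary_hash": "sha256"}
key = primary_key(b"some content", config)
print(sliced_path(key, [2, 2, 2]))
```

With this shape, switching an instance from sha1 to sha256 changes only its configuration, although existing objects would still need to be re-keyed.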

Jun 27 2022

bchauvet added a revision to T2309: Add support for other hash algo than sha1 in current objstorage implementation: D8029: Start introducing composite ObjId in the interface.

Jun 27 2022, 2:35 PM · Object storage

Jun 21 2022

vlorentz added a parent task for T2309: Add support for other hash algo than sha1 in current objstorage implementation: T3775: Dealing with repositories with contents that produces hash conflicts (example included from GitLab).

Jun 21 2022, 2:41 PM · Object storage

olasd added a revision to T2309: Add support for other hash algo than sha1 in current objstorage implementation: D8008: Set object id when calling objstorage.add.

Jun 21 2022, 2:35 PM · Object storage

May 1 2022

seirl closed T1848: refresh graph dataset export, a subtask of T3085: Complete and updated copy of the archive on S3 (objects+graph), as Resolved.

May 1 2022, 12:08 PM · Roadmap 2022, meta-task, Roadmap 2021, System administration, Object storage

Apr 29 2022

seirl changed the status of T1848: refresh graph dataset export, a subtask of T3085: Complete and updated copy of the archive on S3 (objects+graph), from Open to Work in Progress.

Apr 29 2022, 6:23 PM · Roadmap 2022, meta-task, Roadmap 2021, System administration, Object storage

seirl closed T1743: create a nice landing web page for exported dataset, a subtask of T3085: Complete and updated copy of the archive on S3 (objects+graph), as Resolved.

Apr 29 2022, 6:14 PM · Roadmap 2022, meta-task, Roadmap 2021, System administration, Object storage

Apr 8 2022

anlambert closed T4119: TestRemoteObjStorage::test_content_iterator is failing since werkzeug 2.1.0 release as Resolved by committing rDOBJSdd99e5d64e20: api/server: Fix streaming responses implementation.

Apr 8 2022, 3:10 PM · Object storage

anlambert added a revision to T4119: TestRemoteObjStorage::test_content_iterator is failing since werkzeug 2.1.0 release: D7534: api/server: Fix streaming responses implementation.

Apr 8 2022, 12:15 PM · Object storage

Apr 5 2022

zack changed the status of T1743: create a nice landing web page for exported dataset, a subtask of T3085: Complete and updated copy of the archive on S3 (objects+graph), from Open to Work in Progress.

Apr 5 2022, 1:39 PM · Roadmap 2022, meta-task, Roadmap 2021, System administration, Object storage

Mar 30 2022

anlambert triaged T4119: TestRemoteObjStorage::test_content_iterator is failing since werkzeug 2.1.0 release as Normal priority.

Mar 30 2022, 3:07 PM · Object storage

anlambert created T4119: TestRemoteObjStorage::test_content_iterator is failing since werkzeug 2.1.0 release.

Mar 30 2022, 3:07 PM · Object storage

Mar 25 2022

bchauvet lowered the priority of T3085: Complete and updated copy of the archive on S3 (objects+graph) from High to Low.

Mar 25 2022, 5:28 PM · Roadmap 2022, meta-task, Roadmap 2021, System administration, Object storage

Mar 23 2022

bchauvet added a project to T3085: Complete and updated copy of the archive on S3 (objects+graph): Roadmap 2022.

Mar 23 2022, 4:39 PM · Roadmap 2022, meta-task, Roadmap 2021, System administration, Object storage

Jan 25 2022

vlorentz updated subscribers of T3527: Self-host Software Heritage on grid5000.

@vsellier already did this to benchmark Cassandra. It's indeed necessary to see how the backends behave with real loader and vault workloads (less so for the objstorage, since the workloads should be much more uniform).

Jan 25 2022, 11:30 PM · Object storage

dachary added a comment to T3527: Self-host Software Heritage on grid5000.

I'm not exactly sure why I thought that would be necessary for benchmarking. In any case... it's not ;-)

Jan 25 2022, 9:53 PM · Object storage

dachary closed T3527: Self-host Software Heritage on grid5000, a subtask of T3432: Add winery backend, as Wontfix.

Jan 25 2022, 9:53 PM · Object storage

dachary closed T3527: Self-host Software Heritage on grid5000 as Wontfix.

Jan 25 2022, 9:53 PM · Object storage

dachary closed T3525: grid5000 tools and documentation, a subtask of T3432: Add winery backend, as Resolved.

Jan 25 2022, 9:52 PM · Object storage

dachary closed T3525: grid5000 tools and documentation as Resolved.

Jan 25 2022, 9:52 PM · Object storage

dachary added a comment to T3525: grid5000 tools and documentation.

The documentation is at:

Jan 25 2022, 9:52 PM · Object storage

dachary closed T3634: Create swh-perfecthash module as Resolved.

Jan 25 2022, 9:51 PM · Object storage

dachary closed T3528: Add winery backend: grid5000 benchmark, a subtask of T3432: Add winery backend, as Resolved.

Jan 25 2022, 9:50 PM · Object storage

dachary closed T3528: Add winery backend: grid5000 benchmark as Resolved.

Jan 25 2022, 9:50 PM · Object storage

dachary added a comment to T3528: Add winery backend: grid5000 benchmark.

It's documented in the winery test environment and I was actually able to use the instructions successfully (after a few fixes...). It does work and this can be closed as resolved.

Jan 25 2022, 9:50 PM · Object storage

dachary added a comment to T3432: Add winery backend.

Added a wiki page to be a more accessible version of the benchmark process than the README in the sources.

Jan 25 2022, 9:48 PM · Object storage

Jan 22 2022

dachary changed the status of T3532: IO throttling, a subtask of T3432: Add winery backend, from Open to Work in Progress.

Jan 22 2022, 4:14 PM · Object storage

Dec 16 2021

olasd closed T1954: Up-to-date objstorage mirror on S3, a subtask of T3085: Complete and updated copy of the archive on S3 (objects+graph), as Resolved.

Dec 16 2021, 3:12 PM · Roadmap 2022, meta-task, Roadmap 2021, System administration, Object storage

olasd closed T1954: Up-to-date objstorage mirror on S3 as Resolved.

Dec 16 2021, 3:12 PM · System administration, Object storage

Dec 14 2021

dachary added a comment to D6834: docker: add the swh-winery-db and swh-winery services.

In D6834#177663, @vlorentz wrote:

Ah wait, I got it: docker/services/swh-winery/entrypoint.sh launches winery itself, not an actual objstorage backend; you're only reusing the scaffolding.

Dec 14 2021, 5:35 PM · Object storage

vlorentz accepted D6834: docker: add the swh-winery-db and swh-winery services.

Ah wait, I got it: docker/services/swh-winery/entrypoint.sh launches winery itself, not an actual objstorage backend; you're only reusing the scaffolding.

Dec 14 2021, 5:03 PM · Object storage

vlorentz updated the summary of D6834: docker: add the swh-winery-db and swh-winery services.

Dec 14 2021, 4:57 PM · Object storage

vlorentz added a comment to D6834: docker: add the swh-winery-db and swh-winery services.

I don't understand why. Currently, we have: client --> objstorage pathslicing backend (port 5003) --> disk.
What you want to do is: client --> objstorage proxy (port 5003) --> objstorage winery backend (port 5012) --> winery --> ceph, right?
(objstorage winery backend is what is launched by docker/services/swh-winery/entrypoint.sh in this diff)

Dec 14 2021, 4:54 PM · Object storage

dachary added a comment to D6834: docker: add the swh-winery-db and swh-winery services.

In D6834#177606, @vlorentz wrote:

Instead of defining a new service, could you provide an alternative docker-compose config file? This way, it can be used to switch all services over to it, just by adding a CLI parameter. E.g. we do this to replace the postgres storage backend with cassandra: https://docs.softwareheritage.org/devel/getting-started/using-docker.html#cassandra

Dec 14 2021, 4:17 PM · Object storage

vlorentz added a comment to D6834: docker: add the swh-winery-db and swh-winery services.

Instead of defining a new service, could you provide an alternative docker-compose config file? This way, it can be used to switch all services over to it, just by adding a CLI parameter. E.g. we do this to replace the postgres storage backend with cassandra: https://docs.softwareheritage.org/devel/getting-started/using-docker.html#cassandra

Dec 14 2021, 3:57 PM · Object storage

dachary updated the test plan for D6834: docker: add the swh-winery-db and swh-winery services.

Dec 14 2021, 3:54 PM · Object storage

dachary updated the summary of D6834: docker: add the swh-winery-db and swh-winery services.

Dec 14 2021, 3:53 PM · Object storage

dachary added a comment to D6834: docker: add the swh-winery-db and swh-winery services.

It depends on https://forge.softwareheritage.org/D6796 and will fail until it is merged. It can be tested from sources with an override like this:

Dec 14 2021, 3:52 PM · Object storage

dachary added a project to D6834: docker: add the swh-winery-db and swh-winery services: Object storage.

Dec 14 2021, 3:49 PM · Object storage

Dec 13 2021

dachary updated the task description for T3804: Winery backend server.

Dec 13 2021, 6:23 PM · Object storage

dachary renamed T3804: Winery backend server from Winery backend proxy to Winery backend server.

Dec 13 2021, 6:22 PM · Object storage

olasd added a comment to T3804: Winery backend server.

In practical terms, the two winery objstorage database servers and Ceph itself will be hosted at CEA, while the main ingestion storage / graph storage / ... will remain in Rocquencourt (separated sites, with fairly high bandwidth networking between them).

Dec 13 2021, 6:07 PM · Object storage

vsellier added a watcher for Object storage: vsellier.

Dec 13 2021, 5:58 PM

dachary updated the task description for T3804: Winery backend server.

Dec 13 2021, 5:58 PM · Object storage

dachary updated the task description for T3804: Winery backend server.

Dec 13 2021, 5:58 PM · Object storage

dachary added a subtask for T3432: Add winery backend: T3804: Winery backend server.

Dec 13 2021, 5:54 PM · Object storage

dachary added a parent task for T3804: Winery backend server: T3432: Add winery backend.

Dec 13 2021, 5:54 PM · Object storage

dachary triaged T3804: Winery backend server as Normal priority.

Dec 13 2021, 5:54 PM · Object storage

Dec 12 2021

dachary updated the task description for T3634: Create swh-perfecthash module.

Dec 12 2021, 6:05 AM · Object storage

dachary updated the task description for T3634: Create swh-perfecthash module.

Dec 12 2021, 5:54 AM · Object storage

dachary added a comment to T3634: Create swh-perfecthash module.

@olasd I split the Debian packaging into its own task at T3797 so that this task can be closed. I'll let you revert this if you think it is not appropriate. My rationale is that it would be easier to figure out what's left to be done with this one other task, rather than coming back to this rather overloaded ticket. But it's just a matter of personal taste :-)

Dec 12 2021, 5:51 AM · Object storage

dachary updated the task description for T3634: Create swh-perfecthash module.

Dec 12 2021, 5:48 AM · Object storage

dachary added a parent task for T3797: swh-perfecthash: debian package: T3432: Add winery backend.

Dec 12 2021, 5:48 AM · Object storage

dachary added a subtask for T3432: Add winery backend: T3797: swh-perfecthash: debian package.

Dec 12 2021, 5:48 AM · Object storage

dachary triaged T3797: swh-perfecthash: debian package as Normal priority.

Dec 12 2021, 5:47 AM · Object storage

dachary added a comment to T3634: Create swh-perfecthash module.

The documentation now shows as expected. The previous problems in rendering it were probably because the package was not published.

Dec 12 2021, 5:44 AM · Object storage