
Jan 8 2023

gitlab-migration changed the status of T1048: Clean striped object storages from objects they should not be containing, a subtask of T1044: Write all contents synchronously to azure, from Resolved to Migrated.
Jan 8 2023, 4:24 PM · Object storage
gitlab-migration changed the status of T1046: Stripe local contents between uffizi and banco, a subtask of T1045: Implement a striping object storage, from Resolved to Migrated.
Jan 8 2023, 4:24 PM · Object storage
gitlab-migration changed the status of T1046: Stripe local contents between uffizi and banco from Resolved to Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 4:24 PM · Object storage
gitlab-migration changed the status of T1047: Write all contents synchronously to the ceph cluster, a subtask of T1043: handle the uffizi content store being full, from Wontfix to Migrated.
Jan 8 2023, 4:24 PM · Object storage
gitlab-migration changed the status of T1046: Stripe local contents between uffizi and banco, a subtask of T1043: handle the uffizi content store being full, from Resolved to Migrated.
Jan 8 2023, 4:24 PM · Object storage
gitlab-migration changed the status of T1047: Write all contents synchronously to the ceph cluster from Wontfix to Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 4:24 PM · Object storage
gitlab-migration changed the status of T1045: Implement a striping object storage from Resolved to Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 4:24 PM · Object storage
gitlab-migration changed the status of T1044: Write all contents synchronously to azure, a subtask of T1043: handle the uffizi content store being full, from Resolved to Migrated.
Jan 8 2023, 4:24 PM · Object storage
gitlab-migration changed the status of T1045: Implement a striping object storage, a subtask of T1043: handle the uffizi content store being full, from Resolved to Migrated.
Jan 8 2023, 4:24 PM · Object storage
gitlab-migration changed the status of T1043: handle the uffizi content store being full from Resolved to Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 4:24 PM · Object storage
gitlab-migration changed the status of T1044: Write all contents synchronously to azure from Resolved to Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 4:24 PM · Object storage
gitlab-migration changed the status of T746: Objstorage: add a way to delete items (with a filter for production environments) from Resolved to Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 4:21 PM · Object storage
gitlab-migration changed the status of T564: clean up bogus python object references from the archiver DB table from Resolved to Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 4:19 PM · Object storage
gitlab-migration changed the status of T545: Create puppet manifests for the content integrity checkers from Resolved to Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 4:19 PM · Object storage

Dec 22 2022

vlorentz added a comment to T4644: replayer crashes on invalid directory entry name (b'gitter/gitter.xml').

We have a new one that went unnoticed until 19 days ago: b'superduper/super/sub/bye.txt' is not a valid directory entry name.

Dec 22 2022, 12:40 PM · Archive integrity, Object storage, Data Model
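
For context on the error quoted above, here is a minimal sketch of the kind of check involved. It assumes the rule is that a directory entry name is a single path component and so must not contain b'/'. The helper name `invalid_entry_names` is hypothetical and is not part of the replayer or the data model code.

```python
from typing import Iterable, List


def invalid_entry_names(names: Iterable[bytes]) -> List[bytes]:
    """Return the entry names that contain a path separator.

    Assumption: a name is invalid as soon as it contains b"/".
    """
    return [name for name in names if b"/" in name]


if __name__ == "__main__":
    # The two slash-containing names are the ones quoted in T4644.
    entries = [
        b"bye.txt",
        b"superduper/super/sub/bye.txt",
        b"gitter/gitter.xml",
    ]
    for name in invalid_entry_names(entries):
        print(f"{name!r} is not a valid directory entry name")
```

Both names reported in the task contain a slash, which is consistent with this assumption.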

Dec 16 2022

douardda added a revision to T4736: Allow to specify a content size limit in objstorage replayer: D8962: Add a --size-limit cli option to the replay command.
Dec 16 2022, 10:32 AM · Object storage
douardda created T4736: Allow to specify a content size limit in objstorage replayer.
Dec 16 2022, 10:27 AM · Object storage

Nov 22 2022

olasd added a comment to T4644: replayer crashes on invalid directory entry name (b'gitter/gitter.xml').

To force kafka compaction to run I've done the following:

Nov 22 2022, 5:19 PM · Archive integrity, Object storage, Data Model

Oct 31 2022

vlorentz added a comment to T2309: Add support for other hash algo than sha1 in current objstorage implementation.

Possibly relevant for the Azure storage: https://learn.microsoft.com/en-us/rest/api/storageservices/find-blobs-by-tags

Oct 31 2022, 1:50 PM · Object storage

Oct 25 2022

olasd added a comment to T4644: replayer crashes on invalid directory entry name (b'gitter/gitter.xml').

This is now done, the objects are fixed in the production DB and kafka.

Oct 25 2022, 8:10 PM · Archive integrity, Object storage, Data Model
olasd added a comment to T4644: replayer crashes on invalid directory entry name (b'gitter/gitter.xml').

@vlorentz I'm running the following adaptation of your script:

Oct 25 2022, 7:03 PM · Archive integrity, Object storage, Data Model
seirl added a comment to T4644: replayer crashes on invalid directory entry name (b'gitter/gitter.xml').

Oh yeah, I was thinking of just removing the entire project, but your solution also works.

Oct 25 2022, 6:15 PM · Archive integrity, Object storage, Data Model
vlorentz updated subscribers of T4644: replayer crashes on invalid directory entry name (b'gitter/gitter.xml').

Holes are bad. And I just opened a diff to make the git loader apply the same transformation, as @olasd made the same comment: D8776

Oct 25 2022, 6:10 PM · Archive integrity, Object storage, Data Model
seirl added a comment to T4644: replayer crashes on invalid directory entry name (b'gitter/gitter.xml').

Do you actually want to keep these objects? This would be inconsistent with the fixed loader behavior that would just reject those objects, and not load the repository at all.

Oct 25 2022, 6:06 PM · Archive integrity, Object storage, Data Model
vlorentz changed the visibility for T4644: replayer crashes on invalid directory entry name (b'gitter/gitter.xml').
Oct 25 2022, 5:57 PM · Archive integrity, Object storage, Data Model
vlorentz updated subscribers of T4644: replayer crashes on invalid directory entry name (b'gitter/gitter.xml').

I tried to add a workaround in the backfiller, but it is incredibly hard to do properly, especially as entries are disordered, so raw_manifest needs to be fixed in two different ways.

Oct 25 2022, 5:41 PM · Archive integrity, Object storage, Data Model

Oct 24 2022

vlorentz added a revision to T2309: Add support for other hash algo than sha1 in current objstorage implementation: D8756: azure: Add tests based on Azurite in addition to mocks.
Oct 24 2022, 2:45 PM · Object storage

Oct 19 2022

gitlab-migration closed T3703: [objstorage] Be able to configure to access permissions through keycloak as Migrated.

This task has been migrated to GitLab.

Oct 19 2022, 6:04 PM · System administration, Object storage
gitlab-migration changed the status of T3702: [objstorage] Support a basic authentication configuration from Resolved to Migrated.

This task has been migrated to GitLab.

Oct 19 2022, 6:04 PM · System administration, Object storage
gitlab-migration closed T3477: Add alerting when the copy to S3 starts lagging, a subtask of T3085: Complete and updated copy of the archive on S3 (objects+graph), as Migrated.
Oct 19 2022, 6:03 PM · Roadmap 2022, meta-task, Roadmap 2021, System administration, Object storage
gitlab-migration closed T3430: check_config is not supported by the azure backend as Migrated.

This task has been migrated to GitLab.

Oct 19 2022, 6:03 PM · System administration, Object storage
gitlab-migration closed T3085: Complete and updated copy of the archive on S3 (objects+graph) as Migrated.

This task has been migrated to GitLab.

Oct 19 2022, 6:01 PM · Roadmap 2022, meta-task, Roadmap 2021, System administration, Object storage
gitlab-migration changed the status of T1954: Up-to-date objstorage mirror on S3 from Resolved to Migrated.

This task has been migrated to GitLab.

Oct 19 2022, 5:56 PM · System administration, Object storage
gitlab-migration changed the status of T1954: Up-to-date objstorage mirror on S3, a subtask of T3085: Complete and updated copy of the archive on S3 (objects+graph), from Resolved to Migrated.
Oct 19 2022, 5:56 PM · Roadmap 2022, meta-task, Roadmap 2021, System administration, Object storage

Oct 18 2022

vlorentz added a parent task for T4644: replayer crashes on invalid directory entry name (b'gitter/gitter.xml'): T2033: Run Cassandra storage backend with production data.
Oct 18 2022, 3:40 PM · Archive integrity, Object storage, Data Model
vlorentz triaged T4644: replayer crashes on invalid directory entry name (b'gitter/gitter.xml') as High priority.
Oct 18 2022, 3:39 PM · Archive integrity, Object storage, Data Model
swh-sentry-integration assigned T4644: replayer crashes on invalid directory entry name (b'gitter/gitter.xml') to vlorentz.
Oct 18 2022, 3:39 PM · Archive integrity, Object storage, Data Model

Aug 23 2022

vlorentz added a revision to T4402: Pass dict of hashes instead of single sha1 to objstorage.get(): D8286: Pass 'obj_id' argument to objstorage.add().
Aug 23 2022, 10:34 AM · Object storage

Jul 19 2022

vlorentz added revisions to T4403: Update objstorage interface to return dicts of hashes instead of single sha1: D8009: Make obj_id argument of ObjStorage.add() required, D8013: Drop the now unused add_stream and get_stream methods, D8017: Make add() and restore() return None instead of ObjId, D8026: Remove get_random(), D8029: Start introducing composite ObjId in the interface, D8074: Remove ID-based filters, D8076: Make __iter__ actually return composite objids.
Jul 19 2022, 3:19 PM · Object storage
vlorentz triaged T4403: Update objstorage interface to return dicts of hashes instead of single sha1 as Normal priority.
Jul 19 2022, 3:18 PM · Object storage
vlorentz added a revision to T4402: Pass dict of hashes instead of single sha1 to objstorage.get(): D8029: Start introducing composite ObjId in the interface.
Jul 19 2022, 3:16 PM · Object storage
vlorentz added revisions to T4402: Pass dict of hashes instead of single sha1 to objstorage.get(): D8138: Update for swh-objstorage >= 2.0.0, D8137: Call objstorage.get() with a HashDict instead of single hash, D8135: rehash: Call objstorage.content_get() with a HashDict instead of single hash, D8127: Call objstorage.content_get() with a HashDict instead of single hash, D8126: Replace Dict[str, bytes] with a TypedDict to represent dicts of hashes, D8122: Fix crash when calling __contains__/get/check/delete with composite obj ids.
Jul 19 2022, 3:16 PM · Object storage
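
As an illustration of D8126 above (replacing Dict[str, bytes] with a TypedDict), a dict of hashes could be typed roughly as follows. This is a sketch only: the name `HashDict` appears in the revision titles, but the fields shown here and the `compute_hashes` helper are assumptions, not the actual swh-objstorage definitions.

```python
import hashlib
from typing import TypedDict


class HashDict(TypedDict, total=False):
    """Hashes identifying one content object.

    Assumption: every key is optional (total=False) and only
    sha1 and sha256 are shown; the real type may have more fields.
    """

    sha1: bytes
    sha256: bytes


def compute_hashes(data: bytes) -> HashDict:
    """Hypothetical helper building a HashDict from raw content."""
    return {
        "sha1": hashlib.sha1(data).digest(),
        "sha256": hashlib.sha256(data).digest(),
    }


if __name__ == "__main__":
    obj_id = compute_hashes(b"hello")
    print({algo: digest.hex() for algo, digest in obj_id.items()})
```

Compared with a plain Dict[str, bytes], a type checker can reject misspelled or unknown hash names at the call site.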
vlorentz triaged T4402: Pass dict of hashes instead of single sha1 to objstorage.get() as Normal priority.
Jul 19 2022, 3:16 PM · Object storage

Jul 1 2022

douardda added a comment to T2309: Add support for other hash algo than sha1 in current objstorage implementation.

Do you have in mind to make the hash used as primary key in an objstorage a configuration option of that storage instance? E.g., would creating a pathslicer or S3 objstorage that uses sha256 be just a matter of configuring the objstorage?

Jul 1 2022, 10:38 AM · Object storage
douardda added a comment to T2309: Add support for other hash algo than sha1 in current objstorage implementation.

Do you have in mind to make the hash used as primary key in an objstorage a configuration option of that storage instance? E.g., would creating a pathslicer or S3 objstorage that uses sha256 be just a matter of configuring the objstorage?

Jul 1 2022, 10:34 AM · Object storage
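
The question above can be made concrete with a small sketch in which the primary hash is a parameter of the storage rather than hard-coded. Everything here is hypothetical: the function names, the `primary_hash` parameter, and the slicing layout are illustrative and do not reflect the actual pathslicer configuration.

```python
import hashlib


def object_key(data: bytes, primary_hash: str = "sha1") -> str:
    """Hex key of an object under the configured hash algorithm."""
    return hashlib.new(primary_hash, data).hexdigest()


def sliced_path(key: str, depth: int = 3, width: int = 2) -> str:
    """Spread objects over nested directories named after key prefixes.

    Assumption: `depth` levels of `width` hex characters each,
    followed by the full key as the file name.
    """
    parts = [key[i * width:(i + 1) * width] for i in range(depth)]
    return "/".join(parts + [key])


if __name__ == "__main__":
    content = b"hello"
    for algo in ("sha1", "sha256"):
        print(algo, sliced_path(object_key(content, algo)))
```

With this shape, switching an instance from sha1 to sha256 only changes the value passed as `primary_hash`; the slicing logic is unaffected.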

Jun 27 2022

bchauvet added a revision to T2309: Add support for other hash algo than sha1 in current objstorage implementation: D8029: Start introducing composite ObjId in the interface.
Jun 27 2022, 2:35 PM · Object storage

Jun 21 2022

vlorentz added a parent task for T2309: Add support for other hash algo than sha1 in current objstorage implementation: T3775: Dealing with repositories with contents that produces hash conflicts (example included from GitLab).
Jun 21 2022, 2:41 PM · Object storage
olasd added a revision to T2309: Add support for other hash algo than sha1 in current objstorage implementation: D8008: Set object id when calling objstorage.add.
Jun 21 2022, 2:35 PM · Object storage

May 1 2022

seirl closed T1848: refresh graph dataset export, a subtask of T3085: Complete and updated copy of the archive on S3 (objects+graph), as Resolved.
May 1 2022, 12:08 PM · Roadmap 2022, meta-task, Roadmap 2021, System administration, Object storage

Apr 29 2022

seirl changed the status of T1848: refresh graph dataset export, a subtask of T3085: Complete and updated copy of the archive on S3 (objects+graph), from Open to Work in Progress.
Apr 29 2022, 6:23 PM · Roadmap 2022, meta-task, Roadmap 2021, System administration, Object storage
seirl closed T1743: create a nice landing web page for exported dataset, a subtask of T3085: Complete and updated copy of the archive on S3 (objects+graph), as Resolved.
Apr 29 2022, 6:14 PM · Roadmap 2022, meta-task, Roadmap 2021, System administration, Object storage

Apr 8 2022

anlambert closed T4119: TestRemoteObjStorage::test_content_iterator is failing since werkzeug 2.1.0 release as Resolved by committing rDOBJSdd99e5d64e20: api/server: Fix streaming responses implementation.
Apr 8 2022, 3:10 PM · Object storage
anlambert added a revision to T4119: TestRemoteObjStorage::test_content_iterator is failing since werkzeug 2.1.0 release: D7534: api/server: Fix streaming responses implementation.
Apr 8 2022, 12:15 PM · Object storage

Apr 5 2022

zack changed the status of T1743: create a nice landing web page for exported dataset, a subtask of T3085: Complete and updated copy of the archive on S3 (objects+graph), from Open to Work in Progress.
Apr 5 2022, 1:39 PM · Roadmap 2022, meta-task, Roadmap 2021, System administration, Object storage

Mar 30 2022

anlambert triaged T4119: TestRemoteObjStorage::test_content_iterator is failing since werkzeug 2.1.0 release as Normal priority.
Mar 30 2022, 3:07 PM · Object storage
anlambert created T4119: TestRemoteObjStorage::test_content_iterator is failing since werkzeug 2.1.0 release.
Mar 30 2022, 3:07 PM · Object storage

Mar 25 2022

bchauvet lowered the priority of T3085: Complete and updated copy of the archive on S3 (objects+graph) from High to Low.
Mar 25 2022, 5:28 PM · Roadmap 2022, meta-task, Roadmap 2021, System administration, Object storage

Mar 23 2022

bchauvet added a project to T3085: Complete and updated copy of the archive on S3 (objects+graph): Roadmap 2022.
Mar 23 2022, 4:39 PM · Roadmap 2022, meta-task, Roadmap 2021, System administration, Object storage

Jan 25 2022

vlorentz updated subscribers of T3527: Self-host Software Heritage on grid5000.

@vsellier already did this to benchmark Cassandra. It's indeed necessary to see how the backends behave with real loader and vault workloads (less so for the objstorage, since the workloads should be much more uniform).

Jan 25 2022, 11:30 PM · Object storage
dachary added a comment to T3527: Self-host Software Heritage on grid5000.

I'm not exactly sure why I thought that would be necessary for benchmarking. In any case... it's not ;-)

Jan 25 2022, 9:53 PM · Object storage
dachary closed T3527: Self-host Software Heritage on grid5000, a subtask of T3432: Add winery backend, as Wontfix.
Jan 25 2022, 9:53 PM · Object storage
dachary closed T3527: Self-host Software Heritage on grid5000 as Wontfix.
Jan 25 2022, 9:53 PM · Object storage
dachary closed T3525: grid5000 tools and documentation, a subtask of T3432: Add winery backend, as Resolved.
Jan 25 2022, 9:52 PM · Object storage
dachary closed T3525: grid5000 tools and documentation as Resolved.
Jan 25 2022, 9:52 PM · Object storage
dachary added a comment to T3525: grid5000 tools and documentation.

The documentation is at:

Jan 25 2022, 9:52 PM · Object storage
dachary closed T3634: Create swh-perfecthash module as Resolved.
Jan 25 2022, 9:51 PM · Object storage
dachary closed T3528: Add winery backend: grid5000 benchmark, a subtask of T3432: Add winery backend, as Resolved.
Jan 25 2022, 9:50 PM · Object storage
dachary closed T3528: Add winery backend: grid5000 benchmark as Resolved.
Jan 25 2022, 9:50 PM · Object storage
dachary added a comment to T3528: Add winery backend: grid5000 benchmark.

It's documented in the winery test environment and I was actually able to use the instructions successfully (after a few fixes...). It does work and this can be closed as resolved.

Jan 25 2022, 9:50 PM · Object storage
dachary added a comment to T3432: Add winery backend.

Added a wiki page to be a more accessible version of the benchmark process than the README in the sources.

Jan 25 2022, 9:48 PM · Object storage

Jan 22 2022

dachary changed the status of T3532: IO throttling, a subtask of T3432: Add winery backend, from Open to Work in Progress.
Jan 22 2022, 4:14 PM · Object storage

Dec 16 2021

olasd closed T1954: Up-to-date objstorage mirror on S3, a subtask of T3085: Complete and updated copy of the archive on S3 (objects+graph), as Resolved.
Dec 16 2021, 3:12 PM · Roadmap 2022, meta-task, Roadmap 2021, System administration, Object storage
olasd closed T1954: Up-to-date objstorage mirror on S3 as Resolved.
Dec 16 2021, 3:12 PM · System administration, Object storage

Dec 14 2021

dachary added a comment to D6834: docker: add the swh-winery-db and swh-winery services.

Ah wait I got it, docker/services/swh-winery/entrypoint.sh launches winery itself, not an actual objstorage backend; you're only reusing the scaffolding.

Dec 14 2021, 5:35 PM · Object storage
vlorentz accepted D6834: docker: add the swh-winery-db and swh-winery services.

Ah wait I got it, docker/services/swh-winery/entrypoint.sh launches winery itself, not an actual objstorage backend; you're only reusing the scaffolding.

Dec 14 2021, 5:03 PM · Object storage
vlorentz updated the summary of D6834: docker: add the swh-winery-db and swh-winery services.
Dec 14 2021, 4:57 PM · Object storage
vlorentz added a comment to D6834: docker: add the swh-winery-db and swh-winery services.

I don't understand why. Currently, we have: client --> objstorage pathslicing backend (port 5003) --> disk.
What you want to do is: client --> objstorage proxy (port 5003) --> objstorage winery backend (port 5012) --> winery --> ceph, right?
(objstorage winery backend is what is launched by docker/services/swh-winery/entrypoint.sh in this diff)

Dec 14 2021, 4:54 PM · Object storage
dachary added a comment to D6834: docker: add the swh-winery-db and swh-winery services.

Instead of defining a new service, could you provide an alternative docker-compose config file? This way, it can be used to switch all services over to it, just by adding a CLI parameter. E.g., we do this to replace the postgres storage backend with cassandra: https://docs.softwareheritage.org/devel/getting-started/using-docker.html#cassandra

Dec 14 2021, 4:17 PM · Object storage
vlorentz added a comment to D6834: docker: add the swh-winery-db and swh-winery services.

Instead of defining a new service, could you provide an alternative docker-compose config file? This way, it can be used to switch all services over to it, just by adding a CLI parameter. E.g., we do this to replace the postgres storage backend with cassandra: https://docs.softwareheritage.org/devel/getting-started/using-docker.html#cassandra

Dec 14 2021, 3:57 PM · Object storage
dachary updated the test plan for D6834: docker: add the swh-winery-db and swh-winery services.
Dec 14 2021, 3:54 PM · Object storage
dachary updated the summary of D6834: docker: add the swh-winery-db and swh-winery services.
Dec 14 2021, 3:53 PM · Object storage
dachary added a comment to D6834: docker: add the swh-winery-db and swh-winery services.

It depends on https://forge.softwareheritage.org/D6796 and will fail until it is merged. It can be tested from sources with an override like this:

Dec 14 2021, 3:52 PM · Object storage
dachary added a project to D6834: docker: add the swh-winery-db and swh-winery services: Object storage.
Dec 14 2021, 3:49 PM · Object storage

Dec 13 2021

dachary updated the task description for T3804: Winery backend server.
Dec 13 2021, 6:23 PM · Object storage
dachary renamed T3804: Winery backend server from Winery backend proxy to Winery backend server.
Dec 13 2021, 6:22 PM · Object storage
olasd added a comment to T3804: Winery backend server.

In practical terms, the two winery objstorage database servers and Ceph itself will be hosted at CEA, while the main ingestion storage / graph storage / ... will remain in Rocquencourt (separate sites, with fairly high bandwidth networking between them).

Dec 13 2021, 6:07 PM · Object storage
vsellier added a watcher for Object storage: vsellier.
Dec 13 2021, 5:58 PM
dachary updated the task description for T3804: Winery backend server.
Dec 13 2021, 5:58 PM · Object storage
dachary updated the task description for T3804: Winery backend server.
Dec 13 2021, 5:58 PM · Object storage
dachary added a subtask for T3432: Add winery backend: T3804: Winery backend server.
Dec 13 2021, 5:54 PM · Object storage
dachary added a parent task for T3804: Winery backend server: T3432: Add winery backend.
Dec 13 2021, 5:54 PM · Object storage
dachary triaged T3804: Winery backend server as Normal priority.
Dec 13 2021, 5:54 PM · Object storage

Dec 12 2021

dachary updated the task description for T3634: Create swh-perfecthash module.
Dec 12 2021, 6:05 AM · Object storage
dachary updated the task description for T3634: Create swh-perfecthash module.
Dec 12 2021, 5:54 AM · Object storage
dachary added a comment to T3634: Create swh-perfecthash module.

@olasd I split the debian packaging into its own task at T3797 so that this task can be closed. I'll let you revert this if you think it is not appropriate. My rationale is that it would be easier to figure out what's left to be done with this one other task, rather than coming back to this rather overloaded ticket. But it's just a matter of personal taste :-)

Dec 12 2021, 5:51 AM · Object storage
dachary updated the task description for T3634: Create swh-perfecthash module.
Dec 12 2021, 5:48 AM · Object storage
dachary added a parent task for T3797: swh-perfecthash: debian package: T3432: Add winery backend.
Dec 12 2021, 5:48 AM · Object storage
dachary added a subtask for T3432: Add winery backend: T3797: swh-perfecthash: debian package.
Dec 12 2021, 5:48 AM · Object storage
dachary triaged T3797: swh-perfecthash: debian package as Normal priority.
Dec 12 2021, 5:47 AM · Object storage
dachary added a comment to T3634: Create swh-perfecthash module.

The documentation now shows as expected. The previous problems in rendering it were probably because the package was not published.

Dec 12 2021, 5:44 AM · Object storage