Object storage
Active, Public

Recent Activity

May 16 2020

olasd added a comment to T1954: Up-to-date objstorage mirror on S3.

The S3 object copy is now completely caught up with where Kafka was when the backfilling of all objects from PostgreSQL ended. This means we're now copying the "newer" objects, and there are pretty much no hits at all on the inventory file anymore.

May 16 2020, 3:45 PM · System administration, Object storage

May 4 2020

douardda added a revision to T2355: Make swh-journal independent from swh-storage or swh-objstorage: D3119: Add skipped_content to the list of accepted objects.
May 4 2020, 5:37 PM · Object storage, Storage manager, Journal

Apr 30 2020

douardda closed T2355: Make swh-journal independent from swh-storage or swh-objstorage as Resolved.

Let's consider this done now.

Apr 30 2020, 4:09 PM · Object storage, Storage manager, Journal

Apr 29 2020

douardda added a revision to T2355: Make swh-journal independent from swh-storage or swh-objstorage: D3087: Remove the content replayer code.
Apr 29 2020, 1:47 PM · Object storage, Storage manager, Journal

Apr 24 2020

douardda added a revision to T2355: Make swh-journal independent from swh-storage or swh-objstorage: D3062: Move the content of swh/objstorage/__init__.py in swh/objstorage/factory.py.
Apr 24 2020, 3:54 PM · Object storage, Storage manager, Journal
douardda added a revision to T2355: Make swh-journal independent from swh-storage or swh-objstorage: D3056: Deprecate the `config-path` argument of the `swh storage rpc-serve` command.
Apr 24 2020, 11:29 AM · Object storage, Storage manager, Journal

Apr 23 2020

douardda added a revision to T2355: Make swh-journal independent from swh-storage or swh-objstorage: D3058: Adapt journal client loading to swh.journal 0.0.31.
Apr 23 2020, 4:58 PM · Object storage, Storage manager, Journal
olasd added a comment to T1954: Up-to-date objstorage mirror on S3.

I've updated the exclusion file with data from the 2020-04-19 s3 inventory.

Apr 23 2020, 12:40 PM · System administration, Object storage

Apr 22 2020

douardda added a revision to T2355: Make swh-journal independent from swh-storage or swh-objstorage: D3044: Move get_journal_client function to swh.journal.client.
Apr 22 2020, 4:50 PM · Object storage, Storage manager, Journal
ardumont renamed T2355: Make swh-journal independent from swh-storage or swh-objstorage from Make swh-journal independant from swh-storage or swh-objstorage to Make swh-journal independent from swh-storage or swh-objstorage.
Apr 22 2020, 3:50 PM · Object storage, Storage manager, Journal
douardda renamed T2355: Make swh-journal independent from swh-storage or swh-objstorage from Merge parts of swh-journal in swh-storage to Make swh-journal independant from swh-storage or swh-objstorage.
Apr 22 2020, 3:41 PM · Object storage, Storage manager, Journal
douardda added a revision to T2355: Make swh-journal independent from swh-storage or swh-objstorage: D3043: Extract kafka-related pytest fixtures in a pytest plugin module.
Apr 22 2020, 3:38 PM · Object storage, Storage manager, Journal

Apr 14 2020

ardumont added a comment to T2332: Analyze hash collisions.

Remains open because there remain decisions to be made
about the few real ones (3) we have so far [1].

Apr 14 2020, 2:08 PM · Object storage, Storage manager
ardumont added a comment to T2332: Analyze hash collisions.

So our high number of false hash collisions is fixed now thanks to D2977 \m/.

Apr 14 2020, 2:06 PM · Object storage, Storage manager
douardda added a revision to T2355: Make swh-journal independent from swh-storage or swh-objstorage: D3010: Copy the graph replayer component from swh-journal.
Apr 14 2020, 11:14 AM · Object storage, Storage manager, Journal
douardda added a revision to T2355: Make swh-journal independent from swh-storage or swh-objstorage: D3008: Copy the backfiller component from swh-journal.
Apr 14 2020, 11:14 AM · Object storage, Storage manager, Journal

Apr 9 2020

ardumont updated the task description for T2355: Make swh-journal independent from swh-storage or swh-objstorage.
Apr 9 2020, 4:35 PM · Object storage, Storage manager, Journal

Apr 8 2020

ardumont updated the task description for T2332: Analyze hash collisions.
Apr 8 2020, 4:15 PM · Object storage, Storage manager
ardumont added a comment to T2332: Analyze hash collisions.

An interesting experiment: I disabled the proxy buffer storage in the loader nixguix configuration,
and the number of hash collisions dropped to 0 (no new event for that loader since yesterday around 6pm our time).

Apr 8 2020, 10:58 AM · Object storage, Storage manager
vlorentz added a revision to T2332: Analyze hash collisions: D2977: Prevent erroneous HashCollisions by using the same ctime for all rows.
Apr 8 2020, 10:53 AM · Object storage, Storage manager

Apr 3 2020

ardumont added a comment to T2332: Analyze hash collisions.

All in all, this task serves the purpose of being sure those exist.

Apr 3 2020, 7:22 PM · Object storage, Storage manager

Mar 25 2020

olasd added a comment to T2332: Analyze hash collisions.

Finally, we should make sure that the storage implementations reject objects with hashes of the wrong length. I'm /almost/ sure that's the case, but we should be sure of it.

That's the case.

Mar 25 2020, 10:35 AM · Object storage, Storage manager
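
For illustration, the kind of check discussed in this comment could look like the minimal sketch below. It is not the actual swh-storage code: the `check_hash_lengths` function, the `EXPECTED_DIGEST_SIZES` table and the algorithm names in it are assumptions made for the example.

```python
from typing import Dict

# Digest sizes in bytes. The algorithm names are an assumption made for this
# sketch; adapt the table to whatever the storage actually accepts.
EXPECTED_DIGEST_SIZES: Dict[str, int] = {
    "sha1": 20,
    "sha1_git": 20,
    "sha256": 32,
    "blake2s256": 32,
}


def check_hash_lengths(hashes: Dict[str, bytes]) -> None:
    """Raise ValueError if any digest is unknown or has the wrong length."""
    for algo, digest in hashes.items():
        expected = EXPECTED_DIGEST_SIZES.get(algo)
        if expected is None:
            raise ValueError(f"unknown hash algorithm: {algo}")
        if len(digest) != expected:
            raise ValueError(
                f"{algo} digest has {len(digest)} bytes, expected {expected}"
            )
```

Calling such a check before anything is written means a truncated or padded hash is refused up front instead of being stored and later reported as a collision.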

Mar 24 2020

ardumont added a comment to T2332: Analyze hash collisions.

Finally, we should make sure that the storage implementations reject objects with hashes of the wrong length. I'm /almost/ sure that's the case, but we should be sure of it.

Mar 24 2020, 7:03 PM · Object storage, Storage manager
ardumont added a comment to T2332: Analyze hash collisions.

To be more sure of that, I think we should make sure that all hash data in all exception arguments is passed as hex-encoded unicode strings, rather than bytes objects left for Python to repr(); this would circumvent a lot of places where encoding or decoding the data in transfer can go wrong.

Mar 24 2020, 3:04 PM · Object storage, Storage manager
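
A minimal sketch of the hex-encoding idea from this comment follows. The `HashCollisionError` class and the `hashes_to_hex` helper are hypothetical names used only for illustration; they are not taken from the swh-storage code base.

```python
from typing import Dict


def hashes_to_hex(hashes: Dict[str, bytes]) -> Dict[str, str]:
    """Return a copy of `hashes` with every digest as a hex-encoded str."""
    return {algo: digest.hex() for algo, digest in hashes.items()}


class HashCollisionError(Exception):
    """Hypothetical exception whose arguments never contain raw bytes."""

    def __init__(self, algo: str, hash_id: bytes, colliding: Dict[str, bytes]):
        # Only str values are stored in args, so serializing the exception
        # (logs, RPC, error trackers) never depends on repr() of bytes.
        super().__init__(algo, hash_id.hex(), hashes_to_hex(colliding))


# Dummy digest, not a real object hash.
example = HashCollisionError("sha1", b"\x00\x01" * 10, {"sha1": b"\x00\x01" * 10})
```

Because `example.args` holds only plain strings, the values survive a round trip through JSON or similar encodings unchanged and can be compared directly on the receiving side.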
ardumont added a comment to T2332: Analyze hash collisions.

It looks like there are a few actual collisions; it seems that they're the known-colliding Google PDFs.

Mar 24 2020, 1:05 PM · Object storage, Storage manager
olasd added a comment to T2332: Analyze hash collisions.

I'll write my remarks down here for tracking purposes

Mar 24 2020, 1:00 PM · Object storage, Storage manager
ardumont added a comment to T2332: Analyze hash collisions.

sampled collisions extracted from sentry and storage [1]

Mar 24 2020, 12:02 PM · Object storage, Storage manager
ardumont triaged T2332: Analyze hash collisions as Normal priority.
Mar 24 2020, 12:01 PM · Object storage, Storage manager

Mar 12 2020

douardda triaged T2309: Add support for other hash algo than sha1 in current objstorage implementation as Normal priority.
Mar 12 2020, 1:43 PM · Object storage

Feb 18 2020

vlorentz added a project to T2215: Streaming support everywhere: meta-task.
Feb 18 2020, 4:52 PM · meta-task, Web app, Object storage, Storage manager, Restricted Project

Jan 22 2020

vlorentz added projects to T2215: Streaming support everywhere: Storage manager, Object storage, Web app.
Jan 22 2020, 4:24 PM · meta-task, Web app, Object storage, Storage manager, Restricted Project
vlorentz added a project to T2216: Packing object storage: Object storage.
Jan 22 2020, 4:20 PM · Object storage, Restricted Project

Dec 7 2019

olasd added a comment to T1954: Up-to-date objstorage mirror on S3.

I've grown tired of babysitting this, so I've added systemd notify calls to the journal replayer, allowing us to just use the systemd watchdog to restart hung processes.

Dec 7 2019, 6:37 PM · System administration, Object storage
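
The watchdog mechanism mentioned in this comment can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code that was added to the journal replayer: `sd_notify` here is a hand-written helper that speaks the systemd notification protocol (a datagram sent to the socket named in `NOTIFY_SOCKET`), and `replay` and `process_batch` are hypothetical names.

```python
import os
import socket
from typing import Callable, Iterable


def sd_notify(message: str) -> bool:
    """Send `message` to the socket named by $NOTIFY_SOCKET, if any."""
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False  # not started by systemd with notifications enabled
    addr = os.fsencode(path)
    if addr.startswith(b"@"):
        addr = b"\0" + addr[1:]  # abstract namespace socket
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode(), addr)
    return True


def replay(batches: Iterable, process_batch: Callable) -> int:
    """Process batches, pinging the watchdog after each one."""
    sd_notify("READY=1")
    count = 0
    for batch in batches:
        process_batch(batch)
        # If the loop hangs, no ping is sent and systemd restarts the unit.
        sd_notify("WATCHDOG=1")
        count += 1
    return count
```

For this to have any effect, the service unit needs a `WatchdogSec=` setting and a restart policy such as `Restart=on-failure`; the interval should comfortably exceed the time one batch normally takes.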
olasd merged T1899: complete object storage mirror on AWS into T1954: Up-to-date objstorage mirror on S3.
Dec 7 2019, 6:30 PM · System administration, Object storage

Nov 25 2019

olasd added a comment to T1954: Up-to-date objstorage mirror on S3.
In T1954#39027, @zack wrote:

So, the amount of contents on S3 went up fairly quickly between Nov 10th and Nov 20th, but then it stopped again. Is that expected/normal?

(thanks for the metric, it really helps)

Nov 25 2019, 3:30 PM · System administration, Object storage

Nov 24 2019

zack added a comment to T1954: Up-to-date objstorage mirror on S3.

So, the amount of contents on S3 went up fairly quickly between Nov 10th and Nov 20th, but then it stopped again. Is that expected/normal?

Nov 24 2019, 8:26 PM · System administration, Object storage

Nov 8 2019

olasd added a comment to T1954: Up-to-date objstorage mirror on S3.

I've added a metric with the S3 objects to https://grafana.softwareheritage.org/d/jScG7g6mk/objstorage-object-counts. There's... "some" work to do still.

Nov 8 2019, 7:12 PM · System administration, Object storage
olasd changed the status of T1954: Up-to-date objstorage mirror on S3 from Open to Work in Progress.

So I've deployed this (by hand for now) on uffizi and it seems to be doing its job.

Nov 8 2019, 11:41 AM · System administration, Object storage

Oct 29 2019

vlorentz updated the task description for T1954: Up-to-date objstorage mirror on S3.
Oct 29 2019, 11:34 AM · System administration, Object storage

Aug 19 2019

vlorentz updated the task description for T1954: Up-to-date objstorage mirror on S3.
Aug 19 2019, 11:47 AM · System administration, Object storage
vlorentz updated the task description for T1954: Up-to-date objstorage mirror on S3.
Aug 19 2019, 11:45 AM · System administration, Object storage
vlorentz triaged T1954: Up-to-date objstorage mirror on S3 as High priority.
Aug 19 2019, 11:44 AM · System administration, Object storage

Jun 19 2019

olasd closed T1823: make DB/FS transactions nest properly as Resolved by committing rDOBJS67197802d5aa: pathslicing: Make sure data is flushed to disk before renaming the tempfile.
Jun 19 2019, 4:11 PM · Object storage, Storage manager
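
The idea named in this commit title (flush data to disk before renaming the tempfile) can be sketched as follows. This is a generic illustration, not the pathslicing code itself; `atomic_write` is a hypothetical helper name.

```python
import contextlib
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Write `data` to `path` so readers never see a partial file."""
    dirname = os.path.dirname(os.path.abspath(path))
    # Create the tempfile in the target directory so the final rename
    # stays on a single filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=dirname)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()             # push Python's buffer to the OS
            os.fsync(f.fileno())  # ask the OS to push it to disk
        # Rename only after the data is durable. os.replace is used here
        # because it overwrites an existing target on every platform.
        os.replace(tmp_path, path)
    except BaseException:
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_path)
        raise
```

Without the `fsync` step, a crash shortly after the rename could leave the final name pointing at an empty or truncated file, since the rename may reach the disk before the data does.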
olasd added a revision to T1823: make DB/FS transactions nest properly: D1611: pathslicing: Make sure data is flushed to disk before renaming the tempfile.
Jun 19 2019, 1:46 PM · Object storage, Storage manager

Jun 18 2019

zack triaged T1823: make DB/FS transactions nest properly as High priority.
Jun 18 2019, 12:38 PM · Object storage, Storage manager

May 23 2019

vlorentz triaged T1736: Update the "Archive copies" documentation as Normal priority.
May 23 2019, 11:17 AM · Object storage, Development documentation

Mar 26 2019

olasd added a parent task for T1577: Compare/benchmark objstorage backends: T1608: Write all new objects to azure synchronously.
Mar 26 2019, 6:58 PM · Object storage

Mar 22 2019

seirl placed T805: objstorage: allow use of file-like objects for streaming methods up for grabs.
Mar 22 2019, 1:31 PM · Object storage
seirl triaged T1596: ObjStorage: investigate the per-request CPU overhead of aiohttp as Normal priority.
Mar 22 2019, 1:27 PM · Object storage
seirl triaged T1595: ObjStorage: compression pass-through as Normal priority.
Mar 22 2019, 1:24 PM · Object storage