
Sep 22 2020

olasd closed T1048: Clean striped object storages from objects they should not be containing, a subtask of T1046: Stripe local contents between uffizi and banco, as Resolved.
Sep 22 2020, 4:42 PM · Object storage
olasd closed T1048: Clean striped object storages from objects they should not be containing as Resolved.

We've moved on to other filesystems and we're not really planning on wiping the old ones anymore.

Sep 22 2020, 4:42 PM · Object storage
moranegg moved T1736: Update the "Archive copies" documentation from Backlog to developers (docs/devel/) on the Documentation board.
Sep 22 2020, 2:49 PM · Object storage, Documentation

Aug 27 2020

olasd added a comment to T1954: Up-to-date objstorage mirror on S3.

The journal clients copying objects to S3 are blocked on being unable to read messages from kafka.

Aug 27 2020, 12:20 PM · System administration, Object storage

Jul 21 2020

olasd closed T1047: Write all contents synchronously to the ceph cluster as Wontfix.

We're not going to do this in the foreseeable future.

Jul 21 2020, 2:27 PM · Object storage
olasd closed T1047: Write all contents synchronously to the ceph cluster, a subtask of T1043: handle the uffizi content store being full, as Wontfix.
Jul 21 2020, 2:27 PM · Object storage

May 16 2020

olasd added a comment to T1954: Up-to-date objstorage mirror on S3.

The S3 object copy is now completely caught up with where kafka was when the backfilling of all objects from postgresql ended. This means we're now copying the "newer" objects, and there are pretty much no hits at all on the inventory file anymore.

May 16 2020, 3:45 PM · System administration, Object storage

May 4 2020

douardda added a revision to T2355: Make swh-journal independent from swh-storage or swh-objstorage: D3119: Add skipped_content to the list of accepted objects.
May 4 2020, 5:37 PM · Object storage, Storage manager, Journal

Apr 30 2020

douardda closed T2355: Make swh-journal independent from swh-storage or swh-objstorage as Resolved.

Let's consider this done now.

Apr 30 2020, 4:09 PM · Object storage, Storage manager, Journal

Apr 29 2020

douardda added a revision to T2355: Make swh-journal independent from swh-storage or swh-objstorage: D3087: Remove the content replayer code.
Apr 29 2020, 1:47 PM · Object storage, Storage manager, Journal

Apr 24 2020

douardda added a revision to T2355: Make swh-journal independent from swh-storage or swh-objstorage: D3062: Move the content of swh/objstorage/__init__.py in swh/objstorage/factory.py.
Apr 24 2020, 3:54 PM · Object storage, Storage manager, Journal
douardda added a revision to T2355: Make swh-journal independent from swh-storage or swh-objstorage: D3056: Deprecate the `config-path` argument of the `swh storage rpc-serve` command.
Apr 24 2020, 11:29 AM · Object storage, Storage manager, Journal

Apr 23 2020

douardda added a revision to T2355: Make swh-journal independent from swh-storage or swh-objstorage: D3058: Adapt journal client loading to swh.journal 0.0.31.
Apr 23 2020, 4:58 PM · Object storage, Storage manager, Journal
olasd added a comment to T1954: Up-to-date objstorage mirror on S3.

I've updated the exclusion file with data from the 2020-04-19 s3 inventory.

Apr 23 2020, 12:40 PM · System administration, Object storage

Apr 22 2020

douardda added a revision to T2355: Make swh-journal independent from swh-storage or swh-objstorage: D3044: Move get_journal_client function to swh.journal.client.
Apr 22 2020, 4:50 PM · Object storage, Storage manager, Journal
ardumont renamed T2355: Make swh-journal independent from swh-storage or swh-objstorage from Make swh-journal independant from swh-storage or swh-objstorage to Make swh-journal independent from swh-storage or swh-objstorage.
Apr 22 2020, 3:50 PM · Object storage, Storage manager, Journal
douardda renamed T2355: Make swh-journal independent from swh-storage or swh-objstorage from Merge parts of swh-journal in swh-storage to Make swh-journal independant from swh-storage or swh-objstorage.
Apr 22 2020, 3:41 PM · Object storage, Storage manager, Journal
douardda added a revision to T2355: Make swh-journal independent from swh-storage or swh-objstorage: D3043: Extract kafka-related pytest fixtures in a pytest plugin module.
Apr 22 2020, 3:38 PM · Object storage, Storage manager, Journal

Apr 14 2020

ardumont added a comment to T2332: Analyze hash collisions.

This remains open because there are still decisions to be made
about the few real ones (3) we have so far [1]

Apr 14 2020, 2:08 PM · Object storage, Storage manager
ardumont added a comment to T2332: Analyze hash collisions.

So our high number of false hash collisions is now fixed thanks to D2977 \m/.

Apr 14 2020, 2:06 PM · Object storage, Storage manager
douardda added a revision to T2355: Make swh-journal independent from swh-storage or swh-objstorage: D3010: Copy the graph replayer component from swh-journal.
Apr 14 2020, 11:14 AM · Object storage, Storage manager, Journal
douardda added a revision to T2355: Make swh-journal independent from swh-storage or swh-objstorage: D3008: Copy the backfiller component from swh-journal.
Apr 14 2020, 11:14 AM · Object storage, Storage manager, Journal

Apr 9 2020

ardumont updated the task description for T2355: Make swh-journal independent from swh-storage or swh-objstorage.
Apr 9 2020, 4:35 PM · Object storage, Storage manager, Journal

Apr 8 2020

ardumont updated the task description for T2332: Analyze hash collisions.
Apr 8 2020, 4:15 PM · Object storage, Storage manager
ardumont added a comment to T2332: Analyze hash collisions.

An interesting experiment: I disabled the proxy buffer storage in the loader nixguix configuration,
and the number of hash collisions dropped to 0 (no new event for that loader since yesterday around 6pm our time).

Apr 8 2020, 10:58 AM · Object storage, Storage manager
vlorentz added a revision to T2332: Analyze hash collisions: D2977: Prevent erroneous HashCollisions by using the same ctime for all rows..
Apr 8 2020, 10:53 AM · Object storage, Storage manager

Apr 3 2020

ardumont added a comment to T2332: Analyze hash collisions.

All in all, this task serves the purpose of being sure those exist.

Apr 3 2020, 7:22 PM · Object storage, Storage manager

Mar 25 2020

olasd added a comment to T2332: Analyze hash collisions.

Finally, we should make sure that the storage implementations reject objects with hashes of the wrong length. I'm /almost/ sure that's the case, but we should be sure of it.

That's the case.

Mar 25 2020, 10:35 AM · Object storage, Storage manager
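
The check discussed in this comment (rejecting objects whose hashes have the wrong length) can be sketched as follows. This is only an illustration: `EXPECTED_HASH_LENGTHS` and `validate_hashes` are hypothetical names, and the algorithm table is an assumption, not the actual swh.storage code.

```python
import hashlib

# Assumed digest sizes in bytes; the set of algorithms is illustrative only.
EXPECTED_HASH_LENGTHS = {"sha1": 20, "sha1_git": 20, "sha256": 32, "blake2s256": 32}


def validate_hashes(hashes: dict) -> None:
    """Raise if any digest is not raw bytes of the size its algorithm requires."""
    for algo, digest in hashes.items():
        expected = EXPECTED_HASH_LENGTHS.get(algo)
        if expected is None:
            raise ValueError(f"unknown hash algorithm: {algo}")
        if not isinstance(digest, bytes):
            raise TypeError(f"{algo} digest must be bytes, got {type(digest).__name__}")
        if len(digest) != expected:
            raise ValueError(
                f"{algo} digest has {len(digest)} bytes, expected {expected}"
            )


# A well-formed object passes silently; a truncated digest would raise ValueError.
validate_hashes({"sha1": hashlib.sha1(b"hello").digest()})
```

Running such a check at the entry point of a storage backend means a malformed hash is reported as a validation error rather than surfacing later as an apparent collision.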

Mar 24 2020

ardumont added a comment to T2332: Analyze hash collisions.

Finally, we should make sure that the storage implementations reject objects with hashes of the wrong length. I'm /almost/ sure that's the case, but we should be sure of it.

Mar 24 2020, 7:03 PM · Object storage, Storage manager
ardumont added a comment to T2332: Analyze hash collisions.

to be more sure of that, I think we should make sure that all hash data in all exception arguments is hex-encoded unicode strings, rather than bytes objects left for python to repr(); this would circumvent a lot of places where encoding or decoding the data in transfer can go wrong.

Mar 24 2020, 3:04 PM · Object storage, Storage manager
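
A minimal sketch of the suggestion above, assuming a hypothetical `HashCollisionError` class (the real exception in swh.storage may be shaped differently): digests are converted to hex strings before they become exception arguments, so they survive serialization and logging unchanged.

```python
import json


def hex_hashes(hashes: dict) -> dict:
    """Convert a mapping of algorithm -> raw digest into algorithm -> hex string."""
    return {algo: digest.hex() for algo, digest in hashes.items()}


class HashCollisionError(Exception):
    """Hypothetical collision error whose arguments contain only str, never bytes."""

    def __init__(self, algo: str, digest: bytes, colliding_contents: list):
        super().__init__(
            algo,
            digest.hex(),
            [hex_hashes(content) for content in colliding_contents],
        )


digest = bytes.fromhex("38762cf7f55934b34d179ae6a4c80cadccbb7f0a")
err = HashCollisionError("sha1", digest, [{"sha1": digest}])
# The arguments are plain strings, so they can be sent as JSON without repr() games.
payload = json.dumps(list(err.args))
```

Because the arguments are already text, a monitoring tool or an RPC layer never has to guess how a bytes repr was escaped.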
ardumont added a comment to T2332: Analyze hash collisions.

It looks like there are a few actual collisions; they seem to be the known-colliding Google PDFs.

Mar 24 2020, 1:05 PM · Object storage, Storage manager
olasd added a comment to T2332: Analyze hash collisions.

I'll write my remarks down here for tracking purposes

Mar 24 2020, 1:00 PM · Object storage, Storage manager
ardumont added a comment to T2332: Analyze hash collisions.

sampled collisions extracted from sentry and storage [1]

Mar 24 2020, 12:02 PM · Object storage, Storage manager
ardumont triaged T2332: Analyze hash collisions as Normal priority.
Mar 24 2020, 12:01 PM · Object storage, Storage manager

Mar 12 2020

douardda triaged T2309: Add support for other hash algo than sha1 in current objstorage implementation as Normal priority.
Mar 12 2020, 1:43 PM · Object storage

Feb 18 2020

vlorentz added a project to T2215: Streaming support everywhere: meta-task.
Feb 18 2020, 4:52 PM · meta-task, Web app, Object storage, Storage manager, Roadmap 2020

Jan 22 2020

vlorentz added projects to T2215: Streaming support everywhere: Storage manager, Object storage, Web app.
Jan 22 2020, 4:24 PM · meta-task, Web app, Object storage, Storage manager, Roadmap 2020
vlorentz added a project to T2216: Packing object storage: Object storage.
Jan 22 2020, 4:20 PM · Object storage, Roadmap 2020

Dec 7 2019

olasd added a comment to T1954: Up-to-date objstorage mirror on S3.

I've grown tired of babysitting this, so I've added systemd notify calls to the journal replayer, allowing us to just use the systemd watchdog to restart hung processes.

Dec 7 2019, 6:37 PM · System administration, Object storage
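
The watchdog approach described in this comment can be sketched without any third-party dependency by writing to the socket systemd passes in `NOTIFY_SOCKET`. The `replay` loop and its parameters are hypothetical stand-ins for the journal replayer, not its actual code.

```python
import os
import socket


def sd_notify(message: str) -> bool:
    """Send a notification to systemd; return False when not run under systemd."""
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    if path.startswith("@"):
        # A leading "@" denotes a socket in the Linux abstract namespace.
        path = "\0" + path[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode(), path)
    return True


def replay(batches, process) -> None:
    """Hypothetical replay loop that pings the watchdog after every batch."""
    sd_notify("READY=1")
    for batch in batches:
        process(batch)
        # If this ping stops arriving, systemd considers the service hung.
        sd_notify("WATCHDOG=1")
```

For the pings to have an effect, the unit would need `Type=notify`, a `WatchdogSec=` interval longer than the slowest expected batch, and a `Restart=` policy that covers watchdog failures.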
olasd merged T1899: complete object storage mirror on AWS into T1954: Up-to-date objstorage mirror on S3.
Dec 7 2019, 6:30 PM · System administration, Object storage

Nov 25 2019

olasd added a comment to T1954: Up-to-date objstorage mirror on S3.
In T1954#39027, @zack wrote:

So, the amount of contents on S3 went up fairly quickly between Nov 10th and Nov 20th, but then it stopped again. Is that expected/normal?

(thanks for the metric, it really helps)

Nov 25 2019, 3:30 PM · System administration, Object storage

Nov 24 2019

zack added a comment to T1954: Up-to-date objstorage mirror on S3.

So, the amount of contents on S3 went up fairly quickly between Nov 10th and Nov 20th, but then it stopped again. Is that expected/normal?

Nov 24 2019, 8:26 PM · System administration, Object storage

Nov 8 2019

olasd added a comment to T1954: Up-to-date objstorage mirror on S3.

I've added a metric with the S3 objects to https://grafana.softwareheritage.org/d/jScG7g6mk/objstorage-object-counts. There's... "some" work to do still.

Nov 8 2019, 7:12 PM · System administration, Object storage
olasd changed the status of T1954: Up-to-date objstorage mirror on S3 from Open to Work in Progress.

So I've deployed this (by hand for now) on uffizi and it seems to be doing its job.

Nov 8 2019, 11:41 AM · System administration, Object storage

Oct 29 2019

vlorentz updated the task description for T1954: Up-to-date objstorage mirror on S3.
Oct 29 2019, 11:34 AM · System administration, Object storage

Aug 19 2019

vlorentz updated the task description for T1954: Up-to-date objstorage mirror on S3.
Aug 19 2019, 11:47 AM · System administration, Object storage
vlorentz updated the task description for T1954: Up-to-date objstorage mirror on S3.
Aug 19 2019, 11:45 AM · System administration, Object storage
vlorentz triaged T1954: Up-to-date objstorage mirror on S3 as High priority.
Aug 19 2019, 11:44 AM · System administration, Object storage

Jun 19 2019

olasd closed T1823: make DB/FS transactions nest properly as Resolved by committing rDOBJS67197802d5aa: pathslicing: Make sure data is flushed to disk before renaming the tempfile.
Jun 19 2019, 4:11 PM · Object storage, Storage manager
olasd added a revision to T1823: make DB/FS transactions nest properly: D1611: pathslicing: Make sure data is flushed to disk before renaming the tempfile.
Jun 19 2019, 1:46 PM · Object storage, Storage manager
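
The idea named in the commit title (flush data to disk before renaming the tempfile) can be illustrated as below. This is a generic sketch of the pattern with a hypothetical `write_atomically` helper, not the code from D1611.

```python
import os
import tempfile


def write_atomically(path: str, data: bytes) -> None:
    """Write data to a temp file, force it to disk, then rename it into place."""
    directory = os.path.dirname(path) or "."
    # The temp file lives in the target directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()              # push Python's buffer to the OS
            os.fsync(f.fileno())   # ask the OS to push its cache to the disk
        os.replace(tmp_path, path)  # atomic rename on POSIX filesystems
    except BaseException:
        # On failure, do not leave a stray temp file behind.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Without the `fsync`, a crash shortly after the rename could leave the final name pointing at an empty or partial file, even though a database transaction recording the object had already committed.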

Jun 18 2019

zack triaged T1823: make DB/FS transactions nest properly as High priority.
Jun 18 2019, 12:38 PM · Object storage, Storage manager

May 23 2019

vlorentz triaged T1736: Update the "Archive copies" documentation as Normal priority.
May 23 2019, 11:17 AM · Object storage, Documentation

Mar 26 2019

olasd added a parent task for T1577: Compare/benchmark objstorage backends : T1608: Write all new objects to azure synchronously.
Mar 26 2019, 6:58 PM · Object storage

Mar 22 2019

seirl placed T805: objstorage: allow use of file-like objects for streaming methods up for grabs.
Mar 22 2019, 1:31 PM · Object storage
seirl triaged T1596: ObjStorage: investigate the per-request CPU overhead of aiohttp as Normal priority.
Mar 22 2019, 1:27 PM · Object storage
seirl triaged T1595: ObjStorage: compression pass-through as Normal priority.
Mar 22 2019, 1:24 PM · Object storage
seirl triaged T1594: ObjStorage: investigate HTTP pipelining issues across all the different backends as Normal priority.
Mar 22 2019, 11:35 AM · Object storage
seirl triaged T1593: ObjStorage: fast mass-retrieval of objects as Normal priority.
Mar 22 2019, 11:33 AM · Object storage

Mar 12 2019

vlorentz removed a project from T1447: Add support for slices when getting objects from the objstorage.: Easy hack.

I didn't know objects are compressed. That indeed makes the issue harder.

Mar 12 2019, 11:51 AM · Object storage
douardda triaged T1577: Compare/benchmark objstorage backends as Normal priority.
Mar 12 2019, 9:53 AM · Object storage
douardda added a comment to T1447: Add support for slices when getting objects from the objstorage..

Same here. Not that much of an easy hack. And what is the real-life use case that drives this feature request? YAGNI?

Mar 12 2019, 9:46 AM · Object storage

Feb 25 2019

ardumont closed T1533: Make sure api server uses explicit configurations as Resolved.
Feb 25 2019, 12:15 PM · Scheduling utilities, Web app, SWORD deposit, Object storage, Storage manager, Vault
ardumont updated the task description for T1533: Make sure api server uses explicit configurations.
Feb 25 2019, 12:15 PM · Scheduling utilities, Web app, SWORD deposit, Object storage, Storage manager, Vault

Feb 23 2019

ardumont updated the task description for T1533: Make sure api server uses explicit configurations.
Feb 23 2019, 1:58 AM · Scheduling utilities, Web app, SWORD deposit, Object storage, Storage manager, Vault
ardumont added a project to T1533: Make sure api server uses explicit configurations: Scheduling utilities.
Feb 23 2019, 12:43 AM · Scheduling utilities, Web app, SWORD deposit, Object storage, Storage manager, Vault
ardumont updated the task description for T1533: Make sure api server uses explicit configurations.
Feb 23 2019, 12:42 AM · Scheduling utilities, Web app, SWORD deposit, Object storage, Storage manager, Vault

Feb 22 2019

ardumont updated the task description for T1533: Make sure api server uses explicit configurations.
Feb 22 2019, 11:22 AM · Scheduling utilities, Web app, SWORD deposit, Object storage, Storage manager, Vault

Feb 21 2019

ardumont updated the task description for T1533: Make sure api server uses explicit configurations.
Feb 21 2019, 8:34 PM · Scheduling utilities, Web app, SWORD deposit, Object storage, Storage manager, Vault
ardumont updated the task description for T1533: Make sure api server uses explicit configurations.
Feb 21 2019, 2:11 PM · Scheduling utilities, Web app, SWORD deposit, Object storage, Storage manager, Vault

Feb 20 2019

ardumont updated the task description for T1533: Make sure api server uses explicit configurations.
Feb 20 2019, 7:11 PM · Scheduling utilities, Web app, SWORD deposit, Object storage, Storage manager, Vault

Feb 19 2019

ardumont renamed T1533: Make sure api server uses explicit configurations from Use explicit configuration for all modules to Make sure api server uses explicit configurations.
Feb 19 2019, 11:36 AM · Scheduling utilities, Web app, SWORD deposit, Object storage, Storage manager, Vault

Jan 31 2019

vlorentz added a project to T1487: Add a public API endpoint to retrieve a set of files with a given name: Easy hack.
Jan 31 2019, 1:29 PM · Easy hack, Storage manager, Object storage

Jan 21 2019

vlorentz added a project to T1487: Add a public API endpoint to retrieve a set of files with a given name: Storage manager.

The object storage doesn't have content names, so it cannot address this feature as stated.

Jan 21 2019, 4:55 PM · Easy hack, Storage manager, Object storage
zack updated subscribers of T1487: Add a public API endpoint to retrieve a set of files with a given name.

The object storage doesn't have content names, so it cannot address this feature as stated.

Jan 21 2019, 4:50 PM · Easy hack, Storage manager, Object storage
vlorentz added a comment to T1487: Add a public API endpoint to retrieve a set of files with a given name.

A crude script doing this: https://forge.softwareheritage.org/source/snippets/browse/master/vlorentz/get_metadata_files_examples.py

Jan 21 2019, 4:29 PM · Easy hack, Storage manager, Object storage
vlorentz triaged T1487: Add a public API endpoint to retrieve a set of files with a given name as Low priority.
Jan 21 2019, 4:28 PM · Easy hack, Storage manager, Object storage

Dec 19 2018

ardumont added a comment to T803: Indexer - Retrieval error when contents is too big.

I suppose it all depends on the current storage's configuration.

Dec 19 2018, 10:21 AM · Indexer, Object storage

Dec 18 2018

olasd added a comment to T1447: Add support for slices when getting objects from the objstorage..

objstorage_pathslicing manipulates a *gzipped* file object, which means that TTBOMK seek is not supported, and we will have to decompress the complete beginning of the file to get to the range that we really want to read.

Dec 18 2018, 6:36 PM · Object storage
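
The cost described in this comment can be made concrete with a small sketch. `read_range` is a hypothetical helper, not part of objstorage; it shows that serving a slice of a gzipped object means decompressing and discarding everything before the requested offset.

```python
import gzip


def read_range(path: str, offset: int, length: int, chunk_size: int = 64 * 1024) -> bytes:
    """Return `length` decompressed bytes starting at decompressed `offset`."""
    with gzip.open(path, "rb") as f:
        remaining = offset
        # There is no index into the compressed stream, so the prefix
        # has to be decompressed and thrown away chunk by chunk.
        while remaining > 0:
            chunk = f.read(min(chunk_size, remaining))
            if not chunk:
                return b""  # offset is past the end of the object
            remaining -= len(chunk)
        return f.read(length)
```

Python's `gzip` module does expose `seek()`, but it emulates forward seeks by decompressing in the same way, so the work grows with the offset rather than with the size of the slice.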
ardumont added a comment to T803: Indexer - Retrieval error when contents is too big.

In the objstorage's pathslicing implementation, there is the get_stream implementation which is not used [1]

Dec 18 2018, 5:10 PM · Indexer, Object storage
vlorentz triaged T1447: Add support for slices when getting objects from the objstorage. as Low priority.
Dec 18 2018, 4:40 PM · Object storage
vlorentz added a subtask for T803: Indexer - Retrieval error when contents is too big: T1446: Add support for slices in Storage.content_get.
Dec 18 2018, 4:35 PM · Indexer, Object storage

Nov 19 2018

vlorentz claimed T1307: Remove mock storages used in tests..
Nov 19 2018, 5:11 PM · Storage manager
vlorentz closed T1306: Write an in-memory backend for swh.storage for tests., a subtask of T1307: Remove mock storages used in tests., as Resolved.
Nov 19 2018, 5:11 PM · Storage manager

Nov 7 2018

vlorentz renamed T1306: Write an in-memory backend for swh.storage for tests. from Write in-memory backends for swh.storage and swh.objstorage for tests. to Write an in-memory backend for swh.storage for tests..
Nov 7 2018, 12:04 PM · Storage manager
ardumont added a comment to T1306: Write an in-memory backend for swh.storage for tests..

objstorage already has an in-memory implementation?

Nov 7 2018, 11:57 AM · Storage manager

Nov 6 2018

olasd added a comment to T1306: Write an in-memory backend for swh.storage for tests..

objstorage already has an in-memory implementation?

Nov 6 2018, 3:43 PM · Storage manager
vlorentz renamed T1306: Write an in-memory backend for swh.storage for tests. from Write an in-memory backend for swh.storage for tests. to Write in-memory backends for swh.storage and swh.objstorage for tests..
Nov 6 2018, 2:40 PM · Storage manager
vlorentz added a subtask for T1307: Remove mock storages used in tests.: T1306: Write an in-memory backend for swh.storage for tests..
Nov 6 2018, 2:40 PM · Storage manager
vlorentz triaged T1307: Remove mock storages used in tests. as Normal priority.
Nov 6 2018, 2:39 PM · Storage manager

Oct 11 2018

ardumont closed D514: tests: Add rados requirements.
Oct 11 2018, 11:24 AM · Object storage
ardumont updated the summary of D514: tests: Add rados requirements.
Oct 11 2018, 11:23 AM · Object storage
vlorentz accepted D514: tests: Add rados requirements.
Oct 11 2018, 11:21 AM · Object storage
ardumont added a project to D514: tests: Add rados requirements: Object storage.
Oct 11 2018, 11:20 AM · Object storage

Aug 3 2018

ftigeot added a comment to T1048: Clean striped object storages from objects they should not be containing.

Given that uffizi already uses 64GB of RAM (more than some physical machines), this should be a no-brainer.
I am not sure if this would really improve I/O performance, though.

Aug 3 2018, 3:03 PM · Object storage

Jul 17 2018

olasd changed the status of T1047: Write all contents synchronously to the ceph cluster, a subtask of T1043: handle the uffizi content store being full, from Open to Work in Progress.
Jul 17 2018, 2:15 PM · Object storage
olasd changed the status of T1047: Write all contents synchronously to the ceph cluster from Open to Work in Progress.

The scaffolding to do this has been set up in Puppet. However, our naive ceph objstorage implementation uses around 7-8 times the space that would be used by the objects stored individually, instead of 1.4 times.

Jul 17 2018, 2:15 PM · Object storage
olasd changed the status of T1048: Clean striped object storages from objects they should not be containing, a subtask of T1043: handle the uffizi content store being full, from Open to Work in Progress.
Jul 17 2018, 2:15 PM · Object storage
olasd changed the status of T1048: Clean striped object storages from objects they should not be containing, a subtask of T1044: Write all contents synchronously to azure, from Open to Work in Progress.
Jul 17 2018, 2:15 PM · Object storage
olasd changed the status of T1048: Clean striped object storages from objects they should not be containing, a subtask of T1046: Stripe local contents between uffizi and banco, from Open to Work in Progress.
Jul 17 2018, 2:15 PM · Object storage
olasd changed the status of T1048: Clean striped object storages from objects they should not be containing from Open to Work in Progress.

This has been running for a while. It's quite taxing on uffizi even with nice and ionice set very high: load has been consistently around 45.

Jul 17 2018, 2:15 PM · Object storage