We've moved on to other filesystems and we're not really planning on wiping the old ones anymore.
Sep 22 2020
Aug 27 2020
The journal clients copying objects to S3 are blocked: they are unable to read messages from Kafka.
Jul 21 2020
We're not going to do this in the foreseeable future.
May 16 2020
The S3 object copy is now completely caught up with where Kafka was when the backfilling of all objects from PostgreSQL ended. This means we're now copying the "newer" objects, and there are pretty much no hits at all on the inventory file anymore.
May 4 2020
Apr 30 2020
Let's consider this done now.
Apr 29 2020
Apr 24 2020
Apr 23 2020
I've updated the exclusion file with data from the 2020-04-19 s3 inventory.
Apr 22 2020
Apr 14 2020
Remains open because there remain decisions to be made
about the few real ones (3) we have so far [1]
So our high number of spurious hash collisions is now fixed thanks to D2977 \m/.
Apr 9 2020
Apr 8 2020
An interesting experiment: disabling the proxy buffer storage in the loader nixguix configuration.
The number of hash collisions dropped to 0 (no new event for that loader since yesterday around 6pm our time).
Apr 3 2020
All in all, this task serves the purpose of making sure those exist.
Mar 25 2020
In T2332#42825, @ardumont wrote: Finally, we should make sure that the storage implementations reject objects with hashes of the wrong length. I'm /almost/ sure that's the case, but we should be sure of it.
That's the case.
Mar 24 2020
Finally, we should make sure that the storage implementations reject objects with hashes of the wrong length. I'm /almost/ sure that's the case, but we should be sure of it.
To be more sure of that, I think we should make sure that all hash data in all exception arguments is hex-encoded unicode strings, rather than bytes objects left for Python to repr(); this would avoid a lot of places where encoding or decoding the data in transfer can go wrong.
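A minimal sketch of both checks discussed above (the function and constant names are mine, not the real storage API; the algorithm names and digest lengths are the usual SWH hash set):

```python
import binascii

# Illustrative table, not the actual storage configuration.
HASH_LENGTHS = {"sha1": 20, "sha1_git": 20, "sha256": 32, "blake2s256": 32}

def check_hashes(hashes):
    """Reject hashes of the wrong length, hex-encoding values in errors."""
    for algo, value in hashes.items():
        expected = HASH_LENGTHS.get(algo)
        if expected is None:
            raise ValueError(f"unknown hash algorithm: {algo}")
        if len(value) != expected:
            # Hex-encode instead of letting Python repr() raw bytes,
            # so nothing gets mangled on the way to logs or Sentry.
            hexval = binascii.hexlify(value).decode("ascii")
            raise ValueError(
                f"{algo} hash {hexval} has length {len(value)}, "
                f"expected {expected}"
            )
```

The hex-encoding in the error message is the point of the second remark: a `ValueError` carrying a `str` survives serialization boundaries that would corrupt raw `bytes`.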
It looks like there are a few actual collisions; it seems they're the known-colliding Google PDFs.
I'll write my remarks down here for tracking purposes
sampled collisions extracted from sentry and storage [1]
Mar 12 2020
Feb 18 2020
Jan 22 2020
Dec 7 2019
I've grown tired of babysitting this, so I've added systemd notify calls to the journal replayer, allowing us to just use the systemd watchdog to restart hung processes.
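The systemd notify/watchdog integration mentioned here needs no dependencies: it's a datagram written to the socket named in NOTIFY_SOCKET. A sketch (the function name is mine):

```python
import os
import socket

def sd_notify(message: str) -> None:
    """Send a state notification to systemd (no-op outside systemd)."""
    addr = os.environ.get("NOTIFY_SOCKET")
    if not addr:
        return  # not supervised by systemd
    if addr.startswith("@"):
        addr = "\0" + addr[1:]  # abstract namespace socket
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.connect(addr)
        sock.sendall(message.encode("utf-8"))

# At startup:        sd_notify("READY=1")
# After each batch:  sd_notify("WATCHDOG=1")
# With WatchdogSec= set on the unit, systemd restarts the process
# if no WATCHDOG=1 ping arrives within the configured interval.
```

A hung replayer stops pinging, so the watchdog fires without any external babysitting.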
Nov 25 2019
In T1954#39027, @zack wrote: So, the amount of contents on S3 went up fairly quickly between Nov 10th and Nov 20th, but then it stopped again; is that expected/normal?
(thanks for the metric, it really helps)
Nov 24 2019
So, the amount of contents on S3 went up fairly quickly between Nov 10th and Nov 20th, but then it stopped again; is that expected/normal?
Nov 8 2019
I've added a metric with the S3 objects to https://grafana.softwareheritage.org/d/jScG7g6mk/objstorage-object-counts. There's... "some" work to do still.
So I've deployed this (by hand for now) on uffizi and it seems to be doing its job.
Oct 29 2019
Aug 19 2019
Jun 19 2019
Jun 18 2019
May 23 2019
Mar 26 2019
Mar 22 2019
Mar 12 2019
I didn't know objects are compressed. That indeed makes the issue harder.
Same here. Not that easy a hack. And what is the real-life use case driving this feature request? YAGNI?
Feb 25 2019
Feb 23 2019
Feb 22 2019
Feb 21 2019
Feb 20 2019
Feb 19 2019
Jan 31 2019
Jan 21 2019
The object storage doesn't have content names, so it cannot address this feature as stated.
A crude script doing this: https://forge.softwareheritage.org/source/snippets/browse/master/vlorentz/get_metadata_files_examples.py
Dec 19 2018
I suppose it all depends on the current storage's configuration.
Dec 18 2018
objstorage_pathslicing manipulates a *gzipped* file object, which means that, to the best of my knowledge, seek is not supported, and we would have to decompress the complete beginning of the file to get to the range that we really want to read.
In the objstorage's pathslicing implementation, there is the get_stream implementation which is not used [1]
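The cost described above is visible with Python's stdlib gzip module: `GzipFile.seek()` only emulates random access by decompressing and discarding everything before the target offset, so reading a range costs O(offset). A small illustration (names are mine):

```python
import gzip
import io

def read_range(gzipped: bytes, start: int, length: int) -> bytes:
    """Read `length` bytes at offset `start` from a gzip stream.

    GzipFile.seek() decompresses and discards everything before
    `start` under the hood: there is no true random access.
    """
    with gzip.GzipFile(fileobj=io.BytesIO(gzipped)) as f:
        f.seek(start)
        return f.read(length)

# Round-trip a known payload and read a slice out of the middle.
payload = bytes(range(256)) * 10
blob = gzip.compress(payload)
assert read_range(blob, 100, 16) == payload[100:116]
```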
Nov 19 2018
Nov 7 2018
objstorage already has an in-memory implementation?
Nov 6 2018
objstorage already has an in-memory implementation?
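For reference, the minimal shape of such an implementation is just a dict keyed by object id; the sketch below mirrors the usual add/get/contains operations but is an illustration, not the actual class:

```python
class InMemoryObjStorage:
    """Toy in-memory object storage: a dict keyed by object id."""

    def __init__(self) -> None:
        self._contents: dict[bytes, bytes] = {}

    def add(self, content: bytes, obj_id: bytes) -> bytes:
        self._contents[obj_id] = content
        return obj_id

    def get(self, obj_id: bytes) -> bytes:
        return self._contents[obj_id]

    def __contains__(self, obj_id: bytes) -> bool:
        return obj_id in self._contents
```

This kind of backend is mostly useful as a test double, since nothing persists past the process.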
Oct 11 2018
Aug 3 2018
Given uffizi already uses 64GB of RAM (more than some physical machines), this should be a no-brainer.
I am not sure if this would really improve I/O performance, though.
Jul 17 2018
The scaffolding to do this has been set up in Puppet. However, our naive Ceph objstorage implementation uses around 7-8 times the space that would be used by the objects stored individually, instead of 1.4 times.
This has been running for a while. It's quite taxing on uffizi even with nice and ionice set very high: load has been consistently around 45.