Mar 25 2020

vlorentz updated the task description for T2310: Make origin visits immutable.
Mar 25 2020, 1:57 PM · Storage manager, Data Model
olasd added a comment to T2332: Analyze hash collisions.

Finally, we should make sure that the storage implementations reject objects with hashes of the wrong length. I'm /almost/ sure that's the case, but we should be sure of it.

That's the case.

Mar 25 2020, 10:35 AM · Object storage, Storage manager
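
A minimal sketch of the length check discussed above, assuming contents are described by a dict mapping an algorithm name to its raw digest. The table of algorithm names and the helper `check_hash_lengths` are hypothetical and only illustrate the idea; this is not the actual storage code.

```python
from typing import Dict

# Digest sizes in bytes. The set of algorithm names is an assumption made
# for this sketch; the sizes are the standard ones for these hash functions.
EXPECTED_HASH_LENGTHS = {
    "sha1": 20,
    "sha1_git": 20,
    "sha256": 32,
    "blake2s256": 32,
}


def check_hash_lengths(hashes: Dict[str, bytes]) -> None:
    """Raise ValueError if a digest is not bytes or has the wrong length."""
    for algo, digest in hashes.items():
        expected = EXPECTED_HASH_LENGTHS.get(algo)
        if expected is None:
            raise ValueError(f"unknown hash algorithm: {algo}")
        if not isinstance(digest, bytes):
            raise ValueError(f"{algo}: digest must be bytes")
        if len(digest) != expected:
            raise ValueError(
                f"{algo}: expected {expected} bytes, got {len(digest)}"
            )
```

Calling such a check before an object is written would turn a truncated or padded hash into an explicit error rather than a later mismatch.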

Mar 24 2020

ardumont added a comment to T2332: Analyze hash collisions.

Finally, we should make sure that the storage implementations reject objects with hashes of the wrong length. I'm /almost/ sure that's the case, but we should be sure of it.

Mar 24 2020, 7:03 PM · Object storage, Storage manager
ardumont added a comment to T2332: Analyze hash collisions.

To be more sure of that, I think we should make sure that all hash data in all exception arguments is passed as hex-encoded unicode strings, rather than as bytes objects left for Python to repr(); this would circumvent a lot of places where encoding or decoding the data in transfer can go wrong.

Mar 24 2020, 3:04 PM · Object storage, Storage manager
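
The suggestion above could look roughly like this, assuming hashes are held as a dict of raw digests. The helper name `hashes_to_hex` is hypothetical and used only for illustration.

```python
from typing import Dict


def hashes_to_hex(hashes: Dict[str, bytes]) -> Dict[str, str]:
    """Return a copy of the mapping with each digest as a hex string."""
    return {algo: digest.hex() for algo, digest in hashes.items()}


# Hex strings are plain text: they round-trip through any serializer that
# handles str, and bytes.fromhex() restores the original digest.
raw = {"sha1": bytes.fromhex("da39a3ee5e6b4b0d3255bfef95601890afd80709")}
as_hex = hashes_to_hex(raw)
restored = {algo: bytes.fromhex(value) for algo, value in as_hex.items()}
```

The hex form is unambiguous to read in logs and error reports, whereas the repr() of a bytes object mixes printable characters with escape sequences.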
ardumont added a comment to T2332: Analyze hash collisions.

It looks like there are a few actual collisions; they seem to be the known-colliding Google PDFs.

Mar 24 2020, 1:05 PM · Object storage, Storage manager
olasd added a comment to T2332: Analyze hash collisions.

I'll write my remarks down here for tracking purposes

Mar 24 2020, 1:00 PM · Object storage, Storage manager
ardumont added a comment to T2332: Analyze hash collisions.

sampled collisions extracted from sentry and storage [1]

Mar 24 2020, 12:02 PM · Object storage, Storage manager
ardumont triaged T2332: Analyze hash collisions as Normal priority.
Mar 24 2020, 12:01 PM · Object storage, Storage manager

Mar 16 2020

vlorentz updated the task description for T2316: Align row deduplication of all _add endpoints on release_add.
Mar 16 2020, 4:23 PM · Easy hack, Storage manager
vlorentz added projects to T2316: Align row deduplication of all _add endpoints on release_add: Storage manager, Easy hack.
Mar 16 2020, 4:21 PM · Easy hack, Storage manager

Mar 12 2020

vlorentz triaged T2310: Make origin visits immutable as Normal priority.
Mar 12 2020, 3:54 PM · Storage manager, Data Model

Mar 10 2020

vlorentz added a revision to T2304: Cassandra storage: Reduce the size of the "secondary lookup tables" for contents: D2796: Store the value of token(partition_key) in content_by_* table, instead of three hashes..
Mar 10 2020, 1:53 PM · Storage manager
vlorentz claimed T2304: Cassandra storage: Reduce the size of the "secondary lookup tables" for contents.
Mar 10 2020, 11:44 AM · Storage manager
ardumont closed D2783: storage(s): Identify and provide the collision hash(es) in HashCollision exception.
Mar 10 2020, 10:54 AM · Storage manager
vlorentz accepted D2783: storage(s): Identify and provide the collision hash(es) in HashCollision exception.

thanks!

Mar 10 2020, 10:49 AM · Storage manager
swh-public-ci added a comment to D2783: storage(s): Identify and provide the collision hash(es) in HashCollision exception.

Build is green
See https://jenkins.softwareheritage.org/job/DSTO/job/tox/1041/ for more details.

Mar 10 2020, 9:34 AM · Storage manager
ardumont updated the diff for D2783: storage(s): Identify and provide the collision hash(es) in HashCollision exception.

Rebase on latest master

Mar 10 2020, 9:28 AM · Storage manager
swh-public-ci added a comment to D2783: storage(s): Identify and provide the collision hash(es) in HashCollision exception.

Build is green
See https://jenkins.softwareheritage.org/job/DSTO/job/tox/1040/ for more details.

Mar 10 2020, 9:20 AM · Storage manager
swh-public-ci added a comment to D2783: storage(s): Identify and provide the collision hash(es) in HashCollision exception.

Build was aborted

Mar 10 2020, 9:13 AM · Storage manager
Harbormaster failed remote builds in B11008: Diff 9937 for D2783: storage(s): Identify and provide the collision hash(es) in HashCollision exception!
Mar 10 2020, 8:43 AM · Storage manager
swh-public-ci added a comment to D2783: storage(s): Identify and provide the collision hash(es) in HashCollision exception.

Build was aborted

Mar 10 2020, 8:43 AM · Storage manager

Mar 9 2020

ardumont updated the diff for D2783: storage(s): Identify and provide the collision hash(es) in HashCollision exception.

Improve collision scenario checks

Mar 9 2020, 6:45 PM · Storage manager
Harbormaster failed remote builds in B11000: Diff 9929 for D2783: storage(s): Identify and provide the collision hash(es) in HashCollision exception!
Mar 9 2020, 6:44 PM · Storage manager
swh-public-ci added a comment to D2783: storage(s): Identify and provide the collision hash(es) in HashCollision exception.

Build was aborted

Mar 9 2020, 6:44 PM · Storage manager
ardumont added a comment to D2783: storage(s): Identify and provide the collision hash(es) in HashCollision exception.

Could you add more assertions to test_content_add_collision and test_content_add_metadata_collision, to check for the new common behavior?

Mar 9 2020, 5:45 PM · Storage manager
vlorentz requested changes to D2783: storage(s): Identify and provide the collision hash(es) in HashCollision exception.

Could you add more assertions to test_content_add_collision and test_content_add_metadata_collision, to check for the new common behavior?

Mar 9 2020, 4:43 PM · Storage manager
ardumont added inline comments to D2783: storage(s): Identify and provide the collision hash(es) in HashCollision exception.
Mar 9 2020, 4:39 PM · Storage manager
ardumont updated the diff for D2783: storage(s): Identify and provide the collision hash(es) in HashCollision exception.

Add coverage on extra conversion step

Mar 9 2020, 4:37 PM · Storage manager
vlorentz added inline comments to D2783: storage(s): Identify and provide the collision hash(es) in HashCollision exception.
Mar 9 2020, 4:17 PM · Storage manager
olasd added inline comments to D2783: storage(s): Identify and provide the collision hash(es) in HashCollision exception.
Mar 9 2020, 4:16 PM · Storage manager
ardumont added inline comments to D2783: storage(s): Identify and provide the collision hash(es) in HashCollision exception.
Mar 9 2020, 4:10 PM · Storage manager
swh-public-ci added a comment to D2783: storage(s): Identify and provide the collision hash(es) in HashCollision exception.

Build is green
See https://jenkins.softwareheritage.org/job/DSTO/job/tox/1033/ for more details.

Mar 9 2020, 2:27 PM · Storage manager
ardumont added inline comments to D2783: storage(s): Identify and provide the collision hash(es) in HashCollision exception.
Mar 9 2020, 2:23 PM · Storage manager
Harbormaster failed remote builds in B10997: Diff 9926 for D2783: storage(s): Identify and provide the collision hash(es) in HashCollision exception!
Mar 9 2020, 2:21 PM · Storage manager
swh-public-ci added a comment to D2783: storage(s): Identify and provide the collision hash(es) in HashCollision exception.

Build was aborted

Mar 9 2020, 2:21 PM · Storage manager
ardumont updated the diff for D2783: storage(s): Identify and provide the collision hash(es) in HashCollision exception.

Align storages to return the list of colliding hashes

Mar 9 2020, 2:21 PM · Storage manager
ardumont updated the diff for D2783: storage(s): Identify and provide the collision hash(es) in HashCollision exception.
  • pgstorage: Return the list of colliding content hashes
  • improve regexp extraction
Mar 9 2020, 1:34 PM · Storage manager
ardumont added inline comments to D2783: storage(s): Identify and provide the collision hash(es) in HashCollision exception.
Mar 9 2020, 1:31 PM · Storage manager
ardumont added inline comments to D2783: storage(s): Identify and provide the collision hash(es) in HashCollision exception.
Mar 9 2020, 1:24 PM · Storage manager
olasd added inline comments to D2783: storage(s): Identify and provide the collision hash(es) in HashCollision exception.
Mar 9 2020, 11:55 AM · Storage manager
vlorentz added inline comments to D2783: storage(s): Identify and provide the collision hash(es) in HashCollision exception.
Mar 9 2020, 11:38 AM · Storage manager
olasd requested changes to D2783: storage(s): Identify and provide the collision hash(es) in HashCollision exception.

As mentioned inline in the pg storage diff, in general we should return /all/ colliding contents that we can find, rather than a single one. So in the end, the exception argument should be a List[Dict[str, bytes]].

Mar 9 2020, 11:35 AM · Storage manager
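
A sketch of what this review comment asks for, assuming each content is a dict of raw digests keyed by algorithm name. The class `HashCollisionSketch` and the function `find_collisions` are hypothetical names for illustration; they are not the code from D2783.

```python
from typing import Dict, List


class HashCollisionSketch(Exception):
    """Illustrative exception carrying every colliding content found."""

    def __init__(
        self,
        algo: str,
        hash_id: bytes,
        colliding_contents: List[Dict[str, bytes]],
    ) -> None:
        # Passing everything to Exception keeps it all available in .args.
        super().__init__(algo, hash_id, colliding_contents)
        self.algo = algo
        self.hash_id = hash_id
        self.colliding_contents = colliding_contents


def find_collisions(
    new: Dict[str, bytes], known: List[Dict[str, bytes]]
) -> List[Dict[str, bytes]]:
    """Return all known contents that share one hash with `new` but differ
    on another, instead of stopping at the first one."""
    collisions = []
    for other in known:
        common = [algo for algo in new if algo in other]
        shared = [algo for algo in common if new[algo] == other[algo]]
        differing = [algo for algo in common if new[algo] != other[algo]]
        if shared and differing:
            collisions.append(other)
    return collisions
```

Returning the whole list lets the caller see every affected content in one error, rather than discovering the collisions one at a time.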
ardumont added inline comments to D2783: storage(s): Identify and provide the collision hash(es) in HashCollision exception.
Mar 9 2020, 11:15 AM · Storage manager

Mar 8 2020

ardumont added a comment to D2783: storage(s): Identify and provide the collision hash(es) in HashCollision exception.

Build is green

Mar 8 2020, 9:59 AM · Storage manager
swh-public-ci added a comment to D2783: storage(s): Identify and provide the collision hash(es) in HashCollision exception.

Build is green
See https://jenkins.softwareheritage.org/job/DSTO/job/tox/1031/ for more details.

Mar 8 2020, 9:25 AM · Storage manager
Harbormaster failed remote builds in B10993: Diff 9923 for D2783: storage(s): Identify and provide the collision hash(es) in HashCollision exception!
Mar 8 2020, 9:18 AM · Storage manager
swh-public-ci added a comment to D2783: storage(s): Identify and provide the collision hash(es) in HashCollision exception.

Build was aborted

Mar 8 2020, 9:18 AM · Storage manager

Mar 7 2020

ardumont updated the diff for D2783: storage(s): Identify and provide the collision hash(es) in HashCollision exception.

Adapt according to review

Mar 7 2020, 2:17 PM · Storage manager
ardumont updated the summary of D2783: storage(s): Identify and provide the collision hash(es) in HashCollision exception.
Mar 7 2020, 2:06 PM · Storage manager
vlorentz accepted D2783: storage(s): Identify and provide the collision hash(es) in HashCollision exception.

lgtm, but I'd like someone else to review it as well

Mar 7 2020, 9:23 AM · Storage manager

Mar 6 2020

swh-public-ci added a comment to D2783: storage(s): Identify and provide the collision hash(es) in HashCollision exception.

Build is green
See https://jenkins.softwareheritage.org/job/DSTO/job/tox/1029/ for more details.

Mar 6 2020, 11:57 PM · Storage manager
ardumont retitled D2783: storage(s): Identify and provide the collision hash(es) in HashCollision exception from storage: Identify and provide the collision hash(es) in HashCollision exception to storage(s): Identify and provide the collision hash(es) in HashCollision exception.
Mar 6 2020, 11:54 PM · Storage manager
olasd triaged T2304: Cassandra storage: Reduce the size of the "secondary lookup tables" for contents as Normal priority.
Mar 6 2020, 5:19 PM · Storage manager

Feb 28 2020

vlorentz closed T2266: Use python-cassandra 3.21 as Resolved.

caf51a044377cf62f73a02cd6c641d94b4e32c95

Feb 28 2020, 12:03 PM · Storage manager

Feb 26 2020

ardumont closed T2239: storage: kafka issue: Can't pickle <class 'cimpl.KafkaException'>: import of module 'cimpl' failed as Resolved.

I think so (latest storage version deployed).

Feb 26 2020, 10:48 AM · Storage manager

Feb 25 2020

olasd added a comment to T2239: storage: kafka issue: Can't pickle <class 'cimpl.KafkaException'>: import of module 'cimpl' failed.

@vlorentz I guess this could be closed?

Feb 25 2020, 11:16 AM · Storage manager

Feb 18 2020

vlorentz added a project to T2215: Streaming support everywhere: meta-task.
Feb 18 2020, 4:52 PM · meta-task, Web app, Object storage, Storage manager, Roadmap 2020
vlorentz updated the task description for T2290: Implement origin_metadata endpoints in swh/storage/cassandra/.
Feb 18 2020, 4:50 PM · Easy hack, Storage manager
vlorentz renamed T2291: Implement metadata_provider endpoints in swh/storage/cassandra/ from T2290: Implement metadata_provider endpoints in swh/storage/cassandra/ to Implement metadata_provider endpoints in swh/storage/cassandra/.
Feb 18 2020, 4:50 PM · Easy hack, Storage manager
vlorentz triaged T2291: Implement metadata_provider endpoints in swh/storage/cassandra/ as Normal priority.
Feb 18 2020, 4:49 PM · Easy hack, Storage manager
vlorentz added projects to T2290: Implement origin_metadata endpoints in swh/storage/cassandra/: Storage manager, Easy hack.
Feb 18 2020, 4:48 PM · Easy hack, Storage manager

Feb 17 2020

vlorentz added a comment to T2239: storage: kafka issue: Can't pickle <class 'cimpl.KafkaException'>: import of module 'cimpl' failed.

python-cassandra too has some un-unpicklable errors:

Feb 17 2020, 12:14 PM · Storage manager

Feb 14 2020

vlorentz triaged T2287: Improve code in BufferingProxyStorage as Normal priority.
Feb 14 2020, 6:02 PM · Easy hack, Storage manager

Feb 6 2020

ardumont closed T2185: Make webapp0 use Cassandra as storage backend., a subtask of T1892: Cassandra as a storage backend, as Resolved.
Feb 6 2020, 12:31 PM · meta-task, Storage manager
ardumont closed T2185: Make webapp0 use Cassandra as storage backend. as Resolved.
Feb 6 2020, 12:31 PM · Storage manager

Feb 5 2020

vlorentz added a revision to T2239: storage: kafka issue: Can't pickle <class 'cimpl.KafkaException'>: import of module 'cimpl' failed: D2627: In case of errors, return a simple dictionary instead of pickled exception..
Feb 5 2020, 5:12 PM · Storage manager
vlorentz claimed T2239: storage: kafka issue: Can't pickle <class 'cimpl.KafkaException'>: import of module 'cimpl' failed.
Feb 5 2020, 4:18 PM · Storage manager
vlorentz added a comment to T2239: storage: kafka issue: Can't pickle <class 'cimpl.KafkaException'>: import of module 'cimpl' failed.

Relatedly, errors raised by tenacity cannot be pickled because they contain a Lock: https://github.com/jd/tenacity/issues/147

Feb 5 2020, 4:17 PM · Storage manager
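
The failure mode described above can be reproduced with a small example: an exception that holds a `threading.Lock` cannot be pickled, while a plain dictionary describing it can. The class `ErrorWithLock` and the dictionary layout produced by `error_to_dict` are assumptions made for this sketch; the actual format used in D2627 may differ.

```python
import pickle
import threading
from typing import Any, Dict


class ErrorWithLock(Exception):
    """Stand-in for a library error that keeps a lock as an attribute."""

    def __init__(self, message: str) -> None:
        super().__init__(message)
        self.lock = threading.Lock()


def is_picklable(obj: Any) -> bool:
    """Return True if pickle.dumps() succeeds on obj."""
    try:
        pickle.dumps(obj)
    except (TypeError, pickle.PicklingError):
        return False
    return True


def error_to_dict(exc: BaseException) -> Dict[str, Any]:
    """Describe an exception using only strings, which always pickle."""
    return {
        "type": type(exc).__name__,
        "module": type(exc).__module__,
        "args": [str(arg) for arg in exc.args],
    }


err = ErrorWithLock("boom")
summary = error_to_dict(err)
```

Here `is_picklable(err)` is false because the lock is part of the exception's state, while `is_picklable(summary)` is true, so the summary can be sent back to a client that does not have the original exception class available.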
ardumont changed the status of T2185: Make webapp0 use Cassandra as storage backend., a subtask of T1892: Cassandra as a storage backend, from Open to Work in Progress.
Feb 5 2020, 4:05 PM · meta-task, Storage manager
ardumont changed the status of T2185: Make webapp0 use Cassandra as storage backend. from Open to Work in Progress.
Feb 5 2020, 4:05 PM · Storage manager
ardumont added a comment to T2185: Make webapp0 use Cassandra as storage backend..

storage02.euwest.azure exposes a rpc server using cassandra as storage backend.
webapp0 has been updated to use it.

Feb 5 2020, 4:04 PM · Storage manager
ardumont closed T2183: Switch webapp0 to use swh-search instead of postgresql search. as Resolved.
Feb 5 2020, 3:02 PM · Archive search, Storage manager
ardumont closed T2183: Switch webapp0 to use swh-search instead of postgresql search., a subtask of T2185: Make webapp0 use Cassandra as storage backend., as Resolved.
Feb 5 2020, 3:02 PM · Storage manager
ardumont closed T2183: Switch webapp0 to use swh-search instead of postgresql search., a subtask of T2182: Switch production swh-web to use swh-search instead of postgresql search., as Resolved.
Feb 5 2020, 3:02 PM · System administration, Archive search, Storage manager
ardumont added a project to T2266: Use python-cassandra 3.21: Storage manager.
Feb 5 2020, 1:50 PM · Storage manager

Feb 3 2020

vlorentz added a comment to T2186: Merge swh-storage-cassandra in swh-storage master.

thx

Feb 3 2020, 10:38 PM · Storage manager
ardumont added a comment to T2186: Merge swh-storage-cassandra in swh-storage master.

and swh-storage debian package built [1] (passing the cassandra tests ;)

Feb 3 2020, 5:45 PM · Storage manager
vlorentz closed T2186: Merge swh-storage-cassandra in swh-storage master, a subtask of T2185: Make webapp0 use Cassandra as storage backend., as Resolved.
Feb 3 2020, 1:33 PM · Storage manager
vlorentz closed T2186: Merge swh-storage-cassandra in swh-storage master as Resolved.
Feb 3 2020, 1:33 PM · Storage manager

Jan 30 2020

zack added a comment to T2262: Deal with IRIs.

I'm fine with switching to IRIs in the doc, just please expand what it means on first use (with a mention like "they are like URIs but"), as I don't think the acronym is that well-known yet, especially in the US.

Jan 30 2020, 5:47 PM · Storage manager, Data Model
vlorentz triaged T2262: Deal with IRIs as Normal priority.
Jan 30 2020, 4:26 PM · Storage manager, Data Model

Jan 29 2020

ardumont renamed T2211: Go beyond git expressivity from Go beyound git expressivity to Go beyond git expressivity.
Jan 29 2020, 6:43 PM · Mercurial loader, Storage manager, Data Model, Roadmap 2020
ardumont closed T2243: Add Debian package python3-cassandra as Resolved.

done indeed.

Jan 29 2020, 3:49 PM · Storage manager
vlorentz closed T2184: Replay origins to ElasticSearch's "origin" index as Resolved.
Jan 29 2020, 1:39 PM · Archive search, Storage manager
vlorentz closed T2184: Replay origins to ElasticSearch's "origin" index, a subtask of T2183: Switch webapp0 to use swh-search instead of postgresql search., as Resolved.
Jan 29 2020, 1:39 PM · Archive search, Storage manager

Jan 27 2020

ardumont added a comment to T2183: Switch webapp0 to use swh-search instead of postgresql search..

It's deployed btw.

Jan 27 2020, 12:01 PM · Archive search, Storage manager

Jan 24 2020

ardumont closed T2167: Deploy swh-search, a subtask of T1910: Redesign origin search using a dedicated component (swh-search), as Resolved.
Jan 24 2020, 1:46 PM · Archive search, Storage manager
ardumont closed T2167: Deploy swh-search, a subtask of T2183: Switch webapp0 to use swh-search instead of postgresql search., as Resolved.
Jan 24 2020, 1:46 PM · Archive search, Storage manager
ardumont changed the status of T2167: Deploy swh-search, a subtask of T1910: Redesign origin search using a dedicated component (swh-search), from Open to Work in Progress.
Jan 24 2020, 9:38 AM · Archive search, Storage manager
ardumont changed the status of T2167: Deploy swh-search, a subtask of T2183: Switch webapp0 to use swh-search instead of postgresql search., from Open to Work in Progress.
Jan 24 2020, 9:38 AM · Archive search, Storage manager
ardumont added a comment to T2185: Make webapp0 use Cassandra as storage backend..

Cool, looks like this is all ready within our code base:

Jan 24 2020, 9:38 AM · Storage manager
ardumont renamed T1910: Redesign origin search using a dedicated component (swh-search) from Redesign origin search using a dedicated component to Redesign origin search using a dedicated component (swh-search).
Jan 24 2020, 9:28 AM · Archive search, Storage manager

Jan 23 2020

olasd closed T546: Update debian loader to register origin_visit's state, a subtask of T534: Add completion information to softwareheritage.origin_visit table, as Resolved.
Jan 23 2020, 2:11 PM · Storage manager
olasd closed T757: Memory leak in swh.storage.api.server as Wontfix.

Considering the age of the bug report and how many underlying libraries have been upgraded, we can reopen this when we notice it again.

Jan 23 2020, 2:10 PM · Storage manager
douardda added a comment to T757: Memory leak in swh.storage.api.server.

Is this still "a thing"?

Jan 23 2020, 1:58 PM · Storage manager

Jan 22 2020

vlorentz claimed T2214: Scale-out graph and database storage in production.
Jan 22 2020, 4:46 PM · meta-task, Roadmap 2022, Roadmap 2021, Storage manager
vlorentz changed the status of T2214: Scale-out graph and database storage in production from Open to Work in Progress.
Jan 22 2020, 4:46 PM · meta-task, Roadmap 2022, Roadmap 2021, Storage manager
vlorentz added projects to T2211: Go beyond git expressivity: Data Model, Storage manager, Mercurial loader.
Jan 22 2020, 4:38 PM · Mercurial loader, Storage manager, Data Model, Roadmap 2020
vlorentz added a project to T2214: Scale-out graph and database storage in production: Storage manager.
Jan 22 2020, 4:24 PM · meta-task, Roadmap 2022, Roadmap 2021, Storage manager
vlorentz added projects to T2215: Streaming support everywhere: Storage manager, Object storage, Web app.
Jan 22 2020, 4:24 PM · meta-task, Web app, Object storage, Storage manager, Roadmap 2020