olasd (Nicolas Dandrimont)
User · Administrator

Projects (8)

User Details

User Since
Sep 7 2015, 3:25 PM (295 w, 4 d)
Roles
Administrator

Recent Activity

Fri, Apr 30

olasd requested review of D5657: Spool large packfiles to disk instead of consuming tons of memory.
Fri, Apr 30, 8:28 PM
olasd closed D5591: Remove row locking from SQL queries.
Fri, Apr 30, 8:18 PM
olasd committed rDSCHbab557e9f669: Remove row locking from SQL queries (authored by olasd).
Fri, Apr 30, 8:18 PM
olasd updated the diff for D5591: Remove row locking from SQL queries.

Rebase

Fri, Apr 30, 8:14 PM
olasd closed D5614: tarball: properly normalize perms for all extracted files.
Fri, Apr 30, 8:13 PM
olasd committed rDCORE7d42035a13a2: tarball: properly normalize perms for all extracted files (authored by olasd).
Fri, Apr 30, 8:13 PM
olasd added a reverting change for rDSNIP81ed5d0ed0ac: migrate_extid.py: Initial commit, works on the first 7% of revisions: rDSNIP0ecc948af537: Partially revert "migrate_extid.py: Initial commit, works on the first 7% of….
Fri, Apr 30, 8:12 PM
olasd committed rDSNIP0ecc948af537: Partially revert "migrate_extid.py: Initial commit, works on the first 7% of… (authored by olasd).
Fri, Apr 30, 8:12 PM
olasd added a comment to D5614: tarball: properly normalize perms for all extracted files.

If we want to remove perms normalization from the tarball loader, then we need to discuss that in a separate task. This fixes and introduces tests for the behavior that was intended by the original code, which was buggy.

Fri, Apr 30, 8:12 PM
olasd accepted D5650: identifiers: Expose manifest computation.

D5652 adds tests for the manifest format, so I don't think you need to do that here.

Fri, Apr 30, 5:39 PM

Thu, Apr 29

olasd accepted D5645: Declare icinga checks in the yaml configuration.

Don't forget to drop the notice calls ;)

Thu, Apr 29, 2:45 PM

Tue, Apr 27

olasd committed rDSNIP712374841484: Add takedown-related stub scripts (authored by olasd).
Tue, Apr 27, 7:13 PM
olasd created P1022 kafka-consumer-template.conf.
Tue, Apr 27, 5:32 PM
olasd accepted D5619: templates/swh-pipeline: Prevent BrokenPipe when checking tox sphinx env.
Tue, Apr 27, 12:09 PM

Mon, Apr 26

olasd accepted D5617: templates/swh-pipeline: Add stage to build sphinx documentation.

Please mount the shared cache directory in the image to avoid re-downloading all of PyPI from the internet on every build. I'd suggest duplicating the existing includes/agent-docker.groovy.j2 to a new agent-docker-sphinx.groovy.j2 file, to add the options relevant for this build (for instance, I don't know how useful mounting a tmpdir on /tmp would be).

Mon, Apr 26, 7:17 PM
olasd committed rSPSITE1bcaa33d16ec: Bump git loader concurrency to 6 per worker (authored by olasd).
Mon, Apr 26, 7:04 PM
olasd requested review of D5614: tarball: properly normalize perms for all extracted files.
Mon, Apr 26, 6:11 PM
olasd requested review of D5591: Remove row locking from SQL queries.
Mon, Apr 26, 12:58 PM
olasd created P1016 storage.yml.
Mon, Apr 26, 11:30 AM
olasd created P1015 Command-Line Input.
Mon, Apr 26, 11:19 AM

Fri, Apr 23

olasd closed T2471: NPM package angular-ts-manage fails to be properly loaded as Resolved.

Now deployed in prod:

Fri, Apr 23, 1:41 PM · Core Loader
olasd added a comment to T2471: NPM package angular-ts-manage fails to be properly loaded.

New swh.loader.core deployed in staging.

Fri, Apr 23, 1:20 PM · Core Loader
olasd added a comment to D5585: Remove most of the README content and point to the Developer setup page.

I'd probably keep the new package repo instructions in here as they have no counterpart in the developer docs; move that to the bottom, maybe?

I think this draft is meant to complement this ^ and needs to be reworked and merged into an actual doc [1].

And there is a dedicated task for it btw [2]

[1] https://hedgedoc.softwareheritage.org/CaqENEY3RJi19idS8MggWw

[2] T3179

Fri, Apr 23, 1:18 PM
olasd added a comment to D5585: Remove most of the README content and point to the Developer setup page.

I'd probably keep the new package repo instructions in here as they have no counterpart in the developer docs; move that to the bottom, maybe?

Fri, Apr 23, 11:58 AM

Mon, Apr 19

olasd added a comment to T2602: Investigate how to upgrade the schema of the Cassandra storage.

For the harder cases, that involve changes to the PK, we could do something like this:

  • create a new table with a new name (eg. revision_v[n+1]; like we do in swh-search except Cassandra does not support aliases)
  • start an extra storage backend, that reads from that table instead of the old one (eg. revision_v[n]), and also reads from all the other tables as usual
  • have a multiplexing storage proxy (like we have for the objstorage), that queries this new backend (which reads from v[n+1]), and falls back to the old backend (which reads from v[n])
Mon, Apr 19, 1:59 PM · Storage manager
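
The fallback scheme described in the T2602 comment above could look roughly like the following. This is a minimal sketch, not swh.storage code: `DictBackend`, `FallbackStorageProxy` and the `revision_get` method as written here are hypothetical stand-ins, and each backend is modelled as an in-memory dict rather than a Cassandra table.

```python
from typing import Dict, Optional, Sequence


class DictBackend:
    """Hypothetical stand-in for a storage backend reading one table version."""

    def __init__(self, rows: Dict[bytes, dict]):
        self.rows = rows

    def revision_get(self, revision_id: bytes) -> Optional[dict]:
        return self.rows.get(revision_id)


class FallbackStorageProxy:
    """Query backends in order and return the first hit (hypothetical name)."""

    def __init__(self, backends: Sequence[DictBackend]):
        self.backends = list(backends)

    def revision_get(self, revision_id: bytes) -> Optional[dict]:
        for backend in self.backends:
            row = backend.revision_get(revision_id)
            if row is not None:
                return row
        return None


# The new backend reads from revision_v[n+1]; the old one from revision_v[n].
new_backend = DictBackend({b"\x01": {"id": b"\x01", "schema": "v[n+1]"}})
old_backend = DictBackend(
    {
        b"\x01": {"id": b"\x01", "schema": "v[n]"},
        b"\x02": {"id": b"\x02", "schema": "v[n]"},
    }
)
proxy = FallbackStorageProxy([new_backend, old_backend])
```

With this ordering, rows already migrated are served from the new table, and anything not yet migrated is still found through the old one.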
olasd claimed T1954: Up-to-date objstorage mirror on S3.

Some partitions have reached the tail of the journal and everything is still running smoothly, yay.

Mon, Apr 19, 12:08 PM · System administration, Object storage
olasd added a parent task for T2003: Content replayer may try to copy objects before they are available from an objstorage: T1954: Up-to-date objstorage mirror on S3.
Mon, Apr 19, 12:07 PM · Journal
olasd added a subtask for T1954: Up-to-date objstorage mirror on S3: T2003: Content replayer may try to copy objects before they are available from an objstorage.
Mon, Apr 19, 12:07 PM · System administration, Object storage
olasd closed T2003: Content replayer may try to copy objects before they are available from an objstorage, a subtask of T1914: Keep mirror of contents on S3 up to date, as Resolved.
Mon, Apr 19, 12:06 PM · Mirror, Datasets
olasd closed T2003: Content replayer may try to copy objects before they are available from an objstorage as Resolved.

D5246 landed a while ago. The s3 object copy process has now caught up on some partitions, and I can confirm that the latest added objects are copied without any race condition.

Mon, Apr 19, 12:06 PM · Journal

Fri, Apr 16

olasd added inline comments to D5550: tox: Add a sphinx environment to check sane doc build.
Fri, Apr 16, 7:46 PM
olasd triaged T3259: Gracefully handle a client closing the connection in the middle of a response being streamed as Normal priority.
Fri, Apr 16, 6:34 PM · Graph service

Wed, Apr 14

olasd added a comment to D5503: backend: Open endpoints to peek/grab tasks with any priority.

Sure, you can keep this implementation, that's why the diff was accepted in the first place.

Wed, Apr 14, 10:45 AM

Tue, Apr 13

olasd accepted D5503: backend: Open endpoints to peek/grab tasks with any priority.

Could you deduplicate swh_scheduler_peek_any_ready_priority_tasks and swh_scheduler_peek_tasks_with_priority?

Tue, Apr 13, 6:52 PM
olasd accepted D5491: contributing/code-review: Fix broken references.
Tue, Apr 13, 12:28 PM
olasd accepted D5477: Rewrite the architecture overview.

Very, very nice! Thanks.

Tue, Apr 13, 12:28 PM
olasd requested changes to D5491: contributing/code-review: Fix broken references.

Please fix the coding-guidelines ref as well.

Tue, Apr 13, 12:22 PM
olasd added a comment to D5488: Define high level load-git-high task.

I don't understand why this is needed. Aren't we able to explicitly send instances of the existing swh.loader.git.tasks.UpdateGitRepository task to a separate queue, and have a celery process consume the "regular" tasks from that queue directly?

Tue, Apr 13, 12:07 PM
olasd added inline comments to D5486: Declare new service worker to consume save code now queues.
Tue, Apr 13, 10:49 AM

Mon, Apr 12

olasd added a comment to T1954: Up-to-date objstorage mirror on S3.

The process has been restarted and is well under way (we have 800 million objects left to copy, at around 500 ops, so the ETA for reaching the tail of the log is now around 3 weeks).

Mon, Apr 12, 5:11 PM · System administration, Object storage
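
The ETA in the T1954 comment above can be checked with a quick calculation. This sketch assumes the quoted rate means roughly 500 copy operations per second, which the comment does not state explicitly; the variable names are illustrative.

```python
remaining_objects = 800_000_000
copy_rate_per_second = 500  # assumption: the quoted rate is per second

eta_seconds = remaining_objects / copy_rate_per_second
eta_days = eta_seconds / 86_400  # seconds in a day
eta_weeks = eta_days / 7

print(f"{eta_days:.1f} days, about {eta_weeks:.1f} weeks")
```

Under that assumption the result is a little over two and a half weeks, consistent with the "around 3 weeks" estimate.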
olasd claimed T3085: Complete and updated copy of the archive on S3 (objects+graph).
Mon, Apr 12, 5:04 PM · meta-task, Roadmap 2021, System administration, Object storage
olasd added a comment to T3246: Document takedown request processing workflow.

Knobs to adjust the visibility of origins in the archive and in the web API

Mon, Apr 12, 4:52 PM · Archive content
olasd added a comment to T3087: Implement support for takedown notices (infra, admin tools, workflow).

Are we planning to add a way to notify the mirrors of the takedown notices?

Mon, Apr 12, 4:43 PM · meta-task, Roadmap 2021, Web app
olasd triaged T3246: Document takedown request processing workflow as Normal priority.
Mon, Apr 12, 4:33 PM · Archive content
olasd updated the task description for T3245: List all the objects that should be impacted by a given takedown request.
Mon, Apr 12, 4:24 PM · Storage manager
olasd changed the status of T3245: List all the objects that should be impacted by a given takedown request from Open to Work in Progress.
Mon, Apr 12, 4:24 PM · Storage manager
olasd placed T3087: Implement support for takedown notices (infra, admin tools, workflow) up for grabs.
Mon, Apr 12, 4:15 PM · meta-task, Roadmap 2021, Web app
olasd accepted D5483: Puppetize the elasticsearch closing index script.

I'm sure some of the assumptions behind this script are going to break, eventually. But for now it's better than nothing!

Mon, Apr 12, 4:00 PM
olasd accepted D5467: tox: Check swh-identify can run even if Dulwich isn't installed.
Mon, Apr 12, 1:18 PM
olasd created P1004 elasticsearch index freeze/close.
Mon, Apr 12, 10:31 AM
olasd accepted D5479: network: Remove network interface on deprecated VLAN210 network.
Mon, Apr 12, 10:14 AM

Fri, Apr 9

olasd added a comment to D5466: swh-identify: Hide tracebacks if Click or Dulwich is not installed.

I'd rephrase the suggestions to "Please install 'swh.model[cli]' for full functionality.", rather than try to guess what the user did. They likely got the swh-identify script from installing swh.model as a dependency of something else!

Fri, Apr 9, 2:42 PM
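
One way to produce the message suggested in the D5466 comment above is to catch the import failure and return a fixed hint instead of a traceback. This is only a sketch: `import_or_hint` is a hypothetical helper, not the code under review in D5466.

```python
import importlib
from types import ModuleType
from typing import Optional, Tuple


def import_or_hint(
    module_name: str, hint: str = "swh.model[cli]"
) -> Tuple[Optional[ModuleType], Optional[str]]:
    """Return (module, None) on success, or (None, message) if the import fails.

    Hypothetical helper: the message does not try to guess how the user
    installed the package, it only says what to install.
    """
    try:
        return importlib.import_module(module_name), None
    except ImportError:
        return None, f"Please install '{hint}' for full functionality."


module, message = import_or_hint("no_such_module_for_this_sketch")
if module is None:
    print(message)
```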
olasd added a comment to D5467: tox: Check swh-identify can run even if Dulwich isn't installed.

Jenkins runs the py3 tox environment explicitly, so this won't get run by CI. The same issue exists with the previously added "identify" environment.

Fri, Apr 9, 2:40 PM

Thu, Apr 8

olasd closed T3224: Implement blocklist support in swh.search as Resolved.
Thu, Apr 8, 7:27 PM · Archive search
olasd closed T3224: Implement blocklist support in swh.search, a subtask of T1099: support origin and SWHID blocklist for archive search and browse, as Resolved.
Thu, Apr 8, 7:27 PM · General, Web app
olasd added a comment to T3224: Implement blocklist support in swh.search.

And this is now available in production.

Thu, Apr 8, 7:27 PM · Archive search
olasd added a comment to T3224: Implement blocklist support in swh.search.

This has now been deployed and tested in staging with a canary origin (github.com/olasd/Pythagore). Time to deploy in production.

Thu, Apr 8, 6:55 PM · Archive search
olasd closed D5465: Add basic support for an origin blocklist.
Thu, Apr 8, 5:37 PM
olasd committed rDSEAebee5d1ba6b3: Add basic support for an origin blocklist (authored by olasd).
Thu, Apr 8, 5:37 PM
olasd claimed T3087: Implement support for takedown notices (infra, admin tools, workflow).
Thu, Apr 8, 5:11 PM · meta-task, Roadmap 2021, Web app
olasd requested review of D5465: Add basic support for an origin blocklist.
Thu, Apr 8, 5:05 PM
olasd added a revision to T3224: Implement blocklist support in swh.search: D5465: Add basic support for an origin blocklist.
Thu, Apr 8, 5:03 PM · Archive search
olasd changed the status of T3224: Implement blocklist support in swh.search from Open to Work in Progress.
Thu, Apr 8, 4:55 PM · Archive search
olasd renamed T1099: support origin and SWHID blocklist for archive search and browse from support origin and SWHID blacklist for archive search and browse to support origin and SWHID blocklist for archive search and browse.
Thu, Apr 8, 4:51 PM · General, Web app

Apr 7 2021

olasd added a comment to T3084: Fast track save code now requests.

Operationally, there are two axes we can play with:

Apr 7 2021, 4:31 PM · System administration, Web app

Apr 6 2021

olasd added a comment to T2513: Copy metadata on revisions to the extrinsic metadata storage.
2021-04-06 20:54:44,962 __main__     ERROR    Could not parse revision metadata 00c6e2fe046dee3b5ef629f74f4801345840e70a
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/swh/storage/migrate_extrinsic_metadata.py", line 1161, in main
    handle_row(row, storage, deposit_cur, dry_run)
  File "/usr/lib/python3/dist-packages/swh/storage/migrate_extrinsic_metadata.py", line 843, in handle_row
    assert "id" in actual_metadata or "title" in actual_metadata
AssertionError
Apr 6 2021, 11:12 PM · Metadata workflow, Roadmap 2020
olasd added a comment to T2513: Copy metadata on revisions to the extrinsic metadata storage.
2021-04-06 20:19:19,898 __main__     ERROR    Could not parse revision metadata 00959a167bd98452c98ce73382f4b42179d53d32
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/swh/storage/migrate_extrinsic_metadata.py", line 1161, in main
    handle_row(row, storage, deposit_cur, dry_run)
  File "/usr/lib/python3/dist-packages/swh/storage/migrate_extrinsic_metadata.py", line 979, in handle_row
    storage, row["id"], metadata["original_artifact"][0]["filename"]
  File "/usr/lib/python3/dist-packages/swh/storage/migrate_extrinsic_metadata.py", line 265, in pypi_origin_from_filename
    project_name = pypi_project_from_filename(filename)
  File "/usr/lib/python3/dist-packages/swh/storage/migrate_extrinsic_metadata.py", line 256, in pypi_project_from_filename
    assert match, original_filename
AssertionError: pypops-201408-r4.tar.gz
Apr 6 2021, 10:21 PM · Metadata workflow, Roadmap 2020
olasd added a comment to T2513: Copy metadata on revisions to the extrinsic metadata storage.

(I've also noticed dry_run was set to True, so I fixed that as well :P)

Apr 6 2021, 8:12 PM · Metadata workflow, Roadmap 2020
olasd added a comment to T2513: Copy metadata on revisions to the extrinsic metadata storage.

Tail of log:

Apr 6 2021, 7:57 PM · Metadata workflow, Roadmap 2020
olasd claimed T2513: Copy metadata on revisions to the extrinsic metadata storage.

The script is now running on getty.

Apr 6 2021, 7:07 PM · Metadata workflow, Roadmap 2020
olasd closed D5430: Add sha512 as a valid field in dsc metadata.
Apr 6 2021, 6:38 PM
olasd committed rDLDBASE47b7254f6159: Add sha512 as a valid field in dsc metadata (authored by olasd).
Apr 6 2021, 6:38 PM
olasd updated the diff for D5430: Add sha512 as a valid field in dsc metadata.

Add comment for the sha512 field

Apr 6 2021, 6:38 PM
olasd closed T3143: Migrate revision metadata to extid in the storage, a subtask of T3139: Make loaders stop reading from the 'metadata' column of the 'revision' table, as Resolved.
Apr 6 2021, 4:53 PM · Core Loader
olasd closed T3143: Migrate revision metadata to extid in the storage as Resolved.

The migration script has now run to completion (took around a week).

Apr 6 2021, 4:53 PM · System administration, Storage manager, Core Loader
olasd requested review of D5430: Add sha512 as a valid field in dsc metadata.
Apr 6 2021, 4:50 PM
olasd added a revision to T3143: Migrate revision metadata to extid in the storage: D5430: Add sha512 as a valid field in dsc metadata.
Apr 6 2021, 4:48 PM · System administration, Storage manager, Core Loader

Apr 2 2021

olasd accepted D5408: requirements-swh-dev.txt: Fix order.
Apr 2 2021, 1:44 PM

Mar 30 2021

olasd changed the status of T3143: Migrate revision metadata to extid in the storage, a subtask of T3139: Make loaders stop reading from the 'metadata' column of the 'revision' table, from Open to Work in Progress.
Mar 30 2021, 7:43 PM · Core Loader
olasd changed the status of T3143: Migrate revision metadata to extid in the storage from Open to Work in Progress.
Mar 30 2021, 7:43 PM · System administration, Storage manager, Core Loader
olasd added a comment to T3143: Migrate revision metadata to extid in the storage.

I've deployed the extid schema changes on all storages, and I've started the migration script on getty.

Mar 30 2021, 7:42 PM · System administration, Storage manager, Core Loader
olasd accepted D5386: buffer: Add support for 'extid'.
Mar 30 2021, 5:45 PM

Mar 29 2021

olasd accepted D5363: extid: remove unicity on (extid_type, extid) and (target_type, target).
Mar 29 2021, 2:32 PM

Mar 25 2021

olasd added a comment to D5333: Python testing style guide: remove irrelevant items, and link to an external guide.

+1 from me, this guide is great.

Mar 25 2021, 1:02 PM
olasd accepted D5330: loaders: Move retry proxy to the bottom of the stack.
Mar 25 2021, 12:57 PM

Mar 23 2021

olasd raised a concern with rDLDBASEe9a8f986809c: package loaders: define extid types.
Mar 23 2021, 6:09 PM
olasd accepted D5290: Deduplicate resolve_revision_from across package loaders.
Mar 23 2021, 4:24 PM
olasd accepted D5290: Deduplicate resolve_revision_from across package loaders.

Not quite sure about the split between BasePackageInfo / BaseManifestPackageInfo (and I really don't like the new name).

Mar 23 2021, 3:12 PM
olasd accepted D5289: archive, cran: Replace 'artifact_identity' with extid to detect known packages.
Mar 23 2021, 3:01 PM
olasd added a comment to T2686: Use hashes for all kafka keys.

The following objects remain:

Mar 23 2021, 2:47 PM · Data Model, Storage manager
olasd added a comment to D5290: Deduplicate resolve_revision_from across package loaders.

Shouldn't the extid() methods all return a tuple (extid_type, extid_value) rather than a plain bytes value? I can imagine a point where, for the same loader, we might want to change the extid_type, and the current implementation wouldn't be able to distinguish them.

Mar 23 2021, 2:41 PM
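
The concern raised in the D5290 comment above can be illustrated with a small sketch. The type names and the `make_extid` helper are hypothetical and are not taken from the package loaders; the point is only that keying on the pair keeps two extid types apart even when the raw value is the same.

```python
from typing import Dict, Tuple

ExtID = Tuple[str, bytes]  # (extid_type, extid_value)


def make_extid(extid_type: str, extid_value: bytes) -> ExtID:
    """Hypothetical helper returning the pair instead of plain bytes."""
    return (extid_type, extid_value)


raw = bytes.fromhex("ab" * 20)

# Keyed on plain bytes: a second extid type with the same value overwrites the first.
by_value: Dict[bytes, str] = {}
by_value[raw] = "revision-A"
by_value[raw] = "revision-B"

# Keyed on (extid_type, extid_value): both entries stay distinguishable.
by_pair: Dict[ExtID, str] = {}
by_pair[make_extid("hypothetical-type-v1", raw)] = "revision-A"
by_pair[make_extid("hypothetical-type-v2", raw)] = "revision-B"
```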
olasd closed T3019: Add an index for raw_extrinsic_metadata.id in swh.storage.postgresql, a subtask of T3018: Allow querying raw_extrinsic_metadata by hash in swh-storage, as Resolved.
Mar 23 2021, 2:31 PM · Storage manager, Extrinsic metadata
olasd closed T3019: Add an index for raw_extrinsic_metadata.id in swh.storage.postgresql as Resolved.

After a lot of back and forth, and the release of swh.model v2.3.0 and swh.storage v0.26.0, this is now all done and deployed in staging and production.

Mar 23 2021, 2:31 PM · Storage manager, Extrinsic metadata
olasd closed T3019: Add an index for raw_extrinsic_metadata.id in swh.storage.postgresql, a subtask of T3022: Deduplicate RawExtrinsicMetadata by hash instead of a subset of their fields, as Resolved.
Mar 23 2021, 2:31 PM · Storage manager, Extrinsic metadata
olasd closed T3022: Deduplicate RawExtrinsicMetadata by hash instead of a subset of their fields, a subtask of T2703: Use intrinsic identifiers/hashes for RawExtrinsicMetadata objects, as Resolved.
Mar 23 2021, 2:25 PM · Data Model, Storage manager, Extrinsic metadata
olasd closed T3022: Deduplicate RawExtrinsicMetadata by hash instead of a subset of their fields as Resolved.

After the release of swh.model v2, this is now done.

Mar 23 2021, 2:25 PM · Storage manager, Extrinsic metadata
olasd closed T2520: Setup dedicated kafka cluster on new rocquencourt hardware as Resolved.

The missing topic (raw_extrinsic_metadata) has been handled as part of the migration in T3019. Closing.

Mar 23 2021, 2:24 PM · System administration

Mar 22 2021

olasd accepted D5030: raw_extrinsic_metadata: Make (target, authority_id, discovery_date, fetcher_id) non-unique.
Mar 22 2021, 12:56 PM
olasd added inline comments to D5030: raw_extrinsic_metadata: Make (target, authority_id, discovery_date, fetcher_id) non-unique.
Mar 22 2021, 12:23 PM
olasd added a project to T3152: Show README content when it is a symlink: Easy hack.
Mar 22 2021, 11:47 AM · Easy hack, Web app