Software Heritage

olasd (Nicolas Dandrimont)
User · Administrator

User Details

User Since
Sep 7 2015, 3:25 PM (307 w, 1 d)
Roles
Administrator

Recent Activity

Today

olasd closed D6023: Implement storage of the ExtID.extid_version field.
Tue, Jul 27, 4:51 PM
olasd committed rDSTO7a380458f4ab: Implement storage of the ExtID.extid_version field (authored by olasd).
Implement storage of the ExtID.extid_version field
Tue, Jul 27, 4:51 PM

Fri, Jul 23

olasd added a comment to T3233: Missing Apollo 11 virtual AGC repository from Google Code.

The project.json and a couple of source archives have been downloaded during our early archival of google code, however I don't see a dump of the svn repository.

Fri, Jul 23, 7:38 PM · SVN Loader
olasd added a comment to T3418: Decide a consistent policy on having multiple archived objects for the same extid.

For now I went the simplest way I could think of, which is:

Fri, Jul 23, 5:47 PM · Storage manager, Mercurial loader
olasd requested review of D6023: Implement storage of the ExtID.extid_version field.
Fri, Jul 23, 5:45 PM
olasd added a revision to T3418: Decide a consistent policy on having multiple archived objects for the same extid: D6023: Implement storage of the ExtID.extid_version field.
Fri, Jul 23, 5:38 PM · Storage manager, Mercurial loader
olasd closed D6019: Add an extid_version field to ExtIDs.
Fri, Jul 23, 4:12 PM
olasd committed rDMOD1545ef77e36d: Add an extid_version field to ExtIDs (authored by olasd).
Add an extid_version field to ExtIDs
Fri, Jul 23, 4:12 PM
olasd added a comment to D6019: Add an extid_version field to ExtIDs.

Relatedly to the need (not this diff), I'm unclear on how to deal with the data
migration now. For current extid clashes (which is the source of the task afaiui), how
can we determine which extid is the oldest one and which are more recent (if more than 2
extids)? [1]

Maybe we actually won't have anything to do in regards to migration.

We could adapt the loader so it detects the clash. That is, the loader finds more than
one version for the extid but without any extid-version. Then considers them moot [1],
bumps a new extid-version to 1 or something (thus recomputing the hashes for those
origins). And maybe with this, the loaders get unstuck and we are back on track.

[1] because it's anterior to this diff and we can't really detect what's what currently
(see caveats described in the task for the other discarded implementation).

Fri, Jul 23, 4:10 PM
olasd updated the diff for D6019: Add an extid_version field to ExtIDs.

Add backwards-compat test

Fri, Jul 23, 4:08 PM
olasd closed D6018: Only record last_visited and last_successful in origin_visit_stats.
Fri, Jul 23, 3:58 PM
olasd committed rDSCH87e66faa300d: Only record last_visited and last_successful in origin_visit_stats (authored by olasd).
Only record last_visited and last_successful in origin_visit_stats
Fri, Jul 23, 3:58 PM
olasd accepted D6017: test: Refactor assert_visit_stats_ok to ignore_fields.
Fri, Jul 23, 3:56 PM
olasd requested review of D6019: Add an extid_version field to ExtIDs.
Fri, Jul 23, 12:16 PM
olasd added a revision to T3418: Decide a consistent policy on having multiple archived objects for the same extid: D6019: Add an extid_version field to ExtIDs.
Fri, Jul 23, 12:14 PM · Storage manager, Mercurial loader
olasd requested review of D6018: Only record last_visited and last_successful in origin_visit_stats.
Fri, Jul 23, 12:00 PM
olasd closed D6016: Always flush kafka producers in tests.
Fri, Jul 23, 10:32 AM
olasd committed rDSEA122d7caf6577: Always flush kafka producers in tests (authored by olasd).
Always flush kafka producers in tests
Fri, Jul 23, 10:32 AM
olasd updated the diff for D6016: Always flush kafka producers in tests.

Rebase

Fri, Jul 23, 10:14 AM

Thu, Jul 22

olasd closed D5956: Introduce new scheduling policy to grab origins without last update.
Thu, Jul 22, 2:22 PM
olasd committed rDSCHd58776ab0b41: Introduce new scheduling policy to grab origins without last update (authored by ardumont).
Introduce new scheduling policy to grab origins without last update
Thu, Jul 22, 2:22 PM
olasd committed rDSCH825e8cfe7d24: grab_next_visits: make the handling of CTEs more modular (authored by olasd).
grab_next_visits: make the handling of CTEs more modular
Thu, Jul 22, 2:22 PM
olasd accepted D5917: journal_client: Only check last_* fields for some permutation tests.

Ack (with the view that this will be refactored away in a few diffs' time)

Thu, Jul 22, 12:40 PM
olasd updated the diff for D5956: Introduce new scheduling policy to grab origins without last update.

make the handling of CTEs more modular

Thu, Jul 22, 12:26 PM
olasd commandeered D5956: Introduce new scheduling policy to grab origins without last update.
Thu, Jul 22, 12:26 PM
olasd accepted D5950: journal_client: Compute next position for origin visit.
Thu, Jul 22, 11:46 AM
olasd accepted D5919: Start handling of recurrent loading tasks in scheduler.
Thu, Jul 22, 11:29 AM · Scheduling utilities

Wed, Jul 21

olasd changed the status of T3433: Deploy swh.search v0.10 on staging from Open to Work in Progress.
Wed, Jul 21, 5:28 PM · System administration, Archive search
olasd added a comment to T3433: Deploy swh.search v0.10 on staging.

We've done the following:

Wed, Jul 21, 5:28 PM · System administration, Archive search
olasd committed rSPMW8316b6d4bd67: Add BlockAndNuke extension to all mediawikis (authored by olasd).
Add BlockAndNuke extension to all mediawikis
Wed, Jul 21, 3:30 PM
olasd requested review of D6016: Always flush kafka producers in tests.
Wed, Jul 21, 3:18 PM
olasd closed D6014: journal_client: use origin_visit_status.type instead of origin_visit.
Wed, Jul 21, 3:12 PM
olasd committed rDSEAd58705a0eb8e: journal_client: use origin_visit_status.type instead of origin_visit (authored by olasd).
journal_client: use origin_visit_status.type instead of origin_visit
Wed, Jul 21, 3:12 PM
olasd requested review of D6014: journal_client: use origin_visit_status.type instead of origin_visit.
Wed, Jul 21, 3:07 PM
olasd closed D6012: Store snapshot_ids as hexadecimal strings, not bytes.
Wed, Jul 21, 2:23 PM
olasd committed rDSEAffad9b1f31d4: Store snapshot_ids as hexadecimal strings, not bytes (authored by olasd).
Store snapshot_ids as hexadecimal strings, not bytes
Wed, Jul 21, 2:23 PM
olasd closed D6011: Add more varied tests for journal_client.fetch_last_revision_release_date.
Wed, Jul 21, 2:23 PM
olasd committed rDSEA0fee7bb45f78: Add more varied tests for journal_client.fetch_last_revision_release_date (authored by olasd).
Add more varied tests for journal_client.fetch_last_revision_release_date
Wed, Jul 21, 2:23 PM
olasd requested review of D6012: Store snapshot_ids as hexadecimal strings, not bytes.
Wed, Jul 21, 1:29 PM
olasd requested review of D6011: Add more varied tests for journal_client.fetch_last_revision_release_date.
Wed, Jul 21, 1:20 PM
olasd committed rDSEA32cb812a42f3: Add logging for search terms in debug mode (authored by olasd).
Add logging for search terms in debug mode
Wed, Jul 21, 1:13 PM
olasd committed rSPSITE5c1f2a6d9af7: Both swh.search journal clients need a storage config now (authored by olasd).
Both swh.search journal clients need a storage config now
Wed, Jul 21, 11:50 AM
olasd committed rSPSITEca257509e78a: Update swh.search to v0.10.0 in staging (authored by olasd).
Update swh.search to v0.10.0 in staging
Wed, Jul 21, 11:45 AM

Tue, Jul 20

olasd triaged T3435: Provide web API endpoints for object lookups by ExtID as Wishlist priority.
Tue, Jul 20, 12:04 PM · Web app
olasd added a comment to T3104: Persistent readonly perfect hash table.

In the global read index, I would consider storing, for each object, alongside the shard id, the length and offset of the object (which are comparatively cheap to store). This way, the per-shard index only gets used for standalone operation, which would (probably?) be an edge case. As I don't really grasp the specifics of the hashing algorithm, I have a hard time understanding what the actual performance will look like. log2(number of objects in bucket) can still be a fairly small number of reads compared to 2 reads and a potentially expensive hashing algorithm?
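
A minimal sketch of the idea in this comment, assuming an in-memory dict as a stand-in for the global read index and byte strings as stand-ins for shards. The names `IndexEntry`, `GlobalReadIndex` and `read_object` are hypothetical and not part of any Software Heritage API; the point is only that storing offset and length next to the shard id lets a read skip the per-shard index.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class IndexEntry:
    """Hypothetical global index record: where an object lives."""
    shard_id: int
    offset: int  # byte offset of the object inside the shard
    length: int  # object size in bytes


class GlobalReadIndex:
    """Toy global index: object id -> (shard id, offset, length)."""

    def __init__(self) -> None:
        self._entries: Dict[bytes, IndexEntry] = {}

    def add(self, object_id: bytes, shard_id: int, offset: int, length: int) -> None:
        self._entries[object_id] = IndexEntry(shard_id, offset, length)

    def locate(self, object_id: bytes) -> Optional[IndexEntry]:
        return self._entries.get(object_id)


def read_object(
    index: GlobalReadIndex, shards: Dict[int, bytes], object_id: bytes
) -> Optional[bytes]:
    """Read an object with one index lookup and one slice of the shard.

    The per-shard index is never consulted, because the global entry
    already carries the offset and the length.
    """
    entry = index.locate(object_id)
    if entry is None:
        return None
    shard = shards[entry.shard_id]
    return shard[entry.offset : entry.offset + entry.length]
```

In this sketch a per-shard index would only be needed when a shard is read without access to the global index, which matches the "standalone operation" case mentioned above.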

Tue, Jul 20, 10:43 AM · Object storage

Mon, Jul 19

olasd added a comment to T3104: Persistent readonly perfect hash table.

With the available explanations I don't understand how this format works.

Mon, Jul 19, 3:05 PM · Object storage

Tue, Jul 13

olasd updated the task description for T3408: Provide read-only access to production servers.
Tue, Jul 13, 4:19 PM · System administration
olasd added a comment to T3408: Provide read-only access to production servers.

And swh-graph too, please

Tue, Jul 13, 4:16 PM · System administration
olasd accepted D5989: pattern: Use URL network location as instance name when not provided.

Thanks!

Tue, Jul 13, 1:09 PM

Mon, Jul 12

olasd added a comment to T3311: Use .gitmodules to discover origins.

I also wonder if we have a somewhat common approach to handle the SVN externals as well.

Mon, Jul 12, 3:48 PM · Archive coverage, Git loader
olasd added a comment to T3311: Use .gitmodules to discover origins.

I think this is worthwhile in general, at least for repositories that are still live.

Mon, Jul 12, 3:47 PM · Archive coverage, Git loader
olasd accepted D5973: Reorganize code.
Mon, Jul 12, 9:35 AM
olasd accepted D5972: Improve `ProvenanceStorageInterface` for testing purposes.
Mon, Jul 12, 9:34 AM

Fri, Jul 9

olasd changed the status of T3403: Use forge URL network location as default lister instance name, a subtask of T3127: Compute and display distribution of origins by forge, from Open to Work in Progress.
Fri, Jul 9, 3:37 PM · Metrics/monitoring, Web app, Roadmap 2021, meta-task
olasd changed the status of T3403: Use forge URL network location as default lister instance name from Open to Work in Progress.

I've updated the listers with no credentials:

Fri, Jul 9, 3:37 PM · Scheduling utilities, Lister
olasd added a comment to D5977: Make PyPI lister incremental and complete in regards to last_update.
Fri, Jul 9, 12:11 PM · Scheduling utilities, Origin-Pypi
olasd accepted D5977: Make PyPI lister incremental and complete in regards to last_update.

Looks good, thanks a lot!

Fri, Jul 9, 11:57 AM · Scheduling utilities, Origin-Pypi

Thu, Jul 8

olasd committed R245:e9fda5cbb6d7: First mob programming iteration (authored by olasd).
First mob programming iteration
Thu, Jul 8, 2:13 PM
olasd committed R245:7a68aefdfaf7: Update template to latest version (authored by olasd).
Update template to latest version
Thu, Jul 8, 12:29 PM
olasd committed R245:892e6270f784: Initialize template (authored by olasd).
Initialize template
Thu, Jul 8, 12:26 PM

Wed, Jul 7

olasd added a comment to D5977: Make PyPI lister incremental and complete in regards to last_update.
In D5977#153851, @olasd wrote:

Looking at the size of the changelog (2-ish million entries at 50k-ish entries per page means 50-ish requests), I /think/ the lister could always be running in incremental mode, rather than having to maintain two modes in the long run.

Wed, Jul 7, 6:04 PM · Scheduling utilities, Origin-Pypi
olasd added a comment to D5977: Make PyPI lister incremental and complete in regards to last_update.

From my quick testing, I have noticed that the changelog method only returns a limited number of results, so we would need to iterate calls.

Wed, Jul 7, 6:02 PM · Scheduling utilities, Origin-Pypi

Mon, Jul 5

olasd accepted D5936: Add an endpoint to list metadata authorities for a given SWHID.

After clarification of the rationale on IRC, LGTM (with some marginal changes), thanks.

Mon, Jul 5, 4:40 PM

Wed, Jun 30

olasd added a comment to T3418: Decide a consistent policy on having multiple archived objects for the same extid.

The "mapping version field" is the most fleshed out proposal as it would be my preference. My rationale for it against changing extid_type for backwards incompatible changes is that the extid_type is a property of the external artifact, while the mapping version is a property of our archiving infrastructure.
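
A rough illustration of that distinction, assuming a simplified record type rather than the real `swh.model` class: `ExtIDRecord`, `select_current`, the string targets and the default version of 0 are all hypothetical. The `extid_type` stays tied to the external artifact, while `extid_version` changes only when the archiving side changes how it maps artifacts to archived objects.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class ExtIDRecord:
    """Simplified stand-in for an ExtID mapping (not the swh.model class)."""
    extid_type: str         # property of the external artifact, e.g. "hg-nodeid"
    extid: bytes            # the external identifier itself
    target: str             # archived object, written here as a plain string
    extid_version: int = 0  # property of the archiving infrastructure (assumed default)


def select_current(
    records: Iterable[ExtIDRecord],
) -> Dict[Tuple[str, bytes], List[ExtIDRecord]]:
    """Keep, for each (extid_type, extid), only the highest mapping version."""
    best: Dict[Tuple[str, bytes], List[ExtIDRecord]] = {}
    for record in records:
        key = (record.extid_type, record.extid)
        current = best.get(key)
        if current is None or record.extid_version > current[0].extid_version:
            # First record for this key, or a newer mapping version: replace.
            best[key] = [record]
        elif record.extid_version == current[0].extid_version:
            # Same mapping version: a genuine clash, keep both for inspection.
            current.append(record)
    return best
```

With this shape, a backwards-incompatible change in a loader bumps `extid_version` and leaves `extid_type` untouched, so older mappings remain stored but are ignored when a newer version exists.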

Wed, Jun 30, 6:55 PM · Storage manager, Mercurial loader
olasd triaged T3418: Decide a consistent policy on having multiple archived objects for the same extid as Unbreak Now! priority.
Wed, Jun 30, 6:49 PM · Storage manager, Mercurial loader

Tue, Jun 29

olasd created P1084 extids with multiple swhids.
Tue, Jun 29, 11:24 PM
olasd accepted D5945: Fix tests when the umask is not 022.
Tue, Jun 29, 1:06 PM

Jun 23 2021

olasd added a comment to T3127: Compute and display distribution of origins by forge.

As @olasd said in a previous comment, even if we compute the metrics, we will miss counters about origins not tied to a lister
(googlecode and gitorious for instance). So I am thinking again about a hybrid approach using the swh-counters metrics
implemented yesterday, which give a rough estimation of the number of origins by network location (as visit statuses are not
processed, only origins), and the scheduler metrics.

Jun 23 2021, 9:16 PM · Metrics/monitoring, Web app, Roadmap 2021, meta-task
olasd accepted D5913: jobs/swh-packages: Add swh-py-template build jobs.

Wow, I'm surprised that the templated garbage in setup.py still yields a working package :-)

Jun 23 2021, 2:02 PM
olasd added a comment to T3127: Compute and display distribution of origins by forge.

Sorry @anlambert, I was late at Monday's meeting and I completely missed this in your weekly plan, I would have pointed this out earlier.

Jun 23 2021, 12:04 PM · Metrics/monitoring, Web app, Roadmap 2021, meta-task
olasd closed D5903: Add a (longer) specific cooldown for failed origin visits.
Jun 23 2021, 11:29 AM
olasd closed D5904: Add a specific cooldown for notfound origins.
Jun 23 2021, 11:29 AM
olasd committed rDSCH651ddcc6cec8: Add a (longer) specific cooldown for failed origin visits (authored by olasd).
Add a (longer) specific cooldown for failed origin visits
Jun 23 2021, 11:29 AM
olasd committed rDSCHed818702c49c: Add a specific cooldown for notfound origins (authored by olasd).
Add a specific cooldown for notfound origins
Jun 23 2021, 11:29 AM
olasd closed D5901: Make the origin visit scheduling cooldown configurable.
Jun 23 2021, 11:29 AM
olasd committed rDSCHce8608d1f888: Make the origin visit scheduling cooldown configurable (authored by olasd).
Make the origin visit scheduling cooldown configurable
Jun 23 2021, 11:29 AM
olasd updated the diff for D5904: Add a specific cooldown for notfound origins.

rebase

Jun 23 2021, 11:14 AM
olasd updated the diff for D5903: Add a (longer) specific cooldown for failed origin visits.

rebase

Jun 23 2021, 11:14 AM
olasd updated the diff for D5901: Make the origin visit scheduling cooldown configurable.

rebase

Jun 23 2021, 11:14 AM

Jun 21 2021

olasd requested review of D5904: Add a specific cooldown for notfound origins.
Jun 21 2021, 6:06 PM
olasd updated the diff for D5903: Add a (longer) specific cooldown for failed origin visits.

Use parametrization to test the logic of multiple cooldowns

Jun 21 2021, 6:00 PM
olasd added inline comments to D5903: Add a (longer) specific cooldown for failed origin visits.
Jun 21 2021, 5:59 PM
olasd requested review of D5903: Add a (longer) specific cooldown for failed origin visits.
Jun 21 2021, 5:41 PM
olasd updated the diff for D5901: Make the origin visit scheduling cooldown configurable.

Only vary the backoff test timestamp by 1 second

Jun 21 2021, 5:36 PM
olasd accepted D5899: swh-model: get SWHID from Content/Directory objects in from_disk.

Nice, thanks.

Jun 21 2021, 5:09 PM
olasd updated the diff for D5901: Make the origin visit scheduling cooldown configurable.

Update backoff checking logic to vary the grab timestamp instead of the configured backoff

Jun 21 2021, 5:07 PM
olasd requested review of D5901: Make the origin visit scheduling cooldown configurable.
Jun 21 2021, 4:49 PM
olasd closed D5900: Drop duplicate docstring from backend.
Jun 21 2021, 4:35 PM
olasd committed rDSCH9e1b4145fe17: Drop duplicate docstring from backend (authored by olasd).
Drop duplicate docstring from backend
Jun 21 2021, 4:35 PM
olasd requested review of D5900: Drop duplicate docstring from backend.
Jun 21 2021, 4:05 PM
olasd added a revision to T2444: Implement the scheduling policy for the recurrent visit scheduler: D5809: Direct scheduling of origin visits in celery.
Jun 21 2021, 4:04 PM · Sprint 2021 01, Scheduling utilities
olasd closed T2454: Stop creating tasks directly in listers as Resolved.

This has all landed during the early 2021 sprint.

Jun 21 2021, 4:03 PM · Lister
olasd closed T2453: Use the scheduler "lister api" in listers as Resolved.

This has all landed during the sprint at the beginning of 2021.

Jun 21 2021, 4:03 PM · Lister
olasd closed T2453: Use the scheduler "lister api" in listers, a subtask of T2454: Stop creating tasks directly in listers, as Resolved.
Jun 21 2021, 4:03 PM · Lister
olasd triaged T3399: Improve PyPI lister to pull last update information when running incrementally as Normal priority.
Jun 21 2021, 2:48 PM · Origin-Pypi, Lister
olasd added a comment to T2345: Improve handling of recurrent loading tasks in scheduler.

Summary of the data available in the listed_origins table, broken down by lister and "known state" of origins:

Jun 21 2021, 2:27 PM · Sprint 2021 01, Archive coverage, Scheduling utilities
olasd closed T3336: Deploy swh.loader.mercurial 2.1 in staging, a subtask of T3337: Deploy swh.loader.mercurial 1.0 in production, as Resolved.
Jun 21 2021, 12:06 PM · System administration, Mercurial loader
olasd closed T3336: Deploy swh.loader.mercurial 2.1 in staging as Resolved.

Now that the branch structure has landed, I've deployed this latest version. After some cleanup of the duplicate extids left over from an earlier deployment, everything seems to be fine and the loader is ready for production.

Jun 21 2021, 12:06 PM · System administration, Mercurial loader
olasd renamed T3336: Deploy swh.loader.mercurial 2.1 in staging from Deploy swh.loader.mercurial 1.1 in staging to Deploy swh.loader.mercurial 2.1 in staging.
Jun 21 2021, 12:05 PM · System administration, Mercurial loader
olasd closed T3341: Move real-time discussion away from Freenode as Resolved.

The migration is complete. All staff have joined the new channels and most have registered an account on libera.chat.

Jun 21 2021, 12:02 PM · Community Building