
olasd (Nicolas Dandrimont)
User, Administrator

Projects (8)

User Details

User Since
Sep 7 2015, 3:25 PM (285 w, 3 d)
Roles
Administrator

Recent Activity

Today

olasd closed D5144: Handle BufferErrors in writer.kafka.Producer.
Thu, Feb 25, 6:48 PM
olasd committed rDJNL458a405afb01: test_kafka_writer: remove useless fixtures (authored by olasd).
Thu, Feb 25, 6:48 PM
olasd committed rDJNL6001a28f7d81: Handle BufferErrors in writer.kafka.Producer (authored by olasd).
Thu, Feb 25, 6:48 PM
olasd closed D5148: Hardcode the use of the tcp transport for GitHub origins.
Thu, Feb 25, 6:46 PM
olasd committed rDLDG342f8fde2560: Hardcode the use of the tcp transport for GitHub origins (authored by olasd).
Thu, Feb 25, 6:46 PM
olasd committed rDLDG61afbc56b035: Stop processing packfiles before sending objects (authored by olasd).
Thu, Feb 25, 6:46 PM
olasd closed D5147: Stop processing packfiles before sending objects.
Thu, Feb 25, 6:46 PM
olasd committed rDLDG5e434d6f6e1c: Drop unused get_fetch_history_result methods (authored by olasd).
Thu, Feb 25, 6:46 PM
olasd updated the diff for D5144: Handle BufferErrors in writer.kafka.Producer.

rebase

Thu, Feb 25, 6:43 PM
olasd updated the diff for D5148: Hardcode the use of the tcp transport for GitHub origins.

add inline comment suggested by @vlorentz

Thu, Feb 25, 6:40 PM
olasd added inline comments to D5147: Stop processing packfiles before sending objects.
Thu, Feb 25, 5:55 PM
olasd requested review of D5148: Hardcode the use of the tcp transport for GitHub origins.
Thu, Feb 25, 5:54 PM
olasd requested review of D5147: Stop processing packfiles before sending objects.
Thu, Feb 25, 5:52 PM
olasd requested review of D5144: Handle BufferErrors in writer.kafka.Producer.
Thu, Feb 25, 3:14 PM
olasd added a revision to T3000: Investigate scheduler journal client discrepancies: D5144: Handle BufferErrors in writer.kafka.Producer.
Thu, Feb 25, 3:11 PM · Scheduling utilities

Yesterday

olasd added a comment to T3000: Investigate scheduler journal client discrepancies.

After the full backfill:

Wed, Feb 24, 12:00 PM · Scheduling utilities
olasd added a comment to T3000: Investigate scheduler journal client discrepancies.

The backfill is now complete, and the scheduler journal client has processed its backlog of messages.

Wed, Feb 24, 9:58 AM · Scheduling utilities

Tue, Feb 23

olasd committed rSPSITE127322a80177: Create a clearly-defined-full database on db1.staging (authored by olasd).
Tue, Feb 23, 4:47 PM
olasd committed rSPSITE72dc2030c49d: Use `password` instead of `passwd` as dict key for pgpass generation (authored by olasd).
Tue, Feb 23, 4:47 PM
olasd closed D5130: Create a clearly-defined-full database on db1.staging.
Tue, Feb 23, 4:47 PM
olasd committed rSPSITEa3052e46e432: Give olasd .pgpass and .pg_service.conf on clearly-defined (authored by olasd).
Tue, Feb 23, 4:47 PM
olasd committed rSPSITEa198d66632e5: postgresql server: allow overriding the db password (authored by olasd).
Tue, Feb 23, 4:47 PM
olasd requested review of D5130: Create a clearly-defined-full database on db1.staging.
Tue, Feb 23, 4:15 PM
olasd added a comment to D5127: Fix qualifier parsing and add tests.
In D5127#129703, @olasd wrote:

Should we also add some form of (un)escaping of origin qualifiers?

Only escaping

Tue, Feb 23, 12:18 PM
olasd added a comment to D5127: Fix qualifier parsing and add tests.

Rather than the "it was completely broken" comment (which it wasn't really, afaict), could you say what really happened in this commit? (afaict, you've added the unescaping of path qualifiers, and you've fixed the serialization of lines qualifiers).

Tue, Feb 23, 11:46 AM
olasd accepted D5121: Deduplicate parsing/unparsing tests of the new SWHID classes.

Very nice.

Tue, Feb 23, 11:41 AM
olasd accepted D5116: QualifiedSWHID: Replace the 'qualifiers' dict with statically defined attributes.

via D5127

Tue, Feb 23, 11:31 AM
olasd added inline comments to D5118: Add new class ExtendedSWHID as an alternative to SWHID/QualifiedSWHID.
Tue, Feb 23, 11:29 AM
olasd accepted D5120: Deduplicate code between CoreSWHID, QualifiedSWHID, and ExtendedSWHID.

I guess mypy properly checks that the type of the <SWHIDClass>.object_type attribute matches the type parameter of the _BaseSWHID base class, and we don't need to add a test for that?

Tue, Feb 23, 11:27 AM
olasd accepted D5118: Add new class ExtendedSWHID as an alternative to SWHID/QualifiedSWHID.

This looks fine.

Tue, Feb 23, 11:16 AM
olasd added a comment to D5116: QualifiedSWHID: Replace the 'qualifiers' dict with statically defined attributes.

Ah, one more change I think of when looking at D5118: please add a test to check that all values of the SWHID_QUALIFIERS list correspond to an attribute of QualifiedSWHID

Tue, Feb 23, 11:11 AM
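The test requested in the comment on D5116 could look roughly like the sketch below. Everything in it is a stand-in: the real `QualifiedSWHID` and `SWHID_QUALIFIERS` live in swh.model and are not reproduced here, the qualifier names are assumed, and `missing_qualifier_attributes` is a hypothetical helper. A standard-library dataclass replaces whatever class machinery the real code uses.

```python
import dataclasses
from typing import Optional, Tuple

# Assumed qualifier list; the real SWHID_QUALIFIERS is defined in swh.model.
SWHID_QUALIFIERS = {"origin", "visit", "anchor", "path", "lines"}


@dataclasses.dataclass(frozen=True)
class QualifiedSWHID:
    """Simplified stand-in, not the real swh.model class."""

    object_type: str
    object_id: bytes
    origin: Optional[str] = None
    visit: Optional[str] = None
    anchor: Optional[str] = None
    path: Optional[bytes] = None
    lines: Optional[Tuple[int, Optional[int]]] = None


def missing_qualifier_attributes(cls, qualifiers):
    """Return the qualifiers that have no matching attribute on cls (hypothetical helper)."""
    field_names = {field.name for field in dataclasses.fields(cls)}
    return sorted(set(qualifiers) - field_names)


def test_qualifiers_match_attributes():
    # Fails as soon as a qualifier is added to the list without a matching attribute.
    assert missing_qualifier_attributes(QualifiedSWHID, SWHID_QUALIFIERS) == []
```

Comparing the list against the declared fields, rather than against instances, keeps the check independent of any particular SWHID value.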
olasd accepted D5117: Use dict instead of temporary SWHID when parsing {Core,Qualified}SWHID..

Thanks!

Tue, Feb 23, 10:59 AM
olasd closed T3063: Servers using the public swh network gateway can't reach inria's ntp servers as Resolved.

After checking with @ardumont I turned cron back on on worker[0-2].staging, and ran puppet on them.

Tue, Feb 23, 9:56 AM · System administration
olasd requested changes to D5116: QualifiedSWHID: Replace the 'qualifiers' dict with statically defined attributes.

Looks good overall but a small issue has been found.

Tue, Feb 23, 9:50 AM

Mon, Feb 22

olasd added a comment to T3000: Investigate scheduler journal client discrepancies.

I had an immediate resurgence of the issue when restarting the backfill of the missing chunks, so I hot-patched a retry on BufferErrors in swh.journal.writer.kafka. Glancing at sentry, we never had this issue in prod. The hot patch seems to work; I'll write tests on that and submit it as a diff tomorrow.

Mon, Feb 22, 8:26 PM · Scheduling utilities
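For readers unfamiliar with the failure mode in the comment above: a Kafka producer's `produce()` call can raise `BufferError` when its local queue is full, and a common remedy is to let the producer drain its queue and try again. The sketch below illustrates that pattern only; it is not the hot patch nor the code of D5144. `produce_with_retry`, `max_attempts` and `poll_timeout` are hypothetical names, and the producer is assumed to expose `produce()` and `poll()` the way confluent-kafka's `Producer` does.

```python
def produce_with_retry(producer, topic, key, value, max_attempts=5, poll_timeout=0.1):
    """Send one message, retrying when the producer's local queue is full.

    Hypothetical helper: `producer` is any object with produce(topic=, key=, value=)
    and poll(timeout) methods. Returns the number of attempts that were needed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            producer.produce(topic=topic, key=key, value=value)
            return attempt
        except BufferError:
            if attempt == max_attempts:
                # Give up and let the caller see the original error.
                raise
            # Serving delivery callbacks gives the local queue a chance to drain
            # before the next attempt.
            producer.poll(poll_timeout)
```

Bounding the number of attempts keeps a persistently full queue visible as an error instead of turning it into a silent hang.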
olasd added a comment to T3000: Investigate scheduler journal client discrepancies.

After landing rDSCHecab745a5f20, we got partway there, but we still had only 98m origins in the visit stats table.

Mon, Feb 22, 7:30 PM · Scheduling utilities
olasd added a comment to T3063: Servers using the public swh network gateway can't reach inria's ntp servers.

Looks like all servers (in icinga) have recovered their time sync, except for worker{0,1,2} in staging (on which cron seems to be off?)

Mon, Feb 22, 6:59 PM · System administration
olasd added a comment to T3063: Servers using the public swh network gateway can't reach inria's ntp servers.

(yes, I cheated and Just Pushed the diff to get the changes to show up here...)

Mon, Feb 22, 6:23 PM · System administration
olasd committed rSPSITE6c22456c231b: Add an icinga check for ntp synchronization on all linux hosts (authored by olasd).
Mon, Feb 22, 6:19 PM
olasd committed rSPSITE538e422ff5cf: Add the public debian ntp pool to the sesi_rocquencourt subnet (authored by olasd).
Mon, Feb 22, 6:19 PM
olasd closed D5126: Improve NTP handling:.
Mon, Feb 22, 6:19 PM
olasd committed rSPSITE266a7a5798c2: Drop specific NTP configuration on subnets behind the new firewall (authored by olasd).
Mon, Feb 22, 6:19 PM
olasd requested review of D5126: Improve NTP handling:.
Mon, Feb 22, 6:17 PM
olasd added a revision to T3063: Servers using the public swh network gateway can't reach inria's ntp servers: D5126: Improve NTP handling:.
Mon, Feb 22, 6:17 PM · System administration
olasd added a comment to T3063: Servers using the public swh network gateway can't reach inria's ntp servers.

I've added an NTP dashboard to grafana from the data already collected by the prometheus node exporter.

Mon, Feb 22, 5:53 PM · System administration
olasd added a comment to T3063: Servers using the public swh network gateway can't reach inria's ntp servers.

In general I think we should migrate machines to the default ntp pool of our distributor ([0-3].debian.pool.ntp.org).

Mon, Feb 22, 4:54 PM · System administration

Wed, Feb 17

olasd added a comment to T3052: Reducing Ceph bluestore_min_alloc_size from 64K to 4K.

If the size of the object was known to the reader of the object store it would be a great way to develop storage strategies depending on the object size. So far I assumed the reader does not have that information and is therefore unable to figure out which object storage to use based on that information but maybe I missed something?

Wed, Feb 17, 1:18 PM · Object storage
olasd accepted D5088: core.loader: Merge Loader into BaseLoader.

We need to make sure that other loaders really don't use the *args, **kwargs, but yeah, that looks fine.

Wed, Feb 17, 9:40 AM

Tue, Feb 16

olasd added a comment to T3052: Reducing Ceph bluestore_min_alloc_size from 64K to 4K.

Maybe it would make sense to consider putting the very small objects (e.g. those <= the min alloc size) into a 3 or 4-way mirrored pool instead of an erasure coded pool;

Tue, Feb 16, 10:40 PM · Object storage
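The trade-off raised in the comment above can be put into rough numbers. The sketch below uses a deliberately simplified model in which every shard of an object is rounded up to a whole number of allocation units; real Ceph accounting has more moving parts. The 4+2 erasure-coding profile, the 1 KiB object size and the function name `allocated_bytes` are assumptions chosen for illustration, not values from the task.

```python
import math


def allocated_bytes(object_size, shards, data_shards, min_alloc):
    """Bytes allocated under a simplified model (hypothetical helper).

    The object is split evenly over `data_shards` shards; each of the `shards`
    stored pieces is rounded up to a multiple of `min_alloc`.
    An n-way mirror is modelled as shards=n, data_shards=1.
    An erasure-coded k+m pool is modelled as shards=k+m, data_shards=k.
    """
    per_shard = math.ceil(object_size / data_shards)
    per_shard_alloc = max(1, math.ceil(per_shard / min_alloc)) * min_alloc
    return shards * per_shard_alloc


KIB = 1024
small = 1 * KIB

# 64K allocation unit: the mirror stores the small object in half the space.
ec_64k = allocated_bytes(small, shards=6, data_shards=4, min_alloc=64 * KIB)
mirror_64k = allocated_bytes(small, shards=3, data_shards=1, min_alloc=64 * KIB)

# 4 MiB object: erasure coding is back to its nominal 1.5x overhead.
large = 4 * KIB * KIB
ec_large = allocated_bytes(large, shards=6, data_shards=4, min_alloc=64 * KIB)
mirror_large = allocated_bytes(large, shards=3, data_shards=1, min_alloc=64 * KIB)
```

Under these assumptions, an object no larger than the allocation unit costs one unit per shard, so the pool with fewer shards wins regardless of its nominal overhead.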
olasd added a comment to T3054: Scale out object storage design.

@zack, very good point about having a target for the "time to first byte when reading an object".

Tue, Feb 16, 10:24 PM · Object storage
olasd added a comment to T3054: Scale out object storage design.

Here's the output of the following query, which computes exact aggregates for objects smaller than the size boundaries of the original quartiles:

Tue, Feb 16, 9:46 PM · Object storage
olasd accepted D5071: Unify loader instantiation.

I wonder what would break if the new methods were just put in BaseLoader and the PackageLoader was made to inherit BaseLoader

I don't think anything would break. I'm just not sure the "indirection" would be clear in terms of code readability...

I was wrong. mypy is not happy.

Some signatures would need changing, notably load, prepare, and prepare_origin_visit, to drop the spurious (I think) {*args, **kwargs} we are declaring.
(I don't think they are used any more, aside from the extra logging arguments.)

But still, I'd be comfortable, if we go that way, doing it in another diff (maybe).

Tue, Feb 16, 6:57 PM · Core Loader
olasd added a comment to D5071: Unify loader instantiation.

@ardumont points out that the base PackageLoader doesn't inherit from BaseLoader, which explains the new (common) base class. I think the new class could just as well be next to BaseLoader, and doesn't warrant the introduction of a pattern module.

Tue, Feb 16, 6:07 PM · Core Loader
olasd added a comment to D5071: Unify loader instantiation.

This is great, thanks!

Tue, Feb 16, 5:33 PM · Core Loader
olasd closed D5085: Switch anonymized replayer test to use pytest parametrization.
Tue, Feb 16, 5:17 PM
olasd committed rDSTOcc3eb4b175b4: Switch anonymized replayer test to use pytest parametrization (authored by olasd).
Tue, Feb 16, 5:17 PM
olasd requested review of D5085: Switch anonymized replayer test to use pytest parametrization.
Tue, Feb 16, 5:16 PM

Mon, Feb 15

olasd accepted D4762: Add ORC exporter.
Mon, Feb 15, 5:44 PM

Fri, Feb 12

olasd closed D5068: Make origin_visit_stats_get return results from all pages.
Fri, Feb 12, 3:32 PM
olasd committed rDSCHecab745a5f20: Make origin_visit_stats_get return results from all pages (authored by olasd).
Fri, Feb 12, 3:32 PM
olasd closed D5067: journal client: Filter out status messages without type.
Fri, Feb 12, 3:32 PM
olasd closed D5066: Simplify max_date().
Fri, Feb 12, 3:31 PM
olasd committed rDSCH86ada443d290: journal client: Filter out status messages without type (authored by olasd).
Fri, Feb 12, 3:31 PM
olasd committed rDSCHcdb1775f2474: Simplify max_date() (authored by olasd).
Fri, Feb 12, 3:31 PM
olasd updated subscribers of T3034: generalize usage of SWHID for referencing SWH archive objects.

So, there are still a few separate issues in this task, which I'll try to spell out (at least for my sake):

Fri, Feb 12, 1:40 PM · Data Model

Thu, Feb 11

olasd requested review of D5068: Make origin_visit_stats_get return results from all pages.
Thu, Feb 11, 7:44 PM
olasd requested review of D5067: journal client: Filter out status messages without type.
Thu, Feb 11, 7:41 PM
olasd added a revision to T3000: Investigate scheduler journal client discrepancies: D5068: Make origin_visit_stats_get return results from all pages.
Thu, Feb 11, 7:40 PM · Scheduling utilities
olasd requested review of D5066: Simplify max_date().
Thu, Feb 11, 7:27 PM

Tue, Feb 9

olasd accepted D5045: test_replay: Fix hang since confluent-kafka 1.6 release.

Perfect, thank you!

Tue, Feb 9, 4:01 PM
olasd accepted D5048: pytest_plugin: Prevent possible hang in consumer fixture destruction.
Tue, Feb 9, 3:22 PM

Mon, Feb 8

olasd added a comment to T3000: Investigate scheduler journal client discrepancies.

I'm completely stumped.

Mon, Feb 8, 7:57 PM · Scheduling utilities
olasd requested changes to D5008: identifiers: Properly define the behavior of raw_extrinsic_metadata on negative timestamps..
Mon, Feb 8, 4:06 PM
olasd accepted D5038: model: Remove override of RawExtrinsicMetadata.unique_key(), so it now returns the hash..
Mon, Feb 8, 3:58 PM
olasd added a comment to D5009: storage_tests: recompute ids when evolving RawExtrinsicMetadata objects..

Wouldn't it make sense to add an evolved(**kwargs) method to our BaseModel objects (or an evolve function in swh.model.model) to wrap this operation?

Mon, Feb 8, 2:30 PM
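One possible shape for the `evolved(**kwargs)` method suggested in the comment on D5009 is sketched below. It is an illustration under assumptions, not swh.model code: `Record`, its `payload` field and the SHA1-of-payload identifier are hypothetical, and a standard-library dataclass stands in for the real BaseModel machinery.

```python
import dataclasses
import hashlib


@dataclasses.dataclass(frozen=True)
class Record:
    """Hypothetical model object whose id is derived from its content."""

    payload: bytes
    id: bytes = b""

    def compute_hash(self) -> bytes:
        # Illustrative only: the real identifier computation is more involved.
        return hashlib.sha1(self.payload).digest()

    def __post_init__(self):
        if not self.id:
            # Frozen dataclass: bypass the generated __setattr__ to fill in the id.
            object.__setattr__(self, "id", self.compute_hash())

    def evolved(self, **kwargs):
        """Return a modified copy whose id is recomputed unless one is given."""
        kwargs.setdefault("id", b"")
        return dataclasses.replace(self, **kwargs)


original = Record(payload=b"first")
changed = original.evolved(payload=b"second")
```

Wrapping the copy operation this way means callers cannot forget to refresh the identifier after changing a field, which is the situation the diff's tests had to handle by hand.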
olasd accepted D5008: identifiers: Properly define the behavior of raw_extrinsic_metadata on negative timestamps..

I still find it slightly unfortunate that this logic is separate from the one used for revisions/releases, but I guess it makes sense as the "input types" are so dissimilar.

Mon, Feb 8, 2:25 PM
olasd added a comment to T2345: Improve handling of recurrent loading tasks in scheduler.

Here's my understanding of the status of the migration to the next generation scheduler as of today:

Mon, Feb 8, 12:01 PM · Sprint 2021 01, Archive coverage, Scheduling utilities

Thu, Feb 4

olasd closed D5017: buffer: ensure objects are added in topological order.
Thu, Feb 4, 7:32 PM
olasd committed rDSTOefd8815b8908: buffer: ensure objects are flushed in topological order (authored by olasd).
Thu, Feb 4, 7:32 PM
olasd closed D5018: Return an accurate summary from buffer's flush() method.
Thu, Feb 4, 7:32 PM
olasd closed D5015: buffer: add support for snapshots.
Thu, Feb 4, 7:32 PM
olasd committed rDSTO1526107b12eb: Return an accurate summary from buffer's flush() method (authored by olasd).
Thu, Feb 4, 7:32 PM
olasd committed rDSTO5b3e6c9f70f9: buffer: add support for snapshots (authored by olasd).
Thu, Feb 4, 7:32 PM
olasd closed D5013: buffer: add type annotations for tests.
Thu, Feb 4, 7:32 PM
olasd committed rDSTO18967ed4a5f2: buffer: add type annotations for tests (authored by olasd).
Thu, Feb 4, 7:32 PM
olasd updated the diff for D5017: buffer: ensure objects are added in topological order.

Flip with D5018; Flush in the right order

Thu, Feb 4, 7:19 PM
olasd updated the diff for D5018: Return an accurate summary from buffer's flush() method.

Rebase and factor out the summary update function

Thu, Feb 4, 7:18 PM
olasd added inline comments to D5017: buffer: ensure objects are added in topological order.
Thu, Feb 4, 7:02 PM
olasd added a comment to T3012: Check all objects in the production storage/journal have a correct hash.

This is a duplicate of T75, the history of which would probably be useful to take into account (I suspect it can be closed).

Thu, Feb 4, 6:11 PM · Journal, Storage manager
olasd requested review of D5018: Return an accurate summary from buffer's flush() method.
Thu, Feb 4, 3:31 PM
olasd requested review of D5017: buffer: ensure objects are added in topological order.
Thu, Feb 4, 3:22 PM
olasd requested review of D5015: buffer: add support for snapshots.
Thu, Feb 4, 2:46 PM
olasd requested review of D5013: buffer: add type annotations for tests.
Thu, Feb 4, 2:40 PM

Wed, Feb 3

olasd closed D5003: celery: acknowledge tasks as soon as they're received.
Wed, Feb 3, 10:52 PM
olasd committed rDSCH14feab952380: celery: acknowledge tasks as soon as they're received (authored by olasd).
Wed, Feb 3, 10:52 PM
olasd committed rSPSITEa792c0b15d56: Don't use the systemd backport on saam, prevents proper boot (authored by olasd).
Wed, Feb 3, 10:25 PM
olasd added a comment to D5003: celery: acknowledge tasks as soon as they're received.

So I've hotpatched this in production, and it seems to have cleared the current blockage, which is nice.

Wed, Feb 3, 10:16 PM
olasd updated subscribers of T3025: git loaders are getting oom-killed repeatedly in prod.

After mulling this over with @zack, and looking at the starved worker logs for a while, I suspect that we're also being bitten by our (early, early) choice of using celery acks_late, which only acknowledges tasks when they're done: when a worker is OOM-killed, it will never send task acknowledgements to rabbitmq, which will keep re-sending it the tasks.

Wed, Feb 3, 8:16 PM · Git loader, System administration
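The redelivery loop described in the comment above can be reproduced with a toy model. The sketch below does not use Celery or RabbitMQ: `Broker`, `run` and the `poison` task are hypothetical, and the only behaviour modelled is that a message delivered but never acknowledged is queued again when its worker dies. In Celery itself the corresponding setting is `task_acks_late`.

```python
from collections import deque


class Broker:
    """Toy broker that requeues messages a dead worker never acknowledged."""

    def __init__(self, messages):
        self.queue = deque(messages)
        self.unacked = set()

    def deliver(self):
        msg = self.queue.popleft()
        self.unacked.add(msg)
        return msg

    def ack(self, msg):
        self.unacked.discard(msg)

    def worker_died(self):
        # Put every unacknowledged message back at the front of the queue.
        self.queue.extendleft(sorted(self.unacked, reverse=True))
        self.unacked.clear()


def run(messages, poison, acks_late, max_deliveries=10):
    """Return the sequence of deliveries; `poison` kills the worker (e.g. OOM)."""
    broker = Broker(messages)
    deliveries = []
    while broker.queue and len(deliveries) < max_deliveries:
        msg = broker.deliver()
        deliveries.append(msg)
        if not acks_late:
            broker.ack(msg)  # early ack: acknowledged on receipt
        if msg == poison:
            broker.worker_died()  # the task never completes
            continue
        if acks_late:
            broker.ack(msg)  # late ack: acknowledged only once done
    return deliveries


early = run(["a", "big", "c"], poison="big", acks_late=False)
late = run(["a", "big", "c"], poison="big", acks_late=True)
```

With late acknowledgement the killing task is redelivered until the delivery cap is reached and the task behind it never runs; with early acknowledgement each task is delivered once, at the price of losing a task whose worker dies midway.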
olasd requested review of D5003: celery: acknowledge tasks as soon as they're received.
Wed, Feb 3, 8:14 PM
olasd added a revision to T3025: git loaders are getting oom-killed repeatedly in prod: D5003: celery: acknowledge tasks as soon as they're received.
Wed, Feb 3, 8:11 PM · Git loader, System administration