Sep 24 2021

douardda requested review of D6336: Naive attempt to add support for dsn url config style for production db.
Sep 24 2021, 12:18 PM
douardda requested review of D6335: Wrap long lines in the README file.
Sep 24 2021, 12:18 PM
douardda accepted D6330: Deprecate identifiers.py.

fine for me

Sep 24 2021, 9:28 AM
douardda accepted D6332: Move SWHID-related tests to test_swhids.py.

lgtm

Sep 24 2021, 9:26 AM
douardda accepted D6333: Add module-level docstrings..

thx a lot

Sep 24 2021, 9:25 AM

Sep 23 2021

douardda accepted D6322: Add bazaar as supported revision type.

LGTM

Sep 23 2021, 10:17 AM

Sep 22 2021

douardda created P1173 (An Untitled Masterwork).
Sep 22 2021, 6:15 PM
douardda added a comment to D6316: opam: Share opam root directory even on multiple instances.

You may use fcntl.flock for this

I mean using an empty (lock) file in the opam_root directory.
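
A minimal sketch of that suggestion, assuming a POSIX system and a hypothetical lock file name (`.swh.lock`) and helper name (`opam_root_lock`), neither of which comes from the diff itself. It takes an exclusive advisory lock on an empty file inside the opam_root directory for the duration of a `with` block:

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def opam_root_lock(opam_root, lock_name=".swh.lock"):
    """Hold an exclusive advisory lock on an empty file inside opam_root.

    Both the helper name and the lock file name are illustrative.
    """
    os.makedirs(opam_root, exist_ok=True)
    lock_path = os.path.join(opam_root, lock_name)
    # Append mode creates the file if it is missing and never truncates it.
    with open(lock_path, "a") as lock_file:
        # Blocks until no other open file description holds the lock.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            yield lock_path
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

Any opam invocation that touches the shared root would then be run inside `with opam_root_lock(opam_root): ...`. The lock is advisory only, so it protects against other processes that use the same helper, not against opam commands run by hand.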

Sep 22 2021, 2:14 PM
douardda added a comment to D6316: opam: Share opam root directory even on multiple instances.

You may use fcntl.flock for this

Sep 22 2021, 2:12 PM
douardda added a comment to D6316: opam: Share opam root directory even on multiple instances.

Following what i said in the loader diff, i'm actually closing this.
Ack on the lock folder but i won't attend to it immediately.

[1] D6318

As i got the loader implementation wrong, which @aleo made me realize, i've fixed it.
So that lister diff becomes relevant again, and i've claimed it back.

I think there was already a problem before, but since we have now more chance to hit it, I'd really like the opam_init process to lock the directory when running opam commands.

It's a great idea but i've no idea how to actually do that though.

Maybe adding the --safe flag [1] to the command that actually lists the packages would be enough instead.
I've actually added that for the loader [2] (for the command that also reads information)

[1]

--safe, --readonly
    Make sure nothing will be automatically updated or rewritten. Useful for calling from completion scripts, for example. Will fail whenever such an operation is needed ; also avoids waiting for locks, skips interactive
    questions and overrides the $OPAMDEBUG variable. This is equivalent to set environment variable $OPAMSAFE.

[2] D6318

Sep 22 2021, 2:11 PM
douardda accepted D6308: Add a documentation page to list the services urls.

LGTM (not checked everything is accurate nor there are obvious missing services, but it's a huge improvement as is, thx)

Sep 22 2021, 11:42 AM · System administration
douardda added a comment to T1805: Public API v2.

Items 5, 6, 7 aka pagination, auth and batches - I believe these come naturally with item 4 (specification wise)

They don't. OpenAPI is a specification to describe APIs, and it contains absolutely nothing about pagination or batches.

Sep 22 2021, 11:36 AM · meta-task, Web app
douardda added a comment to D6316: opam: Share opam root directory even on multiple instances.

I think there was already a problem before, but since we have now more chance to hit it, I'd really like the opam_init process to lock the directory when running opam commands.

Sep 22 2021, 11:16 AM
douardda added inline comments to D6133: maven-lister: initialise lister..
Sep 22 2021, 11:08 AM
douardda added a comment to D6133: maven-lister: initialise lister..

It would be nice to have a README file in swh/lister/maven/tests/data explaining what the data files are, where they come from, how they have been generated, etc.

Sep 22 2021, 10:52 AM
douardda added a comment to D6165: Add new RabbitMQ-based client/server API.

What is the reason for this change? Is it more efficient to assign requests to workers based on ID rather than randomly?

Sep 22 2021, 10:48 AM
douardda added inline comments to D6308: Add a documentation page to list the services urls.
Sep 22 2021, 10:43 AM · System administration
douardda accepted D6317: opam: Initialize opam root directory outside the constructor.
Sep 22 2021, 10:38 AM
douardda accepted D6300: Capture missing revision <-> hgnode-id scenario in a xfail test.
Sep 22 2021, 10:37 AM

Sep 21 2021

douardda added a comment to D6133: maven-lister: initialise lister..

some more :-)

Sep 21 2021, 11:38 AM
douardda added inline comments to D6133: maven-lister: initialise lister..
Sep 21 2021, 11:14 AM
douardda added inline comments to D6310: opam: Move the state initialization into the get_pages method.
Sep 21 2021, 11:04 AM

Sep 20 2021

douardda accepted D6306: opam: Allow defining where to actually install the opam_root folder.

LGTM, but how is the new opam_root option expected to be set (in production I mean)?

Sep 20 2021, 4:46 PM
douardda requested changes to D6133: maven-lister: initialise lister..

I'm not done yet, but here is a first review on my side.

Sep 20 2021, 4:33 PM
douardda closed T1510: Have a look at openAPI and decide whether we want to follow these specs, a subtask of T1805: Public API v2, as Resolved.
Sep 20 2021, 11:54 AM · meta-task, Web app
douardda closed T1510: Have a look at openAPI and decide whether we want to follow these specs as Resolved.
Sep 20 2021, 11:54 AM · Web app
douardda closed T2196: Batch APIs as Wontfix.

not useful as a dedicated task, see T1805 for the main discussion on this subject

Sep 20 2021, 11:54 AM · Roadmap 2020
douardda closed T2196: Batch APIs, a subtask of T2194: Archive Integration (Web API), as Wontfix.
Sep 20 2021, 11:54 AM · Roadmap 2021, meta-task
douardda requested changes to D6300: Capture missing revision <-> hgnode-id scenario in a xfail test.

I don't understand what exactly is (not) tested here. What does "anomad-d" stand for BTW?

Sep 20 2021, 9:59 AM
douardda accepted D6220: Added test only method info in the interface doc strings.
Sep 20 2021, 9:49 AM

Sep 16 2021

douardda added inline comments to D6281: converters: Recompute hashes and check they match the originals.
Sep 16 2021, 5:26 PM
douardda committed rDENV57ad032071ff: docker: document some useful kafka management commands in the README file (authored by douardda).
docker: document some useful kafka management commands in the README file
Sep 16 2021, 4:15 PM
douardda closed D6277: Improve docker/README a bit.
Sep 16 2021, 4:15 PM
douardda committed rDENVe24535cc0064: docker: wrap long cli command lines in the README file (authored by douardda).
docker: wrap long cli command lines in the README file
Sep 16 2021, 4:15 PM
douardda updated the diff for D6277: Improve docker/README a bit.

fix indentation (tab->ws) and a few typos

Sep 16 2021, 4:10 PM
douardda requested review of D6277: Improve docker/README a bit.
Sep 16 2021, 11:00 AM

Sep 14 2021

douardda committed rDENVb0f07795ddff: docker: Document how to consume kafka topics from the host (authored by douardda).
docker: Document how to consume kafka topics from the host
Sep 14 2021, 11:40 AM
douardda closed D6248: docker: allow kafka to be consumed from the host.
Sep 14 2021, 11:40 AM
douardda committed rDENVf612427f663d: docker: allow kafka to be consumed from the host (authored by douardda).
docker: allow kafka to be consumed from the host
Sep 14 2021, 11:40 AM
douardda closed D6247: Commit kafka messages which offset has reach the high limit.

closed by 94be817f869409c64415b181824071d2998e33d5

Sep 14 2021, 11:38 AM
douardda closed D6246: Add a JournalClientOffsetRanges.unsubscribe() method.

closed by a3c1f39013bae1a6982140d51d8bb443dc1b5c9c

Sep 14 2021, 11:37 AM
douardda updated the diff for D6248: docker: allow kafka to be consumed from the host.

Keep port 5092 exposed on host

Sep 14 2021, 11:35 AM
douardda committed rDDATASET94be817f8694: Commit kafka messages which offset has reach the high limit (authored by douardda).
Commit kafka messages which offset has reach the high limit
Sep 14 2021, 11:23 AM
douardda committed rDDATASETa3c1f39013ba: Add a JournalClientOffsetRanges.unsubscribe() method (authored by douardda).
Add a JournalClientOffsetRanges.unsubscribe() method
Sep 14 2021, 11:22 AM
douardda added inline comments to D6248: docker: allow kafka to be consumed from the host.
Sep 14 2021, 11:21 AM

Sep 13 2021

douardda committed rDDATASET0425bdea0789: Fix a missing f-string prefix (authored by douardda).
Fix a missing f-string prefix
Sep 13 2021, 5:17 PM
douardda updated the diff for D6248: docker: allow kafka to be consumed from the host.

Add a bit of documentation in the README file on how to consume kafka from the host

Sep 13 2021, 5:13 PM
douardda requested review of D6248: docker: allow kafka to be consumed from the host.
Sep 13 2021, 4:51 PM
douardda abandoned D6234: Add a --reset option to export_graph cli tool.

It's not worth the trouble, and there is a better solution (server-side)

Sep 13 2021, 4:23 PM
douardda added a comment to D6234: Add a --reset option to export_graph cli tool.

You could also add a command in swh-dataset's entrypoint.sh that calls whatever Kafka's script does

Sep 13 2021, 4:20 PM
douardda added a comment to D6234: Add a --reset option to export_graph cli tool.

So either I kill this diff or it stays "intricate" with the setup of the consumer (so the whole journalprocessor.py)

Note: this feature is mainly useful for testing purposes IMHO, so I suppose it's not that critical to keep it, I just find it handy when "playing" with swh dataset export

Meh. How much easier does it make testing, compared to using Kafka's CLI (from the linked comment)?

Sep 13 2021, 4:11 PM
douardda updated the diff for D6234: Add a --reset option to export_graph cli tool.

rebase

Sep 13 2021, 4:05 PM
douardda requested review of D6247: Commit kafka messages which offset has reach the high limit.
Sep 13 2021, 4:04 PM
douardda abandoned D6235: Commit kafka messages wich offset has reach the high limit.

in favor of D6247 because phab/arcanist won't let me update this later any more (sorry)

Sep 13 2021, 4:04 PM
douardda requested review of D6246: Add a JournalClientOffsetRanges.unsubscribe() method.
Sep 13 2021, 4:02 PM
douardda committed rDDATASET358d84938d01: Reduce the size of the progress bar (authored by douardda).
Reduce the size of the progress bar
Sep 13 2021, 3:33 PM
douardda closed D6233: Make sure the progress bar for the export reaches 100%.
Sep 13 2021, 3:33 PM
douardda committed rDDATASET47713ee38c94: Make sure the progress bar for the export reaches 100% (authored by douardda).
Make sure the progress bar for the export reaches 100%
Sep 13 2021, 3:33 PM
douardda committed rDDATASET2760e322af7c: Simplify the lo/high partition offset computation (authored by douardda).
Simplify the lo/high partition offset computation
Sep 13 2021, 3:33 PM
douardda committed rDDATASETd07b2a632256: Explicitly close the temporary kafka consumer in `get_offsets` (authored by douardda).
Explicitly close the temporary kafka consumer in `get_offsets`
Sep 13 2021, 3:33 PM
douardda closed D6232: Simplify the lo/high partition offset computation.
Sep 13 2021, 3:33 PM
douardda committed rDDATASETe47a3db1287b: Use proper signature for JournalClientOffsetRanges.process() (authored by douardda).
Use proper signature for JournalClientOffsetRanges.process()
Sep 13 2021, 3:33 PM
douardda updated the diff for D6233: Make sure the progress bar for the export reaches 100%.

attempt to trick phab/arcanist

Sep 13 2021, 3:31 PM
douardda updated the diff for D6234: Add a --reset option to export_graph cli tool.

rebase

Sep 13 2021, 3:15 PM
douardda updated the diff for D6235: Commit kafka messages wich offset has reach the high limit.

Rebase (remove D6234 from dependencies)

Sep 13 2021, 3:14 PM
douardda added a comment to D6234: Add a --reset option to export_graph cli tool.

Can we keep the reset stuff outside the journalprocessor.py logic? It's already complex enough

I'll give it a try

Sep 13 2021, 2:59 PM

Sep 10 2021

douardda updated the diff for D6235: Commit kafka messages wich offset has reach the high limit.

rebase, fix typos, squash revisions

Sep 10 2021, 5:54 PM
douardda updated the diff for D6234: Add a --reset option to export_graph cli tool.

rebase and fix --reset help message

Sep 10 2021, 5:52 PM
douardda updated the diff for D6233: Make sure the progress bar for the export reaches 100%.

rebase

Sep 10 2021, 5:52 PM
douardda updated the diff for D6232: Simplify the lo/high partition offset computation.

Add an explicit "skipped" message if nothing is to be consumed for a topic

Sep 10 2021, 5:51 PM
douardda added inline comments to D6235: Commit kafka messages wich offset has reach the high limit.
Sep 10 2021, 5:42 PM
douardda added a comment to D6234: Add a --reset option to export_graph cli tool.

Can we keep the reset stuff outside the journalprocessor.py logic? It's already complex enough

Sep 10 2021, 5:37 PM
douardda added a comment to D6235: Commit kafka messages wich offset has reach the high limit.

lags reported by cmak were completely inconsistent

only because you have a small dataset, right?
With a larger one, the last batch of each partition should have a negligible size.

Sep 10 2021, 5:26 PM
douardda added a comment to D6235: Commit kafka messages wich offset has reach the high limit.

There's a bunch of typos in your commit/diff msg: "wich", "oef", "ony", "ALL offsets that needs to be", "stash" -> "squash"


this is necessary to ensure these messages are committed in kafka,
otherwise, since the (considered) empty partition is unsubscribed from,
it never gets committed in JournalClient.handle_messages() (since the
latter only commits assigned partitions).

Why is this a problem?
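
For context, the behaviour the quoted commit message describes can be sketched without any Kafka dependency. This is an illustration under stated assumptions, not the code from the diff: `partitions_to_commit` is a hypothetical helper, `high` is taken to be exclusive (the first offset that must not be consumed), and `positions` holds the next offset to read for each partition.

```python
from typing import Dict, Tuple


def partitions_to_commit(
    offset_ranges: Dict[int, Tuple[int, int]],
    positions: Dict[int, int],
) -> Dict[int, int]:
    """Return {partition: offset} for partitions that reached their high limit.

    Hypothetical helper: a partition with no recorded position is assumed
    to still be at its low offset, so an empty range (low == high) is
    reported as done without consuming anything.
    """
    done = {}
    for partition, (low, high) in offset_ranges.items():
        position = positions.get(partition, low)
        if position >= high:
            # This offset must be committed explicitly before unsubscribing,
            # since a later commit only covers partitions still assigned.
            done[partition] = high
    return done
```

With ranges `{0: (0, 10), 1: (5, 5), 2: (0, 100)}` and positions `{0: 10, 2: 42}`, partitions 0 and 1 are reported as done while partition 2 is still being consumed.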

Sep 10 2021, 4:35 PM

Sep 9 2021

douardda updated the summary of D6235: Commit kafka messages wich offset has reach the high limit.
Sep 9 2021, 6:01 PM
douardda requested review of D6235: Commit kafka messages wich offset has reach the high limit.
Sep 9 2021, 6:01 PM
douardda requested review of D6234: Add a --reset option to export_graph cli tool.
Sep 9 2021, 5:58 PM
douardda updated the diff for D6233: Make sure the progress bar for the export reaches 100%.

add forgotten revision: Reduce the size of the progress bar

Sep 9 2021, 5:56 PM
douardda requested review of D6233: Make sure the progress bar for the export reaches 100%.
Sep 9 2021, 5:56 PM
douardda requested review of D6232: Simplify the lo/high partition offset computation.
Sep 9 2021, 5:54 PM
douardda accepted D6215: docker/conf: Fix search journal client configurations.
Sep 9 2021, 10:11 AM
douardda requested changes to D6220: Added test only method info in the interface doc strings.

Please use imperative style in the git commit message
https://chris.beams.io/posts/git-commit/

Sep 9 2021, 9:03 AM

Sep 3 2021

douardda abandoned D5648: Add a bit of logging in the buffer proxy storage.
Sep 3 2021, 10:50 AM
douardda abandoned D4920: Randomize last_update in generated ListedOrigins in fill_test_data.
Sep 3 2021, 10:49 AM

Sep 1 2021

douardda added a comment to T3542: Decide what metadata we want to / can collect from GitHub.

do we need the "list of forks" if we keep the "fork of what"? I mean these are the 2 ends of the fork relation, right?
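
To make the point concrete, here is a small sketch showing that the "list of forks" can be rebuilt by inverting the "fork of what" relation. The function name and the repository names are made up for illustration, and the result only covers forks whose own metadata was actually collected:

```python
from collections import defaultdict
from typing import Dict, List


def forks_by_parent(fork_of: Dict[str, str]) -> Dict[str, List[str]]:
    """Invert a {repository: parent} mapping into {parent: [forks]}.

    Hypothetical helper: `fork_of` stands for the "fork of what"
    metadata gathered for each known repository.
    """
    forks = defaultdict(list)
    for repo, parent in fork_of.items():
        forks[parent].append(repo)
    # Sort for a deterministic result regardless of input order.
    return {parent: sorted(children) for parent, children in forks.items()}
```

The inversion is lossless only for forks that are themselves known; a fork that was never visited would appear in GitHub's own list of forks but not in the derived one.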

Sep 1 2021, 12:06 PM · Origin-GitHub, Extrinsic metadata

Aug 30 2021

douardda added a comment to T3487: Installation of the new provenance server.

yes the idea is to have a beefy enough machine to perform full-size experiments on, that can then be (part of) the production infrastructure dedicated to the provenance index.

Aug 30 2021, 11:28 AM · System administration

Aug 13 2021

douardda accepted D6087: Remove shell scripts from setup.py.
Aug 13 2021, 4:54 PM
douardda added inline comments to D5818: send-to-celery: Add more options to allow scheduling of edge case origins.
Aug 13 2021, 3:35 PM
douardda added a comment to D6084: Rename PostgreSQL backend and code styling.

For the fix of revision_get, there should be a test.

The test is coming later from @jayeshv mongodb branch.

Aug 13 2021, 12:15 PM
douardda resigned from D6084: Rename PostgreSQL backend and code styling.
Aug 13 2021, 12:05 PM
douardda requested changes to D6084: Rename PostgreSQL backend and code styling.

Please don't mix fixes with codestyling/renaming revisions in a single diff, it makes the review much harder.

Aug 13 2021, 11:35 AM
douardda added a comment to T3444: 26/07/2021: Unstuck infrastructure outage then post-mortem.

And we could also use zfs-backed thin provisioning for the / of workers to save storage space (and possibly help to ensure consistency of deployed workers... not extra convinced of this latter point)

Aug 13 2021, 10:31 AM · System administration
douardda added a comment to T3444: 26/07/2021: Unstuck infrastructure outage then post-mortem.

but that requires some more storage on hypervisors we currently don't have

Don't the hypervisors also serve as OSDs? We could just get a disk per hypervisor (partially?) out of the ceph cluster and use it for the workers' /tmp, or even their whole disk.

Aug 13 2021, 10:25 AM · System administration
douardda accepted D6073: bytes_to_str: Format strings directly, instead of constructing ExtendedSWHID.

but anyway, it looks fine to me

Aug 13 2021, 9:55 AM
douardda added inline comments to D6073: bytes_to_str: Format strings directly, instead of constructing ExtendedSWHID.
Aug 13 2021, 9:54 AM
douardda added a comment to T3444: 26/07/2021: Unstuck infrastructure outage then post-mortem.

one other improvement may be to modify a bit the profile of the workers (to reduce the load on the ceph cluster):

  • lower the replication factor for workers' volumes (or even use local storage, but that requires some more storage on hypervisors we currently don't have),
  • (probably not very relevant but) stop having swap on workers (since this swap ends up being on the ceph volume, so replicated etc.) (oh this has been done already, good)

Aug 13 2021, 9:24 AM · System administration

Aug 12 2021

douardda added inline comments to D6073: bytes_to_str: Format strings directly, instead of constructing ExtendedSWHID.
Aug 12 2021, 1:57 PM
douardda accepted D6071: Revisited history graph implementation.
Aug 12 2021, 10:04 AM
douardda added a comment to D6071: Revisited history graph implementation.
  • the use of the newly introduced as_dict() methods seems unrelated here; unless I'm mistaken, the purpose of this change is better assertion reports by pytest on failure; if so, it should be presented as such in a dedicated revision

This method is only used for test purposes but it doesn't make sense without the refactoring (the complete HistoryGraph class was not even present prior to the refactoring).

Aug 12 2021, 10:04 AM

Aug 11 2021

douardda requested changes to D6071: Revisited history graph implementation.
Aug 11 2021, 12:37 PM