
douardda (David Douard)
User

User Details

User Since
Jul 10 2018, 12:38 PM (132 w, 6 d)

Recent Activity

Today

douardda added a comment to D4909: simulator: add lister simulation.

Yes, but that's not inconsistent as we can discover origins that we didn't know about.

Mon, Jan 25, 12:23 PM
douardda added a comment to D4909: simulator: add lister simulation.

I'm really not sure I understand what the simulated model looks like in the end. Do I get it right that, including this diff:

Mon, Jan 25, 12:04 PM

Fri, Jan 22

douardda updated the diff for D4923: Simulation: allow to export results in a csv file.

rebased

Fri, Jan 22, 4:20 PM
douardda updated the diff for D4921: Make plotting optional in simulator cli command.

s/-H/-P/

Fri, Jan 22, 4:20 PM
douardda added inline comments to D4921: Make plotting optional in simulator cli command.
Fri, Jan 22, 4:19 PM
douardda added a comment to D4927: lister.docs: add a lister template for the new API.

Thanks. I think, however, that given its purpose, this example code should be heavily commented: each constant (e.g. MyPageType) and each method should be commented (not with docstrings, but with comments explaining what the method/variable is used for).

Fri, Jan 22, 4:15 PM · Sprint 2021 01, Lister
douardda accepted D4912: grab_next_visits: don't re-schedule visits too fast.

Not very fond of this "one week => dead" embedded in there, but meh.

Fri, Jan 22, 3:37 PM
douardda accepted D4916: Run simulator tests on all known scheduling policies.
Fri, Jan 22, 3:33 PM
douardda accepted D4915: simulator: record visit metrics alongside scheduler metrics.
Fri, Jan 22, 3:33 PM
douardda accepted D4910: Construct grab_next_visits query arguments incrementally.

ok, but it would have been nice to have an explanation of why this is necessary in the commit message.

Fri, Jan 22, 3:29 PM
douardda accepted D4911: Allow overriding the timestamp of grab_next_visits.
Fri, Jan 22, 3:27 PM
douardda added a comment to D4920: Randomize last_update in generated ListedOrigins in fill_test_data.

why not (cli option), but why (keep it deterministic)?

  1. reproducibility, so we can run the simulator twice with different code, and be sure that differences in behavior are not caused by randomness
Fri, Jan 22, 2:27 PM
douardda closed D4919: Add a --num-origins option to the fill-test-data cli command.
Fri, Jan 22, 2:12 PM
douardda committed rDSCH86b255544c5d: Add a --num-origins option to the fill-test-data cli command (authored by douardda).
Add a --num-origins option to the fill-test-data cli command
Fri, Jan 22, 2:12 PM
douardda updated the diff for D4919: Add a --num-origins option to the fill-test-data cli command.

rebased

Fri, Jan 22, 2:12 PM
douardda closed D4922: Simulation: log at info level recorded metrics.
Fri, Jan 22, 2:10 PM
douardda committed rDSCHabb513ca7d09: Simulation: log at info level recorded metrics (authored by douardda).
Simulation: log at info level recorded metrics
Fri, Jan 22, 2:10 PM
douardda updated the diff for D4922: Simulation: log at info level recorded metrics.

rebased

Fri, Jan 22, 2:10 PM
douardda added a comment to D4920: Randomize last_update in generated ListedOrigins in fill_test_data.

I'd like to keep the simulator deterministic. What about adding a CLI option with a seed?

why not (cli option), but why (keep it deterministic)?

Also, a given seed will not be enough here: there is also the `maxts = int(utcnow().timestamp())` that will break the deterministic property...
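For illustration, a minimal Python sketch of the point above: a seed alone does not make the generated data reproducible, because the upper bound of the random range also has to be pinned. The helper `generate_last_updates` and its `seed`, `reference_time` and `max_age` parameters are hypothetical, not the actual fill_test_data code.

```python
import random
from datetime import datetime, timedelta, timezone
from typing import List, Optional


def generate_last_updates(
    num_origins: int,
    seed: int,
    reference_time: Optional[datetime] = None,
    max_age: timedelta = timedelta(days=365),
) -> List[datetime]:
    """Hypothetical helper: draw a random last_update for each origin."""
    if reference_time is None:
        # Falling back to "now" reintroduces non-determinism:
        # this is exactly the maxts problem mentioned above.
        reference_time = datetime.now(tz=timezone.utc)
    maxts = int(reference_time.timestamp())
    mints = maxts - int(max_age.total_seconds())
    # A private Random instance, so the global random state is untouched
    rng = random.Random(seed)
    return [
        datetime.fromtimestamp(rng.randint(mints, maxts), tz=timezone.utc)
        for _ in range(num_origins)
    ]
```

With both the seed and the reference time fixed (for instance through CLI options), two simulator runs would see identical origins, so differences in behavior could be attributed to the code under test.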

Fri, Jan 22, 12:28 PM
douardda added a comment to D4920: Randomize last_update in generated ListedOrigins in fill_test_data.

I'd like to keep the simulator deterministic. What about adding a CLI option with a seed?

why not (cli option), but why (keep it deterministic)?

Fri, Jan 22, 12:27 PM
douardda added a comment to D4920: Randomize last_update in generated ListedOrigins in fill_test_data.

I'd like to keep the simulator deterministic. What about adding a CLI option with a seed?

Fri, Jan 22, 12:25 PM
douardda updated the diff for D4923: Simulation: allow to export results in a csv file.

rebased

Fri, Jan 22, 12:23 PM
douardda updated the summary of D4923: Simulation: allow to export results in a csv file.
Fri, Jan 22, 12:23 PM
douardda updated the summary of D4923: Simulation: allow to export results in a csv file.
Fri, Jan 22, 12:22 PM
douardda retitled D4921: Make plotting optional in simulator cli command from Make plotting histograms optional in simulator cli command to Make plotting optional in simulator cli command.
Fri, Jan 22, 12:21 PM
douardda updated the diff for D4921: Make plotting optional in simulator cli command.

rebase on D4916

Fri, Jan 22, 12:21 PM
douardda requested review of D4920: Randomize last_update in generated ListedOrigins in fill_test_data.
Fri, Jan 22, 11:40 AM
douardda updated the diff for D4923: Simulation: allow to export results in a csv file.

rebased

Fri, Jan 22, 11:36 AM
douardda updated the diff for D4921: Make plotting optional in simulator cli command.

kill unneeded dependency on D4920

Fri, Jan 22, 11:27 AM
douardda updated the summary of D4921: Make plotting optional in simulator cli command.
Fri, Jan 22, 11:26 AM
douardda updated the diff for D4922: Simulation: log at info level recorded metrics.

with the commit...

Fri, Jan 22, 11:22 AM
douardda updated the diff for D4922: Simulation: log at info level recorded metrics.

typo + vlorentz's comment

Fri, Jan 22, 11:21 AM
douardda updated the summary of D4922: Simulation: log at info level recorded metrics.
Fri, Jan 22, 11:19 AM
douardda accepted D4877: npm: Reimplement lister using new Lister API.
Fri, Jan 22, 11:08 AM
douardda added inline comments to D4909: simulator: add lister simulation.
Fri, Jan 22, 11:06 AM
douardda added inline comments to D4909: simulator: add lister simulation.
Fri, Jan 22, 11:05 AM
douardda requested review of D4923: Simulation: allow to export results in a csv file.
Fri, Jan 22, 11:01 AM
douardda requested review of D4922: Simulation: log at info level recorded metrics.
Fri, Jan 22, 10:59 AM
douardda accepted D4899: Add scheduling policy for already visited origins with known last update.

lgtm

Fri, Jan 22, 10:58 AM
douardda requested review of D4921: Make plotting optional in simulator cli command.
Fri, Jan 22, 10:57 AM
douardda requested review of D4919: Add a --num-origins option to the fill-test-data cli command.
Fri, Jan 22, 10:52 AM

Thu, Jan 21

douardda added inline comments to D4895: Add a successive_visits counter to OriginVisitStats.
Thu, Jan 21, 9:57 AM
douardda closed D4894: Simplify journal client tests.
Thu, Jan 21, 9:55 AM
douardda committed rDSCHffe2aed2fa32: Simplify journal client tests (authored by douardda).
Simplify journal client tests
Thu, Jan 21, 9:55 AM

Wed, Jan 20

douardda updated the diff for D4895: Add a successive_visits counter to OriginVisitStats.

rebased

Wed, Jan 20, 6:04 PM
douardda updated the diff for D4894: Simplify journal client tests.

rebased

Wed, Jan 20, 6:03 PM
douardda added a reverting change for rDSCHb03d978241a6: Make sure swh.scheduler.cli.journal is loaded in test_cli_journal.py: rDSCHc7b740cafa64: Revert "Make sure swh.scheduler.cli.journal is loaded in test_cli_journal.py".
Wed, Jan 20, 6:02 PM
douardda committed rDSCHc7b740cafa64: Revert "Make sure swh.scheduler.cli.journal is loaded in test_cli_journal.py" (authored by douardda).
Revert "Make sure swh.scheduler.cli.journal is loaded in test_cli_journal.py"
Wed, Jan 20, 6:02 PM
douardda closed D4893: Make the max_date() helper function accept *dates as argument.
Wed, Jan 20, 5:29 PM
douardda committed rDSCHc386fdf3b9fc: Make the max_date() helper function accept *dates as argument (authored by douardda).
Make the max_date() helper function accept *dates as argument
Wed, Jan 20, 5:29 PM
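The commit title above only gives the new signature. As a hedged sketch, a variadic max_date() might look like the following; ignoring None values and raising ValueError when no valid date is given are assumptions, not confirmed by this log.

```python
from datetime import datetime, timezone
from typing import Optional


def max_date(*dates: Optional[datetime]) -> datetime:
    """Return the latest of the given dates, ignoring None values.

    Assumed behavior (sketch): at least one date must not be None.
    """
    valid = [d for d in dates if d is not None]
    if not valid:
        raise ValueError("at least one date must not be None")
    return max(valid)


d1 = datetime(2021, 1, 19, tzinfo=timezone.utc)
d2 = datetime(2021, 1, 20, tzinfo=timezone.utc)
print(max_date(d1, None, d2))  # 2021-01-20 00:00:00+00:00
```

Accepting `*dates` lets callers pass any number of candidates directly, e.g. `max_date(a, b, c)`, instead of first building a list.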
douardda added inline comments to D4877: npm: Reimplement lister using new Lister API.
Wed, Jan 20, 3:37 PM
douardda requested review of D4895: Add a successive_visits counter to OriginVisitStats.
Wed, Jan 20, 12:49 PM
douardda added a comment to D4891: model: Allow new status values not_found and failed to OriginVisitStatus.

This makes me wonder if we shouldn't add an explicit failed status too, while we're at it, for explicit failures that couldn't generate a partial snapshot.

I'd be fine with that indeed.

I think we entertained the idea with @douardda and @vsellier

Wed, Jan 20, 12:48 PM
douardda requested review of D4894: Simplify journal client tests.
Wed, Jan 20, 12:46 PM
douardda accepted D4889: Add a cli for the scheduler metrics update endpoint.

lgtm

Wed, Jan 20, 12:45 PM
douardda requested review of D4893: Make the max_date() helper function accept *dates as argument.
Wed, Jan 20, 12:44 PM
douardda committed rDSCHb03d978241a6: Make sure swh.scheduler.cli.journal is loaded in test_cli_journal.py (authored by douardda).
Make sure swh.scheduler.cli.journal is loaded in test_cli_journal.py
Wed, Jan 20, 12:20 PM
douardda accepted D4880: Implement some basic aggregated metrics on listed origins.

Looks OK to me. However, I'd like to have a description of the implemented metrics in the commit message (and in the documentation, but this may come later).

Wed, Jan 20, 10:31 AM
douardda closed D4881: Move the `last_scheduled` ts from ListedOrigin to OriginVisitStatus.
Wed, Jan 20, 10:05 AM
douardda committed rDSCHf8627a96fed6: Move the `last_scheduled` ts from ListedOrigin to OriginVisitStatus (authored by douardda).
Move the `last_scheduled` ts from ListedOrigin to OriginVisitStatus
Wed, Jan 20, 10:05 AM

Tue, Jan 19

douardda updated the diff for D4881: Move the `last_scheduled` ts from ListedOrigin to OriginVisitStatus.

rebased

Tue, Jan 19, 5:49 PM
douardda closed D4885: Make the journal-client cli subcommand automagically loaded.
Tue, Jan 19, 5:48 PM
douardda committed rDSCH0a32a31195f1: Make the journal-client cli subcommand automagically loaded (authored by douardda).
Make the journal-client cli subcommand automagically loaded
Tue, Jan 19, 5:48 PM
douardda requested review of D4885: Make the journal-client cli subcommand automagically loaded.
Tue, Jan 19, 4:21 PM
douardda updated the diff for D4881: Move the `last_scheduled` ts from ListedOrigin to OriginVisitStatus.

Use olasd's version of the SQL query

Tue, Jan 19, 4:14 PM
douardda added inline comments to D4881: Move the `last_scheduled` ts from ListedOrigin to OriginVisitStatus.
Tue, Jan 19, 3:50 PM
douardda added inline comments to D4881: Move the `last_scheduled` ts from ListedOrigin to OriginVisitStatus.
Tue, Jan 19, 3:46 PM
douardda added inline comments to D4856: Introduce scaffolding for a scheduler simulator.
Tue, Jan 19, 3:43 PM
douardda accepted D4856: Introduce scaffolding for a scheduler simulator.

Overall looks good to me, but it could benefit from more comments and explanations. It's not easy to get into as is.

Tue, Jan 19, 3:32 PM
douardda added inline comments to D4856: Introduce scaffolding for a scheduler simulator.
Tue, Jan 19, 3:28 PM
douardda closed D4739: Add a 'Data model and specifications' section in the main index.
Tue, Jan 19, 3:21 PM
douardda committed rDDOC26329c513452: Add a 'Data model and specifications' section in the main index (authored by douardda).
Add a 'Data model and specifications' section in the main index
Tue, Jan 19, 3:21 PM
douardda updated the diff for D4739: Add a 'Data model and specifications' section in the main index.

rebase

Tue, Jan 19, 3:20 PM
douardda added inline comments to D4856: Introduce scaffolding for a scheduler simulator.
Tue, Jan 19, 2:58 PM
douardda added inline comments to D4856: Introduce scaffolding for a scheduler simulator.
Tue, Jan 19, 2:56 PM
douardda updated the diff for D4881: Move the `last_scheduled` ts from ListedOrigin to OriginVisitStatus.

forgot the migration script

Tue, Jan 19, 2:51 PM
douardda requested review of D4881: Move the `last_scheduled` ts from ListedOrigin to OriginVisitStatus.
Tue, Jan 19, 2:51 PM
douardda added a revision to T2444: Implement the scheduling policy for the recurrent visit scheduler: D4881: Move the `last_scheduled` ts from ListedOrigin to OriginVisitStatus.
Tue, Jan 19, 2:49 PM · Sprint 2021 01, Scheduling utilities
douardda accepted D4879: requirements: Make swh.journal an optional dependency.

lgtm

Tue, Jan 19, 1:49 PM

Mon, Jan 18

douardda added a comment to T2974: Define (and implement) scheduler performance metrics.

Thanks, looks like a good starting point.

Mon, Jan 18, 4:36 PM · Sprint 2021 01, Archive coverage, Scheduling utilities

Fri, Jan 15

douardda created P922 (An Untitled Masterwork).
Fri, Jan 15, 11:44 AM

Wed, Jan 13

douardda triaged T2970: Make swh-journal tests not depend on swh-model any more as Normal priority.
Wed, Jan 13, 4:48 PM · Journal
douardda created T2970: Make swh-journal tests not depend on swh-model any more.
Wed, Jan 13, 4:48 PM · Journal
douardda claimed T2964: Adapt origin_visit_status_(get|add) api to deal with the visit_type.
Wed, Jan 13, 11:39 AM · Storage manager, Sprint 2021 01
douardda moved T2964: Adapt origin_visit_status_(get|add) api to deal with the visit_type from todo to in-progress on the Sprint 2021 01 board.
Wed, Jan 13, 11:39 AM · Storage manager, Sprint 2021 01
douardda changed the status of T2964: Adapt origin_visit_status_(get|add) api to deal with the visit_type, a subtask of T2443: Implement a bulk-queryable cache of latest visits for use by the recurrent visit scheduler, from Open to Work in Progress.
Wed, Jan 13, 11:39 AM · Sprint 2021 01, Scheduling utilities
douardda changed the status of T2964: Adapt origin_visit_status_(get|add) api to deal with the visit_type from Open to Work in Progress.
Wed, Jan 13, 11:39 AM · Storage manager, Sprint 2021 01
douardda closed D4826: Allow to use the JAVA_HOME environment for cassandra tests.
Wed, Jan 13, 11:05 AM
douardda committed rDSTO728c3eeaae84: Allow to use the JAVA_HOME environment for cassandra tests (authored by douardda).
Allow to use the JAVA_HOME environment for cassandra tests
Wed, Jan 13, 11:05 AM
douardda triaged T2969: Make tests compatible with hypothesis 6 as Normal priority.
Wed, Jan 13, 11:04 AM
douardda updated the diff for D4698: [WIP][RFC] Add support for ExtID in the storage.

rebase

Wed, Jan 13, 11:01 AM
douardda updated the diff for D4826: Allow to use the JAVA_HOME environment for cassandra tests.

rebase

Wed, Jan 13, 10:57 AM
douardda closed D4850: Enforce hypothesis <6 to prevent test breakage.
Wed, Jan 13, 10:55 AM
douardda committed rDSTO30945a5890d4: Enforce hypothesis <6 to prevent test breakage (authored by douardda).
Enforce hypothesis <6 to prevent test breakage
Wed, Jan 13, 10:55 AM
douardda requested review of D4850: Enforce hypothesis <6 to prevent test breakage.
Wed, Jan 13, 10:52 AM
douardda updated the diff for D4826: Allow to use the JAVA_HOME environment for cassandra tests.

Add a paragraph in the README file

Wed, Jan 13, 9:45 AM
douardda closed D4825: Make the CREATE_TABLES_QUERIES in cassandra/schema.py an explicit list.
Wed, Jan 13, 9:44 AM
douardda committed rDSTO74e6f58e2f9b: Make the CREATE_TABLES_QUERIES in cassandra/schema.py an explicit list (authored by douardda).
Make the CREATE_TABLES_QUERIES in cassandra/schema.py an explicit list
Wed, Jan 13, 9:44 AM

Tue, Jan 12

douardda added a project to T2961: Add a "not_found" value in OriginVisitStatus.status: Sprint 2021 01.
Tue, Jan 12, 11:24 AM · Sprint 2021 01, Data Model
douardda triaged T2961: Add a "not_found" value in OriginVisitStatus.status as Normal priority.
Tue, Jan 12, 11:24 AM · Sprint 2021 01, Data Model