Page MenuHomeSoftware Heritage

Update the the listing and loading scheduling architecture doc
ClosedPublic

Authored by douardda on Apr 25 2022, 4:01 PM.

Details

Summary

There are titles on most of the svg objects, unfortunately these are not visible in sphinx...

Diff Detail

Repository
rDDOC Development documentation
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build has FAILED

Patch application report for D7647 (id=27670)

Rebasing onto b90babd909...

First, rewinding head to replay your work on top of it...
Applying: Update the the listing and loading scheduling architecture doc
Changes applied before test
commit 86bd52b70a490bfe867f352170a4a2b57982b072
Author: David Douard <david.douard@sdfa3.org>
Date:   Mon Apr 25 15:57:34 2022 +0200

    Update the the listing and loading scheduling architecture doc

Link to build: https://jenkins.softwareheritage.org/job/DDOC/job/build-on-diff/158/
See console output for more information: https://jenkins.softwareheritage.org/job/DDOC/job/build-on-diff/158/console

Harbormaster returned this revision to the author for changes because remote builds failed.Apr 25 2022, 4:10 PM
Harbormaster failed remote builds in B28727: Diff 27670!

Build has FAILED

Patch application report for D7647 (id=27679)

Rebasing onto b90babd909...

Current branch diff-target is up to date.
Changes applied before test
commit bc13d85a4bac5c02987a53244fc4821b95d7d5fa
Author: David Douard <david.douard@sdfa3.org>
Date:   Mon Apr 25 15:57:34 2022 +0200

    Update the the listing and loading scheduling architecture doc

Link to build: https://jenkins.softwareheritage.org/job/DDOC/job/build-on-diff/159/
See console output for more information: https://jenkins.softwareheritage.org/job/DDOC/job/build-on-diff/159/console

Harbormaster returned this revision to the author for changes because remote builds failed.Apr 25 2022, 5:53 PM
Harbormaster failed remote builds in B28738: Diff 27679!

Build has FAILED

Patch application report for D7647 (id=27679)

Rebasing onto b90babd909...

Current branch diff-target is up to date.
Changes applied before test
commit bc13d85a4bac5c02987a53244fc4821b95d7d5fa
Author: David Douard <david.douard@sdfa3.org>
Date:   Mon Apr 25 15:57:34 2022 +0200

    Update the the listing and loading scheduling architecture doc

Link to build: https://jenkins.softwareheritage.org/job/DDOC/job/build-on-diff/160/
See console output for more information: https://jenkins.softwareheritage.org/job/DDOC/job/build-on-diff/160/console

Build is green

Patch application report for D7647 (id=27712)

Rebasing onto b90babd909...

Current branch diff-target is up to date.
Changes applied before test
commit bcca817ce5b8b6de45756254597942f614b2aa05
Author: David Douard <david.douard@sdfa3.org>
Date:   Mon Apr 25 15:57:34 2022 +0200

    Update the the listing and loading scheduling architecture doc

See https://jenkins.softwareheritage.org/job/DDOC/job/build-on-diff/161/ for more details.

could you show the two images in the diff's description?

docs/architecture/overview.rst
69–78

You are using the syntax for the definition list, but in these two items you start a sentence in the title and finish it in the definition.

You should probably add a line break before "tasks" in both titles.

I have a few nits, but overall the text looks good to me, thank you!

Would you mind adding a rendered version of the plantuml diagram to the diff description?

docs/architecture/overview.rst
51–52

I think that sentence is backwards: the scheduler only pulls SWH visit information from the journal, update information is pulled by the lister.

116
145

Fichier archi scheduler:

UML 1: listing

UML 2: loading

General rendering:


Thanks. On the diagram, it looks like 9' is causing 12 (via 10 and 11), which is not true (yet! D7332 isn't agreed upon) and could be very confusing.

This is because 9' writes to the origin_visit table, but 10/11/12 are based on the origin table/topic.

Could you split swh-storage in two, to avoid this misunderstanding?

The rest LGTM

douardda marked 2 inline comments as done.

typos and better rst syntax as suggested by vlorentz

Note: the rephrasing needed in the Journal section is stil to be done.

Thanks. On the diagram, it looks like 9' is causing 12 (via 10 and 11), which is not true (yet! D7332 isn't agreed upon) and could be very confusing.

This is because 9' writes to the origin_visit table, but 10/11/12 are based on the origin table/topic.

Could you split swh-storage in two, to avoid this misunderstanding?

To clear the misunderstanding maybe the label for 10 could be Consume final "origin visit status" entries?

You could split the "visit scheduler" part of the scheduler api into two lines (listed origins + visit stats) but that's going into a lot of detail.

Maybe the celery box could use another column for the save code now tasks (and their dedicated celery workers) that still go through the task-based scheduler.

Build is green

Patch application report for D7647 (id=27776)

Rebasing onto b90babd909...

Current branch diff-target is up to date.
Changes applied before test
commit d5138338570d7bf4bd8ed92d80fa2384a4cd6744
Author: David Douard <david.douard@sdfa3.org>
Date:   Mon Apr 25 15:57:34 2022 +0200

    Update the the listing and loading scheduling architecture doc

See https://jenkins.softwareheritage.org/job/DDOC/job/build-on-diff/162/ for more details.

In D7647#200113, @olasd wrote:

Thanks. On the diagram, it looks like 9' is causing 12 (via 10 and 11), which is not true (yet! D7332 isn't agreed upon) and could be very confusing.

This is because 9' writes to the origin_visit table, but 10/11/12 are based on the origin table/topic.

Could you split swh-storage in two, to avoid this misunderstanding?

To clear the misunderstanding maybe the label for 10 could be Consume final "origin visit status" entries?

IMHO it's a detail; generally speaking, we do update visits info/stats from this origin-visit-status topic; the fact we are only using some of these messages is a minor detail.

You could split the "visit scheduler" part of the scheduler api into two lines (listed origins + visit stats) but that's going into a lot of detail.

Maybe the celery box could use another column for the save code now tasks (and their dedicated celery workers) that still go through the task-based scheduler.

I find this diagram already too complex. I'd prefer not to add more details.

I agree it could be useful to add a section for "save code now" (and save forge now)

docs/architecture/overview.rst
69–78

I did that but I wasn't convinced by the result. but I'm not convinced by this version either...

Current:

Proposed:

Both are fine for me, so I'll go for 2.

rephrase (journal usage by the scheduler)

Build has FAILED

Patch application report for D7647 (id=27786)

Rebasing onto b90babd909...

Current branch diff-target is up to date.
Changes applied before test
commit b6c59c46014215196e79d9b4377b9061e323dcfd
Author: David Douard <david.douard@sdfa3.org>
Date:   Mon Apr 25 15:57:34 2022 +0200

    Update the the listing and loading scheduling architecture doc

Link to build: https://jenkins.softwareheritage.org/job/DDOC/job/build-on-diff/164/
See console output for more information: https://jenkins.softwareheritage.org/job/DDOC/job/build-on-diff/164/console

Build has FAILED

Patch application report for D7647 (id=27786)

Rebasing onto b90babd909...

Current branch diff-target is up to date.
Changes applied before test
commit b6c59c46014215196e79d9b4377b9061e323dcfd
Author: David Douard <david.douard@sdfa3.org>
Date:   Mon Apr 25 15:57:34 2022 +0200

    Update the the listing and loading scheduling architecture doc

Link to build: https://jenkins.softwareheritage.org/job/DDOC/job/build-on-diff/165/
See console output for more information: https://jenkins.softwareheritage.org/job/DDOC/job/build-on-diff/165/console

I think the separate mechanic for save code now is much more than a detail; it's the main reason we've kept the old task scheduler.

But this diff is definitely a big step up from the status quo, and good enough to be published as is; we can improve on it later!

This revision is now accepted and ready to land.Apr 26 2022, 6:12 PM
This revision was landed with ongoing or failed builds.Apr 26 2022, 6:14 PM
This revision was automatically updated to reflect the committed changes.