
lister.core: Remove origin creation from lister behavior
ClosedPublic

Authored by ardumont on Jun 13 2019, 2:26 PM.

Details

Summary

Prior to this commit, the lister also created origins in the archive
(swh-storage's db). Now, the lister only schedules those origins for ingestion
and lets the loaders do the actual creation.
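
To make the behavior change concrete, here is a minimal sketch using in-memory stand-ins. `FakeStorage`, `FakeScheduler`, `add_origin`, `task_for`, `list_old` and `list_new` are hypothetical names for illustration, and the task dictionary layout is an assumption; only `create_tasks` and the `loader-gnu` task type are named in this revision.

```python
from dataclasses import dataclass, field


@dataclass
class FakeStorage:
    """Hypothetical stand-in for the archive (swh-storage's db)."""
    origins: list = field(default_factory=list)

    def add_origin(self, origin):  # hypothetical method name
        self.origins.append(origin)


@dataclass
class FakeScheduler:
    """Hypothetical stand-in for the scheduler."""
    tasks: list = field(default_factory=list)

    def create_tasks(self, tasks):
        self.tasks.extend(tasks)
        return tasks


def task_for(origin_url, task_type="loader-gnu"):
    # Assumed task layout, for illustration only.
    return {"type": task_type, "arguments": {"args": [origin_url], "kwargs": {}}}


def list_old(urls, storage, scheduler):
    """Before this diff: create origins AND schedule loading tasks."""
    for url in urls:
        storage.add_origin({"url": url})
    scheduler.create_tasks([task_for(url) for url in urls])


def list_new(urls, storage, scheduler):
    """After this diff: only schedule; loaders create the origins later."""
    scheduler.create_tasks([task_for(url) for url in urls])
```

With `list_new`, the storage stand-in stays empty while the scheduler still receives one task per listed origin.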

Test Plan
tox

Then swh-docker-dev through multiple runs.

First run:

  • checkout origin/master (without this diff)
  • Add a new dummy/witness task type loader-gnu (needed for the listing to actually succeed; an implementation detail of the current lister/loader status)
  • Run gnu lister through scheduler
  • Check swh-scheduler db (new tasks added)
  • Check swh-storage db (new origins)
  • Check swh-lister db (filled-in cache)

Second run:

  • checkout this diff
  • Add a new dummy/witness task type loader-gnu
  • Run gnu lister through scheduler
  • Check swh-scheduler db (new tasks added)
  • Check swh-storage db (no new origins as this behavior got removed here so \m/)
  • Check swh-lister db (still filled-in cache)
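
The three database checks in each run can be sketched as a comparison of row counts taken before and after the lister runs. This is only an illustration: `check_run`, the dictionary keys and the messages are hypothetical, and how the counts are obtained from each database is left out.

```python
def check_run(before, after, expect_new_origins):
    """Compare row-count snapshots taken before and after a lister run.

    `before` and `after` are dicts with the (hypothetical) keys
    'tasks' (swh-scheduler db), 'origins' (swh-storage db) and
    'listed' (swh-lister db cache). Returns a list of problems found.
    """
    problems = []
    if after["tasks"] <= before["tasks"]:
        problems.append("no new tasks in swh-scheduler db")
    got_new_origins = after["origins"] > before["origins"]
    if got_new_origins != expect_new_origins:
        problems.append("unexpected origin count in swh-storage db")
    if after["listed"] <= 0:
        problems.append("swh-lister db cache is empty")
    return problems


# First run (origin/master): new origins are expected.
first = check_run(
    {"tasks": 0, "origins": 0, "listed": 0},
    {"tasks": 5, "origins": 5, "listed": 5},
    expect_new_origins=True,
)

# Second run (this diff): new tasks, but no new origins.
second = check_run(
    {"tasks": 5, "origins": 5, "listed": 5},
    {"tasks": 10, "origins": 5, "listed": 5},
    expect_new_origins=False,
)
```

An empty list means the run matched the expectations of the test plan.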

Diff Detail

Repository
rDLS Listers
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

ardumont retitled this revision from lister.core: Schedule only missing tasks to lister.core: Remove origin creation from lister behavior. Jun 13 2019, 2:38 PM
douardda added inline comments.
swh/lister/core/lister_base.py
460

why delete this line from the docstring? Looks like it's still an expected argument to the method.

This revision now requires changes to proceed. Jun 13 2019, 2:41 PM
swh/lister/core/lister_base.py
460

It's a misstep.
I actually added it back.

I'm also trying to clarify some docstrings.

swh/lister/core/lister_base.py
387

I think there is no need for this method anymore.

swh/lister/core/lister_base.py
387

indeed.

New:

  • lister_base: Remove no longer used method
  • lister_base: Clarify docstrings

Rework commit message

  • lister.core: Stop creating origins when scheduling tasks
swh/lister/core/lister_base.py
473

As @olasd pointed out, this still feels convoluted.

It seems that the current implementation of this method tries to avoid scheduling duplicate tasks.
This is, or at least should be, already taken care of either at listing time or at scheduling time (through the create_tasks scheduler API call).

This needs some refactoring, which I don't want to drag into this diff right now though.
The current goal is to deploy this soon and to trigger the listing of gnu and some phabricator instances ;)
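
For illustration, deduplication at scheduling time could look like the sketch below. This is an assumption about one possible approach, not the actual scheduler code: `DedupScheduler`, its `_key` helper and the task layout are hypothetical, and only the `create_tasks` name comes from the comment above.

```python
import json


class DedupScheduler:
    """Hypothetical scheduler stand-in that ignores already-known tasks."""

    def __init__(self):
        self._tasks = {}

    @staticmethod
    def _key(task):
        # Two tasks are considered duplicates when they share the same
        # type and the same arguments (assumed task layout).
        return (task["type"], json.dumps(task["arguments"], sort_keys=True))

    def create_tasks(self, tasks):
        """Store only unseen tasks and return the ones actually created."""
        created = []
        for task in tasks:
            key = self._key(task)
            if key not in self._tasks:
                self._tasks[key] = task
                created.append(task)
        return created
```

With such a guarantee on the scheduler side, the lister would not need its own duplicate check before calling `create_tasks`.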

This revision is now accepted and ready to land. Jun 13 2019, 3:39 PM

Plug diff to master branch

Squash commits that logically belong together

swh/lister/core/lister_base.py
473

The current goal is to deploy this soon and to trigger the listing of gnu and some phabricator instances ;)

as in T1797

This revision was automatically updated to reflect the committed changes.