Page MenuHomeSoftware Heritage

listers: Open cli to allow a first listing pass on new forge instance
ClosedPublic

Authored by ardumont on Wed, Aug 28, 12:06 PM.

Details

Summary

The output of that listing results in scheduled tasks with policy 'oneshot' and some priority (well depending on the input cli call).
Thus ingesting those faster and without manual intervention as we currently do [1]

Example use case (for now is for small forges):

$ export SWH_CONFIG_FILENAME=/etc/softwareheritage/lister.yml
$ swh lister run --lister gitlab \
                 --priority high \
                 --db-url postgresql://postgres@localhost:5432/swh-listers \
                 api_baseurl=https://gitlab.ow2.org/api/v4/

Related T1919

[1] along the lines of psql update "the listed tasks with priority to high", then wait for it to finish somehow, and then at some point, remove that priority with psql update yet again... tedious!

Also, to explicit this.
This does not replace a usual deployment which triggers recurring tasks with no priority.
This is to ease a first pass on new (small?) forges without having to plunge within the scheduler and drag a whole new task
(we discussed about some refactoring, with schema impacts, on the scheduler but that's for another time).

Test Plan
doco up -d swh-lister

Then trigger the sample cli use case and check the scheduler/lister db:

 policy  | priority |                                              arguments                                          $
---------+----------+-------------------------------------------------------------------------------------------------$
 oneshot | high     | {"args": ["https://gitlab.ow2.org/sat4j/sat4j.git"], "kwargs": {}}
 oneshot | high     | {"args": ["https://gitlab.ow2.org/stamp/unit-test-ampli.git"], "kwargs": {}}
 oneshot | high     | {"args": ["https://gitlab.ow2.org/stamp/conf-test-ampli.git"], "kwargs": {}}
 ...

A usual deployment would have seen the policy set to recurring and the priority
to null.

Note:

docker-compose.override.yml:

version: '2'

services:
  swh-listers-db:
    ports:
      - "5432:5432"

  swh-scheduler-db:
    ports:
      - "5433:5432"

  swh-lister:
    volumes:
      - "$SWH_ENVIRONMENT_HOME/swh-lister:/src/swh-lister"

Diff Detail

Repository
rDLS Listers
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

ardumont created this revision.Wed, Aug 28, 12:06 PM
ardumont edited the summary of this revision. (Show Details)Wed, Aug 28, 12:16 PM
ardumont added a project: Lister.
ardumont retitled this revision from listers: Allow to override policy and priority for scheduled tasks to listers: Open cli to allow a first listing pass on new forge instance.Wed, Aug 28, 1:48 PM
ardumont edited the summary of this revision. (Show Details)

Looking into how to add tests to the cli.
That should not block a first review pass though ;)

Looking into how to add tests to the cli.

Take a look at the index or scheduler cli

That should not block a first review pass though ;)

Doesn't it?

ardumont updated this revision to Diff 6448.Wed, Aug 28, 2:56 PM
  • cli: Bootstrap tests on cli

At least on new_lister method which instantiates listers.

douardda added inline comments.Wed, Aug 28, 2:57 PM
swh/lister/cli.py
31 ↗(On Diff #6439)

naming this function 'new_lister' is a bit weird, since this actually is a kind of factory. In other modules, we use "get_xxx" (get_storage, get_objstorage. etc.) for "similar" functions.

177 ↗(On Diff #6439)

For the sake of consistency, use the same 'postgres://lister' value as the 'list' command here.

douardda added inline comments.Wed, Aug 28, 2:59 PM
swh/lister/cli.py
205 ↗(On Diff #6448)

This description is a bit confusing (IMHO).

First it should make it clear that this command does NOT schedule a lister task, but instead it runs the lister directly.

Then, it should be made crystal clear that the priority and policy concern the loader tasks this lister schedules only.

Thanks for your feedback all ;)

swh/lister/cli.py
177 ↗(On Diff #6439)

Right!

205 ↗(On Diff #6448)

Yes. I'll improve on it.

Also, i'm wondering whether to open the policy at all.
As a user, i clearly want to use it solely with oneshot for now.

I just want to bypass the manual interventions i need to do all the time when triggering a new forge instance (cgit, gitlab, phabricator so far).

What do you think?

douardda added inline comments.Wed, Aug 28, 3:06 PM
swh/lister/cli.py
204 ↗(On Diff #6448)

Also: I'm not very fond of the subcommand name "list" (as in "swh lister list") ... replace this "list" by "scrape" may be? Or simply "run" since we do run a lister here.

ardumont updated this revision to Diff 6449.Wed, Aug 28, 3:20 PM

Adapt according to reviews/discussion:

  • cli: Align --db-url's default value with the list subcommand
  • cli: Unify new_lister method name to get_lister
  • cli: Rename 'list' subcommand to 'run'
  • cli: Remove the --policy parameter
  • cli: Improve the 'run' subcommand dostring
ardumont edited the summary of this revision. (Show Details)Wed, Aug 28, 3:21 PM
ardumont edited the summary of this revision. (Show Details)Wed, Aug 28, 3:23 PM
ardumont edited the summary of this revision. (Show Details)
ardumont added inline comments.
swh/lister/cli.py
204 ↗(On Diff #6448)

Clearer, thx!

This revision is now accepted and ready to land.Wed, Aug 28, 3:57 PM
ardumont updated this revision to Diff 6451.Wed, Aug 28, 4:30 PM

Rework commits (squash some)

This revision was automatically updated to reflect the committed changes.
ardumont edited the summary of this revision. (Show Details)Thu, Aug 29, 10:59 AM
ardumont edited the summary of this revision. (Show Details)