
pattern: Make lister flush regularly origins to scheduler
Closed, Public

Authored by ardumont on Jan 28 2021, 4:55 PM.

Details

Summary

Because origins is a generator, the previous behavior tried to consume the entire
generator before sending any records to the scheduler.

This change groups origins into batches of 100 and sends each batch to the scheduler
for writing.

Related to T3003#57551
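
The batching described in the summary can be sketched as follows. This is a minimal illustration, not the code from this diff: `batched`, `flush_in_batches`, and the `send_batch` callback are hypothetical names standing in for whatever the lister uses to write to the scheduler. It assumes only that origins is a lazy iterable and that each batch of at most 100 items is sent as soon as it is full.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield lists of at most `size` items, pulling from `items` lazily."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        # islice only advances the underlying iterator by `size` items,
        # so the rest of the generator is not consumed yet.
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def flush_in_batches(
    origins: Iterable[T],
    send_batch: Callable[[List[T]], None],  # hypothetical scheduler write
    batch_size: int = 100,
) -> int:
    """Send origins batch by batch and return how many were sent."""
    sent = 0
    for batch in batched(origins, batch_size):
        send_batch(batch)
        sent += len(batch)
    return sent
```

With this shape, a failure while listing or sending a later batch leaves the earlier batches already written, rather than losing the whole run.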

Test Plan

tox

In addition, ran the launchpad lister, which now writes data while the listing is in
progress instead of failing without writing anything.

The count of listed origins went from 0 before this change to a growing number:

swh-scheduler=> select now(), count(*) from listed_origins lo inner join listers l on lo.lister_id=l.id and l.name='launchpad' and l.instance_name='launchpad';
              now              | count
-------------------------------+-------
 2021-01-28 15:54:50.063613+00 | 18000
(1 row)

Diff Detail

Repository
rDLS Listers
Branch
master
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 18860
Build 29220: Phabricator diff pipeline on jenkins (Jenkins console · Jenkins)
Build 29219: arc lint + arc unit

Event Timeline

Build is green

Patch application report for D4965 (id=17720)

Rebasing onto f862004700...

Current branch diff-target is up to date.
Changes applied before test
commit 0ad37740d9d7cfa4a7d75f5c8d5d7568396c1abf
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Thu Jan 28 16:51:56 2021 +0100

    pattern: Make lister flush regularly origins to scheduler
    
    As origins is a generator, the previous behavior would try to consume the overall
    generator to send the records.
    
    This groups and sends batch of 100 origins to the scheduler for writing.
    
    Related to T3003

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/221/ for more details.

anlambert added a subscriber: anlambert.

Looks good to me !

Nevertheless, errors like T3003#57551 can still appear if there are duplicate origins in the sent list.

This revision is now accepted and ready to land. Jan 28 2021, 5:05 PM

Nevertheless, errors like T3003#57551 can still appear if there are duplicate origins in the sent list.

Yes, indeed, and it still happens (that's another task ;)), but now the other origins have already been flushed.