
indexing-lister: Allow to define flush packet size
ClosedPublic

Authored by ardumont on Jun 23 2019, 9:35 AM.

Details

Summary

Prior to this commit, indexing lister instances were flushing every packet of
20. This can now be defined per subclass.

For the Bitbucket lister, the number of repositories grew from 10 per page to
100 per page, which lengthened the interval between flushes.
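
A minimal sketch of the idea, not the actual swh.lister code: the class names, the `flush_packet_size` attribute, and the `make_pages` helper are hypothetical, and the default of 20 is inferred from the figures quoted in the discussion. It shows a base lister that flushes every N pages and a subclass that overrides N.

```python
from typing import Iterable, List


class SketchIndexingLister:
    """Toy stand-in for an indexing lister (hypothetical, not the swh class)."""

    # Hypothetical attribute: number of API pages accumulated between flushes.
    flush_packet_size = 20

    def __init__(self) -> None:
        self.pending: List[str] = []
        self.flushed_batches: List[List[str]] = []

    def flush(self) -> None:
        # Stand-in for a database commit: move pending items into a batch.
        if self.pending:
            self.flushed_batches.append(self.pending)
            self.pending = []

    def run(self, pages: Iterable[List[str]]) -> None:
        for page_number, page in enumerate(pages, start=1):
            self.pending.extend(page)
            if page_number % self.flush_packet_size == 0:
                self.flush()
        self.flush()  # flush whatever is left after the last page


class SketchBitbucketLister(SketchIndexingLister):
    # Pages are larger (100 repositories), so flush after fewer of them.
    flush_packet_size = 2


def make_pages(n_pages: int, per_page: int) -> List[List[str]]:
    """Build fake pages of repository names for the demonstration."""
    return [[f"repo-{p}-{i}" for i in range(per_page)] for p in range(n_pages)]


lister = SketchBitbucketLister()
lister.run(make_pages(5, 100))
print([len(batch) for batch in lister.flushed_batches])  # [200, 200, 100]
```

With the override, five pages of 100 repositories are written in batches of 200, 200, and 100; with the base value of 20, the same five pages would only be written by the final flush.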

Depends on D1638

Test Plan

tox

Diff Detail

Repository
rDLS Listers
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Rebase on latest D1634's commit content

ardumont retitled this revision from indexing-lister: Allow to define flush packet db size to indexing-lister: Allow to define flush packet size. Jun 24 2019, 5:49 PM
vlorentz added a subscriber: vlorentz.
vlorentz added inline comments.
swh/lister/bitbucket/lister.py
32

I'm confused by this comment. Prior behavior of what? (I can deduce IndexingLister because it's in the same diff, but it won't make sense afterward.) And why does the Bitbucket lister need to override this behavior?

This revision now requires changes to proceed. Jun 26 2019, 10:24 AM
swh/lister/bitbucket/lister.py
32

Because I changed the page size returned by the API from 10 repositories (too small) to 100 repositories (a tad better) for the Bitbucket listing.

So after 2 iterations of 100 repositories, I already have the 200 repositories to flush to the db.
If I had kept the original indexing lister behavior, it would have changed to flushing every 2000 repositories.
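
The arithmetic behind this reply can be checked with a short sketch. The function name is hypothetical, and the original packet size of 20 pages is an assumption derived from the figures above (2000 repositories at 100 per page).

```python
def repos_between_flushes(repos_per_page: int, pages_per_flush: int) -> int:
    """Number of repositories accumulated before each flush to the db."""
    return repos_per_page * pages_per_flush


# Before the page-size change: 10 repositories per page, assumed 20 pages per flush.
print(repos_between_flushes(10, 20))   # 200
# After the page-size change, with the packet size left untouched.
print(repos_between_flushes(100, 20))  # 2000
# After the page-size change, with the Bitbucket lister flushing every 2 pages.
print(repos_between_flushes(100, 2))   # 200
```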

vlorentz added inline comments.
swh/lister/core/indexing_lister.py
18

This argument should have a short docstring

This revision is now accepted and ready to land. Jun 26 2019, 10:33 AM
This revision was automatically updated to reflect the committed changes.