luigi: Add LocalExport task
ClosedPublic

Authored by vlorentz on Nov 21 2022, 3:51 PM.

Details

Summary

It allows other packages (e.g. swh-graph) to depend on the presence of the local
dataset, with a configurable way to obtain it if it is missing.

Depends on D8864.
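
The idea in the summary can be sketched without Luigi as follows. This is only an illustration of the pattern, not the code in swh/dataset/luigi.py: the names ensure_local_export, FETCHERS, export_from_scratch and download_from_mirror are hypothetical, and treating meta.json as the completion marker is an assumption based on the stamp-file commit message in the patch report below.

```python
import json
from pathlib import Path
from typing import Callable, Dict


# Hypothetical fetchers: each one is one way to populate the local directory.
def export_from_scratch(path: Path) -> None:
    path.mkdir(parents=True, exist_ok=True)
    (path / "meta.json").write_text(json.dumps({"source": "export"}))


def download_from_mirror(path: Path) -> None:
    path.mkdir(parents=True, exist_ok=True)
    (path / "meta.json").write_text(json.dumps({"source": "download"}))


# The "configurable way to obtain it": callers pick a fetcher by name.
FETCHERS: Dict[str, Callable[[Path], None]] = {
    "export": export_from_scratch,
    "download": download_from_mirror,
}


def ensure_local_export(path: Path, how: str = "export") -> Path:
    """Return the path to meta.json, fetching the dataset only if it is missing."""
    meta = path / "meta.json"
    if not meta.exists():
        # Dataset is absent: obtain it with the configured method.
        FETCHERS[how](path)
    if not meta.exists():
        raise RuntimeError(f"{how!r} did not produce {meta}")
    return meta
```

A downstream package would call ensure_local_export() before reading the dataset; if meta.json is already present, no fetcher runs, whichever method is configured.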

Diff Detail

Repository
rDDATASET Datasets
Branch
master
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 32882
Build 51532: Phabricator diff pipeline on jenkins (Jenkins console · Jenkins)
Build 51531: arc lint + arc unit

Event Timeline

Build is green

Patch application report for D8865 (id=31954)

Could not rebase; Attempt merge onto 23853dbfac...

Updating 23853db..e4df585
Fast-forward
 swh/dataset/luigi.py | 125 +++++++++++++++++++++++++++++++++++----------------
 1 file changed, 86 insertions(+), 39 deletions(-)
Changes applied before test
commit e4df585f8dd66aa3bca0be967ef79cd6fa8a7c0a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Nov 21 15:47:03 2022 +0100

    luigi: Add LocalExport task
    
    It allows other packages (eg. swh-graph) to depend on the presence of the local
    dataset, with a configurable way to obtain it if missing

commit b39436e38be5fefe16c92d6553845cd113bafd14
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Nov 21 15:42:33 2022 +0100

    luigi: Remove copies of stamp files to/from S3
    
    They are only useful while exporting the dataset -- after the export is
    finished, meta.json is good enough and stamp files only save a couple
    of minutes when only some objects types are needed (ie. never in practice)

See https://jenkins.softwareheritage.org/job/DDATASET/job/tests-on-diff/166/ for more details.

ardumont added a subscriber: ardumont.

lgtm but fix the typo in the docstring cli parameter ;)

swh/dataset/luigi.py, line 504

typo here, maybe this is more correct?

This revision is now accepted and ready to land. (Nov 23 2022, 10:23 AM)
This revision was landed with ongoing or failed builds. (Nov 24 2022, 4:12 PM)
This revision was automatically updated to reflect the committed changes.

Build is green

Patch application report for D8865 (id=32024)

Could not rebase; Attempt merge onto 23853dbfac...

Updating 23853db..0bf9c88
Fast-forward
 swh/dataset/luigi.py | 125 +++++++++++++++++++++++++++++++++++----------------
 1 file changed, 86 insertions(+), 39 deletions(-)
Changes applied before test
commit 0bf9c88d9604184b55735541b890797a890a9182
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Nov 21 15:47:03 2022 +0100

    luigi: Add LocalExport task
    
    It allows other packages (eg. swh-graph) to depend on the presence of the local
    dataset, with a configurable way to obtain it if missing

commit b39436e38be5fefe16c92d6553845cd113bafd14
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Nov 21 15:42:33 2022 +0100

    luigi: Remove copies of stamp files to/from S3
    
    They are only useful while exporting the dataset -- after the export is
    finished, meta.json is good enough and stamp files only save a couple
    of minutes when only some objects types are needed (ie. never in practice)

See https://jenkins.softwareheritage.org/job/DDATASET/job/tests-on-diff/168/ for more details.