
luigi: Add LocalExport task
Closed, Public

Authored by vlorentz on Nov 21 2022, 3:51 PM.

Details

Summary

It allows other packages (e.g. swh-graph) to depend on the presence of the local
dataset, with a configurable way to obtain it if it is missing.

Depends on D8864.
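
The idea in the summary can be sketched in plain Python, without Luigi, as a task that is complete once the local dataset is present and that otherwise obtains it through a configurable provider. This is not the code from swh/dataset/luigi.py: the names ExportSource, run_export, download_export, PROVIDERS and ensure_local_export are hypothetical, and treating meta.json as the completion marker is an assumption based on the commit message about stamp files.

```python
import json
import tempfile
from enum import Enum
from pathlib import Path
from typing import Callable, Dict


class ExportSource(Enum):
    """Hypothetical ways of obtaining the local dataset when it is missing."""

    EXPORT = "export"      # produce the dataset from scratch
    DOWNLOAD = "download"  # fetch an already exported dataset


def run_export(local_path: Path) -> None:
    """Stand-in for a real export: only writes a marker meta.json."""
    local_path.mkdir(parents=True, exist_ok=True)
    (local_path / "meta.json").write_text(json.dumps({"obtained_by": "export"}))


def download_export(local_path: Path) -> None:
    """Stand-in for a real download: only writes a marker meta.json."""
    local_path.mkdir(parents=True, exist_ok=True)
    (local_path / "meta.json").write_text(json.dumps({"obtained_by": "download"}))


# Maps the configured source to the function that obtains the dataset.
PROVIDERS: Dict[ExportSource, Callable[[Path], None]] = {
    ExportSource.EXPORT: run_export,
    ExportSource.DOWNLOAD: download_export,
}


class LocalExport:
    """Sketch of a task other packages can depend on (not the real class)."""

    def __init__(self, local_path: Path, source: ExportSource) -> None:
        self.local_path = local_path
        self.source = source

    def complete(self) -> bool:
        # Assumption: the presence of meta.json means the dataset is usable.
        return (self.local_path / "meta.json").is_file()

    def run(self) -> None:
        # Only obtain the dataset if it is not already there.
        if not self.complete():
            PROVIDERS[self.source](self.local_path)


def ensure_local_export(local_path: Path, source: ExportSource) -> dict:
    """What a downstream package would call before using the dataset."""
    task = LocalExport(local_path, source)
    task.run()
    return json.loads((local_path / "meta.json").read_text())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "export"
        # The first call obtains the dataset using the configured source.
        print(ensure_local_export(path, ExportSource.DOWNLOAD))
        # The second call finds meta.json and does nothing, whatever the source.
        print(ensure_local_export(path, ExportSource.EXPORT))
```

In this sketch the downstream package never needs to know how the dataset was obtained; it only states that it needs it, and the source is a configuration choice.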

Diff Detail

Repository
rDDATASET Datasets
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D8865 (id=31954)

Could not rebase; Attempt merge onto 23853dbfac...

Updating 23853db..e4df585
Fast-forward
 swh/dataset/luigi.py | 125 +++++++++++++++++++++++++++++++++++----------------
 1 file changed, 86 insertions(+), 39 deletions(-)
Changes applied before test
commit e4df585f8dd66aa3bca0be967ef79cd6fa8a7c0a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Nov 21 15:47:03 2022 +0100

    luigi: Add LocalExport task
    
    It allows other packages (eg. swh-graph) to depend on the presence of the local
    dataset, with a configurable way to obtain it if missing

commit b39436e38be5fefe16c92d6553845cd113bafd14
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Nov 21 15:42:33 2022 +0100

    luigi: Remove copies of stamp files to/from S3
    
    They are only useful while exporting the dataset -- after the export is
    finished, meta.json is good enough and stamp files only save a couple
    of minutes when only some objects types are needed (ie. never in practice)

See https://jenkins.softwareheritage.org/job/DDATASET/job/tests-on-diff/166/ for more details.

ardumont added a subscriber: ardumont.

lgtm but fix the typo in the docstring cli parameter ;)

swh/dataset/luigi.py
503

typo here, maybe this is more correct?

This revision is now accepted and ready to land. (Nov 23 2022, 10:23 AM)
This revision was landed with ongoing or failed builds. (Nov 24 2022, 4:12 PM)
This revision was automatically updated to reflect the committed changes.

Build is green

Patch application report for D8865 (id=32024)

Could not rebase; Attempt merge onto 23853dbfac...

Updating 23853db..0bf9c88
Fast-forward
 swh/dataset/luigi.py | 125 +++++++++++++++++++++++++++++++++++----------------
 1 file changed, 86 insertions(+), 39 deletions(-)
Changes applied before test
commit 0bf9c88d9604184b55735541b890797a890a9182
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Nov 21 15:47:03 2022 +0100

    luigi: Add LocalExport task
    
    It allows other packages (eg. swh-graph) to depend on the presence of the local
    dataset, with a configurable way to obtain it if missing

commit b39436e38be5fefe16c92d6553845cd113bafd14
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Nov 21 15:42:33 2022 +0100

    luigi: Remove copies of stamp files to/from S3
    
    They are only useful while exporting the dataset -- after the export is
    finished, meta.json is good enough and stamp files only save a couple
    of minutes when only some objects types are needed (ie. never in practice)

See https://jenkins.softwareheritage.org/job/DDATASET/job/tests-on-diff/168/ for more details.