luigi: Add DownloadFromS3 task
Closed, Public

Authored by vlorentz on Thu, Nov 10, 4:41 PM.

Details

Summary

This will allow running swh-graph tasks easily on machines that didn't
export the graph themselves.

Diff Detail

Repository
rDDATASET Datasets
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D8832 (id=31839)

Could not rebase; Attempt merge onto 058e568492...

Updating 058e568..6640b70
Fast-forward
 swh/dataset/luigi.py | 117 +++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 104 insertions(+), 13 deletions(-)
Changes applied before test
commit 6640b70bbffae25113af73e06ecafde8ef4a779a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Nov 10 16:41:12 2022 +0100

    luigi: Add DownloadFromS3 task
    
    This will allow running swh-graph tasks easily on machines that didn't
    export the graph themselves.

commit 418cf1837e26e48f529f79afc4613d55a14060cf
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Nov 10 16:39:13 2022 +0100

    luigi: Make Format and ObjectType public
    
    Other tasks will import them in order to depend on tasks defined here

See https://jenkins.softwareheritage.org/job/DDATASET/job/tests-on-diff/162/ for more details.

ardumont added a subscriber: ardumont.

one question inline.

swh/dataset/luigi.py
436

Will that trigger an upload-to-S3 task?

If so, I gather this task is triggered from the appropriate node (one with the data, S3 access, etc.), right?

This revision is now accepted and ready to land. (Mon, Nov 14, 2:02 PM)

make the right parameter significant

swh/dataset/luigi.py
425

What does that do?

swh/dataset/luigi.py
436

Luigi tasks cannot be triggered remotely.

This requirement means that the task will run if it is properly configured; if it is not, the whole workflow will fail.
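As a rough illustration of the requirement semantics described above, here is a toy, standard-library-only model. It is not Luigi code and not the code under review: `ToyTask`, `build`, `MisconfiguredError`, and the task names `upload` and `download` are all hypothetical. Real Luigi reports a failed dependency through its scheduler rather than by raising, but the consequence sketched here is the same: everything runs locally, and a requirement that cannot run stops the tasks that depend on it.

```python
class MisconfiguredError(RuntimeError):
    """Raised by a toy task that lacks the configuration it needs."""


class ToyTask:
    """Minimal stand-in for a workflow task; NOT Luigi's Task class."""

    def __init__(self, name, requires=(), configured=True):
        self.name = name
        self.requires = tuple(requires)
        self.configured = configured
        self.done = False

    def run(self):
        if not self.configured:
            raise MisconfiguredError(f"{self.name} is not configured")
        self.done = True


def build(task, log):
    """Run requirements depth-first, then the task itself, in this process."""
    for dep in task.requires:
        if not dep.done:
            build(dep, log)  # an exception here aborts everything downstream
    task.run()
    log.append(task.name)


if __name__ == "__main__":
    log = []
    upload = ToyTask("upload")
    download = ToyTask("download", requires=[upload])
    build(download, log)
    print(log)
```

If the required task is constructed with `configured=False`, `build` raises before the dependent task runs, which mirrors "the whole workflow will fail".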

Build is green

Patch application report for D8832 (id=31869)

Could not rebase; Attempt merge onto 058e568492...

Updating 058e568..23853db
Fast-forward
 swh/dataset/luigi.py | 117 +++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 104 insertions(+), 13 deletions(-)
Changes applied before test
commit 23853dbfacd49aba0da526023e736a42cf4c328a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Nov 10 16:41:12 2022 +0100

    luigi: Add DownloadFromS3 task
    
    This will allow running swh-graph tasks easily on machines that didn't
    export the graph themselves.

commit 418cf1837e26e48f529f79afc4613d55a14060cf
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Nov 10 16:39:13 2022 +0100

    luigi: Make Format and ObjectType public
    
    Other tasks will import them in order to depend on tasks defined here

See https://jenkins.softwareheritage.org/job/DDATASET/job/tests-on-diff/163/ for more details.

swh/dataset/luigi.py
425

significant=False means that if two tasks are instantiated with different values for this parameter but equal values for all the others, Luigi treats them as the same task, so one of them is redundant and will not run.
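The deduplication described above can be sketched with a toy model that uses only the standard library. This is not Luigi's implementation, and the thread does not say which parameter sits at line 425, so the names `ToyExportTask`, `dataset_name`, `parallelism`, and `schedule` are hypothetical. The idea it illustrates is that an insignificant parameter is left out of the task's identity, so two tasks that differ only in that parameter count as one.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ToyExportTask:
    """Stand-in for a task with one insignificant parameter; NOT Luigi code."""

    # Significant: part of the task's identity.
    dataset_name: str
    # Insignificant (playing the role of significant=False):
    # excluded from equality and hashing via compare=False.
    parallelism: int = field(default=1, compare=False)


def schedule(tasks):
    """Keep one task per identity, as a deduplicating scheduler would."""
    unique = {}
    for task in tasks:
        unique.setdefault(task, task)  # the first task with a given identity wins
    return list(unique.values())


if __name__ == "__main__":
    a = ToyExportTask("graph", parallelism=1)
    b = ToyExportTask("graph", parallelism=8)  # differs only in the insignificant parameter
    c = ToyExportTask("other", parallelism=1)
    for task in schedule([a, b, c]):
        print(task)
```

In this sketch `a` and `b` compare equal, so only one of them survives scheduling. The practical consequence is that a parameter which changes the task's output should stay significant, which seems to be what the "make the right parameter significant" update addresses.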

ardumont added inline comments.
swh/dataset/luigi.py
425

neat

436

neat too.

This revision was automatically updated to reflect the committed changes.