Page MenuHomeSoftware Heritage

tasks: Simplify implementation and make visit_date parameter optional
ClosedPublic

Authored by anlambert on Apr 27 2022, 11:06 AM.

Details

Summary

Recent changes in swh-scheduler add new parameters to the celery tasks
produced from swh.scheduler.model.ListedOrigin instances.

So ensure to handle any new parameters by not hardcoding the expected
ones in task signatures.

Rename date parameter to visit_date in from disk loader tasks and
make it non mandatory.

Add new tests checking task parameters produced from ListedOrigin
instances do no raise error when attempting to create a git loader.

Related to T4187

Diff Detail

Repository
rDLDG Git loader
Branch
simplify-tasks
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 28872
Build 45126: Phabricator diff pipeline on jenkinsJenkins console · Jenkins
Build 45125: arc lint + arc unit

Event Timeline

Build is green

Patch application report for D7694 (id=27811)

Rebasing onto b34d31e600...

Current branch diff-target is up to date.
Changes applied before test
commit 005b5f476a43a8616fc3ea850d6535f8f8e79639
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Wed Apr 27 11:01:12 2022 +0200

    tasks: Simplify implementation and add tests for listed origins
    
    Recent changes in swh-scheduler add new parameters to the celery tasks
    produced from swh.scheduler.model.ListedOrigin instances.
    
    So ensure to handle any new parameters by not hardcoding the expected
    ones in task signatures.
    
    Also remove explicit setting of visit_date if none has been provided
    as task parameter as it is already handled in loader constructor.
    
    Add new tests checking task parameters produced from ListedOrigin
    instances do no raise error when attempting to create a git loader.
    
    Related to T4187

See https://jenkins.softwareheritage.org/job/DLDG/job/tests-on-diff/190/ for more details.

Remove *args, it's a footgun to allow unnamed arguments over the scheduler/Celery.

Also remove explicit setting of visit_date if none has been provided
as task parameter as it is already handled in loader constructor.

Where? All I see is that it expects a datetime: https://forge.softwareheritage.org/source/swh-loader-git/browse/master/swh/loader/git/from_disk.py$99

Remove *args, it's a footgun to allow unnamed arguments over the scheduler/Celery.

Ack, will do the change on the svn loader too then.

Where? All I see is that it expects a datetime: https://forge.softwareheritage.org/source/swh-loader-git/browse/master/swh/loader/git/from_disk.py$99

What I mean is that we never use the visit_date argument for celery tasks, the visit date value is set as current one in swh.loader.core.loader.BaseLoader constructor.
So there is no need to parse a string date in tasks implementation.

Remove unnamed arguments use in celery tasks

Build is green

Patch application report for D7694 (id=27812)

Rebasing onto b34d31e600...

Current branch diff-target is up to date.
Changes applied before test
commit dbd991ecb361a7482da65ff6a0b4cbeba91978d5
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Wed Apr 27 11:01:12 2022 +0200

    tasks: Simplify implementation and add tests for listed origins
    
    Recent changes in swh-scheduler add new parameters to the celery tasks
    produced from swh.scheduler.model.ListedOrigin instances.
    
    So ensure to handle any new parameters by not hardcoding the expected
    ones in task signatures.
    
    Also remove explicit setting of visit_date if none has been provided
    as task parameter as it is already handled in loader constructor.
    
    Add new tests checking task parameters produced from ListedOrigin
    instances do no raise error when attempting to create a git loader.
    
    Related to T4187

See https://jenkins.softwareheritage.org/job/DLDG/job/tests-on-diff/191/ for more details.

The date is indeed optional, but it must be parsed if provided.

Actually, I think it would make sense to keep it mandatory from Celery tasks; as visit dates when loading from disk should be the date the local git repo was downloaded, not the date it was loaded.

Restore parsing of visit_date parameter for from disk loader tasks

Actually, I think it would make sense to keep it mandatory from Celery tasks; as visit dates when loading from disk should be the date the local git repo was downloaded, not the date it was loaded.

I did not make the visit_date parameter mandatory but restore its parsing in the tasks for from disk loaders.

Build is green

Patch application report for D7694 (id=27813)

Rebasing onto b34d31e600...

Current branch diff-target is up to date.
Changes applied before test
commit 8d3e3dd7f23f83444f3a4266656a2be41af55fcb
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Wed Apr 27 11:01:12 2022 +0200

    tasks: Simplify implementation and add tests for listed origins
    
    Recent changes in swh-scheduler add new parameters to the celery tasks
    produced from swh.scheduler.model.ListedOrigin instances.
    
    So ensure to handle any new parameters by not hardcoding the expected
    ones in task signatures.
    
    Also remove explicit setting of visit_date if none has been provided
    as task parameter as it is already handled in loader constructor.
    
    Add new tests checking task parameters produced from ListedOrigin
    instances do no raise error when attempting to create a git loader.
    
    Related to T4187

See https://jenkins.softwareheritage.org/job/DLDG/job/tests-on-diff/192/ for more details.

Thanks. Could you mention the behavior change in the commit/diff title?

Thanks. Could you mention the behavior change in the commit/diff title?

I updated the commit message about the date parameter changes.

Build is green

Patch application report for D7694 (id=27814)

Rebasing onto b34d31e600...

Current branch diff-target is up to date.
Changes applied before test
commit 4b8144a03071180a17d8ac24ad8d6187546eb10c
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Wed Apr 27 11:01:12 2022 +0200

    tasks: Simplify implementation and add tests for listed origins
    
    Recent changes in swh-scheduler add new parameters to the celery tasks
    produced from swh.scheduler.model.ListedOrigin instances.
    
    So ensure to handle any new parameters by not hardcoding the expected
    ones in task signatures.
    
    Rename date parameter to visit_date in from disk loader tasks and
    make it non mandatory.
    
    Add new tests checking task parameters produced from ListedOrigin
    instances do no raise error when attempting to create a git loader.
    
    Related to T4187

See https://jenkins.softwareheritage.org/job/DLDG/job/tests-on-diff/193/ for more details.

Sorry to be annoying about this, but it should be in the title. eg. tasks: Simplify implementation and make visit_date optional; the change in behavior is more important than tests when reading history

Update commit message title

anlambert retitled this revision from tasks: Simplify implementation and add tests for listed origins to tasks: Simplify implementation and make visit_date parameter optional.Apr 27 2022, 1:17 PM

Build is green

Patch application report for D7694 (id=27816)

Rebasing onto b34d31e600...

Current branch diff-target is up to date.
Changes applied before test
commit 05242cd461fdefd9013228b61a9318fe5c1520f6
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Wed Apr 27 11:01:12 2022 +0200

    tasks: Simplify implementation and make visit_date parameter optional
    
    Recent changes in swh-scheduler add new parameters to the celery tasks
    produced from swh.scheduler.model.ListedOrigin instances.
    
    So ensure to handle any new parameters by not hardcoding the expected
    ones in task signatures.
    
    Rename date parameter to visit_date in from disk loader tasks and
    make it non mandatory.
    
    Add new tests checking task parameters produced from ListedOrigin
    instances do no raise error when attempting to create a git loader.
    
    Related to T4187

See https://jenkins.softwareheritage.org/job/DLDG/job/tests-on-diff/194/ for more details.

This revision is now accepted and ready to land.Apr 27 2022, 1:22 PM