
maven: Create one origin per package instead of one per package version
Closed, Public

Authored by anlambert on Apr 28 2022, 3:54 PM.

Details

Summary

Previously the maven lister was creating an origin for each source
archive (jar, zip) it discovered during the listing process.

This is not the way Software Heritage decided to archive sources
coming from package managers. Instead one origin should be created
per package and all its versions should be found as releases in the
snapshot produced by the package loader.

So modify the maven lister in order to create one origin per package
grouping all its versions.
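
The grouping described above could be sketched as follows. This is a minimal illustration, not the lister's actual code: the `SourceArchive` record, the `group_by_package` helper and the origin URL layout (base URL, group path, artifact name) are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class SourceArchive:
    """Hypothetical record for one source archive (jar, zip) found while listing."""

    group_id: str
    artifact_id: str
    version: str
    url: str


def group_by_package(
    archives: Iterable[SourceArchive], base_url: str
) -> Dict[str, List[SourceArchive]]:
    """Return one entry per package, holding every version discovered for it."""
    origins: Dict[str, List[SourceArchive]] = defaultdict(list)
    for archive in archives:
        # Assumed origin URL layout: <base>/<group path>/<artifact>
        group_path = archive.group_id.replace(".", "/")
        origin_url = f"{base_url.rstrip('/')}/{group_path}/{archive.artifact_id}"
        origins[origin_url].append(archive)
    return dict(origins)
```

With this shape, two archives of the same package end up under a single origin URL instead of producing two origins.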

This change also modifies the way incremental listing is handled:
ListedOrigin instances are yielded only if new versions of a package
were discovered since the last listing.
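
One way to picture the incremental behaviour is the sketch below. It assumes the lister state is a single timestamp of the previous listing and that each version carries an archive creation date; the real lister may track its state differently, and `origins_to_yield` is a hypothetical name.

```python
from datetime import datetime, timezone
from typing import Dict, Iterator, List, Optional, Tuple

# (version string, archive creation date); an assumed shape for this example
Version = Tuple[str, datetime]


def origins_to_yield(
    packages: Dict[str, List[Version]], last_listing: Optional[datetime]
) -> Iterator[str]:
    """Yield only the origin URLs that gained a version since the last listing."""
    for origin_url, versions in packages.items():
        # A first (full) listing has no previous state, so everything is yielded.
        if last_listing is None or any(date > last_listing for _, date in versions):
            yield origin_url
```

Packages whose versions all predate the previous listing are skipped, so an incremental run only schedules origins that actually changed.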

Tests have been updated to reflect these changes.

Related to T3874

Diff Detail

Repository: rDLS Listers
Lint: Automatic diff as part of commit; lint not applicable.
Unit: Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D7710 (id=27883)

Rebasing onto c251594a1f...

Current branch diff-target is up to date.
Changes applied before test
commit bd706147194dc2a04b5240f623b691d6e7e0f316
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Wed Apr 27 17:54:53 2022 +0200

    maven: Create one origin per package instead of one per package version

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/511/ for more details.

ardumont added inline comments.
swh/lister/maven/lister.py, line 352

what does that snippet do?

lgtm

one question inline.

This revision is now accepted and ready to land. Apr 28 2022, 4:46 PM
swh/lister/maven/lister.py, line 352

It updates the last_update value of the origin if the jar archive creation date is greater than the current value.
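
As a rough sketch of that rule (not the snippet under review), assuming `last_update` may be unset before the first jar is seen, and with `updated_last_update` as a hypothetical helper name:

```python
from datetime import datetime
from typing import Optional


def updated_last_update(current: Optional[datetime], jar_date: datetime) -> datetime:
    """Return the later of the current last_update and the jar creation date."""
    # Assumption: an unset last_update is replaced by the first date seen.
    if current is None or jar_date > current:
        return jar_date
    return current
```

Applied to every jar of a package, this leaves last_update at the creation date of the most recent archive.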

I am currently adapting the maven loader to reflect that change. Only a small change is required, but I ended up improving the test implementation, which is taking some time.

Build is green

Patch application report for D7710 (id=27901)

Rebasing onto 985b71e80c...

Current branch diff-target is up to date.
Changes applied before test
commit 22bcd9deb221a11a159422cffec44982f299e9ab
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Wed Apr 27 17:54:53 2022 +0200

    maven: Create one origin per package instead of one per package version

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/513/ for more details.