Page MenuHomeSoftware Heritage

debian: Remove the extrinsic version from release names
ClosedPublic

Authored by vlorentz on Dec 6 2021, 2:06 PM.

Details

Summary

Use only the intrinsic version (eg. 1.0.0) instead of the extrinsic version
(eg. stretch/contrib/1.0.0).
Releases should only contain data from DSC, not external 'pointers' to them.

Additionally, having extrinsic data in releases means the same
dsc-sha256 extid can point to different releases, which meant the loader
may reuse a release mentioning a specific suite as a release in a
different suite.
With this commit, this won't be a problem anymore, as releases won't
mention the suite at all, so suites can safely share extids.

Depends on D6748

Note: this diff fixes data corruption, that at best crashes the loaders,
at worst makes them write wrong stuff. In order to fix it, we need to wipe
all ExtIDs from the database with type dsc-sha256 pointing to a release SWHID.

Event Timeline

vlorentz edited the summary of this revision. (Show Details)
vlorentz edited the test plan for this revision. (Show Details)

Build is green

Patch application report for D6749 (id=24510)

Could not rebase; Attempt merge onto 5d22455c94...

Updating 5d22455..2e01f10
Fast-forward
 docs/package-loader-specifications.rst         |  9 ++++--
 swh/loader/package/debian/loader.py            | 19 +++++++----
 swh/loader/package/debian/tests/test_debian.py | 44 +++++++++++++++++---------
 3 files changed, 47 insertions(+), 25 deletions(-)
Changes applied before test
commit 2e01f10bb05215ed724d490415e8cce9ccd89b35
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Dec 6 14:03:51 2021 +0100

    debian: Remove the extrinsic version from release names
    
    Use only the intrinsic version (eg. 1.0.0) instead of the extrinsic version
    (eg. stretch/contrib/1.0.0).
    Releases should only contain data from DSC, not external 'pointers' to them.
    
    Additionally, having extrinsic data in releases means the same
    dsc-sha256 extid can point to different releases, which meant the loader
    may reuse a release mentioning a specific suite as a release in a
    different suite.
    With this commit, this won't be a problem anymore, as releases won't
    mention the suite at all, so suites can safely share extids.

commit 958674dc24171ced169397054ffb1502b60f7f43
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Dec 6 13:48:04 2021 +0100

    debian: Fix confusion between the two versions
    
    'version' was documented as the intrinsic version (eg. '0.7.2-3') and
    'full_version' as the one containing the suite name (eg. 'stretch/contrib/0.7.2-3').
    
    In practice, it was the opposite, except in a few incorrect test.
    
    This commit fixes said tests, and renamed 'full_version' to
    'intrinsic_version'.
    
    This is only a refactoring, the behavior is unchanged for now;
    but a future commit will remove the 'version' (which is extrinsic) from
    the release name (which should contain only data intrinsic to the DSC).

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/644/ for more details.

anlambert added a subscriber: anlambert.

Looks good to me.

Indeed, releases with the same intrinsic version target the same directory regardless of the suite (see example below before that diff) so nice catch !

Thanks for fixing this.

I'm not too sure about wiping the extids. Sounds like a case for bumping the expected extid_version instead.

This revision is now accepted and ready to land.Dec 6 2021, 2:47 PM

I'm not too sure about wiping the extids. Sounds like a case for bumping the expected extid_version instead.

Bumping the extid_version means we need to reload Debian entirely from scratch. By just wiping these release extids, we'll only discard what was loaded in the last couple of weeks.

I'm not too sure about wiping the extids. Sounds like a case for bumping the expected extid_version instead.

Bumping the extid_version means we need to reload Debian entirely from scratch.

Yes.

By just wiping these release extids, we'll only discard what was loaded in the last couple of weeks.

That is much, much more annoying to do.

Build has FAILED

Patch application report for D6749 (id=24538)

Could not rebase; Attempt merge onto 2d9e93a2f2...

Updating 2d9e93a..4632adb
Fast-forward
 docs/package-loader-specifications.rst         |  9 ++++--
 swh/loader/package/debian/loader.py            | 19 +++++++----
 swh/loader/package/debian/tests/test_debian.py | 44 +++++++++++++++++---------
 3 files changed, 47 insertions(+), 25 deletions(-)
Changes applied before test
commit 4632adb5af307dc48d791276a12e8aafa46975de
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Dec 6 14:03:51 2021 +0100

    debian: Remove the extrinsic version from release names
    
    Use only the intrinsic version (eg. 1.0.0) instead of the extrinsic version
    (eg. stretch/contrib/1.0.0).
    Releases should only contain data from DSC, not external 'pointers' to them.
    
    Additionally, having extrinsic data in releases means the same
    dsc-sha256 extid can point to different releases, which meant the loader
    may reuse a release mentioning a specific suite as a release in a
    different suite.
    With this commit, this won't be a problem anymore, as releases won't
    mention the suite at all, so suites can safely share extids.

commit 26996ef3eef494eb187e01b999255cfa101ad758
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Dec 6 13:48:04 2021 +0100

    debian: Fix confusion between the two versions
    
    'version' was documented as the intrinsic version (eg. '0.7.2-3') and
    'full_version' as the one containing the suite name (eg. 'stretch/contrib/0.7.2-3').
    
    In practice, it was the opposite, except in a few incorrect test.
    
    This commit fixes said tests, and renamed 'full_version' to
    'intrinsic_version'.
    
    This is only a refactoring, the behavior is unchanged for now;
    but a future commit will remove the 'version' (which is extrinsic) from
    the release name (which should contain only data intrinsic to the DSC).

Link to build: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/650/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/650/console

fix rebase (missing test update)

Build is green

Patch application report for D6749 (id=24541)

Could not rebase; Attempt merge onto 2d9e93a2f2...

Updating 2d9e93a..c87c8c7
Fast-forward
 docs/package-loader-specifications.rst         |  9 +++--
 swh/loader/package/debian/loader.py            | 19 ++++++----
 swh/loader/package/debian/tests/test_debian.py | 50 ++++++++++++++++----------
 3 files changed, 50 insertions(+), 28 deletions(-)
Changes applied before test
commit c87c8c79ea660d5de697455d7c1882a216b23df5
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Dec 6 14:03:51 2021 +0100

    debian: Remove the extrinsic version from release names
    
    Use only the intrinsic version (eg. 1.0.0) instead of the extrinsic version
    (eg. stretch/contrib/1.0.0).
    Releases should only contain data from DSC, not external 'pointers' to them.
    
    Additionally, having extrinsic data in releases means the same
    dsc-sha256 extid can point to different releases, which meant the loader
    may reuse a release mentioning a specific suite as a release in a
    different suite.
    With this commit, this won't be a problem anymore, as releases won't
    mention the suite at all, so suites can safely share extids.

commit 26996ef3eef494eb187e01b999255cfa101ad758
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Dec 6 13:48:04 2021 +0100

    debian: Fix confusion between the two versions
    
    'version' was documented as the intrinsic version (eg. '0.7.2-3') and
    'full_version' as the one containing the suite name (eg. 'stretch/contrib/0.7.2-3').
    
    In practice, it was the opposite, except in a few incorrect test.
    
    This commit fixes said tests, and renamed 'full_version' to
    'intrinsic_version'.
    
    This is only a refactoring, the behavior is unchanged for now;
    but a future commit will remove the 'version' (which is extrinsic) from
    the release name (which should contain only data intrinsic to the DSC).

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/652/ for more details.