Page MenuHomeSoftware Heritage

package loaders: Make test failures more helpful
ClosedPublic

Authored by vlorentz on Nov 4 2021, 3:50 PM.

Details

Reviewers
olasd
Group Reviewers
Reviewers
Summary

Some tests did the following:

  1. build a snapshot
  2. get the snapshot from the storage
  3. compare it with the expected snapshot
  4. get the origin visit from the storage and check it

If the loader built a wrong snapshot, the test fails at step 2,
and the only information displayed is that the expected snapshot id
does not exist, which is very unhelpful.

Instead, I reordered them as: 1, 4, 2, 3. This way, if a wrong
snapshot is build by the loader, it is detected when comparing
the visit, and pytest shows the two hashes.
Then, the test can be modified to use the hash that is actually
generated to show the actual snapshot.

This is consistent with what was already done in the pypi loader.

Additionally, I made the following changes:

  1. always check stats last (because a difference in numbers is hardly actionable without testing other objects)
  2. add a few more snapshot id checks in visits
  3. deduplicated a hardcoded snapshot id.

Diff Detail

Repository
rDLDBASE Generic VCS/Package Loader
Branch
releases-T3636
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 24873
Build 38852: Phabricator diff pipeline on jenkinsJenkins console · Jenkins
Build 38851: arc lint + arc unit

Event Timeline

add missing item to the commit msg

Build has FAILED

Patch application report for D6607 (id=24000)

Could not rebase; Attempt merge onto 5063082e7d...

Updating 5063082..cde82c2
Fast-forward
 docs/package-loader-tutorial.rst                 |  25 +++
 swh/loader/package/archive/tests/test_archive.py |  20 +--
 swh/loader/package/cran/tests/test_cran.py       |  20 ++-
 swh/loader/package/debian/tests/test_debian.py   |  80 +++++----
 swh/loader/package/deposit/loader.py             |   5 -
 swh/loader/package/deposit/tests/test_deposit.py |  78 +++++++--
 swh/loader/package/nixguix/tests/test_nixguix.py |  31 +++-
 swh/loader/package/npm/tests/test_npm.py         |  83 ++++-----
 swh/loader/package/opam/loader.py                |  41 ++++-
 swh/loader/package/opam/tests/test_opam.py       |  83 ++++++++-
 swh/loader/package/pypi/tests/test_pypi.py       | 205 ++++++++++++-----------
 swh/loader/tests/__init__.py                     |  16 +-
 12 files changed, 455 insertions(+), 232 deletions(-)
Changes applied before test
commit cde82c24765a77b74e04940f12e9446dd7ae05b3
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Nov 4 15:49:45 2021 +0100

    package loaders: Make test failures more helpful
    
    Some tests did the following:
    
    1. build a snapshot
    2. get the snapshot from the storage
    3. compare it with the expected snapshot
    4. get the origin visit from the storage and check it
    
    If the loader built a wrong snapshot, the test fails at step 2,
    and the only information displayed is that the expected snapshot id
    does not exist, which is very unhelpful.
    
    Instead, I reordered them as: 1, 4, 2, 3. This way, if a wrong
    snapshot is build by the loader, it is detected when comparing
    the visit, and pytest shows the two hashes.
    Then, the test can be modified to use the hash that is actually
    generated to show the actual snapshot.
    
    This is consistent with what was already done in the pypi loader.
    
    Additionally, I made the following changes:
    
    1. always check stats last (because a difference in numbers is
       hardly actionable without testing other objects)
    2. add a few more snapshot id checks in visits

commit 65ee04b10025474c59e0521aab7b847a74b38b00
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Oct 21 14:35:47 2021 +0200

    deposit: Remove 'parent' deposit
    
    The parent is computed by the deposit as the revision of the latest deposit
    in the same origin before the current one.
    Therefore, it is redundant, as it can be recomputed from metadata
    + revision date.
    
    This is a preliminary change needed to make package loaders produce
    releases instead of revisions, as releases don't have parent relationships

commit 0d04182ecce3ab34c2ed648bbafa83ab5204e0b0
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Nov 4 11:45:36 2021 +0100

    opam: Write package definitions to the extrinsic metadata storage

commit 18bbbae719fc9d165d5b543e54d449f5befc083a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Nov 4 11:43:53 2021 +0100

    Add missing documentation for `get_metadata_authority`.

Link to build: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/594/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/594/console

Build has FAILED

Patch application report for D6607 (id=24001)

Could not rebase; Attempt merge onto 5063082e7d...

Updating 5063082..91b4906
Fast-forward
 docs/package-loader-tutorial.rst                 |  25 +++
 swh/loader/package/archive/tests/test_archive.py |  20 +--
 swh/loader/package/cran/tests/test_cran.py       |  20 ++-
 swh/loader/package/debian/tests/test_debian.py   |  80 +++++----
 swh/loader/package/deposit/loader.py             |   5 -
 swh/loader/package/deposit/tests/test_deposit.py |  78 +++++++--
 swh/loader/package/nixguix/tests/test_nixguix.py |  31 +++-
 swh/loader/package/npm/tests/test_npm.py         |  83 ++++-----
 swh/loader/package/opam/loader.py                |  41 ++++-
 swh/loader/package/opam/tests/test_opam.py       |  83 ++++++++-
 swh/loader/package/pypi/tests/test_pypi.py       | 205 ++++++++++++-----------
 swh/loader/tests/__init__.py                     |  16 +-
 12 files changed, 455 insertions(+), 232 deletions(-)
Changes applied before test
commit 91b49060e86340b7db8a9737b21f41c84261b1ec
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Nov 4 15:49:45 2021 +0100

    package loaders: Make test failures more helpful
    
    Some tests did the following:
    
    1. build a snapshot
    2. get the snapshot from the storage
    3. compare it with the expected snapshot
    4. get the origin visit from the storage and check it
    
    If the loader built a wrong snapshot, the test fails at step 2,
    and the only information displayed is that the expected snapshot id
    does not exist, which is very unhelpful.
    
    Instead, I reordered them as: 1, 4, 2, 3. This way, if a wrong
    snapshot is build by the loader, it is detected when comparing
    the visit, and pytest shows the two hashes.
    Then, the test can be modified to use the hash that is actually
    generated to show the actual snapshot.
    
    This is consistent with what was already done in the pypi loader.
    
    Additionally, I made the following changes:
    
    1. always check stats last (because a difference in numbers is
       hardly actionable without testing other objects)
    2. add a few more snapshot id checks in visits
    3. deduplicated a hardcoded snapshot id.

commit 65ee04b10025474c59e0521aab7b847a74b38b00
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Oct 21 14:35:47 2021 +0200

    deposit: Remove 'parent' deposit
    
    The parent is computed by the deposit as the revision of the latest deposit
    in the same origin before the current one.
    Therefore, it is redundant, as it can be recomputed from metadata
    + revision date.
    
    This is a preliminary change needed to make package loaders produce
    releases instead of revisions, as releases don't have parent relationships

commit 0d04182ecce3ab34c2ed648bbafa83ab5204e0b0
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Nov 4 11:45:36 2021 +0100

    opam: Write package definitions to the extrinsic metadata storage

commit 18bbbae719fc9d165d5b543e54d449f5befc083a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Nov 4 11:43:53 2021 +0100

    Add missing documentation for `get_metadata_authority`.

Link to build: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/595/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/595/console

Harbormaster returned this revision to the author for changes because remote builds failed.Nov 4 2021, 3:53 PM
Harbormaster failed remote builds in B24874: Diff 24001!

Build is green

Patch application report for D6607 (id=24003)

Could not rebase; Attempt merge onto 5063082e7d...

Updating 5063082..dffd638
Fast-forward
 docs/package-loader-tutorial.rst                 |  25 +++
 swh/loader/package/archive/tests/test_archive.py |  20 +--
 swh/loader/package/cran/tests/test_cran.py       |  20 ++-
 swh/loader/package/debian/tests/test_debian.py   |  80 +++++----
 swh/loader/package/deposit/loader.py             |   5 -
 swh/loader/package/deposit/tests/test_deposit.py |  78 +++++++--
 swh/loader/package/nixguix/tests/test_nixguix.py |  31 +++-
 swh/loader/package/npm/tests/test_npm.py         |  83 ++++-----
 swh/loader/package/opam/loader.py                |  41 ++++-
 swh/loader/package/opam/tests/test_opam.py       |  83 ++++++++-
 swh/loader/package/pypi/tests/test_pypi.py       | 205 ++++++++++++-----------
 swh/loader/tests/__init__.py                     |  16 +-
 swh/loader/tests/test_init.py                    |   4 +-
 13 files changed, 458 insertions(+), 233 deletions(-)
Changes applied before test
commit dffd6387dfcdc4bb535c13916299d246c0f4d169
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Nov 4 15:49:45 2021 +0100

    package loaders: Make test failures more helpful
    
    Some tests did the following:
    
    1. build a snapshot
    2. get the snapshot from the storage
    3. compare it with the expected snapshot
    4. get the origin visit from the storage and check it
    
    If the loader built a wrong snapshot, the test fails at step 2,
    and the only information displayed is that the expected snapshot id
    does not exist, which is very unhelpful.
    
    Instead, I reordered them as: 1, 4, 2, 3. This way, if a wrong
    snapshot is build by the loader, it is detected when comparing
    the visit, and pytest shows the two hashes.
    Then, the test can be modified to use the hash that is actually
    generated to show the actual snapshot.
    
    This is consistent with what was already done in the pypi loader.
    
    Additionally, I made the following changes:
    
    1. always check stats last (because a difference in numbers is
       hardly actionable without testing other objects)
    2. add a few more snapshot id checks in visits
    3. deduplicated a hardcoded snapshot id.

commit 65ee04b10025474c59e0521aab7b847a74b38b00
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Oct 21 14:35:47 2021 +0200

    deposit: Remove 'parent' deposit
    
    The parent is computed by the deposit as the revision of the latest deposit
    in the same origin before the current one.
    Therefore, it is redundant, as it can be recomputed from metadata
    + revision date.
    
    This is a preliminary change needed to make package loaders produce
    releases instead of revisions, as releases don't have parent relationships

commit 0d04182ecce3ab34c2ed648bbafa83ab5204e0b0
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Nov 4 11:45:36 2021 +0100

    opam: Write package definitions to the extrinsic metadata storage

commit 18bbbae719fc9d165d5b543e54d449f5befc083a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Nov 4 11:43:53 2021 +0100

    Add missing documentation for `get_metadata_authority`.

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/596/ for more details.

assert_last_visit_matches and check_snapshot should probably be merged somehow:

it would make sense to show the (full) diff between the branches of the expected snapshot and the actual snapshot added to the storage (regardless of id), rather than just having what is essentially a boolean "yes, the snapshot matches" (which is essentially what happens when the ids are checked for equality).

Overall I think your logic is sound, I just think it should go one step further.

(maybe making the diff recursive, i.e. "these branches are different, and here are the objects they point to" would make sense as well, but I don't think we have a lot of deep structure for the objects that we expect to have archived in these tests)

This revision is now accepted and ready to land.Nov 5 2021, 9:31 AM

551c55ff04774b2c4e50e54f701c514064607f6a