tests.get_stats: Don't return a 'person' count.
Closed, Public

Authored by vlorentz on Sep 17 2020, 2:48 PM.

Details

Summary

The deduplication logic of 'person' objects is an internal detail of
storage backends, so it's better not to rely on it.
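To illustrate the reasoning, here is a minimal sketch of a stats helper that leaves out the 'person' count. It is not the code from this diff: `FakeStorage`, `STAT_KEYS`, and the `stat_counters()` method shape are assumptions made for the example, and the numbers are invented. The point is that two backends which deduplicate persons differently still report identical stats once 'person' is excluded.

```python
from typing import Dict

# Hypothetical list of object types a test helper might report.
# 'person' is deliberately absent: its count depends on how each
# backend deduplicates authors and committers.
STAT_KEYS = [
    "content",
    "directory",
    "origin",
    "origin_visit",
    "release",
    "revision",
    "skipped_content",
    "snapshot",
]


class FakeStorage:
    """Hypothetical stand-in for a storage backend (not a real swh class)."""

    def __init__(self, counters: Dict[str, int]) -> None:
        self._counters = dict(counters)

    def stat_counters(self) -> Dict[str, int]:
        # Return a copy so callers cannot mutate the backend's state.
        return dict(self._counters)


def get_stats(storage: FakeStorage) -> Dict[str, int]:
    """Return only the counters that do not depend on backend internals."""
    counters = storage.stat_counters()
    return {key: counters.get(key, 0) for key in STAT_KEYS}


# Two made-up backends that store the same objects but count persons differently.
backend_a = FakeStorage({"content": 3, "revision": 2, "person": 1})
backend_b = FakeStorage({"content": 3, "revision": 2, "person": 2})
```

With these assumptions, `get_stats(backend_a)` and `get_stats(backend_b)` compare equal, so a test asserting on the result does not need to know how a given backend counts persons.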

Diff Detail

Repository
rDLDBASE Generic VCS/Package Loader
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D3977 (id=14011)

Could not rebase; Attempt merge onto fbe906c0c9...

Merge made by the 'recursive' strategy.
 swh/loader/core/loader.py                        | 17 ++++++++---------
 swh/loader/package/archive/tests/test_archive.py |  4 ----
 swh/loader/package/cran/tests/test_cran.py       |  2 --
 swh/loader/package/debian/tests/test_debian.py   |  3 ---
 swh/loader/package/deposit/tests/test_deposit.py |  3 ---
 swh/loader/package/nixguix/tests/test_nixguix.py |  3 ---
 swh/loader/package/npm/tests/test_npm.py         |  4 ----
 swh/loader/package/pypi/tests/test_pypi.py       |  7 -------
 swh/loader/tests/__init__.py                     |  1 -
 9 files changed, 8 insertions(+), 36 deletions(-)
Changes applied before test
commit 722eeac96ddb5391ed0b5d88592517dab7723dd2
Merge: fbe906c 60553db
Author: Jenkins user <jenkins@localhost>
Date:   Thu Sep 17 12:49:17 2020 +0000

    Merge branch 'diff-target' into HEAD

commit 60553dbd4b12d1990a284c4ac952ab846189177f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Sep 17 14:48:15 2020 +0200

    tests.get_stats: Don't return a 'person' count.
    
    The deduplication logic of 'person' objects is an internal detail of
    storage backends, so it's better not to rely on it.

commit 46485fbe943b110a75196236dbae3da31263b755
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Sep 17 14:31:22 2020 +0200

    loader: Stop materializing full lists of objects to be stored.
    
    Since 43728c596498979cd5083b61e93360b4c2071c31, store_data consumes the entire iterator
    of contents, and since 3b97703d7f14e145d6124f1c61f5f283ee8eecf2, it does the same for
    other object types.
    
    This causes all the (new) objects of the loaded repository to be loaded
    in memory at the same time before being sent to the storage, which can
    cause OOM errors.
    
    Instead, with this commit, objects are added one by one to the storage,
    which restores the lazy behavior we had before these two commits using
    the buffered storage proxy.

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/295/ for more details.
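The memory behavior described in the "Stop materializing full lists" commit message above can be sketched as follows. This is an illustration under stated assumptions, not the loader's code: `BufferedStorage`, `new_objects`, `store_eagerly`, and `store_lazily` are hypothetical names, and integers stand in for archive objects. The sketch contrasts building the full list before storing with handing objects to a buffering proxy one at a time.

```python
from typing import Iterable, Iterator, List


class BufferedStorage:
    """Hypothetical buffering proxy: flushes once `threshold` objects are held."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self.buffer: List[int] = []
        self.flushed: List[int] = []
        self.largest_batch = 0  # largest list ever handed to add()

    def add(self, objects: List[int]) -> None:
        self.largest_batch = max(self.largest_batch, len(objects))
        for obj in objects:
            self.buffer.append(obj)
            if len(self.buffer) >= self.threshold:
                self.flush()

    def flush(self) -> None:
        # Send buffered objects to the (simulated) backend, in order.
        self.flushed.extend(self.buffer)
        self.buffer.clear()


def new_objects(count: int) -> Iterator[int]:
    """Lazily yield stand-ins for the new objects of a repository."""
    for index in range(count):
        yield index


def store_eagerly(storage: BufferedStorage, objects: Iterable[int]) -> None:
    # Materializes every object before storing: memory grows with the repository.
    storage.add(list(objects))
    storage.flush()


def store_lazily(storage: BufferedStorage, objects: Iterable[int]) -> None:
    # Hands objects over one by one; only the proxy's buffer is held in memory.
    for obj in objects:
        storage.add([obj])
    storage.flush()


eager = BufferedStorage(threshold=100)
store_eagerly(eager, new_objects(1000))

lazy = BufferedStorage(threshold=100)
store_lazily(lazy, new_objects(1000))
```

Both strategies store the same objects in the same order. The difference is that the eager path builds a 1000-element list up front, while the lazy path never holds more than the proxy's buffer at once.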

This revision is now accepted and ready to land. (Sep 17 2020, 2:54 PM)
This revision was landed with ongoing or failed builds. (Sep 17 2020, 6:16 PM)
This revision was automatically updated to reflect the committed changes.

Build is green

Patch application report for D3977 (id=14027)

Rebasing onto 7b2c80e708...

Current branch diff-target is up to date.
Changes applied before test
commit 64922781b0a19c6a0b2a54ba79818c3a7bd65b6a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Sep 17 14:48:15 2020 +0200

    tests.get_stats: Don't return a 'person' count.
    
    The deduplication logic of 'person' objects is an internal detail of
    storage backends, so it's better not to rely on it.

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/296/ for more details.