
Fix direct sql query for directories to the archive
Closed · Public

Authored by aeviso on Jan 20 2022, 5:53 PM.

Details

Summary

Duplicate entries are now filtered out with a `SELECT DISTINCT` clause.
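To show why a `SELECT DISTINCT` helps, here is a minimal sketch that uses the standard-library `sqlite3` module against a hypothetical, simplified schema. The tables `directory_file_entry` and `file_entry` and the helper `list_entries` are illustrative only. The real archive stores entry ids in PostgreSQL arrays, and this sketch flattens each array element into its own row. When an entry id appears twice for the same directory, the join yields that entry twice, and `DISTINCT` collapses the copies.

```python
import sqlite3

# Hypothetical, simplified schema (not the real swh archive schema): each
# element of a directory's entry-id array is modelled as one row here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE directory_file_entry (dir_id TEXT, entry_id INTEGER);
CREATE TABLE file_entry (id INTEGER PRIMARY KEY, name TEXT, target TEXT);
""")
conn.executemany(
    "INSERT INTO file_entry VALUES (?, ?, ?)",
    [(1, "README", "c1"), (2, "setup.py", "c2")],
)
# Every entry id is listed twice for directory d1, mimicking duplicated arrays.
conn.executemany(
    "INSERT INTO directory_file_entry VALUES (?, ?)",
    [("d1", 1), ("d1", 2), ("d1", 1), ("d1", 2)],
)


def list_entries(distinct: bool) -> list:
    """Return (name, target) pairs for directory d1, optionally deduplicated."""
    select = "SELECT DISTINCT" if distinct else "SELECT"
    query = f"""
        {select} fe.name, fe.target
        FROM directory_file_entry AS dfe
        JOIN file_entry AS fe ON fe.id = dfe.entry_id
        WHERE dfe.dir_id = ?
        ORDER BY fe.name
    """
    return conn.execute(query, ("d1",)).fetchall()


print(list_entries(distinct=False))
# [('README', 'c1'), ('README', 'c1'), ('setup.py', 'c2'), ('setup.py', 'c2')]
print(list_entries(distinct=True))
# [('README', 'c1'), ('setup.py', 'c2')]
```

`DISTINCT` deduplicates whole result rows. That works here because two duplicated array elements produce identical `(name, target)` rows.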

Diff Detail

Repository
rDPROV Provenance database
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

LGTM.

As suggested on IRC, if you want to exercise that code, you could run `update directory set file_entries = file_entries || file_entries, dir_entries = dir_entries || dir_entries` after seeding the storage. This replicates the duplication that seems to have happened in the production storage.
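In PostgreSQL, `||` between two arrays concatenates them, so the suggested `update` doubles every entry id stored in each directory row. The following is a minimal pure-Python sketch of that effect under stated assumptions. `DirectoryRow`, `duplicate_entries`, `unnest_file_entries` and `distinct_file_entries` are hypothetical stand-ins and are not part of swh-provenance or the archive schema.

```python
from dataclasses import dataclass, field


@dataclass
class DirectoryRow:
    # Hypothetical stand-in for a row of the archive's `directory` table,
    # whose file_entries / dir_entries columns hold arrays of entry ids.
    id: str
    file_entries: list[int] = field(default_factory=list)
    dir_entries: list[int] = field(default_factory=list)


def duplicate_entries(rows: list[DirectoryRow]) -> None:
    # Python equivalent of:
    #   update directory set file_entries = file_entries || file_entries,
    #                        dir_entries = dir_entries || dir_entries
    for row in rows:
        row.file_entries = row.file_entries + row.file_entries
        row.dir_entries = row.dir_entries + row.dir_entries


def unnest_file_entries(row: DirectoryRow) -> list[int]:
    # One result per array element, like a join over unnest(file_entries).
    return list(row.file_entries)


def distinct_file_entries(row: DirectoryRow) -> list[int]:
    # Each id at most once, like SELECT DISTINCT. SQL guarantees no order;
    # dict.fromkeys keeps first-seen order here for readability.
    return list(dict.fromkeys(row.file_entries))


rows = [DirectoryRow("d1", file_entries=[10, 11], dir_entries=[20])]
duplicate_entries(rows)
print(unnest_file_entries(rows[0]))    # [10, 11, 10, 11]
print(distinct_file_entries(rows[0]))  # [10, 11]
```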

olasd published this revision for review. (Jan 20 2022, 5:58 PM)
olasd accepted this revision.
This revision is now accepted and ready to land. (Jan 20 2022, 5:58 PM)

Build is green

Patch application report for D6991 (id=25354)

Could not rebase; Attempt merge onto cc7401096d...

Updating cc74010..128d173
Fast-forward
 requirements-swh.txt                 |  1 +
 swh/provenance/__init__.py           |  9 ++++++-
 swh/provenance/archive.py            |  3 +--
 swh/provenance/postgresql/archive.py | 14 +++++------
 swh/provenance/storage/archive.py    |  2 +-
 swh/provenance/swhgraph/__init__.py  |  0
 swh/provenance/swhgraph/archive.py   | 46 ++++++++++++++++++++++++++++++++++++
 7 files changed, 64 insertions(+), 11 deletions(-)
 create mode 100644 swh/provenance/swhgraph/__init__.py
 create mode 100644 swh/provenance/swhgraph/archive.py
Changes applied before test
commit 128d1734974798536f0716a213a2f0982a1f785e
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jan 20 17:51:18 2022 +0100

    Fix direct sql query for directories to the archive
    
    Duplicated entries are now filtered by a `SELECT DISTINCT` clause.

commit 846427ea1ce130e1e3d9fd62c40154ff587bbace
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jan 20 16:08:32 2022 +0100

    Add partial implementation of `ArchiveGraph` class

commit eebf1f7889f1c9072ba8b8c8d0325d151b1ff014
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jan 20 15:35:43 2022 +0100

    Remove ordered result constraint from `snapshot_get_heads`
    
    It is not required anymore after simplifying the origin-revision
    layer algorithm.

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/562/ for more details.

This revision was landed with ongoing or failed builds. (Jan 20 2022, 6:16 PM)
This revision was automatically updated to reflect the committed changes.

Build is green

Patch application report for D6991 (id=25355)

Rebasing onto cc7401096d...

Current branch diff-target is up to date.
Changes applied before test
commit 3a2f11aadb7d32d1ab8caa5c96d1fa2ea2b5f852
Author: Andres Ezequiel Viso <aeviso@softwareheritage.org>
Date:   Thu Jan 20 17:51:18 2022 +0100

    Fix direct sql query for directories to the archive
    
    Duplicated entries are now filtered by a `SELECT DISTINCT` clause.

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/563/ for more details.