
Add support for indexing from head releases
Closed, Public

Authored by vlorentz on Jun 1 2022, 5:43 PM.

Details

Summary

Needed as package loaders now create release objects instead
of revision objects since T3638.

Closes T4297.
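
For readers unfamiliar with the change, here is a minimal sketch of what indexing from a head release can involve: resolving either kind of head object down to the directory to index. It is not the code in this diff. The `Revision` and `Release` dataclasses, the string `target_type` values, and the `head_directory` helper are all hypothetical stand-ins for illustration, not the swh.model API.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Union


@dataclass(frozen=True)
class Revision:
    """Hypothetical stand-in for a revision object."""
    id: str
    directory: str  # id of the root directory of this revision


@dataclass(frozen=True)
class Release:
    """Hypothetical stand-in for a release object."""
    id: str
    target: str       # id of the object the release points to
    target_type: str  # "revision" or "directory" in this sketch


Head = Union[Revision, Release]


def head_directory(head: Head, revisions: Dict[str, Revision]) -> Optional[str]:
    """Return the root directory id for a head object, or None if unresolvable."""
    if isinstance(head, Revision):
        return head.directory
    if head.target_type == "directory":
        # Assumed case: a release that points straight at a directory.
        return head.target
    if head.target_type == "revision":
        rev = revisions.get(head.target)
        return rev.directory if rev is not None else None
    return None  # other target types are not handled in this sketch
```

With this shape, callers no longer need to care whether an origin's head is a revision or a release; they only see a directory id.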

Diff Detail

Repository
rDCIDX Metadata indexer
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D7941 (id=28602)

Could not rebase; Attempt merge onto 1aa8bdaca1...

Updating 1aa8bda..244f073
Fast-forward
 swh/indexer/indexer.py                    |  48 ++++----
 swh/indexer/metadata.py                   | 189 ++++++++++++++++++------------
 swh/indexer/origin_head.py                |   2 +-
 swh/indexer/sql/30-schema.sql             |  20 ++--
 swh/indexer/sql/50-func.sql               |  30 ++---
 swh/indexer/sql/60-indexes.sql            |  10 +-
 swh/indexer/sql/upgrades/134.sql          |  18 +++
 swh/indexer/storage/__init__.py           |  38 +++---
 swh/indexer/storage/db.py                 |  26 ++--
 swh/indexer/storage/in_memory.py          |  24 ++--
 swh/indexer/storage/interface.py          |  22 ++--
 swh/indexer/storage/model.py              |   6 +-
 swh/indexer/tests/conftest.py             |   2 +-
 swh/indexer/tests/storage/conftest.py     |   6 +-
 swh/indexer/tests/storage/test_storage.py | 134 ++++++++++-----------
 swh/indexer/tests/tasks.py                |  10 +-
 swh/indexer/tests/test_cli.py             |  20 ++--
 swh/indexer/tests/test_indexer.py         |  16 +--
 swh/indexer/tests/test_metadata.py        |  59 +++++-----
 swh/indexer/tests/test_origin_head.py     |   7 +-
 swh/indexer/tests/test_origin_metadata.py | 118 +++++++++++++------
 swh/indexer/tests/utils.py                |  62 +++++++++-
 22 files changed, 519 insertions(+), 348 deletions(-)
 create mode 100644 swh/indexer/sql/upgrades/134.sql
Changes applied before test
commit 244f073a9492b5e6568de455f906a6f8b8b0c3d9
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Jun 1 17:42:22 2022 +0200

    Add support for indexing from head releases
    
    Needed since package loaders now create release objects instead
    of revision objects.

commit 78903476df18f59030ce647392708841918dacb9
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Jun 1 14:42:37 2022 +0200

    Replace RevisionMetadataIndexer with DirectoryMetadataIndexer
    
    This will make it easier to support indexing from releases in the future,
    as it will remove the strong dependency on revision ids in the database
    and interfaces.
    
    The existence of the indexer/table  is mostly to deduplicate work between
    origins with the same head revision, and we do not use it outside this
    context, so this should have no impact.
    
    The DB migration works by dropping both tables and re-indexing from
    scratch; which is necessary as we need to replace revision ids with
    directory ids.

See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/244/ for more details.
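
The second commit message above says the intermediate indexer/table mostly exists to deduplicate work between origins sharing the same head. A rough sketch of that idea, keyed on directory ids, might look like the following. The `index_origins` function and the `index_directory` callback are hypothetical names for illustration and do not reflect the real indexer interfaces.

```python
from typing import Callable, Dict, Iterable, Tuple


def index_origins(
    origin_heads: Iterable[Tuple[str, str]],
    index_directory: Callable[[str], dict],
) -> Dict[str, dict]:
    """Map each origin URL to metadata, indexing each distinct directory once."""
    by_directory: Dict[str, dict] = {}  # directory id -> extracted metadata
    results: Dict[str, dict] = {}
    for origin_url, directory_id in origin_heads:
        if directory_id not in by_directory:
            # Only the first origin seen with this directory pays the indexing cost.
            by_directory[directory_id] = index_directory(directory_id)
        results[origin_url] = by_directory[directory_id]
    return results
```

Keying the cache on directory ids rather than revision ids means the same deduplication applies whether the head was a revision or a release.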

ardumont added a subscriber: ardumont.

lgtm

But I don't get why you use assert in the runtime code instead of raising a proper exception.

swh/indexer/metadata.py
360

Why not raise something more explicit?
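
To illustrate the concern: Python drops `assert` statements when run with `-O`, so a runtime check written as an assert can silently disappear. The sketch below contrasts the two styles. The exception class, function names, and accepted head types are hypothetical, since the actual check at that line is not shown here.

```python
class UnsupportedHeadError(ValueError):
    """Hypothetical error for a head object type the indexer cannot handle."""


def check_head_type_with_assert(target_type: str) -> None:
    # Stripped entirely under `python -O`, so the check may silently vanish.
    assert target_type in ("revision", "release"), target_type


def check_head_type(target_type: str) -> None:
    # Always executed, and callers get a typed, descriptive error to catch.
    if target_type not in ("revision", "release"):
        raise UnsupportedHeadError(f"unsupported head type: {target_type!r}")
```

The explicit version also lets callers distinguish this failure from an unrelated `AssertionError`.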

This revision is now accepted and ready to land.
Jun 2 2022, 12:45 PM

Build is green

Patch application report for D7941 (id=28659)

Could not rebase; Attempt merge onto ca4e91e5b7...

Updating ca4e91e..b7cb270
Fast-forward
 swh/indexer/indexer.py                    |  48 ++++----
 swh/indexer/metadata.py                   | 189 ++++++++++++++++++------------
 swh/indexer/origin_head.py                |   2 +-
 swh/indexer/sql/30-schema.sql             |  20 ++--
 swh/indexer/sql/50-func.sql               |  30 ++---
 swh/indexer/sql/60-indexes.sql            |  10 +-
 swh/indexer/sql/upgrades/134.sql          |  18 +++
 swh/indexer/storage/__init__.py           |  38 +++---
 swh/indexer/storage/db.py                 |  26 ++--
 swh/indexer/storage/in_memory.py          |  24 ++--
 swh/indexer/storage/interface.py          |  22 ++--
 swh/indexer/storage/model.py              |   6 +-
 swh/indexer/tests/conftest.py             |   2 +-
 swh/indexer/tests/storage/conftest.py     |   6 +-
 swh/indexer/tests/storage/test_storage.py | 134 ++++++++++-----------
 swh/indexer/tests/tasks.py                |  10 +-
 swh/indexer/tests/test_cli.py             |  20 ++--
 swh/indexer/tests/test_indexer.py         |  16 +--
 swh/indexer/tests/test_metadata.py        |  59 +++++-----
 swh/indexer/tests/test_origin_head.py     |   7 +-
 swh/indexer/tests/test_origin_metadata.py | 118 +++++++++++++------
 swh/indexer/tests/utils.py                |  62 +++++++++-
 22 files changed, 519 insertions(+), 348 deletions(-)
 create mode 100644 swh/indexer/sql/upgrades/134.sql
Changes applied before test
commit b7cb270ebbfc829df48b6c5a9f36f4c6cde6f672
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Jun 1 17:42:22 2022 +0200

    Add support for indexing from head releases
    
    Needed since package loaders now create release objects instead
    of revision objects.

commit 7dc09f93a7ab5bb12a80ed5d81f7ccd590752256
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Jun 1 14:42:37 2022 +0200

    Replace RevisionMetadataIndexer with DirectoryMetadataIndexer
    
    This will make it easier to support indexing from releases in the future,
    as it will remove the strong dependency on revision ids in the database
    and interfaces.
    
    The existence of the indexer/table  is mostly to deduplicate work between
    origins with the same head revision, and we do not use it outside this
    context, so this should have no impact.
    
    The DB migration works by dropping both tables and re-indexing from
    scratch; which is necessary as we need to replace revision ids with
    directory ids.

See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/249/ for more details.

Build is green

Patch application report for D7941 (id=28667)

Could not rebase; Attempt merge onto ca4e91e5b7...

Updating ca4e91e..986c672
Fast-forward
 swh/indexer/indexer.py                    |  48 ++++----
 swh/indexer/metadata.py                   | 189 ++++++++++++++++++------------
 swh/indexer/origin_head.py                |   2 +-
 swh/indexer/sql/30-schema.sql             |  20 ++--
 swh/indexer/sql/50-func.sql               |  30 ++---
 swh/indexer/sql/60-indexes.sql            |  10 +-
 swh/indexer/sql/upgrades/134.sql          | 145 +++++++++++++++++++++++
 swh/indexer/storage/__init__.py           |  38 +++---
 swh/indexer/storage/db.py                 |  28 ++---
 swh/indexer/storage/in_memory.py          |  24 ++--
 swh/indexer/storage/interface.py          |  22 ++--
 swh/indexer/storage/model.py              |   6 +-
 swh/indexer/tests/conftest.py             |   2 +-
 swh/indexer/tests/storage/conftest.py     |   6 +-
 swh/indexer/tests/storage/test_storage.py | 134 ++++++++++-----------
 swh/indexer/tests/tasks.py                |  10 +-
 swh/indexer/tests/test_cli.py             |  20 ++--
 swh/indexer/tests/test_indexer.py         |  16 +--
 swh/indexer/tests/test_metadata.py        |  59 +++++-----
 swh/indexer/tests/test_origin_head.py     |   7 +-
 swh/indexer/tests/test_origin_metadata.py | 118 +++++++++++++------
 swh/indexer/tests/utils.py                |  62 +++++++++-
 22 files changed, 647 insertions(+), 349 deletions(-)
 create mode 100644 swh/indexer/sql/upgrades/134.sql
Changes applied before test
commit 986c672ed1acf34c3f7c0e4f2d6e959b8d012278
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Jun 1 17:42:22 2022 +0200

    Add support for indexing from head releases
    
    Needed since package loaders now create release objects instead
    of revision objects.

commit b88e9572f5aaee0707771f2e06b6ecb906a674c1
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Jun 1 14:42:37 2022 +0200

    Replace RevisionMetadataIndexer with DirectoryMetadataIndexer
    
    This will make it easier to support indexing from releases in the future,
    as it will remove the strong dependency on revision ids in the database
    and interfaces.
    
    The existence of the indexer/table  is mostly to deduplicate work between
    origins with the same head revision, and we do not use it outside this
    context, so this should have no impact.
    
    The DB migration works by dropping both tables and re-indexing from
    scratch; which is necessary as we need to replace revision ids with
    directory ids.

See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/252/ for more details.