Rewrite origin_head.py as a normal function instead of an indexer
Closed, Public

Authored by vlorentz on May 31 2022, 2:48 PM.

Details

Summary

We stopped using it as an indexer years ago, so it does not make sense
to keep this class around.

Additionally, replace the weird dict (needed by the indexer interface)
with a CoreSWHID (now possible, as it doesn't need to be JSON-like)

Diff Detail

Repository
rDCIDX Metadata indexer
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D7923 (id=28541)

Could not rebase; Attempt merge onto ecc84f3760...

Updating ecc84f3..bdec08b
Fast-forward
 swh/indexer/cli.py                    |  49 ++++--
 swh/indexer/indexer.py                |   4 -
 swh/indexer/metadata.py               |  13 +-
 swh/indexer/origin_head.py            | 243 +++++++++++----------------
 swh/indexer/tests/tasks.py            |   2 -
 swh/indexer/tests/test_cli.py         |   8 +-
 swh/indexer/tests/test_origin_head.py | 308 ++++++++++++++--------------------
 7 files changed, 268 insertions(+), 359 deletions(-)
Changes applied before test
commit bdec08b38db9295c99203fa83fe856631f02db0a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue May 31 14:47:09 2022 +0200

    Rewrite origin_head.py as a normal function instead of an indexer
    
    We stopped using it as an indexer years ago, so it does not make sense
    to keep this class around.
    
    Additionally, replace the weird dict (needed by the indexer interface)
    with a CoreSWHID (now possible, as it doesn't need to be JSON-like)

commit f8f96257547cfd7a78c21df08ab83193ffbd9bf7
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue May 31 14:27:48 2022 +0200

    Convert test_origin_head from unittest to pytest
    
    A future commit will significantly change test initialization,
    and using fixtures simplifies this.

commit ff728e0541b36edb583f8fbad979e1f58a51d588
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue May 31 12:25:30 2022 +0200

    cli: Add support for running "all" indexers in the journal client
    
    There is only the origin-intrinsic-metadata indexer for now,
    but others will be added in the future

See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/236/ for more details.

ardumont added a subscriber: ardumont.

lgtm

one suggestion inline.

swh/indexer/origin_head.py
26–29

I'd name it _get_fn, since it's short and keeps the line within the 88-character limit, but whatever ;)

Or, even better, use a hashmap from key to callable, defaulting to _try_get_head_generic if the key is not found.
That would be more extensible if a new visit status type becomes available...
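
For illustration, the hashmap suggestion could look roughly like the sketch below. Only the name _try_get_head_generic comes from this thread. The helper bodies, the _try_get_tarball_head function, the _HEAD_GETTERS table, the "tar" key, and the get_head signature are all assumptions made up for the example, not the actual code in origin_head.py.

```python
from typing import Callable, Dict, Optional

# Hypothetical simplification: a snapshot is modelled as a plain mapping
# from branch name to target id. The real code works on swh.model objects.
Branches = Dict[bytes, bytes]


def _try_get_head_generic(branches: Branches) -> Optional[bytes]:
    """Fallback getter (body is an assumption): try well-known branch names."""
    for name in (b"HEAD", b"master"):
        if name in branches:
            return branches[name]
    return None


def _try_get_tarball_head(branches: Branches) -> Optional[bytes]:
    """Hypothetical specialised getter: pick the greatest branch name."""
    if not branches:
        return None
    return branches[max(branches)]


# Dispatch table from visit type to getter. Supporting a new visit type
# means adding one entry here instead of another if/elif branch.
_HEAD_GETTERS: Dict[str, Callable[[Branches], Optional[bytes]]] = {
    "tar": _try_get_tarball_head,
}


def get_head(visit_type: str, branches: Branches) -> Optional[bytes]:
    """Look up the getter for visit_type, defaulting to the generic one."""
    getter = _HEAD_GETTERS.get(visit_type, _try_get_head_generic)
    return getter(branches)
```

With this shape, an unknown visit type silently falls through to the generic getter, which is the "defaulting" behaviour described in the comment.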

This revision is now accepted and ready to land.
May 31 2022, 3:04 PM
swh/indexer/origin_head.py
26–29

it used to be that way, but it wasn't useful in practice