
update indexer for storage 0.0.156
ClosedPublic

Authored by douardda on Oct 31 2019, 2:08 PM.

Details

Summary

This implies a refactoring of the db schema for origin_intrinsic_metadata: since
we neither have nor want numerical ids for origins, we use origin urls instead.

Note that IndexerStorage.origin_intrinsic_metadata_search_by_producer() still
has start/end arguments, which are expected to be strings, and thus uses
lexicographic comparisons between origin urls. This is far from
ideal, but a proper fix requires a (new?) endpoint that handles pagination
properly.
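
To illustrate the limitation described above, here is a minimal sketch of a start/end filter that compares origin urls as plain strings. The function `search_by_url_range`, its inclusive bounds, and the sample urls are assumptions made for illustration; this is not the actual IndexerStorage code.

```python
from typing import Iterable, List, Optional


def search_by_url_range(
    urls: Iterable[str],
    start: str = "",
    end: Optional[str] = None,
    limit: int = 100,
) -> List[str]:
    """Hypothetical stand-in for a start/end query over origin urls.

    Bounds are compared lexicographically and are inclusive on both ends
    (an assumption), mimicking a range filter designed for numerical ids.
    """
    selected = [
        url
        for url in sorted(urls)
        if url >= start and (end is None or url <= end)
    ]
    return selected[:limit]


# Made-up sample data.
origins = [
    "https://example.org/repo10",
    "https://example.org/repo2",
    "https://example.org/repo9",
]

# String order is not numeric order: "repo10" sorts before "repo2".
page = search_by_url_range(
    origins,
    start="https://example.org/repo10",
    end="https://example.org/repo2",
)
```

With string bounds, a caller cannot pick a meaningful `end` without already knowing the urls, which is one reason a dedicated pagination mechanism would be preferable.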

Depends on D2206

Diff Detail

Repository
rDCIDX Metadata indexer
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

vlorentz added a subscriber: vlorentz.
vlorentz added inline comments.
swh/indexer/storage/__init__.py
695

singular (same for other occurrences)

786

Make sure swh indexer schedule reindex_origin_metadata still works after this change.
You should also probably explain quickly in the docstring how to paginate properly now.

791–793

must be updated

swh/indexer/storage/in_memory.py
715

forgot to change id

swh/indexer/tests/storage/test_storage.py
453–456

self.origin_url_

1556–1571

that doesn't test how a consumer of the API would use it. It needs to start with start='url', then incrementally get new results using only the results from the previous call.

This revision now requires changes to proceed. Oct 31 2019, 3:35 PM

fix almost all vlorentz' comments + fix in_memory's SubStorage.get_all()

douardda added inline comments.
swh/indexer/storage/__init__.py
786

You should also probably explain quickly in the docstring how to paginate properly now.

unfortunately I have no idea how to do such a thing.

vlorentz added inline comments.
swh/indexer/storage/__init__.py
786

Using the last result of the previous call as a start. It also means that this endpoint should return only results strictly greater than start.
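
The pagination scheme suggested here could be sketched as follows. `fetch_page` is a hypothetical stand-in for the endpoint (it is not the real IndexerStorage signature), assumed to return sorted origin urls strictly greater than `start`; `iter_all_origins` shows how a consumer would chain calls starting from `start=''`.

```python
from typing import Iterable, Iterator, List


def fetch_page(all_urls: Iterable[str], start: str, limit: int) -> List[str]:
    """Hypothetical endpoint: at most `limit` urls, sorted, all strictly
    greater than `start` (so the previous page's last url is not repeated)."""
    return [url for url in sorted(all_urls) if url > start][:limit]


def iter_all_origins(all_urls: Iterable[str], limit: int = 2) -> Iterator[str]:
    """Consumer side: begin with start='' and feed the last result of each
    page back in as the next `start` until an empty page comes back."""
    urls = list(all_urls)
    start = ""
    while True:
        page = fetch_page(urls, start, limit)
        if not page:
            return
        yield from page
        start = page[-1]


# Made-up sample data.
sample_origins = [
    "https://example.org/c",
    "https://example.org/a",
    "https://example.org/d",
    "https://example.org/b",
    "https://example.org/e",
]
collected = list(iter_all_origins(sample_origins, limit=2))
```

If the endpoint used a non-strict comparison (`>=`), this loop would receive the last url of each page again at the head of the next page, which is why the strict bound matters.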

swh/indexer/tests/storage/test_storage.py
1556–1571

Sorry, I meant it should start with start=''.

This revision now requires changes to proceed. Nov 4 2019, 12:05 PM
swh/indexer/tests/storage/test_storage.py
1556–1571

I agree, but I just replicated what the tests used to do.
Once again, I believe the kind of modifications you ask for should be in a dedicated diff.

Let's not forget to fix swh.indexer.cli.list_origins_by_producer quickly, as it's broken by this change. (It should also be tested)

This revision is now accepted and ready to land. Nov 4 2019, 4:31 PM
This revision was landed with ongoing or failed builds. Nov 5 2019, 4:04 PM
This revision was automatically updated to reflect the committed changes.