Storage: Add origin_count method
ClosedPublic

Authored by anlambert on Feb 4 2019, 4:47 PM.

Details

Summary

Add a method to count the number of origins whose URLs contain a given string pattern.
It is intended for use in swh-web, to count and display the number of origins associated
with each source code provider referenced in the archive coverage list.
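
As a rough illustration of the behaviour described above, here is a minimal in-memory sketch. The function name `count_origins_in_memory`, the `regexp` flag semantics and the sample URLs are assumptions made for this example and are not taken from the diff itself.

```python
import re
from typing import Iterable


def count_origins_in_memory(
    origins: Iterable[dict], url_pattern: str, regexp: bool = False
) -> int:
    """Count origins whose 'url' field matches url_pattern.

    Hypothetical helper: with regexp=False the pattern is treated as a
    plain substring, with regexp=True as a regular expression.
    """
    if regexp:
        compiled = re.compile(url_pattern)
        return sum(1 for origin in origins if compiled.search(origin["url"]))
    return sum(1 for origin in origins if url_pattern in origin["url"])


# Made-up sample data, for illustration only.
ORIGINS = [
    {"url": "https://github.com/user1/repo1"},
    {"url": "https://github.com/user2/repo1"},
    {"url": "https://gitlab.com/user1/repo1"},
]

github_count = count_origins_in_memory(ORIGINS, "github")
user1_count = count_origins_in_memory(ORIGINS, ".*user1.*", regexp=True)
```

With this sample data, a per-provider count such as `github_count` is the kind of figure the coverage list would display.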

Related T1463

Diff Detail

Repository
rDSTO Storage manager
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

vlorentz added inline comments.
swh/storage/db.py
687–689

offset and limit should be removed from the doc.

695–709

Query building should be factored out and shared with origin_search.

swh/storage/in_memory.py
900–901

What's the difference with the former code?

swh/storage/tests/test_storage.py
2255

Should have a test with regexp=False too

anlambert added inline comments.
swh/storage/db.py
687–689

oops, forgot these ones

695–709

Agreed, will work something out

swh/storage/in_memory.py
900–901

I wanted to remove the limit restriction in order to count all found origins, but I found a better way that does not
require changing that line. This change will be removed in the next update.

swh/storage/tests/test_storage.py
2255

ack

Update: Address vlorentz comments

vlorentz added inline comments.
swh/storage/db.py
630–636

nitpick: I find this to be more readable:

if count:
    origin_cols = 'COUNT(*)'
else:
    origin_cols = ','.join(self.origin_cols)

642–643

this can be a single line

643–645

so can this
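
To make the review comments above concrete, here is a sketch of how a single query builder could serve both the search and the count, with a `count` flag selecting `COUNT(*)` instead of the column list. The function name `build_origin_query`, the column list, the table name and the PostgreSQL operators (`ILIKE`, `~*`) are assumptions for this example; the actual code in swh/storage/db.py may differ.

```python
# Hypothetical column list; the real one lives on the Db class.
ORIGIN_COLS = ["id", "type", "url"]


def build_origin_query(url_pattern, regexp=False, count=False, offset=0, limit=50):
    """Build a (query, params) pair for searching or counting origins.

    The placeholders use the %s style expected by psycopg2-like drivers.
    """
    if count:
        origin_cols = "COUNT(*)"
    else:
        origin_cols = ", ".join(ORIGIN_COLS)

    query = f"SELECT {origin_cols} FROM origin WHERE "
    if regexp:
        # Case-insensitive regular expression match (assumed operator).
        query += "url ~* %s"
        params = [url_pattern]
    else:
        # Case-insensitive substring match; note that % and _ inside
        # url_pattern would act as wildcards in this simple sketch.
        query += "url ILIKE %s"
        params = ["%" + url_pattern + "%"]

    if not count:
        # Pagination only makes sense when rows are returned.
        query += " ORDER BY id OFFSET %s LIMIT %s"
        params += [offset, limit]

    return query, tuple(params)


count_query, count_params = build_origin_query("github", count=True)
search_query, search_params = build_origin_query("github")
```

Keeping the pattern-matching clause in one place means the search and the count cannot drift apart, which is the point of sharing the query building.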

This revision is now accepted and ready to land. Feb 5 2019, 4:38 PM

Update: Rebase and address latest vlorentz comments

You need to rebase on master to fix the build failure.

Oh wait, my mistake. I tagged a new version of swh.core that broke that particular test.

OK, I'll let you handle the fix; I will rebase again afterwards if needed.

This revision was automatically updated to reflect the committed changes.