
Storage: Add origin_count method
Closed, Public

Authored by anlambert on Feb 4 2019, 4:47 PM.

Details

Summary

Add a method to count the number of origins whose URLs contain a given string pattern.
It is intended for use in swh-web, to count and display the number of origins associated
with each source code provider referenced in the archive coverage list.
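
As a rough illustration of the behaviour described above, here is a minimal in-memory sketch. It is not the code from this diff: the function signature, the `regexp` flag semantics, and the sample `ORIGINS` data are assumptions made for the example.

```python
import re
from typing import Dict, Iterable


def origin_count(origins: Iterable[Dict[str, str]],
                 url_pattern: str,
                 regexp: bool = False) -> int:
    """Count origins whose URL contains url_pattern.

    Hypothetical sketch: with regexp=True the pattern is treated as a
    regular expression, otherwise as a plain substring.
    """
    if regexp:
        compiled = re.compile(url_pattern)
        return sum(1 for origin in origins if compiled.search(origin["url"]))
    return sum(1 for origin in origins if url_pattern in origin["url"])


# Made-up sample data, for illustration only.
ORIGINS = [
    {"type": "git", "url": "https://github.com/user1/repo1"},
    {"type": "git", "url": "https://github.com/user2/repo1"},
    {"type": "git", "url": "https://gitlab.com/user1/repo1"},
]

github_count = origin_count(ORIGINS, "github")
user1_count = origin_count(ORIGINS, r".*user1.*", regexp=True)
```

A coverage page could call such a method once per provider (for example with "github" or "gitlab") and display the returned counts.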

Related T1463

Diff Detail

Repository
rDSTO Storage manager
Branch
origin-count
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 4008
Build 5254: tox-on-jenkins (Jenkins)
Build 5253: arc lint + arc unit

Event Timeline

vlorentz added inline comments.
swh/storage/db.py
673–675

offset and limit should be removed from the doc.

681–695

Query building should be factorized with origin_search.
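
One way such a factorization might look, as a sketch only: a single helper builds the SQL for both searching and counting, switching the selected columns on a `count` flag. The helper name `build_origin_query`, the `ORIGIN_COLS` list, and the exact SQL are assumptions for illustration, not the code under review.

```python
from typing import Tuple

# Hypothetical column list; the real one lives on the Db class.
ORIGIN_COLS = ["id", "type", "url"]


def build_origin_query(url_pattern: str, *, count: bool = False,
                       regexp: bool = False, offset: int = 0,
                       limit: int = 50) -> Tuple[str, tuple]:
    """Build a parameterized PostgreSQL query shared by search and count."""
    if count:
        cols = "COUNT(*)"
    else:
        cols = ",".join(ORIGIN_COLS)

    query = f"SELECT {cols} FROM origin WHERE "
    if regexp:
        # Case-insensitive regular expression match.
        query += "url ~* %s"
        params = (url_pattern,)
    else:
        # Case-insensitive substring match.
        query += "url ILIKE %s"
        params = ("%" + url_pattern + "%",)

    if not count:
        # Pagination only makes sense when returning rows.
        query += " ORDER BY id OFFSET %s LIMIT %s"
        params += (offset, limit)
    return query, params
```

With this shape, the counting path simply omits the offset and limit parameters, so its documentation has no pagination arguments to describe.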

swh/storage/in_memory.py
870–873

What's the difference with the former code?

swh/storage/tests/test_storage.py
2222

Should have a test with regexp=False too

anlambert added inline comments.
swh/storage/db.py
673–675

oops, forgot these ones

681–695

Agreed, will work something out

swh/storage/in_memory.py
870–873

I wanted to remove the limit restriction so that all matching origins are counted, but I found a better way that does not
require changing that line. This change will be removed in the next update.

swh/storage/tests/test_storage.py
2222

ack

Update: Address vlorentz comments

vlorentz added inline comments.
swh/storage/db.py
641–642

nitpick: I find this to be more readable:

if count:
    origin_cols = 'COUNT(*)'
else:
    origin_cols = ','.join(self.origin_cols)

649–650

this can be a single line

651–653

so can this

This revision is now accepted and ready to land. Feb 5 2019, 4:38 PM

Update: Rebase and address latest vlorentz comments

You need to rebase on master to fix the build failure.

Oh wait, my mistake. I tagged a new version of swh.core that broke that particular test.

OK, I'll let you handle the fix; I will rebase again afterwards if needed.

This revision was automatically updated to reflect the committed changes.