Page MenuHomeSoftware Heritage

Allow filtering extids per extid_version/extid_type when reading
ClosedPublic

Authored by ardumont on Sep 13 2021, 5:15 PM.

Details

Summary

This impacts both the extid_get_from_extid and extid_get_from_target endpoints.

Whe extid_version/extid_type are not provided, this keeps the existing behavior of
returning all extids matching.

Related to T3567

Test Plan

tox happy:

1087 passed, 37 skipped, 3 xfailed, 9 warnings in 189.22s (0:03:09)

Diff Detail

Repository
rDSTO Storage manager
Branch
master
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 23601
Build 36825: Phabricator diff pipeline on jenkinsJenkins console · Jenkins
Build 36824: arc lint + arc unit

Event Timeline

Build is green

Patch application report for D6249 (id=22628)

Rebasing onto 8e94afaa0e...

Current branch diff-target is up to date.
Changes applied before test
commit 503cdbac673603e8d3b543f60c10300a385d9409
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Mon Sep 13 14:25:34 2021 +0200

    Allow filtering extids per extid_version when reading
    
    This impacts both the `extid_get_from_extid` and `extid_get_from_target` endpoints.
    
    If the version is not provided, this keeps the existing behavior of returning all extids
    matching.
    
    Related to T3567

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1404/ for more details.

cassandra implementation for extid_get_from_target needs to be changed to actually allow filtering on both extid_type and extid_version.

ardumont added subscribers: vlorentz, olasd.

Add extid_get_from_target to filter correctly on extid_type and extid_version.

As far as I understood @vlorentz and @olasd, the index part both in cassandra and
postgres needs some update. It's not done yet.

The cassandra part is fuzzy to me though.

ardumont retitled this revision from Allow filtering extids per extid_version when reading to Allow filtering extids per extid_version/extid_type when reading.Sep 14 2021, 6:50 PM
ardumont edited the summary of this revision. (Show Details)

Build has FAILED

Patch application report for D6249 (id=22666)

Rebasing onto 589d20ed64...

First, rewinding head to replay your work on top of it...
Applying: Allow filtering extids per extid_version/extid_type when reading
Changes applied before test
commit e1a46bf93d939b834863b7d263ff0efc6ff4495d
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Mon Sep 13 14:25:34 2021 +0200

    Allow filtering extids per extid_version/extid_type when reading
    
    This impacts both the `extid_get_from_extid` and `extid_get_from_target` endpoints.
    
    Whe extid_version/extid_type are not provided, this keeps the existing behavior of
    returning all extids matching.
    
    Related to T3567

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1405/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1405/console

Build has FAILED

Patch application report for D6249 (id=22667)

Rebasing onto 589d20ed64...

First, rewinding head to replay your work on top of it...
Applying: Allow filtering extids per extid_version/extid_type when reading
Changes applied before test
commit 6faab80522982ce96b2cdedd774866fbb7dab62b
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Mon Sep 13 14:25:34 2021 +0200

    Allow filtering extids per extid_version/extid_type when reading
    
    This impacts both the `extid_get_from_extid` and `extid_get_from_target` endpoints.
    
    Whe extid_version/extid_type are not provided, this keeps the existing behavior of
    returning all extids matching.
    
    Related to T3567

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1406/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1406/console

Forgot to catch the RemoteException case for the edge cases of the endpoint adaptation

Build is green

Patch application report for D6249 (id=22670)

Rebasing onto 589d20ed64...

First, rewinding head to replay your work on top of it...
Applying: Allow filtering extids per extid_version/extid_type when reading
Changes applied before test
commit a81a62bfa5fd19ff63dbb3573d64bbf617e53eb2
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Mon Sep 13 14:25:34 2021 +0200

    Allow filtering extids per extid_version/extid_type when reading
    
    This impacts both the `extid_get_from_extid` and `extid_get_from_target` endpoints.
    
    Whe extid_version/extid_type are not provided, this keeps the existing behavior of
    returning all extids matching.
    
    Related to T3567

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1407/ for more details.

Drop the extra index and use the optimization started in D6263 (it works now)

Close D6263

Build was aborted

Patch application report for D6249 (id=22693)

Rebasing onto 589d20ed64...

First, rewinding head to replay your work on top of it...
Applying: Allow filtering extids per extid_version/extid_type when reading
Changes applied before test
commit 9e0600980c24cb63dd14a427279668cd59cbe72b
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Mon Sep 13 14:25:34 2021 +0200

    Allow filtering extids per extid_version/extid_type when reading
    
    This impacts both the `extid_get_from_extid` and `extid_get_from_target` endpoints.
    
    Whe extid_version/extid_type are not provided, this keeps the existing behavior of
    returning all extids matching.
    
    Related to T3567

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1409/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1409/console

vlorentz added inline comments.
swh/storage/cassandra/cql.py
1338–1340

actually it does not, so the error is legitimate

You can rephrase the comment like this:

Rows are partitioned by token(extid_type, extid), then ordered (aka. "clustered") by (extid_type, extid, extid_version, ...).
This means that, without knowing the exact extid_type and extid, we need to scan the whole partition; which should be reasonably small.
We can change the schema later if this becomes an issue

swh/storage/in_memory.py
733–759

what about splitting this into two functions (one for each case), to make it more readable?

This revision now requires changes to proceed.Sep 15 2021, 4:03 PM
ardumont added inline comments.
swh/storage/cassandra/cql.py
1338–1340

Thanks that's way better.

swh/storage/in_memory.py
733–759

sure

ardumont marked 2 inline comments as done.

Adapt according to suggestions.

This revision is now accepted and ready to land.Sep 15 2021, 5:23 PM

Build is green

Patch application report for D6249 (id=22718)

Rebasing onto 589d20ed64...

First, rewinding head to replay your work on top of it...
Applying: Allow filtering extids per extid_version/extid_type when reading
Changes applied before test
commit ba28141eae5fc76553cf4fdec51d1c1f7b72e6be
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Mon Sep 13 14:25:34 2021 +0200

    Allow filtering extids per extid_version/extid_type when reading
    
    This impacts both the `extid_get_from_extid` and `extid_get_from_target` endpoints.
    
    Whe extid_version/extid_type are not provided, this keeps the existing behavior of
    returning all extids matching.
    
    Related to T3567

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1412/ for more details.

swh/storage/postgresql/db.py
897–905

Please don't mix f-strings and %-formatting

Don't mix f-string and %-formatted string. I used the f-string instead, clearer to read.

swh/storage/postgresql/db.py
897–905

ack

Don't mix f-string and %-format strings (i missed one)

Build is green

Patch application report for D6249 (id=22721)

Rebasing onto 589d20ed64...

First, rewinding head to replay your work on top of it...
Applying: Allow filtering extids per extid_version/extid_type when reading
Changes applied before test
commit 3c6589d2f0149a7187807c2e583d3cbbe3acb894
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Mon Sep 13 14:25:34 2021 +0200

    Allow filtering extids per extid_version/extid_type when reading
    
    This impacts both the `extid_get_from_extid` and `extid_get_from_target` endpoints.
    
    Whe extid_version/extid_type are not provided, this keeps the existing behavior of
    returning all extids matching.
    
    Related to T3567

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1414/ for more details.

Build was aborted

Patch application report for D6249 (id=22720)

Rebasing onto 589d20ed64...

First, rewinding head to replay your work on top of it...
Applying: Allow filtering extids per extid_version/extid_type when reading
Changes applied before test
commit 25f3b5e91431682965058f65f4b4ec4042be58f1
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Mon Sep 13 14:25:34 2021 +0200

    Allow filtering extids per extid_version/extid_type when reading
    
    This impacts both the `extid_get_from_extid` and `extid_get_from_target` endpoints.
    
    Whe extid_version/extid_type are not provided, this keeps the existing behavior of
    returning all extids matching.
    
    Related to T3567

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1413/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1413/console

Build is green

Patch application report for D6249 (id=22724)

Rebasing onto 589d20ed64...

Current branch diff-target is up to date.
Changes applied before test
commit a9fde72c48162a6710add58355f2802f88c907f0
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Mon Sep 13 14:25:34 2021 +0200

    Allow filtering extids per extid_version/extid_type when reading
    
    This impacts both the `extid_get_from_extid` and `extid_get_from_target` endpoints.
    
    Whe extid_version/extid_type are not provided, this keeps the existing behavior of
    returning all extids matching.
    
    Related to T3567

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1415/ for more details.