Page MenuHomeSoftware Heritage

common/archive: Add branch names filtering support in lookup_snapshot
ClosedPublic

Authored by anlambert on Nov 26 2020, 6:18 PM.

Details

Summary

Enable to filter returned branches according to their name in lookup_snapshot.

Using the optional branch_name_includes parameter will only return branches
whose names contain a substring in the provided list.

Using the optional branch_name_excludes parameter will not return branches
whose names contain a substring in the provided list.

This will enable to filter out bogus pull requests branches but also to implement
a search feature in branches/releases list view.

Build will fail as it depends on newly added filtering feature in swh-storage.

Related to T2782

Depends on D5208

Diff Detail

Repository
rDWAPPS Web applications
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build has FAILED

Patch application report for D4616 (id=16364)

Rebasing onto 8492a4c688...

Current branch diff-target is up to date.
Changes applied before test
commit 676cbce4a7e8bb31bf0a70e950e910f440ac5b9e
Author: Antoine Lambert <antoine.lambert@inria.fr>
Date:   Thu Nov 26 17:54:55 2020 +0100

    common/archive: Add branch names filtering support in lookup_snapshot
    
    Enable to filter returned branches according to their name in lookup_snapshot.
    
    Using the optional branch_name_includes parameter will only return branches
    whose names contain a substring in the provided list.
    
    Using the optional branch_name_excludes parameter will not return branches
    whose names contain a substring in the provided list.
    
    Related to T2782

Link to build: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/501/
See console output for more information: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/501/console

Rebase and adapt according to changes in D4615.

Build has FAILED

Patch application report for D4616 (id=18569)

Rebasing onto a7e8c16a34...

Current branch diff-target is up to date.
Changes applied before test
commit 21f2c983d78cecd7c1ec0e4b9772428e8b25c826
Author: Antoine Lambert <antoine.lambert@inria.fr>
Date:   Thu Nov 26 17:54:55 2020 +0100

    common/archive: Add branch names filtering support in lookup_snapshot
    
    Enable to filter returned branches according to their name in
    lookup_snapshot.
    
    Using the optional branch_name_include_pattern parameter will
    only return branches whose name contains the provided pattern.
    
    Using the optional branch_name_exclude_pattern parameter will
    not return branches whose name contains the provided pattern.
    
    Related to T2782

Link to build: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/606/
See console output for more information: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/606/console

Simplify in memory implementation.

Build is green

Patch application report for D4616 (id=18659)

Could not rebase; Attempt merge onto 88ff2c2fa0...

Updating 88ff2c2f..2d8a6484
Fast-forward
 sql/upgrades/168.sql               |  39 +++++++++++
 swh/storage/cassandra/cql.py       |  31 ++++++++-
 swh/storage/cassandra/storage.py   |  27 +++++++-
 swh/storage/in_memory.py           |   8 ++-
 swh/storage/interface.py           |  10 ++-
 swh/storage/postgresql/db.py       |  28 ++++++--
 swh/storage/postgresql/storage.py  |  19 +++++-
 swh/storage/sql/30-schema.sql      |   2 +-
 swh/storage/sql/40-funcs.sql       |  12 +++-
 swh/storage/tests/storage_tests.py | 128 +++++++++++++++++++++++++++++++++++++
 10 files changed, 284 insertions(+), 20 deletions(-)
 create mode 100644 sql/upgrades/168.sql
Changes applied before test
commit 2d8a648438a9540e91ac7b637f7eb7a26c0fd091
Author: Antoine Lambert <antoine.lambert@inria.fr>
Date:   Fri Mar 5 16:33:29 2021 +0100

    storage: Allow to filter out branches by prefix when counting them
    
    Add an optional branch_name_exclude_prefix parameter to the
    snapshot_count_branches method of the Storage interface.
    
    It enables to filter out branches whose name starts with a
    given prefix when counting.
    
    The purpose is to get accurate counters in swh-web as pull
    request branches will be filtered out by default.
    
    Related to T2782

commit 2f1fdfe322435b59d08bd56e4a521d01caddacd6
Author: Antoine Lambert <antoine.lambert@inria.fr>
Date:   Tue Mar 2 14:42:57 2021 +0100

    storage: Add branch names filtering support in snapshot_get_branches
    
    Add optional branch_name_include_pattern parameter to snapshot_get_branches,
    if provided only branches whose name contains the given pattern will be
    returned.
    
    Add optional branch_name_exclude_prefix parameter to snapshot_get_branches,
    if provided branches whose name starts with the given prefix will not be
    returned.
    
    Purpose of these new features: add a search form in the branches view
    of swh-web and filter out pull request branches (whose names start with
    "refs/pull/") by default.
    
    Related to T2782

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1189/ for more details.

Build has FAILED

Patch application report for D4616 (id=18661)

Rebasing onto 93b1c0ba9a...

Current branch diff-target is up to date.
Changes applied before test
commit b9e717f23a5edd1cc1d014972a98878174a642d7
Author: Antoine Lambert <antoine.lambert@inria.fr>
Date:   Thu Nov 26 17:54:55 2020 +0100

    common/archive: Add branch names filtering support in lookup_snapshot
    
    Enable to filter returned branches according to their name in
    lookup_snapshot and lookup_snapshot_sizes.
    
    Using the optional branch_name_include_pattern parameter will
    only return branches whose name contains the provided pattern.
    
    Using the optional branch_name_exclude_prefix parameter will
    not return branches whose name starts with the provided prefix.
    
    Related to T2782

Link to build: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/613/
See console output for more information: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/613/console

Rebase and synchronize with latest changes in D4615 and D5208.

Build has FAILED

Patch application report for D4616 (id=18679)

Rebasing onto 63d44b561a...

Current branch diff-target is up to date.
Changes applied before test
commit 5cbd80f0e3df9beb818703e71c932045740d09d4
Author: Antoine Lambert <antoine.lambert@inria.fr>
Date:   Thu Nov 26 17:54:55 2020 +0100

    common/archive: Add branch names filtering support in lookup_snapshot
    
    Enable to filter returned branches according to their name in
    lookup_snapshot and lookup_snapshot_sizes.
    
    Using the optional branch_name_include_pattern parameter will
    only return branches whose name contains the provided pattern.
    
    Using the optional branch_name_exclude_prefix parameter will
    not return branches whose name starts with the provided prefix.
    
    Related to T2782

Link to build: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/616/
See console output for more information: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/616/console

How does swh-web deal with non-UTF8 branch names in general?

swh/web/common/archive.py
1073

How does swh-web deal with non-UTF8 branch names in general?

Do we have such branch names ?

How does swh-web deal with non-UTF8 branch names in general?

Do we have such branch names ?

The answer is yes:

softwareheritage=> select convert_from(name, 'utf-8') from snapshot_branch;
ERROR:  invalid byte sequence for encoding "UTF8": 0xc2 0x30
softwareheritage=>

How does swh-web deal with non-UTF8 branch names in general?

To answer your question, quoting the Web API doc: A string that could not be decoded will have the bytes of its invalid UTF-8 sequences escaped as \\x<hex value>.
See related conversion code.

Rebase and fix typo in docstring

Build has FAILED

Patch application report for D4616 (id=18723)

Rebasing onto 20da34e958...

Current branch diff-target is up to date.
Changes applied before test
commit cdd9315acd01dfb5eb2bda8118ac7c591446af34
Author: Antoine Lambert <antoine.lambert@inria.fr>
Date:   Thu Nov 26 17:54:55 2020 +0100

    common/archive: Add branch names filtering support in lookup_snapshot
    
    Enable to filter returned branches according to their name in
    lookup_snapshot and lookup_snapshot_sizes.
    
    Using the optional branch_name_include_pattern parameter will
    only return branches whose name contains the provided pattern.
    
    Using the optional branch_name_exclude_prefix parameter will
    not return branches whose name starts with the provided prefix.
    
    Related to T2782

Link to build: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/620/
See console output for more information: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/620/console

douardda added a subscriber: douardda.

LGTM, but I would have loved to see a test dedicated to branches (and filters) with non-ascii chars (ach time I see a <str>.encode() I expect the worst... )

This revision is now accepted and ready to land.Mar 18 2021, 12:14 PM

Update:

  • Rebase
  • Synchronize with latest changes in storage API
  • Update test with branch name containing non ASCII characters

Build is green

Patch application report for D4616 (id=18905)

Rebasing onto 26e81f330e...

Current branch diff-target is up to date.
Changes applied before test
commit 5e00370682fcb15ca371aab8ae6e25b7806cbffe
Author: Antoine Lambert <antoine.lambert@inria.fr>
Date:   Thu Nov 26 17:54:55 2020 +0100

    common/archive: Add branch names filtering support in lookup_snapshot
    
    Enable to filter returned branches according to their name in
    lookup_snapshot and lookup_snapshot_sizes.
    
    Using the optional branch_name_include_substring parameter will
    only return branches whose name contains the provided substring.
    
    Using the optional branch_name_exclude_prefix parameter will
    not return branches whose name starts with the provided prefix.
    
    Related to T2782

See https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/626/ for more details.