gitlab: allow ignoring projects with certain path prefixes
Closed, Public

Authored by olasd on Dec 5 2022, 3:40 PM.

Details

Summary

Some GitLab instances use specific namespaces for transient repositories
that are not worth archiving. For example, gitlab.com has a set of QA
namespaces used for integration testing of its production deployments,
and drupal has an issues/ namespace containing forks of repos that are
only used for collaboration on merge requests.
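
As a rough illustration of the idea (not the lister code from this diff), prefix-based ignoring can be sketched as below. The function name `is_ignored`, its parameters, and the sample project paths are hypothetical; only the notion of skipping projects by path prefix comes from this revision.

```python
from typing import Iterable


def is_ignored(project_path: str, ignored_prefixes: Iterable[str]) -> bool:
    """Return True if the project path starts with any ignored prefix."""
    # str.startswith accepts a tuple of candidate prefixes;
    # an empty tuple never matches, so nothing is ignored by default.
    return project_path.startswith(tuple(ignored_prefixes))


# Hypothetical project paths, in "namespace/project" form.
projects = ["issues/drupal", "project/drupal", "issues/views"]

# Keep only the projects outside the ignored "issues/" namespace.
kept = [p for p in projects if not is_ignored(p, ["issues/"])]
print(kept)
```

With an empty prefix list nothing is filtered, so under this sketch the behaviour would stay unchanged for instances that do not configure any prefixes.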

Related to T4680

Test Plan

Added a test for the new flag.

Diff Detail

Repository
rDLS Listers
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D8918 (id=32131)

Rebasing onto 64267f8f50...

Current branch diff-target is up to date.
Changes applied before test
commit 5ea79ee3e0deac0e83d2cb2ebb802181d6a5ed20
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Mon Dec 5 15:36:40 2022 +0100

    gitlab: allow ignoring projects with certain path prefixes
    
    Some GitLab instances use specific namespaces for transient repositories
    that it doesn't make sense to archive (for example, gitlab.org has a set
    of QA namespaces used for integration testing of their production
    deployments; drupal has an `issues/` namespace with forks of repos that
    are only used for collaboration on merge requests, and aren't that
    useful to be archived).

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/873/ for more details.

olasd requested review of this revision. Dec 5 2022, 3:45 PM

Could you add this check?

for prefix in ignored_project_prefixes:
    if not prefix.endswith("/"):
        raise ValueError("Prefixes must end with /")

Otherwise we are going to forget to add slashes at the end, and it will match too many namespaces.

I'd rather not, as I want to explicitly allow dropping a whole set of namespaces, e.g. gitlab.com's gitlab-qa-sandbox-group-*. There's apparently only a handful yet, but I don't really want us to be chasing them either.
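
The trade-off in this exchange can be sketched as follows. This is only an illustration under stated assumptions: the helper `matches` and all sample paths are hypothetical, and the only name taken from the discussion is the `gitlab-qa-sandbox-group-` prefix.

```python
def matches(project_path: str, prefix: str) -> bool:
    """Plain string-prefix match, as assumed for this sketch."""
    return project_path.startswith(prefix)


# The reviewer's concern: without a trailing slash, a prefix meant for one
# namespace also matches unrelated namespaces that share the same start.
overmatch_without_slash = matches("issues-tracker/app", "issues")
overmatch_with_slash = matches("issues-tracker/app", "issues/")

# The author's use case: one slash-less prefix covers a whole family of
# numbered namespaces, which a mandatory trailing slash would forbid.
family_prefix = "gitlab-qa-sandbox-group-"
paths = [
    "gitlab-qa-sandbox-group-1/app",  # hypothetical
    "gitlab-qa-sandbox-group-2/app",  # hypothetical
    "some-group/some-project",        # hypothetical
]
dropped = [p for p in paths if matches(p, family_prefix)]
print(overmatch_without_slash, overmatch_with_slash, dropped)
```

Under these assumptions, the proposed `endswith("/")` check would reject `family_prefix`, which is why leaving it out keeps the family-wide exclusion possible at the cost of relying on whoever writes the configuration to add slashes where they are intended.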

This revision is now accepted and ready to land. Dec 5 2022, 5:31 PM