Page MenuHomeSoftware Heritage

cgit: Compute origin urls out of a base git url when provided.
ClosedPublic

Authored by ardumont on Jan 29 2021, 11:41 AM.

Details

Summary

This adds a second behavior to the cgit lister to actually compute origin urls instead
of parsing them out of another http request on git detailed page (per repository listed).

This new behavior is expected to be the default behavior.

The old behavior is kept for now and is expected to be used as fallback if too much
false negatives are returned.

(This is a requisite for the git.eclipse.org instance for example which cannot be listed otherwise [1]).

Related to T2999

[1] Related to T376#57560

Diff Detail

Repository
rDLS Listers
Branch
master
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 18891
Build 29267: Phabricator diff pipeline on jenkinsJenkins console · Jenkins
Build 29266: arc lint + arc unit

Event Timeline

Build is green

Patch application report for D4968 (id=17741)

Rebasing onto 5aa7c8f2b2...

Current branch diff-target is up to date.
Changes applied before test
commit 14d47f1074ca16f1632b24fd416e86e51fee0354
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Fri Jan 29 11:38:26 2021 +0100

    cgit: Compute origin urls out of a base git url when provided.
    
    This adds a second behavior to the cgit lister to actually compute origin urls instead
    of parsing them out of another http request on git detailed page.
    
    This new behavior is expected to be the default behavior.
    
    The old behavior is kept for now and is expected to be used as fallback if too much
    false negatives are returned.
    
    Related to T2999

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/225/ for more details.

Build is green

Patch application report for D4968 (id=17742)

Rebasing onto 5aa7c8f2b2...

Current branch diff-target is up to date.
Changes applied before test
commit 9a56097e1dfb1725a48371f208e89408eaaa7049
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Fri Jan 29 11:38:26 2021 +0100

    cgit: Compute origin urls out of a base git url when provided.
    
    This adds a second behavior to the cgit lister to actually compute origin urls instead
    of parsing them out of another http request on git detailed page.
    
    This new behavior is expected to be the default behavior.
    
    The old behavior is kept for now and is expected to be used as fallback if too much
    false negatives are returned.
    
    Related to T2999

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/226/ for more details.

anlambert added a subscriber: anlambert.

The current implementation will not work for most cases where the base cgit URL and the base clone URL differs (see inline comment).

Also tasks parameters should be updated to handle the new one.

swh/lister/cgit/lister.py
101–114

The urljoin approach will not work for most of cases. For instance with that cgit instance with base URL https://forge.frm2.tum.de/cgit/cgit.cgi/ and base clone URL https://forge.frm2.tum.de/review/ , the computed clone URLs will be in the form https://forge.frm2.tum.de/cgit/cgit.cgi/<path_to_repo>.

The approach I was thinking of is the following:

repo_url = urljoin(self.url, repository_link)

if self.base_git_url:  # mapping provided, we compute the git url
    repo_url = urljoin(self.url, repository_link)
    git_url = repo_url.replace(self.url, self.base_git_url)
    logger.debug("%s => %s", repo_url, git_url)

This way it is guaranteed to compute the good clone URLs.

This revision now requires changes to proceed.Jan 29 2021, 12:26 PM
swh/lister/cgit/lister.py
101–114

oh, right.

The current implem is not enough and the test assertion needs to be improved.

Will adapt.

swh/lister/cgit/tests/test_lister.py
205

not that important but i meant it as /r here ;)

Build is green

Patch application report for D4968 (id=17743)

Rebasing onto 5aa7c8f2b2...

Current branch diff-target is up to date.
Changes applied before test
commit e22d81d78d3e7a5fa30a6b0a2d025fa2375346f7
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Fri Jan 29 11:38:26 2021 +0100

    cgit: Compute origin urls out of a base git url when provided.
    
    This adds a second behavior to the cgit lister to actually compute origin urls instead
    of parsing them out of another http request on git detailed page.
    
    This new behavior is expected to be the default behavior.
    
    The old behavior is kept for now and is expected to be used as fallback if too much
    false negatives are returned.
    
    Related to T2999

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/227/ for more details.

swh/lister/cgit/lister.py
32–34

the listed origin urls are computed out of the base git url and the ones listed in each page (resulting in less
HTTP queries than the 2nd behavior below)

104–111

You could compute repo_url before the if and drop the else block.

swh/lister/cgit/tests/test_lister.py
200

maybe you could parametrize the test with more URLs (using edge case URLs listed in T2999).

swh/lister/cgit/tests/test_lister.py
200

ok, but that means more html sample files in that diff (currenty adding one other origin)

swh/lister/cgit/tests/test_lister.py
200

Ah right, you could isolate the code to compute URLs in a method and only test that then.

Adapt according to review:

  • Add more cgit instance samples
  • Rework docstring
  • Rework url computation a bit

Rework the lister docstring to clarify the 2nd behavior.

Build is green

Patch application report for D4968 (id=17749)

Rebasing onto 4cf0c7f765...

First, rewinding head to replay your work on top of it...
Applying: cgit: Compute origin urls out of a base git url when provided.
Changes applied before test
commit 8d989d3adabc5cf16c4397d09710b24a71836e8e
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Fri Jan 29 11:38:26 2021 +0100

    cgit: Compute origin urls out of a base git url when provided.
    
    This adds a second behavior to the cgit lister to actually compute origin urls instead
    of parsing them out of another http request on git detailed page.
    
    This new behavior is expected to be the default behavior.
    
    The old behavior is kept for now and is expected to be used as fallback if too much
    false negatives are returned.
    
    Related to T2999

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/229/ for more details.

ardumont added inline comments.
swh/lister/cgit/tests/test_lister.py
200

well, it's done now ¯\_(ツ)_/¯ :)

This revision is now accepted and ready to land.Jan 29 2021, 3:31 PM

Build is green

Patch application report for D4968 (id=17750)

Rebasing onto 4cf0c7f765...

Current branch diff-target is up to date.
Changes applied before test
commit 2e220735581532f90755fc2fccefaf70a386ded0
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Fri Jan 29 11:38:26 2021 +0100

    cgit: Compute origin urls out of a base git url when provided.
    
    This adds a second behavior to the cgit lister to actually compute origin urls instead
    of parsing them out of another http request on git detailed page.
    
    This new behavior is expected to be the default behavior.
    
    The old behavior is kept for now and is expected to be used as fallback if too much
    false negatives are returned.
    
    Related to T2999

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/230/ for more details.