
github/utils: Deal with exotic urls to canonicalize
ClosedPublic

Authored by ardumont on Jun 2 2022, 3:14 PM.

Details

Summary

The sample of exotic URLs was extracted from the staging scheduler [1].

An actual run of this code against the GitHub API (without mocks) returns the correct
canonical URLs [2].

Related to T3874

[1] P1371

[2]

$ ipython
...
In [1]: from swh.core.github.utils import get_canonical_github_origin_url

In [2]: get_canonical_github_origin_url('git@github.com/huaweicloud/huaweicloud-sdk-java-v3.git')
No tokens set in configuration, using anonymous mode
Out[2]: 'https://github.com/huaweicloud/huaweicloud-sdk-java-v3'

In [3]: get_canonical_github_origin_url('git//github.com/powertac/powertac-server.git')
No tokens set in configuration, using anonymous mode
Out[3]: 'https://github.com/powertac/powertac-server'

In [4]: get_canonical_github_origin_url('https://${env.GITHUB_USER}:${env.GITHUB_TOKEN}@github.com/molgenis/vibe.git')
No tokens set in configuration, using anonymous mode
Out[4]: 'https://github.com/molgenis/vibe'

In [5]: get_canonical_github_origin_url('ssh://git@github.com/softwaremagico/ThinkMachine.git')
No tokens set in configuration, using anonymous mode
Out[5]: 'https://github.com/softwaremagico/ThinkMachine'

In [6]: get_canonical_github_origin_url('ssh://github.com:alibaba/SmartEngine.git')
No tokens set in configuration, using anonymous mode
Out[6]: 'https://github.com/alibaba/SmartEngine'

In [7]: get_canonical_github_origin_url('//github.com:networknt/light-tram-kafka.git')
No tokens set in configuration, using anonymous mode
Out[7]: 'https://github.com/networknt/light-tram-kafka'

In [8]: get_canonical_github_origin_url('[fetch=]git@github.com:turnonline/ecosystem-admin-widgets.git')
No tokens set in configuration, using anonymous mode
Out[8]: 'https://github.com/turnonline/ecosystem-admin-widgets'

In [9]: get_canonical_github_origin_url('git@github.com:ttulka/spring-boot-configuration-properties-store.git')
No tokens set in configuration, using anonymous mode
Out[9]: 'https://github.com/ttulka/spring-boot-configuration-properties-store'
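For readers who want to experiment offline, here is a minimal sketch of the string-handling half of the problem only. It is not the code from swh/core/github/utils.py: the pattern and the names `extract_user_repo` and `naive_canonical_url` are assumptions made for illustration, and unlike get_canonical_github_origin_url it never queries the GitHub API, so it cannot resolve renames or letter case.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern (an assumption, not the one used in swh.core):
# locate "github.com", accept "/" or ":" as separator, then capture
# user and repo, ignoring an optional ".git" suffix and trailing slash.
# It is deliberately loose and would also match e.g. "notgithub.com".
_GITHUB_RE = re.compile(
    r"github\.com[/:](?P<user>[^/\s]+)/(?P<repo>[^/\s]+?)(?:\.git)?/?$"
)


def extract_user_repo(url: str) -> Optional[Tuple[str, str]]:
    """Return (user, repo) found in a possibly malformed GitHub URL."""
    match = _GITHUB_RE.search(url.strip())
    if match is None:
        return None
    return match.group("user"), match.group("repo")


def naive_canonical_url(url: str) -> Optional[str]:
    """Rebuild an https URL from the extracted pair, without any API call."""
    parts = extract_user_repo(url)
    if parts is None:
        return None
    user, repo = parts
    return f"https://github.com/{user}/{repo}"


if __name__ == "__main__":
    # A few of the exotic inputs from the session above.
    for sample in (
        "git//github.com/powertac/powertac-server.git",
        "ssh://github.com:alibaba/SmartEngine.git",
        "[fetch=]git@github.com:turnonline/ecosystem-admin-widgets.git",
    ):
        print(sample, "->", naive_canonical_url(sample))
```

Searching for the host anywhere in the string, rather than anchoring on a scheme, is what lets one pattern cover prefixes such as `git//`, `[fetch=]` or embedded credentials.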

Diff Detail

Repository
rDCORE Foundations and core functionalities
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D7946 (id=28615)

Rebasing onto 4fc5f601b6...

Current branch diff-target is up to date.
Changes applied before test
commit 4a32a4610d174545eff53c6753ee8fd55060dca5
Author: Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>
Date:   Thu Jun 2 15:09:54 2022 +0200

    github/utils: Deal with exotic urls to canonicalize
    
See https://jenkins.softwareheritage.org/job/DCORE/job/tests-on-diff/439/ for more details.
ardumont added inline comments.
swh/core/github/utils.py
22

lies!

22

coverage report lies!

Build is green

Patch application report for D7946 (id=28616)

Rebasing onto 4fc5f601b6...

Current branch diff-target is up to date.
Changes applied before test
commit 7923da2b8adb36f4885acf2ba6e953d626ef4513
Author: Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>
Date:   Thu Jun 2 15:09:54 2022 +0200

    github/utils: Deal with exotic urls to canonicalize
    
See https://jenkins.softwareheritage.org/job/DCORE/job/tests-on-diff/440/ for more details.
anlambert added inline comments.
swh/core/github/utils.py
22

Just out of curiosity, did you try using urlparse from the urllib.parse Python module instead of a regexp?

>>> s = "//github.com/toto/titi"
>>> from urllib.parse import urlparse
>>> urlparse(s)
ParseResult(scheme='', netloc='github.com', path='/toto/titi', params='', query='', fragment='')

The user/repo pair should be in the path field of the ParseResult object, which looks simpler to me.

vlorentz added inline comments.
swh/core/github/tests/test_github_utils.py
20–21

it seems weird to have a function just for a docstring, but whatever

37

at this point, s/protocol/prefix/ here and in the function signature.

45

that's oddly specific, where does this example come from?

swh/core/github/tests/test_github_utils.py
20–21

i did not get it.

37

yeah, better.

45

The paste mentioned in the diff description [1] (coming out of the full maven listing).

[1] P1371

swh/core/github/utils.py
22

not tested at all, thx for the idea.

swh/core/github/tests/test_github_utils.py
45

fwiw, maven listing still ongoing

swh/core/github/utils.py
22

Unfortunately, it won't work for the other ones:

In [1]: url='git@github.com/user/repo.git'

In [2]: from urllib.parse import urlparse

In [3]: urlparse(url)
Out[3]: ParseResult(scheme='', netloc='', path='git@github.com/user/repo.git', params='', query='', fragment='')
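
To make the limitation easy to reproduce, here is a small sketch wrapping the urlparse approach in a helper. The name `github_path_via_urlparse` is hypothetical and only serves this illustration; the two inputs are the ones already shown in this thread.

```python
from typing import Optional
from urllib.parse import urlparse


def github_path_via_urlparse(url: str) -> Optional[str]:
    """Return the path only when urlparse recognises github.com as the host."""
    parsed = urlparse(url)
    if parsed.netloc != "github.com":
        # Without a leading "//", urlparse leaves netloc empty and puts
        # the whole string in the path, so the host cannot be checked.
        return None
    return parsed.path


if __name__ == "__main__":
    print(github_path_via_urlparse("//github.com/toto/titi"))
    print(github_path_via_urlparse("git@github.com/user/repo.git"))
```

The first call yields a usable path, while the second returns None because the host is never separated out, which is why a regexp was kept.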
swh/core/github/tests/test_github_utils.py
20–21

s/docstring/f-string/

Drop function for f-string, simpler indeed.

Looks good to me.

swh/core/github/utils.py
22

ack, too much garbage in maven listing ;-)

This revision is now accepted and ready to land. Jun 2 2022, 3:58 PM
swh/core/github/utils.py
22

lolsob! yeah!

Build is green

Patch application report for D7946 (id=28623)

Rebasing onto 4fc5f601b6...

Current branch diff-target is up to date.
Changes applied before test
commit e1a1d84eb4eaa73500636c1132196ab025542ea8
Author: Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>
Date:   Thu Jun 2 15:09:54 2022 +0200

    github/utils: Deal with exotic urls to canonicalize
    
See https://jenkins.softwareheritage.org/job/DCORE/job/tests-on-diff/441/ for more details.