Page MenuHomeSoftware Heritage

Golang module loader
ClosedPublic

Authored by Alphare on Aug 23 2022, 4:13 PM.

Details

Reviewers
anlambert
Group Reviewers
Reviewers
Maniphest Tasks
T4124: Golang support
Commits
rDLDBASE68e68e3f92c0: Golang module loader

Diff Detail

Repository
rDLDBASE Generic VCS/Package Loader
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D8296 (id=29954)

Rebasing onto 9f8bdceed1...

Current branch diff-target is up to date.
Changes applied before test
commit 3cbc9ce198d3006883d2961ea57ad4447c51205f
Author: Raphaël Gomès <rgomes@octobus.net>
Date:   Thu Mar 17 18:19:01 2022 +0100

    Non-incremental Golang module loader

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/832/ for more details.

Do we really need to use the Go proxy? Can't/Shouldn't we access the real origins directly?

swh/loader/package/golang/loader.py
58–61

This shouldn't be called more than once, and cached_method is not available in Python 3.7 (which we still use in production)

79–92
swh/loader/package/golang/loader.py
58–61

Oops, I confused cached_method with functools.cached_property. So it is indeed available.

Is it really useful, though?

Do we really need to use the Go proxy? Can't/Shouldn't we access the real origins directly?

We can access the real origins directly, the Go proxy is simply what go get uses, so it's the expected interface to download them. I also suspect its cache will offer better speed than going directly to the origins.

swh/loader/package/golang/loader.py
58–61

It was useful in an earlier version, but it's not now indeed. :)

79–92

Done.

Remove useless cached_property and document releases

Do we really need to use the Go proxy? Can't/Shouldn't we access the real origins directly?

We can access the real origins directly, the Go proxy is simply what go get uses, so it's the expected interface to download them. I also suspect its cache will offer better speed than going directly to the origins.

Hmm... okay then. I guess for reproducibility it's more useful to get the same thing go get than the backend. What do origin URLs look like, though?

Build is green

Patch application report for D8296 (id=29984)

Rebasing onto 9f8bdceed1...

Current branch diff-target is up to date.
Changes applied before test
commit 3f83aa14db6fbe86b6161020288c5482dcd1040e
Author: Raphaël Gomès <rgomes@octobus.net>
Date:   Thu Mar 17 18:19:01 2022 +0100

    Non-incremental Golang module loader

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/834/ for more details.

docs/package-loader-specifications.rst
81

Can there be more than one file per version? If not, you should only use version.

(And if it's possible but rare, use version for versions with a single file, and fallback to release_name(version, filename), that's what the pypi loader does)

Fix release names and update description

Hmm... okay then. I guess for reproducibility it's more useful to get the same thing go get than the backend. What do origin URLs look like, though?

They look like collectd.org/@v/v0.3.0.zip or github.com/blang/semver/@v/v3.5.1+incompatible.zip or golang.org/x/tools/@v/v0.0.0-20181213190329-bbccd8cae4a9.zip.

docs/package-loader-specifications.rst
81

Sure, let's just use version then

Build is green

Patch application report for D8296 (id=29986)

Rebasing onto 9f8bdceed1...

Current branch diff-target is up to date.
Changes applied before test
commit fb2d9e8e6ad8fa9b097dc2703980db5ad45a8f47
Author: Raphaël Gomès <rgomes@octobus.net>
Date:   Thu Mar 17 18:19:01 2022 +0100

    Golang module loader
    
    There will not be an incremental loader since versions are just zip
    files.

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/835/ for more details.

Alphare retitled this revision from Non-incremental Golang module loader to Golang module loader.Aug 24 2022, 4:11 PM
Alphare edited the summary of this revision. (Show Details)

Hmm... okay then. I guess for reproducibility it's more useful to get the same thing go get than the backend. What do origin URLs look like, though?

They look like collectd.org/@v/v0.3.0.zip or github.com/blang/semver/@v/v3.5.1+incompatible.zip or golang.org/x/tools/@v/v0.0.0-20181213190329-bbccd8cae4a9.zip.

That's not good; there shouldn't be one origin per version. Instead, there should be one origin per package, and each version should be a release branch of its snapshot. (If possible, breaking versions should also be the same origin)

Also, they should be URLs, but the Golang loader shouldn't load https://github.com/blang/semver because 1. it would be unexpected to have non-Git visits there 2. it breaks provenance tracking to load through a third-party. So I think all origins should be prefixed by https://proxy.golang.org/

Reflect the changes from the lister

Build is green

Patch application report for D8296 (id=30000)

Rebasing onto 9f8bdceed1...

Current branch diff-target is up to date.
Changes applied before test
commit 3a172bd3a483756a6038c23589ddbe9600221b0c
Author: Raphaël Gomès <rgomes@octobus.net>
Date:   Thu Mar 17 18:19:01 2022 +0100

    Golang module loader

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/837/ for more details.

Build is green

Patch application report for D8296 (id=30021)

Rebasing onto 9f8bdceed1...

Current branch diff-target is up to date.
Changes applied before test
commit 0217009aaba1022b73f622118de159215a61d16e
Author: Raphaël Gomès <rgomes@octobus.net>
Date:   Thu Mar 17 18:19:01 2022 +0100

    Golang module loader
    
    This uses the Golang proxy since that's what `go get` uses, and since it
    probably offer better performance than most direct sources.

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/838/ for more details.

anlambert added a subscriber: anlambert.

I found another issue when trying to run the loader in our docker env, see inline comment.

swh/loader/package/golang/tasks.py
12–15

I have the following error when calling the loader in our docker env:

docker-swh-loader-1  | [2022-08-25 13:58:08,026: INFO/MainProcess] Task swh.loader.package.golang.tasks.LoadGolang[71ae13ba-94cb-4e49-82b6-8b75eee39829] received
docker-swh-loader-1  | [2022-08-25 13:58:08,029: ERROR/ForkPoolWorker-10384] Task swh.loader.package.golang.tasks.LoadGolang[71ae13ba-94cb-4e49-82b6-8b75eee39829] raised unexpected: TypeError("load_golang() got an unexpected keyword argument 'lister_name'")
docker-swh-loader-1  | Traceback (most recent call last):
docker-swh-loader-1  |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/celery/app/trace.py", line 451, in trace_task
docker-swh-loader-1  |     R = retval = fun(*args, **kwargs)
docker-swh-loader-1  |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/scheduler/task.py", line 61, in __call__
docker-swh-loader-1  |     result = super().__call__(*args, **kwargs)
docker-swh-loader-1  |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/celery/app/trace.py", line 734, in __protected_call__
docker-swh-loader-1  |     return self.run(*args, **kwargs)
docker-swh-loader-1  | TypeError: load_golang() got an unexpected keyword argument 'lister_name'

Use this instead to fix that issue:

@shared_task(name=__name__ + ".LoadGolang")
def load_golang(**kwargs):
    """Load Golang module"""
    loader = GolangLoader.from_configfile(**kwargs)
    return loader.load()
This revision now requires changes to proceed.Aug 25 2022, 4:34 PM

Build is green

Patch application report for D8296 (id=30082)

Rebasing onto 3ff24fb3b8...

First, rewinding head to replay your work on top of it...
Applying: Golang module loader
Changes applied before test
commit bddc082454ebadd32b3a966d57ad14bc4c228627
Author: Raphaël Gomès <rgomes@octobus.net>
Date:   Thu Mar 17 18:19:01 2022 +0100

    Golang module loader
    
    This uses the Golang proxy since that's what `go get` uses, and since it
    probably offer better performance than most direct sources.

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/846/ for more details.

This revision is now accepted and ready to land.Aug 30 2022, 12:31 PM
This revision was landed with ongoing or failed builds.Aug 30 2022, 2:35 PM
This revision was automatically updated to reflect the committed changes.

Build is green

Patch application report for D8296 (id=30136)

Rebasing onto 68e68e3f92...

First, rewinding head to replay your work on top of it...
Fast-forwarded diff-target to base-revision-848-D8296.
Changes applied before test

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/848/ for more details.