Page MenuHomeSoftware Heritage

add opam loader
ClosedPublic

Authored by aleo on Jul 6 2021, 6:38 PM.

Details

Reviewers
ardumont
vlorentz
Group Reviewers
Reviewers
Maniphest Tasks
T3425: Opam loader
Commits
rDLDBASE1602e9913b11: add opam loader
Summary

added an opam loader

Related to T3425

Test Plan

will add tests later using a local opam repository as it's done in the lister

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

I've added some tests and fixed one of the review (I don't think the other one is easily do-able for now...).

swh/loader/package/opam/loader.py
173

Done !

174–175

I don't see an easy way to do that. It's not really part of the opam repository. I guess you can get it if the opam repository is a git repository by checking the git history but I'm not sure we want to do that neither if it always makes sense.

Especially it won't be possible in the case of a local opam repository (used for tests) or for the Coq opam repository which is there: https://coq.inria.fr/opam/released/

The two opam developpers are on vacation, I'll ask them if there's an easy way to do it but I don't expect so.

cut long lines, add task test

Build has FAILED

Patch application report for D5975 (id=21610)

Rebasing onto dbb18628f1...

First, rewinding head to replay your work on top of it...
Applying: add opam loader
Changes applied before test
commit 5e5aedaedb880bad731cfc07719f8da6ce3f5159
Author: zapashcanon <leo@ndrs.fr>
Date:   Tue Jul 6 18:36:40 2021 +0200

    add opam loader

Link to build: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/490/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/490/console

fix task test, check that the opam root is valid

Build is green

Patch application report for D5975 (id=21614)

Rebasing onto dbb18628f1...

First, rewinding head to replay your work on top of it...
Applying: add opam loader
Changes applied before test
commit b187b00b4a6edb1e219aefe562cc66e7d75d3ff6
Author: zapashcanon <leo@ndrs.fr>
Date:   Tue Jul 6 18:36:40 2021 +0200

    add opam loader

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/491/ for more details.

Thanks!

Please raise exceptions instead of print + exit(1); they will be run in long-running processes. And use more explicit messages (eg. "can't parse" rather than "can't get"). You could also include details in the logs (eg. extracts from the stdout the loader tried to parse)

swh/loader/package/opam/loader.py
1

copyright header

154–161

So this could potentiality contain multiple names per Person, right?

swh/loader/package/opam/tests/test_opam.py
20–23

Use this instead: https://docs.pytest.org/en/6.2.x/tmpdir.html

Unlike tempfile.mkdtemp, it cleans up the directory after the test finished (even if it failed)

(And it's a (path to a) directory, not a file)

76

can you write expected perms in octal? it's easier to read.

You can get the literal like this:

print(oct(16384))
0o40000
470

I don't think this test tests what we consider metadata; instead it's checking data used to build the revision, without checking the revision itself.

Could you rename it, and check the actual revision info?

This revision now requires changes to proceed.Jul 13 2021, 1:25 PM
aleo marked 2 inline comments as done.

use tmpdir, use octal notation for permissions

Build is green

Patch application report for D5975 (id=21619)

Rebasing onto dbb18628f1...

First, rewinding head to replay your work on top of it...
Applying: add opam loader
Changes applied before test
commit 0f896966b86b11484f903ba4c0d33f420081990e
Author: zapashcanon <leo@ndrs.fr>
Date:   Tue Jul 6 18:36:40 2021 +0200

    add opam loader

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/492/ for more details.

try to test snapshot info

Build has FAILED

Patch application report for D5975 (id=21627)

Rebasing onto dbb18628f1...

First, rewinding head to replay your work on top of it...
Applying: add opam loader
Changes applied before test
commit 1cf9bc414cab9a9ba0fefcc86ba6de8575ff0d94
Author: zapashcanon <leo@ndrs.fr>
Date:   Tue Jul 6 18:36:40 2021 +0200

    add opam loader

Link to build: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/493/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/493/console

revision info check works

Build is green

Patch application report for D5975 (id=21630)

Rebasing onto dbb18628f1...

First, rewinding head to replay your work on top of it...
Applying: add opam loader
Changes applied before test
commit 6bd41a852059dc3a5e5fe6e15b2f7189d600b858
Author: zapashcanon <leo@ndrs.fr>
Date:   Tue Jul 6 18:36:40 2021 +0200

    add opam loader

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/494/ for more details.

use exception rather than print + exit, use more explicit messages

Build is green

Patch application report for D5975 (id=21631)

Rebasing onto dbb18628f1...

First, rewinding head to replay your work on top of it...
Applying: add opam loader
Changes applied before test
commit f8e3d614d0decd214c7c889b54dec2e0ecf30235
Author: zapashcanon <leo@ndrs.fr>
Date:   Tue Jul 6 18:36:40 2021 +0200

    add opam loader

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/495/ for more details.

swh/loader/package/opam/loader.py
154–161

Indeed. Is there another class that I should use then ?

swh/loader/package/opam/tests/test_opam.py
20–23

Done.

76

Done.

ardumont added inline comments.
swh/loader/package/opam/tasks.py
14–20

Otherwise, the loader, once deployed, won't use any storage thus not ingesting anything at the end of its loading.

fix opam loader task so the loader does use some storage

Build is green

Patch application report for D5975 (id=21689)

Rebasing onto dbb18628f1...

First, rewinding head to replay your work on top of it...
Applying: add opam loader
Changes applied before test
commit 6c7a5f4b723c0f29822e5502131a752dbfd59763
Author: zapashcanon <leo@ndrs.fr>
Date:   Tue Jul 6 18:36:40 2021 +0200

    add opam loader

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/496/ for more details.

swh/loader/package/opam/tests/test_opam.py
20

why, if i might ask?

Can't we make it accept an already existing directory?
(I'm not sure how that would actually behave in real life in that current implementation.)

402

I understand that to see the loader does indeed ingest, it's good to check that we have
what's expected. That's too much details though, we moved away from such details and now
we check for the visit stats [1] and then the top-level visit snapshot (that you already
checked as a first step).

Can you please simplify according to [1]?

[1] https://forge.softwareheritage.org/source/swh-loader-core/browse/master/swh/loader/package/pypi/tests/test_pypi.py$225-237

405
429

you can drop those explanatory comments when your iteration of this loader is done btw ;)

497

you can drop the print statement.

aleo marked an inline comment as done.

use stats for tests, remove useless comments and print statements

swh/loader/package/opam/tests/test_opam.py
20

Because otherwise it looks like the opam root is already initialised, which is not the case. If the directory exists, I suppose that it is already initialised and thus that I don't have to initialize it. I do this because initializing an opam root is a little bit slow.

429

done

497

done

Build is green

Patch application report for D5975 (id=21691)

Rebasing onto dbb18628f1...

First, rewinding head to replay your work on top of it...
Applying: add opam loader
Changes applied before test
commit 8a2357fe79dddbba3e0df1bfc40450e4ddf39cfc
Author: zapashcanon <leo@ndrs.fr>
Date:   Tue Jul 6 18:36:40 2021 +0200

    add opam loader

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/497/ for more details.

swh/loader/package/opam/loader.py
26

As you did for the opam lister, can you please add a small explanation summary as a
first paragraph about what the loader is doing (ingesting one opam origin which may hold
multiple tarballs). Also, add a secondary paragrah explaining a bit the state
initialization process (adding the upstream url of the documentation is fine too [2]).

Thanks in advance.

[1] https://forge.softwareheritage.org/source/swh-lister/browse/master/swh/lister/opam/lister.py$0-26

[2] http://opam.ocaml.org/doc/Manual.html

swh/loader/package/opam/tests/test_opam.py
20

thanks.

So, at the moment, for real condition using the same opamroot, one lister could run and initialize that state then other workers could reuse said state, right?

(I'm trying to make sense on how to actually make the opam lister and loader run in their current implementation)

This is all fine with me, except for the author list.

(I'm removing myself from the reviewers as I won't be available for the rest of the week and I don't want to block this diff from being merged)

swh/loader/package/opam/loader.py
154–161

Unfortunately, the data-model doesn't allow this yet.

In the meantime, I wonder if we should use a hack like Github's to embed them in the message... https://docs.github.com/en/github/committing-changes-to-your-project/creating-and-editing-commits/creating-a-commit-with-multiple-authors

This probably needs to be discussed with the whole team

add explanation about opam root

add myself to contributors

Build is green

Patch application report for D5975 (id=21694)

Rebasing onto dbb18628f1...

First, rewinding head to replay your work on top of it...
Applying: add opam loader
Changes applied before test
commit 0532b4f77e90f8001f545cbd72530917056887ce
Author: zapashcanon <leo@ndrs.fr>
Date:   Tue Jul 6 18:36:40 2021 +0200

    add opam loader

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/498/ for more details.

swh/loader/package/opam/tests/test_opam.py
20

Yes, it's fine to have the lister and the loader to run on the same node or on different nodes.

Build is green

Patch application report for D5975 (id=21695)

Rebasing onto dbb18628f1...

First, rewinding head to replay your work on top of it...
Applying: add opam loader
Changes applied before test
commit 4d6c57a4a344437aadb204405d0a0dfa7cdc2446
Author: zapashcanon <leo@ndrs.fr>
Date:   Tue Jul 6 18:36:40 2021 +0200

    add opam loader

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/499/ for more details.

look good to me.

One non-blocking question and a couple of suggestions inline to refactor and improve readability a bit.

swh/loader/package/opam/loader.py
59

Out of curiosity, do we really need to use the environment variable?
Can't we just use the --root flag parameter?

(not asking to change)

80–103

what do you think of the following?

That should decrease the repetition a bit.

105–132

which then becomes according to the previous suggestion.

135–155

finally ;)

it's fine for me.

I can always take care of improving the current implementation with what i suggested later ;)

This revision is now accepted and ready to land.Jul 20 2021, 11:36 AM

refactor the loader code

I updated the code as you suggested (I had to raise exceptions in some cases otherwise my editor was complaining with No return statement).

swh/loader/package/opam/loader.py
59

both are fine, the --root way will probably make the code easier to understand but I used the environment variable in the lister too... I'll change it here, don't you mind updating the lister ?

Build is green

Patch application report for D5975 (id=21699)

Rebasing onto dbb18628f1...

First, rewinding head to replay your work on top of it...
Applying: add opam loader
Changes applied before test
commit e4fbfe035320a914e6a3276b8a5d8285ecafe97e
Author: zapashcanon <leo@ndrs.fr>
Date:   Tue Jul 6 18:36:40 2021 +0200

    add opam loader

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/500/ for more details.

swh/loader/package/opam/loader.py
59

sure thing, fine for me.

I updated the code as you suggested

great

(I had to raise exceptions in some cases otherwise my editor was complaining with No return statement).

That makes the code more clunky than it should.
I'll propose you a diff on top of this so the code is more pythonic.

tl; dr make opam_read directly return the list[str] (which splits the first line it reads from stdout).
And then adapt the code in the other methods so it has the one result it modifies along the way.

I updated the code as you suggested

great

(I had to raise exceptions in some cases otherwise my editor was complaining with No return statement).

That makes the code more clunky than it should.
I'll propose you a diff on top of this so the code is more pythonic.

D6009

tl; dr make opam_read directly return the list[str] (which splits the first line it reads from stdout).
And then adapt the code in the other methods so it has the one result it modifies along the way.

I slightly diverge from this but the gist of it remained roughtly the same idea ;)

This revision was automatically updated to reflect the committed changes.