Page MenuHomeSoftware Heritage

Maven: fix lister after docker-dev review.
ClosedPublic

Authored by borisbaldassari on Jan 31 2022, 8:44 PM.

Details

Summary

Fixing the lister after docker-dev test runs:

  • The last_update entry in ListedOrigins was undef, now returns a datetime object.
  • The lister would produce an invalid origin (https) in the case of malformed scm poms. Now we check the visit_type before yielding scm entries to make sure we only throw valid types.
Test Plan

Added a test case for the invalid scm tag in POM file, as can happen on maven central (e.g. with sprova4j).

Diff Detail

Repository
rDLS Listers
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D7052 (id=25578)

Rebasing onto a599493b48...

Current branch diff-target is up to date.
Changes applied before test
commit fa4a3cc81503ecde07be0b7a8b7b4c6dafd88ec1
Author: Boris Baldassari <boris@chrysalice.org>
Date:   Mon Jan 31 20:36:57 2022 +0100

    maven: Fix last_update.

commit 7556f6bc555597d947966acc7db257b309d4dc24
Author: Boris Baldassari <boris@chrysalice.org>
Date:   Mon Jan 31 20:35:57 2022 +0100

    maven: Fix bug with malformed pom + add tests.

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/442/ for more details.

Build is green

Patch application report for D7052 (id=25581)

Rebasing onto a599493b48...

Current branch diff-target is up to date.
Changes applied before test
commit 3838edc7964e3cc5212dfb8ee6ffb853a6697892
Author: Boris Baldassari <boris@chrysalice.org>
Date:   Mon Jan 31 22:18:05 2022 +0100

    Fix last_update tzinfo.

commit fa4a3cc81503ecde07be0b7a8b7b4c6dafd88ec1
Author: Boris Baldassari <boris@chrysalice.org>
Date:   Mon Jan 31 20:36:57 2022 +0100

    maven: Fix last_update.

commit 7556f6bc555597d947966acc7db257b309d4dc24
Author: Boris Baldassari <boris@chrysalice.org>
Date:   Mon Jan 31 20:35:57 2022 +0100

    maven: Fix bug with malformed pom + add tests.

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/443/ for more details.

ardumont added a subscriber: ardumont.

I'm confused by one of the test code change. And I have some other suggestions inline ;)

Can you please change the diff's title and description to explain what's wrong and
what's being fixed? And then align this within the commit message.

Could you also please split the diff in 2, something like, one for the:

  • last update fix
  • scm origin management fix

Thanks in advance.

swh/lister/maven/tests/test_lister.py
37

I'm confused by the value's type change. What is that variable a source for?
Does it represent the output of the maven indexer or something else (like an internal test data)?

73–74
133–134

same comment about improving the assertion error message.

136–137

TIL about the for... else instruction.

So currently with that instruction and the current code, that will raise if we do not pass in at least one conditional within the loop right?

If i'm correct, i'd suggest we had a assertion error message to clarify what's wrong here (tentatively suggested something, feel free to improve).

This revision now requires changes to proceed.Feb 1 2022, 10:26 AM
borisbaldassari marked 4 inline comments as done.EditedFeb 2 2022, 8:19 AM

I'm confused by one of the test code change. And I have some other suggestions inline ;)

Explained and fixed the comments.

Can you please change the diff's title and description to explain what's wrong and
what's being fixed? And then align this within the commit message.

Fixed for the description. As for the commit messages, would simply rebasing + squashing the last 2 commits into one be ok for you? (in my view the messages *do* reflect what they fix so not sure what should be done). I've tried to improve and be more precise in the wording.

Could you also please split the diff in 2, something like, one for the:

  • last update fix
  • scm origin management fix

Hum. I'm not sure how to do that. These changes happen roughly in the same region of the same file, so it would imply to have the fixes in parallel versions (/branches) of the file, right?

My intention by having two (well, now three) commits was precisely to distinguish the two fixes. Do you want me to apply the commits on 2 branches so as to create 2 fixes? Or how should I proceed?

Thanks!

swh/lister/maven/tests/test_lister.py
37

Double-checking with the loader's entry I realised it takes an iso8601 string as metadata > time. So I fixed that, updated code and tests in the lister, making sure it fits with the loader.

See https://forge.softwareheritage.org/source/swh-loader-core/browse/master/swh/loader/package/maven/loader.py$86

The "time" metadata entry is used in the loader for the extid only.

73–74

Fixed (on all fixtures).

133–134

Fixed as well.

136–137

Yes, what we're doing here is find an origin in reference set that matches the scheduler one.
Added string to clarify error.

borisbaldassari edited the test plan for this revision. (Show Details)
borisbaldassari marked 4 inline comments as done.
  • maven: dismiss origins if they are malformed - e.g. wrong pom scm format, add test.
  • maven: fix undef last_update in ListedOrigins.

Build is green

Patch application report for D7052 (id=25605)

Rebasing onto a599493b48...

Current branch diff-target is up to date.
Changes applied before test
commit 2595d628d596983311dd38ff7c7bc981c1d81700
Author: Boris Baldassari <boris@chrysalice.org>
Date:   Mon Jan 31 20:36:57 2022 +0100

    maven: fix undef last_update in ListedOrigins.
    
    maven: Fix last_update tzinfo.
    
    maven: Fix/document last_update test case.

commit e07bffeb5973a32600258d77d200b4e8998e9db0
Author: Boris Baldassari <boris@chrysalice.org>
Date:   Mon Jan 31 20:35:57 2022 +0100

    maven: dismiss origins if they are malformed - e.g. wrong pom scm format, add test.

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/444/ for more details.

I'm confused by one of the test code change. And I have some other suggestions inline ;)

Explained and fixed the comments.

Thanks

Can you please change the diff's title and description to explain what's wrong and
what's being fixed? And then align this within the commit message.

Fixed for the description. As for the commit messages, would simply rebasing + squashing the last 2 commits into one be ok for you? (in my view the messages *do* reflect what they fix so not sure what should be done). I've tried to improve and be more precise in the wording.

Great, that's clearer.

Could you also please split the diff in 2, something like, one for the:

  • last update fix
  • scm origin management fix

Hum. I'm not sure how to do that.

Create as much commits you need in the same branch.
Then you just need to be consistent with your arc diff use.

With 3 commits (as you mentioned after) from the head of your master branch (git checkout master):

arc diff HEAD~3 --head HEAD~2 --update D7052   # first commit you did, we keed that diff id (the description will need amending to reflect the commit message though)
arc diff HEAD~2 --head HEAD~   # creates a new diff on top of the previous one with the 2nd commit (diff creation will just use your commit message as description)
arc diff HEAD~                                # create the last diff for the third commit (same about the diff description)

(then if you need some update on diffs you keep the ranges as is ^ and add the --update D<XYZ> to amend the diff.
If you touch one of the base commit, the diff will need an update to keep in sync).

These changes happen roughly in the same region of the same file, so it would imply to have the fixes in parallel versions (/branches) of the file, right?

you can but it's not necessary if you do like i just showed you ;)

My intention by having two (well, now three) commits was precisely to distinguish the two fixes. Do you want me to apply the commits on 2 branches so as to create 2 fixes? Or how should I proceed?

If it's too much to create the 3 diffs, i'm fine for you to keep the one diff with the multiple changes (still with the 3 separate commits though).
The only caveat there is with that approach is that the diff will lose its context once landed (that's a phabricator limitation which expects "one commit one diff" use).
It will only show the first commit once landed.

Thanks!

Well same ;)

swh/lister/maven/tests/test_lister.py
134
193

lgtm (see my previous comment with answers ;)

some working suggestion inline.

This revision is now accepted and ready to land.Feb 2 2022, 9:55 AM

Create as much commits you need in the same branch.
Then you just need to be consistent with your arc diff use.

Oh, ok. Neat. Thanks for advising.
Actually there will only be 2 diffs: one for last_udpate, the other for the scm fix. I had rebased the diff in-between to merge the two commits related to last_update, but failed to update my comments.

Will do that this evening (and also fix the typos as suggested in diff).

swh/lister/maven/tests/test_lister.py
37

Thanks btw (for checking and clarifying ;)

To be clear, it's ok to land now.
Once your commits are ok for you.
Update the diff here (to sync back your commits' contents with the diff) and then push once and the build's diff is green.
That should close this diff.

Cheers,

borisbaldassari marked 3 inline comments as done.
  • maven: Fix undef last_update in ListedOrigins.
  • maven: Dismiss origins if they are malformed - e.g. wrong pom scm format, add test.

Build has FAILED

Patch application report for D7052 (id=25737)

Rebasing onto a599493b48...

First, rewinding head to replay your work on top of it...
Applying: gitlab: Allow listing of instances providing multiple vcs_type
Using index info to reconstruct a base tree...
M	swh/lister/gitlab/lister.py
M	swh/lister/gitlab/tests/test_lister.py
Falling back to patching base and 3-way merge...
Auto-merging swh/lister/gitlab/tests/test_lister.py
CONFLICT (content): Merge conflict in swh/lister/gitlab/tests/test_lister.py
Auto-merging swh/lister/gitlab/lister.py
CONFLICT (content): Merge conflict in swh/lister/gitlab/lister.py
Patch failed at 0001 gitlab: Allow listing of instances providing multiple vcs_type

Resolve all conflicts manually, mark them as resolved with
"git add/rm <conflicted_files>", then run "git rebase --continue".
You can instead skip this commit: run "git rebase --skip".
To abort and get back to the state before "git rebase", run "git rebase --abort".

Rebase failed (ret=1)!

Could not rebase; Attempt merge onto a599493b48...

Already up to date.
Changes applied before test
commit 0f78b32d552a7e21b1ec5b35cbc4be60adac241f
Author: Boris Baldassari <boris@chrysalice.org>
Date:   Mon Jan 31 20:36:57 2022 +0100

    maven: Fix undef last_update in ListedOrigins.
           Fix last_update tzinfo.
           Fix/document last_update test case.
           Fix typo in log.

commit e07bffeb5973a32600258d77d200b4e8998e9db0
Author: Boris Baldassari <boris@chrysalice.org>
Date:   Mon Jan 31 20:35:57 2022 +0100

    maven: dismiss origins if they are malformed - e.g. wrong pom scm format, add test.

Link to build: https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/445/
See console output for more information: https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/445/console

Build has FAILED

Patch application report for D7052 (id=25738)

Rebasing onto a599493b48...

Current branch diff-target is up to date.
Changes applied before test
commit 0f78b32d552a7e21b1ec5b35cbc4be60adac241f
Author: Boris Baldassari <boris@chrysalice.org>
Date:   Mon Jan 31 20:36:57 2022 +0100

    maven: Fix undef last_update in ListedOrigins.
           Fix last_update tzinfo.
           Fix/document last_update test case.
           Fix typo in log.

commit e07bffeb5973a32600258d77d200b4e8998e9db0
Author: Boris Baldassari <boris@chrysalice.org>
Date:   Mon Jan 31 20:35:57 2022 +0100

    maven: dismiss origins if they are malformed - e.g. wrong pom scm format, add test.

Link to build: https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/446/
See console output for more information: https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/446/console

To be clear, it's ok to land now.

Sorry for the latence, and thanks for clarifying.

Once your commits are ok for you.
Update the diff here (to sync back your commits' contents with the diff) and then push once and the build's diff is green.

Not sure why ci failed. Seems to be python-specific, maybe a temporary error?

ImportError: cannot import name 'TempdirFactory' from '_pytest.tmpdir'

Anyway, I just added the two typo fixes you mentioned. Will try to re-trigger the build, and as soon as the build is green I'll push to master.

Thanks, cheers!

Cheers,

Build has FAILED

Patch application report for D7052 (id=25739)

Rebasing onto a599493b48...

Current branch diff-target is up to date.
Changes applied before test
commit 0f78b32d552a7e21b1ec5b35cbc4be60adac241f
Author: Boris Baldassari <boris@chrysalice.org>
Date:   Mon Jan 31 20:36:57 2022 +0100

    maven: Fix undef last_update in ListedOrigins.
           Fix last_update tzinfo.
           Fix/document last_update test case.
           Fix typo in log.

commit e07bffeb5973a32600258d77d200b4e8998e9db0
Author: Boris Baldassari <boris@chrysalice.org>
Date:   Mon Jan 31 20:35:57 2022 +0100

    maven: dismiss origins if they are malformed - e.g. wrong pom scm format, add test.

Link to build: https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/447/
See console output for more information: https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/447/console

ImportError: cannot import name 'TempdirFactory' from '_pytest.tmpdir'

Anyway, I just added the two typo fixes you mentioned. Will try to re-trigger the build, and as soon as the build is green I'll push to master.

unrelated to your current diff. See D7084.

I'll fix it in master and then you can rebase your diff on it.
I'll ping you when it's done.

I'll fix it in master and then you can rebase your diff on it.
I'll ping you when it's done.

@borisbaldassari ^ done (i've opened T3916 if you want some more details).

Please, just rebase, update your diff and then push ;)

Cheers,

  • maven: dismiss origins if they are malformed - e.g. wrong pom scm format, add test.
  • maven: Fix undef last_update in ListedOrigins.

Build is green

Patch application report for D7052 (id=25791)

Rebasing onto a1000dfeb7...

Current branch diff-target is up to date.
Changes applied before test
commit d4e1e8212ace86fa6efc74776fe0a324f768ef54
Author: Boris Baldassari <boris@chrysalice.org>
Date:   Tue Feb 8 07:50:13 2022 +0100

    maven: Fix undef last_update in ListedOrigins.

commit 24eeabfadec25aa04b5ab068927495d1029c1439
Author: Boris Baldassari <boris@chrysalice.org>
Date:   Mon Jan 31 20:35:57 2022 +0100

    maven: dismiss origins if they are malformed - e.g. wrong pom scm format, add test.

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/449/ for more details.

ImportError: cannot import name 'TempdirFactory' from '_pytest.tmpdir'

unrelated to your current diff. See D7084.

Ok, thanks. This is always good to hear, ahah.

I'll fix it in master and then you can rebase your diff on it.
I'll ping you when it's done.

Merged and landed. Have a great day!