Page MenuHomeSoftware Heritage

add swh-hg-identify a cli to identify hg objects
Needs ReviewPublic

Authored by acezar on Fri, Oct 9, 12:40 PM.

Details

Reviewers
marmoute
ardumont
Group Reviewers
Reviewers
Summary

Depends on D4193

Test Plan

tox
(failing until new swh.model release bump, which includes D4193)

Diff Detail

Repository
rDLDHG Mercurial loader
Branch
D4216
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 16502
Build 25426: Phabricator diff pipeline on jenkinsJenkins console · Jenkins
Build 25425: arc lint + arc unit

Unit TestsFailed

TimeTest
182 msJenkins > .tox.py3.lib.python3.7.site-packages.swh.loader.mercurial.tests.test_identify::test_all
datadir = '/var/lib/jenkins/workspace/DLDHG/tests-on-diff/.tox/py3/lib/python3.7/site-packages/swh/loader/mercurial/tests/data' tmp_path = PosixPath('/tmp/pytest-of-jenkins/pytest-0/test_all0')
170 msJenkins > .tox.py3.lib.python3.7.site-packages.swh.loader.mercurial.tests.test_identify::test_all_revisions
datadir = '/var/lib/jenkins/workspace/DLDHG/tests-on-diff/.tox/py3/lib/python3.7/site-packages/swh/loader/mercurial/tests/data' tmp_path = PosixPath('/tmp/pytest-of-jenkins/pytest-0/test_all_revisions0')
170 msJenkins > .tox.py3.lib.python3.7.site-packages.swh.loader.mercurial.tests.test_identify::test_single_revision
datadir = '/var/lib/jenkins/workspace/DLDHG/tests-on-diff/.tox/py3/lib/python3.7/site-packages/swh/loader/mercurial/tests/data' tmp_path = PosixPath('/tmp/pytest-of-jenkins/pytest-0/test_single_revision0')
2 msJenkins > .tox.py3.lib.python3.7.site-packages.swh.loader.mercurial.tests.test_converters.TestParseAuthorConverters::test_parse_author_2
1 msJenkins > .tox.py3.lib.python3.7.site-packages.swh.loader.mercurial.tests.test_converters.TestParseAuthorConverters::test_parse_author_3
View Full Test Results (3 Failed · 17 Passed)

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Fixes will come with the next update.

acezar updated this revision to Diff 15062.Wed, Oct 14, 7:38 PM

Added:

  • snapshot command for snapshot id
  • all command to get all the ids at once (dir, rev, rel and snp)
acezar marked 11 inline comments as done.Wed, Oct 14, 7:41 PM
acezar abandoned this revision.Wed, Oct 14, 8:11 PM

superseded by D4263 to fix some ci issues

acezar updated this revision to Diff 15092.Thu, Oct 15, 2:27 PM

Reopening with hopefully fixed CI issue

acezar updated this revision to Diff 15093.Thu, Oct 15, 2:33 PM

Last try to fix things

  • ssh key -> ok
  • ssh url in .arcconfig -> ok
  • ssh url in .git/config -> ok
  • git config url.git@forge.softwareheritage.org:.pushInsteadOf https://forge.softwareheritage.org -> ok
acezar updated this revision to Diff 15094.Thu, Oct 15, 2:35 PM

Maybe last time?

acezar updated this revision to Diff 15095.Thu, Oct 15, 2:36 PM

I have no idea

acezar updated this revision to Diff 15105.Thu, Oct 15, 3:14 PM

This should be the last broken update. Maybe...

Build is green

Patch application report for D4216 (id=15105)

Rebasing onto bd914dec39...

Current branch diff-target is up to date.
Changes applied before test
commit 379520e26c79edfb924d120db292efaccdbe6790
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Thu Oct 8 18:07:50 2020 +0200

    add swh-hg-identfy a cli to identify hg objects

See https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/58/ for more details.

acezar updated this revision to Diff 15127.Thu, Oct 15, 7:48 PM

Fixed releases not being displayed by the all command

Build is green

Patch application report for D4216 (id=15127)

Rebasing onto bd914dec39...

Current branch diff-target is up to date.
Changes applied before test
commit b9c101051d0b0a78bd000102d506b5ccbafb00ad
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Thu Oct 8 18:07:50 2020 +0200

    add swh-hg-identfy a cli to identify hg objects

See https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/59/ for more details.

acezar edited the summary of this revision. (Show Details)Fri, Oct 16, 3:57 PM
acezar added a parent revision: D4193: swh identify: add --exclude.
ardumont added inline comments.
setup.py
58

@douardda ^ do we still need this?

I recall using the swh_cli_group is the way now, isn't it?

ardumont added inline comments.Sat, Oct 17, 2:12 PM
setup.py
58

as an example 5cc573d

acezar added inline comments.Mon, Oct 19, 12:21 PM
setup.py
58

swh-identify is declared like this:

[console_scripts]
swh-identify=swh.model.cli:identify

Since the mercurial version has the same purpose I used the same declaration. swh-identify is either not up to date or different on purpose?

ardumont added inline comments.Mon, Oct 19, 3:31 PM
setup.py
58

Actually it got both declaration [1] so we can choose to either use:

  • swh identify (and i noticed from your previous diff that you used this form ;)
  • swh-identify from the early days and then we moved away from those (I guess it did not get cleaned up after)

Note that the commit I linked actually drops the old declaration.

Either way, that's not a blocker per say (we can always improve on this later).
I just wanted some clarification from the original author ;)

[1] https://forge.softwareheritage.org/source/swh-model/browse/master/setup.py$61-64

Can we please have some tests?
I'm a bit nervous to review or integrate code without tests.

Thanks in advance.

If you look up a bit other modules with cli, we have some there, some examples [1] [2]

[1] https://forge.softwareheritage.org/source/swh-deposit/browse/master/swh/deposit/tests/cli/test_admin.py

[2] https://forge.softwareheritage.org/source/swh-deposit/browse/master/swh/deposit/tests/cli/test_client.py

add swh-hg-identfy a cli to identify hg objects

oh also, there is a typo in identify here (same goes for the diff title ;)

douardda added inline comments.Mon, Oct 19, 4:49 PM
setup.py
58

unless I'm mistaken, this is not supposed to be a subcommand of the main swh identify for now, so yes, we expect it to a separate command for now.

For now, swh identify has no plugin-like architecture for addigin support for specific VCS, and since it's not a click group but a command, we cannot easily "inject" the hg-identify there. So until we have a better solution, let's keep it a dedicated command.

acezar updated this revision to Diff 15231.Mon, Oct 19, 6:10 PM
acezar retitled this revision from add swh-hg-identfy a cli to identify hg objects to add swh-hg-identify a cli to identify hg objects.

Add tests using an existing repository whose top level hashes are already known

acezar marked 4 inline comments as done.Mon, Oct 19, 6:11 PM

Build has FAILED

Patch application report for D4216 (id=15231)

Rebasing onto bd914dec39...

Current branch diff-target is up to date.
Changes applied before test
commit 25cd91475dc038d69c5bcc64d6e0ce586dc23c6a
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Thu Oct 8 18:07:50 2020 +0200

    add swh-hg-identify a cli to identify hg objects

Link to build: https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/60/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/60/console

ardumont added inline comments.Mon, Oct 19, 6:12 PM
setup.py
58

thanks for the clarification ;)

acezar updated this revision to Diff 15232.Mon, Oct 19, 6:19 PM

Last minute changes before push is a good way to break working tests ^_^. Fixed

Build has FAILED

Patch application report for D4216 (id=15232)

Rebasing onto bd914dec39...

Current branch diff-target is up to date.
Changes applied before test
commit b543125f433ab3022cc80c5b76e3a2eab0162b61
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Thu Oct 8 18:07:50 2020 +0200

    add swh-hg-identify a cli to identify hg objects

Link to build: https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/61/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/61/console

Thanks for adding tests. It would be best that they go green now ;)

The idea was to lift jenkins' coverage report which alongside the diff
can show us what coverage is missing here (if any). The diff is large
enough that this kind of help is greatly appreciated for reviewers.

Nonetheless, I had a pass to your code and added some remarks or questions.

Hope this helps,

swh/loader/mercurial/identify.py
63

you can use docstring """raw bytes...""" just below the instance variable.

73

extra space.

113
fields["description"] = description
return HgRevision(**fields)

?

243

I recall the hgtags can be noisy in its content (possibly due to merge badly dealt with by users).
The following might interest you

Related to T970 rDLDHGe0b48c6c6e9aa3329c9058e1a39da38ff801714a

252

extra space

291

might as well use swh.model.identifiers.swhid function [1] instead of hard-coding values:

even if that's giving a mouthful

from swh.model.identifiers import swhid
return f"{swhid("directory", self.directory_swhid.hex())}\t{self.node_id.decode()}"

(or use an intermediary variable, as you wish...)

[1] https://forge.softwareheritage.org/source/swh-model/browse/master/swh/model/identifiers.py$767-772

295

same

327

curious me, is that a form of destructuring to say, give me the first element from the result, and discard the rest?
(without having to mention how many "rest" there is that is)

it looks like it, TIL ;)

(oh and that works because we are dealing with NamedTuple, neat)

351

same about the model swhid function use.

395

snapshot identifiers.

you might want to align to use identities instead of identifiers as you use elsewhere or use identifiers everywhere... ;)

434

as a heads up, I recall some modifications is currently happening on the module swh.model.identifiers. [1]
We are moving away from calling it directly.

To avoid some breakage on this in a near future, prefer using the swh data model object instead.

from swh.model.model import Snapshot, SnapshotBranch

branches[b"HEAD"] = SnapshotBranch(target=tip.branch(), target_type=TargetType.ALIAS)
...
# adapt the other 2 loops with this ^
...

return Snapshot(branches=branches).id

That remark is also valid for revision_identifier and release_identifier (earlier), it only hit me now ¯\_(ツ)_/¯

[1] T2715, specifically T2713

461

same use swhid function.

marmoute requested changes to this revision.Wed, Oct 21, 9:55 AM
marmoute added inline comments.
swh/loader/mercurial/identify.py
243

that same tag name can appears multiple time in the tag file. In this case, the latest value should be used. The previous entries are here to indicate which older value have been overwritten.

(We should add a case of this in the new tests (when it gets to the new tests))

This revision now requires changes to proceed.Wed, Oct 21, 9:55 AM
acezar updated this revision to Diff 15267.Wed, Oct 21, 11:16 AM

Followup

Note that CI won't pass until D4193 is merged.
Directory identities rely on the added --exclude of swh-identify

Build has FAILED

Patch application report for D4216 (id=15267)

Rebasing onto bd914dec39...

Current branch diff-target is up to date.
Changes applied before test
commit 67647aeba8573a720cb53f55ed25a9a318f882b0
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Thu Oct 8 18:07:50 2020 +0200

    add swh-hg-identify a cli to identify hg objects

Link to build: https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/65/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/65/console

acezar marked 11 inline comments as done.Wed, Oct 21, 11:20 AM
acezar added inline comments.
swh/loader/mercurial/identify.py
327

Made this cleared and not relying on the field declaration order.

acezar updated this revision to Diff 15269.Wed, Oct 21, 11:35 AM
acezar marked an inline comment as done.

Followup for tag parsing

Build has FAILED

Patch application report for D4216 (id=15269)

Rebasing onto bd914dec39...

Current branch diff-target is up to date.
Changes applied before test
commit c3d959fe1cf5bee461ff7366697f907a578d9288
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Thu Oct 8 18:07:50 2020 +0200

    add swh-hg-identify a cli to identify hg objects

Link to build: https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/66/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/66/console

acezar marked 2 inline comments as done.Wed, Oct 21, 11:37 AM
ardumont edited the test plan for this revision. (Show Details)Wed, Oct 21, 11:57 AM
ardumont accepted this revision.EditedWed, Oct 21, 12:02 PM
ardumont edited the test plan for this revision. (Show Details)

Thanks!

Overall lgtm.

Don't land this yet though.
Let's wait for @marmoute to validate it as well (he started the review ;)

Let's also wait for D4193 to be accepted, landed and the model to be tagged
Then we shall be able to trigger back the build here to check the coverage.

(i've updated the test plan to mention the current limitation on D4193 which i forgot prior to my remark on green tests ¯\_(ツ)_/¯)

Cheers,

swh/loader/mercurial/identify.py
197

Note to self, add the hg runtime dependency in the debian branch (if not there already, don't remember).