Page MenuHomeSoftware Heritage

add swh-hg-identify a cli to identify hg objects
ClosedPublic

Authored by acezar on Oct 9 2020, 12:40 PM.

Details

Diff Detail

Repository
rDLDHG Mercurial loader
Branch
identify
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 16289
Build 25069: Phabricator diff pipeline on jenkinsJenkins console · Jenkins
Build 25068: arc lint + arc unit

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Add tests using an existing repository whose top level hashes are already known

Build has FAILED

Patch application report for D4216 (id=15231)

Rebasing onto bd914dec39...

Current branch diff-target is up to date.
Changes applied before test
commit 25cd91475dc038d69c5bcc64d6e0ce586dc23c6a
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Thu Oct 8 18:07:50 2020 +0200

    add swh-hg-identify a cli to identify hg objects

Link to build: https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/60/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/60/console

setup.py
58

thanks for the clarification ;)

Last minute changes before push is a good way to break working tests ^_^. Fixed

Build has FAILED

Patch application report for D4216 (id=15232)

Rebasing onto bd914dec39...

Current branch diff-target is up to date.
Changes applied before test
commit b543125f433ab3022cc80c5b76e3a2eab0162b61
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Thu Oct 8 18:07:50 2020 +0200

    add swh-hg-identify a cli to identify hg objects

Link to build: https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/61/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/61/console

Thanks for adding tests. It would be best that they go green now ;)

The idea was to lift jenkins' coverage report which alongside the diff
can show us what coverage is missing here (if any). The diff is large
enough that this kind of help is greatly appreciated for reviewers.

Nonetheless, I had a pass to your code and added some remarks or questions.

Hope this helps,

swh/loader/mercurial/identify.py
62

you can use docstring """raw bytes...""" just below the instance variable.

72

extra space.

112
fields["description"] = description
return HgRevision(**fields)

?

242

I recall the hgtags can be noisy in its content (possibly due to merge badly dealt with by users).
The following might interest you

Related to T970 rDLDHGe0b48c6c6e9aa3329c9058e1a39da38ff801714a

251

extra space

290

might as well use swh.model.identifiers.swhid function [1] instead of hard-coding values:

even if that's giving a mouthful

from swh.model.identifiers import swhid
return f"{swhid("directory", self.directory_swhid.hex())}\t{self.node_id.decode()}"

(or use an intermediary variable, as you wish...)

[1] https://forge.softwareheritage.org/source/swh-model/browse/master/swh/model/identifiers.py$767-772

294

same

326

curious me, is that a form of destructuring to say, give me the first element from the result, and discard the rest?
(without having to mention how many "rest" there is that is)

it looks like it, TIL ;)

(oh and that works because we are dealing with NamedTuple, neat)

350

same about the model swhid function use.

394

snapshot identifiers.

you might want to align to use identities instead of identifiers as you use elsewhere or use identifiers everywhere... ;)

433

as a heads up, I recall some modifications is currently happening on the module swh.model.identifiers. [1]
We are moving away from calling it directly.

To avoid some breakage on this in a near future, prefer using the swh data model object instead.

from swh.model.model import Snapshot, SnapshotBranch

branches[b"HEAD"] = SnapshotBranch(target=tip.branch(), target_type=TargetType.ALIAS)
...
# adapt the other 2 loops with this ^
...

return Snapshot(branches=branches).id

That remark is also valid for revision_identifier and release_identifier (earlier), it only hit me now ¯\_(ツ)_/¯

[1] T2715, specifically T2713

460

same use swhid function.

marmoute added inline comments.
swh/loader/mercurial/identify.py
242

that same tag name can appears multiple time in the tag file. In this case, the latest value should be used. The previous entries are here to indicate which older value have been overwritten.

(We should add a case of this in the new tests (when it gets to the new tests))

This revision now requires changes to proceed.Oct 21 2020, 9:55 AM

Followup

Note that CI won't pass until D4193 is merged.
Directory identities rely on the added --exclude of swh-identify

Build has FAILED

Patch application report for D4216 (id=15267)

Rebasing onto bd914dec39...

Current branch diff-target is up to date.
Changes applied before test
commit 67647aeba8573a720cb53f55ed25a9a318f882b0
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Thu Oct 8 18:07:50 2020 +0200

    add swh-hg-identify a cli to identify hg objects

Link to build: https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/65/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/65/console

acezar added inline comments.
swh/loader/mercurial/identify.py
326

Made this cleared and not relying on the field declaration order.

acezar marked an inline comment as done.

Followup for tag parsing

Build has FAILED

Patch application report for D4216 (id=15269)

Rebasing onto bd914dec39...

Current branch diff-target is up to date.
Changes applied before test
commit c3d959fe1cf5bee461ff7366697f907a578d9288
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Thu Oct 8 18:07:50 2020 +0200

    add swh-hg-identify a cli to identify hg objects

Link to build: https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/66/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/66/console

ardumont edited the test plan for this revision. (Show Details)

Thanks!

Overall lgtm.

Don't land this yet though.
Let's wait for @marmoute to validate it as well (he started the review ;)

Let's also wait for D4193 to be accepted, landed and the model to be tagged
Then we shall be able to trigger back the build here to check the coverage.

(i've updated the test plan to mention the current limitation on D4193 which i forgot prior to my remark on green tests ¯\_(ツ)_/¯)

Cheers,

swh/loader/mercurial/identify.py
197

Note to self, add the hg runtime dependency in the debian branch (if not there already, don't remember).

Trigger CI, now that D4193 has landed

Build has FAILED

Patch application report for D4216 (id=15403)

Rebasing onto bd914dec39...

Current branch diff-target is up to date.
Changes applied before test
commit c3d959fe1cf5bee461ff7366697f907a578d9288
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Thu Oct 8 18:07:50 2020 +0200

    add swh-hg-identify a cli to identify hg objects

Link to build: https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/68/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/68/console

Build has FAILED

Patch application report for D4216 (id=15420)

Rebasing onto bd914dec39...

Current branch diff-target is up to date.
Changes applied before test
commit c3d959fe1cf5bee461ff7366697f907a578d9288
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Thu Oct 8 18:07:50 2020 +0200

    add swh-hg-identify a cli to identify hg objects

Link to build: https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/75/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/75/console

Build has FAILED

Patch application report for D4216 (id=15447)

Rebasing onto bd914dec39...

Current branch diff-target is up to date.
Changes applied before test
commit c3d959fe1cf5bee461ff7366697f907a578d9288
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Thu Oct 8 18:07:50 2020 +0200

    add swh-hg-identify a cli to identify hg objects

Link to build: https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/79/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/79/console

Build has FAILED

Patch application report for D4216 (id=15451)

Rebasing onto bd914dec39...

Current branch diff-target is up to date.
Changes applied before test
commit e75b99287ba58bdbc6cb95703f11303c561a8f34
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Thu Oct 8 18:07:50 2020 +0200

    add swh-hg-identify a cli to identify hg objects

Link to build: https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/81/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/81/console

Build has FAILED

Patch application report for D4216 (id=15469)

Rebasing onto bd914dec39...

Current branch diff-target is up to date.
Changes applied before test
commit e75b99287ba58bdbc6cb95703f11303c561a8f34
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Thu Oct 8 18:07:50 2020 +0200

    add swh-hg-identify a cli to identify hg objects

Link to build: https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/85/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/85/console

So the test fails on jenkins because they use the hg command from the system (since mercurial is oddly enough, not a dependency of swh-loader-mercurial) and on stretch, mercurial is 4.8
Using mercurial 5.5 is ok.

So mercurial needs to be added to the requirements-test.txt file.

swh/loader/mercurial/identify.py
306

Why do you use the swh-identify command in a subprocess rather than just using the identify_object function from swh.model.cli?

see comments

This revision now requires changes to proceed.Oct 28 2020, 11:14 AM
acezar marked an inline comment as done.
  • Add mercurial to requirements.txt
  • use swh.model.cli.identify_object instead of swh-identify subprocess

Build is green

Patch application report for D4216 (id=15471)

Rebasing onto bd914dec39...

Current branch diff-target is up to date.
Changes applied before test
commit 00b709bae8c01854fa9eb37e2b5d9989fa68a17d
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Thu Oct 8 18:07:50 2020 +0200

    add swh-hg-identify a cli to identify hg objects

See https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/86/ for more details.

Any reason for not landing this? (@marmoute ? do you keep your request for changes?)

Looks overall good. I made various feedback that could be included in this patch or done as follow up.

swh/loader/mercurial/identify.py
150

"Actual" feel weird here. Maybe "to match historical implementation"

234

Can you explain the "\n\n" part in a small comment ?

255

Use "node-id" instead of revision here to make it clearer that we are talking about binary node ID

259

What does "at the current revision" means in this context ?

swh/loader/mercurial/tests/test_identify.py
35 ↗(On Diff #15471)

Should we put this in a level modul """text""" constant to avoid having to build it ?

58 ↗(On Diff #15471)

same feedback.

91 ↗(On Diff #15471)

idem

This revision is now accepted and ready to land.Nov 9 2020, 7:30 PM

Build is green

Patch application report for D4216 (id=15845)

Rebasing onto bd914dec39...

Current branch diff-target is up to date.
Changes applied before test
commit a2e9cf16919a5f81a06f955a533a254a9b3c9689
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Thu Oct 8 18:07:50 2020 +0200

    add swh-hg-identify a cli to identify hg objects

See https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/92/ for more details.