Page MenuHomeSoftware Heritage

nixguix: Add support for listing origins with "recursive" integrity
ClosedPublic

Authored by ardumont on Oct 4 2022, 4:50 PM.

Details

Summary

Without this distinction the current directory or content loader will fail the download
as they currently expect the checksums to be about the tarball. When a recursive
"integrity" is provided, it's actually about the uncompressed tarball as per the
nix-store computation.

It's detailed within the code.

The next step is to adapt the Directory/Content loader to deal with this.

Related to T3294
Related to T3781

Diff Detail

Repository
rDLS Listers
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D8614 (id=31112)

Rebasing onto 5daead68ad...

Current branch diff-target is up to date.
Changes applied before test
commit 51c4d15e320f3eed71a2f022723c852029daab1a
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Oct 4 16:47:02 2022 +0200

    nixguix: Add support for listing origins with "recursive" integrity
    
    Without this distinction the current directory or content loader will fail the download
    as they currently expect the checksums to be about the tarball. When a recursive
    "integrity" is provided, it's actually about the uncompressed tarball as per the
    nix-store computation.
    
    It's detailed within the code.
    
    Related to T3294
    Related to T3781

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/747/ for more details.

anlambert added inline comments.
swh/lister/nixguix/lister.py
83

how about naming this nar_dump_checksums instead ?

374–382

You should indicate this is related to Nix only, not to Guix.

is_integrity_about_artifact=False (which should be called are_checksums_about_artifact btw) is not actionable at all for loaders. Instead, there should be a way to tell them what layout to use to compute the checksum (eg. original, nar, and git)

swh/lister/nixguix/lister.py
374–382

NAR is used by Guix too

swh/lister/nixguix/lister.py
374–382

yes, you are both right!

It could be used by guix but currently in the manifest, it does not [1]

I'll try to find a way to mention it.

[1] T3781#92686

swh/lister/nixguix/lister.py
83

That was my first implem' but i find it more convoluted when it's actually used below.

84–86

oh, i can revert that (left over from the other implem where i had both checkums and nar_checksums).
And then doing some if else all over the place...

is_integrity_about_artifact=False (which should be called are_checksums_about_artifact btw)

ack on the plural

is not actionable at all for loaders.

I disagree. I plan to act on that based on which checksums is provided.
If plain "checksums" key is provided, that's the current check which is implemented loader side.

If it's "nar_checksums", then that must do as explain in the "recursive" doc in the code.

Instead, there should be a way to tell them what layout to use to compute the checksum
(eg. original, nar, and git)

Even though i disagree with the "not at all actionable", that may be clearer as you
suggest.

Can you please give me some details on what you mean by "original" and "git" though? I
don't get all of it.

Thanks in advance.

by "not actionable", I means that being False tells the loaders the checksum cannot be checked by hashing the tarball, but does not tell them how to check them.

original means the layout is the original tarball (whatever you want to call it), nar is the layout used by Nix, git is the layout used by Git (id of tree objects) and SWH (id of Directory objects).
Currently there are no use for the latter, it was just an example of a non-nar format

by "not actionable", I means that being False tells the loaders the checksum cannot be
checked by hashing the tarball, but does not tell them how to check them.

ok but still it's not how it's implemented in the diff.

In that diff, the boolean is just here to let the lister know how to push information to
the core loaders. There are only 2 cases for now, the usual checkums on the tarball/file
and the nar ones on the uncompressed (whatever it means for the file, that's unclear to
me for now).

When the bool is false, it's providing the 'standard' checksums in the loader as
'checksums'. Otherwise, it's sending as 'nar_checksums'. Based on that, they actually
should know how to check.

original means the layout is the original tarball (whatever you want to call it), nar
is the layout used by Nix, git is the layout used by Git (id of tree objects) and SWH
(id of Directory objects). Currently there are no use for the latter, it was just an
example of a non-nar format

Ack, it feels like it's yagni to do more than "original" (I prefer "standard" [1]) and
"nar". I can try and rework to match this. It feels clearer.

[1] It's after all a bit of a defacto standard to use those when fetching stuff from the
"internet".

Improve intent implementation

oh, ok I thought that was send to loaders.

This revision is now accepted and ready to land.Oct 4 2022, 6:02 PM

Build is green

Patch application report for D8614 (id=31115)

Rebasing onto 5daead68ad...

Current branch diff-target is up to date.
Changes applied before test
commit 944d4b5b60d011877aded3f15c91912fff9eecda
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Oct 4 16:47:02 2022 +0200

    nixguix: Add support for listing origins with "recursive" integrity
    
    Without this distinction the current directory or content loader will fail the download
    as they currently expect the checksums to be about the tarball. When a recursive
    "integrity" is provided, it's actually about the uncompressed tarball as per the
    nix-store computation.
    
    It's detailed within the code.
    
    Related to T3294
    Related to T3781

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/749/ for more details.