
package.tar: Add a generic archive loader implementation (merge with gnu's)
Abandoned, Public

Authored by ardumont on Oct 15 2019, 6:36 PM.

Details

Reviewers
douardda
Group Reviewers
Reviewers
Summary

In the end, assuming we push the 'version' parsing logic into the gnu lister (D2147),
we can have a sufficiently generic 'tar' loader such as the following.

For use cases more specific than gnu's, it can be passed a list of keys from which to build a composite primary key.
That primary key is used solely to check whether we have already downloaded the artifacts.

The default keys are the ones needed for gnu origins.
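
As a minimal sketch of the composite key idea (the key names, the function names and the in-memory set of known keys are illustrative assumptions, not the loader's actual API):

```python
from typing import Any, Mapping, Sequence, Set, Tuple

# Hypothetical key list for illustration only; the real defaults
# are whatever the gnu origins need and are not spelled out here.
EXAMPLE_KEYS: Sequence[str] = ("url", "time", "length", "version")


def artifact_key(artifact: Mapping[str, Any], keys: Sequence[str]) -> Tuple[Any, ...]:
    """Build the composite key as a tuple of the selected values.

    A missing key contributes None, so the tuple always has len(keys) items.
    """
    return tuple(artifact.get(k) for k in keys)


def already_downloaded(
    artifact: Mapping[str, Any],
    known: Set[Tuple[Any, ...]],
    keys: Sequence[str],
) -> bool:
    """Return True if an artifact with the same composite key was seen before."""
    return artifact_key(artifact, keys) in known


# Usage: one artifact recorded during a previous visit.
seen = {"url": "https://example.org/foo-1.0.tar.gz", "time": 1, "length": 10, "version": "1.0"}
known_keys = {artifact_key(seen, EXAMPLE_KEYS)}

same = dict(seen)
newer = dict(seen, version="1.1", url="https://example.org/foo-1.1.tar.gz")
print(already_downloaded(same, known_keys, EXAMPLE_KEYS))
print(already_downloaded(newer, known_keys, EXAMPLE_KEYS))
```

A caller with a different kind of origin would pass its own list of keys instead of EXAMPLE_KEYS; the lookup itself does not change.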

When D2025 lands, we will have the choice to either 1. pass along that list of keys for each scheduled task,
or 2. create a dedicated task in this module which sets them.

I prefer 1. as it's more explicit.

Related D2147
Related D2025
Related T1389

Depends on D2135
Depends on D2156
Depends on D2164

Test Plan

tox

Diff Detail

Repository
rDLDBASE Generic VCS/Package Loader
Branch
merge-package-gnu-and-package-tar
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 8494
Build 12331: tox-on-jenkins (Jenkins)
Build 12330: arc lint + arc unit

Event Timeline

I think the hash and the version should be optional. I explain a bit why we can't always provide the version in D2025.

Regarding the hash, in nixpkgs we always have a hash, but not necessarily the hash of the tar.gz file. For instance, we download archives from the GitHub API (something such as github.com/owner/repo/archives/MY-SOFT.tgz). GitHub generates this archive dynamically, and the compression algorithm can change over time. To avoid reproducibility issues, we compute the hash on the unpacked archive. So the hash specified in nixpkgs is not the hash of the archive itself, but a hash of its contents. Note that we can tell when the hash is that of the archive itself, so we could provide it only in that case.
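
To illustrate the difference between the two kinds of hash, here is a rough sketch. It is not the algorithm nixpkgs uses (which hashes a serialization of the unpacked tree); content_hash below is a simplified stand-in, and all function names are made up for the example.

```python
import gzip
import hashlib
import io
import tarfile
from typing import Dict


def make_targz(files: Dict[str, bytes], compresslevel: int) -> bytes:
    """Build a tar.gz in memory from a name -> bytes mapping."""
    raw = io.BytesIO()
    with tarfile.open(fileobj=raw, mode="w") as tar:
        for name, data in sorted(files.items()):
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    out = io.BytesIO()
    # mtime=0 keeps the gzip header independent of the current time.
    with gzip.GzipFile(fileobj=out, mode="wb", compresslevel=compresslevel, mtime=0) as gz:
        gz.write(raw.getvalue())
    return out.getvalue()


def archive_hash(blob: bytes) -> str:
    """Hash of the compressed archive file itself."""
    return hashlib.sha256(blob).hexdigest()


def content_hash(blob: bytes) -> str:
    """Simplified hash over the unpacked regular files (names and bytes)."""
    h = hashlib.sha256()
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:gz") as tar:
        for member in sorted(tar.getmembers(), key=lambda m: m.name):
            if member.isfile():
                h.update(member.name.encode("utf-8"))
                h.update(b"\0")
                h.update(tar.extractfile(member).read())
    return h.hexdigest()


files = {"README": b"hello\n" * 500, "src/main.c": b"int main(void) { return 0; }\n"}
fast = make_targz(files, compresslevel=1)
best = make_targz(files, compresslevel=9)

print(archive_hash(fast) == archive_hash(best))  # depends on the compressor's output
print(content_hash(fast) == content_hash(best))  # same files, so same content hash
```

The archive hash is tied to the exact compressed bytes, so it can change whenever the server compresses differently, while the content hash depends only on the files inside.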

Make sure the snapshot is the same across the 2 visits

> I think the hash and the version should be optional. I explain a bit why we can't always provide the version in D2025.

Indeed, I'll reply there ;)

> Regarding the hash, in nixpkgs we always have a hash, but not necessarily the hash of the tar.gz file. For instance, we download archives from the GitHub API (something such as github.com/owner/repo/archives/MY-SOFT.tgz). GitHub generates this archive dynamically, and the compression algorithm can change over time. To avoid reproducibility issues, we compute the hash on the unpacked archive. So the hash specified in nixpkgs is not the hash of the archive itself, but a hash of its contents. Note that we can tell when the hash is that of the archive itself, so we could provide it only in that case.

Thanks for the heads up.

Rebase on latest package-loader implementation

Depends on D2135

  • gnu: Move version parsing logic to lister
  • gnu/tar: Merge gnu/tar loader behavior into an archive loader
ardumont retitled this revision from package.tar: Add a tar loader implementation to package.tar: Add a generic archive loader implementation.
ardumont retitled this revision from package.tar: Add a generic archive loader implementation to package.tar: Add a generic archive loader implementation (merge with gnu's).

Plug to package-loader branch

Rebase to latest package-loader branch

Build has FAILED

Currently, this depends on the latest storage migration, which has not landed yet (progressively dropping the origin's 'type' column).

Maybe rename 'primary key' to 'identity key' or similar.

This revision is now accepted and ready to land. Oct 18 2019, 5:05 PM
  • gnu: Move version parsing logic to lister
  • gnu/archive: Merge gnu/archive loader behavior into an archive loader

Depends on D2164