crates: Loader implements incremental mode
Needs Review · Public

Authored by franckbret on Aug 2 2022, 9:31 AM.

Details

Reviewers
vlorentz
anlambert
Group Reviewers
Reviewers
Summary

Add incremental support based on sha256 EXTID
Adapt test dataset and add incremental test cases

Related T4104

Diff Detail

Repository
rDLDBASE Generic VCS/Package Loader
Branch
crates-incremental
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 32146
Build 50334: Phabricator diff pipeline on jenkins
Build 50333: arc lint + arc unit

Event Timeline

swh/loader/package/crates/loader.py
349

@vlorentz Since the crate file is a tar.gz, could I stat the Cargo.toml inside it to get a timestamp? That should be fairly accurate.

Change EXTID_TYPE name and MANIFEST_FORMAT

Build has FAILED

Patch application report for D8171 (id=29658)

Rebasing onto 43597c4806...

First, rewinding head to replay your work on top of it...
Applying: crates: Loader implements incremental mode
Changes applied before test
commit 3a5aab819f75899d7e7e09d3f0af255923e204db
Author: Franck Bret <franck.bret@octobus.net>
Date:   Tue Aug 2 09:21:24 2022 +0200

    crates: Loader implements incremental mode
    
    Add incremental support based on sha256 EXTID
    Adapt test dataset and add incremental test cases
    
    Related T4104

Link to build: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/821/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/821/console

Build is green

Patch application report for D8171 (id=29661)

Rebasing onto 43597c4806...

First, rewinding head to replay your work on top of it...
Applying: crates: Loader implements incremental mode
Changes applied before test
commit bbc96fd5703dbca0bd4372301e1890d7b24c5e3b
Author: Franck Bret <franck.bret@octobus.net>
Date:   Tue Aug 2 09:21:24 2022 +0200

    crates: Loader implements incremental mode
    
    Add incremental support based on sha256 EXTID
    Adapt test dataset and add incremental test cases
    
    Related T4104

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/822/ for more details.

swh/loader/package/crates/loader.py
349

I've investigated, and it's not possible to get an accurate date by stat-ing Cargo.toml inside the archive. For a lot of old packages the file date is 01/01/1970.

Do you have any other ideas, or should I go back to getting it from an HTTP API call?
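For reference, a minimal sketch of the check described above, using only the standard library. The helper names (cargo_toml_mtime, make_fake_crate) and the in-memory archive are made up for illustration; this is not the loader's code, and a real .crate file would be read from disk instead.

```python
import io
import tarfile
from datetime import datetime, timezone
from typing import Optional


def cargo_toml_mtime(crate_bytes: bytes) -> Optional[datetime]:
    """Return the mtime stored for the first Cargo.toml member, if any."""
    with tarfile.open(fileobj=io.BytesIO(crate_bytes), mode="r:gz") as archive:
        for member in archive:
            if member.isfile() and member.name.split("/")[-1] == "Cargo.toml":
                return datetime.fromtimestamp(member.mtime, tz=timezone.utc)
    return None


def make_fake_crate(mtime: int) -> bytes:
    """Build a tiny in-memory tar.gz, for demonstration only (hypothetical data)."""
    payload = b'[package]\nname = "demo"\nversion = "0.0.1"\n'
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        info = tarfile.TarInfo(name="demo-0.0.1/Cargo.toml")
        info.size = len(payload)
        info.mtime = mtime  # 0 mimics the old packages mentioned above
        archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


# An archive whose member mtime is 0 reports the Unix epoch, not a release date.
print(cargo_toml_mtime(make_fake_crate(mtime=0)))
```

The member mtime is whatever the packaging tool wrote into the tar header, which is why it cannot be trusted as a release date.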

swh/loader/package/crates/loader.py
349

API call is fine.

But actually, can't you get it from the lister? Would get_last_update_by_file match what the API returns?

swh/loader/package/crates/loader.py
349

get_last_update_by_file returns the date of the last commit for a package, not for each version.

Please note that the git repository we target is squashed every few months. See https://internals.rust-lang.org/t/cargos-crate-index-upcoming-squash-into-one-commit/8440

So I suspect that the results of get_last_update_by_file are not consistent either.

Example with hg-core, which has only one version, from 2019:

franck@debian-franck:/tmp/crates.io-index/hg/-c$ git log hg-core
commit 9ceec3bd05e9d6ca5a70084cc0078a2f324b66af
Author: bors <bors@rust-lang.org>
Date:   Wed Jul 6 02:31:28 2022 +0000

    Collapse index into one commit
    
    Previous HEAD was 075e7a606882092af5c5bbe4124872745dc4611c, now on the `snapshot-2022-07-06` branch
    
    More information about this change can be found [online] and on [this issue].
    
    [online]: https://internals.rust-lang.org/t/cargos-crate-index-upcoming-squash-into-one-commit/8440
    [this issue]: https://github.com/rust-lang/crates-io-cargo-teams/issues/47

Now if I do the same on the archive repository https://github.com/rust-lang/crates.io-index-archive:

franck@debian-franck:/tmp$ git ls-remote https://github.com/rust-lang/crates.io-index-archive | grep refs/heads/snapshot
9110daee6752e903379f3af955506d6116315273	refs/heads/snapshot-2018-09-26
e669e7256d9d00baea377e9f487c0d086ac78c2c	refs/heads/snapshot-2019-10-17
f6bccfc6021a2088cb0e89652b9bfcd105c0c2a0	refs/heads/snapshot-2020-03-25
eb6c4f86a152ee407c7a466327c6a4cbbb92cd7a	refs/heads/snapshot-2020-08-04
1b7e17acbb67d41e148ba6dbaf8975f412dc6207	refs/heads/snapshot-2020-11-20
a5dcd8438da2d8f99e3661a1956afbfb8f026fa0	refs/heads/snapshot-2021-05-05
4181c62812c70fafb2b56cbbd66c31056671b445	refs/heads/snapshot-2021-07-02
f954048ea7b374a6261fa751710b73981b292048	refs/heads/snapshot-2021-09-24
94b5429198de77c890839b962228b187f0c25468	refs/heads/snapshot-2021-12-21
ba5efd5ab04919dd77b8a7b8298327c3ce75457e	refs/heads/snapshot-2022-03-02
075e7a606882092af5c5bbe4124872745dc4611c	refs/heads/snapshot-2022-07-06

Now clone the latest branch (it can take several minutes):

franck@debian-franck:/tmp$ git clone -b snapshot-2022-07-06 https://github.com/rust-lang/crates.io-index-archive

franck@debian-franck:/tmp/crates.io-index-archive$ git log hg/-c/hg-core
commit d511f68fa91e266ba7a20b5f37e7a4801423c289
Author: bors <bors@rust-lang.org>
Date:   Wed Mar 2 02:43:52 2022 +0000

    Collapse index into one commit
    
    Previous HEAD was ba5efd5ab04919dd77b8a7b8298327c3ce75457e, now on the `snapshot-2022-03-02` branch
    
    More information about this change can be found [online] and on [this issue].
    
    [online]: https://internals.rust-lang.org/t/cargos-crate-index-upcoming-squash-into-one-commit/8440
    [this issue]: https://github.com/rust-lang/crates-io-cargo-teams/issues/47

Not better.

Let's try with the first snapshot:

franck@debian-franck:/tmp/crates.io-index-archive$ git checkout snapshot-2018-09-26
Updating files: 100% (76174/76174), done.
Branch 'snapshot-2018-09-26' set up to track remote branch 'snapshot-2018-09-26' from 'origin'.
Switched to a new branch 'snapshot-2018-09-26'
franck@debian-franck:/tmp/crates.io-index-archive$ git log hg/-c/hg-core
fatal: ambiguous argument 'hg/-c/hg-core': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'

Obviously the file doesn't exist, because this snapshot is from 2018.

franck@debian-franck:/tmp/crates.io-index-archive$ git checkout snapshot-2019-10-17
Updating files: 100% (17852/17852), done.
Branch 'snapshot-2019-10-17' set up to track remote branch 'snapshot-2019-10-17' from 'origin'.
Switched to a new branch 'snapshot-2019-10-17'
franck@debian-franck:/tmp/crates.io-index-archive$ git log hg/-c/hg-core
commit 57336a33dde6225e0cc201fe7c5715f0351702cb
Author: bors <bors@rust-lang.org>
Date:   Tue Apr 16 18:48:16 2019 +0000

    Updating crate `hg-core#0.0.1`

OK, here is the first commit. Its date, Tue Apr 16 18:48:16 2019 +0000, is about 5 seconds after the one from the API, which was 2019-04-16T18:48:11.404457+00:00.
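As a quick sanity check on that gap, the two timestamps quoted above can be compared with the standard library. This is only an illustration of the arithmetic; the variable names are mine.

```python
from datetime import datetime

# Date of the first hg-core commit, as printed by `git log` above.
git_date = datetime.strptime(
    "Tue Apr 16 18:48:16 2019 +0000", "%a %b %d %H:%M:%S %Y %z"
)
# Creation date of hg-core 0.0.1 as returned by the crates.io API.
api_date = datetime.fromisoformat("2019-04-16T18:48:11.404457+00:00")

delta = git_date - api_date
print(f"index commit is {delta.total_seconds():.1f}s after the API date")
```

The difference is roughly 4.6 seconds, so the index commit date is close to, but not the same as, the API creation date.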

I don't know whether it is feasible to rebuild a full linear git log from all the snapshots in crates.io-index-archive while excluding the squashed commits, but it could be a way to get an accurate release date for each version using only git at the lister level.

Another approach I explored is downloading the experimental DB dump https://static.crates.io/db-dump.tar.gz, which contains two interesting files: crates.csv, which lists all package names with a unique id, one per line, and versions.csv, which lists all package versions and references that id as crate_id. The database is dumped every 24 hours.

Let's check the date for hg-core:

franck@debian-franck:~/Téléchargements/2022-08-08-020027/data$ cat crates.csv | grep hg-core
2019-04-16 18:48:11.404457,"Mercurial pure Rust core library, with no assumption on Python bindings (FFI)",,563,https://mercurial-scm.org,128438,,hg-core,,https://www.mercurial-scm.org/repo/hg,2019-04-16 18:48:11.404457
franck@debian-franck:~/Téléchargements/2022-08-08-020027/data$ cat versions.csv | grep 128438
128438,21344,2019-04-16 18:48:11.404457,563,{},145309,GPL-2.0-or-later,0.0.1,45544,2019-04-16 18:48:11.404457,f

The dates match, and we also get the package name and version, but the checksum (cksum) of each crate version is missing (the dump is not a full copy of the database, as it excludes some tables and/or columns).
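The lookup done by hand with grep above could be sketched as a small join between the two files. The header names used here (id, name, crate_id, num, created_at) are assumptions, since the thread only shows data rows, and the CSV content is trimmed down to the columns needed for the example.

```python
import csv
import io
from typing import Dict, TextIO, Tuple

# Assumed headers and a reduced set of columns; values are taken from the
# hg-core rows shown above.
CRATES_CSV = """id,name
128438,hg-core
"""
VERSIONS_CSV = """crate_id,num,created_at
128438,0.0.1,2019-04-16 18:48:11.404457
"""


def index_release_dates(
    crates_file: TextIO, versions_file: TextIO
) -> Dict[Tuple[str, str], str]:
    """Map (package name, version) to the creation date found in the dump."""
    names_by_id = {row["id"]: row["name"] for row in csv.DictReader(crates_file)}
    dates: Dict[Tuple[str, str], str] = {}
    for row in csv.DictReader(versions_file):
        name = names_by_id.get(row["crate_id"])
        if name is not None:  # skip versions whose crate is not in crates.csv
            dates[(name, row["num"])] = row["created_at"]
    return dates


release_dates = index_release_dates(
    io.StringIO(CRATES_CSV), io.StringIO(VERSIONS_CSV)
)
print(release_dates[("hg-core", "0.0.1")])
```

Building the id-to-name map first keeps the join to a single pass over each file, which matters since versions.csv is the larger of the two.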

With that said, the options I see now are:

If the squashed commit dates are not a problem on the lister side:

  • Option 1: Keep the current lister implementation; the loader makes one API call per package to get the release date of each version (about 89,000 packages as of today, so about 89,000 API calls).

If we want accurate dates for both the lister and the loader:

  • Option 2: Once the lister has listed all existing package names and versions from the git repository, download the database dump, find the package id in crates.csv, and then get the release date of each known version from versions.csv. There may be a delta between the DB and git, so we accept that some dates will not be found for some packages. When the loader faces an empty date, it can call the API. The number of HTTP API calls should be drastically reduced. When the lister enters incremental mode, we can either keep the same strategy or skip the database dump search; I guess it depends on how often the lister runs, so I'm not sure yet which is best.
  • Option 3: If there is a way to get a linear git log of all snapshots from crates.io-index-archive, excluding squashed commits, we could get the release date of each version through per-file git diff manipulation. The lister gets the release date of each version, and the loader no longer calls the HTTP API.
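To make Option 2 more concrete, here is a rough sketch of the "dump first, API only when the date is missing" lookup. Everything in it is hypothetical: resolve_release_date, the shape of the dump index, and the fake_api stand-in (which returns a placeholder date) are illustration only, not existing lister or loader code.

```python
from datetime import datetime, timezone
from typing import Callable, Dict, List, Optional, Tuple

DumpIndex = Dict[Tuple[str, str], datetime]
ApiFetcher = Callable[[str, str], Optional[datetime]]


def resolve_release_date(
    name: str, version: str, dump_index: DumpIndex, fetch_from_api: ApiFetcher
) -> Optional[datetime]:
    """Prefer the date from the DB dump; call the API only as a fallback."""
    found = dump_index.get((name, version))
    if found is not None:
        return found
    return fetch_from_api(name, version)


api_calls: List[Tuple[str, str]] = []


def fake_api(name: str, version: str) -> Optional[datetime]:
    """Stand-in for an HTTP call; records the request, returns a placeholder."""
    api_calls.append((name, version))
    return datetime(2022, 8, 8, tzinfo=timezone.utc)


dump_index: DumpIndex = {
    ("hg-core", "0.0.1"): datetime(
        2019, 4, 16, 18, 48, 11, 404457, tzinfo=timezone.utc
    )
}

known = resolve_release_date("hg-core", "0.0.1", dump_index, fake_api)
missing = resolve_release_date("brand-new-crate", "1.0.0", dump_index, fake_api)
print(known, missing, len(api_calls))
```

Only the version that is absent from the dump triggers an API call, which is the reduction in HTTP traffic that Option 2 is aiming for.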

What are your thoughts on this @vlorentz @ardumont ?

swh/loader/package/crates/loader.py
349

Hmm ok, squashes are going to make it tricky.

Using the DB dump is a nice idea, but it would add complexity on our side to manage this kind of large data dump and share it across workers. We would probably add a dedicated worker for this, but this also adds complexity.

I think API access is fine. https://crates.io/policies#crawlers says they allow up to 1 request per second; which we are unlikely to hit anyway, given the time it takes to ingest a package. However, we would need some way to ensure we don't exceed it, and I don't see a way to do it without assigning a dedicated worker...

I'd like @ardumont's input as he may have some insight; but sadly he is on vacation until the end of the month :/
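For illustration, staying under one request per second inside a single process could look like the sketch below. MinIntervalLimiter is a made-up name, and this does not solve the cross-worker problem raised above, which would need shared state or a dedicated worker.

```python
import time
from typing import Callable, Optional


class MinIntervalLimiter:
    """Enforce a minimum delay between consecutive calls in one process."""

    def __init__(
        self,
        min_interval: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_call: Optional[float] = None

    def wait(self) -> float:
        """Sleep until min_interval has passed since the previous call.

        Returns the number of seconds that were slept (0.0 if none).
        """
        delay = 0.0
        if self._last_call is not None:
            elapsed = self._clock() - self._last_call
            delay = max(0.0, self._min_interval - elapsed)
        if delay > 0:
            self._sleep(delay)
        self._last_call = self._clock()
        return delay


# Short interval so the demonstration finishes quickly; the crawler policy
# quoted above would correspond to min_interval=1.0.
limiter = MinIntervalLimiter(min_interval=0.05)
for _ in range(3):
    limiter.wait()  # call this right before each API request
```

The clock and sleep functions are injectable so the behaviour can be tested without real waiting.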

swh/loader/package/crates/loader.py
349

Using the DB dump is a nice idea, but it would add complexity on our side to manage this kind of large data dump and share it across workers. We would probably add a dedicated worker for this, but this also adds complexity.

When I talk about using the DB dump, it's on the lister side, not the loader. The DB dump is about 200 MB and the git repository is about 800 MB. Does the problem you describe also exist for the git repo?

I will go back to finalize arch and aur for now.

swh/loader/package/crates/loader.py
349

ah yes, of course. I guess that would be fine then, as listers are already assigned to specific workers afaik

From chatroom:
ardumont
val: ^ currently listers and loaders are not really separated (only the github lister is separated from the rest to avoid starvation around listing forges)
If we want accurate date for both the lister and the loader:
yes, we do so i guess only the option 2 is the way forward

Update the patch to make it work with the new lister patch, which gives a last_update value for each version

Related D8454

Build is green

Patch application report for D8171 (id=30607)

Rebasing onto 134087342b...

Current branch diff-target is up to date.
Changes applied before test
commit b83aa6e71e737d32dee223b47e6ff3435531530e
Author: Franck Bret <franck.bret@octobus.net>
Date:   Tue Aug 2 09:21:24 2022 +0200

    crates: Loader implements incremental mode
    
    Add incremental support based on sha256 EXTID
    Manage release date for each versions of a package
    Adapt test dataset and add incremental test cases
    
    Related T4104

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/872/ for more details.

anlambert added a subscriber: anlambert.

The loader requires some adaptation to align with lister output.

swh/loader/package/crates/loader.py
83–85

To align with lister output:

artifacts: Dict[str, Dict[str, Any]],
crates_metadata: Dict[str, Dict[str, Any]],
140–141

To align with lister output:

self.artifacts = artifacts
self.crates_metadata = crates_metadata
This revision now requires changes to proceed. Sep 27 2022, 2:12 PM
swh/loader/package/crates/loader.py
83–85

Ignore this comment, I was not aware that we should use this format

140–141

Ignore this comment, I was not aware that we should use this format

Use this instead to ensure all versions get loaded:

self.artifacts: Dict[str, Dict] = {
    artifact["version"]: artifact for artifact in artifacts
}
self.crates_metadata: Dict[str, Dict] = {
    metadata["version"]: metadata for metadata in crates_metadata
}
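A standalone illustration of the suggestion above, with made-up sample data. Only the "version" key comes from the snippet; the "url" field and its values are placeholders.

```python
from typing import Any, Dict, List

# Hypothetical lister output: one dict per version, passed as a list.
artifacts: List[Dict[str, Any]] = [
    {"version": "0.1.0", "url": "https://example.org/demo-0.1.0.crate"},
    {"version": "0.1.1", "url": "https://example.org/demo-0.1.1.crate"},
]

# Index the list by version so each version can be looked up directly.
artifacts_by_version: Dict[str, Dict[str, Any]] = {
    artifact["version"]: artifact for artifact in artifacts
}

print(sorted(artifacts_by_version))
print(artifacts_by_version["0.1.1"]["url"])
```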

Switch back artifacts and crates_metadata to list

Build is green

Patch application report for D8171 (id=31170)

Rebasing onto 8aa6dab72a...

First, rewinding head to replay your work on top of it...
Applying: crates: Loader implements incremental mode
Using index info to reconstruct a base tree...
M	swh/loader/package/crates/loader.py
M	swh/loader/package/crates/tests/test_tasks.py
Falling back to patching base and 3-way merge...
Auto-merging swh/loader/package/crates/tests/test_tasks.py
CONFLICT (content): Merge conflict in swh/loader/package/crates/tests/test_tasks.py
Removing swh/loader/package/crates/tests/data/https_crates.io/api_v1_crates_micro-timer
Removing swh/loader/package/crates/tests/data/https_crates.io/api_v1_crates_hg-core
Auto-merging swh/loader/package/crates/loader.py
CONFLICT (content): Merge conflict in swh/loader/package/crates/loader.py
Patch failed at 0001 crates: Loader implements incremental mode

Resolve all conflicts manually, mark them as resolved with
"git add/rm <conflicted_files>", then run "git rebase --continue".
You can instead skip this commit: run "git rebase --skip".
To abort and get back to the state before "git rebase", run "git rebase --abort".

Rebase failed (ret=1)!

Could not rebase; Attempt merge onto 8aa6dab72a...

Already up to date.
Changes applied before test
commit 8dc23c52dacd0acef1a1f145279aeda82eaebf32
Author: Franck Bret <franck.bret@octobus.net>
Date:   Tue Aug 2 09:21:24 2022 +0200

    crates: Loader implements incremental mode
    
    Add incremental support based on sha256 EXTID
    Manage release date for each versions of a package
    Adapt test dataset and add incremental test cases
    
    Related T4104

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/962/ for more details.

swh/loader/package/crates/loader.py
199–211

Use checksums={"sha256": sha256} instead in order for the loader to check download integrity.
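For context, the kind of integrity check this enables can be sketched with hashlib alone. This is not the loader's actual download code; verify_sha256 and the payload are made up for the example.

```python
import hashlib


def verify_sha256(data: bytes, expected_hex: str) -> bool:
    """Return True when the SHA-256 of data matches the expected hex digest."""
    return hashlib.sha256(data).hexdigest() == expected_hex.lower()


payload = b"example crate bytes"  # placeholder for downloaded archive content
checksums = {"sha256": hashlib.sha256(payload).hexdigest()}

print(verify_sha256(payload, checksums["sha256"]))         # intact download
print(verify_sha256(payload + b"!", checksums["sha256"]))  # corrupted download
```

The first call reports a match and the second a mismatch, since a single extra byte changes the digest.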

franckbret marked an inline comment as done.

Make use of checksums

Instantiate test task with valid data
P_info.last_update is now typed as datetime

Build is green

Patch application report for D8171 (id=31176)

Could not rebase; Attempt merge onto 8aa6dab72a...

Updating 8aa6dab..9179f4b
Fast-forward
 docs/package-loader-specifications.rst             |   9 +
 setup.py                                           |   1 +
 swh/loader/package/conda/__init__.py               |  17 +
 swh/loader/package/conda/loader.py                 | 168 ++++++++++
 swh/loader/package/conda/tasks.py                  |  14 +
 swh/loader/package/conda/tests/__init__.py         |   0
 swh/loader/package/conda/tests/data/fake_conda.sh  | 231 ++++++++++++++
 ...inux-64_lifetimes-0.11.1-py36h9f0ad1d_1.tar.bz2 | Bin 0 -> 1742 bytes
 ...inux-64_lifetimes-0.11.1-py36hc560c46_1.tar.bz2 | Bin 0 -> 1286 bytes
 swh/loader/package/conda/tests/test_conda.py       | 133 ++++++++
 swh/loader/package/conda/tests/test_tasks.py       |  24 ++
 swh/loader/package/crates/loader.py                | 166 ++++------
 swh/loader/package/crates/tests/data/expected.json | 133 ++++++++
 .../package/crates/tests/data/fake_crates.sh       |  15 +-
 .../data/https_crates.io/api_v1_crates_hg-core     |   2 -
 .../data/https_crates.io/api_v1_crates_micro-timer |   2 -
 .../crates_hg-core_hg-core-0.0.1.crate             | Bin 426 -> 427 bytes
 .../crates_micro-timer_micro-timer-0.1.0.crate     | Bin 456 -> 484 bytes
 .../crates_micro-timer_micro-timer-0.1.1.crate     | Bin 458 -> 456 bytes
 .../crates_micro-timer_micro-timer-0.1.2.crate     | Bin 485 -> 484 bytes
 .../crates_micro-timer_micro-timer-0.2.0.crate     | Bin 419 -> 419 bytes
 .../crates_micro-timer_micro-timer-0.2.1.crate     | Bin 420 -> 420 bytes
 .../crates_micro-timer_micro-timer-0.3.0.crate     | Bin 413 -> 419 bytes
 .../crates_micro-timer_micro-timer-0.3.1.crate     | Bin 421 -> 416 bytes
 .../crates_micro-timer_micro-timer-0.4.0.crate     | Bin 417 -> 419 bytes
 swh/loader/package/crates/tests/test_crates.py     | 346 +++++++++++++++------
 swh/loader/package/crates/tests/test_tasks.py      |  18 +-
 27 files changed, 1052 insertions(+), 227 deletions(-)
 create mode 100644 swh/loader/package/conda/__init__.py
 create mode 100644 swh/loader/package/conda/loader.py
 create mode 100644 swh/loader/package/conda/tasks.py
 create mode 100644 swh/loader/package/conda/tests/__init__.py
 create mode 100644 swh/loader/package/conda/tests/data/fake_conda.sh
 create mode 100644 swh/loader/package/conda/tests/data/https_conda.anaconda.org/conda-forge_linux-64_lifetimes-0.11.1-py36h9f0ad1d_1.tar.bz2
 create mode 100644 swh/loader/package/conda/tests/data/https_conda.anaconda.org/conda-forge_linux-64_lifetimes-0.11.1-py36hc560c46_1.tar.bz2
 create mode 100644 swh/loader/package/conda/tests/test_conda.py
 create mode 100644 swh/loader/package/conda/tests/test_tasks.py
 create mode 100644 swh/loader/package/crates/tests/data/expected.json
 delete mode 100644 swh/loader/package/crates/tests/data/https_crates.io/api_v1_crates_hg-core
 delete mode 100644 swh/loader/package/crates/tests/data/https_crates.io/api_v1_crates_micro-timer
Changes applied before test
commit 9179f4b7ca0bece92609ebb41c806ea5efadee77
Author: Franck Bret <franck.bret@octobus.net>
Date:   Tue Aug 2 09:21:24 2022 +0200

    crates: Loader implements incremental mode
    
    Add incremental support based on sha256 EXTID
    Manage release date for each versions of a package
    Adapt test dataset and add incremental test cases
    
    Related T4104

commit 74289c868125a4d08743a1b2f00a4cd22410e1ad
Author: Franck Bret <franck.bret@octobus.net>
Date:   Wed Sep 28 16:23:45 2022 +0200

    Conda: Anaconda packages archive loader
    
    For each origin it takes advantage of 'artifacts' data send through
    'extra_loader_arguments' of the conda lister, providing versions,
    archive url, checksum, etc.
    Author and description are extracted from intrinsic metadata.
    
    Related T4579

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/963/ for more details.

swh/loader/package/crates/loader.py
231

Based on @vlorentz remarks on irc, we should remove the description in the release message as it is related to the crate package and not that particular release.

Remove description from Release message, add raw extrinsic metadata

Description removed: its value described the package, not a release
Add raw extrinsic metadata with format="original-artifacts-json". Populate it with data from the extra loader arguments "artifacts"
Adapt tests

Build is green

Patch application report for D8171 (id=31180)

Could not rebase; Attempt merge onto 8aa6dab72a...

Updating 8aa6dab..6cacf4e
Fast-forward
 docs/package-loader-specifications.rst             |   9 +
 setup.py                                           |   1 +
 swh/loader/package/conda/__init__.py               |  17 +
 swh/loader/package/conda/loader.py                 | 168 ++++++++
 swh/loader/package/conda/tasks.py                  |  14 +
 swh/loader/package/conda/tests/__init__.py         |   0
 swh/loader/package/conda/tests/data/fake_conda.sh  | 231 +++++++++++
 ...inux-64_lifetimes-0.11.1-py36h9f0ad1d_1.tar.bz2 | Bin 0 -> 1742 bytes
 ...inux-64_lifetimes-0.11.1-py36hc560c46_1.tar.bz2 | Bin 0 -> 1286 bytes
 swh/loader/package/conda/tests/test_conda.py       | 133 ++++++
 swh/loader/package/conda/tests/test_tasks.py       |  24 ++
 swh/loader/package/crates/loader.py                | 216 +++++-----
 swh/loader/package/crates/tests/data/expected.json | 133 ++++++
 .../package/crates/tests/data/fake_crates.sh       |  15 +-
 .../data/https_crates.io/api_v1_crates_hg-core     |   2 -
 .../data/https_crates.io/api_v1_crates_micro-timer |   2 -
 .../crates_hg-core_hg-core-0.0.1.crate             | Bin 426 -> 427 bytes
 .../crates_micro-timer_micro-timer-0.1.0.crate     | Bin 456 -> 484 bytes
 .../crates_micro-timer_micro-timer-0.1.1.crate     | Bin 458 -> 456 bytes
 .../crates_micro-timer_micro-timer-0.1.2.crate     | Bin 485 -> 484 bytes
 .../crates_micro-timer_micro-timer-0.2.0.crate     | Bin 419 -> 419 bytes
 .../crates_micro-timer_micro-timer-0.2.1.crate     | Bin 420 -> 420 bytes
 .../crates_micro-timer_micro-timer-0.3.0.crate     | Bin 413 -> 419 bytes
 .../crates_micro-timer_micro-timer-0.3.1.crate     | Bin 421 -> 416 bytes
 .../crates_micro-timer_micro-timer-0.4.0.crate     | Bin 417 -> 419 bytes
 swh/loader/package/crates/tests/test_crates.py     | 444 +++++++++++++++------
 swh/loader/package/crates/tests/test_tasks.py      |  18 +-
 27 files changed, 1162 insertions(+), 265 deletions(-)
 create mode 100644 swh/loader/package/conda/__init__.py
 create mode 100644 swh/loader/package/conda/loader.py
 create mode 100644 swh/loader/package/conda/tasks.py
 create mode 100644 swh/loader/package/conda/tests/__init__.py
 create mode 100644 swh/loader/package/conda/tests/data/fake_conda.sh
 create mode 100644 swh/loader/package/conda/tests/data/https_conda.anaconda.org/conda-forge_linux-64_lifetimes-0.11.1-py36h9f0ad1d_1.tar.bz2
 create mode 100644 swh/loader/package/conda/tests/data/https_conda.anaconda.org/conda-forge_linux-64_lifetimes-0.11.1-py36hc560c46_1.tar.bz2
 create mode 100644 swh/loader/package/conda/tests/test_conda.py
 create mode 100644 swh/loader/package/conda/tests/test_tasks.py
 create mode 100644 swh/loader/package/crates/tests/data/expected.json
 delete mode 100644 swh/loader/package/crates/tests/data/https_crates.io/api_v1_crates_hg-core
 delete mode 100644 swh/loader/package/crates/tests/data/https_crates.io/api_v1_crates_micro-timer
Changes applied before test
commit 6cacf4e4da48ac88e13adee2a423bc73e26401be
Author: Franck Bret <franck.bret@octobus.net>
Date:   Tue Aug 2 09:21:24 2022 +0200

    crates: Loader implements incremental mode
    
    Add incremental support based on sha256 EXTID
    Manage release date for each versions of a package
    Adapt test dataset and add incremental test cases
    
    Related T4104

commit 74289c868125a4d08743a1b2f00a4cd22410e1ad
Author: Franck Bret <franck.bret@octobus.net>
Date:   Wed Sep 28 16:23:45 2022 +0200

    Conda: Anaconda packages archive loader
    
    For each origin it takes advantage of 'artifacts' data send through
    'extra_loader_arguments' of the conda lister, providing versions,
    archive url, checksum, etc.
    Author and description are extracted from intrinsic metadata.
    
    Related T4579

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/964/ for more details.

@vlorentz @anlambert The last commit introduces raw extrinsic metadata with format="original-artifacts-json". It is populated with data from the extra loader arguments "artifacts".

Regarding the "yanked" information, I suppose I just need to add one more entry to directory_extrinsic_metadata, "crates-package-json"?

I will also do some cleanup: a big part of the intrinsic metadata code is useless now, as we only need to get the author.

Add "crates-package-json" raw extrinsic metadata

Add a second raw extrinsic metadata entry with the format "crates-package-json"; it contains the "last_update" and "yanked" information
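As a rough idea of what such a payload might look like once serialized, here is a sketch. The exact layout of the real "crates-package-json" metadata is not shown in this thread, so the function name and the JSON structure below are assumptions; only the "last_update" and "yanked" keys come from the description above.

```python
import json
from datetime import datetime, timezone


def build_package_metadata(last_update: datetime, yanked: bool) -> bytes:
    """Serialize the two fields to JSON bytes (hypothetical layout)."""
    document = {"last_update": last_update.isoformat(), "yanked": yanked}
    return json.dumps(document, sort_keys=True).encode("utf-8")


raw = build_package_metadata(
    datetime(2019, 4, 16, 18, 48, 11, 404457, tzinfo=timezone.utc),
    yanked=False,
)
print(raw.decode("utf-8"))
```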

Cleanup useless code

Adapt loader specification documentation

Build has FAILED

Patch application report for D8171 (id=31183)

Could not rebase; Attempt merge onto 8aa6dab72a...

Updating 8aa6dab..382a099
Fast-forward
 docs/package-loader-specifications.rst             |  19 +-
 setup.py                                           |   1 +
 swh/loader/package/conda/__init__.py               |  17 +
 swh/loader/package/conda/loader.py                 | 168 ++++++++
 swh/loader/package/conda/tasks.py                  |  14 +
 swh/loader/package/conda/tests/__init__.py         |   0
 swh/loader/package/conda/tests/data/fake_conda.sh  | 231 +++++++++++
 ...inux-64_lifetimes-0.11.1-py36h9f0ad1d_1.tar.bz2 | Bin 0 -> 1742 bytes
 ...inux-64_lifetimes-0.11.1-py36hc560c46_1.tar.bz2 | Bin 0 -> 1286 bytes
 swh/loader/package/conda/tests/test_conda.py       | 133 ++++++
 swh/loader/package/conda/tests/test_tasks.py       |  24 ++
 swh/loader/package/crates/loader.py                | 305 +++++---------
 swh/loader/package/crates/tests/data/expected.json | 133 ++++++
 .../package/crates/tests/data/fake_crates.sh       |  15 +-
 .../data/https_crates.io/api_v1_crates_hg-core     |   2 -
 .../data/https_crates.io/api_v1_crates_micro-timer |   2 -
 .../crates_hg-core_hg-core-0.0.1.crate             | Bin 426 -> 427 bytes
 .../crates_micro-timer_micro-timer-0.1.0.crate     | Bin 456 -> 484 bytes
 .../crates_micro-timer_micro-timer-0.1.1.crate     | Bin 458 -> 456 bytes
 .../crates_micro-timer_micro-timer-0.1.2.crate     | Bin 485 -> 484 bytes
 .../crates_micro-timer_micro-timer-0.2.0.crate     | Bin 419 -> 419 bytes
 .../crates_micro-timer_micro-timer-0.2.1.crate     | Bin 420 -> 420 bytes
 .../crates_micro-timer_micro-timer-0.3.0.crate     | Bin 413 -> 419 bytes
 .../crates_micro-timer_micro-timer-0.3.1.crate     | Bin 421 -> 416 bytes
 .../crates_micro-timer_micro-timer-0.4.0.crate     | Bin 417 -> 419 bytes
 swh/loader/package/crates/tests/test_crates.py     | 459 +++++++++++++++------
 swh/loader/package/crates/tests/test_tasks.py      |  18 +-
 27 files changed, 1189 insertions(+), 352 deletions(-)
 create mode 100644 swh/loader/package/conda/__init__.py
 create mode 100644 swh/loader/package/conda/loader.py
 create mode 100644 swh/loader/package/conda/tasks.py
 create mode 100644 swh/loader/package/conda/tests/__init__.py
 create mode 100644 swh/loader/package/conda/tests/data/fake_conda.sh
 create mode 100644 swh/loader/package/conda/tests/data/https_conda.anaconda.org/conda-forge_linux-64_lifetimes-0.11.1-py36h9f0ad1d_1.tar.bz2
 create mode 100644 swh/loader/package/conda/tests/data/https_conda.anaconda.org/conda-forge_linux-64_lifetimes-0.11.1-py36hc560c46_1.tar.bz2
 create mode 100644 swh/loader/package/conda/tests/test_conda.py
 create mode 100644 swh/loader/package/conda/tests/test_tasks.py
 create mode 100644 swh/loader/package/crates/tests/data/expected.json
 delete mode 100644 swh/loader/package/crates/tests/data/https_crates.io/api_v1_crates_hg-core
 delete mode 100644 swh/loader/package/crates/tests/data/https_crates.io/api_v1_crates_micro-timer
Changes applied before test
commit 382a099eaa21b67b37cf47d66192ede116c5cf11
Author: Franck Bret <franck.bret@octobus.net>
Date:   Tue Aug 2 09:21:24 2022 +0200

    crates: Loader implements incremental mode
    
    Add incremental support based on sha256 EXTID
    Manage release date for each versions of a package
    Adapt test dataset and add incremental test cases
    
    Related T4104

commit 74289c868125a4d08743a1b2f00a4cd22410e1ad
Author: Franck Bret <franck.bret@octobus.net>
Date:   Wed Sep 28 16:23:45 2022 +0200

    Conda: Anaconda packages archive loader
    
    For each origin it takes advantage of 'artifacts' data send through
    'extra_loader_arguments' of the conda lister, providing versions,
    archive url, checksum, etc.
    Author and description are extracted from intrinsic metadata.
    
    Related T4579

Link to build: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/966/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/966/console

Build has FAILED

Patch application report for D8171 (id=31186)

Could not rebase; Attempt merge onto 8aa6dab72a...

Updating 8aa6dab..515fbdd
Fast-forward
 docs/package-loader-specifications.rst             |  19 +-
 setup.py                                           |   1 +
 swh/loader/package/conda/__init__.py               |  17 +
 swh/loader/package/conda/loader.py                 | 168 ++++++++
 swh/loader/package/conda/tasks.py                  |  14 +
 swh/loader/package/conda/tests/__init__.py         |   0
 swh/loader/package/conda/tests/data/fake_conda.sh  | 231 +++++++++++
 ...inux-64_lifetimes-0.11.1-py36h9f0ad1d_1.tar.bz2 | Bin 0 -> 1742 bytes
 ...inux-64_lifetimes-0.11.1-py36hc560c46_1.tar.bz2 | Bin 0 -> 1286 bytes
 swh/loader/package/conda/tests/test_conda.py       | 133 ++++++
 swh/loader/package/conda/tests/test_tasks.py       |  24 ++
 swh/loader/package/crates/loader.py                | 304 +++++---------
 swh/loader/package/crates/tests/data/expected.json | 133 ++++++
 .../package/crates/tests/data/fake_crates.sh       |  15 +-
 .../data/https_crates.io/api_v1_crates_hg-core     |   2 -
 .../data/https_crates.io/api_v1_crates_micro-timer |   2 -
 .../crates_hg-core_hg-core-0.0.1.crate             | Bin 426 -> 427 bytes
 .../crates_micro-timer_micro-timer-0.1.0.crate     | Bin 456 -> 484 bytes
 .../crates_micro-timer_micro-timer-0.1.1.crate     | Bin 458 -> 456 bytes
 .../crates_micro-timer_micro-timer-0.1.2.crate     | Bin 485 -> 484 bytes
 .../crates_micro-timer_micro-timer-0.2.0.crate     | Bin 419 -> 419 bytes
 .../crates_micro-timer_micro-timer-0.2.1.crate     | Bin 420 -> 420 bytes
 .../crates_micro-timer_micro-timer-0.3.0.crate     | Bin 413 -> 419 bytes
 .../crates_micro-timer_micro-timer-0.3.1.crate     | Bin 421 -> 416 bytes
 .../crates_micro-timer_micro-timer-0.4.0.crate     | Bin 417 -> 419 bytes
 swh/loader/package/crates/tests/test_crates.py     | 459 +++++++++++++++------
 swh/loader/package/crates/tests/test_tasks.py      |  18 +-
 27 files changed, 1188 insertions(+), 352 deletions(-)
 create mode 100644 swh/loader/package/conda/__init__.py
 create mode 100644 swh/loader/package/conda/loader.py
 create mode 100644 swh/loader/package/conda/tasks.py
 create mode 100644 swh/loader/package/conda/tests/__init__.py
 create mode 100644 swh/loader/package/conda/tests/data/fake_conda.sh
 create mode 100644 swh/loader/package/conda/tests/data/https_conda.anaconda.org/conda-forge_linux-64_lifetimes-0.11.1-py36h9f0ad1d_1.tar.bz2
 create mode 100644 swh/loader/package/conda/tests/data/https_conda.anaconda.org/conda-forge_linux-64_lifetimes-0.11.1-py36hc560c46_1.tar.bz2
 create mode 100644 swh/loader/package/conda/tests/test_conda.py
 create mode 100644 swh/loader/package/conda/tests/test_tasks.py
 create mode 100644 swh/loader/package/crates/tests/data/expected.json
 delete mode 100644 swh/loader/package/crates/tests/data/https_crates.io/api_v1_crates_hg-core
 delete mode 100644 swh/loader/package/crates/tests/data/https_crates.io/api_v1_crates_micro-timer
Changes applied before test
commit 515fbdd935c1f3c078264ebded1bee98af7b738f
Author: Franck Bret <franck.bret@octobus.net>
Date:   Tue Aug 2 09:21:24 2022 +0200

    crates: Loader implements incremental mode
    
    Add incremental support based on sha256 EXTID
    Manage release date for each versions of a package
    Adapt test dataset and add incremental test cases
    
    Related T4104

commit 74289c868125a4d08743a1b2f00a4cd22410e1ad
Author: Franck Bret <franck.bret@octobus.net>
Date:   Wed Sep 28 16:23:45 2022 +0200

    Conda: Anaconda packages archive loader
    
    For each origin it takes advantage of 'artifacts' data send through
    'extra_loader_arguments' of the conda lister, providing versions,
    archive url, checksum, etc.
    Author and description are extracted from intrinsic metadata.
    
    Related T4579

Link to build: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/967/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/967/console

Temporarily add verbose options to pytest to understand why CI fails

Build has FAILED

Patch application report for D8171 (id=31189)

Could not rebase; Attempt merge onto 8aa6dab72a...

Updating 8aa6dab..df538b0
Fast-forward
 docs/package-loader-specifications.rst             |  19 +-
 setup.py                                           |   1 +
 swh/loader/package/conda/__init__.py               |  17 +
 swh/loader/package/conda/loader.py                 | 168 ++++++++
 swh/loader/package/conda/tasks.py                  |  14 +
 swh/loader/package/conda/tests/__init__.py         |   0
 swh/loader/package/conda/tests/data/fake_conda.sh  | 231 +++++++++++
 ...inux-64_lifetimes-0.11.1-py36h9f0ad1d_1.tar.bz2 | Bin 0 -> 1742 bytes
 ...inux-64_lifetimes-0.11.1-py36hc560c46_1.tar.bz2 | Bin 0 -> 1286 bytes
 swh/loader/package/conda/tests/test_conda.py       | 133 ++++++
 swh/loader/package/conda/tests/test_tasks.py       |  24 ++
 swh/loader/package/crates/loader.py                | 304 +++++---------
 swh/loader/package/crates/tests/data/expected.json | 133 ++++++
 .../package/crates/tests/data/fake_crates.sh       |  15 +-
 .../data/https_crates.io/api_v1_crates_hg-core     |   2 -
 .../data/https_crates.io/api_v1_crates_micro-timer |   2 -
 .../crates_hg-core_hg-core-0.0.1.crate             | Bin 426 -> 427 bytes
 .../crates_micro-timer_micro-timer-0.1.0.crate     | Bin 456 -> 484 bytes
 .../crates_micro-timer_micro-timer-0.1.1.crate     | Bin 458 -> 456 bytes
 .../crates_micro-timer_micro-timer-0.1.2.crate     | Bin 485 -> 484 bytes
 .../crates_micro-timer_micro-timer-0.2.0.crate     | Bin 419 -> 419 bytes
 .../crates_micro-timer_micro-timer-0.2.1.crate     | Bin 420 -> 420 bytes
 .../crates_micro-timer_micro-timer-0.3.0.crate     | Bin 413 -> 419 bytes
 .../crates_micro-timer_micro-timer-0.3.1.crate     | Bin 421 -> 416 bytes
 .../crates_micro-timer_micro-timer-0.4.0.crate     | Bin 417 -> 419 bytes
 swh/loader/package/crates/tests/test_crates.py     | 459 +++++++++++++++------
 swh/loader/package/crates/tests/test_tasks.py      |  18 +-
 tox.ini                                            |   1 +
 28 files changed, 1189 insertions(+), 352 deletions(-)
 create mode 100644 swh/loader/package/conda/__init__.py
 create mode 100644 swh/loader/package/conda/loader.py
 create mode 100644 swh/loader/package/conda/tasks.py
 create mode 100644 swh/loader/package/conda/tests/__init__.py
 create mode 100644 swh/loader/package/conda/tests/data/fake_conda.sh
 create mode 100644 swh/loader/package/conda/tests/data/https_conda.anaconda.org/conda-forge_linux-64_lifetimes-0.11.1-py36h9f0ad1d_1.tar.bz2
 create mode 100644 swh/loader/package/conda/tests/data/https_conda.anaconda.org/conda-forge_linux-64_lifetimes-0.11.1-py36hc560c46_1.tar.bz2
 create mode 100644 swh/loader/package/conda/tests/test_conda.py
 create mode 100644 swh/loader/package/conda/tests/test_tasks.py
 create mode 100644 swh/loader/package/crates/tests/data/expected.json
 delete mode 100644 swh/loader/package/crates/tests/data/https_crates.io/api_v1_crates_hg-core
 delete mode 100644 swh/loader/package/crates/tests/data/https_crates.io/api_v1_crates_micro-timer
Changes applied before test
commit df538b07a5cff3583dc74187ca46db31c52fe3d1
Author: Franck Bret <franck.bret@octobus.net>
Date:   Tue Aug 2 09:21:24 2022 +0200

    crates: Loader implements incremental mode
    
    Add incremental support based on sha256 EXTID
    Manage release date for each versions of a package
    Adapt test dataset and add incremental test cases
    
    Related T4104

commit 74289c868125a4d08743a1b2f00a4cd22410e1ad
Author: Franck Bret <franck.bret@octobus.net>
Date:   Wed Sep 28 16:23:45 2022 +0200

    Conda: Anaconda packages archive loader
    
    For each origin it takes advantage of 'artifacts' data send through
    'extra_loader_arguments' of the conda lister, providing versions,
    archive url, checksum, etc.
    Author and description are extracted from intrinsic metadata.
    
    Related T4579

Link to build: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/968/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/968/console

Add pytest verbose option to tox (the previous attempt failed)

Build has FAILED

Patch application report for D8171 (id=31192)

Could not rebase; Attempt merge onto 8aa6dab72a...

Updating 8aa6dab..818c349
Fast-forward
 docs/package-loader-specifications.rst             |  19 +-
 setup.py                                           |   1 +
 swh/loader/package/conda/__init__.py               |  17 +
 swh/loader/package/conda/loader.py                 | 168 ++++++++
 swh/loader/package/conda/tasks.py                  |  14 +
 swh/loader/package/conda/tests/__init__.py         |   0
 swh/loader/package/conda/tests/data/fake_conda.sh  | 231 +++++++++++
 ...inux-64_lifetimes-0.11.1-py36h9f0ad1d_1.tar.bz2 | Bin 0 -> 1742 bytes
 ...inux-64_lifetimes-0.11.1-py36hc560c46_1.tar.bz2 | Bin 0 -> 1286 bytes
 swh/loader/package/conda/tests/test_conda.py       | 133 ++++++
 swh/loader/package/conda/tests/test_tasks.py       |  24 ++
 swh/loader/package/crates/loader.py                | 304 +++++---------
 swh/loader/package/crates/tests/data/expected.json | 133 ++++++
 .../package/crates/tests/data/fake_crates.sh       |  15 +-
 .../data/https_crates.io/api_v1_crates_hg-core     |   2 -
 .../data/https_crates.io/api_v1_crates_micro-timer |   2 -
 .../crates_hg-core_hg-core-0.0.1.crate             | Bin 426 -> 427 bytes
 .../crates_micro-timer_micro-timer-0.1.0.crate     | Bin 456 -> 484 bytes
 .../crates_micro-timer_micro-timer-0.1.1.crate     | Bin 458 -> 456 bytes
 .../crates_micro-timer_micro-timer-0.1.2.crate     | Bin 485 -> 484 bytes
 .../crates_micro-timer_micro-timer-0.2.0.crate     | Bin 419 -> 419 bytes
 .../crates_micro-timer_micro-timer-0.2.1.crate     | Bin 420 -> 420 bytes
 .../crates_micro-timer_micro-timer-0.3.0.crate     | Bin 413 -> 419 bytes
 .../crates_micro-timer_micro-timer-0.3.1.crate     | Bin 421 -> 416 bytes
 .../crates_micro-timer_micro-timer-0.4.0.crate     | Bin 417 -> 419 bytes
 swh/loader/package/crates/tests/test_crates.py     | 459 +++++++++++++++------
 swh/loader/package/crates/tests/test_tasks.py      |  18 +-
 tox.ini                                            |   2 +-
 28 files changed, 1189 insertions(+), 353 deletions(-)
 create mode 100644 swh/loader/package/conda/__init__.py
 create mode 100644 swh/loader/package/conda/loader.py
 create mode 100644 swh/loader/package/conda/tasks.py
 create mode 100644 swh/loader/package/conda/tests/__init__.py
 create mode 100644 swh/loader/package/conda/tests/data/fake_conda.sh
 create mode 100644 swh/loader/package/conda/tests/data/https_conda.anaconda.org/conda-forge_linux-64_lifetimes-0.11.1-py36h9f0ad1d_1.tar.bz2
 create mode 100644 swh/loader/package/conda/tests/data/https_conda.anaconda.org/conda-forge_linux-64_lifetimes-0.11.1-py36hc560c46_1.tar.bz2
 create mode 100644 swh/loader/package/conda/tests/test_conda.py
 create mode 100644 swh/loader/package/conda/tests/test_tasks.py
 create mode 100644 swh/loader/package/crates/tests/data/expected.json
 delete mode 100644 swh/loader/package/crates/tests/data/https_crates.io/api_v1_crates_hg-core
 delete mode 100644 swh/loader/package/crates/tests/data/https_crates.io/api_v1_crates_micro-timer
Changes applied before test
commit 818c34980a3dc253b35ad0e6b8c0a3ab9a453830
Author: Franck Bret <franck.bret@octobus.net>
Date:   Tue Aug 2 09:21:24 2022 +0200

    crates: Loader implements incremental mode
    
    Add incremental support based on sha256 EXTID
    Manage release date for each versions of a package
    Adapt test dataset and add incremental test cases
    
    Related T4104

commit 74289c868125a4d08743a1b2f00a4cd22410e1ad
Author: Franck Bret <franck.bret@octobus.net>
Date:   Wed Sep 28 16:23:45 2022 +0200

    Conda: Anaconda packages archive loader
    
    For each origin it takes advantage of 'artifacts' data send through
    'extra_loader_arguments' of the conda lister, providing versions,
    archive url, checksum, etc.
    Author and description are extracted from intrinsic metadata.
    
    Related T4579

Link to build: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/969/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/969/console

Adapt raw extrinsic metadata tests for CI

Build is green

Patch application report for D8171 (id=31195)

Could not rebase; Attempt merge onto 8aa6dab72a...

Updating 8aa6dab..9365e42
Fast-forward
 docs/package-loader-specifications.rst             |  19 +-
 setup.py                                           |   1 +
 swh/loader/package/conda/__init__.py               |  17 +
 swh/loader/package/conda/loader.py                 | 168 ++++++++
 swh/loader/package/conda/tasks.py                  |  14 +
 swh/loader/package/conda/tests/__init__.py         |   0
 swh/loader/package/conda/tests/data/fake_conda.sh  | 231 ++++++++++
 ...inux-64_lifetimes-0.11.1-py36h9f0ad1d_1.tar.bz2 | Bin 0 -> 1742 bytes
 ...inux-64_lifetimes-0.11.1-py36hc560c46_1.tar.bz2 | Bin 0 -> 1286 bytes
 swh/loader/package/conda/tests/test_conda.py       | 133 ++++++
 swh/loader/package/conda/tests/test_tasks.py       |  24 ++
 swh/loader/package/crates/loader.py                | 304 +++++--------
 swh/loader/package/crates/tests/data/expected.json | 133 ++++++
 .../package/crates/tests/data/fake_crates.sh       |  15 +-
 .../data/https_crates.io/api_v1_crates_hg-core     |   2 -
 .../data/https_crates.io/api_v1_crates_micro-timer |   2 -
 .../crates_hg-core_hg-core-0.0.1.crate             | Bin 426 -> 427 bytes
 .../crates_micro-timer_micro-timer-0.1.0.crate     | Bin 456 -> 484 bytes
 .../crates_micro-timer_micro-timer-0.1.1.crate     | Bin 458 -> 456 bytes
 .../crates_micro-timer_micro-timer-0.1.2.crate     | Bin 485 -> 484 bytes
 .../crates_micro-timer_micro-timer-0.2.0.crate     | Bin 419 -> 419 bytes
 .../crates_micro-timer_micro-timer-0.2.1.crate     | Bin 420 -> 420 bytes
 .../crates_micro-timer_micro-timer-0.3.0.crate     | Bin 413 -> 419 bytes
 .../crates_micro-timer_micro-timer-0.3.1.crate     | Bin 421 -> 416 bytes
 .../crates_micro-timer_micro-timer-0.4.0.crate     | Bin 417 -> 419 bytes
 swh/loader/package/crates/tests/test_crates.py     | 473 +++++++++++++++------
 swh/loader/package/crates/tests/test_tasks.py      |  18 +-
 27 files changed, 1203 insertions(+), 351 deletions(-)
 create mode 100644 swh/loader/package/conda/__init__.py
 create mode 100644 swh/loader/package/conda/loader.py
 create mode 100644 swh/loader/package/conda/tasks.py
 create mode 100644 swh/loader/package/conda/tests/__init__.py
 create mode 100644 swh/loader/package/conda/tests/data/fake_conda.sh
 create mode 100644 swh/loader/package/conda/tests/data/https_conda.anaconda.org/conda-forge_linux-64_lifetimes-0.11.1-py36h9f0ad1d_1.tar.bz2
 create mode 100644 swh/loader/package/conda/tests/data/https_conda.anaconda.org/conda-forge_linux-64_lifetimes-0.11.1-py36hc560c46_1.tar.bz2
 create mode 100644 swh/loader/package/conda/tests/test_conda.py
 create mode 100644 swh/loader/package/conda/tests/test_tasks.py
 create mode 100644 swh/loader/package/crates/tests/data/expected.json
 delete mode 100644 swh/loader/package/crates/tests/data/https_crates.io/api_v1_crates_hg-core
 delete mode 100644 swh/loader/package/crates/tests/data/https_crates.io/api_v1_crates_micro-timer
Changes applied before test
commit 9365e42957c39f83a3fd9748dfc1b0476548c5e1
Author: Franck Bret <franck.bret@octobus.net>
Date:   Tue Aug 2 09:21:24 2022 +0200

    crates: Loader implements incremental mode
    
    Add incremental support based on sha256 EXTID
    Manage release date for each versions of a package
    Adapt test dataset and add incremental test cases
    
    Related T4104

commit 74289c868125a4d08743a1b2f00a4cd22410e1ad
Author: Franck Bret <franck.bret@octobus.net>
Date:   Wed Sep 28 16:23:45 2022 +0200

    Conda: Anaconda packages archive loader
    
    For each origin it takes advantage of 'artifacts' data send through
    'extra_loader_arguments' of the conda lister, providing versions,
    archive url, checksum, etc.
    Author and description are extracted from intrinsic metadata.
    
    Related T4579

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/971/ for more details.

swh/loader/package/crates/loader.py
191

You can keep the version in the crate metadata; there is no format specification for it.

199–211

@franckbret, ping: this is an important change that still needs to be handled.

franckbret marked an inline comment as done.

Manage checksums

Build is green

Patch application report for D8171 (id=31219)

Could not rebase; Attempt merge onto 8aa6dab72a...

Updating 8aa6dab..da35d82
Fast-forward
 docs/package-loader-specifications.rst             |  19 +-
 setup.py                                           |   1 +
 swh/loader/package/conda/__init__.py               |  17 +
 swh/loader/package/conda/loader.py                 | 168 ++++++++
 swh/loader/package/conda/tasks.py                  |  14 +
 swh/loader/package/conda/tests/__init__.py         |   0
 swh/loader/package/conda/tests/data/fake_conda.sh  | 231 ++++++++++
 ...inux-64_lifetimes-0.11.1-py36h9f0ad1d_1.tar.bz2 | Bin 0 -> 1742 bytes
 ...inux-64_lifetimes-0.11.1-py36hc560c46_1.tar.bz2 | Bin 0 -> 1286 bytes
 swh/loader/package/conda/tests/test_conda.py       | 133 ++++++
 swh/loader/package/conda/tests/test_tasks.py       |  24 ++
 swh/loader/package/crates/loader.py                | 304 +++++--------
 swh/loader/package/crates/tests/data/expected.json | 133 ++++++
 .../package/crates/tests/data/fake_crates.sh       |  15 +-
 .../data/https_crates.io/api_v1_crates_hg-core     |   2 -
 .../data/https_crates.io/api_v1_crates_micro-timer |   2 -
 .../crates_hg-core_hg-core-0.0.1.crate             | Bin 426 -> 427 bytes
 .../crates_micro-timer_micro-timer-0.1.0.crate     | Bin 456 -> 484 bytes
 .../crates_micro-timer_micro-timer-0.1.1.crate     | Bin 458 -> 456 bytes
 .../crates_micro-timer_micro-timer-0.1.2.crate     | Bin 485 -> 484 bytes
 .../crates_micro-timer_micro-timer-0.2.0.crate     | Bin 419 -> 419 bytes
 .../crates_micro-timer_micro-timer-0.2.1.crate     | Bin 420 -> 420 bytes
 .../crates_micro-timer_micro-timer-0.3.0.crate     | Bin 413 -> 419 bytes
 .../crates_micro-timer_micro-timer-0.3.1.crate     | Bin 421 -> 416 bytes
 .../crates_micro-timer_micro-timer-0.4.0.crate     | Bin 417 -> 419 bytes
 swh/loader/package/crates/tests/test_crates.py     | 479 +++++++++++++++------
 swh/loader/package/crates/tests/test_tasks.py      |  18 +-
 27 files changed, 1209 insertions(+), 351 deletions(-)
 create mode 100644 swh/loader/package/conda/__init__.py
 create mode 100644 swh/loader/package/conda/loader.py
 create mode 100644 swh/loader/package/conda/tasks.py
 create mode 100644 swh/loader/package/conda/tests/__init__.py
 create mode 100644 swh/loader/package/conda/tests/data/fake_conda.sh
 create mode 100644 swh/loader/package/conda/tests/data/https_conda.anaconda.org/conda-forge_linux-64_lifetimes-0.11.1-py36h9f0ad1d_1.tar.bz2
 create mode 100644 swh/loader/package/conda/tests/data/https_conda.anaconda.org/conda-forge_linux-64_lifetimes-0.11.1-py36hc560c46_1.tar.bz2
 create mode 100644 swh/loader/package/conda/tests/test_conda.py
 create mode 100644 swh/loader/package/conda/tests/test_tasks.py
 create mode 100644 swh/loader/package/crates/tests/data/expected.json
 delete mode 100644 swh/loader/package/crates/tests/data/https_crates.io/api_v1_crates_hg-core
 delete mode 100644 swh/loader/package/crates/tests/data/https_crates.io/api_v1_crates_micro-timer
Changes applied before test
commit da35d82f47a569d09e7f302002a6549270a5f57e
Author: Franck Bret <franck.bret@octobus.net>
Date:   Tue Aug 2 09:21:24 2022 +0200

    crates: Loader implements incremental mode
    
    Add incremental support based on sha256 EXTID
    Manage release date for each versions of a package
    Adapt test dataset and add incremental test cases
    
    Related T4104

commit 74289c868125a4d08743a1b2f00a4cd22410e1ad
Author: Franck Bret <franck.bret@octobus.net>
Date:   Wed Sep 28 16:23:45 2022 +0200

    Conda: Anaconda packages archive loader
    
    For each origin it takes advantage of 'artifacts' data send through
    'extra_loader_arguments' of the conda lister, providing versions,
    archive url, checksum, etc.
    Author and description are extracted from intrinsic metadata.
    
    Related T4579

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/978/ for more details.

Looks good to me, thanks!

vlorentz added inline comments.
swh/loader/package/crates/loader.py
52–54

It's missing the date.

(All fields used to build release objects should be covered by this manifest)

203–206

Same comment as on the other diffs: original-artifacts-json should already be created by the base package loader.

224

any(authors) here too
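One plausible reading of the `any(authors)` suggestion, sketched here as an assumption since the reviewed line is not shown: a list such as `[""]` is truthy in Python, so a plain `if authors:` check would accept a list that contains only empty strings, whereas `any(authors)` rejects it. The helper name `first_author` is hypothetical and is not part of the loader.

```python
from typing import List, Optional


def first_author(authors: List[str]) -> Optional[str]:
    """Return the first non-empty author string, or None if there is none.

    Hypothetical helper for illustration only; not the loader's actual code.
    """
    # any() is False for [] and for lists holding only empty strings,
    # while bool([""]) is True, which is the case a bare truthiness
    # check would let through.
    if not any(authors):
        return None
    return next(author for author in authors if author)
```

With this guard, an empty author list and a list of blank entries are treated the same way.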

This revision now requires changes to proceed. Oct 13 2022, 10:30 AM
franckbret marked 3 inline comments as done.

Remove original-artifacts-json from raw extrinsic metadata; it should already be created by the base package loader

Add last_update to manifest
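To illustrate the review point that every field used to build a release should be covered by the manifest, here is a minimal, self-contained sketch of a sha256 EXTID computed over a manifest that includes `last_update`, followed by an incremental filter. The template fields, the `PackageInfo` class and the `new_versions` helper are assumptions made for this example; they do not reproduce the loader's actual `MANIFEST_FORMAT` or API.

```python
import hashlib
import string
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, List, Set

# Hypothetical manifest template: the field names are illustrative only.
MANIFEST_FORMAT = string.Template(
    "name $name\nversion $version\nsha256 $sha256\nlast_update $last_update\n"
)


@dataclass(frozen=True)
class PackageInfo:
    """Illustrative stand-in for the per-version package information."""

    name: str
    version: str
    sha256: str
    last_update: datetime

    def extid(self) -> bytes:
        """Return the sha256 digest of the manifest built from all fields."""
        manifest = MANIFEST_FORMAT.substitute(
            name=self.name,
            version=self.version,
            sha256=self.sha256,
            last_update=self.last_update.isoformat(),
        )
        return hashlib.sha256(manifest.encode("utf-8")).digest()


def new_versions(
    infos: Iterable[PackageInfo], known_extids: Set[bytes]
) -> List[PackageInfo]:
    """Keep only versions whose EXTID was not seen by a previous visit."""
    return [info for info in infos if info.extid() not in known_extids]
```

Because the date is part of the manifest, two entries that differ only in `last_update` yield different EXTIDs, so a release whose date changed is not mistaken for one that was already loaded.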

Build is green

Patch application report for D8171 (id=31342)

Rebasing onto a13e3e6f35...

Current branch diff-target is up to date.
Changes applied before test
commit a72f68e4f9d14f36f654f8e0b85690ef1bcdf480
Author: Franck Bret <franck.bret@octobus.net>
Date:   Tue Aug 2 09:21:24 2022 +0200

    crates: Loader implements incremental mode
    
    Add incremental support based on sha256 EXTID
    Manage release date for each versions of a package
    Adapt test dataset and add incremental test cases
    
    Related T4104

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/998/ for more details.