
Conda: List origins from anaconda.com, the Package, dependency and environment management for any language
Closed · Public

Authored by franckbret on Sep 21 2022, 2:53 PM.

Diff Detail

Repository
rDLS Listers
Branch
conda
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 31676
Build 49557: Phabricator diff pipeline on jenkins (Jenkins console · Jenkins)
Build 49556: arc lint + arc unit

Event Timeline

Build is green

Patch application report for D8517 (id=30668)

Rebasing onto bd35d54398...

First, rewinding head to replay your work on top of it...
Applying: [WIP] Conda: List origins from anaconda.com, the Package, dependency and environment management for any language
Changes applied before test
commit 587b8853600d4e1d08c0930662fe4edaa21fe35c
Author: Franck Bret <franck.bret@octobus.net>
Date:   Wed Sep 21 14:44:56 2022 +0200

    [WIP] Conda: List origins from anaconda.com, the Package, dependency and environment management for any language
    
    Related T4547

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/670/ for more details.

anlambert added a subscriber: anlambert.

Overall, looks good to me, but there is still room for a couple of improvements (see inline comments).

swh/lister/conda/lister.py
5

Please add an empty line after license header.

39

We should download the compressed version instead.

REPO_URL_PATTERN = "{url}/{channel}/{arch}/repodata.json.bz2"
94

For the compressed version, we need to use:

packages = json.loads(bz2.decompress(response.content))["packages"]
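
To make the two suggestions above concrete, here is a minimal stand-alone sketch of building the compressed index URL and decoding its payload. The helper names `repodata_url` and `parse_packages`, the `example.org` base URL, and the sample package entry are hypothetical and only serve the illustration; the real lister would pass `response.content` instead of the locally built payload.

```python
import bz2
import json

# Pattern quoted from the inline comment on line 39.
REPO_URL_PATTERN = "{url}/{channel}/{arch}/repodata.json.bz2"


def repodata_url(url: str, channel: str, arch: str) -> str:
    """Build the URL of the compressed index for one channel/arch pair."""
    return REPO_URL_PATTERN.format(url=url, channel=channel, arch=arch)


def parse_packages(raw: bytes) -> dict:
    """Decompress a repodata.json.bz2 payload and return its "packages" mapping."""
    return json.loads(bz2.decompress(raw))["packages"]


# Hypothetical stand-in for response.content, built locally so that the
# sketch runs without any network access.
sample = {
    "packages": {
        "demo-1.0-0.tar.bz2": {"name": "demo", "version": "1.0", "build": "0"}
    }
}
payload = bz2.compress(json.dumps(sample).encode("utf-8"))

packages = parse_packages(payload)
index_url = repodata_url("https://example.org/pkgs", "free", "linux-64")
```

Keeping the decoding in a small pure function like this also makes it easy to test against a bz2-compressed fixture rather than a live endpoint.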
102–138

Some nitpicks about more compact code and better naming:

for filename, package_metadata in packages.items():
    # One artifact per package file listed in the index.
    artifact = {
        "filename": filename,
        "url": self.ARCHIVE_URL_PATTERN.format(
            url=self.url,
            channel=self.channel,
            filename=filename,
            arch=arch,
        ),
        "version": package_metadata["version"],
        "checksums": {},
    }

    # Only record the checksums that the index actually provides.
    for checksum in ("md5", "sha256"):
        if checksum in package_metadata:
            artifact["checksums"][checksum] = package_metadata[checksum]

    version_key = (
        f"{arch}/{package_metadata['version']}-{package_metadata['build']}"
    )
    self.packages[package_metadata["name"]][version_key] = artifact

    # Prefer the numeric "timestamp" field, fall back to the ISO "date" one.
    package_date = None
    if "timestamp" in package_metadata:
        package_date = datetime.datetime.fromtimestamp(
            package_metadata["timestamp"] / 1e3, datetime.timezone.utc
        )
    elif "date" in package_metadata:
        package_date = iso8601.parse_date(package_metadata["date"])

    last_update = None
    if package_date:
        artifact["date"] = package_date.isoformat()
        self.package_dates[package_metadata["name"]].append(package_date)
        last_update = max(self.package_dates[package_metadata["name"]])
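
The date handling in the suggestion above can be tried in isolation with the following sketch. It is not the lister's code: `package_date` is a hypothetical helper, the sample entries are made up, and `datetime.datetime.fromisoformat` is swapped in for `iso8601.parse_date` so that only the standard library is needed. Treating offset-less dates as UTC is an assumption made here so that all dates stay comparable.

```python
import datetime
from collections import defaultdict
from typing import Optional


def package_date(package_metadata: dict) -> Optional[datetime.datetime]:
    """Return a timezone-aware date for a package entry, or None if absent."""
    if "timestamp" in package_metadata:
        # The suggested code divides by 1e3, which suggests that the
        # "timestamp" field is expressed in milliseconds.
        return datetime.datetime.fromtimestamp(
            package_metadata["timestamp"] / 1e3, datetime.timezone.utc
        )
    if "date" in package_metadata:
        # Stdlib stand-in for iso8601.parse_date.
        parsed = datetime.datetime.fromisoformat(package_metadata["date"])
        if parsed.tzinfo is None:
            # Assumption: a date without an offset is treated as UTC, so
            # that max() never mixes naive and aware datetimes.
            parsed = parsed.replace(tzinfo=datetime.timezone.utc)
        return parsed
    return None


# Hypothetical entries for a single package named "demo".
entries = [
    {"name": "demo", "timestamp": 1500},
    {"name": "demo", "date": "2016-03-01"},
    {"name": "demo"},  # no date information at all
]

package_dates = defaultdict(list)
for entry in entries:
    date = package_date(entry)
    if date is not None:
        package_dates[entry["name"]].append(date)

# The most recent known date is the candidate last update for the origin.
last_update = max(package_dates["demo"])
```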
This revision now requires changes to proceed. Sep 21 2022, 4:17 PM
franckbret marked 4 inline comments as done.

Lister improvement

Use json.bz2 endpoints instead of .json
Remove msys-related fixtures, as those archs do not have source code to download
Tests check artifacts

Build is green

Patch application report for D8517 (id=30697)

Rebasing onto 9b3e565cf7...

First, rewinding head to replay your work on top of it...
Applying: Conda: List origins from anaconda.com, the Package, dependency and environment management for any language
Changes applied before test
commit 384faae8b1edf07bf107caf555a2308ba56a28f5
Author: Franck Bret <franck.bret@octobus.net>
Date:   Wed Sep 21 14:44:56 2022 +0200

    Conda: List origins from anaconda.com, the Package, dependency and environment management for any language
    
    Related T4547

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/674/ for more details.

franckbret retitled this revision from [WIP] Conda: List origins from anaconda.com, the Package, dependency and environment management for any language to Conda: List origins from anaconda.com, the Package, dependency and environment management for any language. Sep 22 2022, 7:51 PM

Add documentation for lister usage

Build has FAILED

Patch application report for D8517 (id=30698)

Rebasing onto 9b3e565cf7...

First, rewinding head to replay your work on top of it...
Applying: Conda: List origins for Anaconda, the package manager that provides tooling for datascience
Changes applied before test
commit 011f60a8f0398dd6c60f8081200939f5bd62c2b4
Author: Franck Bret <franck.bret@octobus.net>
Date:   Wed Sep 21 14:44:56 2022 +0200

    Conda: List origins for Anaconda, the package manager that provides tooling for datascience
    
    Related T4547

Link to build: https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/675/
See console output for more information: https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/675/console

Add missing documentation link

Build is green

Patch application report for D8517 (id=30699)

Rebasing onto 9b3e565cf7...

First, rewinding head to replay your work on top of it...
Applying: Conda: List origins for Anaconda, the package manager that provides tooling for datascience
Changes applied before test
commit 0c3e50fe0fb41832250753011f7aaad5fff710c7
Author: Franck Bret <franck.bret@octobus.net>
Date:   Wed Sep 21 14:44:56 2022 +0200

    Conda: List origins for Anaconda, the package manager that provides tooling for datascience
    
    Related T4547

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/676/ for more details.

Doc: Add missing quote to docker command example to add a conda task

Build is green

Patch application report for D8517 (id=30700)

Rebasing onto 9b3e565cf7...

First, rewinding head to replay your work on top of it...
Applying: Conda: List origins for Anaconda, the package manager that provides tooling for datascience
Changes applied before test
commit 3f53628d68f2c13ad86a32697abaa7945dc23f9c
Author: Franck Bret <franck.bret@octobus.net>
Date:   Wed Sep 21 14:44:56 2022 +0200

    Conda: List origins for Anaconda, the package manager that provides tooling for datascience
    
    Related T4547

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/677/ for more details.

Test report after running the lister (lister only, no loader yet) with the following command:

swh scheduler task add list-conda channel="free" archs='["linux-64", "osx-64", "win-64"]' -p oneshot
swh-lister_1                         | [2022-09-23 07:39:37,418: INFO/ForkPoolWorker-1] Task swh.lister.conda.tasks.CondaListerTask[91139a37-dee3-4ebc-9ffd-2c8177e4b2d3] succeeded in 51.53754648496397s: {'pages': 3, 'origins': 2161}

Make use of http_retry instead of throttling_retry decorator after D8519

Build is green

Patch application report for D8517 (id=30752)

Rebasing onto d5c30a3ce3...

Current branch diff-target is up to date.
Changes applied before test
commit 928cd6baede495906ed1f07b3a997de851e332fd
Author: Franck Bret <franck.bret@octobus.net>
Date:   Wed Sep 21 14:44:56 2022 +0200

    Conda: List origins for Anaconda, the package manager that provides tooling for datascience
    
    Related T4547

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/692/ for more details.

There are still some changes to make to this diff now that the HTTP request and user-agent setting code has been deduplicated.

swh/lister/conda/__init__.py
99

We now use docker compose v2, so the command should be: docker compose up -d

101–107

This should be the preferred way to execute a lister through docker.

Then schedule a conda listing task::

   docker compose exec swh-scheduler swh scheduler task add -p oneshot list-conda channel="free" archs="[linux-64, osx-64, win-64]"

You can follow lister execution by displaying logs of swh-lister service::

   docker compose logs -f swh-lister
swh/lister/conda/lister.py
24–27

You can remove the user agent setting code; it is now handled in the base lister class.

72–87

You can remove that method and use the self.http_request method from the base lister class instead.

This revision now requires changes to proceed. Sep 26 2022, 4:03 PM
franckbret marked 4 inline comments as done.

Make use of http_request after D8520

Build is green

Patch application report for D8517 (id=30789)

Rebasing onto fd1a4244a0...

Current branch diff-target is up to date.
Changes applied before test
commit 011157136966c7d93a7bddb2a49705103c93257a
Author: Franck Bret <franck.bret@octobus.net>
Date:   Wed Sep 21 14:44:56 2022 +0200

    Conda: List origins for Anaconda, the package manager that provides tooling for datascience
    
    Related T4547

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/700/ for more details.

Looks good to me, thanks!

This revision is now accepted and ready to land. Sep 27 2022, 10:45 AM