Software Heritage

[wip] swh.lister.functionalPackages: add lister getting sources from a JSON file
Needs Review, Public

Authored by lewo on Sep 22 2019, 8:59 PM.

Details

Summary
swh.lister.functionalPackages: add lister getting sources from a JSON file

This lister downloads a JSON file containing a list of sources
provided by the NixOS and Guix distributions. This file looks like:

    {
      "version": 1,
      "sources": [
        {
          "type": "url",
          "url": "https://ftpmirror.gnu.org//hello/hello-2.10.tar.gz"
        }
      ]
    }

This is a work-in-progress lister, and we still need to work on several points:

  • define a JSON format
  • expose the JSON file from a NixOS community-managed server (edit (lewo): I'm working on this)

Diff Detail

Repository
rDLS Listers
Branch
json-lister
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 10163
Build 15077: tox-on-jenkins (Jenkins)
Build 15076: arc lint + arc unit

Event Timeline

ardumont edited the summary of this revision. Oct 11 2019, 6:28 PM
ardumont added a project: Lister.
ardumont added inline comments. Oct 15 2019, 5:21 PM
swh/lister/json/lister.py
32 ↗(On Diff #6807)

You could drop the package's name, I guess.

33 ↗(On Diff #6807)

I guess we could:

  • drop the package's name (it's mostly unused in other listers and I'm dropping it when I can)
  • use named parameters instead (it's clearer, also when introspecting the scheduler db)
  • rename tarballs to packages.

About the integrity field, I guess we can split it so that the hash algorithm is named explicitly...
So something like this would do:

packages = build_packages(...)  # <- to clear things up a bit
# where packages is of the form:
# [{'uri': origin_url, 'date': <some-date-isoformat>, 'sha256': 'MeBmE3qWJnbon2nRtlOC3pWn732RS4y5VvQepy4PUWs='}]

return utils.create_task_dict(
    'load-tar', kwargs.get('policy', 'oneshot'),
    origin=origin_url,
    packages=packages)

@douardda ^ what do you think?

ardumont added inline comments. Oct 15 2019, 5:26 PM
swh/lister/json/lister.py
33 ↗(On Diff #6807)

If we could also have the version of the package (within the packages' entries), that'd be awesome.

ardumont added inline comments. Oct 15 2019, 5:44 PM
swh/lister/json/lister.py
33 ↗(On Diff #6807)

Also, for the hash, I mean the base64-decoded value as a hex-encoded ASCII string...

so packages really becomes:

[{'uri': origin_url, 'date': <some-date-isoformat>, 'sha256': '31e066137a962676e89f69d1b65382de95a7ef7d914b8cb956f41ea72e0f516b'}]

That is consistent with the output of other existing listers and with what the loaders expect.

The following will help:

from base64 import b64decode
from binascii import hexlify
from typing import Tuple


def integrity_to_hash(integrity_value: str) -> Tuple[str, str]:
    """Parse an integrity field into a tuple (hash_name, hash_hex) [1]

    [1] https://www.w3.org/TR/SRI

    """
    # an SRI value is '<algo>-<base64 digest>': split on the first '-' only
    hash_name, base64_value = integrity_value.split('-', 1)
    hash_hex = hexlify(b64decode(base64_value)).decode('ascii')
    return hash_name, hash_hex


def test_integrity_to_hash():
    """Parsing an integrity field hash should return a tuple hash_name, hash_hex strings

    """
    actual_hash_name, actual_hash_hex = integrity_to_hash(
        'sha256-MeBmE3qWJnbon2nRtlOC3pWn732RS4y5VvQepy4PUWs=')

    assert actual_hash_name == 'sha256'
    assert actual_hash_hex == '31e066137a962676e89f69d1b65382de95a7ef7d914b8cb956f41ea72e0f516b'  # noqa

lewo marked an inline comment as done. Oct 15 2019, 6:26 PM
lewo added inline comments.
swh/lister/json/lister.py
33 ↗(On Diff #6807)

Unfortunately, it's difficult to get the version.
In nixpkgs, we basically have this kind of structure:

packages = {
  hello = {
    name = "hello";
    version = "1.0";
    buildRecipe = "make";
    src = {
      url = "http://gnu/hello-1.0.tgz";
      sha = "bla";
    };
  };
}

So, the src attribute is not versioned. The version is on the package level.
And this can be much more complex: one package can use several patches and several sources.

Just one note: the name is IMHO inappropriate. This is NOT a JSON lister. JSON is nothing but the serialization format used to retrieve some (more or less) well defined structured data.

What defines this lister is its ability to comprehend the very data structure mentioned in this diff's description.

The question is: what data model does this implement? Is it Guix (only)? Is it somewhat standardized?

ardumont edited the summary of this revision. Oct 16 2019, 10:11 AM
ardumont added inline comments. Oct 16 2019, 10:20 AM
swh/lister/json/lister.py
33 ↗(On Diff #6807)

Right.

Nonetheless, we could have that lister parse the version from what's provided (here, the URL).

Developing the tar loader (D2145) raises interesting questions about the gnu loader [1] and the new one.
Aside from the version parsing logic, their implementations are nearly identical.
This raises the question of whether we should move that parsing step here, into the listers, and pass that information along to the loader (D2145's description explains the rationale).

[1] https://forge.softwareheritage.org/source/swh-loader-core/browse/package-loader/swh/loader/package/gnu.py$111-187

To develop this further, I guess we need some more sample data though ;)
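Parsing the version from the URL means committing to a naming convention. A minimal sketch, assuming tarballs are named `<name>-<version>.<archive extension>`; the regex and `parse_version_from_url` are hypothetical and not taken from the gnu loader. It returns None rather than guessing when the convention does not apply (e.g. for patches):

```python
import re
from typing import Optional

# Hypothetical convention: "<name>-<version>.<archive extension>".
# The name is matched lazily and the version must start with a digit,
# so "foo-bar-1.2.3.tar.xz" yields "1.2.3".
ARCHIVE_RE = re.compile(
    r"^(?P<name>.+?)-(?P<version>\d[\w.]*?)"
    r"\.(?:tar\.(?:gz|bz2|xz|lz|Z)|tgz|tbz2?|zip)$"
)


def parse_version_from_url(url: str) -> Optional[str]:
    """Return the version embedded in an archive URL, or None if unsure."""
    filename = url.rstrip("/").rsplit("/", 1)[-1]
    match = ARCHIVE_RE.match(filename)
    return match.group("version") if match else None
```

Anything outside the assumed convention yields None, which leaves the choice of a fallback to the caller instead of baking it into the parser.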

ardumont added a comment (edited). Oct 16 2019, 10:27 AM

Just one note: the name is IMHO inappropriate. This is NOT a JSON lister. JSON is nothing but the serialization format used to retrieve some (more or less) well defined structured data.
What defines this lister is its ability to comprehend the very data structure mentioned in this diff's description.

Indeed, I proposed a name in D2025#inline-13301 ;)

The question is: what data model does this implement? Is it Guix (only)? is it somewhat standardized?

My understanding of the swh-devel mailing-list discussion [1] is that standardization between Nix and Guix is the way forward.

And to mention an interesting thing I forgot: Guix already exposes its listing [2].

[1] https://sympa.inria.fr/sympa/arc/swh-devel/2019-10/msg00000.html

[2] https://guix.gnu.org/packages.json

lewo added a comment. Oct 17 2019, 9:41 AM

@ardumont @douardda Regarding the lister name, I agree the JSONLister name is not appropriate. @ardumont proposed FunctionalPackageManagerLister or FunctionalPackageLister, but I'm not sure they are appropriate either :)

The file ingested by this lister already contains different kinds of elements:

If the goal is to provide reproducibility for some package managers, in the future, this could also contain git revisions, or Docker images, png images...

I don't know what is the/your definition of "package", but if we agree on "a patch is not a package", then this lister is more than a "package" lister!

Also, this lister will be used by Guix and Nix, but it could also be used by others. It could provide a quick way for communities to archive their sources with Software Heritage: it's much simpler to generate and expose a JSON file and ask Software Heritage to set up a lister pointing to this file than to write a dedicated lister. For instance, Terraform could expose a file containing pointers to the sources of all the modules they provide (see https://registry.terraform.io/).

So, what about sourcesLister, sourceListLister?

Of course, I'd be 100% ok if you want to use FunctionalPackageLister (which could be easily renamed later if needed) :)

@ardumont @douardda Regarding the lister name, I agree the JSONLister name is not appropriate. @ardumont proposed FunctionalPackageManagerLister or FunctionalPackageLister, but I'm not sure they are appropriate either :)

Well, I was trying to highlight the common (and awesome) nature of both the Guix and Nix package managers ;)

The file ingested by this lister already contains different kinds of elements:

Yes, and it's fine.
They are all forms of source code.
They should be listed amongst the artifacts for the origin in question.

If the goal is to provide reproducibility for some package managers, in the future, this could also contain git revisions, or Docker images, png images...

The goal is to archive all source code: past, present and future.
What we can do further down the line can be reproducibility, among other things.
But it would be up to SWH clients to determine that (most probably using some SWH mirrors yet to come ;)

And sure, the lister can already schedule 'git' type origins.
Or other kinds of loading task types, by the way (be that: git, svn, hg, debian, deposit, tar, npm, pypi ...).
Actually, there are at least 2 listers that already do that (bitbucket lists hg and git repositories, packagist should be able to list different DVCS as well).

I don't know what is the/your definition of "package", but if we agree on "a patch is not a package", then this lister is more than a "package" lister!

A patch is not a package indeed.
It's a source code artifact for a package though.
So it must be listed as well (within the scope of the list of artifacts).

The definition of a package was not clearly set (for me at least), but the current work on the package loader tends towards this:
A "source" package is a list of source code artifacts, be that tarballs (.zip, .tar.*, etc...), patches, .dsc, source code (<- insert your techno here) repositories, etc...

Also, this lister will be used by Guix and Nix, but it could also be used by others.

Indeed, well, as a first step, let's try to focus on Guix and Nix ;)
If adaptations are needed later, we'll make them.

It could provide a quick way for communities to archive their sources with Software Heritage: it's much simpler to generate and expose a JSON file and ask Software Heritage to set up a lister pointing to this file than to write a dedicated lister.

It's becoming rather easy now to add a new lister in charge of adapting the JSON from the API in question; easier than asking for that JSON output to be adapted...
But I may be wrong.

For instance, Terraform could expose a file containing pointers to the sources of all the modules they provide (see https://registry.terraform.io/).

Indeed, thanks for the pointer.

So, what about sourcesLister, sourceListLister?

That's what other listers already do: they list origins (exposed through some form of APIs/websites) which are source code to ingest (DVCS: git, mercurial, svn, ...; plain tarballs: gnu, pypi, npm, debian packages, etc.).
In the end, everything this repository lists is source code packages in various forms.

Of course, I'd be 100% ok if you want to use FunctionalPackageLister (which could be easily renamed later if needed) :)

Indeed.


Although, when I looked at the Guix listing again, the current output is different so far.
Will there be work with the Guix people (@civodul) to align the JSON output?


By the way, about the version again, are you sure you cannot provide it?
I see it's present in the current Guix listing, and it would be real swell to have it (instead of trying to parse it).
Experience shows (gnu's...) that it's not easy to parse without making choices.

By the way, I entertained the idea of writing a Guix lister with the current API; would that help (it could demo the tests to write as well)?

lewo added a comment. Oct 21 2019, 9:38 AM

By the way, I entertained the idea of writing a Guix lister with the current API; would that help (it could demo the tests to write as well)?

I think my branch is almost working (I still need to rename the lister). Also, I will try to discuss this format with some Guix people this week.

One thing I have to clarify is the notion of package and artefacts in our case! Actually, we cannot know which artefacts belong to a package. To get the list of sources required by a derivation (a kind of Nix package), I recursively walk all attributes of this derivation to find fixed-output derivations (the derivations specifying a source URL). Of course, these attributes include all the build requirements of this derivation. This means that, for each derivation, I have a tree of sources, and it is really hard to know what belongs to this derivation and what does not. So, instead of providing this graph, I only provide the list of all sources in this graph.
I think we will only have one artefact per package; otherwise, we would have the gcc artefacts in almost all packages!

lewo updated this revision to Diff 7389. Oct 21 2019, 10:44 AM
lewo edited the summary of this revision.
  • [wip] swh.lister.json: Add lister getting sources from JSON file
  • Rebased

I think my branch is almost working (I still need to rename the lister).

neat ;)

Also, I will try to discuss with some guix people about this format this week.

Cool. Indeed, given the difficulty of this, a feedback loop will be appreciated...

One thing I have to clarify is the notion of package and artefacts in our case!

yes, it's not that easy to grasp in the current nix/guix context ;)

I guess it depends on whether you decide to do an in-depth walk of the dependency graph or not...
The in-depth walk sounds unreasonable, as you would end up with the whole world of dependencies as artifacts... (for example, as you mention, down to the compiler's sources...)

Actually, we cannot know which artefacts belong to a package. To get the list of sources required by a derivation (a kind of Nix package), I recursively walk all attributes of this derivation to find fixed-output derivations (the derivations specifying a source URL).
Of course, these attributes include all the build requirements of this derivation. This means that, for each derivation, I have a tree of sources, and it is really hard to know what belongs to this derivation and what does not. So, instead of providing this graph, I only provide the list of all sources in this graph.
I think we will only have one artefact per package; otherwise, we would have the gcc artefacts in almost all packages!

Indeed.

From my quick look at the Guix listing, it seems some choices were made already, as there are not that many dependencies per package there
(though Nix and Guix share the same build approach).
And it seems to follow the same train of thought, so that seems reasonable.

Let's just see where your discussion will lead this ;)

Cheers,

Also, the CI job currently fails because of PEP 8 violations [1]:

flake8 run-test: commands[0] | /home/jenkins/workspace/DLS/tox/.tox/flake8/bin/python -m flake8
./swh/lister/json/models.py:5:1: F401 'sqlalchemy.Integer' imported but unused
./swh/lister/json/tests/conftest.py:10:1: E302 expected 2 blank lines, found 1

So the tests did not even run.
I expect them to fail nonetheless in the current state, because you need to rename your test dataset file (please check my latest remark ;).

To avoid being caught by those PEP 8 violations, you can reproduce them locally by using either tox (which runs multiple things, including flake8 and the unit tests) or make check at the top level of the repository.

[1] https://jenkins.softwareheritage.org/job/DLS/job/tox/458/console

swh/lister/json/tests/data/sources.nixos.org/sources.json
1 ↗(On Diff #7389)

For tests, using the requests_mock_datadir fixture (as seen below), this needs to be renamed to:

data/https_sources.nixos.org/sources.json

Given that the request made within the lister is:

https://sources.nixos.org/sources.json
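The renaming rule above maps a URL to a path under the test data directory. A tiny sketch of that mapping, for illustration only: `datadir_path_for` is a hypothetical helper that reproduces the `<scheme>_<host>/<path>` layout shown above, not the fixture's actual code, and it ignores query strings, which the real fixture may encode in the file name:

```python
from urllib.parse import urlsplit


def datadir_path_for(url: str) -> str:
    """Hypothetical helper: relative path of the mocked response for `url`.

    Follows the '<scheme>_<host><path>' layout described above;
    query strings are deliberately ignored in this sketch.
    """
    parts = urlsplit(url)
    return f"{parts.scheme}_{parts.netloc}{parts.path}"
```

For instance, `datadir_path_for("https://sources.nixos.org/sources.json")` gives `https_sources.nixos.org/sources.json`, the name requested above.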
zimoun added a subscriber: zimoun. Oct 25 2019, 4:15 PM

Dear,

Since Ludo (@civodul) posted this WIP feature on guix-devel mailing list [1] I am trying to follow this thread and I would like to help.

Currently, what is not clear to me is:

  • the fields of the JSON file. Can we agree on which ones are required?
  • the patches. Do they go in SWH?

Thanks for your work.

All the best,
simon


Basically, the Guix definition of a package with patches looks like this:

(define-public 4store
  (package
    (name "4store")
    (version "1.1.6")
    (source (origin
      (method git-fetch)
      (uri (git-reference
             (url "https://github.com/4store/4store.git")
             (commit (string-append "v" version))))
      (sha256
       (base32 "1kzdfmwpzy64cgqlkcz5v4klwx99w0jk7afckyf7yqbqb4rydmpk"))
      (patches (search-patches "4store-unset-preprocessor-directive.patch"
                               "4store-fix-buildsystem.patch"))))
    (build-system gnu-build-system)
    (blabla blabla)))

and currently https://guix.gnu.org/packages.json exposes:

{
  "name": "4store",
  "version": "1.1.6",
  "source": {
    "type": "git",
    "git_url": "https://github.com/4store/4store.git",
    "git_ref": "v1.1.6"
  },
  "synopsis": "Clustered RDF storage and query engine",
  "homepage": "https://github.com/4store/4store",
  "location": "gnu/packages/databases.scm:134"
}

And this does not exactly match the format of data/https_sources.nixos.org/sources.json:

{
  "name": "keyutils-1.6.tar.bz2",
  "source": {
    "hash": "05bi5ja6f3h3kdi7p9dihlqlfrsmi1wh1r2bdgxc0180xh6g5bnk",
    "hashAlgo": "sha256",
    "type": "url",
    "url": "https://people.redhat.com/dhowells/keyutils/keyutils-1.6.tar.bz2",
    "integrity": "sha256-067yDOwABcD6a0vkAHmIVWdHMYWxpXtimwMOZ5QscRU="
  }
}

[1] https://lists.gnu.org/archive/html/guix-devel/2019-09/msg00227.html
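In the nixpkgs entry above, `hash` (in Nix's own base32 encoding) and `integrity` (an SRI string) carry the same SHA-256 digest. A minimal sketch of the conversion, assuming Nix's base32 alphabet (digits plus lowercase letters without e, o, t, u) and its character order, where the last character holds the low bits of the first byte; the function names are hypothetical:

```python
import base64

# Nix's base32 alphabet: 0-9 and a-z without the letters e, o, t, u
NIX_BASE32_CHARS = "0123456789abcdfghijklmnpqrsvwxyz"


def nix_base32_decode(encoded: str) -> bytes:
    """Decode a Nix base32 string into raw digest bytes."""
    size = len(encoded) * 5 // 8
    out = bytearray(size)
    for n, char in enumerate(encoded):
        digit = NIX_BASE32_CHARS.index(char)
        # bit offset of this 5-bit digit, counted from the end of the string
        i, j = divmod((len(encoded) - n - 1) * 5, 8)
        out[i] |= (digit << j) & 0xFF
        if i < size - 1:
            # carry the bits that overflow into the next byte
            out[i + 1] |= digit >> (8 - j)
    return bytes(out)


def nix_hash_to_sri(encoded: str, algo: str = "sha256") -> str:
    """Convert a Nix base32 hash into an SRI integrity string."""
    digest = nix_base32_decode(encoded)
    return f"{algo}-{base64.b64encode(digest).decode('ascii')}"
```

Applied to the keyutils entry, `nix_hash_to_sri` on the `hash` field should reproduce its `integrity` field, so a producer only needs to emit one of the two.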

lewo added a comment. Oct 26 2019, 10:12 PM

Since Ludo (@civodul) posted this WIP feature on guix-devel mailing list [1] I am trying to follow this thread and I would like to help.

Cool! Welcome aboard ;)

Currently, what is not clear to me are:

  • the fields of the JSON file. Can we agree on which ones are required?

This is something that still needs to be defined and I will publish a detailed comment on that.

  • the patches. Do they go in SWH?

Yes, if we want to be able to rebuild a package from SWH.

lewo added a subscriber: zack. Oct 26 2019, 11:55 PM

I discussed with Ludo and we agreed that the current packages.json file is not really suitable for the SWH use case.
tl;dr: the idea is to expose a list of sources instead of a list of packages.

The objective is to be able to rebuild all packages of nixpkgs (the Nix package set) and of the Guix package set, even if some sources required by a package have disappeared. To achieve that, we need to build the list of these sources and ingest these sources in SWH.

We were initially exposing a list of packages, but it would be better to expose a list of sources instead. The following explains why the package approach is hard.
In nixpkgs, we expose top-level packages (gcc, git, emacs, ...). These packages can be used as build dependencies by other packages. For instance, we expose gcc, which is used by the hello-world package during its build phase. So, if we expose the source of the gcc package and the source of the hello-world package, we could rebuild hello-world by falling back on SWH. However, nixpkgs is much more complex (and powerful).

Suppose now that the latest version of hello-world needs a specific patch on gcc to be compiled. In nixpkgs, we can override the gcc package by providing a patch. To achieve that, we don't create a new version of gcc, and we don't expose this gcc as a package: we just override gcc in place, in the hello-world build recipe. If we want to archive all sources required by the hello-world package, we would have to archive this patch, which is part of the overridden gcc package :/ In this kind of scenario, it is really hard to add the patch to the list of required sources of the hello-world package without adding all the gcc sources.

Let me explain how I generate the list of all sources required by the package set. In Nix, we have two kinds of derivations: normal derivations and fixed-output derivations. Only fixed-output derivations can access the network. These derivations contain the URLs of sources.
To get the sources required by nixpkgs, I traverse the nixpkgs graph to extract all fixed-output derivations.
If we consider our previous hello-world example, this means that when I traverse the hello-world attribute, I get the sources of gcc and the URL of the patch. I flatten this graph to build the list of sources.
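The flattening step described above can be sketched as a graph walk that deduplicates shared dependencies such as gcc. This is a Python illustration under a hypothetical data model (a derivation is a dict with an optional `url`, set only on fixed-output derivations, and a list of `inputs`); it only illustrates the idea and is not lewo's implementation:

```python
from typing import Dict, List


def collect_sources(root: dict) -> List[Dict[str, str]]:
    """Flatten a derivation graph into a deduplicated list of url sources.

    Hypothetical model: a derivation is a dict with an optional "url"
    (only on fixed-output derivations) and a list of "inputs".
    """
    sources: List[Dict[str, str]] = []
    seen_urls = set()
    visited = set()  # ids of derivations already walked (shared deps like gcc)
    stack = [root]
    while stack:
        drv = stack.pop()
        if id(drv) in visited:
            continue
        visited.add(id(drv))
        url = drv.get("url")
        if url is not None and url not in seen_urls:
            seen_urls.add(url)
            sources.append({"type": "url", "url": url})
        # push inputs in reverse so they are walked in declaration order
        stack.extend(reversed(drv.get("inputs", [])))
    return sources
```

The output is the flat list discussed here: the patch on the overridden gcc ends up in it, but nothing records which package it belongs to.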

We could expose a sources.json file containing a list of sources. At the beginning, a source would be an object containing only two fields:

  • type: specifies the type of the source. Currently, only the type url is supported. This field is required.
  • url: specifies the url of the source. This field is required when the type is url.

Here is an example of such a list:

[
  {
    "type": "url",
    "url": "https://ftpmirror.gnu.org//hello/hello-2.10.tar.gz"
  },
  {
    "type": "url",
    "url": "https://github.com/curl/curl/commit/5fc28510a4664f4.patch"
  }
]

We could later add other types (git especially), and some other fields such as:

  • an integrity field: optional because we don't always have a usable checksum
  • a name: optional because a patch is not named
  • a version: optional because a patch is generally not versioned
  • ...
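The two initial rules (`type` always required, `url` required when `type` is `url`) translate into a small check. A sketch, assuming a hypothetical consumer-side `validate_source` helper; optional fields such as `integrity`, `name` or `version` are deliberately left unchecked so the format can grow:

```python
from typing import Any, Dict, List


def validate_source(source: Dict[str, Any]) -> List[str]:
    """Return the problems found in one source entry (empty list if valid)."""
    errors: List[str] = []
    source_type = source.get("type")
    if not isinstance(source_type, str):
        errors.append("missing required field 'type'")
    elif source_type == "url" and not isinstance(source.get("url"), str):
        errors.append("field 'url' is required when type is 'url'")
    # other fields (integrity, name, version, ...) are optional: not checked
    return errors
```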

@zack You were talking about versioning this file. What about adding a version attribute to the file, such as:

{
  "version": 1,
  "sources": [
    # The list of actual sources
  ]
}

Note also that top-level attributes could be useful in the future, for instance to add the nixpkgs commit SHA or other values shared across sources.
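With a top-level `version` attribute, a consumer can refuse formats it does not understand instead of misreading them. A minimal sketch, assuming the shape proposed above (`version` plus `sources`) and a hypothetical `load_sources` function that only accepts version 1:

```python
import json
from typing import Any, Dict, List

SUPPORTED_VERSIONS = {1}  # assumption: only the first format version is known


def load_sources(document: str) -> List[Dict[str, Any]]:
    """Parse a versioned sources.json document and return its sources."""
    data = json.loads(document)
    version = data.get("version")
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported sources.json version: {version!r}")
    # other top-level keys (e.g. a nixpkgs commit) are tolerated but unused
    return data["sources"]
```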

An example of this file (containing only the sources of hello-world) is available at http://tmp.abesis.fr/sources.json

What do you think?

civodul added a subscriber: civodul. Nov 5 2019, 3:40 PM

Hi @lewo and all!

I like the spec you've come up with! Like you write, having a JSON file that is "source-oriented" rather than "package-oriented" sounds more appropriate for archiving purposes.

We could expose a sources.json file containing a list of sources. At the beginning, a source would be an object containing only two fields:

  • type: specifies the type of the source. Currently, only the type url is supported. This field is required.
  • url: specifies the url of the source. This field is required when the type is url.

LGTM, though I think we should define the git type right away. For that, we can probably reuse a format similar to that found at https://guix.gnu.org/packages.json, which looks like:

{
    "type": "git",
    "git_url": "https://github.com/pali/0xffff.git",
    "git_ref": "0.8"
}

... where git_ref can be a tag name or a commit ID.

WDYT?
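Since `git_ref` may be either a tag name or a commit ID, a consumer would have to tell them apart. One possible heuristic, sketched under the assumption that commit IDs are always given as full 40-character lowercase hex SHA-1 strings (abbreviated IDs would be classified as tags, and a tag that happens to look like a full hash as a commit); `classify_git_ref` is hypothetical:

```python
import re

# Assumption: a commit ID is a full 40-character lowercase hex SHA-1
COMMIT_ID_RE = re.compile(r"[0-9a-f]{40}")


def classify_git_ref(git_ref: str) -> str:
    """Return 'commit' if git_ref looks like a full commit ID, else 'tag'."""
    return "commit" if COMMIT_ID_RE.fullmatch(git_ref) else "tag"
```

This ambiguity is one argument for an explicit field (or two distinct keys) once the git type is specified.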

We could later add other types (git especially), and some other fields such as

  • an integrity field: optional because we don't always have a usable checksum

Sure, that can always come later IMO.

  • a name: optional because a patch is not named
  • a version: optional because a patch is generally not versioned

These would be the name and version of what? Given that this format is "source-oriented", there's no notion of a package, and thus no name and version.

Anyway, if we do find a use for such extensions, I'd say that can come later. :-)

@zack You were talking about versioning this file. What about adding a version field attribute to the file such as:

{
  "version": 1,
  "sources": [
    # The list of actual sources
  ]
}

That LGTM.

I haven't looked at the implementation, I'm sure fellow SWH hackers will have feedback more useful than I could provide :-). That said, if this format is OK in principle for you @lewo and for SWH, I'm happy to implement it and publish it at guix.gnu.org so we can see what it's like.

Thanks for all the work, @lewo!

lewo added a comment (edited). Nov 6 2019, 9:52 PM

LGTM, though I think we should define the git type right away. For that, we can probably reuse a format similar to that found at https://guix.gnu.org/packages.json, which looks like:

{
    "type": "git",
    "git_url": "https://github.com/pali/0xffff.git",
    "git_ref": "0.8"
}

... where git_ref can be a tag name or a commit ID.
WDYT?

The first version of the lister will not support git sources, only tarballs. So, I would prefer to postpone this discussion to a future version of the lister. Note that the lister currently drops all sources whose type is not url (so you could add a git type if you want :/).
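The filtering described above amounts to a one-line selection. A sketch only, not the lister's actual code; `keep_url_sources` is a hypothetical name and the log message is illustrative:

```python
import logging
from typing import Any, Dict, List

logger = logging.getLogger(__name__)


def keep_url_sources(sources: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only sources of type 'url'; other types are skipped for now."""
    kept = [source for source in sources if source.get("type") == "url"]
    if len(kept) != len(sources):
        logger.info("ignoring %d non-url source(s)", len(sources) - len(kept))
    return kept
```

Because unknown types are skipped rather than rejected, producers can already publish git entries without breaking the lister.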

I haven't looked at the implementation, I'm sure fellow SWH hackers will have feedback more useful than I could provide :-). That said, if this format is OK in principle for you @lewo and for SWH, I'm happy to implement it and publish it at guix.gnu.org so we can see what it's like.

The implementation is not ready yet, but I will work on it in the next few days.
@civodul I could ping you once the lister is working. We could then expose this file on guix.gnu.org and on nixos.org ;) Thx for your comments.

lewo updated this revision to Diff 7683. Nov 6 2019, 9:54 PM
  • wip: switch to the new format
lewo updated this revision to Diff 7800. Nov 13 2019, 9:43 AM
  • Rename JSONLister to FunctionalPackageLister
  • Fix test
  • Cleaning
ardumont added inline comments. Nov 13 2019, 5:06 PM
swh/lister/functional_package/tests/data/sources.nixos.org/sources.json
1 ↗(On Diff #7800)

Rename this to swh/lister/functional_package/tests/data/https_nixos.org/sources.json

for the requests_mock_datadir fixture to actually find the file (and be able to do its job ;).
Also, maybe remove some entries; there is no need for so many URLs.

For better feedback loop, you can use:

pytest -x --log-level=DEBUG ./swh/lister/functional_package/tests/test_lister.py

or

tox -- -x --log-level=DEBUG -k test_lister_no_page_check_results
lewo updated this revision to Diff 7849. Nov 14 2019, 8:19 PM
  • Move sources.json mock to correct location
lewo marked an inline comment as done. Nov 14 2019, 8:22 PM
lewo added inline comments.
swh/lister/functional_package/tests/data/sources.nixos.org/sources.json
1 ↗(On Diff #7800)

The test was passing locally because I had an older swh-core version. It should now be fixed.

Thanks!

Please, see my latest comments.

swh/lister/functional_package/lister.py
29

Please drop this duplicated comment and explain a bit what the generated tasks are.

Why is there only one artifact per package, for example? (I don't remember the detail, so this serves a double purpose ;)

33

So now, this needs to be changed.

The actual loader to use now is swh.loader.core.package.archive.ArchiveLoader.

The task referring to this is:

@shared_task(name=__name__ + '.LoadArchive')
def load_archive(url=None, artifacts=None, identity_artifact_keys=None):
    return ArchiveLoader(url, artifacts,
                         identity_artifact_keys=identity_artifact_keys).load()
...

So your code can change to something like (untested):

return utils.create_task_dict(
            'load-tar', kwargs.get('policy', 'oneshot'),
            url=origin_url,                       # <- prefer to use kwargs instead of args
            artifacts=[{'archive': origin_url}],  # <- only provide what you can
            identity_artifact_keys=['archive'],   # <- unicity key
            retries_left=3)                       # <- that will fail otherwise when actually running

^ then you'd need to adapt the test below.
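For a single sources.json entry, the arguments suggested above could be derived as follows. This is a hypothetical helper, not part of swh-lister: it builds only the keyword arguments shown in the snippet (`url`, `artifacts`, `identity_artifact_keys`, `retries_left`) and does not reproduce `utils.create_task_dict` itself:

```python
from typing import Any, Dict


def archive_task_kwargs(source: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: keyword arguments for one archive-loading task."""
    if source.get("type") != "url":
        raise ValueError("only 'url' sources can be loaded as archives")
    origin_url = source["url"]
    return {
        "url": origin_url,
        "artifacts": [{"archive": origin_url}],  # only what the file provides
        "identity_artifact_keys": ["archive"],   # uniqueness key for artifacts
        "retries_left": 3,
    }
```

Keeping this mapping in a small pure function would also make the adapted test straightforward to write.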

swh/lister/functional_package/tests/test_lister.py
10

You can remove this.

17

Please remove the print when you are done debugging ;)

25

If you adapt according to my remarks, this needs to change as well.

lewo updated this revision to Diff 7968. Nov 21 2019, 12:12 AM
lewo marked an inline comment as done.

Fix ardumont comments

lewo marked 3 inline comments as done. Nov 21 2019, 12:16 AM
lewo added inline comments.
swh/lister/functional_package/lister.py
29

I added a comment.
If you need more details, I explained why the initial package file is not suitable in this context and why we want to expose a file containing a list of sources (instead of packages) in https://forge.softwareheritage.org/D2025#51269.

33

Thanks ;)

ardumont added a comment (edited). Nov 21 2019, 2:26 PM

Yes, thanks for the update.

Build has failed
See console output for more information:

https://jenkins.softwareheritage.org/job/DLS/job/tox/494/console

I fixed the CI on the latest master.
Something changed in the scheduler (it no longer sets up the loader task types that the lister generates, hence the current failure here).

Can you please just do the last adaptations?

  • Rebase to latest master
  • Rename 'load-tar' reference to 'load-archive-files'.
  • Then update the diff.

The CI should go back to green.

Note: I made the necessary changes to reduce the amount of changes needed here ;)

Cheers,

swh/lister/functional_package/lister.py
29

Thanks!

gentle ping ;)

lewo updated this revision to Diff 8999. Jan 14 2020, 6:56 PM

Rebase and change load-tar to load-archive-files

lewo added a comment. Jan 14 2020, 6:58 PM

Sorry for the delay... I will be more responsive now.

ardumont added inline comments. Jan 16 2020, 1:09 PM
swh/lister/functional_package/tests/test_lister.py
12

The test must be failing because of the old load-tar reference here; if you change it to load-archive-files, you should find what you listed ;)

Sorry for the delay... I will be more responsive now.

no problem ;)

lewo updated this revision to Diff 9099. Jan 16 2020, 7:33 PM

Fix the loader name in the test

ardumont added inline comments. Jan 16 2020, 8:20 PM
swh/lister/functional_package/tests/test_lister.py
16

erf, i missed that one as well :/

lewo updated this revision to Diff 9100. Jan 16 2020, 10:00 PM

And fix another one :/

lewo added inline comments. Jan 16 2020, 10:12 PM
swh/lister/functional_package/tests/test_lister.py
16

Hehe, you are not the only one!
I'm actually no longer able to run tests locally. I need to take some time to reset my local setup.

ardumont added inline comments. Jan 17 2020, 8:41 AM
swh/lister/functional_package/tests/test_lister.py
16

Have you tried running tox --recreate (or -r for short)?

Could you please update the title and the description according to the current state?
(If you don't have time, please tell me and I will ;)

lewo retitled this revision from [wip] swh.lister.json: Add lister getting sources from JSON file to [wip] swh.lister.functionalPackages: add lister getting sources from a JSON file. Tue, Jan 21, 7:12 PM
lewo edited the summary of this revision.
lewo edited the summary of this revision. Wed, Jan 29, 11:33 PM
lewo added a comment. Wed, Jan 29, 11:39 PM

A CI job is building a sources.json every day! The file is available at https://nix-community.github.io/nixpkgs-swh/sources.json ;)
This is a community CI (not hosted on the main NixOS infrastructure), which will allow me to iterate quickly on this file.

If you are going to FOSDEM, it would be nice to meet you there to talk about the next steps!

In D2025#61931, @lewo wrote:

A CI job is building a sources.json every day! The file is available at https://nix-community.github.io/nixpkgs-swh/sources.json ;)

Awesome!

If you are going to FOSDEM, it would be nice to meet you there to talk about the next steps!

I'm already in Brussels and would be happy to meet!

I can try and get a sources.json generated soon as well.

I guess support for type = "git" will come later, right?

Thank you,
Ludo'.

A CI job is building a sources.json every day! The file is available at https://nix-community.github.io/nixpkgs-swh/sources.json ;)

This is a community CI (not hosted on the main NixOS infrastructure), which will allow me to iterate quickly on this file.

nice.

If you are going to FOSDEM, it would be nice to meet you there to talk about the next steps!

It would but i'm not going.

Most of the team are going though @zack, @douardda and @olasd (they can talk about the next steps as well).

Cheers,

lewo added a comment. Thu, Jan 30, 8:53 PM
> If you are going to the FOSDEM, would be nice to meet you there to talk about next steps!
I'm already in Brussels and would be happy to meet!

Cool!

I can try and get a `sources.json` generated soon as well.
I guess support for `type = "git"` will come later, right?

Yes, it will come later. Note that your source file can contain git sources,
but these source URLs are ignored by the current lister implementation.

See you this weekend ;)
Antoine.

lewo added a comment. Thu, Jan 30, 8:56 PM
> If you are going to the FOSDEM, would be nice to meet you there to talk about next steps!
It would but i'm not going.

Arf!

Most of the team are going though @zack, @douardda and @olasd (they can talk about the next steps as well).

Ok. I will ask on IRC tomorrow.