Page MenuHomeSoftware Heritage

Functional Package Loader
ClosedPublic

Authored by lewo on Mar 9 2020, 5:07 PM.

Details

Summary

This diff is currently work in progress.

This loader is in charge of downloading a sources.json file and loading all referenced sources.

cc @zimoun @civodul

Related to T1991

Test Plan

tox

Diff Detail

Repository
rDLDBASE Generic VCS/Package Loader
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
lewo planned changes to this revision.Mar 10 2020, 6:31 PM
lewo added a subscriber: olasd.
lewo added inline comments.
swh/loader/package/functional/loader.py
77 โ†—(On Diff #9967)

@olasd Here is where I add a dangling pointer to the nixpkgs revision used to create the sources.json. Did you have something like that in mind?

I created a new diff containing the evaluation branch: see D2807 instead.

swh/loader/package/functional/loader.py
77 โ†—(On Diff #9967)

I actually created another diff with the "evaluation revision" feature: see D2807 instead.

To fix the ci build:

E + where False = <built-in method startswith of str object at 0x7f3c706974e0>('Usage: list [OPTIONS] archive|cran|debian|deposit|npm|pypi\n\n List supported loaders and optionally their arguments\n\nOptions:\n -h, --help Show this message and exit.\n')
E + where <built-in method startswith of str object at 0x7f3c706974e0> = 'Usage: list [OPTIONS] archive|cran|debian|deposit|npm|pypi|functional\n\n List

My guess is to you need to fix the help message in the cli...
That happens because in the setup.py, we are referencing endpoints and that adapt the cli to allow you to run the loader through the cli:
swh loader run functional ...

Also, i prefer reviewing "green" diffs ;)

Nonethesless, if you stack diff (as you did), you also need those to be green to avoid yourself some confusion ;)

Cheers,

  • functional: order entry points
  • cli: add the functional loader in the cli tests
  • cli: add the functional loader in the cli tests
  • cli: add the functional loader in the cli tests
  • package.loader: add origin argument
  • Add the functional loader
  • cli: add the functional loader in the cli tests

@ardumont I squashed commits and do some cleaning and remove the WIP status.

swh/loader/package/functional/loader.py
32 โ†—(On Diff #9934)

It looks strange to me to put a TODO in the the docstring :/

lewo retitled this revision from wip: Functional Package Loader to Functional Package Loader.Mar 11 2020, 3:00 PM
  • functional: add test for non json sources files

I actually don't know how should I manage errors. It seems exception are generally just emitted, without a dedicated log message.
How do you catch errors on production?

For instance, when a source is not a archive, an exception is raised but there is not a specific error message. I'm wondering how we could get metrics on these errors once on the production line.

In D2792#67565, @lewo wrote:

I actually don't know how should I manage errors. It seems exception are generally just emitted, without a dedicated log message.

If you raise a ValueError with a significant error message, it should be fine.
Error will be caught within the package loader's run method and the task will have its status failed.

How do you catch errors on production?

sentry does now [1]
(going on the link, you can ask for an invite with the form ;)

For instance, when a source is not a archive, an exception is raised but there is not a specific error message. I'm wondering how we could get metrics on these errors once on the production line.

I recall the webapp captures specifically exception and send them to sentry so we may want to do that as well [2]

[1] https://sentry.softwareheritage.org

[2] https://forge.softwareheritage.org/source/swh-web/browse/master/swh/web/browse/utils.py$138

Almost ready to land.

As a general rule of thumb, you need to add test scenarios around the functionality added.
So here, i'm missing:

  • loading scenario which test the incremental nature of the visit (that'd test resolve_revision_from)
  • edge cases:
    • e.g you added the case where there is a failure when uncompressing, that means something for the task and the snapshot, this needs to be tested :)

Oh and you should test the resolution of the origin_visit to check its status within the test.

Cheers,

swh/loader/package/functional/loader.py
39 โ†—(On Diff #10025)

I'd make this a function with the url as parameter.

This way, this can be tested independently of the loader instantiation.

84 โ†—(On Diff #10025)

Even though the main functionality is incomplete, this feels pretty empty.

@douardda what do you think, couldn't be add more stuff in there?

32 โ†—(On Diff #9934)

Well, given that we sayed we'd incrementally merge the incomplete functionality.
It's not a shocker those happened.

It'd be annoying the fixme stayed forever though ;)

This revision now requires changes to proceed.Mar 13 2020, 10:47 AM
lewo marked an inline comment as done.
  • functional: improve test_loader_two_visits
  • functional: add test_loader_incremental
  • functional: move retrieve_sources out of the FunctionalLoader class
  • functional: test the origin_visit
  • functional: add test_uncompress_failure

So here, i'm missing:

  • loading scenario which test the incremental nature of the visit (that'd test resolve_revision_from)

Done.

  • edge cases:
    • e.g you added the case where there is a failure when uncompressing, that means something for the task and the snapshot, this needs to be tested :)

Done in the test_functional.py file

Oh and you should test the resolution of the origin_visit to check its status within the test.

Done as part of the test test_loader_one_visit.

swh/loader/package/functional/tests/test_functional.py
92 โ†—(On Diff #10056)

as we are instantiating another loader anyway, it's not useful.

Plus it's at least pypi specific :)

120 โ†—(On Diff #10056)

+1 (for that test scenario ;)

162 โ†—(On Diff #10056)

This should be removed as per my previous comment.

swh/loader/package/loader.py
46

Please move the type in the constructor.

lewo marked 4 inline comments as done.
  • package.loader: add origin argument
  • Add the functional loader
  • cli: add the functional loader in the cli tests
  • functional: add test for non json sources files
  • functional: improve test_loader_two_visits
  • functional: add test_loader_incremental
  • functional: move retrieve_sources out of the FunctionalLoader class
  • functional: test the origin_visit
  • functional: add test_uncompress_failure
  • functional: minor fixes in tests
swh/loader/package/functional/tests/test_functional.py
92 โ†—(On Diff #10056)

Removed.

162 โ†—(On Diff #10056)

Done

This revision is now accepted and ready to land.Mar 14 2020, 10:00 AM
  • package.loader: ignore non tarball source
  • package.loader: add origin argument
  • Add the functional loader
  • cli: add the functional loader in the cli tests
  • functional: add test for non json sources files
  • functional: improve test_loader_two_visits
  • functional: add test_loader_incremental
  • functional: move retrieve_sources out of the FunctionalLoader class
  • functional: test the origin_visit
  • functional: add test_uncompress_failure
  • functional: minor fixes in tests

I think I need to implement a test for the task also. WDYT?

I think I need to implement a test for the task also. WDYT?

right

as said on irc, you need to register the task in the conftest fixture [1]

[1] https://forge.softwareheritage.org/source/swh-loader-core/browse/master/conftest.py$75

Also, if you could rewrite a bit into logical commits whose build is fine, that'd be great.

  • Add the functional loader

@ardumont I squashed almost all commits and added a task test.

@ardumont I squashed almost all commits and added a task test.

Thanks.

I'm not so sure about refactoring the base package loader to have two url and origin arguments that are the same most of the time but not all the time; In concrete terms, this means that we should refactor all loaders to use self.origin in the places where they currently use self.url.

For instance, we have the same issue with the pypi loader, where we generate an api url (which we're downloading) from the url of the origin (which is just a user-facing link).

All in all, I think we need to think about doing a refactoring to do this consistently, but I don't think this diff is the right place to do this refactoring; For now, we can just have a hardcoded map from origin url to index url.

swh/loader/package/functional/tests/test_functional.py
121โ€“122 โ†—(On Diff #10091)

Comments should probably be more explicit about what changes between the two visits. I guess that the first visit only manages to fetch one tarball, and the second one manages to fetch both?

I'm not so sure about refactoring the base package loader to have two url and origin arguments that are the same most of the time but not all the time; In concrete terms, this means that we should refactor all loaders to use self.origin in the places where they currently use self.url.

For instance, we have the same issue with the pypi loader, where we generate an api url (which we're downloading) from the url of the origin (which is just a user-facing link).

All in all, I think we need to think about doing a refactoring to do this consistently, but I don't think this diff is the right place to do this refactoring; For now, we can just have a hardcoded map from origin url to index url.

For now, I just removed the origin parameter in order to move forward. I then use the url of the sources.json as origin "name".
But I'll create a task to discuss about this origin name.

  • Add the functional loader
lewo marked an inline comment as done.Mar 17 2020, 2:55 PM

@olasd Thanks for your comments. I addressed all of them.