Page MenuHomeSoftware Heritage

Functional Package Loader
ClosedPublic

Authored by lewo on Mon, Mar 9, 5:07 PM.

Details

Summary

This diff is currently work in progress.

This loader is in charge of downloading a sources.json file and loading all referenced sources.

cc @zimoun @civodul

Related to T1991

Test Plan

tox

Diff Detail

Repository
rDLDBASE Generic VCS/Package Loader
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
lewo planned changes to this revision.Tue, Mar 10, 6:31 PM
lewo added a subscriber: olasd.
lewo added inline comments.
swh/loader/package/functional/loader.py
77 ↗(On Diff #9967)

@olasd Here is where I add a dangling pointer to the nixpkgs revision used to create the sources.json. Did you have something like that in mind?

lewo updated this revision to Diff 9969.Tue, Mar 10, 6:51 PM

I created a new diff containing the evaluation branch: see D2807 instead.

lewo added inline comments.Tue, Mar 10, 6:52 PM
swh/loader/package/functional/loader.py
77 ↗(On Diff #9967)

I actually created another diff with the "evaluation revision" feature: see D2807 instead.

ardumont added a comment.EditedWed, Mar 11, 9:55 AM

To fix the ci build:

E + where False = <built-in method startswith of str object at 0x7f3c706974e0>('Usage: list [OPTIONS] archive|cran|debian|deposit|npm|pypi\n\n List supported loaders and optionally their arguments\n\nOptions:\n -h, --help Show this message and exit.\n')
E + where <built-in method startswith of str object at 0x7f3c706974e0> = 'Usage: list [OPTIONS] archive|cran|debian|deposit|npm|pypi|functional\n\n List

My guess is to you need to fix the help message in the cli...
That happens because in the setup.py, we are referencing endpoints and that adapt the cli to allow you to run the loader through the cli:
swh loader run functional ...

Also, i prefer reviewing "green" diffs ;)

Nonethesless, if you stack diff (as you did), you also need those to be green to avoid yourself some confusion ;)

Cheers,

lewo updated this revision to Diff 9991.Wed, Mar 11, 11:28 AM
  • functional: order entry points
  • cli: add the functional loader in the cli tests
lewo updated this revision to Diff 9992.Wed, Mar 11, 11:34 AM
  • cli: add the functional loader in the cli tests
lewo updated this revision to Diff 9993.Wed, Mar 11, 11:56 AM
  • cli: add the functional loader in the cli tests
lewo edited the summary of this revision. (Show Details)Wed, Mar 11, 12:19 PM
lewo updated this revision to Diff 9998.Wed, Mar 11, 2:59 PM
  • package.loader: add origin argument
  • Add the functional loader
  • cli: add the functional loader in the cli tests
lewo added a comment.Wed, Mar 11, 3:00 PM

@ardumont I squashed commits and do some cleaning and remove the WIP status.

swh/loader/package/functional/loader.py
32 ↗(On Diff #9934)

It looks strange to me to put a TODO in the the docstring :/

lewo retitled this revision from wip: Functional Package Loader to Functional Package Loader.Wed, Mar 11, 3:00 PM
lewo updated this revision to Diff 10025.Thu, Mar 12, 3:06 PM
  • functional: add test for non json sources files
lewo added a comment.Thu, Mar 12, 3:12 PM

I actually don't know how should I manage errors. It seems exception are generally just emitted, without a dedicated log message.
How do you catch errors on production?

For instance, when a source is not a archive, an exception is raised but there is not a specific error message. I'm wondering how we could get metrics on these errors once on the production line.

ardumont added a comment.EditedThu, Mar 12, 5:14 PM
In D2792#67565, @lewo wrote:

I actually don't know how should I manage errors. It seems exception are generally just emitted, without a dedicated log message.

If you raise a ValueError with a significant error message, it should be fine.
Error will be caught within the package loader's run method and the task will have its status failed.

How do you catch errors on production?

sentry does now [1]
(going on the link, you can ask for an invite with the form ;)

For instance, when a source is not a archive, an exception is raised but there is not a specific error message. I'm wondering how we could get metrics on these errors once on the production line.

I recall the webapp captures specifically exception and send them to sentry so we may want to do that as well [2]

[1] https://sentry.softwareheritage.org

[2] https://forge.softwareheritage.org/source/swh-web/browse/master/swh/web/browse/utils.py$138

Almost ready to land.

As a general rule of thumb, you need to add test scenarios around the functionality added.
So here, i'm missing:

  • loading scenario which test the incremental nature of the visit (that'd test resolve_revision_from)
  • edge cases:
    • e.g you added the case where there is a failure when uncompressing, that means something for the task and the snapshot, this needs to be tested :)

Oh and you should test the resolution of the origin_visit to check its status within the test.

Cheers,

swh/loader/package/functional/loader.py
39 ↗(On Diff #10025)

I'd make this a function with the url as parameter.

This way, this can be tested independently of the loader instantiation.

84 ↗(On Diff #10025)

Even though the main functionality is incomplete, this feels pretty empty.

@douardda what do you think, couldn't be add more stuff in there?

32 ↗(On Diff #9934)

Well, given that we sayed we'd incrementally merge the incomplete functionality.
It's not a shocker those happened.

It'd be annoying the fixme stayed forever though ;)

ardumont requested changes to this revision.Fri, Mar 13, 10:47 AM
This revision now requires changes to proceed.Fri, Mar 13, 10:47 AM
lewo updated this revision to Diff 10056.Fri, Mar 13, 5:14 PM
lewo marked an inline comment as done.
  • functional: improve test_loader_two_visits
  • functional: add test_loader_incremental
  • functional: move retrieve_sources out of the FunctionalLoader class
  • functional: test the origin_visit
  • functional: add test_uncompress_failure
lewo added a comment.Fri, Mar 13, 5:16 PM

So here, i'm missing:

  • loading scenario which test the incremental nature of the visit (that'd test resolve_revision_from)

Done.

  • edge cases:
    • e.g you added the case where there is a failure when uncompressing, that means something for the task and the snapshot, this needs to be tested :)

Done in the test_functional.py file

Oh and you should test the resolution of the origin_visit to check its status within the test.

Done as part of the test test_loader_one_visit.

ardumont added inline comments.Fri, Mar 13, 6:12 PM
swh/loader/package/functional/tests/test_functional.py
92 ↗(On Diff #10056)

as we are instantiating another loader anyway, it's not useful.

Plus it's at least pypi specific :)

120 ↗(On Diff #10056)

+1 (for that test scenario ;)

162 ↗(On Diff #10056)

This should be removed as per my previous comment.

swh/loader/package/loader.py
47

Please move the type in the constructor.

lewo updated this revision to Diff 10057.Fri, Mar 13, 11:58 PM
lewo marked 4 inline comments as done.
  • package.loader: add origin argument
  • Add the functional loader
  • cli: add the functional loader in the cli tests
  • functional: add test for non json sources files
  • functional: improve test_loader_two_visits
  • functional: add test_loader_incremental
  • functional: move retrieve_sources out of the FunctionalLoader class
  • functional: test the origin_visit
  • functional: add test_uncompress_failure
  • functional: minor fixes in tests
swh/loader/package/functional/tests/test_functional.py
92 ↗(On Diff #10056)

Removed.

162 ↗(On Diff #10056)

Done

ardumont accepted this revision.Sat, Mar 14, 10:00 AM
This revision is now accepted and ready to land.Sat, Mar 14, 10:00 AM
lewo updated this revision to Diff 10089.Mon, Mar 16, 5:13 PM
  • package.loader: ignore non tarball source
  • package.loader: add origin argument
  • Add the functional loader
  • cli: add the functional loader in the cli tests
  • functional: add test for non json sources files
  • functional: improve test_loader_two_visits
  • functional: add test_loader_incremental
  • functional: move retrieve_sources out of the FunctionalLoader class
  • functional: test the origin_visit
  • functional: add test_uncompress_failure
  • functional: minor fixes in tests
lewo added a comment.Mon, Mar 16, 5:34 PM

I think I need to implement a test for the task also. WDYT?

I think I need to implement a test for the task also. WDYT?

right

as said on irc, you need to register the task in the conftest fixture [1]

[1] https://forge.softwareheritage.org/source/swh-loader-core/browse/master/conftest.py$75

Also, if you could rewrite a bit into logical commits whose build is fine, that'd be great.

lewo updated this revision to Diff 10091.Mon, Mar 16, 6:41 PM
  • Add the functional loader
lewo added a comment.Mon, Mar 16, 7:07 PM

@ardumont I squashed almost all commits and added a task test.

ardumont accepted this revision.Mon, Mar 16, 9:49 PM

@ardumont I squashed almost all commits and added a task test.

Thanks.

I'm not so sure about refactoring the base package loader to have two url and origin arguments that are the same most of the time but not all the time; In concrete terms, this means that we should refactor all loaders to use self.origin in the places where they currently use self.url.

For instance, we have the same issue with the pypi loader, where we generate an api url (which we're downloading) from the url of the origin (which is just a user-facing link).

All in all, I think we need to think about doing a refactoring to do this consistently, but I don't think this diff is the right place to do this refactoring; For now, we can just have a hardcoded map from origin url to index url.

swh/loader/package/functional/tests/test_functional.py
121–122 ↗(On Diff #10091)

Comments should probably be more explicit about what changes between the two visits. I guess that the first visit only manages to fetch one tarball, and the second one manages to fetch both?

lewo added a comment.Tue, Mar 17, 2:44 PM

I'm not so sure about refactoring the base package loader to have two url and origin arguments that are the same most of the time but not all the time; In concrete terms, this means that we should refactor all loaders to use self.origin in the places where they currently use self.url.

For instance, we have the same issue with the pypi loader, where we generate an api url (which we're downloading) from the url of the origin (which is just a user-facing link).

All in all, I think we need to think about doing a refactoring to do this consistently, but I don't think this diff is the right place to do this refactoring; For now, we can just have a hardcoded map from origin url to index url.

For now, I just removed the origin parameter in order to move forward. I then use the url of the sources.json as origin "name".
But I'll create a task to discuss about this origin name.

lewo updated this revision to Diff 10104.Tue, Mar 17, 2:54 PM
  • Add the functional loader
lewo marked an inline comment as done.Tue, Mar 17, 2:55 PM

@olasd Thanks for your comments. I addressed all of them.

This revision was automatically updated to reflect the committed changes.