Details
- Reviewers
ardumont - Group Reviewers
Reviewers - Commits
- rDLDBASE03d7dbf69279: Add the functional loader
rDLDBASE09373d23b2fe: package.loader: ignore non tarball source
tox
Diff Detail
- Repository
- rDLDBASE Generic VCS/Package Loader
- Branch
- functional-loader
- Lint
No Linters Available - Unit
No Unit Test Coverage - Build Status
Buildable 11172 Build 16868: tox-on-jenkins Jenkins Build 16867: arc lint + arc unit
Event Timeline
Build has FAILED
Link to build: https://jenkins.softwareheritage.org/job/DLDBASE/job/tox/344/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDBASE/job/tox/344/console
Build has FAILED
Link to build: https://jenkins.softwareheritage.org/job/DLDBASE/job/tox/346/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDBASE/job/tox/346/console
To fix the ci build:
E + where False = <built-in method startswith of str object at 0x7f3c706974e0>('Usage: list [OPTIONS] archive|cran|debian|deposit|npm|pypi\n\n List supported loaders and optionally their arguments\n\nOptions:\n -h, --help Show this message and exit.\n')
E + where <built-in method startswith of str object at 0x7f3c706974e0> = 'Usage: list [OPTIONS] archive|cran|debian|deposit|npm|pypi|functional\n\n List
My guess is to you need to fix the help message in the cli...
That happens because in the setup.py, we are referencing endpoints and that adapt the cli to allow you to run the loader through the cli:
swh loader run functional ...
Also, i prefer reviewing "green" diffs ;)
Nonethesless, if you stack diff (as you did), you also need those to be green to avoid yourself some confusion ;)
Cheers,
Build has FAILED
Link to build: https://jenkins.softwareheritage.org/job/DLDBASE/job/tox/347/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDBASE/job/tox/347/console
Build has FAILED
Link to build: https://jenkins.softwareheritage.org/job/DLDBASE/job/tox/348/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDBASE/job/tox/348/console
Build is green
See https://jenkins.softwareheritage.org/job/DLDBASE/job/tox/349/ for more details.
- package.loader: add origin argument
- Add the functional loader
- cli: add the functional loader in the cli tests
Build is green
See https://jenkins.softwareheritage.org/job/DLDBASE/job/tox/350/ for more details.
Build is green
See https://jenkins.softwareheritage.org/job/DLDBASE/job/tox/355/ for more details.
I actually don't know how should I manage errors. It seems exception are generally just emitted, without a dedicated log message.
How do you catch errors on production?
For instance, when a source is not a archive, an exception is raised but there is not a specific error message. I'm wondering how we could get metrics on these errors once on the production line.
If you raise a ValueError with a significant error message, it should be fine.
Error will be caught within the package loader's run method and the task will have its status failed.
How do you catch errors on production?
sentry does now [1]
(going on the link, you can ask for an invite with the form ;)
For instance, when a source is not a archive, an exception is raised but there is not a specific error message. I'm wondering how we could get metrics on these errors once on the production line.
I recall the webapp captures specifically exception and send them to sentry so we may want to do that as well [2]
[1] https://sentry.softwareheritage.org
[2] https://forge.softwareheritage.org/source/swh-web/browse/master/swh/web/browse/utils.py$138
Almost ready to land.
As a general rule of thumb, you need to add test scenarios around the functionality added.
So here, i'm missing:
- loading scenario which test the incremental nature of the visit (that'd test resolve_revision_from)
- edge cases:
- e.g you added the case where there is a failure when uncompressing, that means something for the task and the snapshot, this needs to be tested :)
Oh and you should test the resolution of the origin_visit to check its status within the test.
Cheers,
swh/loader/package/functional/loader.py | ||
---|---|---|
32 | Well, given that we sayed we'd incrementally merge the incomplete functionality. It'd be annoying the fixme stayed forever though ;) | |
39 | I'd make this a function with the url as parameter. This way, this can be tested independently of the loader instantiation. | |
84 | Even though the main functionality is incomplete, this feels pretty empty. @douardda what do you think, couldn't be add more stuff in there? |
- functional: improve test_loader_two_visits
- functional: add test_loader_incremental
- functional: move retrieve_sources out of the FunctionalLoader class
- functional: test the origin_visit
- functional: add test_uncompress_failure
Build is green
See https://jenkins.softwareheritage.org/job/DLDBASE/job/tox/356/ for more details.
So here, i'm missing:
- loading scenario which test the incremental nature of the visit (that'd test resolve_revision_from)
Done.
- edge cases:
- e.g you added the case where there is a failure when uncompressing, that means something for the task and the snapshot, this needs to be tested :)
Done in the test_functional.py file
Oh and you should test the resolution of the origin_visit to check its status within the test.
Done as part of the test test_loader_one_visit.
swh/loader/package/functional/tests/test_functional.py | ||
---|---|---|
92 | as we are instantiating another loader anyway, it's not useful. Plus it's at least pypi specific :) | |
120 | +1 (for that test scenario ;) | |
162 | This should be removed as per my previous comment. | |
swh/loader/package/loader.py | ||
46 | Please move the type in the constructor. |
- package.loader: add origin argument
- Add the functional loader
- cli: add the functional loader in the cli tests
- functional: add test for non json sources files
- functional: improve test_loader_two_visits
- functional: add test_loader_incremental
- functional: move retrieve_sources out of the FunctionalLoader class
- functional: test the origin_visit
- functional: add test_uncompress_failure
- functional: minor fixes in tests
swh/loader/package/functional/tests/test_functional.py | ||
---|---|---|
92 | Removed. | |
162 | Done |
Build is green
See https://jenkins.softwareheritage.org/job/DLDBASE/job/tox/357/ for more details.
- package.loader: ignore non tarball source
- package.loader: add origin argument
- Add the functional loader
- cli: add the functional loader in the cli tests
- functional: add test for non json sources files
- functional: improve test_loader_two_visits
- functional: add test_loader_incremental
- functional: move retrieve_sources out of the FunctionalLoader class
- functional: test the origin_visit
- functional: add test_uncompress_failure
- functional: minor fixes in tests
Build is green
See https://jenkins.softwareheritage.org/job/DLDBASE/job/tox/361/ for more details.
I think I need to implement a test for the task also. WDYT?
right
as said on irc, you need to register the task in the conftest fixture [1]
[1] https://forge.softwareheritage.org/source/swh-loader-core/browse/master/conftest.py$75
Also, if you could rewrite a bit into logical commits whose build is fine, that'd be great.
Build is green
See https://jenkins.softwareheritage.org/job/DLDBASE/job/tox/363/ for more details.
I'm not so sure about refactoring the base package loader to have two url and origin arguments that are the same most of the time but not all the time; In concrete terms, this means that we should refactor all loaders to use self.origin in the places where they currently use self.url.
For instance, we have the same issue with the pypi loader, where we generate an api url (which we're downloading) from the url of the origin (which is just a user-facing link).
All in all, I think we need to think about doing a refactoring to do this consistently, but I don't think this diff is the right place to do this refactoring; For now, we can just have a hardcoded map from origin url to index url.
swh/loader/package/functional/tests/test_functional.py | ||
---|---|---|
121–122 | Comments should probably be more explicit about what changes between the two visits. I guess that the first visit only manages to fetch one tarball, and the second one manages to fetch both? |
I'm not so sure about refactoring the base package loader to have two url and origin arguments that are the same most of the time but not all the time; In concrete terms, this means that we should refactor all loaders to use self.origin in the places where they currently use self.url.
For instance, we have the same issue with the pypi loader, where we generate an api url (which we're downloading) from the url of the origin (which is just a user-facing link).
All in all, I think we need to think about doing a refactoring to do this consistently, but I don't think this diff is the right place to do this refactoring; For now, we can just have a hardcoded map from origin url to index url.
For now, I just removed the origin parameter in order to move forward. I then use the url of the sources.json as origin "name".
But I'll create a task to discuss about this origin name.
Build is green
See https://jenkins.softwareheritage.org/job/DLDBASE/job/tox/365/ for more details.