Given a PyPI origin, load its release artifacts (as synthetic
revisions).
The first visit results in a snapshot (targeting the created
revisions). Further visits with no changes result in the same latest
snapshot. A visit with new release artifacts results in a new snapshot
(the new snapshot shares the same branches as the last one, plus the
new ones).
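
The snapshot behaviour described above can be sketched as follows. This is only an illustration, not the loader's code: the `Snapshot` shape (branch name mapped to revision id), the function name `next_snapshot` and the branch names are all assumptions made for the example.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical shape: branch name -> revision id.
Snapshot = Dict[str, str]


def next_snapshot(last: Snapshot,
                  new_revisions: Iterable[Tuple[str, str]]) -> Snapshot:
    """Return the branches of the next snapshot.

    Branches of the last snapshot are kept as they are; branches for
    newly loaded release artifacts are added next to them.
    """
    branches = dict(last)  # copy, so the last snapshot is left untouched
    for name, revision_id in new_revisions:
        # An already known branch keeps its current target.
        branches.setdefault(name, revision_id)
    return branches
```

With no new revision, the result equals the last snapshot, which matches the "same latest snapshot" case of a visit without changes.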
# Functional note
- Release artifacts with missing information are skipped
- Release artifacts whose PKG-INFO file is missing are skipped
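
A minimal sketch of the two skip rules above, under stated assumptions: the required keys (`url`, `filename`, `sha256`), the function name `keep_artifact` and the idea of passing the archive member names are hypothetical and only serve the example.

```python
from typing import Iterable, Mapping

# Assumption: the fields an artifact needs in order to be loadable.
REQUIRED_KEYS = ("url", "filename", "sha256")


def keep_artifact(artifact: Mapping[str, str],
                  member_names: Iterable[str]) -> bool:
    """Tell whether a release artifact should be loaded.

    `member_names` lists the paths found in the uncompressed artifact.
    """
    # Rule 1: skip artifacts with missing (or empty) information.
    if any(not artifact.get(key) for key in REQUIRED_KEYS):
        return False
    # Rule 2: skip artifacts that ship no PKG-INFO file.
    return any(name.split("/")[-1] == "PKG-INFO" for name in member_names)
```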
# Modules
- swh.loader.pypi.client: Client interface to query pypi.org (it also
manipulates, to some extent, the local representations of the
artifacts). It is the PyPiProject's collaborator for fetching missing
information on the project.
- swh.loader.pypi.model: PyPiProject, the representation of a PyPI
origin. It is the loader's collaborator for manipulating the origin
(filtering releases, etc.)
- swh.loader.pypi.loader: The main entry point for the loader
(fetch_data fetches, store_data stores... ;)
# Technical Note
The client cache is there for local runs and later data analysis (it
has also been used for tests). That setting should be turned off in
production.
## Edge cases expected
- A PyPI project can resolve to an origin that no longer exists (404)
- A hash checksum divergence stops the loading (maybe this can be improved later)
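
As an illustration of the checksum edge case, a check of this kind could look like the sketch below. The choice of SHA-256 and the function name `check_sha256` are assumptions; the note above only says that a hash divergence stops the loading.

```python
import hashlib


def check_sha256(content: bytes, expected_hex: str) -> None:
    """Raise ValueError when the content does not match the expected hash.

    Raising (rather than skipping the artifact) mirrors the current
    behaviour, where a divergence stops the loading.
    """
    actual_hex = hashlib.sha256(content).hexdigest()
    if actual_hex != expected_hex.lower():
        raise ValueError(
            "checksum mismatch: expected %s, got %s"
            % (expected_hex, actual_hex)
        )
```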
## Possible improvements
- Add some retry around the fetch artifacts routine (client)
- Add more tests
- Simplify the PyPiProject class (it was originally created to do more
than it does now)
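
The retry improvement listed above could take a shape like the following sketch. Nothing here exists in the client today: `with_retry`, its parameters and the choice of `OSError` as the retriable error are hypothetical.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retry(fetch: Callable[[], T],
               attempts: int = 3,
               delay: float = 0.0,
               errors: Tuple[Type[BaseException], ...] = (OSError,)) -> T:
    """Call `fetch` until it succeeds, at most `attempts` times.

    The last error is re-raised once all attempts have failed.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fetch()
        except errors:
            if attempt == attempts - 1:
                raise  # no attempt left: let the error through
            time.sleep(delay)  # wait before the next attempt
    raise AssertionError("unreachable")
```

A caller would wrap the fetch routine in a zero-argument callable, for instance a `lambda` or `functools.partial`, and pass it as `fetch`.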
The branch is on the remote repository: loader-pypi