Page MenuHomeSoftware Heritage

tar.loader: Make the loader-tar able to download remote artifact
ClosedPublic

Authored by ardumont on Dec 8 2018, 5:54 PM.

Details

Summary

Also:

  • Keep the legacy behavior (use the revision and snapshot's branch provided by the caller, e.g. the deposit loader depends on this)
  • Increase code coverage (50 -> 93%)
  • Remove dir loader inheritance (nightmare to maintain). We still need the dir loader for some functions but the main issue is resolved).
  • Remove deprecated producer code (we use the scheduler now)

Related T1351
Related T1431
Depends on D794

Use sample:

>>> url = 'https://ftp.gnu.org/gnu/8sync/8sync-0.1.0.tar.gz'
>>> origin = {'url': url, 'type': 'tar'}
>>> visit_date = 'Tue, 3 May 2017 17:16:32 +0200'
>>> last_modified = '2016-04-22 16:35'
>>> import logging
>>> logging.basicConfig(level=logging.DEBUG)
>>>
>>> from swh.loader.tar.tasks import LoadTarRepository
>>> l = LoadTarRepository()
>>> l.run_task(origin=origin, visit_date=visit_date,
...            last_modified=last_modified)
DEBUG:swh.scheduler.task.LoadTarRepository:Creating tar origin for https://ftp.gnu.org/gnu/8sync/8sync-0.1.0.tar.gz
DEBUG:swh.scheduler.task.LoadTarRepository:Done creating tar origin for https://ftp.gnu.org/gnu/8sync/8sync-0.1.0.tar.gz
DEBUG:swh.scheduler.task.LoadTarRepository:Creating origin_visit for origin 3 at time Tue, 3 May 2017 17:16:32 +0200
DEBUG:swh.scheduler.task.LoadTarRepository:Done Creating origin_visit for origin 3 at time Tue, 3 May 2017 17:16:32 +0200
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): ftp.gnu.org:443
DEBUG:urllib3.connectionpool:https://ftp.gnu.org:443 "GET /gnu/8sync/8sync-0.1.0.tar.gz HTTP/1.1" 200 221837
INFO:swh.scheduler.task.LoadTarRepository:Uncompress /tmp/swh.loader.tar._nh29j5e-18075/8sync-0.1.0.tar.gz to /tmp/swh.loader.tar._nh29j5e-18075/swh.loader.tar-7wox5_rx
DEBUG:swh.scheduler.task.LoadTarRepository:Sending 28 contents
DEBUG:swh.scheduler.task.LoadTarRepository:Done sending 28 contents
DEBUG:swh.scheduler.task.LoadTarRepository:Sending 7 directories
DEBUG:swh.scheduler.task.LoadTarRepository:Done sending 7 directories
DEBUG:swh.scheduler.task.LoadTarRepository:Sending 1 revisions
DEBUG:swh.scheduler.task.LoadTarRepository:Done sending 1 revisions
DEBUG:swh.scheduler.task.LoadTarRepository:Updating origin_visit for origin 3 with status full
DEBUG:swh.scheduler.task.LoadTarRepository:Done updating origin_visit for origin 3 with status full
DEBUG:swh.scheduler.task.LoadTarRepository:Clean up /tmp/swh.loader.tar._nh29j5e-18075
{'status': 'eventful'}
Test Plan

tox

Diff Detail

Repository
rDLDTAR Tarball Loader
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

ardumont created this revision.Dec 8 2018, 5:54 PM
ardumont updated this revision to Diff 2524.Dec 8 2018, 5:57 PM

Remove print statement T.T

ardumont updated this revision to Diff 2525.Dec 8 2018, 5:59 PM

Fix d/control tab instead of spaces T.T

ardumont updated this revision to Diff 2528.Dec 8 2018, 6:07 PM

Dependent diff landed so trigger the new build for this

ardumont edited the summary of this revision. (Show Details)Dec 8 2018, 6:21 PM
ardumont updated this revision to Diff 2529.Dec 8 2018, 6:43 PM
ardumont marked an inline comment as done.
  • tar.tasks: Improve docstring and test the task
  • README: Update documentation and remove obsolete documentation
ardumont updated this revision to Diff 2530.Dec 8 2018, 7:04 PM

Remove wrong comment

ardumont marked 3 inline comments as done.Dec 8 2018, 7:07 PM
ardumont added inline comments.
swh/loader/tar/loader.py
171

That will need to be provided by the caller (so the lister here).

swh/loader/tar/tests/test_build.py
48

Apparently my initial mtime was wrong, thus the slight change here.
Other than that, i used the same date as before with another readable format.

53

I used the same date as before with another format.

swh/loader/tar/tests/test_loader.py
74

Same i use the same date as before but with another format.

Heads up, it's cool and all but:

  • the diff is too big so i'll split this in multiple ones.
  • i need to update the diff (done locally and tests ok \m/) to add back the legacy behavior as deposit depends on it.
ardumont planned changes to this revision.Dec 9 2018, 1:04 PM
ardumont updated this revision to Diff 2531.Dec 9 2018, 1:04 PM
  • loader.tar: Add back the legacy behavior (deposit depends on it)
ardumont edited the summary of this revision. (Show Details)Dec 9 2018, 1:09 PM
ardumont edited the summary of this revision. (Show Details)Dec 9 2018, 1:17 PM
ardumont updated this revision to Diff 2532.Dec 9 2018, 1:24 PM
  • tar.loader: Fix typos
  • resources/producer: Remove dead configuration
vlorentz accepted this revision.Dec 10 2018, 9:34 AM
vlorentz added a subscriber: vlorentz.
vlorentz added inline comments.
debian/rules
4

Why this change?

swh/loader/tar/loader.py
158

Why not self.log.debug(...)?

168

with urllib.parse

171

Maybe rename the method to get_tarball_url_to_retrieve, to make it obvious it returns the value (rather than setting an object attribute)

178

raise NotImplementedError() (or make the class abstract)

220

rename it to LocalTarLoader?

226

s/tarballs/tarball/

This revision is now accepted and ready to land.Dec 10 2018, 9:34 AM
ardumont marked 8 inline comments as done.Dec 10 2018, 10:44 AM
ardumont added inline comments.
debian/rules
4

The initial version is nose related.
That new one is pytest ;)

swh/loader/tar/loader.py
158

because i want the user to be warned it will mess up it's local environment for debug reasons.
That matches other implementations btw (pypi, etc...)

168

Thank you, will look into it.

171

Right

178

Right, i forgot!

220

I wanted to avoid having to migrate the loader deposit.
I could do as you did with aliasing if you want.

226

indeed.

ardumont updated this revision to Diff 2533.Dec 10 2018, 10:44 AM
ardumont marked an inline comment as done.
  • tar.loader: Rename method with clearer name
  • tar.loader: Explicit the need for an implementation
  • tar.loader: Fix typo
  • tar.loader: Improve class name and use alias for retro-compatibility
  • tar.loader: Improve class docstrings
  • loader: Only explicit needed params & adapt docstring accordingly
  • tar.loader: Improve docstring
ardumont updated this revision to Diff 2534.Dec 10 2018, 10:49 AM
  • tar.loader: Use urllib.parse.urlparse to actually parse url
This revision was automatically updated to reflect the committed changes.