
loader.debian: Send only the origin url instead of the origin dict
Closed, Public

Authored by ardumont on Oct 14 2019, 3:39 PM.

Details

Summary

This change targets the first Debian loader implementation.
The goal is to align the behavior of the implementations now (see D2135)
and so reduce friction between them.
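
As a rough illustration of the change named in the title, the sketch below reduces either form of origin to the bare URL. The helper name `origin_url_of` and the dict layout (`url` and `type` keys) are assumptions made for this example; they are not taken from this diff.

```python
from typing import Mapping, Union


def origin_url_of(origin: Union[str, Mapping[str, str]]) -> str:
    """Return the origin URL from a bare URL or from an origin dict.

    Hypothetical helper: the name and the dict layout
    ({"url": ..., "type": ...}) are assumptions, not code from this diff.
    """
    if isinstance(origin, str):
        # New style: the caller already sends only the URL.
        return origin
    # Old style: the caller sends the whole origin dict.
    return origin["url"]


legacy_origin = {"url": "deb://cicero", "type": "deb"}
print(origin_url_of(legacy_origin))
print(origin_url_of("deb://cicero"))
```

Both calls yield the same URL, which is the property that lets callers migrate from the dict to the plain URL one at a time.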

Also update the loader's CLI sample in the main entry point so that it
actually runs independently of the lister-db's state. The previous sample
would not create any content, directory, or revision in docker-dev, because
the lister-db is not pre-populated there.

Related D2137
Related D2135
Related T2025

Test Plan
  • tox
  • docker-dev:
swh@978d62a12876:/$ python3 -m swh.loader.debian.loader --origin-url deb://cicero
2019-10-14 13:29:43,293 45 Loading config file /loader.yml
2019-10-14 13:29:43,298 45 Converted retries value: 3 -> Retry(total=3, connect=None, read=None, redirect=None, status=None)
2019-10-14 13:29:43,311 45 Creating deb origin for deb://cicero
2019-10-14 13:29:43,315 45 Starting new HTTP connection (1): swh-storage:5002
2019-10-14 13:29:45,408 45 http://swh-storage:5002 "POST /origin/add HTTP/1.1" 200 1
2019-10-14 13:29:45,409 45 Done creating deb origin for deb://cicero
2019-10-14 13:29:45,409 45 Creating origin_visit for origin deb://cicero at time 2019-10-14 13:29:45.409554+00:00
2019-10-14 13:29:46,375 45 http://swh-storage:5002 "POST /origin/visit/add HTTP/1.1" 200 16
2019-10-14 13:29:46,377 45 Done Creating deb origin_visit for origin deb://cicero at time 2019-10-14 13:29:45.409554+00:00
2019-10-14 13:29:46,377 45 Processing package cicero_0.7.2-3
2019-10-14 13:29:46,382 45 Starting new HTTP connection (1): deb.debian.org:80
2019-10-14 13:29:46,594 45 http://deb.debian.org:80 "GET /debian//pool/contrib/c/cicero/cicero_0.7.2-3.diff.gz HTTP/1.1" 302 332
2019-10-14 13:29:46,600 45 Starting new HTTP connection (1): cdn-fastly.deb.debian.org:80
2019-10-14 13:29:46,637 45 http://cdn-fastly.deb.debian.org:80 "GET /debian/pool/contrib/c/cicero/cicero_0.7.2-3.diff.gz HTTP/1.1" 200 3964
2019-10-14 13:29:46,647 45 Starting new HTTP connection (1): deb.debian.org:80
2019-10-14 13:29:46,946 45 http://deb.debian.org:80 "GET /debian//pool/contrib/c/cicero/cicero_0.7.2-3.dsc HTTP/1.1" 302 328
2019-10-14 13:29:46,954 45 Starting new HTTP connection (1): cdn-fastly.deb.debian.org:80
2019-10-14 13:29:46,963 45 http://cdn-fastly.deb.debian.org:80 "GET /debian/pool/contrib/c/cicero/cicero_0.7.2-3.dsc HTTP/1.1" 200 1864
2019-10-14 13:29:46,972 45 Starting new HTTP connection (1): deb.debian.org:80
2019-10-14 13:29:47,167 45 http://deb.debian.org:80 "GET /debian//pool/contrib/c/cicero/cicero_0.7.2.orig.tar.gz HTTP/1.1" 302 334
2019-10-14 13:29:47,173 45 Starting new HTTP connection (1): cdn-fastly.deb.debian.org:80
2019-10-14 13:29:47,181 45 http://cdn-fastly.deb.debian.org:80 "GET /debian/pool/contrib/c/cicero/cicero_0.7.2.orig.tar.gz HTTP/1.1" 200 96527
2019-10-14 13:29:47,191 45 extract Debian source package /tmp/swh.loader.debian.cicero.759kou47/cicero_0.7.2-3.dsc in /tmp/swh.loader.debian.cicero.759kou47/extracted
2019-10-14 13:29:47,334 45 http://swh-storage:5002 "POST /content/missing HTTP/1.1" 200 1431
2019-10-14 13:29:47,339 45 http://swh-storage:5002 "POST /directory/missing HTTP/1.1" 200 45
2019-10-14 13:29:47,343 45 http://swh-storage:5002 "POST /revision/missing HTTP/1.1" 200 23
2019-10-14 13:29:47,343 45 Sending 42 contents
2019-10-14 13:29:49,608 45 http://swh-storage:5002 "POST /content/add HTTP/1.1" 200 58
2019-10-14 13:29:49,609 45 Done sending 42 contents
2019-10-14 13:29:49,610 45 Sending 2 directories
2019-10-14 13:29:51,403 45 http://swh-storage:5002 "POST /directory/add HTTP/1.1" 200 16
2019-10-14 13:29:51,403 45 Done sending 2 directories
2019-10-14 13:29:51,403 45 Sending 1 revisions
2019-10-14 13:29:53,374 45 http://swh-storage:5002 "POST /revision/add HTTP/1.1" 200 15
2019-10-14 13:29:53,376 45 Done sending 1 revisions
2019-10-14 13:29:53,411 45 Loading failure, updating to `partial` status
Traceback (most recent call last):
...
sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) fe_sendauth: no password supplied

(Background on this error at: http://sqlalche.me/e/e3q8)
2019-10-14 13:29:53,416 45 Updating origin_visit for origin deb://cicero with status partial
2019-10-14 13:29:53,425 45 http://swh-storage:5002 "POST /origin/visit/update HTTP/1.1" 200 1
2019-10-14 13:29:53,425 45 Done updating origin_visit for origin deb://cicero with status partial

Note: the visit ends up partial because

  • this Debian loader writes to the lister db,
  • within docker-dev, the lister db is protected by credentials,
  • and the Debian loader's setup is too basic to handle those credentials.

That is beside the point of this diff anyway.
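
The behavior seen in the log, where objects are sent successfully and a later failure downgrades the visit, can be sketched as follows. This is a minimal illustration, not the loader's actual code: the function `run_visit`, its two callbacks, and the `full` status name are assumptions; only the `partial` status appears in the log above.

```python
from typing import Callable


def run_visit(load: Callable[[], None], post_load: Callable[[], None]) -> str:
    """Return "full" if both steps succeed, otherwise "partial".

    Hypothetical sketch: names and the "full" status are assumptions.
    """
    try:
        load()       # e.g. send contents, directories, revisions
        post_load()  # e.g. a follow-up write to another database
    except Exception:
        # Whatever was already sent stays stored; only the visit
        # status is downgraded.
        return "partial"
    return "full"


def failing_post_load() -> None:
    # Stand-in for the lister-db write that fails in docker-dev.
    raise RuntimeError("fe_sendauth: no password supplied")


print(run_visit(lambda: None, failing_post_load))
```

With a `post_load` step that raises, the sketch reports `partial` even though the `load` step completed, which mirrors the run shown in the test plan.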

Diff Detail

Repository
rDLDDEB Debian package loader
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.