Paths

Table of Contentst

Differential D1729

swh.loader.gnu: Implement gnu loader
AbandonedPublic
Actions

Authored by ardumont on Jul 11 2019, 6:13 PM.

Tags

None

Subscribers

Details

Reviewers

Group Reviewers

Summary

Implemented gnu loader using the base loader
Depends on D1744

Diff Detail

Repository

rDLDBASE Generic VCS/Package Loader

Branch

gnu

Lint

No Linters Available

Unit

No Unit Test Coverage

Build Status

Buildable 7016
Build 9874: tox-on-jenkins	Jenkins
Build 9873: arc lint + arc unit

Event Timeline

nahimilega created this revision.Jul 11 2019, 6:13 PM

Herald added a reviewer: Reviewers. · View Herald TranscriptJul 11 2019, 6:13 PM

Build is green
See https://jenkins.softwareheritage.org/job/DLDBASE/job/tox/126/ for more details.

Harbormaster completed remote builds in B6846: Diff 5806.Jul 11 2019, 6:14 PM

nahimilega mentioned this in D1694: swh.loader.package: Implement a method to prepare package visit.Jul 11 2019, 6:15 PM

nahimilega added subscribers: ardumont, olasd.Jul 11 2019, 6:38 PM

nahimilega edited the summary of this revision. (Show Details)Jul 18 2019, 9:41 AM

nahimilega edited the summary of this revision. (Show Details)

nahimilega added a parent revision: D1694: swh.loader.package: Implement a method to prepare package visit.

nahimilega edited the summary of this revision. (Show Details)

nahimilega added a parent revision: D1744: swh.loader.base: Add tests.

swh.loader.gnu: Add tests

Build has FAILED

Link to build: https://jenkins.softwareheritage.org/job/DLDBASE/job/tox/137/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDBASE/job/tox/137/console

Harbormaster failed remote builds in B7016: Diff 5975!Jul 24 2019, 8:12 PM

nahimilega added inline comments.Jul 25 2019, 9:50 AM

swh/loader/gnu/tests/test_loader.py
44–82	These hashes are not updated for gnu loader. I found the swh identify method. But I didn't understand what exactly these hashes represent. I mean that swh identify tool return only one hash whereas we have a list of hashes here. Can you please explain what the hashes in `_expected_new_contents_first_visit`, `_expected_new_directories_first_visit`, `_expected_new_revisions_first_visit` mean in a bit detaild way. Is there any documentation for the reference? (if no then I suppose we would need it)

ardumont added inline comments.Jul 25 2019, 11:48 AM

swh/loader/gnu/tests/test_loader.py
44–82	These hashes are not updated for gnu loader. You need to have the correct hashes according to the archive's tree structure (raw file as content, directories for tree, etc...) you ingest. I found the swh identify method. But I didn't understand what exactly these hashes represent. That's a tad problematic since that's what the loader is all about ;) I mean that swh identify tool return only one hash whereas we have a list of hashes here. Yes, because each hash represents the hash (sha1 or sha1_git, don't remember which) of the raw content of the files found in the tarballs. Can you please explain what the hashes in _expected_new_contents_first_visit, _expected_new_directories_first_visit, _expected_new_revisions_first_visit mean in a bit detaild way. Aren't you the author of this? (I don't understand the question ;') If no, where does this come from then? The names seems to be explicit enough. The `first visit` is the first time you try to load the origin (the gnu tarball tried to be ingested in the archive is fetched from the origin url as input)... Indeed, reading back the documentation (persistent identifier and data model) would be a good hint (see below). Is there any documentation for the reference? (if no then I suppose we would need it) There is: identifier: [1] data model: [2] the code in swh-model (starting from swh-identify following it through down to the identifiers/hashutil modules). [1] https://docs.softwareheritage.org/devel/swh-model/persistent-identifiers.html#persistent-identifiers [2] https://docs.softwareheritage.org/devel/swh-model/data-model.html#data-model

anlambert added a subscriber: anlambert.Jul 26 2019, 12:01 PM

anlambert added inline comments.

swh/loader/gnu/loader.py
15	Here you should add these: CONFIG_BASE_FILENAME = 'loader/gnu' ADDITIONAL_CONFIG = { 'temp_directory': ('str', '/tmp/swh.loader.gnu/'), 'debug': ('bool', False) } in order to declare the config filename and some default config entries for the loader.
20	cf https://forge.softwareheritage.org/D1694#inline-11749
swh/loader/gnu/tasks.py
12	You need to use keyword arguments here: return GNULoader().load(name=name, origin_url=origin_url, tarballs=tarballs)

nahimilega added inline comments.Jul 28 2019, 3:29 PM

swh/loader/gnu/loader.py
15	This was present in base loader class itself, it would assign config filename and some default config entries on the bases of loader name. but after a discussion with @ardumont in IRC and via these comments(https://forge.softwareheritage.org/D1694?id=5884#inline-11570 ) it was removed. There are three things what we can do add it again in the base loader add them separately for each loader Discard these config filenames and some default config entries completely and use environment variable stanza(as discussed in comment) What should I do?

nahimilega edited the summary of this revision. (Show Details)Aug 3 2019, 10:56 PM

Please either abandon the diff or mark this as planned changes like the others ;)

nahimilega planned changes to this revision.Aug 6 2019, 2:44 PM

ardumont commandeered this revision.Oct 1 2019, 1:26 PM

ardumont abandoned this revision.

ardumont added a reviewer: nahimilega.

Revision Contents
Changeset List

Path

Size

swh/

loader/

gnu/

37 lines

12 lines

tests/

69 lines

resources/

tarballs/

8sync-0.1.0.tar.gz

8sync-0.2.0.tar.gz

8sync-0.3.0.tar.gz

8sync-0.4.0.tar.gz

8sync-0.4.1.tar.gz

8sync-0.4.2.tar.gz

test_download.py

46 lines

232 lines

Diff 5975

swh/loader/gnu/loader.py

Loading...

swh/loader/gnu/tasks.py

Loading...

swh/loader/gnu/tests/common.py

Loading...

swh/loader/gnu/tests/resources/tarballs/8sync-0.1.0.tar.gz

Loading...

swh/loader/gnu/tests/resources/tarballs/8sync-0.2.0.tar.gz

Loading...

swh/loader/gnu/tests/resources/tarballs/8sync-0.3.0.tar.gz

Loading...

swh/loader/gnu/tests/resources/tarballs/8sync-0.4.0.tar.gz

Loading...

swh/loader/gnu/tests/resources/tarballs/8sync-0.4.1.tar.gz

Loading...

swh/loader/gnu/tests/resources/tarballs/8sync-0.4.2.tar.gz

Loading...

swh/loader/gnu/tests/test_download.py

Loading...

swh/loader/gnu/tests/test_loader.py

Loading...