Page MenuHomeSoftware Heritage

cran loader: Add implementation
ClosedPublic

Authored by ardumont on Dec 17 2019, 3:03 PM.

Details

Summary

which means:

  • add loader implementation with cli auto-registration (swh loader cran -h)
  • add task implementation with scheduler task auto-registration (task-type load-cran)
  • tested

Note:
The diff is annoyingly a tad big because i extracted common behavior functions in package.utils modules...

Related T2026

Test Plan

tox

Diff Detail

Repository
rDLDBASE Generic VCS/Package Loader
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

ardumont created this revision.Dec 17 2019, 3:03 PM
ardumont added inline comments.Dec 17 2019, 3:04 PM
swh/loader/package/cran/tests/test_cran.py
82

Need to add loading tests scenario here

ardumont planned changes to this revision.Dec 17 2019, 3:05 PM
ardumont added inline comments.Dec 17 2019, 3:15 PM
swh/loader/package/cran/loader.py
47

Dict!

56

Mapping[str, Any]... -> Dict[str, Any]

124

Add gazillion tests on this.

ardumont updated this revision to Diff 8763.Dec 17 2019, 7:10 PM
  • loader: Implement more robust parsing date
  • Fix cli test
  • npm: Migrate interesting parsing function to utils module
  • cran: Build the loader cran's revision correctly + add loading test
ardumont added inline comments.Dec 17 2019, 11:36 PM
swh/loader/package/cran/loader.py
146

I guess we should continue using format string for logging...
I mean here, we should continue using:

logger.warning('Fail to parse %s. Reason: %s', (date, e))
ardumont updated this revision to Diff 8770.Dec 18 2019, 6:52 PM
  • cran: Improve typing
  • cran.loader: Deal with date parsing corner case
ardumont added inline comments.Dec 18 2019, 6:54 PM
swh/loader/package/cran/tests/test_cran.py
31

oh no!
This peculiar case is dependent on the day this is computed.

It was on the 17th:

datetime(2011, 1, 17, 0, 0, tzinfo=timezone.utc)),

Today, the 18th, this fails with a result of:

datetime(2011, 1, 18, 0, 0, tzinfo=timezone.utc)

Adding a corner case workaround for that (T.T)

douardda requested changes to this revision.Dec 19 2019, 11:19 AM
douardda added a subscriber: douardda.
douardda added inline comments.
swh/loader/package/cran/loader.py
26

why SPECIFIC? also, you can use a raw string instead of escaping \ chars.

102

since we do not deal with debian packages here, there should be an explanation for this. Even rename the function with comments explaining why we use Deb822 in there.

138

did I miss something? this is declared as Optional[str] just above, so the isinstance is quite suspicious...

145

since you do actually use the regex, this later should use named matches instead of using this ugly line of code (if I may ;-) ).

I mean use [SPECIFIC_]DATE_PATTERN=r'(?P<year>\d{4})-(?P<month>\d{2})' as regex.

151

no need for this import to be local

This revision now requires changes to proceed.Dec 19 2019, 11:19 AM
ardumont added inline comments.Dec 19 2019, 11:24 AM
swh/loader/package/cran/loader.py
26

because it's a specific ugly pattern that does strange things.
I want that to stand out somehow.

I don't know how to call it properly though.

102

indeed, i explained it in the task but not in the code.

will adapt.

138

it misses an or indeed ;)

nice catch.

145

nice, thx

(you may, i believe that it's the diff review's point ;)

151

i need to stop doing that...

ardumont updated this revision to Diff 8797.Dec 19 2019, 11:56 AM

Adapt according to review

(still need to rework the git history but let's agree on the implementation
first ;)

ardumont updated this revision to Diff 8826.Dec 20 2019, 9:57 AM

Squash commits

ardumont retitled this revision from cran: Add loader implementation to cran loader: Add implementation.Dec 20 2019, 10:00 AM
ardumont edited the summary of this revision. (Show Details)
ardumont edited the test plan for this revision. (Show Details)
ardumont added a project: Origin-CRAN.
ardumont updated this revision to Diff 8828.Dec 20 2019, 10:49 AM
  • Resolve revision from already seen artifacts
  • Improve loading tests
ardumont edited the summary of this revision. (Show Details)Dec 20 2019, 11:01 AM
douardda accepted this revision.Mon, Jan 6, 2:00 PM

did only a quick review, lgtm. Note however my comment on using path.basename() instead of path.split()[-1]

swh/loader/package/cran/loader.py
60

is there a reason not to use [os.]path.basename() here?

This revision is now accepted and ready to land.Mon, Jan 6, 2:00 PM
ardumont added inline comments.Mon, Jan 6, 2:23 PM
swh/loader/package/cran/loader.py
60

i do not remember.

The things that come to mind is possibly a trailing slash which got in the way...
If that's the case, a comment would have been nice indeed.

ardumont added inline comments.Mon, Jan 6, 2:26 PM
swh/loader/package/cran/loader.py
60

nvm, that yields the same result.

ardumont updated this revision to Diff 8862.Mon, Jan 6, 2:30 PM
  • cran.loader: Simplify filename computation

Build has FAILED

missing a rebase on latest adaptations
in-progress

ardumont updated this revision to Diff 8863.EditedMon, Jan 6, 2:38 PM

Rebase to latest origin/master

ardumont updated this revision to Diff 8864.Mon, Jan 6, 3:02 PM

Rebase to master