Page MenuHomeSoftware Heritage

cran loader: Add implementation
ClosedPublic

Authored by ardumont on Dec 17 2019, 3:03 PM.

Details

Summary

which means:

  • add loader implementation with cli auto-registration (swh loader cran -h)
  • add task implementation with scheduler task auto-registration (task-type load-cran)
  • tested

Note:
The diff is annoyingly a tad big because i extracted common behavior functions in package.utils modules...

Related T2026

Test Plan

tox

Diff Detail

Repository
rDLDBASE Generic VCS/Package Loader
Branch
cran-loader
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 9845
Build 14531: tox-on-jenkinsJenkins
Build 14530: arc lint + arc unit

Event Timeline

swh/loader/package/cran/tests/test_cran.py
81

Need to add loading tests scenario here

swh/loader/package/cran/loader.py
46

Dict!

55

Mapping[str, Any]... -> Dict[str, Any]

123

Add gazillion tests on this.

  • loader: Implement more robust parsing date
  • Fix cli test
  • npm: Migrate interesting parsing function to utils module
  • cran: Build the loader cran's revision correctly + add loading test
swh/loader/package/cran/loader.py
145

I guess we should continue using format string for logging...
I mean here, we should continue using:

logger.warning('Fail to parse %s. Reason: %s', (date, e))
  • cran: Improve typing
  • cran.loader: Deal with date parsing corner case
swh/loader/package/cran/tests/test_cran.py
30

oh no!
This peculiar case is dependent on the day this is computed.

It was on the 17th:

datetime(2011, 1, 17, 0, 0, tzinfo=timezone.utc)),

Today, the 18th, this fails with a result of:

datetime(2011, 1, 18, 0, 0, tzinfo=timezone.utc)

Adding a corner case workaround for that (T.T)

douardda added a subscriber: douardda.
douardda added inline comments.
swh/loader/package/cran/loader.py
25

why SPECIFIC? also, you can use a raw string instead of escaping \ chars.

101

since we do not deal with debian packages here, there should be an explanation for this. Even rename the function with comments explaining why we use Deb822 in there.

137

did I miss something? this is declared as Optional[str] just above, so the isinstance is quite suspicious...

144

since you do actually use the regex, this later should use named matches instead of using this ugly line of code (if I may ;-) ).

I mean use [SPECIFIC_]DATE_PATTERN=r'(?P<year>\d{4})-(?P<month>\d{2})' as regex.

150

no need for this import to be local

This revision now requires changes to proceed.Dec 19 2019, 11:19 AM
swh/loader/package/cran/loader.py
25

because it's a specific ugly pattern that does strange things.
I want that to stand out somehow.

I don't know how to call it properly though.

101

indeed, i explained it in the task but not in the code.

will adapt.

137

it misses an or indeed ;)

nice catch.

144

nice, thx

(you may, i believe that it's the diff review's point ;)

150

i need to stop doing that...

Adapt according to review

(still need to rework the git history but let's agree on the implementation
first ;)

ardumont retitled this revision from cran: Add loader implementation to cran loader: Add implementation.Dec 20 2019, 10:00 AM
ardumont edited the summary of this revision. (Show Details)
ardumont edited the test plan for this revision. (Show Details)
ardumont added a project: Origin-CRAN.
  • Resolve revision from already seen artifacts
  • Improve loading tests

did only a quick review, lgtm. Note however my comment on using path.basename() instead of path.split()[-1]

swh/loader/package/cran/loader.py
59

is there a reason not to use [os.]path.basename() here?

This revision is now accepted and ready to land.Jan 6 2020, 2:00 PM
swh/loader/package/cran/loader.py
59

i do not remember.

The things that come to mind is possibly a trailing slash which got in the way...
If that's the case, a comment would have been nice indeed.

swh/loader/package/cran/loader.py
59

nvm, that yields the same result.

  • cran.loader: Simplify filename computation

Build has FAILED

missing a rebase on latest adaptations
in-progress

Rebase to latest origin/master