Page MenuHomeSoftware Heritage

CRAN loader: set revision dates
Open, NormalPublic

Description

Currently, the CRAN loader always sets revision dates to None.

It makes sense for the date to be optional, as it may be missing; but we should use the Date/Publication and/or Packaged fields when they are available (and possibly others).

Motivation:

  • "time-less revisions also break the provenance model as you wouldn't know if a time-less revision comes before or after one with a time" -- @zack
  • the Debian loader already does it

Related Objects

Event Timeline

vlorentz triaged this task as Normal priority.Aug 24 2020, 5:10 PM
vlorentz created this task.

the CRAN loader always sets revision dates to None.

That's not what the code does though... (Or i misunderstood).
It tries to parse the Date if provided...
If the parsing fails because the input is not parseable (provided dates can be ... erratic... [3]), then None is returned...

I gather the parsing is rather limited then and could be improved...
Also fallback behavior can be added... but we need to decide what...

[1] https://forge.softwareheritage.org/source/swh-loader-core/browse/master/swh/loader/package/cran/loader.py$106

[2] https://forge.softwareheritage.org/source/swh-loader-core/browse/master/swh/loader/package/cran/loader.py$176-203

[3] https://forge.softwareheritage.org/D2463#inline-16397

Thanks for the pointers @ardumont !

What the code does looks reasonable to me. One can always be more thorough with the heuristic, but it's also a neverending job.

The main change I'd recommend is to separate author and commiter dates, always using wallclock time as committer date and "Software Heritage something" as committer. Ditto for author/committer that could be separated in the same way, for consistency.

Isn't that what we do in other package loaders as well? (I haven't reviewed for comparison before answering, sorry about that !)

Doing this will also implicitly provide the fallback that @ardumont was mentioning, when the data is not parsable.

Just my 2 cents.

(tag removed by mistake, adding it back)

tbh, I only used one data point to claim it "always" sets it to None (0000d4ef5e166122aee6862ad38a18ce5386cc3e, which is the first CRAN commit in id order)