Page MenuHomeSoftware Heritage

deposit: Adapt revision dates setting according to new rules
ClosedPublic

Authored by ardumont on Apr 13 2019, 12:25 PM.

Details

Summary

Use respectively codemeta.dateCreated/codemeta.datePublished as revision.author_date/revision.committer_date if provided.
Else use codemeta.dateCreated for those field dates if provided
Else use codemeta.datePublished for those field dates if provided
Else use deposit.complete_date for those fields

Related T1637

Test Plan

tox

Diff Detail

Repository
rDDEP Push deposit
Branch
update-commiter-date
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 5432
Build 7359: tox-on-jenkinsJenkins
Build 7358: arc lint + arc unit

Event Timeline

swh/deposit/tests/loader/test_loader.py
79

This was actually wrong to trigger a load of a deposit with a wrong status (partial here which is only possible in tests).

Remove spurrious print statements

Fix irrelevant order failure in deposit check tests

swh/deposit/tests/loader/test_loader.py
79

That's why i changed the test ;)

I started reviewing but the question is more complex than I first stated
I asked HAL to fix dateCreated to their real metadata 'date de création/écriture'
and to add datePublished as the deposit date
While dateCreated isn't mandatory on HAL, datePublished is implicitly mandatory
And the date of arrival to SWH isn't the same of datePublished, because of the moderation process on HAL side
It can differ by hours, days or even weeks

swh/deposit/api/private/deposit_check.py
130

I'm against making this metadata obligatory because I think it will break the chain with HAL

swh/deposit/tests/api/test_deposit_check.py
141

again, shouldn't be mandatory

The diff is outdated now.
This needs rework.

swh/deposit/api/private/deposit_read.py
183

According to our discussion anyway, it's no longer the commit date, it's the author date.

Use new committer/author dates rules to create the revision's date entries

ardumont retitled this revision from deposit: Change commit date to the date of software creation (instead of the deposit's completed date) to deposit: Adapt revision dates setting according to new rules.Apr 16 2019, 9:54 AM
ardumont edited the summary of this revision. (Show Details)
ardumont added a project: SWORD deposit.

Seems great!
tox passes locally.
I would add/change the date formats in tests (because we can recieve dates in different styles)
and/or test only the _parse_date and compute_date functions.
Here some examples:

<date type="whenSubmitted">2018-02-27 10:45:10</date>
<date type="whenWritten">2012</date>
<date type="whenModified">2018-06-11 17:02:02</date>
<date type="whenReleased">2018-03-20 13:09:08</date>
<date type="whenProduced">2012</date>
<date type="whenEndEmbargoed">2018-02-27</date>

This is directly from HAL's TEI, which shows the different formats it can send.

swh/deposit/api/private/deposit_read.py
166–176

nitpicking a bit, but IMHO the t temp var is not needed here. The whole if/elif block can manipulate {author,commit}_date variables directly.

swh/deposit/tests/api/test_deposit_read_metadata.py
385

Couldn't we share this big chunk of xml among tests? I'm a big advocate of copy/paste in tests (rather than clever abstractions/genericity that make the test hard to read for a human), but in this case, it's hard to guess what the differences are between these tests. The docstrings/comments do describe them, but ...

swh/deposit/api/private/deposit_read.py
166–176

oh, righty right.
initially, i just returned the tuple t and did not do anything on it
up until i realized i needed to do stuff on it...

swh/deposit/tests/api/test_deposit_read_metadata.py
385

I'll see what i can do, it sure irritates me a lot...
I was focused on making the thing works first ;)

I would add/change the date formats in tests (because we can recieve dates in different styles)
and/or test only the _parse_date and compute_date functions.
Here some examples:

<date type="whenSubmitted">2018-02-27 10:45:10</date>
<date type="whenWritten">2012</date>
<date type="whenModified">2018-06-11 17:02:02</date>
<date type="whenReleased">2018-03-20 13:09:08</date>
<date type="whenProduced">2012</date>
<date type="whenEndEmbargoed">2018-02-27</date>

I'm pretty sure some of those formats are not parsed correctly already.
The ones with only the year for example gives:

In [1]: from dateutil.parser import parse

In [2]: date = '2012'

In [3]: parse(date)
Out[3]: datetime.datetime(2012, 4, 16, 0, 0)

The others are fine though.

  • deposit_read: Refactor to clarify intents and docstring

Also rebase on latest master

  • tests: Factorize metadata xml duplication to explicit tested date
  • Refactor: Extract date utils manipulation function into utils module
swh/deposit/tests/test_utils.py
182 ↗(On Diff #4598)

here!

======================= 112 passed, 1 warnings in 16.33 seconds ========================
Exception ignored in: <function WeakValueDictionary.__init__.<locals>.remove at 0x7ff254794840>
Traceback (most recent call last):
  File "/home/morane/Documents/code/swh-environment/swh-deposit/.tox/py3/lib/python3.5/weakref.py", line 117, in remove
TypeError: 'NoneType' object is not callable
Exception ignored in: <function WeakValueDictionary.__init__.<locals>.remove at 0x7ff254794840>
Traceback (most recent call last):
  File "/home/morane/Documents/code/swh-environment/swh-deposit/.tox/py3/lib/python3.5/weakref.py", line 117, in remove
TypeError: 'NoneType' object is not callable
WARNING [pifpaf.drivers] `psutil.Popen(pid=5224, status='terminated')` is already gone, sending SIGKILL to its process group
WARNING [pifpaf.drivers] `psutil.Popen(pid=5207, status='terminated')` is already gone, sending SIGKILL to its process group
WARNING [pifpaf.drivers] `psutil.Popen(pid=5202, status='terminated')` is already gone, sending SIGKILL to its process group
_______________________________________ summary ________________________________________
  flake8: commands succeeded
  py3: commands succeeded
  congratulations :)

Looks good, waiting for the 182 line comment about the choice of a date when only a year is entered .

swh/deposit/tests/api/test_deposit_read_metadata.py
459

The design choice of taking the first date in the list is good for me

swh/deposit/tests/test_utils.py
146 ↗(On Diff #4598)

that's good.

163 ↗(On Diff #4598)

tha's good, no?

180 ↗(On Diff #4598)

should be 2017-01-01 00:00:00, if possible?

  • utils: Make parsing date result in a reasonable date

Update test docstring to match the reality of the test

swh/deposit/tests/test_utils.py
180 ↗(On Diff #4598)

That's fixed now thanks to the discussion with @douardda (iso8601)

This revision was not accepted when it landed; it landed in state Needs Review.Apr 17 2019, 10:52 AM
This revision was automatically updated to reflect the committed changes.

This revision was not accepted when it landed; it landed in state Needs Review.Wed, Apr 17, 10:52

seen with @douardda orally ;)