Page MenuHomeSoftware Heritage

deposit: Adapt revision dates setting according to new rules
ClosedPublic

Authored by ardumont on Apr 13 2019, 12:25 PM.

Details

Summary

Use respectively codemeta.dateCreated/codemeta.datePublished as revision.author_date/revision.committer_date if provided.
Else use codemeta.dateCreated for those field dates if provided
Else use codemeta.datePublished for those field dates if provided
Else use deposit.complete_date for those fields

Related T1637

Test Plan

tox

Diff Detail

Repository
rDDEP swh-deposit
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

ardumont created this revision.Apr 13 2019, 12:25 PM
ardumont added inline comments.Apr 13 2019, 12:45 PM
swh/deposit/tests/loader/test_loader.py
80

This was actually wrong to trigger a load of a deposit with a wrong status (partial here which is only possible in tests).

ardumont updated this revision to Diff 4566.Apr 13 2019, 12:45 PM

Remove spurrious print statements

ardumont updated this revision to Diff 4570.Apr 15 2019, 10:36 AM

Fix irrelevant order failure in deposit check tests

ardumont added inline comments.Apr 15 2019, 10:45 AM
swh/deposit/tests/loader/test_loader.py
80

That's why i changed the test ;)

I started reviewing but the question is more complex than I first stated
I asked HAL to fix dateCreated to their real metadata 'date de création/écriture'
and to add datePublished as the deposit date
While dateCreated isn't mandatory on HAL, datePublished is implicitly mandatory
And the date of arrival to SWH isn't the same of datePublished, because of the moderation process on HAL side
It can differ by hours, days or even weeks

swh/deposit/api/private/deposit_check.py
130 ↗(On Diff #4570)

I'm against making this metadata obligatory because I think it will break the chain with HAL

swh/deposit/tests/api/test_deposit_check.py
141 ↗(On Diff #4570)

again, shouldn't be mandatory

ardumont planned changes to this revision.Apr 15 2019, 2:30 PM

The diff is outdated now.
This needs rework.

swh/deposit/api/private/deposit_read.py
217–218

According to our discussion anyway, it's no longer the commit date, it's the author date.

ardumont updated this revision to Diff 4584.Apr 15 2019, 6:21 PM

Use new committer/author dates rules to create the revision's date entries

ardumont retitled this revision from deposit: Change commit date to the date of software creation (instead of the deposit's completed date) to deposit: Adapt revision dates setting according to new rules.Apr 16 2019, 9:54 AM
ardumont edited the summary of this revision. (Show Details)
ardumont added a project: SWORD deposit.

Seems great!
tox passes locally.
I would add/change the date formats in tests (because we can recieve dates in different styles)
and/or test only the _parse_date and compute_date functions.
Here some examples:

<date type="whenSubmitted">2018-02-27 10:45:10</date>
<date type="whenWritten">2012</date>
<date type="whenModified">2018-06-11 17:02:02</date>
<date type="whenReleased">2018-03-20 13:09:08</date>
<date type="whenProduced">2012</date>
<date type="whenEndEmbargoed">2018-02-27</date>

This is directly from HAL's TEI, which shows the different formats it can send.

douardda added inline comments.Apr 16 2019, 12:35 PM
swh/deposit/api/private/deposit_read.py
167–177

nitpicking a bit, but IMHO the t temp var is not needed here. The whole if/elif block can manipulate {author,commit}_date variables directly.

swh/deposit/tests/api/test_deposit_read_metadata.py
400

Couldn't we share this big chunk of xml among tests? I'm a big advocate of copy/paste in tests (rather than clever abstractions/genericity that make the test hard to read for a human), but in this case, it's hard to guess what the differences are between these tests. The docstrings/comments do describe them, but ...

ardumont added inline comments.Apr 16 2019, 1:49 PM
swh/deposit/api/private/deposit_read.py
167–177

oh, righty right.
initially, i just returned the tuple t and did not do anything on it
up until i realized i needed to do stuff on it...

swh/deposit/tests/api/test_deposit_read_metadata.py
400

I'll see what i can do, it sure irritates me a lot...
I was focused on making the thing works first ;)

I would add/change the date formats in tests (because we can recieve dates in different styles)
and/or test only the _parse_date and compute_date functions.
Here some examples:

<date type="whenSubmitted">2018-02-27 10:45:10</date>
<date type="whenWritten">2012</date>
<date type="whenModified">2018-06-11 17:02:02</date>
<date type="whenReleased">2018-03-20 13:09:08</date>
<date type="whenProduced">2012</date>
<date type="whenEndEmbargoed">2018-02-27</date>

I'm pretty sure some of those formats are not parsed correctly already.
The ones with only the year for example gives:

In [1]: from dateutil.parser import parse

In [2]: date = '2012'

In [3]: parse(date)
Out[3]: datetime.datetime(2012, 4, 16, 0, 0)

The others are fine though.

ardumont updated this revision to Diff 4596.Apr 16 2019, 2:39 PM
  • deposit_read: Refactor to clarify intents and docstring

Also rebase on latest master

ardumont updated this revision to Diff 4597.Apr 16 2019, 2:50 PM
  • tests: Factorize metadata xml duplication to explicit tested date
ardumont updated this revision to Diff 4598.Apr 16 2019, 3:11 PM
  • Refactor: Extract date utils manipulation function into utils module
ardumont added inline comments.Apr 16 2019, 3:13 PM
swh/deposit/tests/test_utils.py
182 ↗(On Diff #4598)

here!

======================= 112 passed, 1 warnings in 16.33 seconds ========================
Exception ignored in: <function WeakValueDictionary.__init__.<locals>.remove at 0x7ff254794840>
Traceback (most recent call last):
  File "/home/morane/Documents/code/swh-environment/swh-deposit/.tox/py3/lib/python3.5/weakref.py", line 117, in remove
TypeError: 'NoneType' object is not callable
Exception ignored in: <function WeakValueDictionary.__init__.<locals>.remove at 0x7ff254794840>
Traceback (most recent call last):
  File "/home/morane/Documents/code/swh-environment/swh-deposit/.tox/py3/lib/python3.5/weakref.py", line 117, in remove
TypeError: 'NoneType' object is not callable
WARNING [pifpaf.drivers] `psutil.Popen(pid=5224, status='terminated')` is already gone, sending SIGKILL to its process group
WARNING [pifpaf.drivers] `psutil.Popen(pid=5207, status='terminated')` is already gone, sending SIGKILL to its process group
WARNING [pifpaf.drivers] `psutil.Popen(pid=5202, status='terminated')` is already gone, sending SIGKILL to its process group
_______________________________________ summary ________________________________________
  flake8: commands succeeded
  py3: commands succeeded
  congratulations :)

Looks good, waiting for the 182 line comment about the choice of a date when only a year is entered .

swh/deposit/tests/api/test_deposit_read_metadata.py
474

The design choice of taking the first date in the list is good for me

swh/deposit/tests/test_utils.py
146 ↗(On Diff #4598)

that's good.

163 ↗(On Diff #4598)

tha's good, no?

180 ↗(On Diff #4598)

should be 2017-01-01 00:00:00, if possible?

ardumont updated this revision to Diff 4600.Apr 16 2019, 5:02 PM
  • utils: Make parsing date result in a reasonable date
ardumont updated this revision to Diff 4604.Apr 16 2019, 7:26 PM

Update test docstring to match the reality of the test

ardumont added inline comments.Apr 17 2019, 9:34 AM
swh/deposit/tests/test_utils.py
180 ↗(On Diff #4598)

That's fixed now thanks to the discussion with @douardda (iso8601)

ardumont updated this revision to Diff 4606.Apr 17 2019, 10:46 AM

Plug on master branch

This revision was not accepted when it landed; it landed in state Needs Review.Apr 17 2019, 10:52 AM
This revision was automatically updated to reflect the committed changes.

This revision was not accepted when it landed; it landed in state Needs Review.Wed, Apr 17, 10:52

seen with @douardda orally ;)