Page MenuHomeSoftware Heritage

cgit: Add support for last_update information during listing
ClosedPublic

Authored by vsellier on Jan 26 2021, 6:33 PM.

Diff Detail

Event Timeline

Build is green

Patch application report for D4953 (id=17645)

Rebasing onto bb0184c004...

Current branch diff-target is up to date.
Changes applied before test
commit 623653e60386c658608abbbc46bf20ab5c6fc55b
Author: Vincent SELLIER <vincent.sellier@softwareheritage.org>
Date:   Tue Jan 26 18:03:27 2021 +0100

    cgit: Add support for last_update information during listing
    
    Related to T2988

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/189/ for more details.

Inline unecessary indirection

Build is green

Patch application report for D4953 (id=17647)

Rebasing onto bb0184c004...

Current branch diff-target is up to date.
Changes applied before test
commit 63f14538bc39d9f6ac34eedeebeecc7aee8d8d44
Author: Vincent SELLIER <vincent.sellier@softwareheritage.org>
Date:   Tue Jan 26 18:03:27 2021 +0100

    cgit: Add support for last_update information during listing
    
    Related to T2988

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/190/ for more details.

Build is green

Patch application report for D4953 (id=17648)

Rebasing onto bb0184c004...

Current branch diff-target is up to date.
Changes applied before test
commit 40ce67d80ad5ded093fc244dcd5320bbc58579d9
Author: Vincent SELLIER <vincent.sellier@softwareheritage.org>
Date:   Tue Jan 26 18:03:27 2021 +0100

    cgit: Add support for last_update information during listing
    
    Related to T2988

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/191/ for more details.

anlambert added a subscriber: anlambert.

Thanks for tackling this !

I added some inline comments about how the date parsing function could be simplified.

Also no naive datetimes should be returned as swh data models forbid them.

swh/lister/cgit/lister.py
22–54

You could simplify that function by directly using the datetime module from Python standard library.
Something like:

from datetime import datetime, timezone

parsed_date = None
for date_format in ("%Y-%m-%d %H:%M:%S %z", "%Y-%m-%d %H:%M:%S (%Z)"):
    try:
        parsed_date = datetime.strptime(date, date_format)
        # force UTC to avoid naive datetime
        if not parsed_date.tzinfo:
            parsed_date = parsed_date.replace(tzinfo=timezone.utc)
        break
    except Exception:
        pass

return parsed_date

The "%Y-%m-%d %H:%M:%S (%Z)" format will not create a timezone aware datetime
after parsing so a force to UTC is required here as swh data models forbid naive datetimes.
This will not correspond to the real timezone most of the times but that's not really an issue
from my point of view and hopefully, it looks like most of the cgit instances use the
"%Y-%m-%d %H:%M:%S %z" date format.

swh/lister/cgit/tests/test_lister.py
106–122

Here I would rather test the parsing result against a datetime object.

@pytest.mark.parametrize(
"date_str,expected_datetime",
[
    ({}, None),
    ("unexpected date", None),
    ("2020-0140-10 10:10:10 (GMT)", None),
    (
        "2020-01-10 10:10:10 (UTC)",
        datetime(
            year=2020,
            month=1,
            day=10,
            hour=10,
            minute=10,
            second=10,
            tzinfo=timezone.utc,
        ),
    ),
    (
        "2019-08-04 05:10:41 +0100",
        datetime(
            year=2019,
            month=8,
            day=4,
            hour=5,
            minute=10,
            second=41,
            tzinfo=timezone(timedelta(seconds=3600)),
        ),
    ),
],
)
def test_lister_cgit_date_parsing(date_str, expected_datetime):
"""test cgit lister date parsing"""

repository = {"url": "url", "last_updated_date": date_str}

assert _parse_last_updated_date(repository) == expected_datetime
This revision now requires changes to proceed.Jan 27 2021, 12:09 PM
  • Reorder methods
  • Adapt date parsing according the review

Build is green

Patch application report for D4953 (id=17654)

Rebasing onto bb0184c004...

Current branch diff-target is up to date.
Changes applied before test
commit f1602ab87217f105e3245dc5058c281cb39a1b82
Author: Vincent SELLIER <vincent.sellier@softwareheritage.org>
Date:   Tue Jan 26 18:03:27 2021 +0100

    cgit: Add support for last_update information during listing
    
    Related to T2988

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/192/ for more details.

Looks good to me !

swh/lister/cgit/lister.py
23–25

You can remove this variable as it is no more used.

This revision is now accepted and ready to land.Jan 27 2021, 1:35 PM

Build is green

Patch application report for D4953 (id=17667)

Rebasing onto bb0184c004...

Current branch diff-target is up to date.
Changes applied before test
commit 743bff42190060c7522b8d00ead1f2bfcd10f5f6
Author: Vincent SELLIER <vincent.sellier@softwareheritage.org>
Date:   Tue Jan 26 18:03:27 2021 +0100

    cgit: Add support for last_update information during listing
    
    Related to T2988

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/194/ for more details.

Restore missing log when the date can't be parsed

Build is green

Patch application report for D4953 (id=17669)

Rebasing onto bb0184c004...

Current branch diff-target is up to date.
Changes applied before test
commit 91fcde83410d41daf0dbe4f431b55c33c69f8544
Author: Vincent SELLIER <vincent.sellier@softwareheritage.org>
Date:   Tue Jan 26 18:03:27 2021 +0100

    cgit: Add support for last_update information during listing
    
    Related to T2988

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/196/ for more details.