
maven: Handle null mtime value in index for jar archive
ClosedPublic

Authored by anlambert on Apr 29 2022, 1:49 PM.

Details

Summary

There are cases where the modification time of a jar archive in
a maven index is null, which led to a processing error
in the lister.

Handle that case to avoid a premature exit of the listing process.

swh-lister_1                        | [2022-04-29 10:26:13,222: DEBUG/ForkPoolWorker-1] * Yielding jar http://apps.geomajas.org/nexus/content/repositories/public/org/mobicents/protocols/mgcp/mgcp-impl/2.0.0.GA/mgcp-impl-2.0.0.GA-sources.jar: {'type': 'maven', 'url': 'http://apps.geomajas.org/nexus/content/repositories/public/org/mobicents/protocols/mgcp/mgcp-impl/2.0.0.GA/mgcp-impl-2.0.0.GA-sources.jar', 'doc': 547574, 'gid': 'org.mobicents.protocols.mgcp', 'aid': 'mgcp-impl', 'version': '2.0.0.GA', 'time': 0}
swh-lister_1                        | [2022-04-29 10:26:13,227: ERROR/ForkPoolWorker-1] Task swh.lister.maven.tasks.FullMavenLister[6551d966-a28e-42fb-9efb-fb56e48093f8] raised unexpected: ValueError("invalid literal for int() with base 10: ''")
swh-lister_1                        | Traceback (most recent call last):
swh-lister_1                        |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/celery/app/trace.py", line 451, in trace_task
swh-lister_1                        |     R = retval = fun(*args, **kwargs)
swh-lister_1                        |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/scheduler/task.py", line 61, in __call__
swh-lister_1                        |     result = super().__call__(*args, **kwargs)
swh-lister_1                        |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/celery/app/trace.py", line 734, in __protected_call__
swh-lister_1                        |     return self.run(*args, **kwargs)
swh-lister_1                        |   File "/src/swh-lister/swh/lister/maven/tasks.py", line 16, in list_maven_full
swh-lister_1                        |     return lister.run().dict()
swh-lister_1                        |   File "/src/swh-lister/swh/lister/pattern.py", line 130, in run
swh-lister_1                        |     full_stats.origins += self.send_origins(origins)
swh-lister_1                        |   File "/src/swh-lister/swh/lister/pattern.py", line 233, in send_origins
swh-lister_1                        |     for batch_origins in grouper(origins, n=1000):
swh-lister_1                        |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/utils.py", line 53, in grouper
swh-lister_1                        |     for _data in itertools.zip_longest(*args, fillvalue=stop_value):
swh-lister_1                        |   File "/src/swh-lister/swh/lister/maven/lister.py", line 309, in get_origins_from_page
swh-lister_1                        |     last_update_dt = datetime.fromtimestamp(int(last_update_seconds))
swh-lister_1                        | ValueError: invalid literal for int() with base 10: ''
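
The traceback above ends in `int('')`, so the fix is to guard against an empty value before the conversion. The following is only a minimal sketch of such a guard, not the actual patch: the helper name `parse_mtime_seconds` is hypothetical, it assumes the raw value reaches the conversion as a string that may be empty or missing, and it returns a UTC-aware datetime, which is a choice made for the example (the call in the traceback uses a plain `datetime.fromtimestamp`).

```python
from datetime import datetime, timezone
from typing import Optional


def parse_mtime_seconds(last_update_seconds: Optional[str]) -> Optional[datetime]:
    """Convert a seconds-since-epoch string to a datetime.

    Hypothetical helper for illustration: returns None when the index
    provides no usable modification time (None or empty string) instead
    of letting int('') raise ValueError.
    """
    if not last_update_seconds:
        # Null or empty mtime in the index: no last update date is known.
        return None
    return datetime.fromtimestamp(int(last_update_seconds), tz=timezone.utc)
```

With a guard like this, an entry without a modification time yields no last update date instead of aborting the whole listing task.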

Related to T3874

Diff Detail

Repository
rDLS Listers
Branch
master
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 28973
Build 45293: Phabricator diff pipeline on jenkins
Build 45292: arc lint + arc unit

Event Timeline

Build is green

Patch application report for D7716 (id=27908)

Rebasing onto 378613ad82...

Current branch diff-target is up to date.
Changes applied before test
commit 58cc90ddacdb77d12a3ca13f636435ae512e119b
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Fri Apr 29 13:47:30 2022 +0200

    maven: Handle null mtime value in index for jar archive
    
    It exists cases where the modification time for a jar archive in
    a maven index is null which was leading to a processing error
    by the lister.
    
    So handle that case to avoid premature exit of the listing process.

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/515/ for more details.

This revision is now accepted and ready to land. Apr 29 2022, 1:59 PM
anlambert removed a reviewer: ardumont.

Reference task in commit message

This revision now requires review to proceed. Apr 29 2022, 1:59 PM
This revision is now accepted and ready to land. Apr 29 2022, 2:00 PM

Build is green

Patch application report for D7716 (id=27909)

Rebasing onto 378613ad82...

Current branch diff-target is up to date.
Changes applied before test
commit 0222a8f5c474910e5968f0646a5aea75c860a961
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Fri Apr 29 13:47:30 2022 +0200

    maven: Handle null mtime value in index for jar archive
    
    It exists cases where the modification time for a jar archive in
    a maven index is null which was leading to a processing error
    by the lister.
    
    So handle that case to avoid premature exit of the listing process.
    
    Related to T3874

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/516/ for more details.