maven: Remove extraction of groupId and artifactId from pom files
Closed, Public

Authored by anlambert on Apr 29 2022, 11:20 AM.

Details

Summary

When parsing pom files, we are only interested in extracting a VCS URL
(git, hg, svn) in order to create the associated loading tasks.

The groupId and artifactId are not used by the lister for that purpose,
so it is better to remove their extraction. This also prevents errors
when that information is missing from a pom file.

See, for instance, this error when listing the JBoss Maven repository:
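
As a rough illustration of the idea, the sketch below pulls only the SCM connection out of a pom and never touches groupId or artifactId. It is not the lister's actual code: it uses the standard library's ElementTree, and the function name `extract_scm`, the sample pom, and the `scm:<type>:<url>` regular expression are assumptions made for this example.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

# Namespace declared by most pom files (some older ones omit it).
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}

# Hypothetical pom with no top-level <groupId>: it would be inherited
# from <parent>, which Maven allows.
SAMPLE_POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <parent>
    <groupId>org.example</groupId>
    <artifactId>parent</artifactId>
    <version>1.0</version>
  </parent>
  <artifactId>child</artifactId>
  <scm>
    <connection>scm:git:https://example.org/repo.git</connection>
  </scm>
</project>"""


def extract_scm(pom_text: str) -> Optional[Tuple[str, str]]:
    """Return (vcs_type, url) from <scm><connection>, or None if absent."""
    root = ET.fromstring(pom_text)
    # Try the namespaced path first, then fall back to a pom without namespace.
    node = root.find("m:scm/m:connection", POM_NS)
    if node is None:
        node = root.find("scm/connection")
    if node is None or not node.text:
        return None
    # Assumed connection format for this sketch: scm:<type>:<url>
    match = re.match(r"^scm:(?P<type>[^:]+):(?P<url>.+)$", node.text.strip())
    if match is None:
        return None
    return match.group("type"), match.group("url")


print(extract_scm(SAMPLE_POM))
```

Because nothing in this path reads groupId or artifactId, a pom that omits them is handled like any other.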

swh-lister_1                        | [2022-04-29 09:02:04,598: INFO/ForkPoolWorker-1] Fetching URL https://repository.jboss.org/maven2/org/jboss/ejb3/jboss-ejb3-tutorial-enterprise_webapp/0.1.0/jboss-ejb3-tutorial-enterprise_webapp-0.1.0.pom with params {}
swh-lister_1                        | [2022-04-29 09:02:04,748: ERROR/ForkPoolWorker-1] Task swh.lister.maven.tasks.FullMavenLister[45b54b16-ed7a-4b9c-80a3-b8adb25b8fe0] raised unexpected: KeyError('groupId')
swh-lister_1                        | Traceback (most recent call last):
swh-lister_1                        |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/celery/app/trace.py", line 451, in trace_task
swh-lister_1                        |     R = retval = fun(*args, **kwargs)
swh-lister_1                        |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/scheduler/task.py", line 61, in __call__
swh-lister_1                        |     result = super().__call__(*args, **kwargs)
swh-lister_1                        |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/celery/app/trace.py", line 734, in __protected_call__
swh-lister_1                        |     return self.run(*args, **kwargs)
swh-lister_1                        |   File "/src/swh-lister/swh/lister/maven/tasks.py", line 16, in list_maven_full
swh-lister_1                        |     return lister.run().dict()
swh-lister_1                        |   File "/src/swh-lister/swh/lister/pattern.py", line 127, in run
swh-lister_1                        |     for page in self.get_pages():
swh-lister_1                        |   File "/src/swh-lister/swh/lister/maven/lister.py", line 256, in get_pages
swh-lister_1                        |     gid = project_d["groupId"]
swh-lister_1                        | KeyError: 'groupId'
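
The failure in the traceback can be reproduced in isolation. In the sketch below, `project_d` is a hypothetical dictionary standing in for a parsed `<project>` element that has no top-level groupId (one possible reason is that the value is inherited from `<parent>`); it is an assumption for illustration, not the content of the JBoss pom above.

```python
# Hypothetical parsed pom: groupId appears only under "parent".
project_d = {
    "parent": {"groupId": "org.example", "artifactId": "parent"},
    "artifactId": "child",
    "scm": {"connection": "scm:git:https://example.org/repo.git"},
}

# Direct indexing, as on the failing line of the traceback, raises KeyError.
try:
    gid = project_d["groupId"]
    error = None
except KeyError as exc:
    error = repr(exc)

# The SCM connection is still available without reading groupId at all.
connection = project_d.get("scm", {}).get("connection")

print(error)
print(connection)
```

The lookup that fails is unrelated to the value the lister needs, which is why dropping it removes the error without losing information.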

Diff Detail

Repository
rDLS Listers
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

ardumont added a subscriber: ardumont.
ardumont added inline comments.
swh/lister/maven/lister.py
295

right ^!

This revision is now accepted and ready to land. Apr 29 2022, 11:23 AM

Build is green

Patch application report for D7715 (id=27905)

Rebasing onto 22bcd9deb2...

Current branch diff-target is up to date.
Changes applied before test
commit 378613ad82fc00b6585d00afd1c814f3f7c5ccb6
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Fri Apr 29 11:14:54 2022 +0200

    maven: Remove extraction of groupId and artifactId from pom files
    
    When parsing pom files, we are only interested in extracting a VCS URL
    (git, hg, svn) in order to create the associated loading tasks.
    
    The groupId and artifactId are not used by the lister for that purpose,
    so it is better to remove their extraction. This also prevents errors
    when that information is missing from a pom file.

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/514/ for more details.