Differential D7715

maven: Remove extraction of groupId and artifactId from pom files
Closed, Public

Authored by anlambert on Apr 29 2022, 11:20 AM.

Details

Reviewers

ardumont

Group Reviewers

Reviewers

Commits

rDLS378613ad82fc: maven: Remove extraction of groupId and artifactId from pom files

Summary

When parsing pom files, we are only interested in extracting a VCS URL
(git, hg, svn) in order to create the associated loading tasks.

The groupId and artifactId are not used by the lister, so it is better
to remove their extraction. This also prevents errors when that
information is missing from pom files.

See for instance this error when listing the JBoss Maven repository:

swh-lister_1                        | [2022-04-29 09:02:04,598: INFO/ForkPoolWorker-1] Fetching URL https://repository.jboss.org/maven2/org/jboss/ejb3/jboss-ejb3-tutorial-enterprise_webapp/0.1.0/jboss-ejb3-tutorial-enterprise_webapp-0.1.0.pom with params {}
swh-lister_1                        | [2022-04-29 09:02:04,748: ERROR/ForkPoolWorker-1] Task swh.lister.maven.tasks.FullMavenLister[45b54b16-ed7a-4b9c-80a3-b8adb25b8fe0] raised unexpected: KeyError('groupId')
swh-lister_1                        | Traceback (most recent call last):
swh-lister_1                        |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/celery/app/trace.py", line 451, in trace_task
swh-lister_1                        |     R = retval = fun(*args, **kwargs)
swh-lister_1                        |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/scheduler/task.py", line 61, in __call__
swh-lister_1                        |     result = super().__call__(*args, **kwargs)
swh-lister_1                        |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/celery/app/trace.py", line 734, in __protected_call__
swh-lister_1                        |     return self.run(*args, **kwargs)
swh-lister_1                        |   File "/src/swh-lister/swh/lister/maven/tasks.py", line 16, in list_maven_full
swh-lister_1                        |     return lister.run().dict()
swh-lister_1                        |   File "/src/swh-lister/swh/lister/pattern.py", line 127, in run
swh-lister_1                        |     for page in self.get_pages():
swh-lister_1                        |   File "/src/swh-lister/swh/lister/maven/lister.py", line 256, in get_pages
swh-lister_1                        |     gid = project_d["groupId"]
swh-lister_1                        | KeyError: 'groupId'
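
For illustration only, here is a minimal sketch of the idea behind the change: read the VCS URL from a pom file without ever touching groupId or artifactId, so a pom that lacks them cannot raise a KeyError. The names `extract_scm_url` and `POM_NS` are hypothetical, and the sketch uses the standard library's `xml.etree.ElementTree`, which is an assumption and not necessarily what the lister itself uses.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Namespace used by Maven 4.0.0 pom files; some old poms declare none.
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def extract_scm_url(pom_text: str) -> Optional[str]:
    """Return the raw <scm><connection> value of a pom, or None if absent.

    Hypothetical helper: it never reads groupId or artifactId, so their
    absence has no effect on the result.
    """
    root = ET.fromstring(pom_text)
    # Try the namespaced layout first, then the namespace-less one.
    for path in ("m:scm/m:connection", "scm/connection"):
        node = root.find(path, POM_NS)
        if node is not None and node.text:
            return node.text.strip()
    return None
```

A pom without a groupId but with an scm section still yields its connection string, and a pom without any scm section yields None instead of an exception.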

Diff Detail

Repository

rDLS Listers

Lint

Automatic diff as part of commit; lint not applicable.

Unit

Automatic diff as part of commit; unit tests not applicable.

Event Timeline

anlambert created this revision. Apr 29 2022, 11:20 AM

Herald added a reviewer: Reviewers. Apr 29 2022, 11:20 AM

ardumont published this revision for review. Apr 29 2022, 11:23 AM

ardumont added a subscriber: ardumont.

ardumont added inline comments.

swh/lister/maven/lister.py
295	right ^!

ardumont accepted this revision. Apr 29 2022, 11:23 AM

This revision is now accepted and ready to land. Apr 29 2022, 11:23 AM

Build is green

Patch application report for D7715 (id=27905)

Rebasing onto 22bcd9deb2...

Current branch diff-target is up to date.

Changes applied before test

commit 378613ad82fc00b6585d00afd1c814f3f7c5ccb6
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Fri Apr 29 11:14:54 2022 +0200

    maven: Remove extraction of groupId and artifactId from pom files
    
    When parsing pom files, we are only interested to extract a VCS URL
    (git, hg, svn) in order to create associated loading tasks.
    
    In that case, the groupId and artifactId are not used by the lister
    so better removing their extraction, plus it will prevent errors when
    those info are missing in pom files.

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/514/ for more details.

Harbormaster completed remote builds in B28970: Diff 27905. Apr 29 2022, 11:25 AM

Closed by commit rDLS378613ad82fc: maven: Remove extraction of groupId and artifactId from pom files (authored by anlambert). Apr 29 2022, 11:28 AM

This revision was automatically updated to reflect the committed changes.

anlambert added a commit: rDLS378613ad82fc: maven: Remove extraction of groupId and artifactId from pom files.