
utils: Fix some author parsing errors
Closed, Public

Authored by anlambert on May 20 2019, 4:56 PM.

Details

Summary

Digging into the production logs of the npm loader, the following errors are reported:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/swh/loader/core/loader.py", line 895, in load
    more_data_to_fetch = self.fetch_data()
  File "/usr/lib/python3/dist-packages/swh/loader/npm/loader.py", line 203, in fetch_data
    data = next(self.new_versions)
  File "/usr/lib/python3/dist-packages/swh/loader/npm/client.py", line 149, in prepare_package_versions
    version_data)
  File "/usr/lib/python3/dist-packages/swh/loader/npm/client.py", line 210, in _prepare_package_version
    author = extract_npm_package_author(package_json)
  File "/usr/lib/python3/dist-packages/swh/loader/npm/utils.py", line 95, in extract_npm_package_author
    author_data = parse_npm_package_author(author_str)
  File "/usr/lib/python3/dist-packages/swh/loader/npm/utils.py", line 52, in parse_npm_package_author
    author_str.replace('<>', '').replace('()', ''),
AttributeError: 'list' object has no attribute 'replace'
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/swh/loader/core/loader.py", line 895, in load
    more_data_to_fetch = self.fetch_data()
  File "/usr/lib/python3/dist-packages/swh/loader/npm/loader.py", line 203, in fetch_data
    data = next(self.new_versions)
  File "/usr/lib/python3/dist-packages/swh/loader/npm/client.py", line 149, in prepare_package_versions
    version_data)
  File "/usr/lib/python3/dist-packages/swh/loader/npm/client.py", line 210, in _prepare_package_version
    author = extract_npm_package_author(package_json)
  File "/usr/lib/python3/dist-packages/swh/loader/npm/utils.py", line 97, in extract_npm_package_author
    author_str = _author_str(package_json['authors'][0])
KeyError: 0

The first error occurs when the author field in a package.json file is a list instead of a string,
while the second occurs when the author field is a dict instead of a string.
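
To make the two failure modes concrete, here is a minimal sketch of one way to normalize the author field before parsing. It is not the code from this revision: `normalize_author_field` and its inner helper are hypothetical names, and the assumption that a dict carries `name`, `email` and `url` keys follows the usual package.json convention rather than anything shown in the logs above.

```python
def normalize_author_field(package_json):
    """Return the author entry as a plain string, or None if absent.

    Hypothetical helper for illustration only, not the loader's actual code.
    """
    def _as_str(author):
        # Dict layout, assumed to look like {"name": ..., "email": ..., "url": ...}
        if type(author) is dict:
            parts = []
            if author.get('name'):
                parts.append(author['name'])
            if author.get('email'):
                parts.append('<%s>' % author['email'])
            if author.get('url'):
                parts.append('(%s)' % author['url'])
            return ' '.join(parts)
        # List layout: keep the first entry only, guarding against empty lists
        if type(author) is list:
            return _as_str(author[0]) if author else ''
        # String layout is returned unchanged; anything else becomes ''
        return author if type(author) is str else ''

    # Check both spellings of the field seen in the tracebacks
    for key in ('author', 'authors'):
        if key in package_json:
            return _as_str(package_json[key])
    return None
```

With this shape, a list-valued `author` no longer reaches `str.replace`, and a dict-valued `authors` is never indexed with `[0]`.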

Diff Detail

Repository
rDLDNPM npm loader
Branch
more-author-parsing-fixes
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 5831
Build 7987: tox-on-jenkins (Jenkins)
Build 7986: arc lint + arc unit

Event Timeline

anlambert created this revision. May 20 2019, 4:56 PM
anlambert updated this revision to Diff 4894. May 20 2019, 5:19 PM

Update: Handle more author layouts

anlambert retitled this revision from "utils: Fix parsing when author field is a list in package.json" to "utils: Fix some author parsing errors". May 20 2019, 5:21 PM
anlambert edited the summary of this revision.
anlambert updated this revision to Diff 4895. May 20 2019, 5:24 PM

Update commit message

ardumont accepted this revision. May 21 2019, 9:59 AM
ardumont added a subscriber: ardumont.
ardumont added inline comments.
swh/loader/npm/utils.py
89

I see that you are consistent with the other conditional.
Still, out of curiosity, how does this differ from isinstance(author, list)?

I checked [1] to get a feel for it ;)

[1] https://stackoverflow.com/questions/1549801/what-are-the-differences-between-type-and-isinstance

This revision is now accepted and ready to land. May 21 2019, 9:59 AM
anlambert added inline comments. May 21 2019, 10:49 AM
swh/loader/npm/utils.py
89

I could indeed have used isinstance, but since no type inheritance is involved here, only checks against base Python types, simply using type does the job as well.
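
For readers following the exchange, a small sketch of where the two checks diverge. The variable names are illustrative only; the point is that values decoded by the standard `json` module are plain `dict`, `list` and `str` objects, so both checks agree on them, and they differ only for subclasses such as `OrderedDict`.

```python
import json
from collections import OrderedDict

# Values decoded from JSON are plain built-in types, so both checks agree.
decoded = json.loads('{"author": ["Jane Doe"]}')
author = decoded['author']
exact_check = type(author) is list
instance_check = isinstance(author, list)

# A subclass is where they differ: OrderedDict inherits from dict.
ordered = OrderedDict(name='Jane Doe')
ordered_exact = type(ordered) is dict           # exact type match only
ordered_instance = isinstance(ordered, dict)    # also accepts subclasses

print(exact_check, instance_check)        # True True
print(ordered_exact, ordered_instance)    # False True
```

If the loader ever received author data built from dict or list subclasses, isinstance would be the more permissive choice.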

This revision was automatically updated to reflect the committed changes.