
utils: Fix some author parsing errors
Closed, Public

Authored by anlambert on May 20 2019, 4:56 PM.

Details

Summary

The production logs of the npm loader report the following errors:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/swh/loader/core/loader.py", line 895, in load
    more_data_to_fetch = self.fetch_data()
  File "/usr/lib/python3/dist-packages/swh/loader/npm/loader.py", line 203, in fetch_data
    data = next(self.new_versions)
  File "/usr/lib/python3/dist-packages/swh/loader/npm/client.py", line 149, in prepare_package_versions
    version_data)
  File "/usr/lib/python3/dist-packages/swh/loader/npm/client.py", line 210, in _prepare_package_version
    author = extract_npm_package_author(package_json)
  File "/usr/lib/python3/dist-packages/swh/loader/npm/utils.py", line 95, in extract_npm_package_author
    author_data = parse_npm_package_author(author_str)
  File "/usr/lib/python3/dist-packages/swh/loader/npm/utils.py", line 52, in parse_npm_package_author
    author_str.replace('<>', '').replace('()', ''),
AttributeError: 'list' object has no attribute 'replace'

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/swh/loader/core/loader.py", line 895, in load
    more_data_to_fetch = self.fetch_data()
  File "/usr/lib/python3/dist-packages/swh/loader/npm/loader.py", line 203, in fetch_data
    data = next(self.new_versions)
  File "/usr/lib/python3/dist-packages/swh/loader/npm/client.py", line 149, in prepare_package_versions
    version_data)
  File "/usr/lib/python3/dist-packages/swh/loader/npm/client.py", line 210, in _prepare_package_version
    author = extract_npm_package_author(package_json)
  File "/usr/lib/python3/dist-packages/swh/loader/npm/utils.py", line 97, in extract_npm_package_author
    author_str = _author_str(package_json['authors'][0])
KeyError: 0

The first error occurs when the author field in a package.json file is a list instead of a string,
while the second occurs when the author field is a dict instead of a string.
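
Both failures can be avoided by normalizing the author value before any string method or index is applied to it. The following is a minimal sketch, not the patch from this revision: `author_to_str` is a hypothetical helper name, and joining the `name` and `email` keys of the dict form is an assumption based on the usual package.json author object layout.

```python
def author_to_str(author):
    """Hypothetical helper: reduce an npm author value to a single string.

    Accepts the three shapes seen in the errors above (str, list, dict)
    and returns an empty string for anything it cannot interpret.
    """
    if isinstance(author, str):
        return author
    if isinstance(author, list):
        # Assumption: when several authors are listed, keep the first one.
        return author_to_str(author[0]) if author else ""
    if isinstance(author, dict):
        # Assumption: object form such as {"name": ..., "email": ...}.
        parts = []
        name = author.get("name")
        email = author.get("email")
        if isinstance(name, str) and name:
            parts.append(name)
        if isinstance(email, str) and email:
            parts.append("<%s>" % email)
        return " ".join(parts)
    return ""


# None of these calls raises AttributeError or KeyError.
as_string = author_to_str("Jane Doe <jane@example.org>")
as_list = author_to_str([{"name": "Jane Doe"}, "John Doe"])
as_dict = author_to_str({"name": "Jane Doe", "email": "jane@example.org"})
```

Returning an empty string for unknown shapes is a design choice of this sketch, so that one malformed package.json does not abort the whole load.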

Diff Detail

Repository: rDLDNPM npm loader
Branch: more-author-parsing-fixes
Lint: No Linters Available
Unit: No Unit Test Coverage
Build Status: Buildable 5831
Build 7987: tox-on-jenkins (Jenkins)
Build 7986: arc lint + arc unit

Event Timeline

Update: Handle more author layouts

anlambert retitled this revision from "utils: Fix parsing when author field is a list in package.json" to "utils: Fix some author parsing errors". May 20 2019, 5:21 PM
anlambert edited the summary of this revision.
ardumont added a subscriber: ardumont.
ardumont added inline comments.
swh/loader/npm/utils.py:88

I see that you are consistent with the other conditional.
Still, out of curiosity, what's the difference with isinstance(author, list)?

I checked [1] to get a feel for it ;)

[1] https://stackoverflow.com/questions/1549801/what-are-the-differences-between-type-and-isinstance

This revision is now accepted and ready to land. May 21 2019, 9:59 AM
swh/loader/npm/utils.py:88

I could indeed have used isinstance, but since no type inheritance is involved here, only checks against base Python types, simply using type also does the job.
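
To make the difference discussed above concrete, here is a small sketch. `AuthorList` is a hypothetical subclass introduced only for illustration, and the second half assumes the package.json data is decoded with the standard `json` module.

```python
import json


class AuthorList(list):
    """Hypothetical list subclass, used only to show where the checks differ."""


authors = AuthorList(["Jane Doe"])

# type() compares the exact class, so a subclass instance does not match.
exact_match = type(authors) is list

# isinstance() also accepts instances of subclasses.
subclass_match = isinstance(authors, list)

# json.loads builds plain list, dict and str objects, so for decoded
# JSON data both checks give the same answer.
decoded = json.loads('["Jane Doe"]')
decoded_exact = type(decoded) is list
decoded_isinstance = isinstance(decoded, list)
```

Here `exact_match` is False while `subclass_match` is True. Both checks on `decoded` are True, which is why either form works for values that only ever come from decoded JSON.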

This revision was automatically updated to reflect the committed changes.