
extract_npm_package_author: Handle list of dict authors layout

Authored by anlambert on Apr 11 2019, 2:50 PM.



Some package.json files may contain an authors field consisting of
a list of dicts. Handle that case to avoid errors such as:

[2019-04-11 12:03:21,650: ERROR/ForkPoolWorker-19] Loading failure, updating to `partial` status
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/swh/loader/core/", line 893, in load
    more_data_to_fetch = self.fetch_data()
  File "/usr/lib/python3/dist-packages/swh/loader/npm/", line 203, in fetch_data
    data = next(self.new_versions)
  File "/usr/lib/python3/dist-packages/swh/loader/npm/", line 145, in prepare_package_versions
  File "/usr/lib/python3/dist-packages/swh/loader/npm/", line 200, in _prepare_package_version
    author = extract_npm_package_author(package_json)
  File "/usr/lib/python3/dist-packages/swh/loader/npm/", line 92, in extract_npm_package_author
    author_data = parse_npm_package_author(package_json['authors'][0])
  File "/usr/lib/python3/dist-packages/swh/loader/npm/", line 52, in parse_npm_package_author
    author_str.replace('<>', '').replace('()', ''),
AttributeError: 'dict' object has no attribute 'replace'

Related T1644
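
The failure above comes from passing a dict to code that expects an author string. A minimal sketch of one way to accept both layouts follows. It is not the committed change: the names `normalize_author`, `first_author` and `_AUTHOR_RE` are hypothetical, the returned dict shape is an assumption, and it assumes author strings follow the npm `Name <email> (url)` convention.

```python
import re

# Hypothetical pattern for the npm "Name <email> (url)" author string form;
# the email and url parts are both optional.
_AUTHOR_RE = re.compile(
    r'^\s*(?P<name>[^<(]*?)\s*'
    r'(?:<(?P<email>[^>]*)>)?\s*'
    r'(?:\((?P<url>[^)]*)\))?\s*$'
)


def normalize_author(entry):
    """Return a dict with 'name', 'email' and 'url' keys, or None.

    Accepts either a dict (used as-is, missing keys become None)
    or a string in the "Name <email> (url)" form.
    """
    if isinstance(entry, dict):
        return {key: entry.get(key) for key in ('name', 'email', 'url')}
    if isinstance(entry, str):
        match = _AUTHOR_RE.match(entry)
        if match is None:
            # Unparseable string: keep it whole as the name.
            return {'name': entry.strip() or None, 'email': None, 'url': None}
        return {key: (match.group(key) or None)
                for key in ('name', 'email', 'url')}
    return None


def first_author(package_json):
    """Pick the 'author' field, else the first entry of 'authors'."""
    author = package_json.get('author')
    if author is None:
        authors = package_json.get('authors')
        if isinstance(authors, list):
            author = authors[0] if authors else None
        else:
            author = authors
    return normalize_author(author)


# Both layouts now yield the same shape instead of raising AttributeError.
print(first_author({'authors': [{'name': 'Jane Doe'}]}))
print(first_author({'authors': ['Jane Doe <jane@example.org>']}))
```

Checking the type before calling string methods keeps the dict layout from reaching `str.replace`. Like the change under review, this sketch keeps only the first entry of the list.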

Diff Detail

rDLDNPM npm loader
Automatic diff as part of commit; lint not applicable.
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

olasd added a subscriber: olasd.

This change looks sensible in the context of what's already here, but I'm not sure that favoring the first author over the other authors is the best choice.

I don't remember how we've solved that issue in the case of the deposit, for instance.

Could we turn that question into an issue so we can make a consistent decision across our loaders?

This revision is now accepted and ready to land. Apr 11 2019, 4:11 PM

The deposit loader uses the tar loader under the hood, which sets the author of each produced revision to Software Heritage <>[1].
The real information about authors can be found in the revision metadata, see [2] as an example.

For npm, a dump of the package.json file is also available in each produced revision metadata, including the full authors list.
I agree that how to handle the multiple-authors case should be discussed; I have created T1645 on the subject.


This revision was automatically updated to reflect the committed changes.