Page MenuHomeSoftware Heritage

Add classmethod Person.from_address, to parse from 'name <email>' strings.
ClosedPublic

Authored by vlorentz on Mar 2 2020, 3:58 PM.

Details

Summary

This will allow deduplicating code across loaders.

Diff Detail

Repository
rDMOD Data model
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

olasd requested changes to this revision.Mar 2 2020, 5:08 PM
olasd added a subscriber: olasd.
olasd added inline comments.
swh/model/model.py
91

I guess it's really a fullname rather than an address.

Please add some documentation. Notably, please mention that

  • name and email are only for advisory / presentation purposes and that the actual data is in the fullname
  • The parsing is very lenient and lossy: for instance, all data after a closing bracket is dropped.
98–99

How many users actually need this?

I'm a bit worried about the str/bytes dual support bubbling up to callers, and I'd really prefer for the str operation to be an explicit request, that is, either having a from_fullname_str function, or having the callers call .encode() themselves.

109–114

At that point, as we never guarantee that name/email and fullname can round-trip, we should probably just strip() the contents of the full name and the email. Of course that's introducing a subtle change of behavior everywhere, which isn't /great/.

117

Probably should use rindex (i.e. find the last closing bracket).

This revision now requires changes to proceed.Mar 2 2020, 5:08 PM

I think you should simply reuse the parse_author function that you removed in D2743.

It handles a lot of author strings format and is properly tested.

Its purpose was to handle all the weird things that you can find in package.json files in the wild.

swh/model/model.py
98–99

I copy-pasted this from the git loader, which probably doesn't need it.

I think you should simply reuse the parse_author function that you removed in D2743.

It handles a lot of author strings format and is properly tested.

Its purpose was to handle all the weird things that you can find in package.json files in the wild.

The goal is to phase out name and email from the model/storage and push it to frontends, and this is the simplest way to preserve compatibility with non-NPM sources in case there are parentheses in the name.

This revision was not accepted when it landed; it landed in state Needs Review.Mar 4 2020, 3:42 PM
This revision was automatically updated to reflect the committed changes.

Oops... I didn't mean to push this