Add Python PKG-INFO mapping.
Closed, Public

Authored by vlorentz on Dec 21 2018, 2:45 PM.

Details

Reviewers
olasd
Group Reviewers
Reviewers
Maniphest Tasks
T1327: Add Python metadata indexer
Summary

Also update crosswalk.csv from CodeMeta, so it includes changes from
https://github.com/codemeta/codemeta/pull/203

Diff Detail

Repository
rDCIDX Object indexer
Branch
python-pkginfo
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 3249
Build 4176: tox-on-jenkins (Jenkins)
Build 4175: arc lint + arc unit

Event Timeline

vlorentz created this revision. Dec 21 2018, 2:45 PM
olasd accepted this revision. Dec 21 2018, 3:25 PM
olasd added a subscriber: olasd.

It's too bad the classifiers don't get mapped to anything useful!

Looks like we could import more metadata than what's currently there (e.g. license info), is it expected that it doesn't appear in the test data?

swh/indexer/metadata_dictionary.py, lines 365–373

I'm surprised flake8 is happy about those +es

This revision is now accepted and ready to land. Dec 21 2018, 3:25 PM

> It's too bad the classifiers don't get mapped to anything useful!

Yes. I'll probably add it in the future; I just need a nice way to express it in CodeMeta's crosswalk table, which is designed to be human-readable (including by non-programmers) but which we use as machine input anyway.

> Looks like we could import more metadata than what's currently there (e.g. license info), is it expected that it doesn't appear in the test data?

I copy-pasted the PKG-INFO from PyPI. We don't have license info because that field is missing from swh.core's setup.py.
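
To illustrate the point about using the crosswalk table as machine input, here is a minimal sketch of how one column of such a CSV could be turned into a lookup table. The sample rows, the column name `Python PKG-INFO`, the `load_mapping` helper, and the `/` separator are assumptions made for illustration; they are not taken from the actual crosswalk.csv or from swh.indexer.

```python
import csv
import io

# Hypothetical excerpt: the real crosswalk.csv has many more columns and rows,
# and its exact headers and cell contents may differ.
SAMPLE_CROSSWALK = """\
Property,Description,Python PKG-INFO
name,The name of the item,name
version,Version of the software,version
url,URL of the item,home-page
license,A license document,license
programmingLanguage,The computer programming language,
"""


def load_mapping(csv_text, column):
    """Return {source_field: codemeta_property} for one crosswalk column."""
    mapping = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        source_fields = row.get(column) or ""
        # Assumption: a cell may list several source fields separated by "/".
        for field in source_fields.split("/"):
            field = field.strip()
            if field:
                # Lowercase the key so lookups can be case-insensitive.
                mapping[field.lower()] = row["Property"]
    return mapping


mapping = load_mapping(SAMPLE_CROSSWALK, "Python PKG-INFO")
```

With this sample, `mapping["home-page"]` is `"url"`, and rows whose cell in the chosen column is empty are skipped. A table kept readable for humans has no obvious place for rules such as "derive a value from a classifier string", which is the difficulty mentioned above.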

olasd added a comment. Dec 21 2018, 3:34 PM

> > Looks like we could import more metadata than what's currently there (e.g. license info), is it expected that it doesn't appear in the test data?

> I copy-pasted the PKG-INFO from PyPI. We don't have license info because that field is missing from swh.core's setup.py.

I meant that the (UNKNOWN) info didn't get mapped in the metadata that we're importing in swh.

Or maybe that's just because the initial data is UNKNOWN? If that's the case it'd be nicer to use a more complete PKG-INFO file for our tests.

vlorentz updated this revision to Diff 2794. Dec 21 2018, 4:03 PM
  • Add test for license/small file + fix license normalization.
In D879#18793, @olasd wrote:

> I meant that the (UNKNOWN) info didn't get mapped in the metadata that we're importing in swh.
>
> Or maybe that's just because the initial data is UNKNOWN?

Yes, it is. As I said, I copy-pasted from PyPI.

> If that's the case it'd be nicer to use a more complete PKG-INFO file for our tests.

Indeed, done. Good catch, there was a bug in the handling of licenses :)

olasd accepted this revision. Dec 21 2018, 4:06 PM
In D879#18793, @olasd wrote:

> > I meant that the (UNKNOWN) info didn't get mapped in the metadata that we're importing in swh.
> >
> > Or maybe that's just because the initial data is UNKNOWN?

> Yes it is.

Duh, I missed the `if value != 'UNKNOWN'` line.

> > If that's the case it'd be nicer to use a more complete PKG-INFO file for our tests.

> Indeed, done. Good catch, there was a bug in the handling of licenses :)

Well, that stub is hardly a "more complete" PKG-INFO file, but better and more focused coverage is good nonetheless.
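
For readers following the `UNKNOWN` discussion, here is a minimal sketch of the kind of check being referred to, applied to a small stub file. It is not the code from this revision: the stub contents, the `FIELD_MAP` table, and the `translate` helper are hypothetical, and the only library used is the standard `email.parser`, which can read PKG-INFO's header-style format.

```python
from email.parser import HeaderParser

# Hypothetical stub, not the test data added in this revision.
PKG_INFO = """\
Metadata-Version: 1.1
Name: example-package
Version: 0.0.1
Summary: UNKNOWN
Home-page: https://example.org/
License: MIT
"""

# Illustrative mapping from PKG-INFO headers to CodeMeta-style keys (assumed).
FIELD_MAP = {
    "Name": "name",
    "Version": "version",
    "Summary": "description",
    "Home-page": "url",
    "License": "license",
}


def translate(pkg_info_text):
    """Map known PKG-INFO headers to target keys, skipping placeholder values."""
    headers = HeaderParser().parsestr(pkg_info_text)
    result = {}
    for field, target in FIELD_MAP.items():
        value = headers.get(field)  # header lookup is case-insensitive
        # A missing field may be written as the literal string UNKNOWN;
        # treat that the same as an absent header.
        if value is not None and value != "UNKNOWN":
            result[target] = value
    return result


metadata = translate(PKG_INFO)
```

In this sketch the `Summary: UNKNOWN` line produces no `description` entry, while `License: MIT` is carried over. That is why a PKG-INFO file whose license is `UNKNOWN` cannot exercise the license handling, and why a stub with a real license value gives more focused coverage.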

vlorentz updated this revision to Diff 2800. Jan 7 2019, 11:13 AM
  • rebase