Page MenuHomeSoftware Heritage

Adds NuGet Mappings
Needs ReviewPublic

Authored by VickyMerzOwn on Wed, Jul 20, 7:24 AM.

Details

Summary

Fixes T4392

Diff Detail

Repository
rDCIDX Metadata indexer
Branch
master
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 30590
Build 47826: Phabricator diff pipeline on jenkinsJenkins console · Jenkins
Build 47825: arc lint + arc unit

Unit TestsFailed

TimeTest
967 msJenkins > .tox.py3.lib.python3.7.site-packages.swh.indexer.tests.storage.test_api_client.TestIndexerStorageContentMetadata::test_add_deadlock
self = <contextlib.ExitStack object at 0x7fc33276b9b0> exc_details = (<class 'ValueError'>, ValueError("<Token var=<ContextVar name='flask.request_ctx' at 0x7fc345038eb8> at 0x7fc3320c2360> was created in a different Context"), <traceback object at 0x7fc332128f88>) received_exc = False
109 msJenkins > .tox.py3.lib.python3.7.site-packages.swh.indexer.tests.storage.test_api_client.TestIndexerStorageContentMetadata::test_add_deadlock
def teardown(): > ctx.pop()
902 msJenkins > .tox.py3.lib.python3.7.site-packages.swh.indexer.tests.storage.test_api_client.TestIndexerStorageContentMimetypes::test_add_deadlock
self = <contextlib.ExitStack object at 0x7fc33271d2e8> exc_details = (<class 'ValueError'>, ValueError("<Token var=<ContextVar name='flask.request_ctx' at 0x7fc345038eb8> at 0x7fc3324b5d38> was created in a different Context"), <traceback object at 0x7fc3324b8348>) received_exc = False
209 msJenkins > .tox.py3.lib.python3.7.site-packages.swh.indexer.tests.storage.test_api_client.TestIndexerStorageContentMimetypes::test_add_deadlock
def teardown(): > ctx.pop()
1,000 msJenkins > .tox.py3.lib.python3.7.site-packages.swh.indexer.tests.storage.test_api_client.TestIndexerStorageDirectoryIntrinsicMetadata::test_add_deadlock
self = <contextlib.ExitStack object at 0x7fc332770cf8> exc_details = (<class 'ValueError'>, ValueError("<Token var=<ContextVar name='flask.request_ctx' at 0x7fc345038eb8> at 0x7fc33219ad38> was created in a different Context"), <traceback object at 0x7fc3318c0f48>) received_exc = False
View Full Test Results (10 Failed · 326 Passed · 9 Skipped)

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Build is green

Patch application report for D8144 (id=29395)

Rebasing onto fa67b73d6a...

Current branch diff-target is up to date.
Changes applied before test
commit 158772a114e68f1d80f2a848d3ef16ee9f73f54b
Author: Satvik Vemuganti <vemugantisesha@iitbhilai.ac.in>
Date:   Wed Jul 20 10:52:34 2022 +0530

    Adds nuget.csv

See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/362/ for more details.

Adds test for NuGet mapping expected working

Build has FAILED

Patch application report for D8144 (id=29412)

Rebasing onto fa67b73d6a...

Current branch diff-target is up to date.
Changes applied before test
commit bce8921ca7a98fc1705cfc18a30d3c22aeed2b76
Author: Satvik Vemuganti <vemugantisesha@iitbhilai.ac.in>
Date:   Wed Jul 20 18:22:19 2022 +0530

    Adds test_nuget.py

commit 158772a114e68f1d80f2a848d3ef16ee9f73f54b
Author: Satvik Vemuganti <vemugantisesha@iitbhilai.ac.in>
Date:   Wed Jul 20 10:52:34 2022 +0530

    Adds nuget.csv

Link to build: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/363/
See console output for more information: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/363/console

Adds class for XmlMappings and child classes:

  • MavenMapping
  • NuGetMapping

Build has FAILED

Patch application report for D8144 (id=29417)

Rebasing onto fa67b73d6a...

Current branch diff-target is up to date.
Changes applied before test
commit 2f5145794412b82d857e12a78764ae0dd97e3a61
Author: Satvik Vemuganti <vemugantisesha@iitbhilai.ac.in>
Date:   Thu Jul 21 02:03:08 2022 +0530

    Create class for XmlMappings

commit bce8921ca7a98fc1705cfc18a30d3c22aeed2b76
Author: Satvik Vemuganti <vemugantisesha@iitbhilai.ac.in>
Date:   Wed Jul 20 18:22:19 2022 +0530

    Adds test_nuget.py

commit 158772a114e68f1d80f2a848d3ef16ee9f73f54b
Author: Satvik Vemuganti <vemugantisesha@iitbhilai.ac.in>
Date:   Wed Jul 20 10:52:34 2022 +0530

    Adds nuget.csv

Link to build: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/366/
See console output for more information: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/366/console

Working indexer for NuGet packages

Build is green

Patch application report for D8144 (id=29465)

Rebasing onto 230e89d8ee...

First, rewinding head to replay your work on top of it...
Applying: metadata_dictionary: Add mappings for .nuspec
Changes applied before test
commit 064731d674107f80e49527f739748dac487be75b
Author: Satvik Vemuganti <vemugantisesha@iitbhilai.ac.in>
Date:   Wed Jul 20 10:52:34 2022 +0530

    metadata_dictionary: Add mappings for .nuspec

See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/384/ for more details.

Restores maven.py to state prior to diff

Build is green

Patch application report for D8144 (id=29466)

Rebasing onto 230e89d8ee...

First, rewinding head to replay your work on top of it...
Applying: metadata_dictionary: Add mappings for .nuspec
Changes applied before test
commit 410b86c2a8224de6d9099dc4cd31c725afb71240
Author: Satvik Vemuganti <vemugantisesha@iitbhilai.ac.in>
Date:   Wed Jul 20 10:52:34 2022 +0530

    metadata_dictionary: Add mappings for .nuspec

See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/385/ for more details.

normalize_license instead of translate_license

Build is green

Patch application report for D8144 (id=29467)

Rebasing onto 230e89d8ee...

First, rewinding head to replay your work on top of it...
Applying: metadata_dictionary: Add mappings for .nuspec
Applying: reset maven.py
Changes applied before test
commit 096896d5c95721e52826d4cf94d292d22c85ed08
Author: Satvik Vemuganti <vemugantisesha@iitbhilai.ac.in>
Date:   Sat Jul 23 03:06:32 2022 +0530

    reset maven.py

commit 42d8891997296c58322966861e817075798e8a75
Author: Satvik Vemuganti <vemugantisesha@iitbhilai.ac.in>
Date:   Wed Jul 20 10:52:34 2022 +0530

    metadata_dictionary: Add mappings for .nuspec

See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/386/ for more details.

Could you add support for http://schema.org/inLanguage?

swh/indexer/metadata_dictionary/nuget.py
40–42

please add a test for that

55

Can @url be absent?

60–61

can #text be absent?

65–70

you can use a list comprehension to rewrite this as a single line

This revision now requires changes to proceed.Fri, Jul 22, 11:49 PM

Please read the documentation, this is explicitly given as a valid example:

<license type="expression">BSD-2-Clause OR MIT</license>

but would not result in a valid license URI.

And I see the following fields that are missing from the mapping (in addition to language) :

  • licenseUrl (even though it is deprecated)
  • summary
  • releaseNotes
  • copyright
  • tags
swh/indexer/data/nuget.csv
68

according to https://docs.microsoft.com/en-us/nuget/reference/nuspec , it is (or can be) a relative path. Please validate it.

VickyMerzOwn marked an inline comment as done.

Applies list comprehension and checks for empty repository

swh/indexer/metadata_dictionary/nuget.py
40–42

I'm not sure how I can test for this. Thinking about it

55

Yes. I'll change that so the translation will happen only if @url is present.

60–61

No. if v.get("@type") == "expression" then #text is always present.

Build is green

Patch application report for D8144 (id=29472)

Rebasing onto 230e89d8ee...

First, rewinding head to replay your work on top of it...
Applying: metadata_dictionary: Add mappings for .nuspec
Applying: reset maven.py
Applying: list comprehension
Changes applied before test
commit c54e910a3ffdbe002e8dd91d22defe1fa5dbf19d
Author: Satvik Vemuganti <vemugantisesha@iitbhilai.ac.in>
Date:   Mon Jul 25 16:37:01 2022 +0530

    list comprehension

commit 53c3e3157a6947b8c098b45985d5b49482215db7
Author: Satvik Vemuganti <vemugantisesha@iitbhilai.ac.in>
Date:   Sat Jul 23 03:06:32 2022 +0530

    reset maven.py

commit ab3318960e24aa3de4f9c0455ff8b07c80e64a2a
Author: Satvik Vemuganti <vemugantisesha@iitbhilai.ac.in>
Date:   Wed Jul 20 10:52:34 2022 +0530

    metadata_dictionary: Add mappings for .nuspec

See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/387/ for more details.

swh/indexer/metadata_dictionary/nuget.py
40–42

try using unitest test suites. it provides you assertLogs which you can use like this:

python
from unitest import TestCase

class SomeTests(TestCase):
   def test_something():
      with self.assertLogs(level="DEBUG") as captured:
         assert captured.records[0] == "..."

there are other ways of doing the same thing. iirc, I did it once with a cleaner approach. so feel free to explore further.

swh/indexer/metadata_dictionary/nuget.py
40–42

no, we use pytest instead of unittest now. See https://docs.pytest.org/en/6.2.x/logging.html for testing logs.

VickyMerzOwn added inline comments.
swh/indexer/data/nuget.csv
68

How can I convert this relative path into a URL?
https://sourcegraph.com/search?q=context:global+%3Creadme+file:.nuspec&patternType=literal

Or do I just leave readme field out of it?

swh/indexer/metadata_dictionary/nuget.py
40–42

I'm still not clear how xmltodict.parse can output a non dict object. I added the check initially from MavenMapping's translate method.

Excludes test and subsequent log for xmltodict.parse to be dict

Build is green

Patch application report for D8144 (id=29495)

Rebasing onto f31f1f5b34...

First, rewinding head to replay your work on top of it...
Applying: metadata_dictionary: Add mappings for .nuspec
Changes applied before test
commit 478e555574b9e3f0e03e29fec42584f3e64170cd
Author: Satvik Vemuganti <vemugantisesha@iitbhilai.ac.in>
Date:   Wed Jul 20 10:52:34 2022 +0530

    metadata_dictionary: Add mappings for .nuspec

See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/391/ for more details.

KShivendu added inline comments.
swh/indexer/metadata_dictionary/nuget.py
23

I think this should be ... for Nuget (.nuspec) mapping ...

38

.get("package") defaults to None instead of {} so .get("metadata") won't work if package isn't found in the parsed dictionary. It's interesting that mypy didn't point out this.

Please use .get("package", {}) instead.

Also, why did we drop the try/except blocks?

48

We use isinstance(v, dict) everywhere. Please follow that.

54

We use isinstance(v, dict) everywhere. Please follow that.

60

nitpick: You don't need to pass " " to strip. It's the default value.

This revision now requires changes to proceed.Tue, Aug 2, 8:09 AM
swh/indexer/metadata_dictionary/nuget.py
39–41

Unlike maven, we aren't performing any operation after _translate_dict. So we can directly use normalization (i.e. jsonld compaction)

You can just directly use return self._translate_dict(d) (since normalize=True is the default)

Skim through the code of self._translate_dict to learn more.

Excludes test and subsequent log for xmltodict.parse to be dict

Why? D5799 shows it is needed. If you cannot find a way to test it then so be it; but don't remove the code.

Also, please re-add the XmlMapping base class you removed in the "Working indexer for NuGet packages" update

swh/indexer/data/nuget.csv
68

yes, leave it out.

But keep the row in the CSV, though...

swh/indexer/metadata_dictionary/nuget.py
27

Isn't .nuspec the extension?

34

not anymore

60

It isn't:

>>> "foo  bar ".split()
['foo', 'bar']
>>> "foo  bar ".split(" ")
['foo', '', 'bar', '']

See https://docs.python.org/3/library/stdtypes.html#str.split

60–61

then use ["text"] instead of .get("text"). (and elsewhere too)

swh/indexer/tests/metadata_dictionary/test_nuget.py
12

please use the right namespace

swh/indexer/metadata_dictionary/nuget.py
60

I think there's a small misunderstanding here.

he's using strip not split

I also got confused when I first saw it :p

swh/indexer/metadata_dictionary/nuget.py
60

ah yes, my bad. strip removes all whitespaces by default though; but that's probably a good behavior anyway.

VickyMerzOwn marked 4 inline comments as done.

Use dict instead of Dict
return _translate_dict rather than normalize_dict(_translate_dict)
Use .get("package", {}) instead of .get("package").get("metadata", {})

return _translate_dict rather than normalize_dict(_translate_dict)

Build has FAILED

Patch application report for D8144 (id=29518)

Rebasing onto f31f1f5b34...

First, rewinding head to replay your work on top of it...
Applying: metadata_dictionary: Add mappings for .nuspec
Changes applied before test
commit e5b39f9fc5ccbf5adb8f2ceeaf631e694f596f31
Author: Satvik Vemuganti <vemugantisesha@iitbhilai.ac.in>
Date:   Wed Jul 20 10:52:34 2022 +0530

    metadata_dictionary: Add mappings for .nuspec

Link to build: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/392/
See console output for more information: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/392/console

Build has FAILED

Patch application report for D8144 (id=29519)

Rebasing onto f31f1f5b34...

First, rewinding head to replay your work on top of it...
Applying: metadata_dictionary: Add mappings for .nuspec
Changes applied before test
commit c88ea6c22e45f41b3c5e5446f87c4403716e89d7
Author: Satvik Vemuganti <vemugantisesha@iitbhilai.ac.in>
Date:   Wed Jul 20 10:52:34 2022 +0530

    metadata_dictionary: Add mappings for .nuspec

Link to build: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/393/
See console output for more information: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/393/console

swh/indexer/metadata_dictionary/nuget.py
27

oh, yes. Do I use a regex then?

38

I think you mean .get("package").get("metadata", {}) to simplify using .get()'s parameter
Adding the try-except block back. I dropped it because I couldn't think of a test case.

39–41

oh, yeah! makes sense. Updating it..

swh/indexer/tests/metadata_dictionary/test_nuget.py
12

Sorry, didn't understand what you mean. Could you help me? What is a namespace? found some definitions on google but didn't understand what to change here.

swh/indexer/metadata_dictionary/nuget.py
27

you can't use SingleFileIntrinsicMapping, so you have to reimplement its file-detection method

38

if xmltodict.parse(content.strip(b" \n ")) returns a dict without the package key, then .get("package") returns None, so you call None.get("metadata", {}) which doesn't exist.

swh/indexer/tests/metadata_dictionary/test_nuget.py
12

https://developer.mozilla.org/en-US/docs/Web/SVG/Namespaces_Crash_Course

You replaced the namespace used by nuspec with http://package.example.org/, which changes the meaning of every tag in the file.

Corrects namespace used by nuspec

Build has FAILED

Patch application report for D8144 (id=29520)

Rebasing onto f31f1f5b34...

First, rewinding head to replay your work on top of it...
Applying: metadata_dictionary: Add mappings for .nuspec
Changes applied before test
commit 1c878121a629cf7c38b15ee106d90ad7a8ddc3a8
Author: Satvik Vemuganti <vemugantisesha@iitbhilai.ac.in>
Date:   Wed Jul 20 10:52:34 2022 +0530

    metadata_dictionary: Add mappings for .nuspec

Link to build: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/394/
See console output for more information: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/394/console

Reference dict using square brackets (<dict_name>["<key>"]) instead of <dict_name>.get("<key>") where "<key>" is always in <dict_name>

Build has FAILED

Patch application report for D8144 (id=29523)

Rebasing onto f31f1f5b34...

First, rewinding head to replay your work on top of it...
Applying: metadata_dictionary: Add mappings for .nuspec
Changes applied before test
commit 68f8f6ba422b129aae030ad761cfb33d4c1d9426
Author: Satvik Vemuganti <vemugantisesha@iitbhilai.ac.in>
Date:   Wed Jul 20 10:52:34 2022 +0530

    metadata_dictionary: Add mappings for .nuspec

Link to build: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/395/
See console output for more information: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/395/console