Fixes T4376
Details
- Reviewers
vlorentz - Maniphest Tasks
- T4376: Metadata Indexer for Pub (pubspec.yaml)
- Commits
- rDCIDXfa67b73d6a88: metadata_dictionary: Add mappings for pubspec.yaml
Write a test case for the basic pub mapping
Test edge cases with examples (maybe from sourcegraph)
Diff Detail
- Repository
- rDCIDX Metadata indexer
- Branch
- master
- Lint
No Linters Available - Unit
No Unit Test Coverage - Build Status
Buildable 30411 Build 47539: Phabricator diff pipeline on jenkins Jenkins console · Jenkins Build 47538: arc lint + arc unit
Event Timeline
Build is green
Patch application report for D8079 (id=29228)
Rebasing onto c2742b5b75...
First, rewinding head to replay your work on top of it... Applying: metadata_dictionary: Add mappings for pubspec.yaml
Changes applied before test
commit bb56919c89c6953421789ba72bda6b4f61610adf Author: Satvik Vemuganti <vemugantisesha@iitbhilai.ac.in> Date: Tue Jul 5 14:33:19 2022 +0530 metadata_dictionary: Add mappings for pubspec.yaml
See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/350/ for more details.
Made the requested changes. The 2 methods that are in PubMapping are also deduplicated from CffMapping. However it felt intuitive to leave them that way (and not include them in YamlMapping).
Am I wrong? I'll change it if so.
Build is green
Patch application report for D8079 (id=29230)
Rebasing onto c2742b5b75...
First, rewinding head to replay your work on top of it... Applying: metadata_dictionary: Add mappings for pubspec.yaml Using index info to reconstruct a base tree... M swh/indexer/metadata_dictionary/cff.py Falling back to patching base and 3-way merge... Auto-merging swh/indexer/metadata_dictionary/cff.py CONFLICT (content): Merge conflict in swh/indexer/metadata_dictionary/cff.py Patch failed at 0001 metadata_dictionary: Add mappings for pubspec.yaml Resolve all conflicts manually, mark them as resolved with "git add/rm <conflicted_files>", then run "git rebase --continue". You can instead skip this commit: run "git rebase --skip". To abort and get back to the state before "git rebase", run "git rebase --abort".
Rebase failed (ret=1)!
Could not rebase; Attempt merge onto c2742b5b75...
Already up to date.
Changes applied before test
commit 94f4a5f6cb5aa2e95f7abdf9c51712ee31cbbae8 Author: Satvik Vemuganti <vemugantisesha@iitbhilai.ac.in> Date: Tue Jul 5 14:33:19 2022 +0530 metadata_dictionary: Add mappings for pubspec.yaml
See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/351/ for more details.
Thanks. Some more comments now that we have the basics covered:
Hmm, I wonder what we could translate the pubspec documentation key to. codemeta:readme and codemeta:buildInstructions both feel slightly off, and schema:documentation seems to be only for web APIs for some reason...
Also, could you rename the mapping to call this "pubspec"? "pub" alone is rather ambiguous. And probably rename the file to "dart.py", as file names tend to describe communities rather than formats (hypothetically, we would have all three mappings for PKG-INFO/setup.cfg/pyproject.toml in python.py for example)
swh/indexer/data/pub.csv | ||
---|---|---|
4 ↗ | (On Diff #29230) | could translate this from the platforms field |
21 ↗ | (On Diff #29230) | didn't you say there is also an authors field? |
34 ↗ | (On Diff #29230) | I this field documented? I don't see it on https://dart.dev/tools/pub/pubspec |
swh/indexer/metadata_dictionary/cff.py | ||
32–33 | please rebase before submitting diffs; this code was simplified; so you can simplify the new base YamlMapping similarly | |
swh/indexer/metadata_dictionary/pub.py | ||
1 ↗ | (On Diff #29230) | only 2022 as far as I can tell |
swh/indexer/data/pub.csv | ||
---|---|---|
21 ↗ | (On Diff #29230) | Yeah, but both author and authors can be mapped to author. |
34 ↗ | (On Diff #29230) | You're right, my bad. |
swh/indexer/metadata_dictionary/pub.py | ||
1 ↗ | (On Diff #29230) | Sorry, I didn't understand what you mean. |
swh/indexer/data/pub.csv | ||
---|---|---|
34 ↗ | (On Diff #29230) | probably because some project boilerplate generator writes it and/or people copy this from each other without checking the documentation |
swh/indexer/metadata_dictionary/pub.py | ||
1 ↗ | (On Diff #29230) | I meant replacing "2017-2022" with only "2022" (and in test_pub.py too) |
Build is green
Patch application report for D8079 (id=29232)
Rebasing onto c2742b5b75...
Current branch diff-target is up to date.
Changes applied before test
commit 68170dda98fda949d2988743838140680afe6789 Author: Satvik Vemuganti <vemugantisesha@iitbhilai.ac.in> Date: Tue Jul 5 14:33:19 2022 +0530 metadata_dictionary: Add mappings for pubspec.yaml
See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/352/ for more details.
swh/indexer/data/pub.csv | ||
---|---|---|
34 ↗ | (On Diff #29230) | anyway, it's worth adding given the popularity. Please check people only use it with SPDX id though; and document it in dart.py (a one-line comment is probably enough) |
Build has FAILED
Patch application report for D8079 (id=29233)
Rebasing onto c2742b5b75...
Current branch diff-target is up to date.
Changes applied before test
commit efaf1d810d0a3d9aff5ee1e534ee64af9146bd93 Author: Satvik Vemuganti <vemugantisesha@iitbhilai.ac.in> Date: Tue Jul 5 14:33:19 2022 +0530 metadata_dictionary: Add mappings for pubspec.yaml
Link to build: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/353/
See console output for more information: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/353/console
swh/indexer/data/pub.csv | ||
---|---|---|
34 ↗ | (On Diff #29230) | Would you like that I import the table from here: https://github.com/spdx/license-list-data/blob/master/licenses.md |
Build is green
Patch application report for D8079 (id=29236)
Rebasing onto c2742b5b75...
Current branch diff-target is up to date.
Changes applied before test
commit 6ac7d51386b6f64b2b4972e10d4441525fcbb2b6 Author: Satvik Vemuganti <vemugantisesha@iitbhilai.ac.in> Date: Tue Jul 5 14:33:19 2022 +0530 metadata_dictionary: Add mappings for pubspec.yaml
See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/354/ for more details.
swh/indexer/data/pub.csv | ||
---|---|---|
34 ↗ | (On Diff #29230) | No, this table changes too frequently. I meant you should check that people use SPDX ids in practice |
swh/indexer/data/pub.csv | ||
---|---|---|
34 ↗ | (On Diff #29230) | Just got another idea. Even if there isn't a license field, the first line of the license file gives the type. Can I use that or is it diverging a lot from using mappings to translate metadata? |
swh/indexer/data/pub.csv | ||
---|---|---|
34 ↗ | (On Diff #29230) | it's actually more complicated than this, but possible with tools like fossology. It will need to be a separate mapping, though. |
Build is green
Patch application report for D8079 (id=29265)
Rebasing onto c2742b5b75...
Current branch diff-target is up to date.
Changes applied before test
commit 381cf1d53ab595fac7483b5621829397f91696ce Author: Satvik Vemuganti <vemugantisesha@iitbhilai.ac.in> Date: Tue Jul 5 14:33:19 2022 +0530 metadata_dictionary: Add mappings for pubspec.yaml
See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/355/ for more details.
Could you add a test with both author and authors?
swh/indexer/metadata_dictionary/dart.py | ||
---|---|---|
53–59 | str.rsplit takes a maxsplit argument, so you can simplify your code like above. However, neither variant supports authors hiding their email address, and you should either add support for that, or make sure this never actually happens. (To implement it without making the code too convoluted, you can use this regexp: (?P<name>.*?)( <(?P<email>.*)>)?) | |
swh/indexer/tests/metadata_dictionary/test_pubspec.py | ||
1 ↗ | (On Diff #29265) | you should rename this file to match the other one |
79 ↗ | (On Diff #29265) | avoid real domain names in tests; use those defined in RFC6761 instead, such as "example.org" |
89–90 ↗ | (On Diff #29265) | same here |
Changes test filename to test_dart.py
Checks for presence of email while normalizing author
Build is green
Patch application report for D8079 (id=29298)
Rebasing onto 2dd2be9c1b...
First, rewinding head to replay your work on top of it... Applying: metadata_dictionary: Add mappings for pubspec.yaml Using index info to reconstruct a base tree... M swh/indexer/metadata_dictionary/__init__.py M swh/indexer/metadata_dictionary/base.py M swh/indexer/metadata_dictionary/cff.py Falling back to patching base and 3-way merge... Auto-merging swh/indexer/metadata_dictionary/cff.py CONFLICT (content): Merge conflict in swh/indexer/metadata_dictionary/cff.py Auto-merging swh/indexer/metadata_dictionary/base.py Auto-merging swh/indexer/metadata_dictionary/__init__.py CONFLICT (content): Merge conflict in swh/indexer/metadata_dictionary/__init__.py Patch failed at 0001 metadata_dictionary: Add mappings for pubspec.yaml Resolve all conflicts manually, mark them as resolved with "git add/rm <conflicted_files>", then run "git rebase --continue". You can instead skip this commit: run "git rebase --skip". To abort and get back to the state before "git rebase", run "git rebase --abort".
Rebase failed (ret=1)!
Could not rebase; Attempt merge onto 2dd2be9c1b...
Already up to date.
Changes applied before test
commit 8e800e2f2ea7568b43a8ab2cfac58a329cb47c2f Author: Satvik Vemuganti <vemugantisesha@iitbhilai.ac.in> Date: Tue Jul 5 14:33:19 2022 +0530 metadata_dictionary: Add mappings for pubspec.yaml
See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/357/ for more details.
swh/indexer/metadata_dictionary/dart.py | ||
---|---|---|
53–59 | Checking for "@" in the string is like a check if there is an email address in the string. |
don't forget this:
swh/indexer/metadata_dictionary/dart.py | ||
---|---|---|
53–59 | It is not guaranteed. These files are untrusted input, they need to be fully validated to prevent crashes. |
I think either of the fields will be present but not both.
Not sure if this query is accurate but it gives 0 results: https://sourcegraph.com/search?q=context:global+author+authors+file:pubspec.yaml&patternType=literal
https://sourcegraph.com/search?q=context:global+author+version+file:pubspec.yaml&patternType=literal also returns no result, because it searches for the literal string author version (or author authors in your example), see https://docs.sourcegraph.com/code_search/reference/queries#literal-search-default
So this is not conclusive
- Verifies author value with regex
- Adds test for files containing both 'author' and 'authors' fields
Build is green
Patch application report for D8079 (id=29326)
Rebasing onto 2dd2be9c1b...
First, rewinding head to replay your work on top of it... Applying: metadata_dictionary: Add mappings for pubspec.yaml Using index info to reconstruct a base tree... M swh/indexer/metadata_dictionary/__init__.py M swh/indexer/metadata_dictionary/base.py M swh/indexer/metadata_dictionary/cff.py Falling back to patching base and 3-way merge... Auto-merging swh/indexer/metadata_dictionary/cff.py CONFLICT (content): Merge conflict in swh/indexer/metadata_dictionary/cff.py Auto-merging swh/indexer/metadata_dictionary/base.py Auto-merging swh/indexer/metadata_dictionary/__init__.py CONFLICT (content): Merge conflict in swh/indexer/metadata_dictionary/__init__.py Patch failed at 0001 metadata_dictionary: Add mappings for pubspec.yaml Resolve all conflicts manually, mark them as resolved with "git add/rm <conflicted_files>", then run "git rebase --continue". You can instead skip this commit: run "git rebase --skip". To abort and get back to the state before "git rebase", run "git rebase --abort".
Rebase failed (ret=1)!
Could not rebase; Attempt merge onto 2dd2be9c1b...
Already up to date.
Changes applied before test
commit 3f72e93f35d9f8de093e1d9af9a46e38b42a52b9 Author: Satvik Vemuganti <vemugantisesha@iitbhilai.ac.in> Date: Tue Jul 5 14:33:19 2022 +0530 metadata_dictionary: Add mappings for pubspec.yaml
See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/358/ for more details.
swh/indexer/metadata_dictionary/dart.py | ||
---|---|---|
56–57 | use the regexp's capture groups, instead of parsing twice |
Build is green
Patch application report for D8079 (id=29333)
Rebasing onto 2dd2be9c1b...
First, rewinding head to replay your work on top of it... Applying: metadata_dictionary: Add mappings for pubspec.yaml Using index info to reconstruct a base tree... M swh/indexer/metadata_dictionary/__init__.py M swh/indexer/metadata_dictionary/base.py M swh/indexer/metadata_dictionary/cff.py Falling back to patching base and 3-way merge... Auto-merging swh/indexer/metadata_dictionary/cff.py CONFLICT (content): Merge conflict in swh/indexer/metadata_dictionary/cff.py Auto-merging swh/indexer/metadata_dictionary/base.py Auto-merging swh/indexer/metadata_dictionary/__init__.py CONFLICT (content): Merge conflict in swh/indexer/metadata_dictionary/__init__.py Patch failed at 0001 metadata_dictionary: Add mappings for pubspec.yaml Resolve all conflicts manually, mark them as resolved with "git add/rm <conflicted_files>", then run "git rebase --continue". You can instead skip this commit: run "git rebase --skip". To abort and get back to the state before "git rebase", run "git rebase --abort".
Rebase failed (ret=1)!
Could not rebase; Attempt merge onto 2dd2be9c1b...
Already up to date.
Changes applied before test
commit ffa4e5af7fdb1bd4f43d1ebb36ae6c48aa74ac5b Author: Satvik Vemuganti <vemugantisesha@iitbhilai.ac.in> Date: Tue Jul 12 22:25:05 2022 +0530 Capture groups commit 3f72e93f35d9f8de093e1d9af9a46e38b42a52b9 Author: Satvik Vemuganti <vemugantisesha@iitbhilai.ac.in> Date: Tue Jul 5 14:33:19 2022 +0530 metadata_dictionary: Add mappings for pubspec.yaml
See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/359/ for more details.
thanks. one last thing:
swh/indexer/metadata_dictionary/dart.py | ||
---|---|---|
54 | redundant, None is not an instance of str |
Build is green
Patch application report for D8079 (id=29334)
Rebasing onto 2dd2be9c1b...
Current branch diff-target is up to date.
Changes applied before test
commit fa67b73d6a88461e76601c3b61ee6d33ba5a2c93 Author: Satvik Vemuganti <vemugantisesha@iitbhilai.ac.in> Date: Tue Jul 5 14:33:19 2022 +0530 metadata_dictionary: Add mappings for pubspec.yaml
See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/360/ for more details.