
Add R DESCRIPTION indexer
Needs RevisionPublic

Authored by aastha1999 on Apr 5 2021, 12:09 AM.

Details

Reviewers
vlorentz
Group Reviewers
Reviewers
Summary

Add R metadata indexer

Diff Detail

Repository
rDCIDX Metadata indexer
Branch
add-r-indexer
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 21508
Build 33416: Phabricator diff pipeline on jenkins (Jenkins console · Jenkins)
Build 33415: arc lint + arc unit

Event Timeline

Build has FAILED

Patch application report for D5417 (id=19371)

Rebasing onto 8f1fb0f931...

Current branch diff-target is up to date.
Changes applied before test
commit fecd016150e4a3fb6b4c335386f32c07251ac4b8
Author: aastha1999 <asthana.aastha1999@gmail.com>
Date:   Sat Apr 3 21:13:53 2021 +0000

    Add R DESCRIPTION indexer

Link to build: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/168/
See console output for more information: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/168/console

Harbormaster returned this revision to the author for changes because remote builds failed.
Apr 5 2021, 12:12 AM
Harbormaster failed remote builds in B20428: Diff 19371!

Hi @vlorentz. This is my first attempt at creating an R metadata indexer. I still need to work more on normalization. Also, I couldn't find a translation for fields such as "Imports", "Collate" etc. in codemeta.

  • Add python-debian in requirements

Build has FAILED

Patch application report for D5417 (id=19375)

Rebasing onto 8f1fb0f931...

Current branch diff-target is up to date.
Changes applied before test
commit b32269e8928c78571d21519abf8fa1eb90e2c427
Author: aastha1999 <asthana.aastha1999@gmail.com>
Date:   Mon Apr 5 19:58:46 2021 +0000

    Add python-debian in requirements

commit fecd016150e4a3fb6b4c335386f32c07251ac4b8
Author: aastha1999 <asthana.aastha1999@gmail.com>
Date:   Sat Apr 3 21:13:53 2021 +0000

    Add R DESCRIPTION indexer

Link to build: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/169/
See console output for more information: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/169/console

Harbormaster returned this revision to the author for changes because remote builds failed.
Apr 5 2021, 10:08 PM
Harbormaster failed remote builds in B20432: Diff 19375!

Build has FAILED

Patch application report for D5417 (id=19376)

Rebasing onto 8f1fb0f931...

Current branch diff-target is up to date.
Changes applied before test
commit 15af34bef18566a5d6d7a9a676e784f50deea9ce
Author: aastha1999 <asthana.aastha1999@gmail.com>
Date:   Mon Apr 5 20:13:16 2021 +0000

    Updating D5417

commit b32269e8928c78571d21519abf8fa1eb90e2c427
Author: aastha1999 <asthana.aastha1999@gmail.com>
Date:   Mon Apr 5 19:58:46 2021 +0000

    Add python-debian in requirements

commit fecd016150e4a3fb6b4c335386f32c07251ac4b8
Author: aastha1999 <asthana.aastha1999@gmail.com>
Date:   Sat Apr 3 21:13:53 2021 +0000

    Add R DESCRIPTION indexer

Link to build: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/170/
See console output for more information: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/170/console

Harbormaster returned this revision to the author for changes because remote builds failed.
Apr 5 2021, 10:18 PM
Harbormaster failed remote builds in B20433: Diff 19376!

> Also, I couldn't find a translation for fields such as "Imports", "Collate" etc. in codemeta.

It's ok, we don't have to translate everything.
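
To illustrate translating only the fields that have an obvious CodeMeta counterpart and skipping the rest (such as "Imports" and "Collate"), here is a minimal sketch. It is not the code from this diff: the diff depends on python-debian, whereas this version uses a small hand-written parser to stay stdlib-only. The names parse_description, translate and R_TO_CODEMETA, and the choice of target properties, are assumptions made for illustration.

```python
# Illustrative mapping only (an assumption, not the official CodeMeta crosswalk).
# Fields missing from this table, e.g. "Imports" or "Collate", are simply skipped.
R_TO_CODEMETA = {
    "Package": "name",
    "Version": "version",
    "Description": "description",
    "URL": "url",
}


def parse_description(text):
    """Parse 'Key: value' lines; indented lines continue the previous field."""
    fields = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0] in " \t" and current is not None:
            # Continuation line: append to the field being read.
            fields[current] += " " + line.strip()
        elif ":" in line:
            key, _, value = line.partition(":")
            current = key.strip()
            fields[current] = value.strip()
    return fields


def translate(fields):
    """Keep only the fields that have a known translation."""
    return {R_TO_CODEMETA[k]: v for k, v in fields.items() if k in R_TO_CODEMETA}


SAMPLE = """Package: example
Version: 0.1.0
Description: A toy package
    spanning two lines.
Imports: utils
Collate: 'a.R' 'b.R'
"""

print(translate(parse_description(SAMPLE)))
```

Untranslatable fields are dropped silently here; a real indexer might prefer to log them.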

Updating D5417: Add R DESCRIPTION indexer

Build is green

Patch application report for D5417 (id=20529)

Rebasing onto 8fd4846af5...

First, rewinding head to replay your work on top of it...
Fast-forwarded diff-target to base-revision-175-D5417.
Changes applied before test

See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/175/ for more details.

Updating D5417: Add R DESCRIPTION indexer

Build is green

Patch application report for D5417 (id=20530)

Could not rebase; Attempt merge onto 8fd4846af5...

Updating 8fd4846..64936d0
Fast-forward
 requirements.txt                            |  1 +
 swh/indexer/metadata_dictionary/R.py        | 48 +++++++++++++++
 swh/indexer/metadata_dictionary/__init__.py |  3 +-
 swh/indexer/storage/__init__.py             | 10 +++-
 swh/indexer/tests/storage/test_storage.py   |  1 +
 swh/indexer/tests/test_cli.py               |  1 +
 swh/indexer/tests/test_metadata.py          | 92 +++++++++++++++++++++++++++++
 7 files changed, 154 insertions(+), 2 deletions(-)
 create mode 100644 swh/indexer/metadata_dictionary/R.py
Changes applied before test
commit 64936d037ede2d47fd2dd52564328b54dbfb8532
Author: aastha1999 <asthana.aastha1999@gmail.com>
Date:   Mon Apr 5 20:13:16 2021 +0000

    Updating D5417

commit 3ce4e0407a2f852996208a5674866bc094032990
Author: aastha1999 <asthana.aastha1999@gmail.com>
Date:   Mon Apr 5 19:58:46 2021 +0000

    Add python-debian in requirements

commit 5f4064de8a62d6e2f277b5ecb18a39aa873c089e
Author: aastha1999 <asthana.aastha1999@gmail.com>
Date:   Sat Apr 3 21:13:53 2021 +0000

    Add R DESCRIPTION indexer

See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/176/ for more details.

Updating D5417: Add R DESCRIPTION indexer

Build is green

Patch application report for D5417 (id=20531)

Rebasing onto 8fd4846af5...

Current branch diff-target is up to date.
Changes applied before test
commit 64936d037ede2d47fd2dd52564328b54dbfb8532
Author: aastha1999 <asthana.aastha1999@gmail.com>
Date:   Mon Apr 5 20:13:16 2021 +0000

    Updating D5417

commit 3ce4e0407a2f852996208a5674866bc094032990
Author: aastha1999 <asthana.aastha1999@gmail.com>
Date:   Mon Apr 5 19:58:46 2021 +0000

    Add python-debian in requirements

commit 5f4064de8a62d6e2f277b5ecb18a39aa873c089e
Author: aastha1999 <asthana.aastha1999@gmail.com>
Date:   Sat Apr 3 21:13:53 2021 +0000

    Add R DESCRIPTION indexer

See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/177/ for more details.

Thanks!


The schema:license field should be either:

  1. a URI
  2. a CreativeWork object, or
  3. an array of 1 and 2

URIs can be obtained from license names like this: https://forge.softwareheritage.org/source/swh-indexer/browse/master/swh/indexer/metadata_dictionary/npm.py$136-143

Unfortunately, it will be harder for the "+ file LICENSE" part... I don't see a good solution for this, the input data is just unusable :/

Can you look for other examples, to see what license field other packages use, and list them here?
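
As a rough sketch of the license handling discussed above, the following turns an R License value into one URI or an array of URIs. Everything here is an assumption for illustration: the function name normalize_license, the small lookup table, the use of SPDX URIs, and the decision to discard the "+ file LICENSE" suffix. It is not the npm.py code linked above.

```python
import re

SPDX_BASE = "https://spdx.org/licenses/"

# Hypothetical lookup table; a real one would need many more entries.
R_LICENSE_TO_SPDX = {
    "GPL-2": "GPL-2.0-only",
    "GPL-3": "GPL-3.0-only",
    "MIT": "MIT",
    "Apache License 2.0": "Apache-2.0",
}


def normalize_license(value):
    """Return a URI, a list of URIs, or None if nothing could be mapped."""
    uris = []
    # R separates alternative licenses with "|".
    for alternative in value.split("|"):
        # Drop a trailing "+ file LICENSE" (or a bare "file LICENSE").
        name = re.sub(r"\+?\s*file\s+LICEN[CS]E\s*$", "", alternative.strip()).strip()
        spdx_id = R_LICENSE_TO_SPDX.get(name)
        if spdx_id:
            uris.append(SPDX_BASE + spdx_id)
    if not uris:
        return None
    return uris[0] if len(uris) == 1 else uris


print(normalize_license("MIT + file LICENSE"))
print(normalize_license("GPL-2 | GPL-3"))
```

Discarding the "+ file LICENSE" part loses information, so this only sidesteps the problem described above rather than solving it.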


A few comments below, they should be easy to fix:

swh/indexer/metadata_dictionary/R.py
17

Please fill this

19

Hmm, that's not a very clear name without context. What about "r-description"?

41–45

I don't see these in the expected output in the test. No need to parse them if they are not translatable.

swh/indexer/tests/test_metadata.py
247

Hmm, this should be automatically converted from "schema:url" to "url". I'll look into the issue.

This revision now requires changes to proceed.
May 17 2021, 10:33 AM