Closes T4536.
Details
- Reviewers: None
- Group Reviewers: Reviewers
- Maniphest Tasks: T4536: Document how swh-indexer uses Codemeta crosswalks
- Commits: rDCIDXcdbf090b14b1: Make read_crosstable public and document it.
Diff Detail
- Repository: rDCIDX Metadata indexer
- Lint: No Linters Available
- Unit: No Unit Test Coverage
- Build Status: Buildable 31833
  - Build 49819: Phabricator diff pipeline on jenkins
  - Build 49818: arc lint + arc unit
Event Timeline
Build has FAILED
Patch application report for D8549 (id=30819)
Could not rebase; Attempt merge onto e25a2f4e4a...
Merge made by the 'recursive' strategy.
 docs/metadata-workflow.rst | 6 +-
 swh/indexer/codemeta.py | 18 ++-
 swh/indexer/data/Gitea.csv | 76 +++++++++++
 swh/indexer/metadata_dictionary/__init__.py | 15 ++-
 swh/indexer/metadata_dictionary/base.py | 108 ++++++++++------
 swh/indexer/metadata_dictionary/cff.py | 5 +-
 swh/indexer/metadata_dictionary/composer.py | 4 +-
 swh/indexer/metadata_dictionary/dart.py | 4 +-
 swh/indexer/metadata_dictionary/gitea.py | 124 ++++++++++++++++++
 swh/indexer/metadata_dictionary/github.py | 19 ++-
 swh/indexer/metadata_dictionary/nuget.py | 4 +-
 .../tests/metadata_dictionary/test_gitea.py | 143 +++++++++++++++++++++
 .../tests/metadata_dictionary/test_github.py | 10 +-
 swh/indexer/tests/metadata_dictionary/test_npm.py | 14 ++
 swh/indexer/tests/test_cli.py | 2 +
 swh/indexer/tests/test_metadata.py | 3 +-
 16 files changed, 495 insertions(+), 60 deletions(-)
 create mode 100644 swh/indexer/data/Gitea.csv
 create mode 100644 swh/indexer/metadata_dictionary/gitea.py
 create mode 100644 swh/indexer/tests/metadata_dictionary/test_gitea.py
Changes applied before test
commit daf82fe3b882631386d7ecfdae389216efcf4a29
Merge: e25a2f4 db48a6c
Author: Jenkins user <jenkins@localhost>
Date: Tue Sep 27 12:14:33 2022 +0000
Merge branch 'diff-target' into HEAD
commit db48a6c6cb478f625068f7d7fb2bdb747da6c711
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Sep 27 14:13:56 2022 +0200
Make crosswalk_table public and document it.
commit b57c99dd89850dbe610669864a8ee003ef37bbc4
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Sep 15 08:52:00 2022 +0200
npm: Add test for 'author' value that used to crash
It was only fixed as a side-effect of other changes, but it's good
to have a regression test
commit 9d7a6a47e157d443849dc749765ecb010ba856c2
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Sep 13 17:06:08 2022 +0200
github and gitea: Use html_url as @id and clone_url as codeRepository
They are closer semantics as 'html_url' is the main page of the repository,
so it is the best to identify it; and 'clone_url' is the URL that should
be given to 'git clone', as documented by https://schema.org/codeRepository
Additionally, that property was missing so far; but a future commit will
need to use it to identify fork relationships (node ids are required to
representation relationships between documents as we cannot use blank
nodes for that)
commit 9f6b75cad02745311f3d29a564b3db2d5b756af7
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Sep 13 13:30:54 2022 +0200
Add Gitea metadata mapping
commit 3a3a348bd86e714ab016a93617bc197010ee145d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Sep 13 12:34:22 2022 +0200
GitHub: use correct JSON-LD types for URLs and dates
Link to build: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/502/
See console output for more information: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/502/console
Build is green
Patch application report for D8549 (id=30836)
Rebasing onto e25a2f4e4a...
First, rewinding head to replay your work on top of it...
Applying: GitHub: use correct JSON-LD types for URLs and dates
Applying: Add Gitea metadata mapping
Applying: github and gitea: Use html_url as @id and clone_url as codeRepository
Applying: npm: Add test for 'author' value that used to crash
Applying: Make crosswalk_table public and document it.
Changes applied before test
commit c4d2052d69587cf044243ec67bae6180bb4316ff
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Sep 27 14:13:56 2022 +0200
Make crosswalk_table public and document it.
commit e2328e80a2125a57f7f471469d10ec17a17ed3c2
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Sep 15 08:52:00 2022 +0200
npm: Add test for 'author' value that used to crash
It was only fixed as a side-effect of other changes, but it's good
to have a regression test
commit 66081ea913f508e996294e4349ad0b926bce5de6
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Sep 13 17:06:08 2022 +0200
github and gitea: Use html_url as @id and clone_url as codeRepository
They are closer semantics as 'html_url' is the main page of the repository,
so it is the best to identify it; and 'clone_url' is the URL that should
be given to 'git clone', as documented by https://schema.org/codeRepository
Additionally, that property was missing so far; but a future commit will
need to use it to identify fork relationships (node ids are required to
representation relationships between documents as we cannot use blank
nodes for that)
commit 9f5f97b4da3322417db7dd1637068affcfa8874d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Sep 13 13:30:54 2022 +0200
Add Gitea metadata mapping
commit aae740de8c6d4913d7ab951fa9e582599b0be3d7
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Sep 13 12:34:22 2022 +0200
GitHub: use correct JSON-LD types for URLs and dates
See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/503/ for more details.
Build is green
Patch application report for D8549 (id=30842)
Could not rebase; Attempt merge onto e25a2f4e4a...
Merge made by the 'recursive' strategy.
 docs/metadata-workflow.rst | 6 +-
 swh/indexer/codemeta.py | 18 ++-
 swh/indexer/data/Gitea.csv | 68 ++++++++++
 swh/indexer/metadata_dictionary/__init__.py | 15 ++-
 swh/indexer/metadata_dictionary/base.py | 108 ++++++++++------
 swh/indexer/metadata_dictionary/cff.py | 5 +-
 swh/indexer/metadata_dictionary/composer.py | 4 +-
 swh/indexer/metadata_dictionary/dart.py | 4 +-
 swh/indexer/metadata_dictionary/gitea.py | 124 ++++++++++++++++++
 swh/indexer/metadata_dictionary/github.py | 19 ++-
 swh/indexer/metadata_dictionary/nuget.py | 4 +-
 .../tests/metadata_dictionary/test_gitea.py | 143 +++++++++++++++++++++
 .../tests/metadata_dictionary/test_github.py | 10 +-
 swh/indexer/tests/metadata_dictionary/test_npm.py | 14 ++
 swh/indexer/tests/test_cli.py | 2 +
 swh/indexer/tests/test_metadata.py | 3 +-
 16 files changed, 487 insertions(+), 60 deletions(-)
 create mode 100644 swh/indexer/data/Gitea.csv
 create mode 100644 swh/indexer/metadata_dictionary/gitea.py
 create mode 100644 swh/indexer/tests/metadata_dictionary/test_gitea.py
Changes applied before test
commit 685e591fb151b98e72354d01fd129ff58411e93d
Merge: e25a2f4 0b0f5f4
Author: Jenkins user <jenkins@localhost>
Date: Tue Sep 27 13:20:37 2022 +0000
Merge branch 'diff-target' into HEAD
commit 0b0f5f42e95a443c58bb2156937be3848bfe6ee2
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Sep 27 14:13:56 2022 +0200
Make read_crosstable public and document it.
commit b57c99dd89850dbe610669864a8ee003ef37bbc4
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Sep 15 08:52:00 2022 +0200
npm: Add test for 'author' value that used to crash
It was only fixed as a side-effect of other changes, but it's good
to have a regression test
commit 9d7a6a47e157d443849dc749765ecb010ba856c2
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Sep 13 17:06:08 2022 +0200
github and gitea: Use html_url as @id and clone_url as codeRepository
They are closer semantics as 'html_url' is the main page of the repository,
so it is the best to identify it; and 'clone_url' is the URL that should
be given to 'git clone', as documented by https://schema.org/codeRepository
Additionally, that property was missing so far; but a future commit will
need to use it to identify fork relationships (node ids are required to
representation relationships between documents as we cannot use blank
nodes for that)
commit 9f6b75cad02745311f3d29a564b3db2d5b756af7
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Sep 13 13:30:54 2022 +0200
Add Gitea metadata mapping
commit 3a3a348bd86e714ab016a93617bc197010ee145d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Sep 13 12:34:22 2022 +0200
GitHub: use correct JSON-LD types for URLs and dates
See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/504/ for more details.
Build is green
Patch application report for D8549 (id=30860)
Could not rebase; Attempt merge onto e25a2f4e4a...
Updating e25a2f4..cdbf090
Fast-forward
 docs/metadata-workflow.rst | 6 +-
 swh/indexer/codemeta.py | 18 ++-
 swh/indexer/data/Gitea.csv | 68 ++++++++++
 swh/indexer/metadata_dictionary/__init__.py | 15 ++-
 swh/indexer/metadata_dictionary/base.py | 108 ++++++++++------
 swh/indexer/metadata_dictionary/cff.py | 5 +-
 swh/indexer/metadata_dictionary/composer.py | 4 +-
 swh/indexer/metadata_dictionary/dart.py | 4 +-
 swh/indexer/metadata_dictionary/gitea.py | 124 ++++++++++++++++++
 swh/indexer/metadata_dictionary/github.py | 19 ++-
 swh/indexer/metadata_dictionary/nuget.py | 4 +-
 .../tests/metadata_dictionary/test_gitea.py | 143 +++++++++++++++++++++
 .../tests/metadata_dictionary/test_github.py | 10 +-
 swh/indexer/tests/metadata_dictionary/test_npm.py | 14 ++
 swh/indexer/tests/test_cli.py | 2 +
 swh/indexer/tests/test_metadata.py | 3 +-
 16 files changed, 487 insertions(+), 60 deletions(-)
 create mode 100644 swh/indexer/data/Gitea.csv
 create mode 100644 swh/indexer/metadata_dictionary/gitea.py
 create mode 100644 swh/indexer/tests/metadata_dictionary/test_gitea.py
Changes applied before test
commit cdbf090b14b1db24b0dfb1b3cfac01fb0dbdbd4a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Sep 27 14:13:56 2022 +0200
Make read_crosstable public and document it.
commit 9b741f2f9f336c2657a1d20196139daac3fe69b1
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Sep 15 08:52:00 2022 +0200
npm: Add test for 'author' value that used to crash
It was only fixed as a side-effect of other changes, but it's good
to have a regression test
commit ac0e263bbfc17ee2905b97bbbbbb4929419170cd
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Sep 13 17:06:08 2022 +0200
github and gitea: Use html_url as @id and clone_url as codeRepository
They are closer semantics as 'html_url' is the main page of the repository,
so it is the best to identify it; and 'clone_url' is the URL that should
be given to 'git clone', as documented by https://schema.org/codeRepository
Additionally, that property was missing so far; but a future commit will
need to use it to identify fork relationships (node ids are required to
representation relationships between documents as we cannot use blank
nodes for that)
commit cb435e59ca91ac7b71cff18e5e6b3885e5be9ac1
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Sep 13 13:30:54 2022 +0200
Add Gitea metadata mapping
commit 20becf4a90fa6b626e972bba3d57db46604cb7b2
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Sep 13 12:34:22 2022 +0200
GitHub: use correct JSON-LD types for URLs and dates
See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/509/ for more details.
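
For readers skimming the build reports, the mapping described in the "github and gitea: Use html_url as @id and clone_url as codeRepository" commit message can be sketched roughly as follows. This is only an illustration, not the code from swh/indexer/metadata_dictionary/github.py: the function name `forge_repo_to_codemeta` and the example URLs are hypothetical, and the real mapping presumably goes through the crosswalk tables rather than a hand-written function.

```python
from typing import Any, Dict

# CodeMeta 2.0 JSON-LD context.
CODEMETA_CONTEXT = "https://doi.org/10.5063/schema/codemeta-2.0"


def forge_repo_to_codemeta(repo: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: map a forge API repository object to a
    minimal CodeMeta document.

    'html_url' is the repository's main page, so it serves as the node
    identifier; 'clone_url' is what one would pass to 'git clone', so it
    maps to schema.org's codeRepository.
    """
    doc: Dict[str, Any] = {
        "@context": CODEMETA_CONTEXT,
        "@type": "SoftwareSourceCode",
    }
    if "html_url" in repo:
        # A named node (rather than a blank node) can be referenced from
        # other documents, e.g. to express fork relationships.
        doc["@id"] = repo["html_url"]
    if "clone_url" in repo:
        doc["codeRepository"] = repo["clone_url"]
    return doc


example = forge_repo_to_codemeta(
    {
        "html_url": "https://forge.example.org/alice/project",
        "clone_url": "https://forge.example.org/alice/project.git",
    }
)
```

Leaving `@id` out when `html_url` is missing yields a blank node, which is exactly the case the commit message says cannot be used to link documents together.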