Closes T4536.
Details
- Reviewers
  - None
- Group Reviewers
  - Reviewers
- Maniphest Tasks
  - T4536: Document how swh-indexer uses Codemeta crosswalks
- Commits
  - rDCIDXcdbf090b14b1: Make read_crosstable public and document it.
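A CodeMeta crosswalk is a table that maps each CodeMeta property to the equivalent field name in one ecosystem's metadata format. The following is only a minimal sketch of reading such a table. It is not swh-indexer's `read_crosstable`. The helper name `load_crosswalk`, the assumed CSV layout (a `Property` column plus one column per ecosystem), and the sample rows are all illustrative assumptions.

```python
import csv
import io
from typing import Dict, TextIO


def load_crosswalk(fd: TextIO, ecosystem: str) -> Dict[str, str]:
    """Map each field name of `ecosystem` to its CodeMeta property.

    Hypothetical helper: assumes a header row with a "Property" column
    and one column named after each ecosystem.
    """
    mapping: Dict[str, str] = {}
    for row in csv.DictReader(fd):
        field = (row.get(ecosystem) or "").strip()
        if field:  # skip properties this ecosystem has no equivalent for
            mapping[field] = row["Property"].strip()
    return mapping


# Tiny made-up crosswalk with a single ecosystem column.
SAMPLE = """Property,Gitea
codeRepository,clone_url
description,description
name,name
license,
"""

if __name__ == "__main__":
    print(load_crosswalk(io.StringIO(SAMPLE), "Gitea"))
```

Inverting the table (ecosystem field to CodeMeta property) is the direction a mapping needs when translating a source document into CodeMeta.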
Diff Detail
- Repository
  - rDCIDX Metadata indexer
- Lint
  - Automatic diff as part of commit; lint not applicable.
- Unit
  - Automatic diff as part of commit; unit tests not applicable.
Event Timeline
Build has FAILED
Patch application report for D8549 (id=30819)
Could not rebase; Attempt merge onto e25a2f4e4a...
Merge made by the 'recursive' strategy.
 docs/metadata-workflow.rst | 6 +-
 swh/indexer/codemeta.py | 18 ++-
 swh/indexer/data/Gitea.csv | 76 +++++++++++
 swh/indexer/metadata_dictionary/__init__.py | 15 ++-
 swh/indexer/metadata_dictionary/base.py | 108 ++++++++++------
 swh/indexer/metadata_dictionary/cff.py | 5 +-
 swh/indexer/metadata_dictionary/composer.py | 4 +-
 swh/indexer/metadata_dictionary/dart.py | 4 +-
 swh/indexer/metadata_dictionary/gitea.py | 124 ++++++++++++++++++
 swh/indexer/metadata_dictionary/github.py | 19 ++-
 swh/indexer/metadata_dictionary/nuget.py | 4 +-
 .../tests/metadata_dictionary/test_gitea.py | 143 +++++++++++++++++++++
 .../tests/metadata_dictionary/test_github.py | 10 +-
 swh/indexer/tests/metadata_dictionary/test_npm.py | 14 ++
 swh/indexer/tests/test_cli.py | 2 +
 swh/indexer/tests/test_metadata.py | 3 +-
 16 files changed, 495 insertions(+), 60 deletions(-)
 create mode 100644 swh/indexer/data/Gitea.csv
 create mode 100644 swh/indexer/metadata_dictionary/gitea.py
 create mode 100644 swh/indexer/tests/metadata_dictionary/test_gitea.py
Changes applied before test
commit daf82fe3b882631386d7ecfdae389216efcf4a29
Merge: e25a2f4 db48a6c
Author: Jenkins user <jenkins@localhost>
Date: Tue Sep 27 12:14:33 2022 +0000

    Merge branch 'diff-target' into HEAD

commit db48a6c6cb478f625068f7d7fb2bdb747da6c711
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Sep 27 14:13:56 2022 +0200

    Make crosswalk_table public and document it.

commit b57c99dd89850dbe610669864a8ee003ef37bbc4
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Sep 15 08:52:00 2022 +0200

    npm: Add test for 'author' value that used to crash

    It was only fixed as a side-effect of other changes, but it's good to have a regression test

commit 9d7a6a47e157d443849dc749765ecb010ba856c2
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Sep 13 17:06:08 2022 +0200

    github and gitea: Use html_url as @id and clone_url as codeRepository

    They are closer semantics as 'html_url' is the main page of the repository, so it is the best to identify it; and 'clone_url' is the URL that should be given to 'git clone', as documented by https://schema.org/codeRepository

    Additionally, that property was missing so far; but a future commit will need to use it to identify fork relationships (node ids are required to representation relationships between documents as we cannot use blank nodes for that)

commit 9f6b75cad02745311f3d29a564b3db2d5b756af7
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Sep 13 13:30:54 2022 +0200

    Add Gitea metadata mapping

commit 3a3a348bd86e714ab016a93617bc197010ee145d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Sep 13 12:34:22 2022 +0200

    GitHub: use correct JSON-LD types for URLs and dates
Link to build: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/502/
See console output for more information: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/502/console
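The commit "github and gitea: Use html_url as @id and clone_url as codeRepository" describes a mapping that is easy to picture in code. Below is a minimal sketch, assuming a repository object shaped like those returned by the GitHub or Gitea REST APIs (fields `html_url`, `clone_url`, `name`, `created_at`, `updated_at`). The function `forge_repo_to_codemeta` is hypothetical, and typing timestamps as schema.org `DateTime` is an assumption of this sketch, not necessarily what swh-indexer emits.

```python
import json
from typing import Any, Dict

# CodeMeta 2.0 JSON-LD context and the schema.org namespace.
CODEMETA_CONTEXT = "https://doi.org/10.5063/schema/codemeta-2.0"
SCHEMA = "http://schema.org/"


def forge_repo_to_codemeta(repo: Dict[str, Any]) -> Dict[str, Any]:
    """Translate a GitHub/Gitea repository API object (hypothetical helper)."""
    doc: Dict[str, Any] = {
        "@context": CODEMETA_CONTEXT,
        "@type": "SoftwareSourceCode",
    }
    # The repository's web page identifies it, so it becomes the node id;
    # a named node (unlike a blank node) can be referenced from other documents.
    if repo.get("html_url"):
        doc["@id"] = repo["html_url"]
    # The URL to pass to `git clone`, written as an explicit IRI node reference.
    if repo.get("clone_url"):
        doc["codeRepository"] = {"@id": repo["clone_url"]}
    if repo.get("name"):
        doc["name"] = repo["name"]
    # Timestamps become typed literals rather than plain strings.
    for api_field, prop in (("created_at", "dateCreated"), ("updated_at", "dateModified")):
        if repo.get(api_field):
            doc[prop] = {"@value": repo[api_field], "@type": SCHEMA + "DateTime"}
    return doc


if __name__ == "__main__":
    # Made-up repository object for illustration.
    repo = {
        "html_url": "https://gitea.example.org/alice/demo",
        "clone_url": "https://gitea.example.org/alice/demo.git",
        "name": "demo",
        "created_at": "2022-09-13T13:30:54+02:00",
    }
    print(json.dumps(forge_repo_to_codemeta(repo), indent=2))
```

Giving the document an `@id` is what lets a later document (for example, one describing a fork) point at this repository, which matches the commit's remark that blank nodes cannot be used for such relationships.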
Build is green
Patch application report for D8549 (id=30836)
Rebasing onto e25a2f4e4a...
First, rewinding head to replay your work on top of it...
Applying: GitHub: use correct JSON-LD types for URLs and dates
Applying: Add Gitea metadata mapping
Applying: github and gitea: Use html_url as @id and clone_url as codeRepository
Applying: npm: Add test for 'author' value that used to crash
Applying: Make crosswalk_table public and document it.
Changes applied before test
commit c4d2052d69587cf044243ec67bae6180bb4316ff
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Sep 27 14:13:56 2022 +0200

    Make crosswalk_table public and document it.

commit e2328e80a2125a57f7f471469d10ec17a17ed3c2
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Sep 15 08:52:00 2022 +0200

    npm: Add test for 'author' value that used to crash

    It was only fixed as a side-effect of other changes, but it's good to have a regression test

commit 66081ea913f508e996294e4349ad0b926bce5de6
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Sep 13 17:06:08 2022 +0200

    github and gitea: Use html_url as @id and clone_url as codeRepository

    They are closer semantics as 'html_url' is the main page of the repository, so it is the best to identify it; and 'clone_url' is the URL that should be given to 'git clone', as documented by https://schema.org/codeRepository

    Additionally, that property was missing so far; but a future commit will need to use it to identify fork relationships (node ids are required to representation relationships between documents as we cannot use blank nodes for that)

commit 9f5f97b4da3322417db7dd1637068affcfa8874d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Sep 13 13:30:54 2022 +0200

    Add Gitea metadata mapping

commit aae740de8c6d4913d7ab951fa9e582599b0be3d7
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Sep 13 12:34:22 2022 +0200

    GitHub: use correct JSON-LD types for URLs and dates
See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/503/ for more details.
Build is green
Patch application report for D8549 (id=30842)
Could not rebase; Attempt merge onto e25a2f4e4a...
Merge made by the 'recursive' strategy.
 docs/metadata-workflow.rst | 6 +-
 swh/indexer/codemeta.py | 18 ++-
 swh/indexer/data/Gitea.csv | 68 ++++++++++
 swh/indexer/metadata_dictionary/__init__.py | 15 ++-
 swh/indexer/metadata_dictionary/base.py | 108 ++++++++++------
 swh/indexer/metadata_dictionary/cff.py | 5 +-
 swh/indexer/metadata_dictionary/composer.py | 4 +-
 swh/indexer/metadata_dictionary/dart.py | 4 +-
 swh/indexer/metadata_dictionary/gitea.py | 124 ++++++++++++++++++
 swh/indexer/metadata_dictionary/github.py | 19 ++-
 swh/indexer/metadata_dictionary/nuget.py | 4 +-
 .../tests/metadata_dictionary/test_gitea.py | 143 +++++++++++++++++++++
 .../tests/metadata_dictionary/test_github.py | 10 +-
 swh/indexer/tests/metadata_dictionary/test_npm.py | 14 ++
 swh/indexer/tests/test_cli.py | 2 +
 swh/indexer/tests/test_metadata.py | 3 +-
 16 files changed, 487 insertions(+), 60 deletions(-)
 create mode 100644 swh/indexer/data/Gitea.csv
 create mode 100644 swh/indexer/metadata_dictionary/gitea.py
 create mode 100644 swh/indexer/tests/metadata_dictionary/test_gitea.py
Changes applied before test
commit 685e591fb151b98e72354d01fd129ff58411e93d
Merge: e25a2f4 0b0f5f4
Author: Jenkins user <jenkins@localhost>
Date: Tue Sep 27 13:20:37 2022 +0000

    Merge branch 'diff-target' into HEAD

commit 0b0f5f42e95a443c58bb2156937be3848bfe6ee2
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Sep 27 14:13:56 2022 +0200

    Make read_crosstable public and document it.

commit b57c99dd89850dbe610669864a8ee003ef37bbc4
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Sep 15 08:52:00 2022 +0200

    npm: Add test for 'author' value that used to crash

    It was only fixed as a side-effect of other changes, but it's good to have a regression test

commit 9d7a6a47e157d443849dc749765ecb010ba856c2
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Sep 13 17:06:08 2022 +0200

    github and gitea: Use html_url as @id and clone_url as codeRepository

    They are closer semantics as 'html_url' is the main page of the repository, so it is the best to identify it; and 'clone_url' is the URL that should be given to 'git clone', as documented by https://schema.org/codeRepository

    Additionally, that property was missing so far; but a future commit will need to use it to identify fork relationships (node ids are required to representation relationships between documents as we cannot use blank nodes for that)

commit 9f6b75cad02745311f3d29a564b3db2d5b756af7
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Sep 13 13:30:54 2022 +0200

    Add Gitea metadata mapping

commit 3a3a348bd86e714ab016a93617bc197010ee145d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Sep 13 12:34:22 2022 +0200

    GitHub: use correct JSON-LD types for URLs and dates
See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/504/ for more details.
Build is green
Patch application report for D8549 (id=30860)
Could not rebase; Attempt merge onto e25a2f4e4a...
Updating e25a2f4..cdbf090
Fast-forward
 docs/metadata-workflow.rst | 6 +-
 swh/indexer/codemeta.py | 18 ++-
 swh/indexer/data/Gitea.csv | 68 ++++++++++
 swh/indexer/metadata_dictionary/__init__.py | 15 ++-
 swh/indexer/metadata_dictionary/base.py | 108 ++++++++++------
 swh/indexer/metadata_dictionary/cff.py | 5 +-
 swh/indexer/metadata_dictionary/composer.py | 4 +-
 swh/indexer/metadata_dictionary/dart.py | 4 +-
 swh/indexer/metadata_dictionary/gitea.py | 124 ++++++++++++++++++
 swh/indexer/metadata_dictionary/github.py | 19 ++-
 swh/indexer/metadata_dictionary/nuget.py | 4 +-
 .../tests/metadata_dictionary/test_gitea.py | 143 +++++++++++++++++++++
 .../tests/metadata_dictionary/test_github.py | 10 +-
 swh/indexer/tests/metadata_dictionary/test_npm.py | 14 ++
 swh/indexer/tests/test_cli.py | 2 +
 swh/indexer/tests/test_metadata.py | 3 +-
 16 files changed, 487 insertions(+), 60 deletions(-)
 create mode 100644 swh/indexer/data/Gitea.csv
 create mode 100644 swh/indexer/metadata_dictionary/gitea.py
 create mode 100644 swh/indexer/tests/metadata_dictionary/test_gitea.py
Changes applied before test
commit cdbf090b14b1db24b0dfb1b3cfac01fb0dbdbd4a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Sep 27 14:13:56 2022 +0200

    Make read_crosstable public and document it.

commit 9b741f2f9f336c2657a1d20196139daac3fe69b1
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Sep 15 08:52:00 2022 +0200

    npm: Add test for 'author' value that used to crash

    It was only fixed as a side-effect of other changes, but it's good to have a regression test

commit ac0e263bbfc17ee2905b97bbbbbb4929419170cd
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Sep 13 17:06:08 2022 +0200

    github and gitea: Use html_url as @id and clone_url as codeRepository

    They are closer semantics as 'html_url' is the main page of the repository, so it is the best to identify it; and 'clone_url' is the URL that should be given to 'git clone', as documented by https://schema.org/codeRepository

    Additionally, that property was missing so far; but a future commit will need to use it to identify fork relationships (node ids are required to representation relationships between documents as we cannot use blank nodes for that)

commit cb435e59ca91ac7b71cff18e5e6b3885e5be9ac1
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Sep 13 13:30:54 2022 +0200

    Add Gitea metadata mapping

commit 20becf4a90fa6b626e972bba3d57db46604cb7b2
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Sep 13 12:34:22 2022 +0200

    GitHub: use correct JSON-LD types for URLs and dates
See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/509/ for more details.