
Make read_crosstable public and document it.
Closed, Public

Authored by vlorentz on Sep 27 2022, 2:14 PM.

Diff Detail

Repository
rDCIDX Metadata indexer
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build has FAILED

Patch application report for D8549 (id=30819)

Could not rebase; Attempt merge onto e25a2f4e4a...

Merge made by the 'recursive' strategy.
 docs/metadata-workflow.rst                         |   6 +-
 swh/indexer/codemeta.py                            |  18 ++-
 swh/indexer/data/Gitea.csv                         |  76 +++++++++++
 swh/indexer/metadata_dictionary/__init__.py        |  15 ++-
 swh/indexer/metadata_dictionary/base.py            | 108 ++++++++++------
 swh/indexer/metadata_dictionary/cff.py             |   5 +-
 swh/indexer/metadata_dictionary/composer.py        |   4 +-
 swh/indexer/metadata_dictionary/dart.py            |   4 +-
 swh/indexer/metadata_dictionary/gitea.py           | 124 ++++++++++++++++++
 swh/indexer/metadata_dictionary/github.py          |  19 ++-
 swh/indexer/metadata_dictionary/nuget.py           |   4 +-
 .../tests/metadata_dictionary/test_gitea.py        | 143 +++++++++++++++++++++
 .../tests/metadata_dictionary/test_github.py       |  10 +-
 swh/indexer/tests/metadata_dictionary/test_npm.py  |  14 ++
 swh/indexer/tests/test_cli.py                      |   2 +
 swh/indexer/tests/test_metadata.py                 |   3 +-
 16 files changed, 495 insertions(+), 60 deletions(-)
 create mode 100644 swh/indexer/data/Gitea.csv
 create mode 100644 swh/indexer/metadata_dictionary/gitea.py
 create mode 100644 swh/indexer/tests/metadata_dictionary/test_gitea.py
Changes applied before test
commit daf82fe3b882631386d7ecfdae389216efcf4a29
Merge: e25a2f4 db48a6c
Author: Jenkins user <jenkins@localhost>
Date:   Tue Sep 27 12:14:33 2022 +0000

    Merge branch 'diff-target' into HEAD

commit db48a6c6cb478f625068f7d7fb2bdb747da6c711
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 27 14:13:56 2022 +0200

    Make crosswalk_table public and document it.

commit b57c99dd89850dbe610669864a8ee003ef37bbc4
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Sep 15 08:52:00 2022 +0200

    npm: Add test for 'author' value that used to crash
    
    It was only fixed as a side-effect of other changes, but it's good
    to have a regression test

commit 9d7a6a47e157d443849dc749765ecb010ba856c2
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 13 17:06:08 2022 +0200

    github and gitea: Use html_url as @id and clone_url as codeRepository
    
    They are closer semantics as 'html_url' is the main page of the repository,
    so it is the best to identify it; and 'clone_url' is the URL that should
    be given to 'git clone', as documented by https://schema.org/codeRepository
    
    Additionally, that property was missing so far; but a future commit will
    need to use it to identify fork relationships (node ids are required to
    representation relationships between documents as we cannot use blank
    nodes for that)

commit 9f6b75cad02745311f3d29a564b3db2d5b756af7
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 13 13:30:54 2022 +0200

    Add Gitea metadata mapping

commit 3a3a348bd86e714ab016a93617bc197010ee145d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 13 12:34:22 2022 +0200

    GitHub: use correct JSON-LD types for URLs and dates

Link to build: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/502/
See console output for more information: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/502/console

Harbormaster returned this revision to the author for changes because remote builds failed. Sep 27 2022, 2:15 PM
Harbormaster failed remote builds in B31795: Diff 30819!
vlorentz retitled this revision from "Make crosswalk_table public and document it." to "Make read_crosstable public and document it." Sep 27 2022, 3:18 PM

fix commit msg + commit set + remove useless lines

Build is green

Patch application report for D8549 (id=30836)

Rebasing onto e25a2f4e4a...

First, rewinding head to replay your work on top of it...
Applying: GitHub: use correct JSON-LD types for URLs and dates
Applying: Add Gitea metadata mapping
Applying: github and gitea: Use html_url as @id and clone_url as codeRepository
Applying: npm: Add test for 'author' value that used to crash
Applying: Make crosswalk_table public and document it.
Changes applied before test
commit c4d2052d69587cf044243ec67bae6180bb4316ff
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 27 14:13:56 2022 +0200

    Make crosswalk_table public and document it.

commit e2328e80a2125a57f7f471469d10ec17a17ed3c2
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Sep 15 08:52:00 2022 +0200

    npm: Add test for 'author' value that used to crash
    
    It was only fixed as a side-effect of other changes, but it's good
    to have a regression test

commit 66081ea913f508e996294e4349ad0b926bce5de6
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 13 17:06:08 2022 +0200

    github and gitea: Use html_url as @id and clone_url as codeRepository
    
    They are closer semantics as 'html_url' is the main page of the repository,
    so it is the best to identify it; and 'clone_url' is the URL that should
    be given to 'git clone', as documented by https://schema.org/codeRepository
    
    Additionally, that property was missing so far; but a future commit will
    need to use it to identify fork relationships (node ids are required to
    representation relationships between documents as we cannot use blank
    nodes for that)

commit 9f5f97b4da3322417db7dd1637068affcfa8874d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 13 13:30:54 2022 +0200

    Add Gitea metadata mapping

commit aae740de8c6d4913d7ab951fa9e582599b0be3d7
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 13 12:34:22 2022 +0200

    GitHub: use correct JSON-LD types for URLs and dates

See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/503/ for more details.

Build is green

Patch application report for D8549 (id=30842)

Could not rebase; Attempt merge onto e25a2f4e4a...

Merge made by the 'recursive' strategy.
 docs/metadata-workflow.rst                         |   6 +-
 swh/indexer/codemeta.py                            |  18 ++-
 swh/indexer/data/Gitea.csv                         |  68 ++++++++++
 swh/indexer/metadata_dictionary/__init__.py        |  15 ++-
 swh/indexer/metadata_dictionary/base.py            | 108 ++++++++++------
 swh/indexer/metadata_dictionary/cff.py             |   5 +-
 swh/indexer/metadata_dictionary/composer.py        |   4 +-
 swh/indexer/metadata_dictionary/dart.py            |   4 +-
 swh/indexer/metadata_dictionary/gitea.py           | 124 ++++++++++++++++++
 swh/indexer/metadata_dictionary/github.py          |  19 ++-
 swh/indexer/metadata_dictionary/nuget.py           |   4 +-
 .../tests/metadata_dictionary/test_gitea.py        | 143 +++++++++++++++++++++
 .../tests/metadata_dictionary/test_github.py       |  10 +-
 swh/indexer/tests/metadata_dictionary/test_npm.py  |  14 ++
 swh/indexer/tests/test_cli.py                      |   2 +
 swh/indexer/tests/test_metadata.py                 |   3 +-
 16 files changed, 487 insertions(+), 60 deletions(-)
 create mode 100644 swh/indexer/data/Gitea.csv
 create mode 100644 swh/indexer/metadata_dictionary/gitea.py
 create mode 100644 swh/indexer/tests/metadata_dictionary/test_gitea.py
Changes applied before test
commit 685e591fb151b98e72354d01fd129ff58411e93d
Merge: e25a2f4 0b0f5f4
Author: Jenkins user <jenkins@localhost>
Date:   Tue Sep 27 13:20:37 2022 +0000

    Merge branch 'diff-target' into HEAD

commit 0b0f5f42e95a443c58bb2156937be3848bfe6ee2
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 27 14:13:56 2022 +0200

    Make read_crosstable public and document it.

commit b57c99dd89850dbe610669864a8ee003ef37bbc4
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Sep 15 08:52:00 2022 +0200

    npm: Add test for 'author' value that used to crash
    
    It was only fixed as a side-effect of other changes, but it's good
    to have a regression test

commit 9d7a6a47e157d443849dc749765ecb010ba856c2
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 13 17:06:08 2022 +0200

    github and gitea: Use html_url as @id and clone_url as codeRepository
    
    They are closer semantics as 'html_url' is the main page of the repository,
    so it is the best to identify it; and 'clone_url' is the URL that should
    be given to 'git clone', as documented by https://schema.org/codeRepository
    
    Additionally, that property was missing so far; but a future commit will
    need to use it to identify fork relationships (node ids are required to
    representation relationships between documents as we cannot use blank
    nodes for that)

commit 9f6b75cad02745311f3d29a564b3db2d5b756af7
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 13 13:30:54 2022 +0200

    Add Gitea metadata mapping

commit 3a3a348bd86e714ab016a93617bc197010ee145d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 13 12:34:22 2022 +0200

    GitHub: use correct JSON-LD types for URLs and dates

See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/504/ for more details.

Build is green

Patch application report for D8549 (id=30860)

Could not rebase; Attempt merge onto e25a2f4e4a...

Updating e25a2f4..cdbf090
Fast-forward
 docs/metadata-workflow.rst                         |   6 +-
 swh/indexer/codemeta.py                            |  18 ++-
 swh/indexer/data/Gitea.csv                         |  68 ++++++++++
 swh/indexer/metadata_dictionary/__init__.py        |  15 ++-
 swh/indexer/metadata_dictionary/base.py            | 108 ++++++++++------
 swh/indexer/metadata_dictionary/cff.py             |   5 +-
 swh/indexer/metadata_dictionary/composer.py        |   4 +-
 swh/indexer/metadata_dictionary/dart.py            |   4 +-
 swh/indexer/metadata_dictionary/gitea.py           | 124 ++++++++++++++++++
 swh/indexer/metadata_dictionary/github.py          |  19 ++-
 swh/indexer/metadata_dictionary/nuget.py           |   4 +-
 .../tests/metadata_dictionary/test_gitea.py        | 143 +++++++++++++++++++++
 .../tests/metadata_dictionary/test_github.py       |  10 +-
 swh/indexer/tests/metadata_dictionary/test_npm.py  |  14 ++
 swh/indexer/tests/test_cli.py                      |   2 +
 swh/indexer/tests/test_metadata.py                 |   3 +-
 16 files changed, 487 insertions(+), 60 deletions(-)
 create mode 100644 swh/indexer/data/Gitea.csv
 create mode 100644 swh/indexer/metadata_dictionary/gitea.py
 create mode 100644 swh/indexer/tests/metadata_dictionary/test_gitea.py
Changes applied before test
commit cdbf090b14b1db24b0dfb1b3cfac01fb0dbdbd4a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 27 14:13:56 2022 +0200

    Make read_crosstable public and document it.

commit 9b741f2f9f336c2657a1d20196139daac3fe69b1
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Sep 15 08:52:00 2022 +0200

    npm: Add test for 'author' value that used to crash
    
    It was only fixed as a side-effect of other changes, but it's good
    to have a regression test

commit ac0e263bbfc17ee2905b97bbbbbb4929419170cd
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 13 17:06:08 2022 +0200

    github and gitea: Use html_url as @id and clone_url as codeRepository
    
    They are closer semantics as 'html_url' is the main page of the repository,
    so it is the best to identify it; and 'clone_url' is the URL that should
    be given to 'git clone', as documented by https://schema.org/codeRepository
    
    Additionally, that property was missing so far; but a future commit will
    need to use it to identify fork relationships (node ids are required to
    representation relationships between documents as we cannot use blank
    nodes for that)

commit cb435e59ca91ac7b71cff18e5e6b3885e5be9ac1
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 13 13:30:54 2022 +0200

    Add Gitea metadata mapping

commit 20becf4a90fa6b626e972bba3d57db46604cb7b2
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 13 12:34:22 2022 +0200

    GitHub: use correct JSON-LD types for URLs and dates

See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/509/ for more details.

This revision was not accepted when it landed; it landed in state Needs Review. Sep 27 2022, 5:37 PM
This revision was automatically updated to reflect the committed changes.
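
For readers skimming the commit stack, the commit "github and gitea: Use html_url as @id and clone_url as codeRepository" can be illustrated with a minimal sketch. This is not the code from swh/indexer/metadata_dictionary/github.py or gitea.py; the function name `repo_to_jsonld`, the sample URLs, and the choice of `@context` and `@type` values are assumptions made for illustration only. It shows only the idea stated in the commit message: the repository's main page identifies the node, and the clone URL is what goes into codeRepository.

```python
import json


def repo_to_jsonld(repo: dict) -> dict:
    """Hypothetical helper (not part of swh.indexer): build a small JSON-LD
    document from a forge API repository record."""
    doc = {
        # Assumption: CodeMeta 2.0 context and a SoftwareSourceCode node type.
        "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
        "@type": "SoftwareSourceCode",
    }
    # 'html_url' is the main page of the repository, so it serves as the node id.
    if "html_url" in repo:
        doc["@id"] = repo["html_url"]
    # 'clone_url' is the URL one would give to 'git clone'.
    if "clone_url" in repo:
        doc["codeRepository"] = repo["clone_url"]
    return doc


if __name__ == "__main__":
    # Made-up record; real forge API payloads carry many more fields.
    sample = {
        "html_url": "https://forge.example.org/alice/project",
        "clone_url": "https://forge.example.org/alice/project.git",
    }
    print(json.dumps(repo_to_jsonld(sample), indent=2))
```

Giving the node an explicit `@id` rather than leaving it blank is what would let a later document point at it, which matches the commit message's remark about fork relationships.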