A list compiled by etalab to work from: https://github.com/etalab/data-codes-sources-fr#mod%C3%A8le-de-donn%C3%A9es
| codemeta | schema.org | type | example | Description | etalab_name | etalab type | etalab_example | comment |
|---|---|---|---|---|---|---|---|---|
| name | name | string | RepoName | name of the repository | nom | chaîne de caractères | nom-repertoire | |
| author | author | Person | etalab | the author | organisation_nom | chaîne de caractères | etalab | |
| contributor | contributor | Person | etalab | secondary authors | | chaîne de | | |
| | | URL | https://github.com/ | platform/forge used to host it | plateforme | chaîne de caractères | GitHub | not to be confused with CodeMeta’s runtimePlatform |
| codeRepository | codeRepository | URL | https://github.com/etalab/nom-repertoire | URL to the repository | repertoire_url | chaîne de caractères (format uri) | https://github.com/etalab/nom-repertoire | |
| description | description | string | This repository is useful | repository description | description | chaîne de caractères | Ce répertoire est utile | |
| | | boolean | false | whether the repository is a fork | est_fork | booléen | false | |
| | isBasedOn | URL | https://github.com/etalab/base-repo | the repo this repo is forked from (if any) | | | | |
| dateCreated | dateCreated | date | 2018-12-01T20:00:55Z | creation date | date_creation | date et heure | 2018-12-01T20:00:55Z | |
| dateModified | dateModified | date | 2018-12-01T20:00:55Z | update date | derniere_mise_a_jour | date et heure | 2018-12-01T20:00:55Z | |
| url | url | URL | https://etalab.gouv.fr | homepage URL | page_accueil | chaîne de caractères | https://etalab.gouv.fr | |
| | | int | 42 | number of people who added the repository to their favorites | nombre_stars | nombre entier | 42 | |
| | | int | 13 | number of times the repository was forked | nombre_forks | nombre entier | 13 | related schema.org property: @reverse → isBasedOn |
| license | license | URL | https://spdx.org/licenses/MIT | license of the repository, as detected or specified by the platform | licence | chaîne de caractères | MIT | |
| | | int | 0 | number of open issues | nombre_issues_ouvertes | nombre entier | 0 | |
| issueTracker | | URL | https://github.com/etalab/repo/issues | link to the bug tracker | | | | |
| programmingLanguage | programmingLanguage | string | Python | main language(s) as detected or specified by the platform | langage | chaîne de caractères | Python | |
| keywords | keywords | string | useful,france,opendata | | topics | chaîne de caractères | utile,france,opendata | |
| contIntegration | | URL | https://travis.org/etalab/repo/ | link to the continuous integration service | | | | |
| readme | | URL | | link to the README file | | | | |
| developmentStatus | | string | active | e.g. Active, inactive, suspended | | | | |
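
To make the mapping in the table more concrete, here is a minimal Python sketch that turns one record in etalab's format into a CodeMeta-style dictionary. The names `ETALAB_TO_CODEMETA` and `etalab_to_codemeta` are hypothetical; only the field names come from the table. Fields without a CodeMeta or schema.org counterpart (est_fork, nombre_stars, and so on) are simply dropped, the author is passed through as a plain string rather than a Person, and building an SPDX URL from the licence value is an assumption based on the example row.

```python
# Hypothetical crosswalk built from the table rows that have both an
# etalab name and a CodeMeta/schema.org property.
ETALAB_TO_CODEMETA = {
    "nom": "name",
    "organisation_nom": "author",
    "repertoire_url": "codeRepository",
    "description": "description",
    "date_creation": "dateCreated",
    "derniere_mise_a_jour": "dateModified",
    "page_accueil": "url",
    "licence": "license",
    "langage": "programmingLanguage",
    "topics": "keywords",
}


def etalab_to_codemeta(record):
    """Translate one etalab record (a dict) into a CodeMeta-style dict."""
    doc = {
        "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
        "@type": "SoftwareSourceCode",
    }
    for etalab_name, codemeta_name in ETALAB_TO_CODEMETA.items():
        value = record.get(etalab_name)
        if value in (None, ""):
            continue  # skip fields the forge did not provide
        if etalab_name == "licence":
            # Assumption: the value is an SPDX identifier such as "MIT".
            value = f"https://spdx.org/licenses/{value}"
        doc[codemeta_name] = value
    return doc
```

Calling `etalab_to_codemeta({"nom": "nom-repertoire", "licence": "MIT", "nombre_stars": 42})` would keep the name and license and leave out the star count, since the table lists no CodeMeta property for it.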
@moranegg, @AntoineAugusti any ideas of stuff to add to this list, before I start reviewing what can actually be fetched from each forge?
In addition, it would be nice to know the number of contributors and get a sense of how active the project is. The latest commit date on the main branch could serve as a proxy for that.
I did a quick review of the different forges a while ago, and GitHub seemed to expose the most metadata at the organisation level, which makes retrieval much easier.
> In addition, it would be nice to know the number of contributors
Good idea! I added it to the list.
> get a sense of how active the project is. The latest commit date on the main branch could serve as a proxy for that.
There's developmentStatus, but I highly doubt many projects define it (I've seen it as a badge on some GitHub repos, but it's rare). Using "dateModified" seems like a good idea indeed.
Actually, we can get "dateModified" from data already in the SWH archive: on each visit of a repo we take a snapshot of the repo and hash it, so it's just a matter of listing the visits and finding the last change to this hash. But it's rather coarse-grained, since we only take a snapshot of each repo every one or two years.
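
As a rough illustration of "listing the visits and finding the last change to this hash", here is a small Python sketch. It works on an in-memory list of `(visit_date, snapshot_id)` pairs; that shape and the function name `last_snapshot_change` are assumptions made for the example, not the archive's actual data model or API.

```python
from datetime import datetime


def last_snapshot_change(visits):
    """Return the date of the first visit that observed the current snapshot.

    `visits` is an iterable of (visit_date, snapshot_id) pairs (a
    hypothetical shape). The repository changed at some point between the
    previous visit and the returned date, so the result is only an upper
    bound on the real modification date.
    """
    ordered = sorted(visits, key=lambda visit: visit[0])
    if not ordered:
        return None
    changed_at, current = ordered[-1]
    # Walk backwards for as long as the snapshot hash stays the same.
    for visit_date, snapshot_id in reversed(ordered):
        if snapshot_id != current:
            break
        changed_at = visit_date
    return changed_at


visits = [
    (datetime(2018, 1, 1), "aaa"),
    (datetime(2019, 1, 1), "bbb"),
    (datetime(2020, 1, 1), "bbb"),
]
print(last_snapshot_change(visits))
```

With visits one or two years apart, the returned date can be off by that much, which is the coarseness mentioned above.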
I think we should fetch all the metadata we find and keep it in its raw form (keep XML as XML, etc.), then apply translation techniques with CodeMeta while identifying the relevant metadata we want to keep in a translated format, so there is no need to discriminate and choose what to fetch.
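
A possible shape for this "keep raw, translate on the side" approach, sketched in Python. Everything here is hypothetical: the `FetchedMetadata` container, the `ingest_xml` function, and the `XML_TO_CODEMETA` crosswalk, which maps the tags of an imaginary XML format to CodeMeta properties.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Hypothetical crosswalk from the tags of an imaginary XML format to CodeMeta.
XML_TO_CODEMETA = {
    "title": "name",
    "summary": "description",
    "homepage": "url",
}


@dataclass
class FetchedMetadata:
    raw: str         # payload exactly as fetched, never modified
    raw_format: str  # e.g. "xml"
    translated: dict = field(default_factory=dict)


def ingest_xml(raw):
    """Keep the raw XML and translate only the tags found in the crosswalk."""
    root = ET.fromstring(raw)
    translated = {}
    for child in root:
        target = XML_TO_CODEMETA.get(child.tag)
        if target is not None and child.text:
            translated[target] = child.text.strip()
    return FetchedMetadata(raw=raw, raw_format="xml", translated=translated)
```

Because the raw payload is stored untouched, a tag the crosswalk ignores today can still be translated later without fetching again.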
Here are a few rarer metadata fields that are useful in certain use cases:
- datePublished: we use it for HAL
- referencedPublication: used for software citation
- releaseNotes: will be used in the deposit use case for creating releases
Here is the list of metadata we worked on for the HAL specifications:
https://forge.softwareheritage.org/P183