Choose/define an ontology to use for indexed extrinsic origin metadata
Closed, Migrated

Description

Indexers will translate the API responses of GitHub, GitLab, Gogs, Gitea, and other forges into this ontology.

The grammar itself would most likely be JSON-LD, so that it is compatible with our other metadata (CodeMeta/schema.org).
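As a rough illustration of what such an indexer could look like, the sketch below translates a few fields of a GitHub repository API response into a JSON-LD document, using schema.org terms listed in the comparison table of this task. The function name `github_to_jsonld`, the use of `html_url` as `@id`, and `https://schema.org/` as the only context are assumptions made for the example, not a decided design.

```python
import json
from typing import Any, Dict

# GitHub field -> schema.org property, taken from the comparison table in this task.
GITHUB_TO_SCHEMA = {
    "name": "name",
    "description": "description",
    "homepage": "url",
    "topics": "keywords",
    "created_at": "dateCreated",
    "language": "programmingLanguage",
}


def github_to_jsonld(repo: Dict[str, Any]) -> Dict[str, Any]:
    """Translate part of a GitHub repository API response into JSON-LD.

    Hypothetical helper: fields that are missing, null, or empty are skipped.
    """
    doc: Dict[str, Any] = {"@context": "https://schema.org/"}
    if repo.get("html_url"):
        # Assumption: the repository's web URL is used as its identifier.
        doc["@id"] = repo["html_url"]
    for github_field, schema_property in GITHUB_TO_SCHEMA.items():
        value = repo.get(github_field)
        if value not in (None, "", []):
            doc[schema_property] = value
    stars = repo.get("stargazers_count")
    if stars is not None:
        # Star count expressed as an interaction counter on LikeAction.
        doc["interactionStatistic"] = {
            "@type": "InteractionCounter",
            "interactionType": {"@type": "LikeAction"},
            "userInteractionCount": stars,
        }
    return doc


if __name__ == "__main__":
    sample = {
        "html_url": "https://github.com/example/repo",
        "name": "repo",
        "topics": ["archive"],
        "stargazers_count": 42,
    }
    print(json.dumps(github_to_jsonld(sample), indent=2))
```

No `@type` is set on purpose: as noted in the options list, repositories are not the same objects as the source code or applications that CodeMeta describes, so the choice of type is left open here.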

Current options:

  • create our own
  • pick an existing forge's ontology, and map everything to it -> this would eventually turn into inventing our own ontology anyway
  • ForgeFed's ontology (and ActivityPub) -> not very suitable for us; at the moment it's a disjoint set from what we want (ForgeFed only cares about issues and PRs)
  • GHTorrent -> I am told they are working on a generic way to represent data that isn't tied to GitHub, but I cannot find it
  • schema.org (not CodeMeta, as it is only meant to describe source code and applications, not repositories, which are different objects)
  • Wikidata (e.g. https://www.wikidata.org/wiki/Property:P9100 )
  • DOAP

Namespace prefixes used below:

Description | GitHub (no NS) | GitLab (no NS) | Gitea (no NS) | schema.org / CodeMeta | Wikidata | DOAP | ForgeFed (and ActivityStreams) | libraries.io (no NS)

Name and description
Project name | name | name | name / full_name | schema:name | relies on RDF | relies on RDF + doap:name | as:name | 
Project avatar |  |  | avatar_url | schema:image | P18 (image) |  |  | logo_url
Owner name | owner.login | namespace.name |  |  |  |  | being discussed | 
Owner avatar | owner.avatar_url | namespace.avatar_url |  |  |  |  | being discussed | 
Owner homepage | ? | ? |  |  |  |  | being discussed | 
description | description | description | description | schema:description | relies on RDF | doap:shortdesc + doap:description | as:description (not forge:description!) | description
 | homepage |  | website | schema:url | P2699 (URL); P856 (official website, less accurate but more popular) | doap:homepage + old-homepage |  | homepage
Tags/labels | topics | topics |  | schema:keywords | P9100 (GitHub topic) |  |  | keywords
 |  |  |  | schema:applicationCategory + schema:applicationSubcategory |  | doap:category |  | 
URL to clone/checkout the repo | clone_url | ssh_url_to_repo / http_url_to_repo | clone_url |  | P1324 (source code repository) (kind of) | doap:repository -> (doap:anon-root / doap:location) | forge:cloneUri | 

Dates
 | created_at | created_at | created_at | schema:dateCreated | P571 (inception) | doap:created |  | created_at
 | updated_at | last_activity_at | updated_at | schema:dateModified (kind of) |  |  | as:updated | updated_at
 | pushed_at |  |  | schema:dateModified (kind of) |  |  | (doesn't make sense in its data model) | pushed_at
 |  |  |  | schema:datePublished | P577 (publication date) |  | as:published | 
 |  | marked_for_deletion_on |  |  |  |  |  | 

Relationship with other repositories + current status
whether it's a fork | fork |  | fork |  |  |  |  | 
what it's a fork of | parent + source | parent.web_url | parent.html_url | schema:isBasedOn | P144 (based on) |  | forge:forkedFrom | 
number of forks | forks_count | forks_count | forks_count |  |  |  | forge:forks -> as:totalItems | forks (on projects) / forks_count (on repositories)
list of forks |  |  |  |  |  |  | forge:forks | 
Whether the repository is disabled | archived | archived | archived |  |  |  | being discussed | 
 |  |  | empty |  |  |  |  | 
whether it's a mirror |  | mirror | mirror |  |  |  |  | 
 |  |  | mirror_interval |  |  |  |  | 
 |  |  | mirror_updated |  |  |  |  | 
what it is a mirror of | mirror_url (deprecated) | (no API) | original_url |  |  |  | being discussed | mirror_url
Whether this is a template repository | is_template |  | template |  |  |  |  | 
Presumably, what template repository was used to create this one | template_repository |  |  |  |  |  |  | 
 | visibility (private/internal/public) | visibility (private/internal/public) | internal + private (booleans) |  |  |  |  | 

Social features
 | stargazers_count | star_count | stars_count | schema:interactionStatistic -> filter on schema:LikeAction -> schema:userInteractionCount |  |  | as:likes -> as:totalItems | stargazers_count
 | watchers_count |  | watchers_count | schema:interactionStatistic -> filter on schema:FollowAction -> schema:userInteractionCount |  |  | as:followers -> as:totalItems | subscribers_count
 | watchers |  |  |  |  |  | as:followers | 
 | open_issues / open_issues_count | open_issues_count | open_issues_count |  |  |  |  | open_issues_count
 |  |  | open_pr_counter |  |  |  |  | 
 | (always true if repo not archived) | merge_request_enabled | has_pull_requests |  |  |  |  | 

Configuration
 | default_branch | default_branch | default_branch |  |  |  |  | default_branch
 | has_issues |  | has_issues |  |  |  |  | has_issues
 |  |  |  | codemeta:issueTracker | P1401 (bug tracking system) | doap:bug-database | forge:ticketsTrackedBy / forge:sendPatchesTo (see also) | 
 |  |  | internal_tracker.* + external_tracker.* |  |  |  |  | 
 |  |  |  |  |  | doap:mailing-list |  | 
 |  |  |  |  |  | doap:support-forum |  | 
 |  |  |  |  |  | doap:developer-forum |  | 
 |  | jobs_enabled |  |  |  |  |  | 
 |  | snippets_enabled |  |  |  |  |  | 
 |  | can_create_merge_request_in |  |  |  |  |  | 
 |  | resolve_outdated_diff_discussions |  |  |  |  |  | 
 | (different semantics) | merge_method | allow_merge_commits + allow_rebase + allow_rebase_explicit + allow_squash_merge + default_merge_style |  |  |  |  | 
 |  | squash_option |  |  |  |  |  | 
 | has_projects |  | has_projects |  |  |  |  | 
 | has_downloads |  |  |  |  |  |  | 
 | has_wiki | wiki_enabled | has_wiki |  |  |  |  | has_wiki
 |  |  | external_wiki.* |  |  |  |  | 
 | has_pages |  |  |  |  |  |  | 
 |  | merge_commit_template |  |  |  |  |  | 
 |  | squash_commit_template |  |  |  |  |  | 

Statistics
not documented | size |  |  |  |  |  |  | 
not documented |  |  | size |  |  |  |  | 
not documented |  |  |  |  |  |  |  | size
 |  | statistics.commit_count |  |  |  |  |  | 
 |  | statistics.storage_size |  |  |  |  |  | 
 |  | statistics.repository_size |  |  |  |  |  | 
 |  | statistics.<a few more> |  |  |  |  |  | 
 |  |  | release_counter |  |  |  |  | 

License
SPDX id | license.spdx_id |  |  |  |  |  |  | 
license URL (usually on a small set of domains) | by dereferencing license.url then getting html_url | license.html_url / license.source_url |  | schema:license |  |  |  | 
license URI |  |  |  | schema:license | P275 (copyright license) | doap:license |  | 
possibly inconsistent |  | license.nickname |  |  |  |  |  | 
possibly inconsistent | license.key | license.key |  |  |  |  |  | 
possibly inconsistent | license.name | license.name |  |  |  |  |  | 
 |  |  |  |  |  |  |  | licenses / licenses_normalized / repository_license

Other mined metadata
programming language | language |  | language | schema:programmingLanguage | P277 (programming language) | doap:programming-language |  | language
 |  | readme_url |  | codemeta:readme |  |  |  | 
readme filename |  |  |  |  |  |  |  | has_readme
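
One possible way to use a table like the one above is to store it as data and let a single generic function do the translation for every forge. The sketch below encodes three of its rows (clone URL, last update, fork parent) for GitHub, GitLab and Gitea. The names `FIELD_PATHS`, `get_path` and `translate` are hypothetical, and keying the result by prefixed terms such as `forge:cloneUri` is only an assumption for illustration, since the target ontology has not been chosen.

```python
from typing import Any, Dict, Optional

# Term -> dotted path in each forge's API response, copied from three table rows.
# GitHub's "parent + source" needs custom handling, so it is left out here.
FIELD_PATHS: Dict[str, Dict[str, str]] = {
    "github": {
        "forge:cloneUri": "clone_url",
        "schema:dateModified": "updated_at",
    },
    "gitlab": {
        "forge:cloneUri": "http_url_to_repo",
        "schema:dateModified": "last_activity_at",
        "schema:isBasedOn": "parent.web_url",
    },
    "gitea": {
        "forge:cloneUri": "clone_url",
        "schema:dateModified": "updated_at",
        "schema:isBasedOn": "parent.html_url",
    },
}


def get_path(data: Any, path: str) -> Optional[Any]:
    """Follow a dotted path through nested dicts; return None if any step is missing."""
    for key in path.split("."):
        if not isinstance(data, dict) or key not in data:
            return None
        data = data[key]
    return data


def translate(forge: str, response: Dict[str, Any]) -> Dict[str, Any]:
    """Map one forge's API response onto the common terms, skipping absent values."""
    result: Dict[str, Any] = {}
    for term, path in FIELD_PATHS[forge].items():
        value = get_path(response, path)
        if value is not None:
            result[term] = value
    return result
```

Keeping the correspondences in a plain dictionary means that adding a forge or a term is a data change rather than a code change; rows with different semantics across forges (such as the merge settings) would still need dedicated code.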

Event Timeline

vlorentz triaged this task as Normal priority. May 16 2022, 4:51 PM
vlorentz created this task.
vlorentz updated the task description. (Show Details)

What are the drawbacks of Wikidata?

it is (probably) currently missing many of the terms we want to use

(moved to issue description because comments are painful to edit in Phab)

we should also keep track of F3 (Friendly Forge Format): https://forum.forgefriends.org/t/about-the-friendly-forge-format-f3/681

there is going to be significant overlap with the data we are interested in
