docs/data-model.rst
In addition to the artifacts detailed above used to represent original software
artifacts, the Software Heritage archive stores information about these
artifacts.

**extid**
  a relationship between an original identifier of an artifact, in its
  native/upstream environment, and a `core SWHID <persistent-identifiers>`,
  which is specific to Software Heritage. As such, it includes:

  * the external identifier, stored as bytes whose format is opaque to the
    data model
  * a type (a simple name and a version), to identify the type of relationship
  * the "target", which is a core SWHID

  An extid may also include a "payload", which is arbitrary data about
  the relationship. For example, an extid might link a directory to
  the cryptographic hash of the tarball that originally contained it.
  In this case, the payload could include data useful for
  reconstructing the original tarball from the directory.

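As an illustration of the fields listed above, here is a minimal Python sketch of an extid record. It is not the ``swh.model`` API: the class name ``ExtIDSketch``, its field names, the type name ``tarball-sha256`` and the placeholder values are all hypothetical, and the payload is modelled as inline bytes purely for simplicity.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional

# Object types allowed in a core SWHID, and the hex alphabet of its hash part.
_CORE_TYPES = {"cnt", "dir", "rev", "rel", "snp"}
_HEX = set("0123456789abcdef")


@dataclass(frozen=True)
class ExtIDSketch:
    """Hypothetical model of an extid (illustration only)."""

    extid: bytes                     # external identifier, opaque to the data model
    extid_type: str                  # simple name of the relationship type
    extid_version: int               # version of that type
    target: str                      # core SWHID of the archived object
    payload: Optional[bytes] = None  # optional arbitrary data about the relationship

    def __post_init__(self) -> None:
        # A core SWHID looks like swh:1:<type>:<40 hex digits>, with no qualifiers.
        parts = self.target.split(":")
        ok = (
            len(parts) == 4
            and parts[0] == "swh"
            and parts[1] == "1"
            and parts[2] in _CORE_TYPES
            and len(parts[3]) == 40
            and set(parts[3]) <= _HEX
        )
        if not ok:
            raise ValueError(f"not a core SWHID: {self.target!r}")


# Link a directory to the hash of the tarball that originally contained it.
tarball_hash = hashlib.sha256(b"placeholder tarball bytes").digest()
example = ExtIDSketch(
    extid=tarball_hash,
    extid_type="tarball-sha256",      # hypothetical type name
    extid_version=0,
    target="swh:1:dir:" + "0" * 40,   # placeholder SWHID, not a real object
    payload=b"placeholder reconstruction data",
)
```

The validation in ``__post_init__`` only checks the shape of the target; it rejects qualified SWHIDs because the data model requires the target to be a core SWHID.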
**raw extrinsic metadata**
  an opaque bytestring, along with its format (a simple name), an identifier
  of the object the metadata is about and in which context (similar to a
  `qualified SWHID <persistent-identifiers>`), and provenance information
  (the authority who provided it, the fetcher tool used to get it, and the
  date it was discovered).

  It provides both a way to store information about an artifact contributed by
  external entities, after the artifact was created, and an escape hatch to
  store metadata that would not otherwise fit in the data model.
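
The pieces of a raw extrinsic metadata object described above can be sketched in Python as follows. This is a hypothetical illustration, not the ``swh.model`` API: the class name, the field names, the ``origin`` context key, the authority and fetcher strings and the format name are all assumptions, and the discovery information is assumed to be a timestamp.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict


@dataclass(frozen=True)
class RawExtrinsicMetadataSketch:
    """Hypothetical model of a raw extrinsic metadata object (illustration only)."""

    target: str               # identifier of the object the metadata is about
    context: Dict[str, str]   # where it applies, similar to SWHID qualifiers
    authority: str            # who provided the metadata
    fetcher: str              # tool used to get it
    discovery_date: datetime  # when it was discovered (assumed to be a timestamp)
    format: str               # simple name describing the bytestring's format
    metadata: bytes           # the opaque bytestring itself


example_metadata = RawExtrinsicMetadataSketch(
    target="swh:1:dir:" + "0" * 40,                    # placeholder SWHID
    context={"origin": "https://example.org/repo"},    # hypothetical context
    authority="https://forge.example.org/",            # hypothetical authority
    fetcher="example-fetcher 1.0",                     # hypothetical fetcher
    discovery_date=datetime(2021, 1, 1, tzinfo=timezone.utc),
    format="example-json",                             # hypothetical format name
    metadata=b'{"description": "placeholder"}',
)
```

The archive never interprets ``metadata`` in this sketch; only ``format`` tells a consumer how to decode it, which is what makes the object usable as an escape hatch.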
olasd: Maybe worth adding a practical note that the extid payload actually refers to a content node in the archive, rather than being stored inline.