
Oct 13 2020

vlorentz added a comment to T2667: Decide what to do with PyPI snapshot metadata.

We don't keep the binary indexes from Debian repositories, for instance.

Oct 13 2020, 10:22 AM · Extrinsic metadata, PyPI loader
olasd added a comment to T2667: Decide what to do with PyPI snapshot metadata.

So they're metadata specific to files that we don't archive at all because they're not source? That doesn't sound very useful to keep at all. We don't keep the binary indexes from Debian repositories, for instance.

Oct 13 2020, 10:18 AM · Extrinsic metadata, PyPI loader
vlorentz added a comment to T2667: Decide what to do with PyPI snapshot metadata.

They are metadata on the file itself (file name, checksums, has signature, upload time, file-specific comment (often empty), yank status), so they have nothing in common.

Oct 13 2020, 10:16 AM · Extrinsic metadata, PyPI loader
olasd added a comment to T2667: Decide what to do with PyPI snapshot metadata.

In practice, are there many meaningful differences between the wheel metadata and the sdist metadata? If not, then I think option 3 would be the most sensible.

Oct 13 2020, 9:59 AM · Extrinsic metadata, PyPI loader
vlorentz updated the task description for T2667: Decide what to do with PyPI snapshot metadata.
Oct 13 2020, 9:45 AM · Extrinsic metadata, PyPI loader

Oct 12 2020

vlorentz added a comment to T2668: Package loaders should write extrinsic metadata on directories instead of revisions/releases.

FTR, olasd, douardda and I discussed an inconsistency in the keys used in Kafka, and decided to use hashes for all origins/visits/visit statuses. Doing the same for extrinsic metadata in both Kafka and the DB solves the issue of defining uniqueness.

Oct 12 2020, 1:52 PM · Package Loader, Storage manager, Extrinsic metadata
vlorentz added a subtask for T2668: Package loaders should write extrinsic metadata on directories instead of revisions/releases: T2686: Use hashes for all kafka keys.
Oct 12 2020, 1:06 PM · Package Loader, Storage manager, Extrinsic metadata
vlorentz added a comment to T2668: Package loaders should write extrinsic metadata on directories instead of revisions/releases.

@rdicosmo a full example of what?

Oct 12 2020, 10:57 AM · Package Loader, Storage manager, Extrinsic metadata
rdicosmo added a comment to T2668: Package loaders should write extrinsic metadata on directories instead of revisions/releases.

The suggestion was to have extrinsic metadata on directories that come from a deposit of a bundle (e.g. .tar.gz or .zip file coming from HAL), instead of on a synthetic revision as is currently the case, so they can be accessed knowing the hash of the directory (which is an intrinsic id).

Oct 12 2020, 10:44 AM · Package Loader, Storage manager, Extrinsic metadata

Oct 8 2020

vlorentz added a comment to T2668: Package loaders should write extrinsic metadata on directories instead of revisions/releases.

Alternatively, we could keep writing the metadata on revisions/releases, and use the provenance service (when it's ready) to find them from a directory SWHID. What do you think?

Oct 8 2020, 11:47 AM · Package Loader, Storage manager, Extrinsic metadata

Oct 6 2020

vlorentz updated the task description for T2668: Package loaders should write extrinsic metadata on directories instead of revisions/releases.
Oct 6 2020, 10:45 AM · Package Loader, Storage manager, Extrinsic metadata
rdicosmo updated subscribers of T2668: Package loaders should write extrinsic metadata on directories instead of revisions/releases.
Oct 6 2020, 10:37 AM · Package Loader, Storage manager, Extrinsic metadata
vlorentz renamed T2668: Package loaders should write extrinsic metadata on directories instead of revisions/releases from Package loaders write extrinsic metadata on directories instead of revisions/releases to Package loaders should write extrinsic metadata on directories instead of revisions/releases.
Oct 6 2020, 10:30 AM · Package Loader, Storage manager, Extrinsic metadata
vlorentz triaged T2668: Package loaders should write extrinsic metadata on directories instead of revisions/releases as Normal priority.
Oct 6 2020, 10:30 AM · Package Loader, Storage manager, Extrinsic metadata
vlorentz triaged T2667: Decide what to do with PyPI snapshot metadata as Normal priority.
Oct 6 2020, 10:19 AM · Extrinsic metadata, PyPI loader

Sep 18 2020

moranegg edited projects for T2202: Collect extrinsic metadata, added: Extrinsic metadata; removed Metadata workflow.
Sep 18 2020, 2:39 PM · Roadmap 2022, meta-task, Roadmap 2021, Extrinsic metadata
moranegg added a subtask for T833: When listing an origin, add origin level metadata to RMD storage: T1740: fetch extrinsic origin metadata from GitHub.
Sep 18 2020, 2:36 PM · Extrinsic metadata, Restricted Project, GitHub lister
moranegg added a parent task for T833: When listing an origin, add origin level metadata to RMD storage: T2202: Collect extrinsic metadata.
Sep 18 2020, 2:35 PM · Extrinsic metadata, Restricted Project, GitHub lister
moranegg renamed T833: When listing an origin, add origin level metadata to RMD storage from When listing an origin, add origin level metadata to storage to When listing an origin, add origin level metadata to RMD storage.
Sep 18 2020, 2:31 PM · Extrinsic metadata, Restricted Project, GitHub lister
moranegg created Extrinsic metadata.
Sep 18 2020, 2:13 PM