
Tue, Jan 18

vlorentz added a parent task for T75: Check integrity of directories, revisions, and releases: T3860: Document types/sources of object corruption.
Tue, Jan 18, 12:44 PM · Archive content, Restricted Project
vlorentz added a subtask for T3860: Document types/sources of object corruption: T75: Check integrity of directories, revisions, and releases.
Tue, Jan 18, 12:44 PM · Archive content, Documentation
vlorentz triaged T3860: Document types/sources of object corruption as High priority.
Tue, Jan 18, 12:44 PM · Archive content, Documentation

Dec 13 2021

ardumont moved T3722: staging: Deploy package loader v1.0, deposit server v0.16, lister v2.3 from Backlog to Deployed on the SWORD deposit board.
Dec 13 2021, 11:27 AM · System administration, SWORD deposit, Package Loader, Data Model, Archive content

Nov 26 2021

vlorentz added a comment to T3638: Make package loaders create releases objects instead of revisions.

Copy of an email I sent on 2021-11-17:

Nov 26 2021, 4:02 PM · Package Loader, Data Model, Archive content

Nov 22 2021

vlorentz closed T3636: Make the opam loader write extrinsic metadata, a subtask of T3638: Make package loaders create releases objects instead of revisions, as Resolved.
Nov 22 2021, 2:44 PM · Package Loader, Data Model, Archive content
vlorentz closed T3638: Make package loaders create releases objects instead of revisions, a subtask of T1258: Synthesize release objects for all upstream things that match the concept of a release, as Resolved.
Nov 22 2021, 2:44 PM · Archive content
vlorentz closed T3638: Make package loaders create releases objects instead of revisions as Resolved.
Nov 22 2021, 2:43 PM · Package Loader, Data Model, Archive content
ardumont closed T3745: production: Deploy package loader v1.1, deposit server v0.16, lister v2.3, a subtask of T3638: Make package loaders create releases objects instead of revisions, as Resolved.
Nov 22 2021, 2:29 PM · Package Loader, Data Model, Archive content
ardumont closed T3745: production: Deploy package loader v1.1, deposit server v0.16, lister v2.3 as Resolved.
Nov 22 2021, 2:29 PM · System administration, Package Loader, Data Model, Archive content
ardumont moved T3745: production: Deploy package loader v1.1, deposit server v0.16, lister v2.3 from deployed/landed/monitoring to Component upgrades on the System administration board.
Nov 22 2021, 2:29 PM · System administration, Package Loader, Data Model, Archive content
ardumont moved T3745: production: Deploy package loader v1.1, deposit server v0.16, lister v2.3 from in-progress to deployed/landed/monitoring on the System administration board.
Nov 22 2021, 2:05 PM · System administration, Package Loader, Data Model, Archive content
ardumont changed the status of T3745: production: Deploy package loader v1.1, deposit server v0.16, lister v2.3, a subtask of T3638: Make package loaders create releases objects instead of revisions, from Open to Work in Progress.
Nov 22 2021, 2:05 PM · Package Loader, Data Model, Archive content
ardumont changed the status of T3745: production: Deploy package loader v1.1, deposit server v0.16, lister v2.3 from Open to Work in Progress.
Nov 22 2021, 2:05 PM · System administration, Package Loader, Data Model, Archive content
ardumont added a project to T3745: production: Deploy package loader v1.1, deposit server v0.16, lister v2.3: System administration.
Nov 22 2021, 2:04 PM · System administration, Package Loader, Data Model, Archive content
ardumont updated the task description for T3745: production: Deploy package loader v1.1, deposit server v0.16, lister v2.3.
Nov 22 2021, 2:02 PM · System administration, Package Loader, Data Model, Archive content
ardumont updated the task description for T3745: production: Deploy package loader v1.1, deposit server v0.16, lister v2.3.
Nov 22 2021, 1:59 PM · System administration, Package Loader, Data Model, Archive content
ardumont updated the task description for T3745: production: Deploy package loader v1.1, deposit server v0.16, lister v2.3.
Nov 22 2021, 1:56 PM · System administration, Package Loader, Data Model, Archive content
ardumont updated the task description for T3745: production: Deploy package loader v1.1, deposit server v0.16, lister v2.3.
Nov 22 2021, 1:52 PM · System administration, Package Loader, Data Model, Archive content
ardumont updated the task description for T3745: production: Deploy package loader v1.1, deposit server v0.16, lister v2.3.
Nov 22 2021, 1:15 PM · System administration, Package Loader, Data Model, Archive content
ardumont renamed T3745: production: Deploy package loader v1.1, deposit server v0.16, lister v2.3 from production: Deploy package loader v1.0, deposit server v0.16, lister v2.3 to production: Deploy package loader v1.1, deposit server v0.16, lister v2.3.
Nov 22 2021, 1:06 PM · System administration, Package Loader, Data Model, Archive content
vlorentz placed T1260: Extend the release object model to allow synthetic objects up for grabs.
Nov 22 2021, 12:05 PM · Archive content
ardumont triaged T3745: production: Deploy package loader v1.1, deposit server v0.16, lister v2.3 as Normal priority.
Nov 22 2021, 11:30 AM · System administration, Package Loader, Data Model, Archive content

Nov 10 2021

ardumont closed T3722: staging: Deploy package loader v1.0, deposit server v0.16, lister v2.3, a subtask of T3638: Make package loaders create releases objects instead of revisions, as Resolved.
Nov 10 2021, 4:43 PM · Package Loader, Data Model, Archive content
ardumont closed T3722: staging: Deploy package loader v1.0, deposit server v0.16, lister v2.3 as Resolved.
Nov 10 2021, 4:43 PM · System administration, SWORD deposit, Package Loader, Data Model, Archive content
ardumont placed T3722: staging: Deploy package loader v1.0, deposit server v0.16, lister v2.3 up for grabs.
Nov 10 2021, 4:43 PM · System administration, SWORD deposit, Package Loader, Data Model, Archive content
ardumont moved T3722: staging: Deploy package loader v1.0, deposit server v0.16, lister v2.3 from in-progress to deployed/landed/monitoring on the System administration board.
Nov 10 2021, 4:29 PM · System administration, SWORD deposit, Package Loader, Data Model, Archive content
ardumont added a comment to T3722: staging: Deploy package loader v1.0, deposit server v0.16, lister v2.3.

At least loader deposit and npm [1] are fine.

Nov 10 2021, 4:24 PM · System administration, SWORD deposit, Package Loader, Data Model, Archive content
ardumont updated the task description for T3722: staging: Deploy package loader v1.0, deposit server v0.16, lister v2.3.
Nov 10 2021, 4:24 PM · System administration, SWORD deposit, Package Loader, Data Model, Archive content
ardumont moved T3722: staging: Deploy package loader v1.0, deposit server v0.16, lister v2.3 from Backlog to in-progress on the System administration board.
Nov 10 2021, 3:35 PM · System administration, SWORD deposit, Package Loader, Data Model, Archive content
ardumont renamed T3722: staging: Deploy package loader v1.0, deposit server v0.16, lister v2.3 from staging: Deploy package loader v1.0 and deposit server v0.16 to staging: Deploy package loader v1.0, deposit server v0.16, lister v2.3.
Nov 10 2021, 3:33 PM · System administration, SWORD deposit, Package Loader, Data Model, Archive content
ardumont changed the status of T3722: staging: Deploy package loader v1.0, deposit server v0.16, lister v2.3, a subtask of T3638: Make package loaders create releases objects instead of revisions, from Open to Work in Progress.
Nov 10 2021, 3:33 PM · Package Loader, Data Model, Archive content
ardumont changed the status of T3722: staging: Deploy package loader v1.0, deposit server v0.16, lister v2.3 from Open to Work in Progress.
Nov 10 2021, 3:33 PM · System administration, SWORD deposit, Package Loader, Data Model, Archive content
ardumont updated the task description for T3722: staging: Deploy package loader v1.0, deposit server v0.16, lister v2.3.
Nov 10 2021, 3:32 PM · System administration, SWORD deposit, Package Loader, Data Model, Archive content
vlorentz updated the task description for T3722: staging: Deploy package loader v1.0, deposit server v0.16, lister v2.3.
Nov 10 2021, 3:21 PM · System administration, SWORD deposit, Package Loader, Data Model, Archive content
vlorentz added a parent task for T3722: staging: Deploy package loader v1.0, deposit server v0.16, lister v2.3: T3636: Make the opam loader write extrinsic metadata.
Nov 10 2021, 3:20 PM · System administration, SWORD deposit, Package Loader, Data Model, Archive content
ardumont triaged T3722: staging: Deploy package loader v1.0, deposit server v0.16, lister v2.3 as Normal priority.
Nov 10 2021, 3:17 PM · System administration, SWORD deposit, Package Loader, Data Model, Archive content

Nov 9 2021

vlorentz added a revision to T3638: Make package loaders create releases objects instead of revisions: D6618: Document how each package loader populates fields..
Nov 9 2021, 12:39 PM · Package Loader, Data Model, Archive content

Nov 8 2021

vlorentz added revisions to T3638: Make package loaders create releases objects instead of revisions: D6616: Make package loaders write releases instead of revisions, D6617: Use release instead of revision as anchor in SWHID context instead..
Nov 8 2021, 11:58 AM · Package Loader, Data Model, Archive content
vlorentz added a comment to T3638: Make package loaders create releases objects instead of revisions.

Here is an overview of the fields (+ internal version name + branch name) used by each package loader, after D6616:

Nov 8 2021, 11:50 AM · Package Loader, Data Model, Archive content

Oct 22 2021

vlorentz added revisions to T1258: Synthesize release objects for all upstream things that match the concept of a release: D6529: deposit: Remove 'parent' deposit, D6530: Remove unused 'known_artifacts' code.
Oct 22 2021, 3:45 PM · Archive content
vlorentz added a comment to T75: Check integrity of directories, revisions, and releases.

Great news: of the 469k corrupt SVN revisions, all but 14 (yes, 14) can be fixed simply by adding 1 microsecond to their timestamp.

Oct 22 2021, 2:33 PM · Archive content, Restricted Project

Oct 20 2021

vlorentz added a comment to T75: Check integrity of directories, revisions, and releases.

After further investigation, I can't find any directory that is in a completely bad order; they are either ordered like git does (by adding a / at the end of dir entries) or by assuming a null byte at the end of dir entries.

Oct 20 2021, 12:18 PM · Archive content, Restricted Project
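The two orderings mentioned in the comment above can be compared with a minimal sketch. The function names and sample entries are hypothetical, and reading "dir entries" as "entries that are directories" is an assumption; this is not the code used for the analysis.

```python
def git_sort_key(name: bytes, is_dir: bool) -> bytes:
    # git compares tree entries as if directory names ended with "/"
    return name + b"/" if is_dir else name


def nul_sort_key(name: bytes, is_dir: bool) -> bytes:
    # alternative ordering: directory names compared as if they ended with a null byte
    return name + b"\x00" if is_dir else name


# (name, is_dir) pairs; "foo" is a directory, the other two are files
entries = [(b"foo", True), (b"foo.txt", False), (b"foo-bar", False)]

git_order = sorted(entries, key=lambda e: git_sort_key(*e))
nul_order = sorted(entries, key=lambda e: nul_sort_key(*e))

print([name for name, _ in git_order])
print([name for name, _ in nul_order])
```

Because "-" (0x2d) and "." (0x2e) both sort before "/" (0x2f) but after the null byte, the directory lands last in the git ordering and first in the null-byte ordering, so the two layouts hash differently.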

Oct 15 2021

vlorentz added a comment to T75: Check integrity of directories, revisions, and releases.

analysis on directories (some are also part of the fixable_trivial above, but I don't have the exact number, I lost it in my analysis):

Oct 15 2021, 11:21 AM · Archive content, Restricted Project
zack updated subscribers of T3656: Survey revisions/releases with partially loaded history.
Oct 15 2021, 9:34 AM · Archive content
zack added a comment to T3656: Survey revisions/releases with partially loaded history.
In T3656#72364, @grouss wrote:

according to the list of nodes provided by seirl there were ~21,000,000 revisions without ancestors according to swh-graph snapshot (2020-12-15)

Oct 15 2021, 9:33 AM · Archive content
grouss added a comment to T3656: Survey revisions/releases with partially loaded history.

according to the list of nodes provided by seirl there were ~21,000,000 revisions without ancestors according to swh-graph snapshot (2020-12-15)
Checking against the current live SWH DAG two days ago, 98% of them have one in release or snapshot_branch.
I was indeed surprised, because I didn't have to loop over the revision history.

Oct 15 2021, 9:25 AM · Archive content
olasd added a project to T3660: Nodes with missing ancestors in SWH DAG / SWH-graph: Archive content.
Oct 15 2021, 9:17 AM · Archive content
ardumont added a comment to T3656: Survey revisions/releases with partially loaded history.

You might be interested in what @grouss just opened in T3660
(ah, scratch that, zack already mentioned it)

Oct 15 2021, 9:07 AM · Archive content
zack added a subtask for T3656: Survey revisions/releases with partially loaded history: T3660: Nodes with missing ancestors in SWH DAG / SWH-graph.
Oct 15 2021, 8:56 AM · Archive content
zack updated subscribers of T3656: Survey revisions/releases with partially loaded history.

In T3660, @grouss has found many more.
Might be for a different reason (the dataset he analyzed is not the live one), but it's worth a comparison.

Oct 15 2021, 8:55 AM · Archive content
olasd added a comment to T3656: Survey revisions/releases with partially loaded history.
21:57 guest@softwareheritage => select count(distinct id) from revision_history where not exists (select 1 from revision where id=parent_id);
 count 
───────
  2218
(1 row)
Oct 15 2021, 8:50 AM · Archive content

Oct 14 2021

olasd triaged T3656: Survey revisions/releases with partially loaded history as Low priority.
Oct 14 2021, 11:40 AM · Archive content

Oct 13 2021

vlorentz added a comment to T75: Check integrity of directories, revisions, and releases.

My script finished running on releases. Result: all 644k releases are recoverable (mostly just missing gpg signatures), except 75k whose origin does not exist anymore.

Oct 13 2021, 6:40 PM · Archive content, Restricted Project

Oct 11 2021

vlorentz added a comment to T75: Check integrity of directories, revisions, and releases.

this commit has lots of one-line "gpgsig" headers instead of a single "gpgsig" header with a multiline value: https://github.com/git-for-windows/git/commit/5f549aa2f78314ac37bbd436c8f80aea4c752e07

Oct 11 2021, 3:07 PM · Archive content, Restricted Project
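To illustrate the difference described in the comment above: in git's commit format, a multiline header value is written once, with each continuation line prefixed by a single space. The sketch below contrasts that with repeating the header on every line. `format_header` is a hypothetical helper and the signature lines are placeholders, not a real signature.

```python
def format_header(key: bytes, value: bytes) -> bytes:
    # A multiline value is serialized with continuation lines starting with one space
    return key + b" " + value.replace(b"\n", b"\n ") + b"\n"


# Placeholder signature body, for illustration only
sig_lines = [
    b"-----BEGIN PGP SIGNATURE-----",
    b"abc",
    b"-----END PGP SIGNATURE-----",
]

# Usual layout: one "gpgsig" header with a multiline value
single = format_header(b"gpgsig", b"\n".join(sig_lines))

# Layout seen in the commit above: one "gpgsig" header per line
repeated = b"".join(format_header(b"gpgsig", line) for line in sig_lines)

print(single.decode())
print(repeated.decode())
```

The two serializations carry the same signature text but are different byte strings, so a loader that rebuilds the commit from a single multiline value cannot reproduce the original hash.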

Oct 8 2021

vlorentz added a subtask for T3638: Make package loaders create releases objects instead of revisions: T3636: Make the opam loader write extrinsic metadata.
Oct 8 2021, 2:32 PM · Package Loader, Data Model, Archive content
vlorentz added projects to T3638: Make package loaders create releases objects instead of revisions: Data Model, Package Loader.
Oct 8 2021, 2:30 PM · Package Loader, Data Model, Archive content
vlorentz triaged T3638: Make package loaders create releases objects instead of revisions as Normal priority.
Oct 8 2021, 2:30 PM · Package Loader, Data Model, Archive content

Oct 5 2021

vlorentz claimed T1260: Extend the release object model to allow synthetic objects.
Oct 5 2021, 5:50 PM · Archive content
vlorentz added a comment to T75: Check integrity of directories, revisions, and releases.

Aggregated stats at the moment:

Oct 5 2021, 2:30 PM · Archive content, Restricted Project

Oct 1 2021

vlorentz added a comment to T75: Check integrity of directories, revisions, and releases.
  1. "nonce" header is *after* gpgsig
  2. double "author" field in the original, and another commit with three "committer"....
  3. "mergetag" headers with an extra newline at the end (current versions of the loader strip it, looks like older ones didn't)
  4. "author xxx <yyy@gmail.com> <type 'int'> -0200" in original commit (dulwich obviously can't parse this)
Oct 1 2021, 8:50 PM · Archive content, Restricted Project

Sep 30 2021

vlorentz added a comment to T75: Check integrity of directories, revisions, and releases.
  1. some negative UTCs don't have the boolean set to true (the only example I have is also missing a space in the same line, so this might be why)
  2. "0000" as offset (with no sign), dulwich can't even parse this
  3. "+0575" as offset; parsed as "+0615"
  4. "\r" in an email address was stripped
Sep 30 2021, 12:00 PM · Archive content, Restricted Project
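The "+0575" case in the list above is consistent with reading the offset as hours and minutes, converting it to a minute count, and serializing it back. A minimal sketch of that arithmetic follows; `normalize_offset` is a hypothetical helper, not the loader's or dulwich's actual code.

```python
def normalize_offset(raw: str) -> str:
    """Read a '+HHMM' style offset as hours and minutes, then re-serialize it."""
    sign = -1 if raw[0] == "-" else 1
    hours, minutes = int(raw[1:3]), int(raw[3:5])
    total = sign * (hours * 60 + minutes)  # offset in minutes
    sign_char = "-" if total < 0 else "+"
    h, m = divmod(abs(total), 60)
    return f"{sign_char}{h:02d}{m:02d}"


print(normalize_offset("+0575"))  # 5 h 75 min = 375 min = 6 h 15 min
print(normalize_offset("-0000"))  # the sign is lost once the offset is a plain integer
```

The second call also hints at why a separate "negative UTC" boolean is needed: an integer minute count cannot distinguish "-0000" from "+0000".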

Sep 29 2021

vlorentz added a comment to T75: Check integrity of directories, revisions, and releases.
  1. "author xxx<yyy@googlemail.com> <xxx<yyy@googlemail.com>> 1282956323 +0200" parsed as "author xxx<yyy@googlemail.com> <xxx <yy@googlemail.com>> 1282956323 +0200" (yes there is a fullname in the email)
  2. a few commits with this timezone: "--700", parsed as "+0700"
  3. Some weird offsets parsed as "+0000": "+1400"/"+051800"/"+1558601"
  4. looks like old git loaders tried to decode and re-encode name and/or email, which messed it up when not UTF8
Sep 29 2021, 6:38 PM · Archive content, Restricted Project
vlorentz added a comment to T75: Check integrity of directories, revisions, and releases.

some new ones:

Sep 29 2021, 5:56 PM · Archive content, Restricted Project

Sep 23 2021

vlorentz renamed T75: Check integrity of directories, revisions, and releases from Check integrity of Revisions and Releases to Check integrity of directories, revisions, and releases.
Sep 23 2021, 3:55 PM · Archive content, Restricted Project

Sep 20 2021

vlorentz added a comment to T75: Check integrity of directories, revisions, and releases.

swh:1:dir:880034219b47a123b97161de4e0d3301746cec75 (from https://github.com/kentnl/Dist-Zilla-PluginBundle-Author-KENTNL) has a single entry; we stored it with mode 40000 on that entry, but the original mode is 40755.

Sep 20 2021, 3:59 PM · Archive content, Restricted Project

Sep 17 2021

vlorentz added a comment to T75: Check integrity of directories, revisions, and releases.

We always add "gpgsig" as the last header of revisions, but in the original commit it is sometimes not the last one, e.g. https://github.com/kuehnelth/secure-nyancatd/commit/10011016fc08401b6d3f05c2561a5dd3eb0a2641

Sep 17 2021, 3:11 PM · Archive content, Restricted Project
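Header order matters here because a git object ID is the SHA-1 of the serialized object, so moving "gpgsig" changes the hash. A minimal sketch, with hypothetical helper names and made-up header values (only the empty-tree ID is a real, well-known value):

```python
import hashlib


def git_object_id(obj_type: bytes, body: bytes) -> str:
    # git hashes "<type> <size>\0" followed by the object body
    header = obj_type + b" " + str(len(body)).encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def commit_body(headers, message: bytes) -> bytes:
    # Headers are serialized in the given order; multiline values get
    # continuation lines prefixed with a single space
    lines = b"".join(
        key + b" " + value.replace(b"\n", b"\n ") + b"\n" for key, value in headers
    )
    return lines + b"\n" + message


# Made-up commit headers, for illustration only
base = [
    (b"tree", b"4b825dc642cb6eb9a060e54bf8d69288fbee4904"),  # the empty tree
    (b"author", b"Jane Doe <jane@example.org> 1631880000 +0200"),
    (b"committer", b"Jane Doe <jane@example.org> 1631880000 +0200"),
]
gpgsig = (b"gpgsig", b"-----BEGIN PGP SIGNATURE-----\nabc\n-----END PGP SIGNATURE-----")
encoding = (b"encoding", b"ISO-8859-1")

gpgsig_last = git_object_id(b"commit", commit_body(base + [encoding, gpgsig], b"msg\n"))
gpgsig_first = git_object_id(b"commit", commit_body(base + [gpgsig, encoding], b"msg\n"))

print(gpgsig_last)
print(gpgsig_first)
```

The two IDs differ even though both commits carry exactly the same headers, which is why always writing "gpgsig" last produces a mismatched checksum for commits that originally had it elsewhere.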
vlorentz added a comment to T75: Check integrity of directories, revisions, and releases.

Commits ingested before https://forge.softwareheritage.org/rDLDGfb03140e169f38f5ea0d64d81b524b4b967335f7 have an extra newline at the end of the mergetag header

Sep 17 2021, 2:37 PM · Archive content, Restricted Project

Sep 16 2021

vlorentz added a comment to T75: Check integrity of directories, revisions, and releases.

Two other sources of mismatched checksums:

Sep 16 2021, 7:22 PM · Archive content, Restricted Project

Sep 14 2021

vlorentz claimed T75: Check integrity of directories, revisions, and releases.
Sep 14 2021, 4:50 PM · Archive content, Restricted Project

Sep 3 2021

vlorentz added a parent task for T75: Check integrity of directories, revisions, and releases: T3552: Fix corrupted releases, revisions, and directories in the storage.
Sep 3 2021, 6:28 PM · Archive content, Restricted Project
vlorentz added a comment to T75: Check integrity of directories, revisions, and releases.

Old versions of Dulwich (e.g. 0.16.3, the version in stretch) dropped newlines at the end of the gpgsig header.

Sep 3 2021, 6:25 PM · Archive content, Restricted Project

Jul 30 2021

vlorentz removed a revision from T75: Check integrity of directories, revisions, and releases: D6045: converters: Preserve GPG signatures on releases.
Jul 30 2021, 10:59 AM · Archive content, Restricted Project
vlorentz added a revision to T75: Check integrity of directories, revisions, and releases: D6045: converters: Preserve GPG signatures on releases.
Jul 30 2021, 10:58 AM · Archive content, Restricted Project
vlorentz added a comment to T75: Check integrity of directories, revisions, and releases.

Dulwich 0.19.10 (released in January 2019) changed the way it handles signatures on annotated tags, so we have been silently dropping all signatures since we started using it (probably whenever we upgraded loaders to Buster).

Jul 30 2021, 10:58 AM · Archive content, Restricted Project

May 26 2021

vlorentz changed the status of T2564: migrate existing revisions metadata extra_headers to actual extra_headers field, a subtask of T3089: Remove the 'metadata' column of the 'revision' table, from Work in Progress to Open.
May 26 2021, 11:26 AM · Storage manager, Archive content

Apr 28 2021

vlorentz changed the status of T2564: migrate existing revisions metadata extra_headers to actual extra_headers field, a subtask of T3089: Remove the 'metadata' column of the 'revision' table, from Open to Work in Progress.
Apr 28 2021, 12:43 PM · Storage manager, Archive content

Apr 27 2021

vlorentz removed a project from T3246: Document takedown request processing workflow: Roadmap 2021.
Apr 27 2021, 2:15 PM · Archive content

Apr 23 2021

vlorentz assigned T3113: Cold storage archive to douardda.
Apr 23 2021, 4:49 PM · Roadmap 2021, Archive content, meta-task
vlorentz added a subtask for T3089: Remove the 'metadata' column of the 'revision' table: T2564: migrate existing revisions metadata extra_headers to actual extra_headers field.
Apr 23 2021, 9:58 AM · Storage manager, Archive content

Apr 20 2021

douardda added a comment to T3246: Document takedown request processing workflow.

do we also intend to have a takedown topic on kafka?

Apr 20 2021, 11:08 AM · Archive content

Apr 19 2021

vlorentz removed a parent task for T3089: Remove the 'metadata' column of the 'revision' table: T2471: NPM package angular-ts-manage fails to be properly loaded.
Apr 19 2021, 12:43 PM · Storage manager, Archive content
rdicosmo moved T3246: Document takedown request processing workflow from Backlog to Work in progress on the Roadmap 2021 board.
Apr 19 2021, 11:53 AM · Archive content
douardda added a comment to T3246: Document takedown request processing workflow.

also: what about exports we provide on git annex?

Apr 19 2021, 10:10 AM · Archive content
douardda added a comment to T3246: Document takedown request processing workflow.

do we also intend to have a takedown topic on kafka?

Apr 19 2021, 10:09 AM · Archive content

Apr 15 2021

vlorentz closed T3090: Make loaders not rely on the 'metadata' column of the 'revision' table, a subtask of T3089: Remove the 'metadata' column of the 'revision' table, as Resolved.
Apr 15 2021, 3:15 PM · Storage manager, Archive content

Apr 12 2021

olasd added a comment to T3246: Document takedown request processing workflow.

Knobs to adjust the visibility of origins in the archive and in the web API

Apr 12 2021, 4:52 PM · Archive content
olasd triaged T3246: Document takedown request processing workflow as Normal priority.
Apr 12 2021, 4:33 PM · Archive content

Apr 6 2021

vlorentz added a parent task for T3089: Remove the 'metadata' column of the 'revision' table: T3201: Mirror: unsupported Unicode escape sequence.
Apr 6 2021, 2:20 PM · Storage manager, Archive content

Mar 15 2021

vlorentz added a parent task for T3089: Remove the 'metadata' column of the 'revision' table: T2471: NPM package angular-ts-manage fails to be properly loaded.
Mar 15 2021, 12:32 PM · Storage manager, Archive content
vlorentz triaged T3113: Cold storage archive as Normal priority.
Mar 15 2021, 12:30 PM · Roadmap 2021, Archive content, meta-task

Mar 10 2021

rdicosmo moved T3113: Cold storage archive from Backlog to Work in progress on the Roadmap 2021 board.
Mar 10 2021, 4:29 PM · Roadmap 2021, Archive content, meta-task
rdicosmo created T3113: Cold storage archive.
Mar 10 2021, 4:26 PM · Roadmap 2021, Archive content, meta-task

Mar 5 2021

vlorentz added a subtask for T3089: Remove the 'metadata' column of the 'revision' table: T2513: Copy metadata on revisions to the extrinsic metadata storage.
Mar 5 2021, 3:51 PM · Storage manager, Archive content
vlorentz added a parent task for T3089: Remove the 'metadata' column of the 'revision' table: T2059: Generate (swh) releases from all git tags.
Mar 5 2021, 12:30 PM · Storage manager, Archive content
vlorentz triaged T3089: Remove the 'metadata' column of the 'revision' table as Normal priority.
Mar 5 2021, 12:27 PM · Storage manager, Archive content

Feb 4 2021

vlorentz added a parent task for T75: Check integrity of directories, revisions, and releases: T3010: Enable the validating storage proxy in production.
Feb 4 2021, 6:13 PM · Archive content, Restricted Project
vlorentz merged T3012: Check all objects in the production storage/journal have a correct hash into T75: Check integrity of directories, revisions, and releases.
Feb 4 2021, 6:13 PM · Archive content, Restricted Project

Oct 14 2020

ardumont added a comment to T994: origin_visit: distinguish "fetch date" and "injection date".

yes

Oct 14 2020, 1:49 PM · Archive content
olasd updated subscribers of T994: origin_visit: distinguish "fetch date" and "injection date".

I _think_ this usecase is solved with the origin_visit_status table (created vs. ongoing vs. completed). @vlorentz?

Oct 14 2020, 12:34 PM · Archive content