At least loader deposit and npm [1] are fine.
Nov 22 2021
Nov 10 2021
Nov 9 2021
Nov 8 2021
Here is an overview of the fields (+ internal version name + branch name) used by each package loader, after D6616:
Oct 22 2021
Great news: of the 469k corrupt SVN revisions, all but 14 (yes, 14) can be fixed simply by adding 1 microsecond to their timestamp.
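A minimal sketch of how such a fix could be searched for, assuming the check is "re-serialize with a shifted timestamp and compare hashes". `find_timestamp_fix` and `toy_serialize` are hypothetical names, and the toy serialization is not the real revision manifest format:

```python
import hashlib
from typing import Callable, Optional


def find_timestamp_fix(
    stored_id: str,
    seconds: int,
    microseconds: int,
    serialize: Callable[[int, int], bytes],
    max_shift_us: int = 1,
) -> Optional[int]:
    """Return the microsecond shift that makes the hash match, or None."""
    for shift in range(0, max_shift_us + 1):
        total = seconds * 1_000_000 + microseconds + shift
        s, us = divmod(total, 1_000_000)
        if hashlib.sha1(serialize(s, us)).hexdigest() == stored_id:
            return shift
    return None


def toy_serialize(seconds: int, microseconds: int) -> bytes:
    # Hypothetical stand-in for the real manifest serialization.
    return b"date %d.%06d\n" % (seconds, microseconds)
```

A shift of 0 means the object already matches; `None` means the object needs a different kind of fix.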
Oct 20 2021
After further investigation, I can't find any directory that is in a completely bad order; they are either ordered like git does (by adding a / at the end of dir entries) or by assuming a null byte at the end of dir entries.
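The two orderings can be told apart with a pair of sort keys. This is only a sketch: the `(name, is_directory)` entry shape and the function names are assumptions made for illustration, not the storage schema.

```python
from typing import List, Tuple

Entry = Tuple[bytes, bool]  # (name, is_directory); hypothetical shape


def git_sort_key(entry: Entry) -> bytes:
    # git compares directory names as if they ended with "/"
    name, is_dir = entry
    return name + b"/" if is_dir else name


def nul_sort_key(entry: Entry) -> bytes:
    # alternative ordering: directory names compared as if they ended with NUL
    name, is_dir = entry
    return name + b"\x00" if is_dir else name


def matching_orders(entries: List[Entry]) -> List[str]:
    """Return the names of the orderings the given entry list satisfies."""
    orders = []
    if entries == sorted(entries, key=git_sort_key):
        orders.append("git")
    if entries == sorted(entries, key=nul_sort_key):
        orders.append("nul")
    return orders
```

The orderings only differ when a directory name is a prefix of a sibling whose next byte sorts below "/", for example a directory `foo` next to a file `foo.txt`.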
Oct 15 2021
analysis on directories (some are also part of the fixable_trivial above, but I don't have the exact number, I lost it in my analysis):
In T3656#72364, @grouss wrote: according to the list of nodes provided by seirl, there were ~21,000,000 revisions without ancestors in the swh-graph snapshot (2020-12-15)
Checking the current live swh DAG 2 days ago, 98% of them have one in release or snapshot_branch.
Indeed, I was surprised because I didn't have to loop over the revision history.
21:57 guest@softwareheritage => select count(distinct id) from revision_history where not exists (select 1 from revision where id=parent_id);
 count
───────
  2218
(1 row)
Oct 14 2021
Oct 13 2021
My script finished running on releases. Result: all 644k releases are recoverable (mostly just missing gpg signatures), except 75k whose origin does not exist anymore.
Oct 11 2021
this commit has lots of one-line "gpgsig" headers instead of a single "gpgsig" header with a multiline value: https://github.com/git-for-windows/git/commit/5f549aa2f78314ac37bbd436c8f80aea4c752e07
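To illustrate the difference, here is a rough header parser that keeps order and duplicates. It assumes the usual git convention that continuation lines start with a single space; `parse_headers` is a hypothetical helper, not the loader's code.

```python
from typing import List, Tuple


def parse_headers(raw: bytes) -> List[Tuple[bytes, bytes]]:
    """Parse the header part of a raw commit, preserving order and duplicates."""
    head, _, _message = raw.partition(b"\n\n")
    headers: List[Tuple[bytes, bytes]] = []
    for line in head.split(b"\n"):
        if line.startswith(b" ") and headers:
            # continuation line: append to the previous header's value
            key, value = headers[-1]
            headers[-1] = (key, value + b"\n" + line[1:])
        else:
            key, _, value = line.partition(b" ")
            headers.append((key, value))
    return headers
```

A multiline signature yields one `gpgsig` pair whose value contains newlines, whereas the commit above would yield many `gpgsig` pairs. A model that stores a single signature value cannot distinguish the two when re-serializing.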
Oct 8 2021
Oct 5 2021
Aggregated stats at the moment:
Oct 1 2021
- "nonce" header is *after* gpgsig
- double "author" field in the original, and another commit with three "committer" fields
- "mergetag" headers with an extra newline at the end (current versions of the loader strip it, looks like older ones didn't)
- "author xxx <yyy@gmail.com> <type 'int'> -0200" in original commit (dulwich obviously can't parse this)
Sep 30 2021
- some negative UTCs don't have the boolean set to true (the only example I have is also missing a space in the same line, so this might be why)
- "0000" as offset (with no sign), dulwich can't even parse this
- "+0575" as offset; parsed as "+0615"
- "\r" in an email address was stripped
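The "+0575" case above is what one would expect if the offset is stored as a number of minutes and formatted back on output. A small sketch, with hypothetical function names and assuming that representation:

```python
def parse_offset(text: str) -> int:
    """Parse a "+HHMM" / "-HHMM" offset into a signed number of minutes."""
    sign = -1 if text[0] == "-" else 1
    hours, minutes = int(text[1:3]), int(text[3:5])
    return sign * (hours * 60 + minutes)


def format_offset(total_minutes: int) -> str:
    """Format a signed number of minutes back into "+HHMM" / "-HHMM"."""
    sign = "-" if total_minutes < 0 else "+"
    hours, minutes = divmod(abs(total_minutes), 60)
    return f"{sign}{hours:02d}{minutes:02d}"


print(format_offset(parse_offset("+0575")))  # +0615
print(format_offset(parse_offset("-0000")))  # +0000
```

5 hours and 75 minutes is 375 minutes, which formats as 6 hours 15 minutes, so the original bytes are lost. The second call shows why a separate "negative UTC" boolean is needed: an integer cannot carry the sign of zero.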
Sep 29 2021
- "author xxx<yyy@googlemail.com> <xxx<yyy@googlemail.com>> 1282956323 +0200" parsed as "author xxx<yyy@googlemail.com> <xxx <yy@googlemail.com>> 1282956323 +0200" (yes there is a fullname in the email)
- a few commits with this timezone: "--700", parsed as "+0700"
- Some weird offsets parsed as "+0000": "+1400"/"+051800"/"+1558601"
- looks like old git loaders tried to decode and re-encode name and/or email, which messed it up when not UTF8
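The decode/re-encode problem in the last item can be reproduced in a few lines. This sketch assumes the old code decoded as UTF-8 with replacement, which is a guess; the actual loader behaviour may have differed.

```python
def lossy_roundtrip(raw: bytes) -> bytes:
    # invalid UTF-8 bytes are replaced by U+FFFD, so the original bytes are lost
    return raw.decode("utf-8", errors="replace").encode("utf-8")


def lossless_roundtrip(raw: bytes) -> bytes:
    # surrogateescape keeps undecodable bytes recoverable on re-encoding
    return raw.decode("utf-8", errors="surrogateescape").encode(
        "utf-8", errors="surrogateescape"
    )


latin1_name = "André".encode("latin-1")  # b'Andr\xe9', not valid UTF-8
print(lossy_roundtrip(latin1_name))      # b'Andr\xef\xbf\xbd'
print(lossless_roundtrip(latin1_name))   # b'Andr\xe9'
```

Any hash computed over the lossy bytes will differ from the original object's hash, which is why names and emails are safer kept as raw bytes.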
some new ones:
Sep 23 2021
Sep 20 2021
swh:1:dir:880034219b47a123b97161de4e0d3301746cec75 (from https://github.com/kentnl/Dist-Zilla-PluginBundle-Author-KENTNL) has a single entry; we stored it with mode 40000 on that entry, but the original mode is 40755.
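Since the mode is part of the serialized tree, storing 40000 instead of 40755 changes the identifier. A sketch of the standard git tree hashing; the entry name `lib` and its target are made up for illustration and are not the contents of the directory above:

```python
import hashlib
from typing import List, Tuple

# (mode, name, hex target id); hypothetical entry shape
TreeEntry = Tuple[bytes, bytes, str]


def tree_id(entries: List[TreeEntry]) -> str:
    """Compute the git object id of a tree from already-sorted entries."""
    body = b"".join(
        mode + b" " + name + b"\x00" + bytes.fromhex(target)
        for mode, name, target in entries
    )
    return hashlib.sha1(b"tree %d\x00" % len(body) + body).hexdigest()


EMPTY_TREE = tree_id([])  # the well-known id of the empty tree

normalized = tree_id([(b"40000", b"lib", EMPTY_TREE)])
original = tree_id([(b"40755", b"lib", EMPTY_TREE)])
print(normalized != original)  # True
```

Recomputing the identifier from the normalized mode therefore cannot reproduce the original one unless the raw mode is kept somewhere.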
Sep 17 2021
We always add "gpgsig" as the last header of revisions, but in the original commit it is sometimes not the last one, e.g. https://github.com/kuehnelth/secure-nyancatd/commit/10011016fc08401b6d3f05c2561a5dd3eb0a2641
Commits ingested before https://forge.softwareheritage.org/rDLDGfb03140e169f38f5ea0d64d81b524b4b967335f7 have an extra newline at the end of the mergetag header
Sep 16 2021
Two other sources of mismatched checksums:
Sep 14 2021
Sep 3 2021
Old versions of Dulwich (e.g. 0.16.3, the version in stretch) dropped newlines at the end of the gpgsig header.
Jul 30 2021
Dulwich 0.19.10 (released in January 2019) changed the way it handles signatures on annotated tags, so we have silently dropped all signatures since we started using it (probably whenever we upgraded loaders to Buster).
May 26 2021
Apr 28 2021
Apr 27 2021
Apr 23 2021
Apr 20 2021
In T3246#63559, @douardda wrote: do we also intend to have a takedown topic on Kafka?
Apr 19 2021
also: what about exports we provide on git annex?
do we also intend to have a takedown topic on Kafka?
Apr 15 2021
Apr 12 2021
Knobs to adjust the visibility of origins in the archive and in the web API
Apr 6 2021
Mar 15 2021
Mar 10 2021
Mar 5 2021
Feb 4 2021
Oct 14 2020
yes
I _think_ this use case is solved with the origin_visit_status table (created vs. ongoing vs. completed). @vlorentz?
The fetch_history table is gone since swh.storage v0.0.155 / swh-storage schema v141.
Sep 22 2020
I don't think the script to check for these has been put anywhere, I believe it was just a raw SQL query.
Sep 11 2020
Aug 27 2020
what's the status of this task today? is there a probe that tracks these broken links? (or a script one can run)
Jul 29 2020
Jul 6 2020
Jun 19 2020
PR merged. New SWHID qualifiers will be supported in release 1.1.7 of idutils.
Jun 9 2020
On Tue, Jun 9, 2020 at 16:18, anlambert (Antoine Lambert) <forge@softwareheritage.org> wrote: