Software Heritage

Archive content
Active, Public

Members

  • This project does not have any members.

Watchers

  • This project does not have any watchers.

Details

Description

Work related to content of all kinds (not only "blobs") that is already stored in the Software Heritage archive.

Recent Activity

Dec 16 2019

anlambert closed T2148: Recreate save code now requests that failed when migrating loaders as Resolved.
Dec 16 2019, 3:43 PM · Archive content
anlambert updated the task description for T2148: Recreate save code now requests that failed when migrating loaders.
Dec 16 2019, 3:43 PM · Archive content
anlambert updated the task description for T2148: Recreate save code now requests that failed when migrating loaders.
Dec 16 2019, 2:33 PM · Archive content
anlambert updated the task description for T2148: Recreate save code now requests that failed when migrating loaders.
Dec 16 2019, 11:01 AM · Archive content

Dec 13 2019

anlambert updated the task description for T2148: Recreate save code now requests that failed when migrating loaders.
Dec 13 2019, 4:34 PM · Archive content
anlambert triaged T2148: Recreate save code now requests that failed when migrating loaders as Normal priority.
Dec 13 2019, 4:31 PM · Archive content

Nov 18 2019

zack added a comment to T1817: À la recherche du content perdu.

I've used swh-graph to look up the 74 still-missing contents and managed to find 67 of them; see the cnt→ori mapping in (tracing them back to actual origins requires T2045):

Nov 18 2019, 5:54 PM · Archive content
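Conceptually, that lookup walks the archive graph backwards from each missing content until it reaches origin nodes. The following is a minimal breadth-first sketch over a toy in-memory graph. The node names and the `REVERSE_EDGES` table are hypothetical, and this is not the swh-graph API.

```python
from __future__ import annotations

from collections import deque

# Hypothetical toy graph. Each node lists the nodes that point TO it,
# i.e. the archive's edges reversed: content <- directory <- revision
# <- snapshot <- origin.
REVERSE_EDGES = {
    "cnt:1": ["dir:1"],
    "cnt:2": ["dir:2"],
    "cnt:3": [],  # referenced by nothing: no origin can be found
    "dir:1": ["dir:2", "rev:1"],
    "dir:2": ["rev:2"],
    "rev:1": ["snp:1"],
    "rev:2": ["snp:2"],
    "snp:1": ["ori:a"],
    "snp:2": ["ori:b"],
}


def origins_of(content: str) -> set[str]:
    """Breadth-first search backwards from `content`, collecting origins."""
    seen = {content}
    queue = deque([content])
    found = set()
    while queue:
        node = queue.popleft()
        if node.startswith("ori:"):
            found.add(node)
            continue  # stop at origins
        for pred in REVERSE_EDGES.get(node, []):
            if pred not in seen:
                seen.add(pred)
                queue.append(pred)
    return found


mapping = {cnt: origins_of(cnt) for cnt in ("cnt:1", "cnt:2", "cnt:3")}
```

A content that no directory references (`cnt:3` here) comes back with an empty set. This is one way a lookup can fail to find a content.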

Jul 3 2019

ardumont placed T958: googlecode import: Clean up googlecode origin's origin_visits up for grabs.
Jul 3 2019, 3:26 PM · SVN Loader, Origin-GoogleCode, Archive content

Jun 20 2019

olasd updated the task description for T1817: À la recherche du content perdu.
Jun 20 2019, 6:23 PM · Archive content
olasd updated subscribers of T1817: À la recherche du content perdu.

151 contents have been restored with help from the provenance index, thanks to @grouss.

Jun 20 2019, 6:22 PM · Archive content

Jun 17 2019

olasd triaged T1817: À la recherche du content perdu as Normal priority.
Jun 17 2019, 5:59 PM · Archive content

Apr 24 2019

vlorentz closed T1691: metadata indexer: investigate metadata entries with empty mappings as Resolved.

We should investigate why they are there.

Apr 24 2019, 5:22 PM · Archive content, Indexer
vlorentz closed T1691: metadata indexer: investigate metadata entries with empty mappings, a subtask of T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata, as Resolved.
Apr 24 2019, 5:22 PM · Archive content, Indexer
zack renamed T1691: metadata indexer: investigate metadata entries with empty mappings from metadata indexer: investigate empty mappings to metadata indexer: investigate metadata entries with empty mappings.
Apr 24 2019, 5:21 PM · Archive content, Indexer
zack triaged T1691: metadata indexer: investigate metadata entries with empty mappings as Normal priority.
Apr 24 2019, 5:20 PM · Archive content, Indexer
zack closed T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata as Resolved.

This is now done, aside from a minor issue noted below:

softwareheritage-indexer=# select count(*) from revision_intrinsic_metadata where metadata = '{"@context": "https://doi.org/10.5063/schema/codemeta-2.0"}'::jsonb;
 count 
-------
     0
(1 row)
Apr 24 2019, 5:18 PM · Archive content, Indexer

Apr 3 2019

vlorentz updated the task description for T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.
Apr 3 2019, 11:14 AM · Archive content, Indexer
vlorentz updated the task description for T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.
Apr 3 2019, 10:41 AM · Archive content, Indexer
vlorentz added a comment to T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.
Apr 3 2019, 9:52 AM · Archive content, Indexer

Apr 2 2019

zack updated the task description for T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.
Apr 2 2019, 4:41 PM · Archive content, Indexer
zack updated the task description for T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.
Apr 2 2019, 4:40 PM · Archive content, Indexer
zack updated the task description for T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.
Apr 2 2019, 4:37 PM · Archive content, Indexer
zack updated the task description for T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.
Apr 2 2019, 4:37 PM · Archive content, Indexer

Mar 25 2019

olasd closed T1534: PostgreSQL replication issues between prado and somerset as Resolved.

The replication process from prado to somerset is now complete, and the archive frontend has been switched over to this database.

Mar 25 2019, 6:08 PM · System administration, Archive content
olasd updated the task description for T1534: PostgreSQL replication issues between prado and somerset.
Mar 25 2019, 6:07 PM · System administration, Archive content
olasd updated the task description for T1534: PostgreSQL replication issues between prado and somerset.
Mar 25 2019, 10:32 AM · System administration, Archive content

Mar 23 2019

olasd updated the task description for T1534: PostgreSQL replication issues between prado and somerset.
Mar 23 2019, 2:29 PM · System administration, Archive content
olasd updated the task description for T1534: PostgreSQL replication issues between prado and somerset.
Mar 23 2019, 10:02 AM · System administration, Archive content

Mar 22 2019

olasd updated the task description for T1534: PostgreSQL replication issues between prado and somerset.
Mar 22 2019, 11:26 PM · System administration, Archive content
olasd updated the task description for T1534: PostgreSQL replication issues between prado and somerset.
Mar 22 2019, 6:32 PM · System administration, Archive content
olasd updated the task description for T1534: PostgreSQL replication issues between prado and somerset.
Mar 22 2019, 6:00 PM · System administration, Archive content
olasd updated the task description for T1534: PostgreSQL replication issues between prado and somerset.
Mar 22 2019, 5:51 PM · System administration, Archive content
olasd updated the task description for T1534: PostgreSQL replication issues between prado and somerset.
Mar 22 2019, 5:31 PM · System administration, Archive content
olasd updated the task description for T1534: PostgreSQL replication issues between prado and somerset.
Mar 22 2019, 4:52 PM · System administration, Archive content
olasd changed the status of T1534: PostgreSQL replication issues between prado and somerset from Open to Work in Progress.

The replicated cluster is now clear to be taken down for a rebuild.

Mar 22 2019, 2:55 PM · System administration, Archive content

Mar 20 2019

zack reassigned T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata from zack to vlorentz.
Mar 20 2019, 12:10 PM · Archive content, Indexer

Mar 15 2019

zack added a comment to T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.

@vlorentz: lather, rinse, repeat.

softwareheritage-indexer=# DELETE FROM revision_intrinsic_metadata
WHERE metadata = '{"@context": "https://doi.org/10.5063/schema/codemeta-2.0"}'::jsonb ;
ERROR:  deadlock detected
DETAIL:  Process 23900 waits for ShareLock on transaction 212164862; blocked by process 20175.
Process 20175 waits for ShareLock on transaction 212164381; blocked by process 23900.
HINT:  See server log for query details.
CONTEXT:  while deleting tuple (772424,55) in relation "revision_intrinsic_metadata"
Time: 33048,828 ms (00:33,049)

(this just happened, after the indexers were restarted with D1218 included)

Mar 15 2019, 9:25 PM · Archive content, Indexer

Mar 4 2019

zack added a comment to T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.

D1218
Once this is landed and deployed, ordering your DELETEs by revision_metadata.id will acquire locks in the same order as the idx_storage, solving the deadlock issue.

Mar 4 2019, 3:11 PM · Archive content, Indexer
vlorentz added a comment to T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.

Once this is landed and deployed, ordering your DELETEs by revision_metadata.id will acquire locks in the same order as the idx_storage, solving the deadlock issue.

Mar 4 2019, 12:51 PM · Archive content, Indexer
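The reasoning follows the classic lock-ordering rule. If every transaction acquires its row locks in one global order, a cycle of waiters like the ones reported in this task cannot form. The following is a minimal sketch of the rule using Python threads, with `threading.Lock` objects standing in for row locks. The row ids and `delete_rows` are hypothetical and are not what D1218 or idx_storage actually contain.

```python
import threading

# Hypothetical stand-ins for PostgreSQL row locks, one per row id.
row_locks = {row_id: threading.Lock() for row_id in range(10)}
processed = []
processed_guard = threading.Lock()


def delete_rows(row_ids):
    """Lock the target rows in ascending id order, then 'delete' them.

    A worker only ever waits for an id larger than every id it already
    holds, so the wait-for cycle behind "deadlock detected" cannot form.
    """
    ordered = sorted(set(row_ids))
    for row_id in ordered:
        row_locks[row_id].acquire()
    try:
        with processed_guard:
            processed.extend(ordered)
    finally:
        for row_id in reversed(ordered):
            row_locks[row_id].release()


# Two workers touching overlapping rows, listed in opposite orders.
a = threading.Thread(target=delete_rows, args=([1, 5, 3, 7],))
b = threading.Thread(target=delete_rows, args=([7, 3, 5, 2],))
a.start()
b.start()
a.join(timeout=5)
b.join(timeout=5)
```

PostgreSQL's DELETE accepts no ORDER BY clause. In SQL, the ordering is therefore usually obtained by first locking the target rows with SELECT ... ORDER BY id FOR UPDATE in the same transaction, then deleting them.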

Mar 2 2019

zack added a comment to T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.

The update completed, but a first attempt at the second DELETE failed with a deadlock (?!):

softwareheritage-indexer=# DELETE FROM revision_metadata
WHERE translated_metadata = '{"@context": "https://doi.org/10.5063/schema/codemeta-2.0"}'::jsonb ;
ERROR:  deadlock detected
DETAIL:  Process 10966 waits for ShareLock on transaction 197265813; blocked by process 11754.
Process 11754 waits for ShareLock on transaction 197264487; blocked by process 10966.
HINT:  See server log for query details.
CONTEXT:  while deleting tuple (1380733,15) in relation "revision_metadata"
Time: 170864,091 ms (02:50,864)
Mar 2 2019, 1:18 PM · Archive content, Indexer
zack added a comment to T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.

The following fix for the above (suggested by @vlorentz) is now running:

update revision_metadata
set translated_metadata = origin_intrinsic_metadata.metadata
from origin_intrinsic_metadata
where revision_metadata.id = origin_intrinsic_metadata.from_revision
  and revision_metadata.translated_metadata = '{"@context": "https://doi.org/10.5063/schema/codemeta-2.0"}'
  and origin_intrinsic_metadata.metadata != '{"@context": "https://doi.org/10.5063/schema/codemeta-2.0"}';
Mar 2 2019, 9:53 AM · Archive content, Indexer
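The UPDATE rescues empty-ish revision rows by copying in the metadata of an origin that points at that revision, but only when the origin's metadata is itself non-empty. The following Python sketch reproduces that logic over in-memory dicts. The table contents are made up, and `backfill` is a hypothetical name.

```python
CODEMETA_CONTEXT = "https://doi.org/10.5063/schema/codemeta-2.0"
EMPTY = {"@context": CODEMETA_CONTEXT}

# Hypothetical in-memory copies of the two tables.
revision_metadata = {  # revision id -> translated_metadata
    "rev1": dict(EMPTY),
    "rev2": dict(EMPTY),
    "rev3": {"@context": CODEMETA_CONTEXT, "name": "kept"},
}
origin_intrinsic_metadata = [  # (from_revision, metadata) rows
    ("rev1", {"@context": CODEMETA_CONTEXT, "name": "from-origin"}),
    ("rev2", dict(EMPTY)),
]


def backfill(revisions: dict, origins: list) -> int:
    """Copy non-empty origin metadata onto empty revision rows.

    Mirrors the UPDATE ... FROM join above: match on the revision id and
    only touch rows where the revision side is empty and the origin side
    is not. Returns the number of revision rows updated.
    """
    updated = 0
    for from_revision, metadata in origins:
        if revisions.get(from_revision) == EMPTY and metadata != EMPTY:
            revisions[from_revision] = metadata
            updated += 1
    return updated


count = backfill(revision_metadata, origin_intrinsic_metadata)
```

After the backfill, `rev2` is still empty. Rows like it are presumably what the subsequent DELETE on revision_metadata is meant to remove.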

Mar 1 2019

zack added a comment to T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.

As discussed on IRC, even after cleaning up origin_intrinsic_metadata, the DELETE on revision_metadata fails with:

softwareheritage-indexer=# DELETE FROM revision_metadata
softwareheritage-indexer-# WHERE translated_metadata = '{"@context": "https://doi.org/10.5063/schema/codemeta-2.0"}'::jsonb ;
Mar 1 2019, 4:49 PM · Archive content, Indexer
zack claimed T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.
Mar 1 2019, 3:12 PM · Archive content, Indexer
zack changed the status of T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata from Open to Work in Progress.

I've started the first of the following queries on somerset (in a screen session under my user):

DELETE FROM origin_intrinsic_metadata
WHERE metadata = '{"@context": "https://doi.org/10.5063/schema/codemeta-2.0"}'::jsonb ;
Mar 1 2019, 2:51 PM · Archive content, Indexer

Feb 27 2019

vlorentz added a revision to T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata: D1206: Prevent origin metadata indexer from writing empty records.
Feb 27 2019, 3:44 PM · Archive content, Indexer
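D1206 aims to stop such records from being written in the first place. The following is a minimal sketch of a write-side guard, assuming records are dicts with a `metadata` field. `is_empty_metadata` and `records_to_write` are hypothetical names, not the indexer's code. The emptiness test mirrors the SQL used in this task: the metadata must equal exactly `{"@context": "https://doi.org/10.5063/schema/codemeta-2.0"}`. Like jsonb equality, Python dict equality ignores key order.

```python
CODEMETA_CONTEXT = "https://doi.org/10.5063/schema/codemeta-2.0"
EMPTY_METADATA = {"@context": CODEMETA_CONTEXT}


def is_empty_metadata(metadata: dict) -> bool:
    """True if the document carries nothing but the CodeMeta @context."""
    return metadata == EMPTY_METADATA


def records_to_write(records: list) -> list:
    """Keep only records whose metadata has real content."""
    return [r for r in records if not is_empty_metadata(r["metadata"])]


# Hypothetical indexer output for two origins.
batch = [
    {"id": "ori1", "metadata": {"@context": CODEMETA_CONTEXT}},
    {"id": "ori2", "metadata": {"license": "MIT", "@context": CODEMETA_CONTEXT}},
]
kept = records_to_write(batch)
```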
vlorentz claimed T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.
Feb 27 2019, 3:40 PM · Archive content, Indexer
vlorentz added a comment to T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.

To know that we went over the origin and did not find anything, without additional tables.

Feb 27 2019, 11:44 AM · Archive content, Indexer
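The argument for keeping empty rows (or NULLs) is that a row's presence records that the origin was processed. Deleting the row makes "nothing found" indistinguishable from "never looked". The following is a minimal sketch of that trade-off, with a dict standing in for the table. All names are hypothetical.

```python
# Hypothetical index keyed by origin: a missing key means "never indexed",
# None means "indexed, nothing found" (the NULL option), and a dict holds
# the metadata that was found.
index = {
    "ori:with-meta": {"name": "foo"},
    "ori:checked-empty": None,
}


def status(origin: str) -> str:
    """Report what the index knows about an origin."""
    if origin not in index:
        return "not indexed yet"
    if index[origin] is None:
        return "indexed, no metadata"
    return "indexed, has metadata"


before = status("ori:checked-empty")
# Deleting the marker row loses the distinction.
del index["ori:checked-empty"]
after = status("ori:checked-empty")
```

Deleting rows is simpler and saves space. The cost is that, without some other record of visited origins, these two cases can no longer be told apart.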

Feb 26 2019

zack added a comment to T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.

I don't understand your comment. What are the remaining arguments for using NULL instead of just deleting rows?

Feb 26 2019, 6:42 PM · Archive content, Indexer
vlorentz added a comment to T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.

and the NULL option

Feb 26 2019, 5:42 PM · Archive content, Indexer
zack added a comment to T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.

Yes :-)
So, do we agree that the right fix for this task is just to get rid of empty-ish rows? Or are there other arguments that we haven't considered yet?

Feb 26 2019, 5:41 PM · Archive content, Indexer