Software Heritage

Jun 9 2020

anlambert added a comment to T2441: Update SWHID regexp used by Zenodo.

I just simplified the regexp to allow qualifiers permutation: https://github.com/inveniosoftware/idutils/blob/cc09640ffb457bab3cfe8d0eeb4822dd521fd36d/idutils/__init__.py#L245-L249
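One way to let qualifiers appear in any order is to match a repeated group of `;key=value` pairs instead of a fixed sequence. The sketch below illustrates the idea only and is not the regexp merged into idutils. The qualifier key list (`origin`, `visit`, `anchor`, `path`, `lines`) is an assumption based on the SWHID specification as of mid-2020, and `is_swhid` is a hypothetical helper name.

```python
import re

# Assumed qualifier keys (SWHID spec, mid-2020); adjust to the spec version in use.
QUALIFIER_KEYS = ("origin", "visit", "anchor", "path", "lines")

# Core identifier: scheme, version, object type, 40 lowercase hex digits.
_CORE = r"swh:1:(?:cnt|dir|rel|rev|snp):[0-9a-f]{40}"
# One qualifier: ";key=value", where the value stops at the next ";" or whitespace.
_QUALIFIER = r";(?:" + "|".join(QUALIFIER_KEYS) + r")=[^;\s]+"
# Zero or more qualifiers, in any order, after the core identifier.
SWHID_RE = re.compile(rf"{_CORE}(?:{_QUALIFIER})*")


def is_swhid(value: str) -> bool:
    """Return True if `value` looks like a (possibly qualified) SWHID."""
    return SWHID_RE.fullmatch(value) is not None
```

The trade-off of the repeated group is that it no longer enforces the canonical order, and it also accepts a qualifier given twice; rejecting duplicates would need a check outside the regexp.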

Jun 9 2020, 4:18 PM · Archive content
rdicosmo added a comment to T2441: Update SWHID regexp used by Zenodo.

Is there a way to improve the regex in https://github.com/inveniosoftware/idutils/pull/60 to allow qualifiers to come in any order instead of the canonical one?

Jun 9 2020, 3:59 PM · Archive content
anlambert added a comment to T2441: Update SWHID regexp used by Zenodo.

PR submitted: https://github.com/inveniosoftware/idutils/pull/60

Jun 9 2020, 2:27 PM · Archive content
anlambert triaged T2441: Update SWHID regexp used by Zenodo as Normal priority.
Jun 9 2020, 11:30 AM · Archive content

Mar 24 2020

olasd added a project to T2333: Use non-url identifiers for origin url attribute : Archive content.
Mar 24 2020, 12:44 PM · Archive content

Feb 19 2020

vlorentz claimed T1258: Synthesize release objects for all upstream things that match the concept of a release.
Feb 19 2020, 5:53 PM · Archive content

Jan 29 2020

vlorentz moved T846: Some objects from the original GitHub import have never actually been imported. from Backlog to Work in progress on the Roadmap 2020 board.
Jan 29 2020, 5:07 PM · Roadmap 2020, Restricted Project, Archive content

Jan 23 2020

olasd changed the status of T846: Some objects from the original GitHub import have never actually been imported. from Open to Work in Progress.

List of revisions with no parents (1259):

Jan 23 2020, 6:37 PM · Roadmap 2020, Restricted Project, Archive content
douardda added a project to T846: Some objects from the original GitHub import have never actually been imported.: Roadmap 2020.
Jan 23 2020, 2:01 PM · Roadmap 2020, Restricted Project, Archive content
douardda added a parent task for T846: Some objects from the original GitHub import have never actually been imported.: T2207: Improve ingestion efficiency.
Jan 23 2020, 2:01 PM · Roadmap 2020, Restricted Project, Archive content

Dec 16 2019

anlambert closed T2148: Recreate save code now requests that failed when migrating loaders as Resolved.
Dec 16 2019, 3:43 PM · Archive content
anlambert updated the task description for T2148: Recreate save code now requests that failed when migrating loaders.
Dec 16 2019, 3:43 PM · Archive content
anlambert updated the task description for T2148: Recreate save code now requests that failed when migrating loaders.
Dec 16 2019, 2:33 PM · Archive content
anlambert updated the task description for T2148: Recreate save code now requests that failed when migrating loaders.
Dec 16 2019, 11:01 AM · Archive content

Dec 13 2019

anlambert updated the task description for T2148: Recreate save code now requests that failed when migrating loaders.
Dec 13 2019, 4:34 PM · Archive content
anlambert triaged T2148: Recreate save code now requests that failed when migrating loaders as Normal priority.
Dec 13 2019, 4:31 PM · Archive content

Nov 18 2019

zack added a comment to T1817: À la recherche du content perdu.

I've used swh-graph to look up the 74 still-missing contents and managed to find 67 of them; see the cnt→ori mapping in (tracing them back to actual origins requires T2045):
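Mapping a content back to the origins that contain it amounts to a breadth-first search over the archive graph with its edges reversed (content → directories → revisions → snapshots → origins). The sketch below shows that traversal on a plain dictionary; it does not use the swh-graph API. The `predecessors` layout and the `"ori:"`-style node names are hypothetical.

```python
from collections import deque


def find_origins(start, predecessors):
    """Return every origin node reachable from `start` by walking edges backwards.

    `predecessors` maps a node to the nodes that point at it (hypothetical layout);
    origin nodes are recognised by their "ori:" prefix.
    """
    seen = {start}
    queue = deque([start])
    origins = set()
    while queue:
        node = queue.popleft()
        if node.startswith("ori:"):
            origins.add(node)
        for pred in predecessors.get(node, ()):
            # Subtrees are shared heavily across the archive, so skip visited nodes.
            if pred not in seen:
                seen.add(pred)
                queue.append(pred)
    return origins
```

A content with no recorded predecessors simply yields an empty set, which is how a "still missing" item would show up in such a lookup.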

Nov 18 2019, 5:54 PM · Archive content

Jul 3 2019

ardumont placed T958: googlecode import: Clean up googlecode origin's origin_visits up for grabs.
Jul 3 2019, 3:26 PM · SVN Loader, Origin-GoogleCode, Archive content

Jun 20 2019

olasd updated the task description for T1817: À la recherche du content perdu.
Jun 20 2019, 6:23 PM · Archive content
olasd updated subscribers of T1817: À la recherche du content perdu.

151 contents have been restored with help from the provenance index, thanks to @grouss.

Jun 20 2019, 6:22 PM · Archive content

Jun 17 2019

olasd triaged T1817: À la recherche du content perdu as Normal priority.
Jun 17 2019, 5:59 PM · Archive content

Apr 24 2019

vlorentz closed T1691: metadata indexer: investigate metadata entries with empty mappings as Resolved.

We should investigate why they are there.

Apr 24 2019, 5:22 PM · Archive content, Indexer
vlorentz closed T1691: metadata indexer: investigate metadata entries with empty mappings, a subtask of T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata, as Resolved.
Apr 24 2019, 5:22 PM · Archive content, Indexer
zack renamed T1691: metadata indexer: investigate metadata entries with empty mappings from metadata indexer: investigate empty mappings to metadata indexer: investigate metadata entries with empty mappings.
Apr 24 2019, 5:21 PM · Archive content, Indexer
zack triaged T1691: metadata indexer: investigate metadata entries with empty mappings as Normal priority.
Apr 24 2019, 5:20 PM · Archive content, Indexer
zack closed T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata as Resolved.

This is now done, aside from a minor issue noted below:

softwareheritage-indexer=# select count(*) from revision_intrinsic_metadata where metadata = '{"@context": "https://doi.org/10.5063/schema/codemeta-2.0"}'::jsonb;
 count 
-------
     0
(1 row)
Apr 24 2019, 5:18 PM · Archive content, Indexer

Apr 3 2019

vlorentz updated the task description for T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.
Apr 3 2019, 11:14 AM · Archive content, Indexer
vlorentz updated the task description for T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.
Apr 3 2019, 10:41 AM · Archive content, Indexer
vlorentz added a comment to T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.
Apr 3 2019, 9:52 AM · Archive content, Indexer

Apr 2 2019

zack updated the task description for T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.
Apr 2 2019, 4:41 PM · Archive content, Indexer
zack updated the task description for T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.
Apr 2 2019, 4:40 PM · Archive content, Indexer
zack updated the task description for T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.
Apr 2 2019, 4:37 PM · Archive content, Indexer
zack updated the task description for T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.
Apr 2 2019, 4:37 PM · Archive content, Indexer

Mar 25 2019

olasd closed T1534: PostgreSQL replication issues between prado and somerset as Resolved.

The replication process from prado to somerset is now complete, and the archive frontend has been switched over to this database.

Mar 25 2019, 6:08 PM · System administration, Archive content
olasd updated the task description for T1534: PostgreSQL replication issues between prado and somerset.
Mar 25 2019, 6:07 PM · System administration, Archive content
olasd updated the task description for T1534: PostgreSQL replication issues between prado and somerset.
Mar 25 2019, 10:32 AM · System administration, Archive content

Mar 23 2019

olasd updated the task description for T1534: PostgreSQL replication issues between prado and somerset.
Mar 23 2019, 2:29 PM · System administration, Archive content
olasd updated the task description for T1534: PostgreSQL replication issues between prado and somerset.
Mar 23 2019, 10:02 AM · System administration, Archive content

Mar 22 2019

olasd updated the task description for T1534: PostgreSQL replication issues between prado and somerset.
Mar 22 2019, 11:26 PM · System administration, Archive content
olasd updated the task description for T1534: PostgreSQL replication issues between prado and somerset.
Mar 22 2019, 6:32 PM · System administration, Archive content
olasd updated the task description for T1534: PostgreSQL replication issues between prado and somerset.
Mar 22 2019, 6:00 PM · System administration, Archive content
olasd updated the task description for T1534: PostgreSQL replication issues between prado and somerset.
Mar 22 2019, 5:51 PM · System administration, Archive content
olasd updated the task description for T1534: PostgreSQL replication issues between prado and somerset.
Mar 22 2019, 5:31 PM · System administration, Archive content
olasd updated the task description for T1534: PostgreSQL replication issues between prado and somerset.
Mar 22 2019, 4:52 PM · System administration, Archive content
olasd changed the status of T1534: PostgreSQL replication issues between prado and somerset from Open to Work in Progress.

The replicated cluster is now clear to be taken down for a rebuild.

Mar 22 2019, 2:55 PM · System administration, Archive content

Mar 20 2019

zack reassigned T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata from zack to vlorentz.
Mar 20 2019, 12:10 PM · Archive content, Indexer

Mar 15 2019

zack added a comment to T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.

@vlorentz: lather, rinse, repeat.

softwareheritage-indexer=# DELETE FROM revision_intrinsic_metadata
WHERE metadata = '{"@context": "https://doi.org/10.5063/schema/codemeta-2.0"}'::jsonb ;
ERROR:  deadlock detected
DETAIL:  Process 23900 waits for ShareLock on transaction 212164862; blocked by process 20175.
Process 20175 waits for ShareLock on transaction 212164381; blocked by process 23900.
HINT:  See server log for query details.
CONTEXT:  while deleting tuple (772424,55) in relation "revision_intrinsic_metadata"
Time: 33048,828 ms (00:33,049)

(This just happened, after the indexers were restarted with D1218 included.)

Mar 15 2019, 9:25 PM · Archive content, Indexer

Mar 4 2019

zack added a comment to T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.

D1218

Once this is landed and deployed, ordering your DELETEs by revision_metadata.id will acquire locks in the same order as the idx_storage, solving the deadlock issue.
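The reasoning here is the classic lock-ordering rule: two transactions that lock an overlapping set of rows can deadlock only if they take those locks in different orders, so making every writer lock rows in ascending id order removes the cycle. The sketch below models that rule with Python threads and in-memory locks rather than PostgreSQL row locks; `delete_rows` and `run_concurrently` are hypothetical names. In SQL, one common way to get a deterministic order is to lock the target rows through a `SELECT ... ORDER BY id FOR UPDATE` subquery before deleting them.

```python
import threading


def delete_rows(row_ids, locks, deleted, deleted_lock):
    """Lock every row in ascending id order, then record the rows as deleted."""
    ordered = sorted(row_ids)  # one global order shared by all writers
    for rid in ordered:
        locks[rid].acquire()
    try:
        with deleted_lock:
            deleted.update(ordered)
    finally:
        for rid in reversed(ordered):
            locks[rid].release()


def run_concurrently(batches, timeout=5.0):
    """Run one worker per batch; return (all_finished, deleted_ids)."""
    locks = {rid: threading.Lock() for batch in batches for rid in batch}
    deleted = set()
    deleted_lock = threading.Lock()

    def worker(batch):
        # Repeat many times to give a deadlock every chance to appear.
        for _ in range(1000):
            delete_rows(batch, locks, deleted, deleted_lock)

    threads = [threading.Thread(target=worker, args=(b,), daemon=True) for b in batches]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout)
    return all(not t.is_alive() for t in threads), deleted
```

Feeding the two workers the same ids in opposite orders (`[1, 2, 3]` and `[3, 2, 1]`) still completes, because `sorted` makes both acquire lock 1 first.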

Mar 4 2019, 3:11 PM · Archive content, Indexer
vlorentz added a comment to T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.

Once this is landed and deployed, ordering your DELETEs by revision_metadata.id will acquire locks in the same order as the idx_storage, solving the deadlock issue.

Mar 4 2019, 12:51 PM · Archive content, Indexer

Mar 2 2019

zack added a comment to T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.

The update completed, but a first attempt at the second DELETE failed with a deadlock (?!):

softwareheritage-indexer=# DELETE FROM revision_metadata
WHERE translated_metadata = '{"@context": "https://doi.org/10.5063/schema/codemeta-2.0"}'::jsonb ;
ERROR:  deadlock detected
DETAIL:  Process 10966 waits for ShareLock on transaction 197265813; blocked by process 11754.
Process 11754 waits for ShareLock on transaction 197264487; blocked by process 10966.
HINT:  See server log for query details.
CONTEXT:  while deleting tuple (1380733,15) in relation "revision_metadata"
Time: 170864,091 ms (02:50,864)
Mar 2 2019, 1:18 PM · Archive content, Indexer
zack added a comment to T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.

The following fix for the above (suggested by @vlorentz) is now running:

update revision_metadata
set translated_metadata = origin_intrinsic_metadata.metadata
from origin_intrinsic_metadata
where revision_metadata.id = origin_intrinsic_metadata.from_revision
  and revision_metadata.translated_metadata = '{"@context": "https://doi.org/10.5063/schema/codemeta-2.0"}'
  and origin_intrinsic_metadata.metadata != '{"@context": "https://doi.org/10.5063/schema/codemeta-2.0"}';
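The UPDATE backfills a revision's empty metadata from the origin metadata derived from that same revision, but only when the origin side actually has content. The sketch below reproduces that logic in SQLite with correlated subqueries instead of PostgreSQL's `UPDATE ... FROM`. Assumptions: metadata is stored as JSON text and compared textually (PostgreSQL `jsonb` compares by value), the `origin_id` column is hypothetical, and `LIMIT 1` picks one match where several origins share a revision, much as `UPDATE ... FROM` would pick an arbitrary one.

```python
import json
import sqlite3

# Same literal as in the query above; json.dumps produces exactly this text.
EMPTY = json.dumps({"@context": "https://doi.org/10.5063/schema/codemeta-2.0"})


def backfill(conn):
    """Copy non-empty origin metadata into empty revision metadata rows."""
    conn.execute(
        """
        UPDATE revision_metadata
        SET translated_metadata = (
            SELECT o.metadata FROM origin_intrinsic_metadata AS o
            WHERE o.from_revision = revision_metadata.id AND o.metadata != :empty
            LIMIT 1)
        WHERE translated_metadata = :empty
          AND EXISTS (
            SELECT 1 FROM origin_intrinsic_metadata AS o
            WHERE o.from_revision = revision_metadata.id AND o.metadata != :empty)
        """,
        {"empty": EMPTY},
    )


def demo():
    """Build a tiny in-memory example and return {revision id: metadata}."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE revision_metadata (id TEXT PRIMARY KEY, translated_metadata TEXT)")
    conn.execute(
        "CREATE TABLE origin_intrinsic_metadata (origin_id INTEGER, from_revision TEXT, metadata TEXT)"
    )
    real = json.dumps({"@context": "https://doi.org/10.5063/schema/codemeta-2.0", "name": "foo"})
    conn.executemany("INSERT INTO revision_metadata VALUES (?, ?)", [("r1", EMPTY), ("r2", EMPTY)])
    conn.executemany(
        "INSERT INTO origin_intrinsic_metadata VALUES (?, ?, ?)", [(1, "r1", real), (2, "r2", EMPTY)]
    )
    backfill(conn)
    return dict(conn.execute("SELECT id, translated_metadata FROM revision_metadata"))
```

Rows that stay empty after the backfill (like `r2` here) are exactly the ones the follow-up DELETE is meant to remove.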
Mar 2 2019, 9:53 AM · Archive content, Indexer

Mar 1 2019

zack added a comment to T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.

As discussed on IRC, even after cleaning up origin_intrinsic_metadata, the DELETE on revision_metadata fails with:

softwareheritage-indexer=# DELETE FROM revision_metadata
softwareheritage-indexer-# WHERE translated_metadata = '{"@context": "https://doi.org/10.5063/schema/codemeta-2.0"}'::jsonb ;
Mar 1 2019, 4:49 PM · Archive content, Indexer
zack claimed T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.
Mar 1 2019, 3:12 PM · Archive content, Indexer
zack changed the status of T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata from Open to Work in Progress.

I've started the first of the following queries on somerset (in a screen session under my user):

DELETE FROM origin_intrinsic_metadata
WHERE metadata = '{"@context": "https://doi.org/10.5063/schema/codemeta-2.0"}'::jsonb ;
Mar 1 2019, 2:51 PM · Archive content, Indexer

Feb 27 2019

vlorentz added a revision to T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata: D1206: Prevent origin metadata indexer from writing empty records.
Feb 27 2019, 3:44 PM · Archive content, Indexer
vlorentz claimed T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.
Feb 27 2019, 3:40 PM · Archive content, Indexer
vlorentz added a comment to T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.

To record that we went over the origin and did not find anything, without needing additional tables.

Feb 27 2019, 11:44 AM · Archive content, Indexer

Feb 26 2019

zack added a comment to T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.

I don't understand your comment. What are the remaining arguments for using NULL instead of just deleting rows?

Feb 26 2019, 6:42 PM · Archive content, Indexer
vlorentz added a comment to T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.

and the NULL option

Feb 26 2019, 5:42 PM · Archive content, Indexer
zack added a comment to T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.

Yes :-)
So, do we agree that the right fix for this task is simply to get rid of empty-ish rows? Or are there other arguments that we haven't considered yet?

Feb 26 2019, 5:41 PM · Archive content, Indexer
vlorentz added a comment to T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.

Sounds like a solution for T1528 :)

Feb 26 2019, 4:12 PM · Archive content, Indexer
zack added a comment to T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.

What is the provenance map?

Feb 26 2019, 3:12 PM · Archive content, Indexer
vlorentz added a comment to T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.

What is the provenance map?

Feb 26 2019, 2:17 PM · Archive content, Indexer
zack added a project to T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata: Archive content.

My tentative proposal is to delete all table entries for which no metadata has been found.
The invariant will be: if an origin/revision has metadata, there will be an entry in the table(s); if not, the origin/revision will not appear.
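The proposed invariant can be stated as a filter: an entry survives only if its metadata document holds something beyond the bare CodeMeta `@context`. The sketch below mirrors the equality test used in the DELETE queries in this thread; `is_empty_metadata` and `prune` are hypothetical helpers, and rows are modeled as a dict from origin/revision id to a parsed JSON document.

```python
CODEMETA_CONTEXT = "https://doi.org/10.5063/schema/codemeta-2.0"


def is_empty_metadata(doc):
    """True if the document is exactly the bare CodeMeta context and nothing else."""
    return doc == {"@context": CODEMETA_CONTEXT}


def prune(rows):
    """Keep only entries with actual metadata; absence then means 'nothing found'."""
    return {key: doc for key, doc in rows.items() if not is_empty_metadata(doc)}
```

Under this invariant, "is there metadata for X?" becomes a plain membership test, at the cost of no longer distinguishing "indexed, found nothing" from "never indexed", which is the point raised about keeping NULL rows.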

Feb 26 2019, 12:53 PM · Archive content, Indexer

Feb 20 2019

olasd updated subscribers of T1534: PostgreSQL replication issues between prado and somerset.

After some more stewing and discussion with @zack, we'll be going for the "upgrade to pg 11 and restart replication from scratch" route;

Feb 20 2019, 2:32 PM · System administration, Archive content

Feb 19 2019

olasd updated subscribers of T1534: PostgreSQL replication issues between prado and somerset.

After reading some mailing list posts discussing the error message, and discussion with @ftigeot:

Feb 19 2019, 6:22 PM · System administration, Archive content
olasd added a comment to T1534: PostgreSQL replication issues between prado and somerset.

Logs on primary:

2019-02-19 14:27:44 UTC [15973]: [1-1] user=postgres,db=softwareheritage LOG:  starting logical decoding for slot "pgl_softwareheritage_prado_somerset"
2019-02-19 14:27:44 UTC [15973]: [2-1] user=postgres,db=softwareheritage DETAIL:  streaming transactions committing after 18607/9189A578, reading WAL from 18607/9189A578
2019-02-19 14:27:44 UTC [15973]: [3-1] user=postgres,db=softwareheritage ERROR:  record with incorrect prev-link 5403A/2E2F1829 at 18607/9189A578 
2019-02-19 14:27:44 UTC [15973]: [4-1] user=postgres,db=softwareheritage LOG:  could not receive data from client: Connection reset by peer
Feb 19 2019, 3:29 PM · System administration, Archive content
olasd renamed T1534: PostgreSQL replication issues between prado and somerset from PostgreSQL replication issues between prado and beaubourg to PostgreSQL replication issues between prado and somerset.
Feb 19 2019, 3:22 PM · System administration, Archive content
olasd updated subscribers of T1534: PostgreSQL replication issues between prado and somerset.
Feb 19 2019, 2:26 PM · System administration, Archive content
olasd triaged T1534: PostgreSQL replication issues between prado and somerset as High priority.
Feb 19 2019, 2:10 PM · System administration, Archive content

Nov 19 2018

douardda added a project to T846: Some objects from the original GitHub import have never actually been imported.: Restricted Project.
Nov 19 2018, 3:29 PM · Roadmap 2020, Restricted Project, Archive content

Nov 6 2018

anlambert added a comment to T1303: Web UI: empty repositories shown as 404 errors with "snapshot not found" message.

@zack, this is now deployed.

Nov 6 2018, 1:40 PM · Archive content, Web app

Nov 5 2018

zack added a comment to T1303: Web UI: empty repositories shown as 404 errors with "snapshot not found" message.

Thanks! Can you ping me when this fix (T1303) is deployed?

Nov 5 2018, 5:53 PM · Archive content, Web app
anlambert closed T1303: Web UI: empty repositories shown as 404 errors with "snapshot not found" message as Resolved by committing rDWAPPS4e82cd02449e: browse: Properly handle the empty snapshot.
Nov 5 2018, 5:33 PM · Archive content, Web app
anlambert claimed T1303: Web UI: empty repositories shown as 404 errors with "snapshot not found" message.
Nov 5 2018, 11:13 AM · Archive content, Web app

Nov 2 2018

zack triaged T1303: Web UI: empty repositories shown as 404 errors with "snapshot not found" message as High priority.
Nov 2 2018, 9:08 PM · Archive content, Web app

Oct 19 2018

olasd added a comment to T830: Remove tables occurrence and occurrence_history.

After a few days of rest:

Oct 19 2018, 11:30 AM · Storage manager, Archive content

Oct 18 2018

ardumont added a comment to T1159: hg loader: Schedule oneshot tasks for googlecode origin ingestion.

As with the other loader failure reports, here is a better output:

Oct 18 2018, 11:26 AM · Archive content

Oct 17 2018

ardumont added a comment to T1159: hg loader: Schedule oneshot tasks for googlecode origin ingestion.

Next step would be to simply extract the listing and reschedule those.

Oct 17 2018, 12:12 PM · Archive content

Oct 16 2018

zack added a comment to T830: Remove tables occurrence and occurrence_history.

\o/

Oct 16 2018, 5:34 PM · Storage manager, Archive content
olasd added a comment to T830: Remove tables occurrence and occurrence_history.

Oct 16 2018, 12:21 PM · Storage manager, Archive content
olasd closed T830: Remove tables occurrence and occurrence_history as Resolved by committing rDSTO435ebcdbf412: Drop table occurrence_history.
Oct 16 2018, 12:20 PM · Storage manager, Archive content

Oct 15 2018

olasd changed the status of T1211: reingest missing early objects from Open to Work in Progress.

The low-hanging fruit (i.e. stuff that is still live on GitHub) has been reimported.

Oct 15 2018, 4:28 PM · Archive content
ardumont added a comment to T1159: hg loader: Schedule oneshot tasks for googlecode origin ingestion.

Those need investigation, report...

Oct 15 2018, 4:25 PM · Archive content
olasd added a revision to T830: Remove tables occurrence and occurrence_history: D535: Drop table occurrence_history.
Oct 15 2018, 4:22 PM · Storage manager, Archive content
ardumont added a comment to T1257: Formalize the default branch convention for snapshots.

@ardumont: can you file a separate task for migrating existing snapshots in the archive? (I suspect you have a clearer idea than I do of which snapshots need to be migrated…) TIA

Oct 15 2018, 10:49 AM · Archive content
ardumont added a comment to T1159: hg loader: Schedule oneshot tasks for googlecode origin ingestion.

It's currently running.

Oct 15 2018, 10:46 AM · Archive content

Oct 12 2018

zack closed T1257: Formalize the default branch convention for snapshots as Resolved.

I'm closing this as it was about defining the naming convention and we have done so. I'm going to file a task about documenting it as part of the data model documentation.

Oct 12 2018, 8:07 PM · Archive content

Oct 11 2018

ardumont added a comment to T1159: hg loader: Schedule oneshot tasks for googlecode origin ingestion.

It's currently running.

Oct 11 2018, 6:10 PM · Archive content
ardumont added a comment to T1257: Formalize the default branch convention for snapshots.

As per oral discussion, from the development point of view, we should now be good:

Oct 11 2018, 3:54 PM · Archive content

Oct 10 2018

olasd closed T838: SQL storage: drop the entity tables as Resolved by committing rDSTO65e6b69eddc5: Drop unused entity tables.
Oct 10 2018, 5:00 PM · Storage manager, Archive content
olasd added a revision to T838: SQL storage: drop the entity tables: D509: Drop unused entity tables.
Oct 10 2018, 4:25 PM · Storage manager, Archive content
olasd triaged T1260: Extend the release object model to allow synthetic objects as Normal priority.
Oct 10 2018, 1:58 PM · Archive content
olasd triaged T1258: Synthesize release objects for all upstream things that match the concept of a release as Normal priority.
Oct 10 2018, 1:22 PM · Archive content
zack added a comment to T1257: Formalize the default branch convention for snapshots.
  • the default branch for snapshots is defined to be HEAD.
    • if the concept of HEAD exists with the same name in the upstream VCS (e.g. git, svn), this branch should be a literal pointer to the corresponding archived object
    • if the concept of HEAD doesn't exist with the same name in the upstream VCS (e.g. Mercurial), this branch should be an alias pointing at the default branch, named using the upstream VCS context (e.g. in the Mercurial case, an alias for the tip of the default branch)
    • if the concept of a default branch/version doesn't exist in the upstream VCS, no HEAD branch should exist in the snapshot
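The three cases above map onto a small decision function. The sketch below models snapshot branches as a dict from branch name to `{"target_type", "target"}`, loosely following the snapshot model's use of an `alias` target type; `with_head`, its parameters, and branch names such as `default-tip` are hypothetical, and real branch names in the archive may differ.

```python
def with_head(branches, upstream_head=None, default_branch=None):
    """Return a copy of `branches` with a HEAD entry following the convention.

    upstream_head:  (target_type, target) when the upstream VCS has a literal HEAD.
    default_branch: name of an existing branch when upstream has a default branch
                    but no HEAD under that name.
    """
    result = dict(branches)  # never mutate the caller's snapshot
    if upstream_head is not None:
        # Case 1: HEAD exists upstream (e.g. git): point straight at the archived object.
        target_type, target = upstream_head
        result["HEAD"] = {"target_type": target_type, "target": target}
    elif default_branch is not None:
        # Case 2: no HEAD upstream but a default branch (e.g. Mercurial): alias it.
        result["HEAD"] = {"target_type": "alias", "target": default_branch}
    # Case 3: no default branch/version concept upstream, so no HEAD branch at all.
    return result
```

Keeping HEAD as an alias in the Mercurial case means the snapshot records which upstream branch was the default, rather than duplicating its target.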
Oct 10 2018, 1:21 PM · Archive content
olasd triaged T1257: Formalize the default branch convention for snapshots as High priority.
Oct 10 2018, 11:58 AM · Archive content

Oct 4 2018

zack added a comment to T838: SQL storage: drop the entity tables.

agreed, they should be removed (I've updated the task title accordingly)

Oct 4 2018, 12:19 PM · Storage manager, Archive content
zack renamed T838: SQL storage: drop the entity tables from Decide what to do with the entity tables to SQL storage: drop the entity tables.
Oct 4 2018, 12:18 PM · Storage manager, Archive content
zack removed a parent task for T1156: Fix release targets of already loaded mercurial type origins: T336: "save code now".
Oct 4 2018, 11:46 AM · Archive content

Oct 3 2018

olasd added a comment to T830: Remove tables occurrence and occurrence_history.

All the old visits have now been migrated to snapshots.

Oct 3 2018, 3:04 PM · Storage manager, Archive content