I just simplified the regexp to allow qualifier permutation: https://github.com/inveniosoftware/idutils/blob/cc09640ffb457bab3cfe8d0eeb4822dd521fd36d/idutils/__init__.py#L245-L249
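The gist of the simplification: instead of a fixed, canonical sequence of optional qualifier groups, a single repeated alternation accepts the qualifiers in any order. A minimal sketch of the idea, expressed here as a PostgreSQL regex match (qualifier names taken from the SWHID scheme; the exact pattern is in the linked commit and may differ):

-- hedged sketch, not the exact idutils pattern: the repeated group
-- (;(...)=...)* matches the qualifiers regardless of their order
SELECT 'swh:1:cnt:0000000000000000000000000000000000000000;lines=1-4;origin=https://example.org'
     ~ '^swh:1:(cnt|dir|rel|rev|snp):[0-9a-f]{40}(;(origin|visit|anchor|path|lines)=[^;]+)*$'
    AS matches;
-- returns true; swapping ";lines=" and ";origin=" still matches,
-- which a canonical-order pattern would reject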
Jun 9 2020
Is there a way to improve the regex in https://github.com/inveniosoftware/idutils/pull/60 to allow qualifiers to come in any order instead of the canonical one?
PR submitted: https://github.com/inveniosoftware/idutils/pull/60
Mar 24 2020
Feb 19 2020
Jan 29 2020
Jan 23 2020
List of revisions with no parents (1259):
Dec 16 2019
Dec 13 2019
Nov 18 2019
I've used swh-graph to look up the 74 still missing contents and managed to find 67 of them; see the cnt→ori mapping in (tracing them back to actual origins requires T2045):
Jul 3 2019
Jun 20 2019
151 contents have been restored with help from the provenance index, thanks to @grouss.
Jun 17 2019
Apr 24 2019
We should investigate why they are there.
This is now done, aside from a minor issue noted below:
softwareheritage-indexer=# select count(*) from revision_intrinsic_metadata where metadata = '{"@context": "https://doi.org/10.5063/schema/codemeta-2.0"}'::jsonb;
 count
-------
     0
(1 row)
Apr 3 2019
Apr 2 2019
Mar 25 2019
The replication process from prado to somerset is now complete, and the archive frontend has been switched over to this database.
Mar 23 2019
Mar 22 2019
The replicated cluster is now clear to be taken down for a rebuild.
Mar 20 2019
Mar 15 2019
@vlorentz: lather, rinse, repeat.
softwareheritage-indexer=# DELETE FROM revision_intrinsic_metadata WHERE metadata = '{"@context": "https://doi.org/10.5063/schema/codemeta-2.0"}'::jsonb ;
ERROR: deadlock detected
DETAIL: Process 23900 waits for ShareLock on transaction 212164862; blocked by process 20175.
Process 20175 waits for ShareLock on transaction 212164381; blocked by process 23900.
HINT: See server log for query details.
CONTEXT: while deleting tuple (772424,55) in relation "revision_intrinsic_metadata"
Time: 33048,828 ms (00:33,049)
(just happened, after the indexers were restarted, including D1218)
Mar 4 2019
In T1549#29103, @vlorentz wrote:
Once this is landed and deployed, ordering your DELETEs by revision_metadata.id will acquire locks in the same order as the idx_storage, solving the deadlock issue.
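A hedged sketch of that approach, reusing the marker value from the queries in this task: locking the target rows in id order before deleting them makes concurrent sessions acquire row locks in the same order.

WITH ordered AS (
    SELECT id FROM revision_metadata
    WHERE translated_metadata = '{"@context": "https://doi.org/10.5063/schema/codemeta-2.0"}'::jsonb
    ORDER BY id
    FOR UPDATE
)
DELETE FROM revision_metadata WHERE id IN (SELECT id FROM ordered);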
Mar 2 2019
The update completed, but a first attempt at the second DELETE failed with a deadlock (?!):
softwareheritage-indexer=# DELETE FROM revision_metadata WHERE translated_metadata = '{"@context": "https://doi.org/10.5063/schema/codemeta-2.0"}'::jsonb ;
ERROR: deadlock detected
DETAIL: Process 10966 waits for ShareLock on transaction 197265813; blocked by process 11754.
Process 11754 waits for ShareLock on transaction 197264487; blocked by process 10966.
HINT: See server log for query details.
CONTEXT: while deleting tuple (1380733,15) in relation "revision_metadata"
Time: 170864,091 ms (02:50,864)
The following fix for the above (suggested by @vlorentz) is now running:
update revision_metadata
set translated_metadata = origin_intrinsic_metadata.metadata
from origin_intrinsic_metadata
where revision_metadata.id = origin_intrinsic_metadata.from_revision
  and revision_metadata.translated_metadata = '{"@context": "https://doi.org/10.5063/schema/codemeta-2.0"}'
  and origin_intrinsic_metadata.metadata != '{"@context": "https://doi.org/10.5063/schema/codemeta-2.0"}';
Mar 1 2019
As discussed on IRC, even after cleaning up origin_intrinsic_metadata, the DELETE on revision_metadata fails with:
softwareheritage-indexer=# DELETE FROM revision_metadata
softwareheritage-indexer-# WHERE translated_metadata = '{"@context": "https://doi.org/10.5063/schema/codemeta-2.0"}'::jsonb ;
I've started the first of the following queries on somerset (in a screen under my user):
DELETE FROM origin_intrinsic_metadata WHERE metadata = '{"@context": "https://doi.org/10.5063/schema/codemeta-2.0"}'::jsonb ;
Feb 27 2019
So that we know we went over the origin and did not find anything, without needing additional tables.
Feb 26 2019
I don't understand your comment. What are the remaining arguments for using NULL instead of just deleting rows?
and the NULL option
Yes :-)
So, do we agree that the right fix for this task is just to get rid of the empty-ish rows? Or are there other arguments that we haven't considered yet?
Sounds like a solution for T1528 :)
In T1549#28882, @vlorentz wrote:
What is the provenance map?
My tentative proposal is to delete all table entries for which no metadata has been found.
The invariant will be: if an origin/revision has metadata, there will be an entry in the table(s); if not, the origin/revision will not appear.
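A sketch of what verifying that invariant could look like, assuming (as in the cleanup queries elsewhere in this task) that "no metadata found" is currently encoded as the bare codemeta @context object:

-- after the cleanup, no such empty-ish rows should remain
SELECT count(*) FROM origin_intrinsic_metadata
WHERE metadata = '{"@context": "https://doi.org/10.5063/schema/codemeta-2.0"}'::jsonb;
-- expected result: 0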
Feb 20 2019
After some more stewing and discussion with @zack, we'll be going for the "upgrade to pg 11 and restart replication from scratch" route.
Feb 19 2019
After reading some mailing list posts discussing the error message, and after discussing with @ftigeot:
Logs on primary:
2019-02-19 14:27:44 UTC [15973]: [1-1] user=postgres,db=softwareheritage LOG: starting logical decoding for slot "pgl_softwareheritage_prado_somerset"
2019-02-19 14:27:44 UTC [15973]: [2-1] user=postgres,db=softwareheritage DETAIL: streaming transactions committing after 18607/9189A578, reading WAL from 18607/9189A578
2019-02-19 14:27:44 UTC [15973]: [3-1] user=postgres,db=softwareheritage ERROR: record with incorrect prev-link 5403A/2E2F1829 at 18607/9189A578
2019-02-19 14:27:44 UTC [15973]: [4-1] user=postgres,db=softwareheritage LOG: could not receive data from client: Connection reset by peer
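Given the decision above to restart replication from scratch, here is a hedged sketch of the slot cleanup that implies on the primary (the slot name comes from the log above; the exact procedure actually used is not recorded in this task):

-- inspect the logical replication slots on the primary
SELECT slot_name, active, restart_lsn FROM pg_replication_slots;
-- drop the broken slot so replication can be re-bootstrapped
SELECT pg_drop_replication_slot('pgl_softwareheritage_prado_somerset');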
Nov 19 2018
Nov 6 2018
@zack, this is now deployed.
Nov 5 2018
thanks! can you ping me when this fix (T1303) is deployed?
Nov 2 2018
Oct 19 2018
After a few days of rest:
Oct 18 2018
As with the other loader failure reports, here is a better output:
Oct 17 2018
The next step would be to simply extract the listing and reschedule those.
Oct 16 2018
\o/
Oct 15 2018
The low-hanging fruit (i.e. the repositories that are still live on GitHub) has been reimported.
Those need investigation and a report...
@ardumont: can you file a separate task for migrating the existing snapshots in the archive? (I suspect you have a clearer idea than I do of which snapshots need to be migrated…) TIA
It's currently running.
Oct 12 2018
I'm closing this as it was about defining the naming convention and we have done so. I'm going to file a task about documenting it as part of the data model documentation.
Oct 11 2018
It's currently running.
As per oral discussion, from the development point of view, we should now be good:
Oct 10 2018
- the default branch for snapshots is defined to be HEAD (see the illustration after this list)
- if the concept of HEAD exists with the same name in the upstream VCS (e.g. git, svn), this branch should be a literal pointer to the corresponding archived object
- if the concept of HEAD doesn't exist with the same name in the upstream VCS (e.g. mercurial), this branch should be an alias pointing at the default branch, named using the upstream VCS's own conventions (e.g. in the mercurial case, that would be an alias for the tip of the default branch)
- if the concept of a default branch/version doesn't exist in the upstream VCS, no HEAD branch should exist in the snapshot
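A hypothetical illustration of the branches each case produces (simplified tuples for readability, not the actual swh-storage schema):

-- one row per branch: which VCS, branch name, target type, target
SELECT * FROM (VALUES
    ('git',       'HEAD',    'revision', '<id of the revision HEAD resolves to>'),
    ('mercurial', 'HEAD',    'alias',    'default'),
    ('mercurial', 'default', 'revision', '<id of the tip of the default branch>')
) AS branch (vcs, name, target_type, target);
-- a VCS with no default-branch concept simply has no HEAD row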
Oct 4 2018
agreed, they should be removed (I've updated the task title accordingly)
Oct 3 2018
All the old visits have now been migrated to snapshots.