
clean up bogus python object references from the archiver DB table
Closed, Migrated (Edits Locked)

Description

A buggy version of the archiver was deployed for a little while, resulting in some bogus entries in the content_archive table whose copies keys contain Python object references instead of archive names, e.g.:

UPDATE content_archive
SET copies=jsonb_set(
    copies, '{<swh.objstorage.api.client.RemoteObjStorage object at 0x7f9fafad80b8>}',
    '{"status":"present", "mtime":1473946397}'
)
WHERE content_id='\x273ef8de2e8cfd33a5871c493540f86b9d997324'

We need to clean them up, replacing the object references with the right archive name (most likely "banco", as the archiver is only copying from uffizi to banco, but this needs to be double-checked).
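As a rough illustration of the cleanup described above, here is a minimal Python sketch that renames a bogus key inside one copies mapping. The function name `rename_copy_key`, the `uffizi` mtime value, and the merge policy (keep the entry with the newest mtime when the target name already exists) are assumptions made for this example, not something the archiver is known to do.

```python
def rename_copy_key(copies, bogus_key, archive_name):
    """Return a new copies mapping with bogus_key renamed to archive_name."""
    fixed = dict(copies)  # shallow copy, the input mapping is left untouched
    if bogus_key not in fixed:
        return fixed
    entry = fixed.pop(bogus_key)
    existing = fixed.get(archive_name)
    # Assumption: if the archive name is already present, keep the newest entry.
    if existing is None or entry.get("mtime", 0) > existing.get("mtime", 0):
        fixed[archive_name] = entry
    return fixed


bogus = "<swh.objstorage.api.client.RemoteObjStorage object at 0x7f9fafad80b8>"
copies = {
    # Illustrative value only, not taken from the real database.
    "uffizi": {"status": "present", "mtime": 1473940000},
    bogus: {"status": "present", "mtime": 1473946397},
}
fixed = rename_copy_key(copies, bogus, "banco")
print(fixed)
```

Whether "banco" is really the right replacement still has to be double-checked, as noted above; the sketch only shows the mechanics of the rename.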


Event Timeline

zack triaged this task as High priority. Sep 15 2016, 4:11 PM
zack created this task.

The new archiver is indeed fixed (after packaging, deploying and restarting the task). Using pg_activity, we can glimpse equivalent queries such as:

UPDATE content_archive
SET copies=jsonb_set(
    copies, '{banco}',
    '{"status":"present", "mtime":1473946397}'
)
WHERE content_id='\x273ef8de2e8cfd33a5871c493540f86b9d997324'

(as expected)

In T564#9550, @ardumont wrote:

> WHERE content_id='\x273ef8de2e8cfd33a5871c493540f86b9d997324'

we cannot assume that's the only object reference that has been around.
We should rather look for all content_id that are *not* one of the valid archive names and fix those.

> we cannot assume that's the only object reference that has been around.

I don't ^^

> We should rather look for all content_id that are *not* one of the valid archive names and fix those.

Yes, indeed.

I just reused your query and adapted it to what I've seen from pg_activity.
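The check suggested in this exchange (flag every entry whose copies keys are not valid archive names, rather than one known content_id) could be sketched as follows. This assumes the rows are available as (content_id, copies) pairs and that "uffizi" and "banco" are the only valid archive names; `rows_with_invalid_keys`, the second row, and the mtime values other than 1473946397 are made up for the example.

```python
# Assumption: these are the only valid archive names.
VALID_ARCHIVES = frozenset({"uffizi", "banco"})


def rows_with_invalid_keys(rows, valid=VALID_ARCHIVES):
    """Yield (content_id, bad_keys) for rows whose copies have unknown keys."""
    for content_id, copies in rows:
        bad_keys = sorted(key for key in copies if key not in valid)
        if bad_keys:
            yield content_id, bad_keys


bogus = "<swh.objstorage.api.client.RemoteObjStorage object at 0x7f9fafad80b8>"
rows = [
    ("273ef8de2e8cfd33a5871c493540f86b9d997324", {
        "uffizi": {"status": "present", "mtime": 1473940000},
        bogus: {"status": "present", "mtime": 1473946397},
    }),
    # Hypothetical row that is already correct and must not be reported.
    ("hypothetical-clean-row", {
        "uffizi": {"status": "present", "mtime": 1473940000},
        "banco": {"status": "present", "mtime": 1473940001},
    }),
]
to_fix = list(rows_with_invalid_keys(rows))
print(to_fix)
```

Using a whitelist of names rather than matching one specific object address means references left by different archiver processes (with different memory addresses) are caught as well.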

olasd changed the task status from Open to Work in Progress. Sep 16 2016, 3:53 PM
olasd claimed this task.
olasd added a subscriber: olasd.

The update query is running on the archiver database to fix this up.

Apparently some entries are left in the database...

Performing a full dump/edit/restore cycle.

Updated the 8008 lines that were still to be fixed.
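For illustration only, the "edit" step of such a dump/edit/restore cycle might look like the sketch below. It assumes a text dump with one row per line and that every object reference should become "banco"; the actual dump format and the commands used here are not recorded in this task, and `fix_dump_line` is a hypothetical helper.

```python
import re

# Matches default Python object reprs such as
# "<swh.objstorage.api.client.RemoteObjStorage object at 0x7f9fafad80b8>".
OBJECT_REPR = re.compile(r"<[\w.]+ object at 0x[0-9a-fA-F]+>")


def fix_dump_line(line, archive_name="banco"):
    """Replace every Python object reference in a dump line with archive_name."""
    return OBJECT_REPR.sub(archive_name, line)


# Assumed shape of the copies column in one dumped row.
sample = (
    '{"<swh.objstorage.api.client.RemoteObjStorage object at 0x7f9fafad80b8>": '
    '{"status": "present", "mtime": 1473946397}}'
)
fixed_line = fix_dump_line(sample)
print(fixed_line)
```

A plain textual substitution like this does not handle rows that already have a "banco" key next to the bogus one; those would end up with duplicate keys and need separate treatment.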

ardumont added a parent task: Unknown Object (Maniphest Task). Oct 8 2016, 9:45 AM
ardumont mentioned this in Unknown Object (Maniphest Task). Oct 8 2016, 9:53 AM