Page MenuHomeSoftware Heritage

Fix bogus directory entry permissions in database
Closed, MigratedEdits Locked

Description

T685 had us generate a flurry of directory entries with bogus permissions. We need to fix them.

Probably something along the lines of :

update directory_entry_file set perms = 33188 where perms = 100644;
update directory_entry_file set perms = 33261 where perms = 100755;
update directory_entry_file set perms = 40960 where perms = 120000;
update directory_entry_dir set perms = 16384 where perms = 40000;

Event Timeline

zack moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board.Feb 22 2017, 9:17 AM

I've been thinking a bit more about this...

  • The updates won't work as is, as we might have duplicated entries, one with a bogus perms value, one with the proper perms value.
  • deduplicating the updates will mean updating directory entries to point at the updated directory_entry_file/dir, using the ever so inefficient gin indexes on the directory table

Tons of fun in perspective.

with broken_perms as (
    select *
    from directory_entry_file
    where perms = 100644
), conflicting as (
    select b.id as old_id, def.id as new_id
    from broken_perms b
    inner join directory_entry_file def
    on b.target = def.target and b.name = def.name and def.perms = 33188
), to_update as (
    select id
    from broken_perms
    where not exists (select 1 from conflicting where old_id = id)
), updated as (
    update directory_entry_file def
    set perms = 33188
    from to_update u
    where def.id = u.id
), deleted as (
    delete from directory_entry_file def
    using conflicting c
    where def.id = c.old_id
) select * from conflicting;
olasd changed the task status from Open to Work in Progress.Apr 7 2017, 5:05 PM
olasd claimed this task.

The query updating the directory_entry_dir table is in progress.

The queries updating the directory table with conflicting directory entries are in progress.

import psycopg2

ids = set(int(l.split(',')[0]) for l in open('/srv/softwareheritage/postgres/export/directory_entry_dir.tsv'))

db = psycopg2.connect('host=localhost port=5433 dbname=softwareheritage user=swhstorage password=<>')

QUERY = 'update directory set dir_entries = array(select coalesce(t.new_id, d.id) from unnest(dir_entries) d(id) left join tmp_update_dir_entries t on t.old_id = d.id) where dir_entries @> ARRAY[%s::bigint]'
i = 0

for entry in ids:
    if not (i % 1000):
        print(i)
        db.commit()
        cur = db.cursor()
    cur.execute(QUERY, (entry,))
    if cur.statusmessage != 'UPDATE 0': print(entry, cur.statusmessage)
    i += 1

db.commit()

Directories have all been updated.

I've started running the query for file entries with mode 100644. Instead of doing the delete beforehand, I'll do it after updating the directories, to avoid a repeat of T710.

All the directories should now have been corrected.

I'm running a double check to make sure we have not inserted any new bogus file entries since then, then I'll proceed to close this issue.

The directories have been corrected, but the bogus file entries have not been deleted yet.

The following query:

select count(directory.id) from directory inner join directory_entry_file tablesample system(0.1) on ARRAY[CAST (directory_entry_file.id AS bigint)] <@ file_entries where directory_entry_file.perms = 100644;

returns 0, so it should be a pretty good sign that all the directories have been successfuly removed but not the associated file entries.