Keep an up to date count of the number of objects in each archive
Closed, Migrated; Edits Locked

Authored By: olasd, Feb 7 2017, 6:58 PM

Description

We need to keep a running count of the number of objects in each of our archives, and to publish it.

Scanning the 3 billion rows of the archiver's content_archive table is not a reasonable option, as it takes several hours; we need something smarter.

One proposal is to keep a running count of the objects in each archive, bucketed by the last bytes of the object id and kept up to date by a trigger on the content_archive table.
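The sketch below shows how the bucket key could be computed. It assumes that object ids are 20-byte SHA1 digests and that the bucket is the last two bytes, which is what `substring(content_id from 19)` selects in the initial-count query further down, because PostgreSQL's `substring` on `bytea` uses 1-indexed positions. The helper name `bucket_of` and the constants are hypothetical.

```python
import hashlib

ID_LENGTH = 20      # assumed: object ids are 20-byte SHA1 digests
BUCKET_START = 19   # 1-indexed start position, as in substring(content_id from 19)


def bucket_of(content_id: bytes) -> bytes:
    """Return the bucket key: the id bytes from 1-indexed position 19 onwards."""
    # Python slices are 0-indexed, so SQL position 19 is Python index 18.
    return content_id[BUCKET_START - 1:]


# A 2-byte suffix gives 256 ** 2 possible buckets per archive.
BUCKETS_PER_ARCHIVE = 256 ** (ID_LENGTH - BUCKET_START + 1)

example_id = hashlib.sha1(b"hello").digest()
print(bucket_of(example_id).hex(), BUCKETS_PER_ARCHIVE)
```

Under these assumptions the counts table holds at most 65,536 rows per archive, however many objects the archive contains.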

A full count then only requires summing the few hundred thousand rows of the counts table.
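The following in-memory sketch illustrates the idea; it is not the trigger from the actual commit. It assumes that each `content_archive` row maps archive names to a copy status, and that only `'present'` copies are counted, matching the filter used for the initial count. The hypothetical `on_update` method plays the role of the row-level trigger. When a copy enters the `'present'` state it adds 1 to the matching `(archive, bucket)` counter, and when a copy leaves that state it subtracts 1. The class name, the archive names `archive-a`/`archive-b` and the `'missing'` status are illustrative placeholders.

```python
from collections import Counter


def bucket_of(content_id: bytes) -> bytes:
    # Assumed bucketing: the last two bytes of the object id.
    return content_id[-2:]


class ArchiveCounts:
    """Hypothetical stand-in for a per-(archive, bucket) counts table."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def on_update(self, content_id: bytes, old: dict, new: dict) -> None:
        """Mimic a row-level trigger; old/new map archive name -> status.

        An insert is modelled as old == {}, a delete as new == {}.
        """
        bucket = bucket_of(content_id)
        for archive in set(old) | set(new):
            was_present = old.get(archive) == "present"
            is_present = new.get(archive) == "present"
            if is_present and not was_present:
                self.counts[(archive, bucket)] += 1
            elif was_present and not is_present:
                self.counts[(archive, bucket)] -= 1

    def total(self, archive: str) -> int:
        """Full count for one archive: a sum over its bucket rows only."""
        return sum(n for (a, _), n in self.counts.items() if a == archive)


counts = ArchiveCounts()
a = bytes(18) + b"\x00\x01"   # placeholder 20-byte ids
b = bytes(18) + b"\x00\x02"
counts.on_update(a, {}, {"archive-a": "present"})
counts.on_update(b, {}, {"archive-a": "missing"})
counts.on_update(b, {"archive-a": "missing"},
                 {"archive-a": "present", "archive-b": "present"})
print(counts.total("archive-a"), counts.total("archive-b"))
```

Each update touches only the few counters for one object, and reading a total never scans the objects themselves.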

Revisions and Commits

rDSTO Storage manager
	rDSTOc6abed2ca3b7 sql/archiver: get the count of objects in each archive

Related Objects

Mentioned Here: rDSTO598114c5da2e: sql/archiver: keep archive counts using a bucketed list

Event Timeline

The counting strategy has been implemented in rDSTO598114c5da.

The initial, incremental counting of the objects in each archive is running in a screen session on prado.

We will then be able to add the trigger to keep the counts updated.

olasd closed this task as Resolved by committing rDSTOc6abed2ca3b7: sql/archiver: get the count of objects in each archive. (Feb 9 2017, 6:41 PM)

olasd added a commit: rDSTOc6abed2ca3b7: sql/archiver: get the count of objects in each archive.

The initial count has been done:

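In a shell, create a named pipe to stream the rows through:

mkfifo /tmp/fifo

In psql, write the bucket and archive of every copy with status 'present' into the FIFO (a \copy meta-command must fit on a single line):

\copy (select substring(content_id from 19) as bucket, jbe.key as archive from content_archive join lateral jsonb_each(copies) jbe on true where jbe.value->>'status' = 'present') to '/tmp/fifo'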

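Concurrently (opening the FIFO for writing blocks until a reader opens it), read the FIFO from Python and count the (bucket, archive) pairs:

from collections import Counter

# Count each (bucket, archive) pair streamed through the FIFO
with open('/tmp/fifo') as f:
    counts = Counter(tuple(line.strip().split()) for line in f)

# Write one archive<TAB>bucket<TAB>count row per pair, in COPY text format
with open('/tmp/buckets', 'w') as out:
    for (bucket, archive), count in counts.items():
        print(archive, bucket, count, sep='\t', file=out)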

Then, in psql, load the counts into the content_archive_counts table:

\copy content_archive_counts (archive, bucket, count) from '/tmp/buckets'
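The Python step can be factored into a function that is easier to test on its own. This sketch assumes each input line is in COPY text format with exactly two tab-separated fields, bucket then archive. The bucket is treated as an opaque string, so the escaped `bytea` text produced by `\copy ... to` is passed unchanged to `\copy ... from`. The function name `aggregate_buckets` and the sample values are hypothetical.

```python
from collections import Counter
from typing import Iterable, Iterator


def aggregate_buckets(lines: Iterable[str]) -> Iterator[str]:
    """Turn 'bucket<TAB>archive' rows into 'archive<TAB>bucket<TAB>count' rows."""
    # Split on the tab delimiter only, after dropping the trailing newline.
    counts = Counter(tuple(line.rstrip("\n").split("\t")) for line in lines)
    for (bucket, archive), count in counts.items():
        yield f"{archive}\t{bucket}\t{count}\n"


# Placeholder input standing in for the FIFO contents.
rows = list(aggregate_buckets(["b1\tarchive-a\n", "b1\tarchive-a\n", "b2\tarchive-b\n"]))
print(rows)
```

In the pipeline above it could be called as `dst.writelines(aggregate_buckets(src))`, with `src` opened on `/tmp/fifo` and `dst` opened for writing on `/tmp/buckets`.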

The trigger to update the table has been enabled.

zack added a project: Restricted Project. (Feb 13 2017, 3:33 PM)

This task has been migrated to GitLab.
