Event Timeline
Last edit to write all data to "temporary" table prior to actually cleanup (it was only done for release and snapshot so far. It's also done for origin_visit and fetch_history now).
Only the "mark data to cleanup" step has been run on prado so far [1]. In effect, the tables temp_to_cleanup_* are populated with the cleanup data now.
Spot checks:
temp_to_cleanup_origin_visit/temp_to_cleanup_fetch_history:
Check that all origins come from google code.
select distinct substring(url for 24) from origin where exists ( select 1 from temp_to_cleanup_origin_visit where origin.id = temp_to_cleanup_origin_visit.origin ) and substring(url from length(url) - 18) != '.googlecode.com/hg/';
returns only https://code.google.com/.
temp_to_cleanup_release:
Check all releases have the shape of bogus mercurial releases:
with to_check as ( select distinct(id) from temp_to_cleanup_release ) select distinct author, target_type, date from release where exists ( select 1 from to_check where to_check.id = release.id );
returns
author | target_type | date | comment | synthetic ----------+-------------+------+---------+----------- 13868695 | revision | | | f
Check all releases point to nowhere
with to_check as ( select distinct id, target from temp_to_cleanup_release inner join release using (id) ) select id from revision where exists ( select 1 from to_check where to_check.target = revision.id );
returns 0 rows.
temp_to_cleanup_snapshot_branches:
Check all origin_visits to be removed point to one of these snapshots.
with to_check as ( select distinct snapshot_id from temp_to_cleanup_origin_visit inner join origin_visit using (origin, visit) ) select distinct snapshot_id from temp_to_cleanup_snapshot_branches t where not exists ( select 1 from to_check where to_check.snapshot_id = t.snapshot_id );
returns 0 rows.
Vice versa, check all snapshots are referenced by only these origin_visits.
with to_check as ( select distinct snapshot_id from temp_to_cleanup_snapshot_branches ) select distinct origin, visit from origin_visit where exists ( select 1 from to_check where to_check.snapshot_id = origin_visit.snapshot_id ) and not exists ( select 1 from temp_to_cleanup_origin_visit t where t.origin = origin_visit.origin and t.visit = origin_visit.visit );
returns 0 rows.
Check all branches marked as to be removed are not referenced in any other snapshots
select count(distinct branch_id) from snapshot_branches where exists ( select 1 from temp_to_cleanup_snapshot_branches t where t.branch_id = snapshot_branches.branch_id ) and not exists ( select 1 from temp_to_cleanup_snapshot_branches t where t.snapshot_id = snapshot_branches.snapshot_id );
returns 0.
I think we're safe for removal of these objects.