(agreed)
Nov 5 2017
Nov 4 2017
Nov 3 2017
OK, here are the building blocks I've prepared to resolve this task, as a step-by-step recipe:
Oct 26 2017
Oct 21 2017
and here's the actual disk usage (!= compressed size)
/srv/softwareheritage/scratch/lists $ xzcat oversize-contents.txt.xz | while read id ; do du $(swh-ls-obj $id) ; done | cut -f 1 | paste -sd+ | bc
16847900696
those are 1 KiB blocks, so the total disk usage is ~15.7 TiB (not bad!)
Oct 20 2017
as a datapoint, at the time of writing the total (uncompressed) size occupied by content objects that are larger than our current limit is as follows:
Oct 3 2017
Reopened since a subtask (or child task) is still open (T676).
Oct 2 2017
It got restarted 2 weeks ago (Monday 18th September 2017).
It just finished (Monday 2nd October 2017).
Sep 15 2017
Sep 2 2017
*amazing*, thanks!
Sep 1 2017
I have now backfilled the rrd files in munin with historical data grabbed from the content table.
The content table has a nice ctime field that will allow us to regenerate historical data. I'm looking into this now.
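The regeneration idea boils down to grouping content rows by the day part of ctime and counting them (in SQL this would be something like `select date_trunc('day', ctime), count(*) from content group by 1`; the query and column names here are an assumption based on the comment above). A minimal stand-in over sample timestamps, using a shell pipeline instead of the database:

```shell
# Hypothetical stand-in for:
#   SELECT date_trunc('day', ctime), count(*) FROM content GROUP BY 1 ORDER BY 1;
# Take the date part of each sample ctime value, then count per day.
printf '%s\n' \
  '2017-08-30 10:15:00' \
  '2017-08-30 23:59:59' \
  '2017-08-31 00:00:01' \
| cut -d' ' -f1 | sort | uniq -c
```

Each per-day count can then be fed into the rrd files at the corresponding timestamp.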
My current point of view is thus: we've been bitten by inconsistencies between the primary and its replicas before, so I think the counts should run on the primary and get replicated to the replicas through the standard means, even if that means stressing the primary a bit more.
In T719#13453, @zack wrote:
> From what you wrote I'm assuming you plan to run the cron count on the replica also when in production.
Jul 26 2017
Sounds viable and good to me. (From what you wrote I'm assuming you plan to run the cron count on the replica also when in production.)
Jul 24 2017
Well, it turns out that pg_stat_user_tables is pretty bad as well, just in a different way than pg_class is...
Jun 6 2017
As of now, after multiple (re)schedulings, the ingestion is done.
May 30 2017
And of course now there's a discrepancy between the graphs (exported from statistics on the main database) and the counter (exported from real-time statistics on the replica database, on which vacuum has never been run)...
May 12 2017
May 4 2017
All updates performed.
softwareheritage=> select origin.id, count(distinct visit)
                   from origin
                   left join origin_visit ov on ov.origin = origin.id
                   where type = 'ftp'
                   group by origin.id
                   having count(distinct visit) <> 1;
┌────┬───────┐
│ id │ count │
├────┼───────┤
└────┴───────┘
(0 rows)
Not with the current public API: the referential integrity of occurrences generated by the GNU injection is not verified, so the occurrences for "not version 1.2.4" are unreachable.
So, with the API, can we directly access version 1.3.12 (aka object_id 2994581)?
Apr 26 2017
It's the same explanation as for the gitorious injection (T312). Only the numbers change:
Update on this.
An update on this: it is still a work in progress.
Apr 24 2017
All the directories should now have been corrected.
Apr 20 2017
Directories have all been updated.
Apr 19 2017
The queries updating the directory table with conflicting directory entries are in progress.
Apr 7 2017
The query updating the directory_entry_dir table is in progress.