
Nov 5 2017

olasd added a parent task for T830: Remove tables occurrence and occurrence_history: T565: embrace repository snapshot object in the data model (meta task).
Nov 5 2017, 9:29 PM · Storage manager, Archive content
olasd created T830: Remove tables occurrence and occurrence_history.
Nov 5 2017, 9:28 PM · Storage manager, Archive content
zack added a comment to T829: Remove duplication between fetch_history and origin_visit.

(agreed)

Nov 5 2017, 8:27 PM · Storage manager, Archive content
olasd triaged T829: Remove duplication between fetch_history and origin_visit as Normal priority.
Nov 5 2017, 7:53 PM · Storage manager, Archive content

Nov 4 2017

olasd created T829: Remove duplication between fetch_history and origin_visit.
Nov 4 2017, 3:28 PM · Storage manager, Archive content

Nov 3 2017

zack changed the status of T809: move contents larger than current injection limit to separate object storage from Open to Work in Progress.

ok, here are the building blocks I've prepared to resolve this task, as a step by step recipe:

Nov 3 2017, 2:36 PM · Archive content
olasd created T827: Reinstall pg_logical after postgres 10 upgrade.
Nov 3 2017, 11:22 AM · Archive content

Oct 26 2017

zack added a comment to T817: analyze bogus mimetype values in content_mimetype table.

FTR, the query I've used to generate the stats is:


(the encoding there is needed due to T818)

Oct 26 2017, 12:37 PM · Archive content, Indexer
zack created T817: analyze bogus mimetype values in content_mimetype table.
Oct 26 2017, 12:36 PM · Archive content, Indexer

Oct 21 2017

zack updated subscribers of T809: move contents larger than current injection limit to separate object storage.
Oct 21 2017, 1:00 PM · Archive content
zack added a comment to T809: move contents larger than current injection limit to separate object storage.

and here's the actual disk usage (!= compressed size)

/srv/softwareheritage/scratch/lists $ xzcat oversize-contents.txt.xz | while read id ; do du $(swh-ls-obj $id) ; done | cut -f 1 | paste -sd+ | bc
16847900696

those are KB (1024-byte blocks), so the total disk usage is ~15.7 TiB (not bad!)
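
The unit conversion behind this figure can be sketched as follows. It assumes `du` reported sizes in 1024-byte blocks (the GNU default); `kib_to_tib` is a name made up for this illustration, not an existing helper.

```python
# Assumption: the input is a count of 1024-byte blocks (KiB), as GNU du prints by default.
KIB_PER_TIB = 1024 ** 3  # KiB -> MiB -> GiB -> TiB


def kib_to_tib(kib: int) -> float:
    """Convert a size expressed in KiB to TiB."""
    return kib / KIB_PER_TIB


total_kib = 16847900696  # sum printed by the shell pipeline above
print(f"{kib_to_tib(total_kib):.2f} TiB")
```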

Oct 21 2017, 12:59 PM · Archive content

Oct 20 2017

zack added a comment to T809: move contents larger than current injection limit to separate object storage.

as a datapoint, at the time of writing the total (uncompressed) size occupied by content objects that are larger than our current limit is as follows:

Oct 20 2017, 5:39 PM · Archive content
zack created T809: move contents larger than current injection limit to separate object storage.
Oct 20 2017, 4:59 PM · Archive content

Oct 3 2017

ardumont renamed T312: Gitorious import: ingest repositories from "ingest Gitorious repositories" to "Gitorious import: ingest repositories".
Oct 3 2017, 10:14 AM · Archive coverage, Restricted Project, Origin-Gitorious, Format-Git
ardumont added a subtask for T312: Gitorious import: ingest repositories: T674: Gitorious import: Examine ingestion logs for errors and list them if any.
Oct 3 2017, 10:14 AM · Archive coverage, Restricted Project, Origin-Gitorious, Format-Git
ardumont removed a parent task for T312: Gitorious import: ingest repositories: T674: Gitorious import: Examine ingestion logs for errors and list them if any.
Oct 3 2017, 10:13 AM · Archive coverage, Restricted Project, Origin-Gitorious, Format-Git
ardumont added a subtask for T617: ingest Google Code Subversion repositories: T328: svn / subversion loader.
Oct 3 2017, 9:56 AM · Archive coverage, Origin-GoogleCode, SVN Loader
ardumont removed a parent task for T617: ingest Google Code Subversion repositories: T328: svn / subversion loader.
Oct 3 2017, 9:55 AM · Archive coverage, Origin-GoogleCode, SVN Loader
ardumont added a comment to T617: ingest Google Code Subversion repositories.

Reopened since a subtask (or child task) is still open (T676).

Oct 3 2017, 9:50 AM · Archive coverage, Origin-GoogleCode, SVN Loader
ardumont reopened T617: ingest Google Code Subversion repositories as "Open".
Oct 3 2017, 9:49 AM · Archive coverage, Origin-GoogleCode, SVN Loader
ardumont reopened T617: ingest Google Code Subversion repositories, a subtask of T367: ingest Google Code repositories, as Open.
Oct 3 2017, 9:49 AM · Archive coverage, Restricted Project
ardumont added a parent task for T617: ingest Google Code Subversion repositories: T328: svn / subversion loader.
Oct 3 2017, 9:47 AM · Archive coverage, Origin-GoogleCode, SVN Loader
ardumont removed a subtask for T617: ingest Google Code Subversion repositories: T328: svn / subversion loader.
Oct 3 2017, 9:46 AM · Archive coverage, Origin-GoogleCode, SVN Loader
ardumont added a subtask for T617: ingest Google Code Subversion repositories: T676: Google Code SVN import: Examine ingestion logs for errors and list them if any.
Oct 3 2017, 9:46 AM · Archive coverage, Origin-GoogleCode, SVN Loader
ardumont removed a parent task for T617: ingest Google Code Subversion repositories: T676: Google Code SVN import: Examine ingestion logs for errors and list them if any.
Oct 3 2017, 9:46 AM · Archive coverage, Origin-GoogleCode, SVN Loader

Oct 2 2017

ardumont closed T617: ingest Google Code Subversion repositories, a subtask of T367: ingest Google Code repositories, as Resolved.
Oct 2 2017, 4:22 PM · Archive coverage, Restricted Project
ardumont closed T617: ingest Google Code Subversion repositories as Resolved.
Oct 2 2017, 4:22 PM · Archive coverage, Origin-GoogleCode, SVN Loader
ardumont added a comment to T617: ingest Google Code Subversion repositories.

It got restarted 2 weeks ago (Monday 18th September 2017).
It just finished (Monday 2nd October 2017).

Oct 2 2017, 4:22 PM · Archive coverage, Origin-GoogleCode, SVN Loader

Sep 15 2017

ardumont closed T673: ingest Google Code Git repositories as Resolved.
Sep 15 2017, 3:28 PM · Archive coverage
ardumont closed T673: ingest Google Code Git repositories, a subtask of T367: ingest Google Code repositories, as Resolved.
Sep 15 2017, 3:28 PM · Archive coverage, Restricted Project

Sep 2 2017

zack added a comment to T719: Get a better estimated object count for the archive.

*amazing*, thanks!

Sep 2 2017, 10:39 PM · Archive content

Sep 1 2017

olasd added a comment to T719: Get a better estimated object count for the archive.

I now backfilled the rrd files in munin with historic data grabbed from the content table.

Sep 1 2017, 6:41 PM · Archive content
olasd added a comment to T719: Get a better estimated object count for the archive.

The content table has a nice ctime field that will allow us to regenerate historical data. I'm looking into this now.
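
As a rough sketch of how such a regeneration could work: group rows by the day of their `ctime` and keep a running total. The toy `content` table below (two columns, SQLite instead of PostgreSQL) and the function name `cumulative_counts_by_day` are assumptions for illustration only, not the actual schema or the code that was run.

```python
import sqlite3


def cumulative_counts_by_day(conn):
    """Return [(day, total objects known at the end of that day), ...]."""
    rows = conn.execute(
        "SELECT date(ctime) AS day, COUNT(*) "
        "FROM content GROUP BY day ORDER BY day"
    ).fetchall()
    total = 0
    history = []
    for day, added in rows:
        total += added  # running total gives the historical curve
        history.append((day, total))
    return history


# Hypothetical sample data standing in for the real content table.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE content (sha1 TEXT PRIMARY KEY, ctime TEXT);
    INSERT INTO content VALUES
      ('a', '2017-08-30 10:00:00'),
      ('b', '2017-08-30 18:30:00'),
      ('c', '2017-08-31 09:15:00'),
      ('d', '2017-09-01 14:36:00');
    """
)
print(cumulative_counts_by_day(conn))
```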

Sep 1 2017, 2:36 PM · Archive content
zack added a comment to T719: Get a better estimated object count for the archive.

My current point of view is thus: we've been bitten by inconsistencies between primary and replica before, so I think the counts should run on the primary and get replicated through the standard means to replicas, even if that means stressing the primary a bit more.

Sep 1 2017, 11:07 AM · Archive content
olasd added a comment to T719: Get a better estimated object count for the archive.
In T719#13453, @zack wrote:

From what you wrote I'm assuming you plan to run the cron count on the replica also when in production.

Sep 1 2017, 10:40 AM · Archive content
olasd closed T719: Get a better estimated object count for the archive as Resolved by committing rDSTO20f47ddc1d27: sql/swh-func: keep a cache of exact object counts as a table.
Sep 1 2017, 10:27 AM · Archive content

Jul 26 2017

zack added a comment to T719: Get a better estimated object count for the archive.

Sounds viable and good to me. (From what you wrote I'm assuming you plan to run the cron count on the replica also when in production.)

Jul 26 2017, 10:54 AM · Archive content

Jul 24 2017

olasd reopened T719: Get a better estimated object count for the archive as "Open".

Well, it turns out that pg_stat_user_tables is pretty bad as well, just in a different way than pg_class is...

Jul 24 2017, 3:45 PM · Archive content

Jun 6 2017

ardumont added a comment to T673: ingest Google Code Git repositories.

As of now, ingestion is done, after multiple (re)schedulings.

Jun 6 2017, 1:35 PM · Archive coverage
ardumont added a comment to T312: Gitorious import: ingest repositories.

As of now, ingestion is done, after multiple (re)schedulings.

Jun 6 2017, 1:34 PM · Archive coverage, Restricted Project, Origin-Gitorious, Format-Git

May 30 2017

olasd closed T719: Get a better estimated object count for the archive as Resolved by committing rDSTO8e97754a92c6: sql/swh-func: use pg_stat_user_tables for more accurate tuple statistics.
May 30 2017, 1:09 PM · Archive content
olasd added a comment to T719: Get a better estimated object count for the archive.

And of course now there's a discrepancy between the graphs (exported from statistics on the main database) and the counter (exported from real-time statistics on the replica database, on which vacuum has never been run)...

May 30 2017, 12:11 PM · Archive content

May 12 2017

olasd created T719: Get a better estimated object count for the archive.
May 12 2017, 3:58 PM · Archive content

May 4 2017

olasd closed T714: Visits for the GNU injection don't contain all the occurrences. as Resolved.

All updates performed.

May 4 2017, 2:50 PM · Archive content
olasd changed the status of T714: Visits for the GNU injection don't contain all the occurrences. from Open to Work in Progress.
softwareheritage=> select origin.id, count(distinct visit) from origin left join origin_visit ov on ov.origin = origin.id where type = 'ftp' group by origin.id having count(distinct visit) <> 1;
┌────┬───────┐
│ id │ count │
├────┼───────┤
└────┴───────┘
(0 rows)
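
To illustrate what the check above looks for, here is a minimal sketch that runs the same kind of query against a toy SQLite database. The two-column tables and the function name `origins_without_single_visit` are simplifications assumed for this example; the real schema has more columns and lives in PostgreSQL.

```python
import sqlite3


def origins_without_single_visit(conn, origin_type="ftp"):
    """Return (origin id, distinct visit count) for origins of the given
    type whose number of distinct visits is not exactly 1."""
    return conn.execute(
        """
        SELECT origin.id, COUNT(DISTINCT ov.visit)
        FROM origin
        LEFT JOIN origin_visit ov ON ov.origin = origin.id
        WHERE origin.type = ?
        GROUP BY origin.id
        HAVING COUNT(DISTINCT ov.visit) <> 1
        ORDER BY origin.id
        """,
        (origin_type,),
    ).fetchall()


# Hypothetical data: origin 1 is fine, origin 2 has two visits,
# origin 3 has none, origin 4 is not an ftp origin.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE origin (id INTEGER PRIMARY KEY, type TEXT);
    CREATE TABLE origin_visit (origin INTEGER, visit INTEGER);
    INSERT INTO origin VALUES (1, 'ftp'), (2, 'ftp'), (3, 'ftp'), (4, 'git');
    INSERT INTO origin_visit VALUES (1, 1), (2, 1), (2, 2), (4, 1), (4, 2);
    """
)
print(origins_without_single_visit(conn))
```

An empty result, as in the output above, means every origin of that type has exactly one visit. The LEFT JOIN matters: with an inner join, origins with no visit at all would silently drop out of the check.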
May 4 2017, 2:11 PM · Archive content
olasd added a comment to T714: Visits for the GNU injection don't contain all the occurrences.

Not with the current public API: the referential integrity of occurrences generated by the GNU injection is not verified, so the occurrences for "not version 1.2.4" are unreachable.

May 4 2017, 1:57 PM · Archive content
moranegg added a comment to T714: Visits for the GNU injection don't contain all the occurrences.

So with the API, can we directly access version 1.3.12 (aka object_id: 2994581)?

May 4 2017, 1:51 PM · Archive content
olasd created T714: Visits for the GNU injection don't contain all the occurrences.
May 4 2017, 1:44 PM · Archive content

Apr 26 2017

ardumont added a comment to T673: ingest Google Code Git repositories.

It's the same explanation as for the gitorious injection (T312). Only the numbers change:

Apr 26 2017, 10:44 AM · Archive coverage
ardumont added a comment to T312: Gitorious import: ingest repositories.

Update on this.

Apr 26 2017, 10:41 AM · Archive coverage, Restricted Project, Origin-Gitorious, Format-Git
ardumont updated subscribers of T617: ingest Google Code Subversion repositories.

An update on this: it is still a work in progress.

Apr 26 2017, 10:16 AM · Archive coverage, Origin-GoogleCode, SVN Loader

Apr 24 2017

olasd added a comment to T686: Fix bogus directory entry permissions in database.

All the directories should now have been corrected.

Apr 24 2017, 3:03 PM · Archive content, Restricted Project

Apr 20 2017

olasd added a comment to T686: Fix bogus directory entry permissions in database.

Directories have all been updated.

Apr 20 2017, 1:07 PM · Archive content, Restricted Project

Apr 19 2017

olasd added a comment to T686: Fix bogus directory entry permissions in database.

The queries updating the directory table with conflicting directory entries are in progress.

Apr 19 2017, 12:31 PM · Archive content, Restricted Project

Apr 7 2017

olasd changed the status of T686: Fix bogus directory entry permissions in database from Open to Work in Progress.

The query updating the directory_entry_dir table is in progress.

Apr 7 2017, 5:05 PM · Archive content, Restricted Project
zack edited projects for T376: ingest git.eclipse.org repositories, added: Archive content; removed General.
Apr 7 2017, 11:06 AM · Archive coverage
zack added a project to T419: ingest PyPI into the Software Heritage archive (meta task): Archive content.
Apr 7 2017, 11:06 AM · Archive coverage, Origin-Pypi
zack edited projects for T561: ingest bitbucket (meta task), added: Archive content; removed General.
Apr 7 2017, 11:06 AM · Archive coverage, Origin-Bitbucket
zack added a project to T592: ingest bitbucket git repositories: Archive content.
Apr 7 2017, 11:06 AM · Archive coverage, Origin-Bitbucket
zack added a project to T593: ingest bitbucket hg/mercurial repositories: Archive content.
Apr 7 2017, 11:06 AM · Archive coverage, Origin-Bitbucket
zack added a project to T682: Ingest Google Code Mercurial repositories: Archive content.
Apr 7 2017, 11:06 AM · Archive coverage, Mercurial loader
zack removed a project from T367: ingest Google Code repositories: General.
Apr 7 2017, 11:00 AM · Archive coverage, Restricted Project
zack added a project to T312: Gitorious import: ingest repositories: Archive content.
Apr 7 2017, 11:00 AM · Archive coverage, Restricted Project, Origin-Gitorious, Format-Git
zack added a project to T367: ingest Google Code repositories: Archive content.
Apr 7 2017, 10:59 AM · Archive coverage, Restricted Project
zack edited projects for T75: Check integrity of directories, revisions, and releases, added: Archive content; removed Storage manager.
Apr 7 2017, 10:59 AM · Archive content, Restricted Project
zack added a project to T686: Fix bogus directory entry permissions in database: Archive content.
Apr 7 2017, 10:59 AM · Archive content, Restricted Project
zack added a project to T673: ingest Google Code Git repositories: Archive content.
Apr 7 2017, 10:58 AM · Archive coverage
zack added a project to T617: ingest Google Code Subversion repositories: Archive content.
Apr 7 2017, 10:58 AM · Archive coverage, Origin-GoogleCode, SVN Loader
zack created Archive content.
Apr 7 2017, 10:58 AM