Feb 13 2018

ardumont updated the task description for T957: googlecode import: Check for origin clashes and fix if any.
Feb 13 2018, 6:46 PM · Archive content, Mercurial loader
ardumont renamed T957: googlecode import: Check for origin clashes and fix if any from googlecode import: Check for origin clashes to googlecode import: Check for origin clashes and fix if any.
Feb 13 2018, 3:34 PM · Archive content, Mercurial loader
ardumont changed the status of T957: googlecode import: Check for origin clashes and fix if any from Open to Work in Progress.
Feb 13 2018, 3:33 PM · Archive content, Mercurial loader
ardumont changed the status of T957: googlecode import: Check for origin clashes and fix if any, a subtask of T682: Ingest Google Code Mercurial repositories, from Open to Work in Progress.
Feb 13 2018, 3:33 PM · Archive coverage, Mercurial loader
ardumont added a comment to T957: googlecode import: Check for origin clashes and fix if any.

Yes, we are hitting the same problem.

Feb 13 2018, 3:33 PM · Archive content, Mercurial loader
ardumont added a comment to T955: googlecode import: hglib.error.CommandError during loading.

Basic checks on the archive are fine:

Feb 13 2018, 3:06 PM · Origin-GoogleCode, Archive content, Mercurial loader
ardumont added projects to T617: ingest Google Code Subversion repositories: SVN Loader, Origin-GoogleCode.
Feb 13 2018, 2:31 PM · Archive coverage, Origin-GoogleCode, SVN Loader
ardumont closed T956: googlecode import: Clean up visit wrongly targetting empty snapshot, a subtask of T682: Ingest Google Code Mercurial repositories, as Resolved.
Feb 13 2018, 2:26 PM · Archive coverage, Mercurial loader
ardumont closed T956: googlecode import: Clean up visit wrongly targetting empty snapshot as Resolved.
softwareheritage=> select count(*) from origin_visit inner join origin on origin_visit.origin = origin.id where origin.type = 'hg';
 count
--------
 126678
(1 row)
softwareheritage=> select count(*) from origin o inner join origin_visit ov on o.id=ov.origin where type='hg' and url like '%googlecode%' and ov.snapshot_id = 16;  -- empty snapshot
 count
--------
 126661
(1 row)
Feb 13 2018, 2:26 PM · Archive content, Mercurial loader
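
The two counts above differ by only 17 (126678 - 126661), so nearly every hg visit targets the empty snapshot. Note that the first query counts all hg visits while the second also filters on googlecode URLs. The following is a minimal sketch of the same comparison against an in-memory SQLite database, not the production PostgreSQL schema. The table and column names follow the queries above; the `visit` column, the toy rows, the URLs and the helper function names are assumptions made for illustration.

```python
import sqlite3

# Snapshot id flagged as "empty snapshot" in the query above.
EMPTY_SNAPSHOT_ID = 16

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE origin (id INTEGER PRIMARY KEY, type TEXT, url TEXT);
    CREATE TABLE origin_visit (origin INTEGER, visit INTEGER, snapshot_id INTEGER);
    """
)
# Hypothetical toy data: three hg origins, one svn origin.
conn.executemany(
    "INSERT INTO origin VALUES (?, ?, ?)",
    [
        (1, "hg", "https://hypothetical-a.googlecode.com/hg"),
        (2, "hg", "https://hypothetical-b.googlecode.com/hg"),
        (3, "hg", "https://example.org/hg/other"),
        (4, "svn", "https://hypothetical-c.googlecode.com/svn"),
    ],
)
conn.executemany(
    "INSERT INTO origin_visit VALUES (?, ?, ?)",
    [(1, 1, 16), (2, 1, 16), (3, 1, 42), (4, 1, 16)],
)


def count_hg_visits(conn):
    """Count every visit whose origin is of type 'hg'."""
    row = conn.execute(
        "SELECT count(*) FROM origin_visit ov "
        "INNER JOIN origin o ON ov.origin = o.id "
        "WHERE o.type = 'hg'"
    ).fetchone()
    return row[0]


def count_empty_googlecode_hg_visits(conn, snapshot_id=EMPTY_SNAPSHOT_ID):
    """Count googlecode hg visits that target the given snapshot id."""
    row = conn.execute(
        "SELECT count(*) FROM origin o "
        "INNER JOIN origin_visit ov ON o.id = ov.origin "
        "WHERE o.type = 'hg' AND o.url LIKE '%googlecode%' "
        "AND ov.snapshot_id = ?",
        (snapshot_id,),
    ).fetchone()
    return row[0]


total_hg = count_hg_visits(conn)
empty_googlecode_hg = count_empty_googlecode_hg_visits(conn)
print(total_hg, empty_googlecode_hg, total_hg - empty_googlecode_hg)
```

Passing the snapshot id as a bound parameter rather than a literal makes it easier to rerun the check if the empty snapshot has a different id in another database.
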
ardumont added a comment to T682: Ingest Google Code Mercurial repositories.

Out of 127k (127048) only ~125k (124899, query on swh db) are referenced.

Feb 13 2018, 2:15 PM · Archive coverage, Mercurial loader
ardumont created T958: googlecode import: Clean up googlecode origin's origin_visits.
Feb 13 2018, 1:45 PM · SVN Loader, Origin-GoogleCode, Archive content
ardumont created T957: googlecode import: Check for origin clashes and fix if any.
Feb 13 2018, 12:26 PM · Archive content, Mercurial loader
ardumont renamed T955: googlecode import: hglib.error.CommandError during loading from import googlecode: hglib.error.CommandError during loading to googlecode import: hglib.error.CommandError during loading.
Feb 13 2018, 12:21 PM · Origin-GoogleCode, Archive content, Mercurial loader
ardumont renamed T956: googlecode import: Clean up visit wrongly targetting empty snapshot from googlecode import: Clean up visit targetting wrongly an empty snapshot to googlecode import: Clean up visit wrongly targetting empty snapshot.
Feb 13 2018, 12:19 PM · Archive content, Mercurial loader
ardumont created T956: googlecode import: Clean up visit wrongly targetting empty snapshot.
Feb 13 2018, 12:19 PM · Archive content, Mercurial loader
ardumont created T955: googlecode import: hglib.error.CommandError during loading.
Feb 13 2018, 12:11 PM · Origin-GoogleCode, Archive content, Mercurial loader
ardumont added a comment to T682: Ingest Google Code Mercurial repositories.

FYI all loaded repositories point to an empty snapshot.

Feb 13 2018, 10:19 AM · Archive coverage, Mercurial loader

Feb 12 2018

ardumont added a comment to T682: Ingest Google Code Mercurial repositories.

FYI all loaded repositories point to an empty snapshot.

Feb 12 2018, 5:41 PM · Archive coverage, Mercurial loader
olasd added a comment to T682: Ingest Google Code Mercurial repositories.

FYI all loaded repositories point to an empty snapshot.

Feb 12 2018, 5:26 PM · Archive coverage, Mercurial loader
ardumont added a comment to T682: Ingest Google Code Mercurial repositories.

Some errors speak for themselves (OSError, error during extraction), some do not.
I'm currently digging into this and will open dedicated tasks when deemed necessary.

Feb 12 2018, 4:29 PM · Archive coverage, Mercurial loader
ardumont added a comment to T682: Ingest Google Code Mercurial repositories.

Out of 127k (127048) only ~125k (124899, query on swh db) are referenced.

Feb 12 2018, 3:47 PM · Archive coverage, Mercurial loader

Feb 9 2018

fiendish added a comment to T682: Ingest Google Code Mercurial repositories.

yay

Feb 9 2018, 10:53 PM · Archive coverage, Mercurial loader
ardumont added a comment to T682: Ingest Google Code Mercurial repositories.

rDSNIP26ea29b2d2abf9c931ba5efcf0f49d4194254e79

Feb 9 2018, 5:52 PM · Archive coverage, Mercurial loader
ardumont claimed T682: Ingest Google Code Mercurial repositories.
Feb 9 2018, 5:51 PM · Archive coverage, Mercurial loader
ardumont changed the status of T682: Ingest Google Code Mercurial repositories from Open to Work in Progress.
Feb 9 2018, 5:49 PM · Archive coverage, Mercurial loader
ardumont changed the status of T682: Ingest Google Code Mercurial repositories, a subtask of T367: ingest Google Code repositories, from Open to Work in Progress.
Feb 9 2018, 5:49 PM · Archive coverage, Restricted Project
ardumont updated subscribers of T682: Ingest Google Code Mercurial repositories.

This is now running on our swh-workers, with scheduling running on saatchi:

Feb 9 2018, 5:48 PM · Archive coverage, Mercurial loader

Feb 6 2018

ardumont closed T948: googlecode import: Loading failure on symbolic link edge cases as Resolved by committing rDLDSVN06bcb409d9af: swh.loader.svn: Fix corner edge case on symbolic link.
Feb 6 2018, 3:35 PM · Origin-GoogleCode, SVN Loader, Archive content
ardumont closed T947: googlecode import: Some dumps are just empty repository, a subtask of T879: Reschedule googlecode svn origins from scratch, as Resolved.
Feb 6 2018, 3:35 PM · Origin-GoogleCode, SVN Loader, Archive content
ardumont closed T948: googlecode import: Loading failure on symbolic link edge cases, a subtask of T879: Reschedule googlecode svn origins from scratch, as Resolved.
Feb 6 2018, 3:35 PM · Origin-GoogleCode, SVN Loader, Archive content
ardumont closed T947: googlecode import: Some dumps are just empty repository as Resolved by committing rDLDSVNde3c7a031f8b: swh.loader.svn: Deal with empty svn repository.
Feb 6 2018, 3:35 PM · Origin-GoogleCode, SVN Loader, Archive content

Feb 5 2018

ardumont added a comment to T948: googlecode import: Loading failure on symbolic link edge cases.

It appears that in this case, the properties must be set not on the symlink but on its source.

Feb 5 2018, 5:42 PM · Origin-GoogleCode, SVN Loader, Archive content
ardumont created T948: googlecode import: Loading failure on symbolic link edge cases.
Feb 5 2018, 3:46 PM · Origin-GoogleCode, SVN Loader, Archive content
ardumont changed the status of T947: googlecode import: Some dumps are just empty repository, a subtask of T879: Reschedule googlecode svn origins from scratch, from Open to Work in Progress.
Feb 5 2018, 1:45 PM · Origin-GoogleCode, SVN Loader, Archive content
ardumont renamed T947: googlecode import: Some dumps are just empty repository from googlecode import: Some dumps starts their log to revision 0 to googlecode import: Some dumps are just empty repository.
Feb 5 2018, 1:45 PM · Origin-GoogleCode, SVN Loader, Archive content
ardumont added a comment to T947: googlecode import: Some dumps are just empty repository.

It's more an empty-repository case than a repository starting its commit range at 0...

Feb 5 2018, 1:37 PM · Origin-GoogleCode, SVN Loader, Archive content
ardumont created T947: googlecode import: Some dumps are just empty repository.
Feb 5 2018, 11:43 AM · Origin-GoogleCode, SVN Loader, Archive content

Feb 2 2018

ardumont added a comment to T879: Reschedule googlecode svn origins from scratch.

This is on standby during the snapshot migration.

Feb 2 2018, 1:44 PM · Origin-GoogleCode, SVN Loader, Archive content

Dec 21 2017

ardumont updated the task description for T682: Ingest Google Code Mercurial repositories.
Dec 21 2017, 10:51 AM · Archive coverage, Mercurial loader

Dec 20 2017

ardumont changed the status of T329: hg / mercurial loader, a subtask of T593: ingest bitbucket hg/mercurial repositories, from Open to Work in Progress.
Dec 20 2017, 11:42 AM · Archive coverage, Origin-Bitbucket
ardumont changed the status of T329: hg / mercurial loader, a subtask of T682: Ingest Google Code Mercurial repositories, from Open to Work in Progress.
Dec 20 2017, 11:42 AM · Archive coverage, Mercurial loader

Dec 14 2017

ardumont closed T676: Google Code SVN import: Examine ingestion logs for errors and list them if any, a subtask of T617: ingest Google Code Subversion repositories, as Resolved.
Dec 14 2017, 3:24 PM · Archive coverage, Origin-GoogleCode, SVN Loader
ardumont closed T847: loader-svn: Some SVN origins have occurrences that point to non-existent objects, a subtask of T617: ingest Google Code Subversion repositories, as Resolved.
Dec 14 2017, 3:23 PM · Archive coverage, Origin-GoogleCode, SVN Loader
ardumont closed T896: Clean up wrong origins, a subtask of T879: Reschedule googlecode svn origins from scratch, as Resolved.
Dec 14 2017, 3:03 PM · Origin-GoogleCode, SVN Loader, Archive content
ardumont closed T896: Clean up wrong origins as Resolved.
Dec 14 2017, 3:03 PM · Origin-GoogleCode, SVN Loader, Archive content
ardumont added a comment to T896: Clean up wrong origins.

P202 checked and ok locally.
Now asked for review as it will remove data from the main db.

Dec 14 2017, 12:06 PM · Origin-GoogleCode, SVN Loader, Archive content
ardumont changed the status of T896: Clean up wrong origins from Open to Work in Progress.
Dec 14 2017, 12:06 PM · Origin-GoogleCode, SVN Loader, Archive content
ardumont changed the status of T896: Clean up wrong origins, a subtask of T879: Reschedule googlecode svn origins from scratch, from Open to Work in Progress.
Dec 14 2017, 12:06 PM · Origin-GoogleCode, SVN Loader, Archive content
ardumont added a comment to T879: Reschedule googlecode svn origins from scratch.

After discussion with the team, it has been decided to exclude from the rescheduling the svn dumps whose compressed size exceeds 2 GiB.
This mirrors the decision taken for git repositories.

Dec 14 2017, 12:05 PM · Origin-GoogleCode, SVN Loader, Archive content
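
As an illustration of the size cutoff described in the comment above, here is a minimal sketch that splits a set of dumps into those to reschedule and those to skip. It is not the script actually used. It assumes that the limit means 2 GiB, that is 2 * 1024^3 bytes, and that "exceeds" means strictly greater. The function name, the file names and the sizes are hypothetical.

```python
# Assumption: the limit in the comment is 2 GiB, i.e. 2 * 1024**3 bytes.
GIB = 1024 ** 3
SIZE_LIMIT = 2 * GIB


def partition_dumps(sizes, limit=SIZE_LIMIT):
    """Split dumps into (keep, skip) by compressed size in bytes.

    `sizes` maps a dump path to its compressed size. A dump is skipped
    only when its size strictly exceeds `limit`.
    """
    keep, skip = [], []
    for path, size in sorted(sizes.items()):
        if size > limit:
            skip.append(path)
        else:
            keep.append(path)
    return keep, skip


# Hypothetical dump names and sizes, for illustration only.
sizes = {
    "hypothetical/small.svndump.gz": 10 * 1024 ** 2,
    "hypothetical/exact.svndump.gz": 2 * GIB,
    "hypothetical/huge.svndump.gz": 3 * GIB,
}
keep, skip = partition_dumps(sizes)
print(keep)
print(skip)
```

In practice the sizes could be collected with `os.path.getsize` on each compressed dump before calling the function.
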

Dec 13 2017

ardumont reopened T847: loader-svn: Some SVN origins have occurrences that point to non-existent objects, a subtask of T617: ingest Google Code Subversion repositories, as Open.
Dec 13 2017, 11:42 AM · Archive coverage, Origin-GoogleCode, SVN Loader
ardumont created T896: Clean up wrong origins.
Dec 13 2017, 11:40 AM · Origin-GoogleCode, SVN Loader, Archive content

Dec 11 2017

ardumont added a comment to T879: Reschedule googlecode svn origins from scratch.

Rescheduled from saatchi (as I needed the producer credentials to access the queue properties):

Dec 11 2017, 5:08 PM · Origin-GoogleCode, SVN Loader, Archive content
ardumont raised the priority of T879: Reschedule googlecode svn origins from scratch from Normal to High.
Dec 11 2017, 11:03 AM · Origin-GoogleCode, SVN Loader, Archive content
ardumont changed the status of T879: Reschedule googlecode svn origins from scratch from Open to Work in Progress.
Dec 11 2017, 11:03 AM · Origin-GoogleCode, SVN Loader, Archive content
ardumont changed the status of T879: Reschedule googlecode svn origins from scratch, a subtask of T617: ingest Google Code Subversion repositories, from Open to Work in Progress.
Dec 11 2017, 11:03 AM · Archive coverage, Origin-GoogleCode, SVN Loader
ardumont updated the task description for T879: Reschedule googlecode svn origins from scratch.
Dec 11 2017, 11:01 AM · Origin-GoogleCode, SVN Loader, Archive content
ardumont created T879: Reschedule googlecode svn origins from scratch.
Dec 11 2017, 10:59 AM · Origin-GoogleCode, SVN Loader, Archive content
ardumont closed T847: loader-svn: Some SVN origins have occurrences that point to non-existent objects, a subtask of T617: ingest Google Code Subversion repositories, as Resolved.
Dec 11 2017, 10:20 AM · Archive coverage, Origin-GoogleCode, SVN Loader

Dec 9 2017

ardumont added a subtask for T617: ingest Google Code Subversion repositories: T847: loader-svn: Some SVN origins have occurrences that point to non-existent objects.
Dec 9 2017, 10:53 AM · Archive coverage, Origin-GoogleCode, SVN Loader

Dec 1 2017

ardumont added a parent task for T817: analyze bogus mimetype values in content_mimetype table: T713: Index existing contents (mimetype, language, license).
Dec 1 2017, 2:15 PM · Archive content, Indexer

Nov 23 2017

ardumont closed T817: analyze bogus mimetype values in content_mimetype table as Resolved.
Nov 23 2017, 12:30 PM · Archive content, Indexer
ardumont closed T854: clean up bogus mimetype values in content_mimetype table, a subtask of T817: analyze bogus mimetype values in content_mimetype table, as Resolved.
Nov 23 2017, 12:11 PM · Archive content, Indexer
ardumont closed T854: clean up bogus mimetype values in content_mimetype table as Resolved.
Nov 23 2017, 12:11 PM · Archive content, Indexer
ardumont closed T849: Fix bogus mimetype values detection in the mimetype indexer implementation as Resolved.
Nov 23 2017, 12:10 PM · Archive content, Indexer
ardumont closed T849: Fix bogus mimetype values detection in the mimetype indexer implementation, a subtask of T850: reschedule indexing of contents with bogus mimetype values, as Resolved.
Nov 23 2017, 12:10 PM · Archive content, Indexer
ardumont closed T849: Fix bogus mimetype values detection in the mimetype indexer implementation, a subtask of T817: analyze bogus mimetype values in content_mimetype table, as Resolved.
Nov 23 2017, 12:10 PM · Archive content, Indexer
ardumont closed T850: reschedule indexing of contents with bogus mimetype values as Resolved.
Nov 23 2017, 12:07 PM · Archive content, Indexer
ardumont closed T850: reschedule indexing of contents with bogus mimetype values, a subtask of T817: analyze bogus mimetype values in content_mimetype table, as Resolved.
Nov 23 2017, 12:07 PM · Archive content, Indexer
ardumont added a comment to T850: reschedule indexing of contents with bogus mimetype values.

The old tool is id 7, the new one is 9:

Nov 23 2017, 12:07 PM · Archive content, Indexer

Nov 22 2017

ardumont changed the status of T850: reschedule indexing of contents with bogus mimetype values from Open to Work in Progress.
Nov 22 2017, 4:01 PM · Archive content, Indexer
ardumont changed the status of T850: reschedule indexing of contents with bogus mimetype values, a subtask of T817: analyze bogus mimetype values in content_mimetype table, from Open to Work in Progress.
Nov 22 2017, 4:01 PM · Archive content, Indexer
ardumont added a comment to T850: reschedule indexing of contents with bogus mimetype values.

Depends on T761

Nov 22 2017, 4:00 PM · Archive content, Indexer
ardumont added a comment to T854: clean up bogus mimetype values in content_mimetype table.

Bogus mimetype values are identified by the following queries:

softwareheritage=> select count(*) from content_mimetype where mimetype LIKE '[%' or mimetype like '' and indexer_configuration_id=7;
 count
-------
 50733
(1 row)
Nov 22 2017, 3:59 PM · Archive content, Indexer
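
A note on the query above: in SQL, `AND` binds more tightly than `OR`, so the `indexer_configuration_id=7` condition applies only to the empty-string branch, not to the `LIKE '[%'` branch. The feed does not say which grouping was intended, and the recorded count of 50733 belongs to the query as written. The sketch below uses an in-memory SQLite table with made-up rows to show how the two groupings can differ. The table and column names come from the query; the `id` column and the data are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE content_mimetype ("
    " id INTEGER PRIMARY KEY,"
    " mimetype TEXT,"
    " indexer_configuration_id INTEGER)"
)
# Hypothetical rows: bogus values under tool 7 and tool 9, plus one sane value.
conn.executemany(
    "INSERT INTO content_mimetype (mimetype, indexer_configuration_id) "
    "VALUES (?, ?)",
    [
        ("[foo", 7),
        ("[bar", 9),
        ("", 7),
        ("", 9),
        ("text/plain", 7),
    ],
)

# Same shape as the query in the comment: parsed as A OR (B AND C).
AS_WRITTEN = (
    "SELECT count(*) FROM content_mimetype "
    "WHERE mimetype LIKE '[%' OR mimetype LIKE '' "
    "AND indexer_configuration_id = 7"
)
# Explicit grouping: (A OR B) AND C.
PARENTHESIZED = (
    "SELECT count(*) FROM content_mimetype "
    "WHERE (mimetype LIKE '[%' OR mimetype LIKE '') "
    "AND indexer_configuration_id = 7"
)


def count(conn, query):
    """Run a count(*) query and return the single integer result."""
    return conn.execute(query).fetchone()[0]


as_written = count(conn, AS_WRITTEN)
parenthesized = count(conn, PARENTHESIZED)
print(as_written, parenthesized)
```

With these toy rows the unparenthesized form also counts the `[bar` row recorded under configuration 9, which the parenthesized form excludes.
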

Nov 16 2017

ardumont added a comment to T817: analyze bogus mimetype values in content_mimetype table.

Status:

  • Final listing of bogus values: /srv/storage/space/lists/indexer/mimetype/sha1-with-bogus-values.txt.gz (50733)
  • Queue reached the sane point.
  • workers stopped.
Nov 16 2017, 9:07 AM · Archive content, Indexer

Nov 15 2017

ardumont added a comment to T817: analyze bogus mimetype values in content_mimetype table.

I am waiting for the queue to drop to 10000, as that will avoid rescheduling the 10000 already done (well, except for the new bogus values :)

Nov 15 2017, 4:55 PM · Archive content, Indexer
ardumont renamed T817: analyze bogus mimetype values in content_mimetype table from analyze bogus mimetype values in content_mimetypes table to analyze bogus mimetype values in content_mimetype table.
Nov 15 2017, 4:24 PM · Archive content, Indexer
ardumont created T854: clean up bogus mimetype values in content_mimetype table.
Nov 15 2017, 4:24 PM · Archive content, Indexer
ardumont updated the task description for T850: reschedule indexing of contents with bogus mimetype values.
Nov 15 2017, 4:23 PM · Archive content, Indexer
ardumont added a comment to T817: analyze bogus mimetype values in content_mimetype table.

There might be other bogus values in the stats that I haven't noticed.

Nov 15 2017, 4:01 PM · Archive content, Indexer
ardumont added a comment to T817: analyze bogus mimetype values in content_mimetype table.

I don't see how I can easily check this, though, since we don't have the sha1 provenance yet.

Nov 15 2017, 12:31 PM · Archive content, Indexer
zack renamed T850: reschedule indexing of contents with bogus mimetype values from Schedule back bogus mimetype values for indexation to reschedule indexing of contents with bogus mimetype values.
Nov 15 2017, 12:25 PM · Archive content, Indexer
ardumont added a subtask for T850: reschedule indexing of contents with bogus mimetype values: T849: Fix bogus mimetype values detection in the mimetype indexer implementation.
Nov 15 2017, 11:44 AM · Archive content, Indexer
ardumont added a parent task for T849: Fix bogus mimetype values detection in the mimetype indexer implementation: T850: reschedule indexing of contents with bogus mimetype values.
Nov 15 2017, 11:44 AM · Archive content, Indexer
ardumont renamed T850: reschedule indexing of contents with bogus mimetype values from Schedule back bogus mimetype values for indexing to Schedule back bogus mimetype values for indexation.
Nov 15 2017, 11:43 AM · Archive content, Indexer
ardumont created T850: reschedule indexing of contents with bogus mimetype values.
Nov 15 2017, 11:42 AM · Archive content, Indexer
ardumont renamed T817: analyze bogus mimetype values in content_mimetype table from bogus mimetype values in content_mimetypes table to analyze bogus mimetype values in content_mimetypes table.
Nov 15 2017, 11:39 AM · Archive content, Indexer
ardumont created T849: Fix bogus mimetype values detection in the mimetype indexer implementation.
Nov 15 2017, 11:38 AM · Archive content, Indexer
ardumont added a comment to T817: analyze bogus mimetype values in content_mimetype table.

Off the top of my head, I would say that I forgot to clean up those bogus values after the initial runs around December 2016.
I don't see how I can easily check this, though, since we don't have the sha1 provenance yet.

Nov 15 2017, 11:35 AM · Archive content, Indexer

Nov 13 2017

olasd updated the task description for T846: Some objects from the original GitHub import have never actually been imported..
Nov 13 2017, 6:59 PM · Roadmap 2020, Restricted Project, Archive content
olasd created T846: Some objects from the original GitHub import have never actually been imported..
Nov 13 2017, 6:59 PM · Roadmap 2020, Restricted Project, Archive content
seirl added a comment to T686: Fix bogus directory entry permissions in database.

The directories have been corrected, but the bogus file entries have not been deleted yet.

Nov 13 2017, 2:58 PM · Archive content, Restricted Project
zack added a comment to T809: move contents larger than current injection limit to separate object storage.

This is now done on uffizi.
Potentially remaining sub-tasks before closing this:

Nov 13 2017, 8:58 AM · Archive content

Nov 12 2017

zack claimed T809: move contents larger than current injection limit to separate object storage.

Updated SQL to also delete objects from tables that reference them, e.g., the indexer ones.

Nov 12 2017, 10:53 AM · Archive content

Nov 8 2017

olasd closed T827: Reinstall pg_logical after postgres 10 upgrade as Resolved.

The replication is now functional, indexes have been recreated, and the frontend now points to the new database.

Nov 8 2017, 3:04 PM · Archive content
zack added a comment to T809: move contents larger than current injection limit to separate object storage.

thanks for the review, I've updated the SQL query accordingly

Nov 8 2017, 2:37 PM · Archive content

Nov 6 2017

olasd added a comment to T827: Reinstall pg_logical after postgres 10 upgrade.
create extension pglogical;
select pglogical.create_node(node_name := 'prado', dsn := 'host=prado.internal.softwareheritage.org port=5433 dbname=softwareheritage');
select pglogical.replication_set_add_table('default', 'content', true);
Nov 6 2017, 5:28 PM · Archive content
olasd added a comment to T809: move contents larger than current injection limit to separate object storage.

As a comment on 4, the object_id column is per-table, so you should avoid carrying it over to the skipped_content table.

Nov 6 2017, 4:23 PM · Archive content
olasd created T838: SQL storage: drop the entity tables.
Nov 6 2017, 4:20 PM · Storage manager, Archive content
olasd added a subtask for T835: Migrate away from using sha1s as foreign keys in the database: T698: Migrate the content store to a new (internal) primary key scheme.
Nov 6 2017, 2:38 PM · Archive content
olasd created T835: Migrate away from using sha1s as foreign keys in the database.
Nov 6 2017, 2:04 PM · Archive content