
Sep 1 2016

ardumont added a comment to T553: Open api endpoint /api/1/provenance/ to read a content's provenance information.

@jbertran Indeed, for the moment locally, it's more /api/1/provenance/

Sep 1 2016, 10:43 AM · Web app, General
jbertran added a comment to T553: Open api endpoint /api/1/provenance/ to read a content's provenance information.

/browse/<>/ should probably be replaced by a more significant/useful route as well.

Sep 1 2016, 10:41 AM · Web app, General

Aug 30 2016

ardumont added a subtask for T547: Azure prototype: Content provenance information API: Unknown Object (Maniphest Task).
Aug 30 2016, 5:21 PM · General
ardumont removed parent tasks for T552: List and populate contents per revision in cache_content_revision (api endpoint): Unknown Object (Maniphest Task), T547: Azure prototype: Content provenance information API.
Aug 30 2016, 5:19 PM · Storage manager, General
ardumont removed a subtask for T547: Azure prototype: Content provenance information API: T552: List and populate contents per revision in cache_content_revision (api endpoint).
Aug 30 2016, 5:19 PM · General
ardumont added a parent task for T552: List and populate contents per revision in cache_content_revision (api endpoint): Unknown Object (Maniphest Task).
Aug 30 2016, 5:17 PM · Storage manager, General
ardumont renamed T552: List and populate contents per revision in cache_content_revision (api endpoint) from List and populate contents per revision in cache_content_revision to List and populate contents per revision in cache_content_revision (api endpoint).
Aug 30 2016, 5:05 PM · Storage manager, General
ardumont closed T552: List and populate contents per revision in cache_content_revision (api endpoint) as Resolved by committing rDSTO490d82da3335: Open cache_content_revision_add(revision_id) api endpoint.
Aug 30 2016, 4:58 PM · Storage manager, General
ardumont closed T552: List and populate contents per revision in cache_content_revision (api endpoint), a subtask of T547: Azure prototype: Content provenance information API, as Resolved.
Aug 30 2016, 4:58 PM · General
olasd closed T550: Add cache tables for provenance information API, a subtask of T547: Azure prototype: Content provenance information API, as Resolved.
Aug 30 2016, 2:53 PM · General
ardumont renamed T552: List and populate contents per revision in cache_content_revision (api endpoint) from List and population contents per revision in cache_content_revision to List and populate contents per revision in cache_content_revision.
Aug 30 2016, 2:37 PM · Storage manager, General
ardumont renamed T552: List and populate contents per revision in cache_content_revision (api endpoint) from List contents per revision to List and population contents per revision in cache_content_revision.
Aug 30 2016, 2:37 PM · Storage manager, General
zack added a parent task for T552: List and populate contents per revision in cache_content_revision (api endpoint): Unknown Object (Maniphest Task).
Aug 30 2016, 2:04 PM · Storage manager, General
zack added a parent task for T551: List interesting origins for the content provenance information prototype: Unknown Object (Maniphest Task).
Aug 30 2016, 2:04 PM · General
zack added a parent task for T547: Azure prototype: Content provenance information API: Unknown Object (Maniphest Task).
Aug 30 2016, 12:53 PM · General
ardumont claimed T552: List and populate contents per revision in cache_content_revision (api endpoint).
Aug 30 2016, 12:31 PM · Storage manager, General
ardumont updated the task description for T553: Open api endpoint /api/1/provenance/ to read a content's provenance information.
Aug 30 2016, 12:31 PM · Web app, General
ardumont created T553: Open api endpoint /api/1/provenance/ to read a content's provenance information.
Aug 30 2016, 12:30 PM · Web app, General
ardumont created T552: List and populate contents per revision in cache_content_revision (api endpoint).
Aug 30 2016, 12:20 PM · Storage manager, General
olasd created T551: List interesting origins for the content provenance information prototype.
Aug 30 2016, 12:15 PM · General
olasd added a comment to T547: Azure prototype: Content provenance information API.
In T547#9188, @zack wrote:

local cache: we consider "revisions we haven't seen" to be revisions not seen in the past for a specific origin (the one being visited)

As per yesterday's F2F discussion, we are going to experiment (first) with 2.B (new revisions only with local cache).

The rationale is twofold:

  • there is no loss of information with it (if we want, we can always further "unroll" transitive revisions later)
Aug 30 2016, 11:20 AM · General
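The "local cache" notion quoted above (a revision counts as new only if it has not been seen before for the origin being visited) can be sketched as follows. This is a minimal illustration, not the prototype's actual code: the class name `LocalRevisionCache`, the method `new_revisions`, and the in-memory storage are all assumptions made for the example.

```python
from collections import defaultdict


class LocalRevisionCache:
    """Per-origin record of revisions already seen (hypothetical helper)."""

    def __init__(self):
        # Maps an origin identifier to the set of revision ids seen for it.
        self._seen = defaultdict(set)

    def new_revisions(self, origin, visit_revisions):
        """Return the revisions of this visit not seen before for `origin`.

        Input order is preserved, and the returned revisions are
        remembered so that later visits of the same origin skip them.
        """
        seen = self._seen[origin]
        fresh = []
        for rev in visit_revisions:
            if rev not in seen:
                seen.add(rev)
                fresh.append(rev)
        return fresh
```

Because the cache is keyed by origin, a revision already recorded for one origin is still reported as new the first time it appears in another origin's visit.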
zack added a project to T547: Azure prototype: Content provenance information API: General.

Ack on all the above. Just a clarification on the revisions→origin mapping.

Aug 30 2016, 11:09 AM · General

Aug 16 2016

olasd edited subtasks for T359: Indexers: batch content analyzer infrastructure, added: T528: swh-journal: Create a journal client listing objects of a given type; removed: T424: swh-journal: persistent journal infrastructure to record additions to the swh-storage.
Aug 16 2016, 6:35 PM · Indexer, General

Jul 26 2016

olasd added a parent task for T358: doc: high-level architecture diagram: T509: Generate and publish Software Heritage Development Documentation.
Jul 26 2016, 12:56 PM · Documentation

Jul 22 2016

zack created T508: prototype: git archive from SWH.
Jul 22 2016, 10:20 AM · Vault, General

Jul 19 2016

zack placed T358: doc: high-level architecture diagram up for grabs.
Jul 19 2016, 4:44 PM · Documentation

Jul 13 2016

ardumont reopened T328: svn / subversion loader, a subtask of T367: ingest Google Code repositories, as Work in Progress.
Jul 13 2016, 7:01 PM · Archive coverage, Restricted Project

Jul 11 2016

ardumont updated the task description for T341: store ingested tarball artifacts (in a separate storage area).
Jul 11 2016, 5:08 PM · General

Jun 26 2016

zack closed T268: ontology - document on the wiki the data model terminology as Resolved.
Jun 26 2016, 6:47 PM · General, Restricted Project
rdicosmo added a comment to T268: ontology - document on the wiki the data model terminology.

Still missing quite a few entries, but the foundations are there, so OK for closing this... it will grow by itself now.

Jun 26 2016, 6:33 PM · General, Restricted Project

Jun 25 2016

zack edited projects for T268: ontology - document on the wiki the data model terminology, added: General; removed Wiki.

I've reviewed @olasd entries and completed some more.
I've also standardized how we add common information to terms, e.g. "Examples:" sections, or "Also known as:" and "Note:".

Jun 25 2016, 9:03 PM · General, Restricted Project

Jun 14 2016

ardumont added a subtask for T367: ingest Google Code repositories: T328: svn / subversion loader.
Jun 14 2016, 12:02 PM · Archive coverage, Restricted Project

Jun 13 2016

zack added a parent task for T359: Indexers: batch content analyzer infrastructure: T439: Indexers: compute (and maintain up-to-date) the filetype of all blobs.
Jun 13 2016, 4:06 PM · Indexer, General

May 31 2016

zack moved T268: ontology - document on the wiki the data model terminology from Restricted Project Column to Restricted Project Column on the Restricted Project board.
May 31 2016, 11:27 AM · General, Restricted Project
ardumont closed T397: Check retrieved archives from googlecode as Resolved.
May 31 2016, 10:17 AM · General
ardumont closed T397: Check retrieved archives from googlecode, a subtask of T367: ingest Google Code repositories, as Resolved.
May 31 2016, 10:17 AM · Archive coverage, Restricted Project
ardumont added a comment to T397: Check retrieved archives from googlecode.

No more errors.

May 31 2016, 10:17 AM · General

May 30 2016

rdicosmo added a project to T268: ontology - document on the wiki the data model terminology: Restricted Project.

We need at least a draft of the glossary in good shape before the grand opening.
Remember to do this on: https://wg.softwareheritage.org/index.php?title=Glossary

May 30 2016, 8:32 PM · General, Restricted Project
ardumont added a comment to T397: Check retrieved archives from googlecode.

Only 4132 out of 1379346 files were in error during checks (~0.30%)

May 30 2016, 11:47 AM · General

May 29 2016

zack added a subtask for T359: Indexers: batch content analyzer infrastructure: T424: swh-journal: persistent journal infrastructure to record additions to the swh-storage.
May 29 2016, 5:57 PM · Indexer, General

May 25 2016

rdicosmo added a parent task for T367: ingest Google Code repositories: Unknown Object (Maniphest Task).
May 25 2016, 4:07 PM · Archive coverage, Restricted Project

May 13 2016

olasd changed the visibility for General.
May 13 2016, 5:23 PM
olasd changed the visibility for T397: Check retrieved archives from googlecode.
May 13 2016, 5:09 PM · General
olasd changed the visibility for T383: stay up to date w.r.t. new Debian uploads.
May 13 2016, 5:09 PM · Restricted Project, General, Origin-Debian
olasd changed the visibility for T382: stay up to date w.r.t. new GitHub repositories.
May 13 2016, 5:09 PM · Restricted Project, General, Origin-GitHub
olasd changed the visibility for T376: ingest git.eclipse.org repositories.
May 13 2016, 5:09 PM · Archive coverage
olasd changed the visibility for T372: Discriminate repositories nature (hg, svn, git) - code.google.com.
May 13 2016, 5:09 PM · Fetcher Googlecode, General
olasd changed the visibility for T368: retrieve code.google.com repositories.
May 13 2016, 5:09 PM · General
olasd changed the visibility for T367: ingest Google Code repositories.
May 13 2016, 5:09 PM · Archive coverage, Restricted Project
olasd changed the visibility for T359: Indexers: batch content analyzer infrastructure.
May 13 2016, 5:09 PM · Indexer, General
olasd changed the visibility for T349: Investigate alternatives to Celery + RabbitMQ.
May 13 2016, 5:09 PM · General
olasd changed the visibility for T347: ingest antelink s3 contents in swh.
May 13 2016, 5:09 PM · General
olasd changed the visibility for T341: store ingested tarball artifacts (in a separate storage area).
May 13 2016, 5:09 PM · General
olasd changed the visibility for T337: specify a manifest format for documenting archived software.
May 13 2016, 5:09 PM · General
olasd changed the visibility for T335: specify the URI scheme swh:... to point to software heritage objects.
May 13 2016, 5:09 PM · General
olasd changed the visibility for T329: hg / mercurial loader.
May 13 2016, 5:09 PM · Mercurial loader
olasd changed the visibility for T273: Darcs loader.
May 13 2016, 5:08 PM · General
olasd changed the visibility for T268: ontology - document on the wiki the data model terminology.
May 13 2016, 5:08 PM · General, Restricted Project
olasd changed the visibility for T239: preserve at least 2 copies of each content object.
May 13 2016, 5:07 PM · General
olasd changed the visibility for T189: centralized syslog.
May 13 2016, 5:07 PM · General
olasd changed the visibility for T67: prototype: git clone from SWH.
May 13 2016, 5:05 PM · Vault, General
olasd changed the visibility for T66: clone and load fork GitHub repositories.
May 13 2016, 5:05 PM · Restricted Project, General
olasd changed the visibility for T57: git commit IRC notifications on #swh-devel.
May 13 2016, 5:05 PM · General
olasd changed the visibility for T5: pg_hash: postgres datatype for checksums.
May 13 2016, 5:04 PM · General

May 6 2016

ardumont added a comment to T397: Check retrieved archives from googlecode.

About 120k done.
It's rather slow, around 1.1/s.

May 6 2016, 11:21 AM · General

May 5 2016

ardumont added a comment to T397: Check retrieved archives from googlecode.
  • done in 86c1353
  • packaged in python3-swh.fetcher.googlecode v0.0.3
  • deployed on worker01
  • worker01 is currently checking those archives
May 5 2016, 12:33 AM · General
ardumont created T397: Check retrieved archives from googlecode.
May 5 2016, 12:29 AM · General

May 3 2016

zack lowered the priority of T358: doc: high-level architecture diagram from High to Normal.
May 3 2016, 5:46 PM · Documentation
ardumont closed T368: retrieve code.google.com repositories as Resolved.
May 3 2016, 2:10 PM · General
ardumont closed T368: retrieve code.google.com repositories, a subtask of T367: ingest Google Code repositories, as Resolved.
May 3 2016, 2:10 PM · Archive coverage, Restricted Project
ardumont added a comment to T368: retrieve code.google.com repositories.

Rescheduled and no more errors now.

May 3 2016, 2:10 PM · General
ardumont added a comment to T368: retrieve code.google.com repositories.

After checks, there are:

  • 342 files in error (problems at fetch time)
  • 158 corrupted files (bad length or MD5 checksum mismatch)
May 3 2016, 9:48 AM · General
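The kind of check behind the counts above (bad length or MD5 mismatch) might look like the sketch below. It is an illustration only: the function name `check_archive`, its parameters, and the returned status strings are assumptions, not the code actually shipped in python3-swh.fetcher.googlecode.

```python
import hashlib
import os


def check_archive(path, expected_length, expected_md5, chunk_size=65536):
    """Classify a fetched file as 'missing', 'bad-length', 'bad-md5' or 'ok'.

    Hypothetical helper: the names and status strings are illustrative.
    """
    if not os.path.isfile(path):
        return "missing"
    # Cheap check first: compare the size on disk with the expected one.
    if os.path.getsize(path) != expected_length:
        return "bad-length"
    # Hash the file in chunks so large archives are not loaded in memory.
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    if digest.hexdigest() != expected_md5.lower():
        return "bad-md5"
    return "ok"
```

Checking the length before hashing avoids reading files that are already known to be truncated or oversized.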

May 1 2016

ardumont added a comment to T368: retrieve code.google.com repositories.

It's a second round-trip.

May 1 2016, 7:58 PM · General
zack added a comment to T368: retrieve code.google.com repositories.
In T368#5716, @ardumont wrote:

worker01 is done.

May 1 2016, 7:18 PM · General
ardumont added a comment to T368: retrieve code.google.com repositories.

worker01 is done.

May 1 2016, 7:12 PM · General

Apr 27 2016

zack created T383: stay up to date w.r.t. new Debian uploads.
Apr 27 2016, 9:26 PM · Restricted Project, General, Origin-Debian
zack added a project to T382: stay up to date w.r.t. new GitHub repositories: Restricted Project.
Apr 27 2016, 9:24 PM · Restricted Project, General, Origin-GitHub
zack added a parent task for T66: clone and load fork GitHub repositories: T382: stay up to date w.r.t. new GitHub repositories.
Apr 27 2016, 9:23 PM · Restricted Project, General
zack added subtasks for T382: stay up to date w.r.t. new GitHub repositories: T313: Retrieve fork information for github repositories in swh.lister.github, T66: clone and load fork GitHub repositories.
Apr 27 2016, 9:23 PM · Restricted Project, General, Origin-GitHub
zack added a project to T382: stay up to date w.r.t. new GitHub repositories: General.
Apr 27 2016, 9:23 PM · Restricted Project, General, Origin-GitHub
zack added a project to T66: clone and load fork GitHub repositories: Restricted Project.
Apr 27 2016, 9:21 PM · Restricted Project, General
zack closed Unknown Object (Maniphest Task), a subtask of T367: ingest Google Code repositories, as Resolved.
Apr 27 2016, 9:05 PM · Archive coverage, Restricted Project

Apr 15 2016

ardumont closed T347: ingest antelink s3 contents in swh as Resolved.
Apr 15 2016, 1:30 PM · General
zack created T376: ingest git.eclipse.org repositories.
Apr 15 2016, 10:12 AM · Archive coverage

Apr 13 2016

ardumont closed T372: Discriminate repositories nature (hg, svn, git) - code.google.com as Resolved by committing R61:aa01eda8b927: Fetch the project.json per project:.
Apr 13 2016, 4:22 PM · Fetcher Googlecode, General
ardumont closed T372: Discriminate repositories nature (hg, svn, git) - code.google.com, a subtask of T367: ingest Google Code repositories, as Resolved.
Apr 13 2016, 4:22 PM · Archive coverage, Restricted Project
zack added a comment to T372: Discriminate repositories nature (hg, svn, git) - code.google.com.

The type of repository can be extracted using the main API of the Google Code Archive. It's something extra that we should do in addition to the file download, but it'd be much better than applying heuristics to the downloaded files (no matter how trivial they would be).

Apr 13 2016, 1:54 PM · Fetcher Googlecode, General

Apr 12 2016

ardumont added a project to T372: Discriminate repositories nature (hg, svn, git) - code.google.com: Fetcher Googlecode.
Apr 12 2016, 7:49 PM · Fetcher Googlecode, General
ardumont renamed T372: Discriminate repositories nature (hg, svn, git) - code.google.com from Discriminate repositories nature (hg, svn, git) to Discriminate repositories nature (hg, svn, git) - code.google.com.
Apr 12 2016, 7:48 PM · Fetcher Googlecode, General
ardumont created T372: Discriminate repositories nature (hg, svn, git) - code.google.com.
Apr 12 2016, 7:28 PM · Fetcher Googlecode, General
ardumont added a comment to T368: retrieve code.google.com repositories.

worker01 is now fetching and checking the source archives from the Google Code Archive.

Apr 12 2016, 4:01 PM · General
ardumont added a comment to T368: retrieve code.google.com repositories.

repository: https://forge.softwareheritage.org/diffusion/61/

Apr 12 2016, 1:38 PM · General
ardumont changed the status of T368: retrieve code.google.com repositories from Open to Work in Progress.
Apr 12 2016, 1:37 PM · General
ardumont changed the status of T368: retrieve code.google.com repositories, a subtask of T367: ingest Google Code repositories, from Open to Work in Progress.
Apr 12 2016, 1:37 PM · Archive coverage, Restricted Project

Apr 11 2016

zack assigned T368: retrieve code.google.com repositories to ardumont.
Apr 11 2016, 10:32 AM · General

Apr 9 2016

rdicosmo added a subtask for T367: ingest Google Code repositories: Unknown Object (Maniphest Task).
Apr 9 2016, 8:47 PM · Archive coverage, Restricted Project
rdicosmo removed a subtask for T368: retrieve code.google.com repositories: Unknown Object (Maniphest Task).
Apr 9 2016, 8:46 PM · General
rdicosmo added a comment to T368: retrieve code.google.com repositories.

Attention: as of today, we have a bit less than two months left before Google erases *all* the original VCS from Google Code. After that date, only the archived version will remain, which may be incorrect.
So we have only a bit less than two months to report bugs to them.

Apr 9 2016, 8:23 PM · General
zack created T368: retrieve code.google.com repositories.
Apr 9 2016, 8:49 AM · General
zack created T367: ingest Google Code repositories.
Apr 9 2016, 8:47 AM · Archive coverage, Restricted Project