
Discriminate repositories nature (hg, svn, git) - code.google.com
Closed, Resolved, Public

Description

A priori, except for svn repositories, there is no information about the repository type (hg, svn, git) in the metadata (.json file) or in the filenames of the repositories that are currently being downloaded by worker01.

So we may need an intermediary step to determine that.

Note: those files are currently archives in mixed formats (.zip, .gz), so they are not homogeneous.

Event Timeline

ardumont created this task. Apr 12 2016, 7:28 PM
ardumont renamed this task from "Discriminate repositories nature (hg, svn, git)" to "Discriminate repositories nature (hg, svn, git) - code.google.com". Apr 12 2016, 7:48 PM
ardumont added a project: Fetcher Googlecode.
zack added a comment. Apr 13 2016, 1:54 PM

The repository type can be extracted using the main API of the Google Code Archive. It is an extra step on top of the file download, but it would be much better than applying heuristics to the downloaded files (no matter how trivial they would be).

See the repoType field.

In the process, it would be good to also archive the project.json files, as those will be important in the future for us too.
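As a rough illustration of reading the repoType field from an already-archived project.json, here is a minimal sketch. The function name `repo_type_from_metadata` and the `KNOWN_TYPES` set are hypothetical, and it assumes the field holds one of the three values named in this task's title (hg, svn, git); the actual files may contain other values or a different layout.

```python
import json

# Assumption: repoType takes one of the values listed in the task title.
KNOWN_TYPES = {"git", "hg", "svn"}


def repo_type_from_metadata(project_json_text):
    """Return the repository type declared in a project.json document.

    Returns None when the field is missing or holds an unexpected value,
    so the caller can flag the project for manual inspection.
    """
    metadata = json.loads(project_json_text)
    if not isinstance(metadata, dict):
        return None
    repo_type = metadata.get("repoType")
    if isinstance(repo_type, str) and repo_type.lower() in KNOWN_TYPES:
        return repo_type.lower()
    return None
```

Returning None instead of guessing keeps the step free of heuristics: a project without a usable repoType is simply reported rather than classified from its archive contents.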

olasd changed the visibility from "All Users" to "Public (No Login Required)". May 13 2016, 5:09 PM