
Jul 19 2018

ardumont changed the status of T1151: Start listing gitlab.com , a subtask of T1111: ingest GitLab.com (meta-task), from Open to Work in Progress.
Jul 19 2018, 11:46 AM · Archive coverage, General, Origin-GitLab

Jul 18 2018

ardumont updated the task description for T1151: Start listing gitlab.com .
Jul 18 2018, 6:39 PM · Scheduling utilities, Archive coverage, General, Origin-GitLab
ardumont triaged T1151: Start listing gitlab.com as High priority.
Jul 18 2018, 4:28 PM · Scheduling utilities, Archive coverage, General, Origin-GitLab

Jul 17 2018

ardumont closed T989: Implement GitLab lister, a subtask of T1111: ingest GitLab.com (meta-task), as Resolved.
Jul 17 2018, 6:46 PM · Archive coverage, General, Origin-GitLab

Jul 9 2018

anlambert changed the status of T1120: save code now moderation UI, a subtask of T336: "save code now", from Open to Work in Progress.
Jul 9 2018, 10:53 AM · General

Jul 5 2018

ardumont updated subscribers of T1111: ingest GitLab.com (meta-task).

Some repositories @olasd mentioned to me that qualify as GitLab repositories (in parentheses, their current size in terms of repositories):

Jul 5 2018, 9:36 AM · Archive coverage, General, Origin-GitLab

Jul 3 2018

anlambert changed the status of T1121: save code now API entry point, a subtask of T336: "save code now", from Open to Work in Progress.
Jul 3 2018, 11:16 AM · General

Jun 28 2018

zack renamed T1122: properly handle ingestion of archives within archives (recursive extraction) from Decide how to handle software deposits containing double archive wrapping to properly handle ingestion of archives within archives (recursive extraction).
Jun 28 2018, 10:31 AM · General
zack triaged T1122: properly handle ingestion of archives within archives (recursive extraction) as Normal priority.

The general problem (see below for the deposit-specific case) is indeed complex to deal with (both conceptually in a pure Merkle setting and practically due to the existence of zip bombs). I think a workable solution might be to ingest the archive as is and also ingest a separate directory corresponding to the archive content, with some metadata linking the two. That way by default we will only return what we have ingested (without recursion), but we will offer ways to dig in recursively, e.g., in the web app. There will be plenty of devils in plenty of details for this though.

Jun 28 2018, 10:31 AM · General
rdicosmo created T1122: properly handle ingestion of archives within archives (recursive extraction).
Jun 28 2018, 10:11 AM · General

Jun 27 2018

zack triaged T1119: save code now submission form as High priority.
Jun 27 2018, 8:15 AM · Web app
zack edited projects for T336: "save code now", added: General; removed Web app.

I've generalized the title of this task, will add sub-tasks for the specific features that are still missing to complete this.

Jun 27 2018, 8:11 AM · General

Jun 25 2018

ardumont changed the status of T989: Implement GitLab lister, a subtask of T1111: ingest GitLab.com (meta-task), from Open to Work in Progress.
Jun 25 2018, 3:13 PM · Archive coverage, General, Origin-GitLab

Jun 19 2018

zack edited projects for T1111: ingest GitLab.com (meta-task), added: Archive coverage; removed Archive content.
Jun 19 2018, 3:27 PM · Archive coverage, General, Origin-GitLab
zack added a subtask for T1111: ingest GitLab.com (meta-task): T989: Implement GitLab lister.
Jun 19 2018, 3:21 PM · Archive coverage, General, Origin-GitLab
zack triaged T1111: ingest GitLab.com (meta-task) as High priority.
Jun 19 2018, 3:21 PM · Archive coverage, General, Origin-GitLab

Jun 14 2018

moranegg added a comment to T1098: Add full contextual information in a swh-id of an object.

I completely agree that 'filename' is not enough and adding each time a new piece of context isn't a good solution.
Both path strategies (integers vs identifiers) are interesting.

Jun 14 2018, 3:34 PM · Web app, General
rdicosmo added a comment to T1098: Add full contextual information in a swh-id of an object.

Here is a concrete proposal for the path language:

Jun 14 2018, 11:02 AM · Web app, General
zack added a comment to T1098: Add full contextual information in a swh-id of an object.

Well, there are other scenarios: like us being forced to remove content for legal reasons. But note that I'm not arguing against the path-based approach. The risk exists only for paths encoded using *integers*, because they're by construction relative to the object you traverse. You can have paths that contain the full-step information (e.g., a file/directory name, or a commit identifier), and those paths would be resolvable even if you lose access to intermediate objects. The problem with those kinds of paths is that they are much longer than the integer-based ones. That robustness-v-compactness trade-off is the tough one I was referring to.

Jun 14 2018, 10:11 AM · Web app, General
rdicosmo added a comment to T1098: Add full contextual information in a swh-id of an object.

I see your point, but let's remember that here we want to provide a means for a user A to encode efficiently the context information necessary for another user B to be shown the same view of the archive as the one A has.

Jun 14 2018, 9:53 AM · Web app, General
zack added a comment to T1098: Add full contextual information in a swh-id of an object.

It just occurred to me that this works (in the sense that the paths will be resolvable) only if we have all the objects in the path from the snapshot down to the pointed object, which is not something we can guarantee in general — e.g., we might have archived a repository which had missing objects in the first place.
It is all contextual information which would not make it impossible to see the final object you're pointing to. But this issue calls into question the robustness of integer-based paths for our purposes here. For instance, an fpath based on actual file/directory names will always be displayable; one based on integers will not be.
Tough trade-off…
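
To make the robustness concern above concrete, here is a minimal sketch in Python. It assumes (this is not specified in the discussion) that an integer step means "the i-th entry of the object being traversed", and it uses a hypothetical in-memory `store` in place of the archive. The names `Store` and `display_int_path` are illustrative only.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy store: directory id -> ordered list of (entry name, child id).
Store = Dict[str, List[Tuple[str, str]]]


def display_int_path(store: Store, root: str, steps: List[int]) -> Optional[str]:
    """Turn an integer-based path into a displayable name-based path.

    Returns None as soon as an intermediate object is missing from the
    store (or an index is out of range), because an integer step only has
    meaning relative to the object it indexes into.
    """
    names = []
    current = root
    for index in steps:
        entries = store.get(current)
        if entries is None or not 0 <= index < len(entries):
            return None
        name, current = entries[index]
        names.append(name)
    return "/".join(names)
```

A name-based path such as `src/main.c` needs no lookup to be displayed, whereas the integer path `[1, 0]` stops being displayable once the intermediate directory is gone. The price is length, which is the trade-off discussed above.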

Jun 14 2018, 9:35 AM · Web app, General
rdicosmo added a comment to T1098: Add full contextual information in a swh-id of an object.

Actually, we can generalize the approach even a bit more.

Jun 14 2018, 9:15 AM · Web app, General
rdicosmo raised the priority of T1098: Add full contextual information in a swh-id of an object from Low to Normal.
Jun 14 2018, 9:10 AM · Web app, General

Jun 13 2018

moranegg added subtasks for T1102: Handle all GitHub elements: T17: handle github assets in git loader, T1101: fetch release note from github to keep in release_metadata table, T833: When listing an origin, add origin level metadata to RMD storage.
Jun 13 2018, 4:24 PM · meta-task, General
moranegg triaged T1102: Handle all GitHub elements as Low priority.
Jun 13 2018, 4:24 PM · meta-task, General
rdicosmo renamed T1098: Add full contextual information in a swh-id of an object from Add file-name as contextual information in a swh-id of a content object to Add full contextual information in a swh-id of an object.
Jun 13 2018, 3:57 PM · Web app, General
rdicosmo added a comment to T1098: Add full contextual information in a swh-id of an object.

Thanks for starting this... it's an important discussion, and it goes quite beyond the need of a "filename" attribute in our family of context attributes :-)

Jun 13 2018, 3:28 PM · Web app, General
rdicosmo added a comment to T1099: support origin and SWHID blocklist for archive search and browse.

Seems a nice way to go: we would also need some easy to use interface to
edit the "visibility" bit too...

Jun 13 2018, 1:53 PM · General, Web app
olasd added a comment to T1099: support origin and SWHID blocklist for archive search and browse.

The simplest approximation of this that I can see is adding a visibility column to the origin table, and tweaking that manually when we get a request.
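
A minimal sketch of that idea, using Python's built-in `sqlite3` as a stand-in for the real database. The table layout, the column name `visible`, and the function names are assumptions made for illustration, not the actual archive schema.

```python
import sqlite3


def create_origin_table(conn: sqlite3.Connection) -> None:
    # Hypothetical, simplified origin table with a visibility flag.
    conn.execute(
        "CREATE TABLE origin ("
        " url TEXT PRIMARY KEY,"
        " visible INTEGER NOT NULL DEFAULT 1)"
    )


def set_visibility(conn: sqlite3.Connection, url: str, visible: bool) -> None:
    # The manual tweak applied when a request comes in.
    conn.execute(
        "UPDATE origin SET visible = ? WHERE url = ?", (int(visible), url)
    )


def search_origins(conn: sqlite3.Connection, pattern: str) -> list:
    # Search only returns origins whose visibility flag is set.
    # Note: '%' and '_' inside pattern are treated as LIKE wildcards.
    rows = conn.execute(
        "SELECT url FROM origin WHERE visible = 1 AND url LIKE ? ORDER BY url",
        (f"%{pattern}%",),
    )
    return [url for (url,) in rows]
```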

Jun 13 2018, 1:44 PM · General, Web app
zack renamed T1099: support origin and SWHID blocklist for archive search and browse from Implement a blacklist/whitelist feature on the search engine of the archive to support origin blacklist for archive search and browse.
Jun 13 2018, 12:12 PM · General, Web app

Jun 12 2018

zack added a project to T1098: Add full contextual information in a swh-id of an object: General.

(tagging as General, while we discuss it)

Jun 12 2018, 4:26 PM · Web app, General
zack renamed T1087: facet/metadata-based project search from facet/metadata-bases project search to facet/metadata-based project search.
Jun 12 2018, 12:14 PM · Metadata workflow, General, Web app
zack renamed T1087: facet/metadata-based project search from "Browse" should mean browse to facet/metadata-bases project search.
Jun 12 2018, 12:14 PM · Metadata workflow, General, Web app

Jun 6 2018

zack triaged T1086: ingest Debian's Alioth (archived) repositories (meta-task) as Normal priority.
Jun 6 2018, 1:42 PM · Archive coverage
zack added a project to T1002: ingest Hackage, the Haskell package repository (meta task): Archive content.
Jun 6 2018, 1:41 PM · Hackage loader, Hackage lister, Archive coverage

Jun 5 2018

zack moved T1040: identifiers: support optional contextual parts for line numbers and origin from Restricted Project Column to Restricted Project Column on the Restricted Project board.
Jun 5 2018, 11:19 AM · Restricted Project, General
zack moved T1040: identifiers: support optional contextual parts for line numbers and origin from Restricted Project Column to Restricted Project Column on the Restricted Project board.
Jun 5 2018, 11:05 AM · Restricted Project, General
zack closed T1040: identifiers: support optional contextual parts for line numbers and origin as Resolved.

closing, now that all sub-tasks have been completed

Jun 5 2018, 11:05 AM · Restricted Project, General
zack closed T1041: document contextual parts of persistent identifiers, a subtask of T1040: identifiers: support optional contextual parts for line numbers and origin, as Resolved.
Jun 5 2018, 11:05 AM · Restricted Project, General

May 29 2018

zack moved T1040: identifiers: support optional contextual parts for line numbers and origin from Restricted Project Column to Restricted Project Column on the Restricted Project board.
May 29 2018, 11:23 AM · Restricted Project, General
zack added a project to T1040: identifiers: support optional contextual parts for line numbers and origin: Restricted Project.
May 29 2018, 11:23 AM · Restricted Project, General

May 18 2018

anlambert closed T1042: support optional/contextual parts of persistent identifiers in the web app resolver, a subtask of T1040: identifiers: support optional contextual parts for line numbers and origin, as Resolved.
May 18 2018, 6:46 PM · Restricted Project, General
anlambert added a comment to T1040: identifiers: support optional contextual parts for line numbers and origin.

I agree with your proposal.

May 18 2018, 3:40 PM · Restricted Project, General
zack added a comment to T1040: identifiers: support optional contextual parts for line numbers and origin.

So I think the best option here is to use named parameters as optional parts in the identifiers. This will give us some flexibility for adding new ones in the future. Regarding the separator, we could use either \ or | as they should not interfere with the origin URLs to extract.
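
For illustration only, here is how named optional parts might be split off an identifier in Python. The sketch assumes `|` as the separator (one of the two candidates mentioned) and a `name=value` syntax for each part. Neither choice is settled in this comment, and `split_context` is a hypothetical helper.

```python
from typing import Dict, Tuple

SEPARATOR = "|"  # assumption: one of the two candidate separators


def split_context(identifier: str) -> Tuple[str, Dict[str, str]]:
    """Split an identifier into its core part and its named optional parts."""
    core, *parts = identifier.split(SEPARATOR)
    context: Dict[str, str] = {}
    for part in parts:
        # Split on the first '=' only, so values such as origin URLs
        # may themselves contain '='.
        name, sep, value = part.partition("=")
        if not sep or not name:
            raise ValueError(f"malformed optional part: {part!r}")
        context[name] = value
    return core, context
```

Because unknown names are simply collected into the dictionary, new parameters could be added later without changing the parsing step.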

May 18 2018, 2:32 PM · Restricted Project, General
anlambert added a comment to T1040: identifiers: support optional contextual parts for line numbers and origin.

Thanks for the clear explanation.

May 18 2018, 1:21 PM · Restricted Project, General

May 17 2018

zack added a comment to T1040: identifiers: support optional contextual parts for line numbers and origin.

the problems I see with optional URL parameters instead of modifying the identifiers themselves are the following:

May 17 2018, 9:06 PM · Restricted Project, General
anlambert added a comment to T1040: identifiers: support optional contextual parts for line numbers and origin.

As I am currently implementing the task, I am wondering if adding optional parts to a swh identifier v1 is the adequate solution.

May 17 2018, 4:59 PM · Restricted Project, General

May 16 2018

anlambert changed the status of T1042: support optional/contextual parts of persistent identifiers in the web app resolver, a subtask of T1040: identifiers: support optional contextual parts for line numbers and origin, from Open to Work in Progress.
May 16 2018, 3:21 PM · Restricted Project, General

Apr 28 2018

zack triaged T1040: identifiers: support optional contextual parts for line numbers and origin as Normal priority.
Apr 28 2018, 3:29 PM · Restricted Project, General

Mar 30 2018

ardumont closed T647: support software deposit via SWORD protocol (meta task) as Resolved.
Mar 30 2018, 11:57 AM · SWORD deposit, General
moranegg added a comment to T647: support software deposit via SWORD protocol (meta task).

@ardumont, I think you can resolve this one ;-)

Mar 30 2018, 11:47 AM · SWORD deposit, General

Mar 27 2018

zack added a comment to T1002: ingest Hackage, the Haskell package repository (meta task).

relevant highlights:

Mar 27 2018, 6:02 PM · Hackage loader, Hackage lister, Archive coverage
zack renamed T1002: ingest Hackage, the Haskell package repository (meta task) from ingest Hackage into the Software Heritage archive (meta task) to ingest Hackage (Haskell package repository) into the Software Heritage archive (meta task).
Mar 27 2018, 6:01 PM · Hackage loader, Hackage lister, Archive coverage

Mar 25 2018

zack added a comment to T1002: ingest Hackage, the Haskell package repository (meta task).

update from joeyh, there is no need for any specific hack to maintain a local mirror, it is just an undocumented feature:

Mar 25 2018, 3:47 PM · Hackage loader, Hackage lister, Archive coverage

Mar 24 2018

zack triaged T1002: ingest Hackage, the Haskell package repository (meta task) as Normal priority.
Mar 24 2018, 10:26 PM · Hackage loader, Hackage lister, Archive coverage

Mar 7 2018

olasd closed T537: Update the task scheduler and the task event listener to use the new partial status, a subtask of T533: Allow loaders to register partial state (meta task), as Resolved.
Mar 7 2018, 4:43 PM · General

Feb 15 2018

anlambert closed T949: swh-web: Display origin-visit's details using snapshots, a subtask of T565: embrace repository snapshot object in the data model (meta task), as Resolved.
Feb 15 2018, 3:35 PM · General

Feb 6 2018

olasd added a comment to T565: embrace repository snapshot object in the data model (meta task).

swh-loader-git and swh-loader-debian have now been migrated to snapshots as well, and restarted.

Feb 6 2018, 4:30 PM · General

Feb 2 2018

ardumont added a comment to T565: embrace repository snapshot object in the data model (meta task).

Current status on the development migration towards snapshot (branch wip/snapshot(s)) as far as I know:

Feb 2 2018, 11:19 AM · General

Jan 14 2018

zack closed T335: specify the URI scheme swh:... to point to software heritage objects as Resolved.

Closed in rDMODb61c6665661c823080192b351af4744dddb35f1e

Jan 14 2018, 10:32 PM · General
zack closed T335: specify the URI scheme swh:... to point to software heritage objects, a subtask of T337: specify a manifest format for documenting archived software, as Resolved.
Jan 14 2018, 10:32 PM · General

Jan 12 2018

zack added a comment to T335: specify the URI scheme swh:... to point to software heritage objects.

yeah, i was thinking about it while running earlier on today :) i'm not yet sure if i'll specify the meaning of the sha1 of each object here, or just say that the sha1 is the primary key of the object and refer to swh-model, we'll see

Jan 12 2018, 11:10 PM · General
olasd added a comment to T335: specify the URI scheme swh:... to point to software heritage objects.

When writing the documentation, please be sure to be explicit about whether the content identifier is its sha1 or its salted sha1_git, because it's not clear which it is from this discussion :)

Jan 12 2018, 7:13 PM · General
zack changed the status of T335: specify the URI scheme swh:... to point to software heritage objects from Open to Work in Progress.
In T335#16990, @zack wrote:
identifier = "swh" ":" scheme_version ":" obj_type ":" obj_id ;
scheme_version = "1" ;
obj_type =
    "snp"  # snapshot
  | "rel"  # release
  | "rev"  # revision
  | "dir"  # directory
  | "cnt"  # content
  ;
obj_id = object sha1, hex-encoded with (lowercase) ASCII characters ;
Jan 12 2018, 6:11 PM · General
zack changed the status of T335: specify the URI scheme swh:... to point to software heritage objects, a subtask of T337: specify a manifest format for documenting archived software, from Open to Work in Progress.
Jan 12 2018, 6:11 PM · General
zack added a comment to T335: specify the URI scheme swh:... to point to software heritage objects.

in the future, if we switch to blake2/256 (or equivalent length checksums), the examples would become something like:

Jan 12 2018, 2:11 PM · General
zack claimed T335: specify the URI scheme swh:... to point to software heritage objects.
Jan 12 2018, 2:05 PM · General
zack raised the priority of T335: specify the URI scheme swh:... to point to software heritage objects from Normal to High.

concrete, tentative proposal (EBNF):

identifier = "swh" ":" scheme_version ":" obj_type ":" obj_id ;
scheme_version = "1" ;
obj_type =
    "snp"  # snapshot
  | "rel"  # release
  | "rev"  # revision
  | "dir"  # directory
  | "cnt"  # content
  ;
obj_id = object sha1, hex-encoded with (lowercase) ASCII characters ;
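
As a rough illustration (not part of the proposal itself), the grammar above could be checked with a small Python parser. The 40-character length assumes a hex-encoded SHA-1, and `SwhId` and `parse_swh_id` are hypothetical names.

```python
import re
from typing import NamedTuple

# Object type tags from the grammar, mapped to their long names.
OBJECT_TYPES = {
    "snp": "snapshot",
    "rel": "release",
    "rev": "revision",
    "dir": "directory",
    "cnt": "content",
}

# A SHA-1 is 160 bits, i.e. 40 lowercase hex digits.
_ID_RE = re.compile(
    r"swh:(?P<version>1):(?P<obj_type>snp|rel|rev|dir|cnt):(?P<obj_id>[0-9a-f]{40})"
)


class SwhId(NamedTuple):
    scheme_version: int
    obj_type: str
    obj_id: str


def parse_swh_id(text: str) -> SwhId:
    """Parse an identifier, raising ValueError if it does not match the grammar."""
    match = _ID_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a valid identifier: {text!r}")
    return SwhId(
        int(match.group("version")),
        match.group("obj_type"),
        match.group("obj_id"),
    )
```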
Jan 12 2018, 2:04 PM · General
zack added a parent task for T335: specify the URI scheme swh:... to point to software heritage objects: T926: Web UI: support resolution of external pointers into the archive.
Jan 12 2018, 1:53 PM · General

Jan 8 2018

ardumont closed T718: SWORD deposit: backend server, a subtask of T647: support software deposit via SWORD protocol (meta task), as Resolved.
Jan 8 2018, 12:34 PM · SWORD deposit, General

Jan 3 2018

zack added a parent task for T647: support software deposit via SWORD protocol (meta task): T716: Integration HAL - Software Heritage (metatask).
Jan 3 2018, 10:33 AM · SWORD deposit, General

Dec 20 2017

ardumont changed the status of T329: hg / mercurial loader, a subtask of T807: dogfooding: ingest the Software Heritage forge into the archive (via the canonical URLs), from Open to Work in Progress.
Dec 20 2017, 11:42 AM · General

Dec 19 2017

ardumont renamed T908: mercurial loader: Define scheduler task(s) from mercurial loader: Define scheduler task to mercurial loader: Define scheduler task(s).
Dec 19 2017, 2:06 PM · Mercurial loader
ardumont created T908: mercurial loader: Define scheduler task(s).
Dec 19 2017, 2:05 PM · Mercurial loader
ardumont created T907: mercurial loader: Align mercurial loader with other loaders.
Dec 19 2017, 2:04 PM · Mercurial loader
ardumont created T906: mercurial loader: Debian package.
Dec 19 2017, 1:59 PM · System administration, Mercurial loader
ardumont claimed T329: hg / mercurial loader.
Dec 19 2017, 1:58 PM · Mercurial loader

Dec 15 2017

olasd closed T567: adapt SQL storage for repository snapshot objects, a subtask of T565: embrace repository snapshot object in the data model (meta task), as Resolved.
Dec 15 2017, 3:37 PM · General

Dec 13 2017

olasd closed T566: specify serialization format for repository snapshot objects, a subtask of T565: embrace repository snapshot object in the data model (meta task), as Resolved.
Dec 13 2017, 11:30 AM · General

Dec 12 2017

seirl added a parent task for T565: embrace repository snapshot object in the data model (meta task): T887: Vault: "snapshot" cooker.
Dec 12 2017, 3:45 PM · General
seirl closed T530: Software Heritage Vault, a subtask of T508: prototype: git archive from SWH, as Resolved.
Dec 12 2017, 3:44 PM · Vault, General
seirl closed T508: prototype: git archive from SWH, a subtask of T67: prototype: git clone from SWH, as Resolved.
Dec 12 2017, 3:43 PM · Vault, General
seirl closed T508: prototype: git archive from SWH as Resolved.

This is already well beyond a prototype, so I'm closing this task.

Dec 12 2017, 3:43 PM · Vault, General
seirl closed T67: prototype: git clone from SWH as Resolved.

As the "prototype" part of the vault is definitely over, I'm closing this task.

Dec 12 2017, 3:42 PM · Vault, General

Nov 10 2017

olasd added a comment to T335: specify the URI scheme swh:... to point to software heritage objects.

We can always allow people to truncate the identifier to some arbitrary (shorter) length. The canonical URI would be the full identifier, but our URI resolver can recognize shortened identifiers and point to a disambiguation page with all the objects whose identifier starts with the given string.
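
A minimal sketch of that prefix lookup in Python, assuming the resolver has access to a sorted list of full identifiers. The `resolve_prefix` name and the in-memory list are illustrative assumptions; a real resolver would query an index.

```python
from bisect import bisect_left
from typing import List


def resolve_prefix(prefix: str, sorted_ids: List[str]) -> List[str]:
    """Return every identifier in sorted_ids that starts with prefix.

    In a sorted list all identifiers sharing a prefix are contiguous,
    so the scan starts at the first candidate and stops at the first miss.
    """
    matches = []
    start = bisect_left(sorted_ids, prefix)
    for candidate in sorted_ids[start:]:
        if not candidate.startswith(prefix):
            break
        matches.append(candidate)
    return matches
```

One match would resolve directly to the object, several matches would feed the disambiguation page, and an empty result would mean the identifier is unknown.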

Nov 10 2017, 7:49 PM · General

Nov 6 2017

rdicosmo added a comment to T335: specify the URI scheme swh:... to point to software heritage objects.

I agree with all the suggestions: the full id should definitely contain all
this information.
Nevertheless, the sheer length of the result *may* turn out to be a blocker
for adoption as a reference to software in the academic publishing
framework. We can propose this, and see if we need to also provide a
shorter backup if really there is a strong negative feedback.

Nov 6 2017, 3:19 PM · General
zack added a comment to T335: specify the URI scheme swh:... to point to software heritage objects.

I'm not opposed to having explicit hash scheme names in the IDs—it is a good idea, only to be weighed against the cost in terms of length.
But we should also have schema version numbers, in case more radical changes will be needed in the future, e.g., renaming the object types in the graph.
If we retain both suggestions, that would give:

  • swh:1:revision:sha1_git:<git sha1 of a revision>
  • swh:1:content:blake2s256:<blake2s256 of a content>
Nov 6 2017, 3:04 PM · General
olasd added a comment to T335: specify the URI scheme swh:... to point to software heritage objects.

I've been thinking about this in relation to T836.

Nov 6 2017, 2:58 PM · General

Nov 5 2017

olasd added a subtask for T565: embrace repository snapshot object in the data model (meta task): T830: Remove tables occurrence and occurrence_history.
Nov 5 2017, 9:29 PM · General

Oct 17 2017

zack added a subtask for T807: dogfooding: ingest the Software Heritage forge into the archive (via the canonical URLs): T329: hg / mercurial loader.
Oct 17 2017, 3:44 PM · General
zack added a parent task for T329: hg / mercurial loader: T807: dogfooding: ingest the Software Heritage forge into the archive (via the canonical URLs).
Oct 17 2017, 3:44 PM · Mercurial loader
zack added a subtask for T807: dogfooding: ingest the Software Heritage forge into the archive (via the canonical URLs): T328: svn / subversion loader.
Oct 17 2017, 3:44 PM · General
zack created T807: dogfooding: ingest the Software Heritage forge into the archive (via the canonical URLs).
Oct 17 2017, 3:43 PM · General

Oct 16 2017

fiendish added a revision to T329: hg / mercurial loader: D256: mercurial bundle20 parser/loader.
Oct 16 2017, 10:06 PM · Mercurial loader

Oct 6 2017

zack closed T5: pg_hash: postgres datatype for checksums as Wontfix.
Oct 6 2017, 2:55 PM · General

Sep 15 2017

zack added a project to T508: prototype: git archive from SWH: Vault.
Sep 15 2017, 10:12 AM · Vault, General
zack added a project to T67: prototype: git clone from SWH: Vault.
Sep 15 2017, 10:03 AM · Vault, General
zack assigned T647: support software deposit via SWORD protocol (meta task) to ardumont.
Sep 15 2017, 9:58 AM · SWORD deposit, General
zack closed T598: Store content -> revision cache in azure table storage, a subtask of T547: Azure prototype: Content provenance information API, as Wontfix.
Sep 15 2017, 9:58 AM · General
zack closed T547: Azure prototype: Content provenance information API as Wontfix.

we're taking a different route for this now, based on @grouss WIP

Sep 15 2017, 9:57 AM · General