
Oct 4 2018

rdicosmo triaged T1241: Persistent identifiers (PIDs): add a way to describe Merkle DAG paths as Low priority.
Oct 4 2018, 12:27 PM · UX, Web app, General
zack closed T337: specify a manifest format for documenting archived software as Resolved.

The identification part of this task has been done with documenting/implementing our PIDs, the rest is more suited for the software citation work on which @moranegg is actively working.

Oct 4 2018, 12:13 PM · General
zack added a project to T1087: facet/metadata-based project search: Metadata workflow.
Oct 4 2018, 11:55 AM · Metadata workflow, General, Web app
zack raised the priority of T1087: facet/metadata-based project search from Low to Normal.
Oct 4 2018, 11:55 AM · Metadata workflow, General, Web app
zack closed T1098: Add full contextual information in a swh-id of an object as Resolved.

this has been done in bc30e8bc60ac3a310f91a15b5692e6b9bc6a30a3

Oct 4 2018, 11:53 AM · Web app, General
zack removed subtasks for T807: dogfooding: ingest the Software Heritage forge into the archive (via the canonical URLs): T329: hg / mercurial loader, T328: svn / subversion loader.
Oct 4 2018, 11:51 AM · General
zack closed T336: "save code now" as Resolved.
Oct 4 2018, 11:47 AM · General
zack added a comment to T336: "save code now".

this is now done (YAY :-)), closing the task to reflect current status

Oct 4 2018, 11:47 AM · General
zack removed a subtask for T336: "save code now": T1156: Fix release targets of already loaded mercurial type origins.
Oct 4 2018, 11:46 AM · General

Oct 1 2018

moranegg changed the status of T833: When listing an origin, add origin level metadata to RMD storage, a subtask of T1102: Handle all GitHub elements, from Work in Progress to Open.
Oct 1 2018, 3:05 PM · meta-task, General

Sep 4 2018

ardumont closed T1111: ingest GitLab.com (meta-task) as Resolved.
Sep 4 2018, 6:17 PM · Archive coverage, General, Origin-GitLab
anlambert closed T1119: save code now submission form, a subtask of T336: "save code now", as Resolved.
Sep 4 2018, 3:14 PM · General
anlambert closed T1121: save code now API entry point, a subtask of T336: "save code now", as Resolved.
Sep 4 2018, 3:14 PM · General
anlambert closed T1120: save code now moderation UI, a subtask of T336: "save code now", as Resolved.
Sep 4 2018, 3:14 PM · General

Aug 24 2018

ardumont added a comment to T1111: ingest GitLab.com (meta-task).

A priori, at the current speed, about 7.5 days remain until the end of the GitLab origins ingestion.

Aug 24 2018, 12:06 PM · Archive coverage, General, Origin-GitLab

Aug 3 2018

ardumont closed T329: hg / mercurial loader, a subtask of T807: dogfooding: ingest the Software Heritage forge into the archive (via the canonical URLs), as Resolved.
Aug 3 2018, 3:03 PM · General

Jul 26 2018

ardumont added a subtask for T336: "save code now": T1156: Fix release targets of already loaded mercurial type origins.
Jul 26 2018, 3:08 PM · General

Jul 25 2018

ardumont changed the status of T1111: ingest GitLab.com (meta-task) from Open to Work in Progress.
Jul 25 2018, 6:29 PM · Archive coverage, General, Origin-GitLab

Jul 24 2018

anlambert changed the status of T1119: save code now submission form, a subtask of T336: "save code now", from Open to Work in Progress.
Jul 24 2018, 11:14 AM · General

Jul 21 2018

ardumont added a comment to T336: "save code now".

But that's listing, not loading. It's not clear to me that a user who added a forge would be interested in knowing when we're done adding its origins; that's just an implementation detail. The user will want to know when we have archived all of it at least once, which is complicated to define. It might be enough to give visibility into when the listing is done, but it'll certainly require a different user-facing explanation than saving an origin.

Jul 21 2018, 11:55 AM · General

Jul 20 2018

zack added a comment to T336: "save code now".

E.g., you don't "schedule" the addition of an entire forge as a single task,

Yes, there are 2 tasks for now (incremental, full) but if we also hide that detail within T1157... Then that could be a win, i think ;)

Jul 20 2018, 10:08 PM · General
ardumont added a comment to T336: "save code now".

E.g., you don't "schedule" the addition of an entire forge as a single task,

Jul 20 2018, 5:55 PM · General
zack updated subscribers of T336: "save code now".

Is adding a supported forge (e.g., a GitLab instance) considered a possible save now request?

Jul 20 2018, 5:50 PM · General
ardumont added a comment to T336: "save code now".

Is adding a supported forge (e.g., a GitLab instance) considered a possible save now request?

Jul 20 2018, 5:18 PM · General
ardumont closed T1151: Start listing gitlab.com as Resolved.
Jul 20 2018, 1:21 PM · Scheduling utilities, Archive coverage, General, Origin-GitLab
ardumont closed T1151: Start listing gitlab.com , a subtask of T1111: ingest GitLab.com (meta-task), as Resolved.
Jul 20 2018, 1:21 PM · Archive coverage, General, Origin-GitLab

Jul 19 2018

ardumont changed the status of T1151: Start listing gitlab.com from Open to Work in Progress.
Jul 19 2018, 11:46 AM · Scheduling utilities, Archive coverage, General, Origin-GitLab
ardumont changed the status of T1151: Start listing gitlab.com , a subtask of T1111: ingest GitLab.com (meta-task), from Open to Work in Progress.
Jul 19 2018, 11:46 AM · Archive coverage, General, Origin-GitLab

Jul 18 2018

ardumont updated the task description for T1151: Start listing gitlab.com .
Jul 18 2018, 6:39 PM · Scheduling utilities, Archive coverage, General, Origin-GitLab
ardumont triaged T1151: Start listing gitlab.com as High priority.
Jul 18 2018, 4:28 PM · Scheduling utilities, Archive coverage, General, Origin-GitLab

Jul 17 2018

ardumont closed T989: Implement GitLab lister, a subtask of T1111: ingest GitLab.com (meta-task), as Resolved.
Jul 17 2018, 6:46 PM · Archive coverage, General, Origin-GitLab

Jul 9 2018

anlambert changed the status of T1120: save code now moderation UI, a subtask of T336: "save code now", from Open to Work in Progress.
Jul 9 2018, 10:53 AM · General

Jul 5 2018

ardumont updated subscribers of T1111: ingest GitLab.com (meta-task).

Some repositories @olasd mentioned to me that qualify as GitLab repositories (in parentheses, their current size in terms of repositories):

Jul 5 2018, 9:36 AM · Archive coverage, General, Origin-GitLab

Jul 3 2018

anlambert changed the status of T1121: save code now API entry point, a subtask of T336: "save code now", from Open to Work in Progress.
Jul 3 2018, 11:16 AM · General

Jun 28 2018

zack renamed T1122: properly handle ingestion of archives within archives (recursive extraction) from Decide how to handle software deposits containing double archive wrapping to properly handle ingestion of archives within archives (recursive extraction).
Jun 28 2018, 10:31 AM · General
zack triaged T1122: properly handle ingestion of archives within archives (recursive extraction) as Normal priority.

The general problem (see below for the deposit-specific case) is indeed complex to deal with (both conceptually in a pure Merkle setting and practically due to the existence of zip bombs). I think a workable solution might be to ingest the archive as is and also ingest a separate directory corresponding to the archive content, with some metadata linking the two. That way, by default we will only return what we have ingested (without recursion), but we will offer ways to dig in recursively, e.g., in the web app. There will be plenty of devils in plenty of details for this though.

Jun 28 2018, 10:31 AM · General
rdicosmo created T1122: properly handle ingestion of archives within archives (recursive extraction).
Jun 28 2018, 10:11 AM · General

Jun 27 2018

zack triaged T1119: save code now submission form as High priority.
Jun 27 2018, 8:15 AM · Web app
zack edited projects for T336: "save code now", added: General; removed Web app.

I've generalized the title of this task, will add sub-tasks for the specific features that are still missing to complete this.

Jun 27 2018, 8:11 AM · General

Jun 25 2018

ardumont changed the status of T989: Implement GitLab lister, a subtask of T1111: ingest GitLab.com (meta-task), from Open to Work in Progress.
Jun 25 2018, 3:13 PM · Archive coverage, General, Origin-GitLab

Jun 19 2018

zack edited projects for T1111: ingest GitLab.com (meta-task), added: Archive coverage; removed Archive content.
Jun 19 2018, 3:27 PM · Archive coverage, General, Origin-GitLab
zack added a subtask for T1111: ingest GitLab.com (meta-task): T989: Implement GitLab lister.
Jun 19 2018, 3:21 PM · Archive coverage, General, Origin-GitLab
zack triaged T1111: ingest GitLab.com (meta-task) as High priority.
Jun 19 2018, 3:21 PM · Archive coverage, General, Origin-GitLab

Jun 14 2018

moranegg added a comment to T1098: Add full contextual information in a swh-id of an object.

I completely agree that 'filename' is not enough and that adding a new piece of context each time isn't a good solution.
Both path strategies (integers vs identifiers) are interesting.

Jun 14 2018, 3:34 PM · Web app, General
rdicosmo added a comment to T1098: Add full contextual information in a swh-id of an object.

Here is a concrete proposal for the path language:

Jun 14 2018, 11:02 AM · Web app, General
zack added a comment to T1098: Add full contextual information in a swh-id of an object.

Well, there are other scenarios: like us being forced to remove content for legal reasons. But note that I'm not arguing against the path-based approach. The risk exists only for paths encoded using *integers*, because they're by construction relative to the object you traverse. You can have paths that contain the full-step information (e.g., a file/directory name, or a commit identifier), and those paths would be resolvable even if you lose access to intermediate objects. The problem with that kind of path is that it is much longer than the integer-based one. That robustness-v-compactness trade-off is the tough one I was referring to.

Jun 14 2018, 10:11 AM · Web app, General
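
The trade-off discussed in this comment can be illustrated with a small sketch. Everything below is hypothetical: the toy `STORE`, the object ids and both helper functions are made up for illustration and are not the archive's data model. It only shows why an integer step needs the traversed object to be interpreted, while a name-based step carries its own information.

```python
from typing import List, Optional

# Hypothetical toy store: object id -> ordered list of (name, child id).
# "dir-b" is deliberately absent to model a missing intermediate object.
STORE = {
    "dir-root": [("README", "cnt-1"), ("src", "dir-b")],
}


def describe_int_path(root: str, steps: List[int]) -> Optional[str]:
    """Render an integer-based path; every traversed object must be present."""
    names = []
    current = root
    for index in steps:
        entries = STORE.get(current)
        if entries is None:
            # The index is relative to an object we do not have.
            return None
        name, current = entries[index]
        names.append(name)
    return "/".join(names)


def describe_name_path(steps: List[str]) -> str:
    """Render a name-based path; no lookup is needed to display it."""
    return "/".join(steps)


print(describe_int_path("dir-root", [0]))      # resolvable: all objects present
print(describe_int_path("dir-root", [1, 0]))   # not resolvable: "dir-b" is missing
print(describe_name_path(["src", "main.c"]))   # longer, but always displayable
```

The integer form is shorter, but it stops being interpretable as soon as one object along the way is unavailable.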
rdicosmo added a comment to T1098: Add full contextual information in a swh-id of an object.

I see your point, but let's remember that here we want to provide a means for a user A to encode efficiently the context information necessary for another user B to be shown the same view of the archive as the one A has.

Jun 14 2018, 9:53 AM · Web app, General
zack added a comment to T1098: Add full contextual information in a swh-id of an object.

It just occurred to me that this works (in the sense that the paths will be resolvable) only if we have all the objects in the path from the snapshot down to the pointed object, which is not something we can guarantee in general — e.g., we might have archived a repository which had missing objects in the first place.
It is all contextual information, the lack of which would not make it impossible to see the final object you're pointing to. But this issue calls into question the robustness of integer-based paths for our purposes here. For instance, an fpath based on actual file/directory names will always be displayable; one based on integers will not be.
Tough trade-off…

Jun 14 2018, 9:35 AM · Web app, General
rdicosmo added a comment to T1098: Add full contextual information in a swh-id of an object.

Actually, we can generalize the approach even a bit more.

Jun 14 2018, 9:15 AM · Web app, General
rdicosmo raised the priority of T1098: Add full contextual information in a swh-id of an object from Low to Normal.
Jun 14 2018, 9:10 AM · Web app, General

Jun 13 2018

moranegg added subtasks for T1102: Handle all GitHub elements: T17: handle github assets in git loader, T1101: fetch release note from github to keep in release_metadata table, T833: When listing an origin, add origin level metadata to RMD storage.
Jun 13 2018, 4:24 PM · meta-task, General
moranegg triaged T1102: Handle all GitHub elements as Low priority.
Jun 13 2018, 4:24 PM · meta-task, General
rdicosmo renamed T1098: Add full contextual information in a swh-id of an object from Add file-name as contextual information in a swh-id of a content object to Add full contextual information in a swh-id of an object.
Jun 13 2018, 3:57 PM · Web app, General
rdicosmo added a comment to T1098: Add full contextual information in a swh-id of an object.

Thanks for starting this... it's an important discussion, and it goes quite beyond the need of a "filename" attribute in our family of context attributes :-)

Jun 13 2018, 3:28 PM · Web app, General
rdicosmo added a comment to T1099: support origin and SWHID blocklist for archive search and browse.

Seems a nice way to go: we would also need some easy-to-use interface to edit the "visibility" bit too...

Jun 13 2018, 1:53 PM · General, Web app
olasd added a comment to T1099: support origin and SWHID blocklist for archive search and browse.

The simplest approximation of this that I can see is adding a visibility column to the origin table, and tweaking that manually when we get a request.

Jun 13 2018, 1:44 PM · General, Web app
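
As a rough illustration of the "visibility column" idea from this comment, here is a minimal sketch using an in-memory SQLite database. The table layout, the `visible` column name and the helper functions are assumptions made for the example; they do not describe the actual archive schema or storage backend.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create a hypothetical origin table with a visibility flag."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE origin ("
        " id INTEGER PRIMARY KEY,"
        " url TEXT NOT NULL,"
        " visible INTEGER NOT NULL DEFAULT 1)"
    )
    conn.executemany(
        "INSERT INTO origin (url) VALUES (?)",
        [("https://example.org/a.git",), ("https://example.org/b.git",)],
    )
    return conn


def hide_origin(conn: sqlite3.Connection, url: str) -> None:
    """Manually flip the visibility bit when a request comes in."""
    conn.execute("UPDATE origin SET visible = 0 WHERE url = ?", (url,))


def search_origins(conn: sqlite3.Connection, pattern: str) -> list:
    """Search only among origins that are still marked visible."""
    rows = conn.execute(
        "SELECT url FROM origin WHERE visible = 1 AND url LIKE ? ORDER BY url",
        (pattern,),
    )
    return [url for (url,) in rows]


conn = make_demo_db()
hide_origin(conn, "https://example.org/b.git")
print(search_origins(conn, "%example.org%"))
```

Filtering in the query keeps hidden origins in the table, so the decision can be reversed by resetting the flag.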
zack renamed T1099: support origin and SWHID blocklist for archive search and browse from Implement a blacklist/whitelist feature on the search engine of the archive to support origin blacklist for archive search and browse.
Jun 13 2018, 12:12 PM · General, Web app

Jun 12 2018

zack added a project to T1098: Add full contextual information in a swh-id of an object: General.

(tagging as General, while we discuss it)

Jun 12 2018, 4:26 PM · Web app, General
zack renamed T1087: facet/metadata-based project search from facet/metadata-bases project search to facet/metadata-based project search.
Jun 12 2018, 12:14 PM · Metadata workflow, General, Web app
zack renamed T1087: facet/metadata-based project search from "Browse" should mean browse to facet/metadata-bases project search.
Jun 12 2018, 12:14 PM · Metadata workflow, General, Web app

Jun 6 2018

zack triaged T1086: ingest Debian's Alioth (archived) repositories (meta-task) as Normal priority.
Jun 6 2018, 1:42 PM · Archive coverage
zack added a project to T1002: ingest Hackage, the Haskell package repository (meta task): Archive content.
Jun 6 2018, 1:41 PM · Hackage loader, Hackage lister, Archive coverage

Jun 5 2018

zack moved T1040: identifiers: support optional contextual parts for line numbers and origin from Restricted Project Column to Restricted Project Column on the Restricted Project board.
Jun 5 2018, 11:19 AM · Restricted Project, General
zack moved T1040: identifiers: support optional contextual parts for line numbers and origin from Restricted Project Column to Restricted Project Column on the Restricted Project board.
Jun 5 2018, 11:05 AM · Restricted Project, General
zack closed T1040: identifiers: support optional contextual parts for line numbers and origin as Resolved.

closing, now that all sub-tasks have been completed

Jun 5 2018, 11:05 AM · Restricted Project, General
zack closed T1041: document contextual parts of persistent identifiers, a subtask of T1040: identifiers: support optional contextual parts for line numbers and origin, as Resolved.
Jun 5 2018, 11:05 AM · Restricted Project, General

May 29 2018

zack moved T1040: identifiers: support optional contextual parts for line numbers and origin from Restricted Project Column to Restricted Project Column on the Restricted Project board.
May 29 2018, 11:23 AM · Restricted Project, General
zack added a project to T1040: identifiers: support optional contextual parts for line numbers and origin: Restricted Project.
May 29 2018, 11:23 AM · Restricted Project, General

May 18 2018

anlambert closed T1042: support optional/contextual parts of persistent identifiers in the web app resolver, a subtask of T1040: identifiers: support optional contextual parts for line numbers and origin, as Resolved.
May 18 2018, 6:46 PM · Restricted Project, General
anlambert added a comment to T1040: identifiers: support optional contextual parts for line numbers and origin.

I agree with your proposal.

May 18 2018, 3:40 PM · Restricted Project, General
zack added a comment to T1040: identifiers: support optional contextual parts for line numbers and origin.

So I think the best option here is to use named parameters as optional parts in the identifiers. This will give us some flexibility to add new ones in the future. Regarding the separator, we could use either \ or |, as they should not interfere with the origin URLs to extract.

May 18 2018, 2:32 PM · Restricted Project, General
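
A minimal sketch of what named optional parts could look like, assuming `|` as the separator (one of the two candidates mentioned in the comment above). The parameter names `lines` and `origin`, the `key=value` form and the function name are illustrative assumptions, not the syntax that was eventually adopted.

```python
from typing import Dict, Tuple


def split_contextual_parts(identifier: str, sep: str = "|") -> Tuple[str, Dict[str, str]]:
    """Split a core identifier from its hypothetical named optional parts."""
    core, *parts = identifier.split(sep)
    params = {}
    for part in parts:
        # Split on the first "=" only, so values such as URLs may contain "=".
        key, _, value = part.partition("=")
        params[key] = value
    return core, params


example = "swh:1:cnt:" + "0" * 40 + "|lines=9-15|origin=https://example.org/repo.git"
core, params = split_contextual_parts(example)
print(core)
print(params)
```

Because the parts are named, a resolver written this way can ignore parameters it does not know, which is the flexibility argued for in the comment.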
anlambert added a comment to T1040: identifiers: support optional contextual parts for line numbers and origin.

Thanks for the clear explanation.

May 18 2018, 1:21 PM · Restricted Project, General

May 17 2018

zack added a comment to T1040: identifiers: support optional contextual parts for line numbers and origin.

the problems I see with optional URL parameters instead of modifying the identifiers themselves are the following:

May 17 2018, 9:06 PM · Restricted Project, General
anlambert added a comment to T1040: identifiers: support optional contextual parts for line numbers and origin.

As I am currently implementing the task, I am wondering if adding optional parts to a swh identifier v1 is the adequate solution.

May 17 2018, 4:59 PM · Restricted Project, General

May 16 2018

anlambert changed the status of T1042: support optional/contextual parts of persistent identifiers in the web app resolver, a subtask of T1040: identifiers: support optional contextual parts for line numbers and origin, from Open to Work in Progress.
May 16 2018, 3:21 PM · Restricted Project, General

Apr 28 2018

zack triaged T1040: identifiers: support optional contextual parts for line numbers and origin as Normal priority.
Apr 28 2018, 3:29 PM · Restricted Project, General

Mar 30 2018

ardumont closed T647: support software deposit via SWORD protocol (meta task) as Resolved.
Mar 30 2018, 11:57 AM · SWORD deposit, General
moranegg added a comment to T647: support software deposit via SWORD protocol (meta task).

@ardumont, I think you can resolve this one ;-)

Mar 30 2018, 11:47 AM · SWORD deposit, General

Mar 27 2018

zack added a comment to T1002: ingest Hackage, the Haskell package repository (meta task).

relevant highlights:

Mar 27 2018, 6:02 PM · Hackage loader, Hackage lister, Archive coverage
zack renamed T1002: ingest Hackage, the Haskell package repository (meta task) from ingest Hackage into the Software Heritage archive (meta task) to ingest Hackage (Haskell package repository) into the Software Heritage archive (meta task).
Mar 27 2018, 6:01 PM · Hackage loader, Hackage lister, Archive coverage

Mar 25 2018

zack added a comment to T1002: ingest Hackage, the Haskell package repository (meta task).

update from joeyh, there is no need for any specific hack to maintain a local mirror, it is just an undocumented feature:

Mar 25 2018, 3:47 PM · Hackage loader, Hackage lister, Archive coverage

Mar 24 2018

zack triaged T1002: ingest Hackage, the Haskell package repository (meta task) as Normal priority.
Mar 24 2018, 10:26 PM · Hackage loader, Hackage lister, Archive coverage

Mar 7 2018

olasd closed T537: Update the task scheduler and the task event listener to use the new partial status, a subtask of T533: Allow loaders to register partial state (meta task), as Resolved.
Mar 7 2018, 4:43 PM · General

Feb 15 2018

anlambert closed T949: swh-web: Display origin-visit's details using snapshots, a subtask of T565: embrace repository snapshot object in the data model (meta task), as Resolved.
Feb 15 2018, 3:35 PM · General

Feb 6 2018

olasd added a comment to T565: embrace repository snapshot object in the data model (meta task).

swh-loader-git and swh-loader-debian have now been migrated to snapshots as well, and restarted.

Feb 6 2018, 4:30 PM · General

Feb 2 2018

ardumont added a comment to T565: embrace repository snapshot object in the data model (meta task).

Current status on the development migration towards snapshot (branch wip/snapshot(s)) as far as I know:

Feb 2 2018, 11:19 AM · General

Jan 14 2018

zack closed T335: specify the URI scheme swh:... to point to software heritage objects as Resolved.

Closed in rDMODb61c6665661c823080192b351af4744dddb35f1e

Jan 14 2018, 10:32 PM · General
zack closed T335: specify the URI scheme swh:... to point to software heritage objects, a subtask of T337: specify a manifest format for documenting archived software, as Resolved.
Jan 14 2018, 10:32 PM · General

Jan 12 2018

zack added a comment to T335: specify the URI scheme swh:... to point to software heritage objects.

yeah, i was thinking about it while running earlier on today :) i'm not yet sure if i'll specify the meaning of the sha1 of each object here, or just say that the sha1 is the primary key of the object and refer to swh-model, we'll see

Jan 12 2018, 11:10 PM · General
olasd added a comment to T335: specify the URI scheme swh:... to point to software heritage objects.

When writing the documentation, please be explicit about whether the content identifier is its sha1 or its salted sha1_git, because it's not clear which one it is from this discussion :)

Jan 12 2018, 7:13 PM · General
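
To make the distinction raised in this comment concrete, here is a short sketch. It assumes that "salted sha1_git" refers to Git's blob hashing, where the content is prefixed with a `blob <length>\0` header before being hashed; the function names are made up for the example.

```python
import hashlib


def plain_sha1(data: bytes) -> str:
    """SHA1 of the raw content bytes."""
    return hashlib.sha1(data).hexdigest()


def git_blob_sha1(data: bytes) -> str:
    """SHA1 as Git computes it for a blob: header, NUL byte, then content."""
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


content = b"hello world\n"
print(plain_sha1(content))
print(git_blob_sha1(content))
```

The two digests differ for the same bytes, so an identifier specification needs to state which one it uses.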
zack changed the status of T335: specify the URI scheme swh:... to point to software heritage objects from Open to Work in Progress.
In T335#16990, @zack wrote:
identifier = "swh" ":" scheme_version ":" obj_type ":" obj_id ;
scheme_version = "1" ;
obj_type =
    "snp"  # snapshot
  | "rel"  # release
  | "rev"  # revision
  | "dir"  # directory
  | "cnt"  # content
  ;
obj_id = object sha1, hex-encoded with (lowercase) ASCII characters ;
Jan 12 2018, 6:11 PM · General
zack changed the status of T335: specify the URI scheme swh:... to point to software heritage objects, a subtask of T337: specify a manifest format for documenting archived software, from Open to Work in Progress.
Jan 12 2018, 6:11 PM · General
zack added a comment to T335: specify the URI scheme swh:... to point to software heritage objects.

in the future, if we switch to blake2/256 (or equivalent length checksums), the examples would become something like:

Jan 12 2018, 2:11 PM · General
zack claimed T335: specify the URI scheme swh:... to point to software heritage objects.
Jan 12 2018, 2:05 PM · General
zack raised the priority of T335: specify the URI scheme swh:... to point to software heritage objects from Normal to High.

concrete, tentative proposal (EBNF):

identifier = "swh" ":" scheme_version ":" obj_type ":" obj_id ;
scheme_version = "1" ;
obj_type =
    "snp"  # snapshot
  | "rel"  # release
  | "rev"  # revision
  | "dir"  # directory
  | "cnt"  # content
  ;
obj_id = object sha1, hex-encoded with (lowercase) ASCII characters ;
Jan 12 2018, 2:04 PM · General
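
The tentative grammar above can be checked mechanically. The following is a minimal sketch of a validator for it, not the project's implementation: the names `Identifier` and `parse_identifier` are hypothetical, and the 40-character length follows from SHA1 digests being 20 bytes long.

```python
import re
from typing import NamedTuple, Optional

# Object type codes taken from the proposed grammar.
OBJECT_TYPES = {
    "snp": "snapshot",
    "rel": "release",
    "rev": "revision",
    "dir": "directory",
    "cnt": "content",
}

# scheme_version is fixed to "1"; obj_id is a lowercase hex-encoded SHA1.
_IDENTIFIER_RE = re.compile(r"swh:(1):(snp|rel|rev|dir|cnt):([0-9a-f]{40})")


class Identifier(NamedTuple):
    scheme_version: int
    obj_type: str
    obj_id: str


def parse_identifier(text: str) -> Optional[Identifier]:
    """Return the parsed identifier, or None if it does not match the grammar."""
    match = _IDENTIFIER_RE.fullmatch(text)
    if match is None:
        return None
    version, obj_type, obj_id = match.groups()
    return Identifier(int(version), obj_type, obj_id)


parsed = parse_identifier("swh:1:rev:" + "a" * 40)
print(parsed)
print(OBJECT_TYPES[parsed.obj_type])
```

Uppercase hex digits, unknown type codes and other scheme versions are rejected, in line with the proposal as written.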
zack added a parent task for T335: specify the URI scheme swh:... to point to software heritage objects: T926: Web UI: support resolution of external pointers into the archive.
Jan 12 2018, 1:53 PM · General

Jan 8 2018

ardumont closed T718: SWORD deposit: backend server, a subtask of T647: support software deposit via SWORD protocol (meta task), as Resolved.
Jan 8 2018, 12:34 PM · SWORD deposit, General

Jan 3 2018

zack added a parent task for T647: support software deposit via SWORD protocol (meta task): T716: Integration HAL - Software Heritage (metatask).
Jan 3 2018, 10:33 AM · SWORD deposit, General

Dec 20 2017

ardumont changed the status of T329: hg / mercurial loader, a subtask of T807: dogfooding: ingest the Software Heritage forge into the archive (via the canonical URLs), from Open to Work in Progress.
Dec 20 2017, 11:42 AM · General

Dec 19 2017

ardumont renamed T908: mercurial loader: Define scheduler task(s) from mercurial loader: Define scheduler task to mercurial loader: Define scheduler task(s).
Dec 19 2017, 2:06 PM · Mercurial loader