Data Model
Active · Public

Recent Activity

Today

ardumont renamed T2211: Go beyond git expressivity from Go beyound git expressivity to Go beyond git expressivity.
Wed, Jan 29, 6:43 PM · Mercurial loader, Storage manager, Data Model, Restricted Project

Wed, Jan 22

vlorentz added a project to T2210: Data Model: Data Model.
Wed, Jan 22, 4:39 PM · Data Model, Restricted Project
vlorentz added projects to T2211: Go beyond git expressivity: Data Model, Storage manager, Mercurial loader.
Wed, Jan 22, 4:38 PM · Mercurial loader, Storage manager, Data Model, Restricted Project
vlorentz added a project to T2212: Specification for swh:2+: identifiers: Data Model.
Wed, Jan 22, 4:38 PM · Data Model, Restricted Project
vlorentz closed T1731: Intrinsic identifiers for origins as Resolved.

In addition, we will need to modify storage to store and allow retrieval of hashed origin URLs.

Wed, Jan 22, 2:18 PM · Storage manager, Data Model

Oct 4 2019

zack closed T1986: swh.model.identifiers: move validation from parsing_persistent_identifier to PersistentId constructor as Resolved by committing rDMOD375832f51bf5: PID: move validation checks to PersistentId constructor.
Oct 4 2019, 7:11 PM · Easy hack, Data Model

Sep 5 2019

zack triaged T1986: swh.model.identifiers: move validation from parsing_persistent_identifier to PersistentId constructor as Low priority.
Sep 5 2019, 5:39 PM · Easy hack, Data Model

Aug 20 2019

olasd added a comment to T1957: Handling missing DAG nodes.

I think objects that we refuse to archive because of policy (that is, currently, contents larger than 100MB) also fit that description.

Aug 20 2019, 10:57 AM · Data Model
zack updated the task description for T1957: Handling missing DAG nodes.
Aug 20 2019, 10:34 AM · Data Model
vlorentz triaged T1957: Handling missing DAG nodes as Normal priority.
Aug 20 2019, 9:59 AM · Data Model
vlorentz added parent tasks for T1957: Handling missing DAG nodes: T611: support for external definitions in the svn/subversion loader, T1617: Experiment with generation numbers to improve revisions walk performance.
Aug 20 2019, 9:59 AM · Data Model
vlorentz created T1957: Handling missing DAG nodes.
Aug 20 2019, 9:58 AM · Data Model

Jul 10 2019

vlorentz added a project to T1731: Intrinsic identifiers for origins: Storage manager.
Jul 10 2019, 5:01 PM · Storage manager, Data Model
vlorentz reopened T1731: Intrinsic identifiers for origins as "Open".
Jul 10 2019, 4:59 PM · Storage manager, Data Model
vlorentz closed T1731: Intrinsic identifiers for origins as Resolved.
Jul 10 2019, 4:59 PM · Storage manager, Data Model
vlorentz added a parent task for T1731: Intrinsic identifiers for origins: T1892: Cassandra as a storage backend.
Jul 10 2019, 3:07 PM · Storage manager, Data Model

Jul 8 2019

zack updated subscribers of T1731: Intrinsic identifiers for origins.

As it turns out, intrinsic origin identifiers are indeed handy for graph compression, so I'd like to see this task resolved.

Jul 8 2019, 11:25 AM · Storage manager, Data Model

Jun 30 2019

zack added a parent task for T1731: Intrinsic identifiers for origins: T1867: compress Merkle DAG and origin nodes together.
Jun 30 2019, 1:56 PM · Storage manager, Data Model

Jun 17 2019

ardumont closed D1589: storage-db: Fixing migration scripts 135-136.
Jun 17 2019, 10:12 AM · Data Model, Storage manager
vlorentz accepted D1589: storage-db: Fixing migration scripts 135-136.
Jun 17 2019, 10:05 AM · Data Model, Storage manager

Jun 15 2019

swh-public-ci added a comment to D1589: storage-db: Fixing migration scripts 135-136.

Build is green
See https://jenkins.softwareheritage.org/job/DSTO/job/tox/493/ for more details.

Jun 15 2019, 8:46 AM · Data Model, Storage manager
ardumont added inline comments to D1589: storage-db: Fixing migration scripts 135-136.
Jun 15 2019, 8:45 AM · Data Model, Storage manager
ardumont added projects to D1589: storage-db: Fixing migration scripts 135-136: Storage manager, Data Model.
Jun 15 2019, 8:42 AM · Data Model, Storage manager

Jun 6 2019

vlorentz added a revision to T1731: Intrinsic identifiers for origins: D1557: Add a 'type' column to origin_visit.
Jun 6 2019, 4:18 PM · Storage manager, Data Model
vlorentz added a revision to T1731: Intrinsic identifiers for origins: D1556: Add support for omitting origin type when calling origin_get.
Jun 6 2019, 4:15 PM · Storage manager, Data Model

Jun 5 2019

vlorentz updated subscribers of T1731: Intrinsic identifiers for origins.

Summary of IRL chat with @moranegg @zack @olasd :

Jun 5 2019, 10:47 AM · Storage manager, Data Model

Jun 4 2019

zack added a comment to T1731: Intrinsic identifiers for origins.

Just a couple of comments:

  • the current proposal is ori instead of org as 3-letter stem
  • your use cases are all valid, but would equally work with a full URL and with a hashed URL
Jun 4 2019, 2:40 PM · Storage manager, Data Model
moranegg added a comment to T1731: Intrinsic identifiers for origins.

I'm for the hashed origin only if we make it available as an identifier under our PID schema:

Jun 4 2019, 2:26 PM · Storage manager, Data Model

May 29 2019

vlorentz added a revision to T1731: Intrinsic identifiers for origins: D1523: Add origin persistent identifiers.
May 29 2019, 6:04 PM · Storage manager, Data Model
zack added a comment to T1731: Intrinsic identifiers for origins.

Okay then. I'll work on updating the identifier specification.

So, again, what are the remaining issues that keep you from just going ahead and using URI hashes as Cassandra origin IDs?

Those I listed above, which were more "philosophical" than technical. I started implementing it last Monday anyway, and it looks good.

May 29 2019, 6:04 PM · Storage manager, Data Model
vlorentz added a comment to T1731: Intrinsic identifiers for origins.

Okay then. I'll work on updating the identifier specification.

May 29 2019, 3:11 PM · Storage manager, Data Model
zack added a comment to T1731: Intrinsic identifiers for origins.

In this case, we'll also need an identifier for URL + type, in case they want to cite/link to the non-default one.
We could use the "contextual information" mechanism, e.g. swh:1:ori:SHA1;type=git

May 29 2019, 11:44 AM · Storage manager, Data Model
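
The identifier shape proposed in the comment above (swh:1:ori:SHA1;type=git) can be sketched as follows. This is only an illustration under assumptions the thread does not confirm: that the hash is the SHA-1 of the UTF-8 encoded origin URL, and that the type is appended as a `;type=` qualifier. The function name `origin_pid` is hypothetical and is not part of swh.model.

```python
import hashlib
from typing import Optional


def origin_pid(url: str, origin_type: Optional[str] = None) -> str:
    """Hypothetical helper: build 'swh:1:ori:<sha1>' with an optional ';type=' part.

    Assumption: the hash is SHA-1 over the UTF-8 bytes of the URL.
    """
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()
    pid = f"swh:1:ori:{digest}"
    if origin_type is not None:
        # Contextual information is appended after a semicolon, as in the comment.
        pid += f";type={origin_type}"
    return pid


# Example with a placeholder URL (not a real archived origin).
print(origin_pid("https://example.org/repo.git"))
print(origin_pid("https://example.org/repo.git", origin_type="git"))
```

With this shape, the same URL always yields the same base identifier, and the type only shows up when a reader needs to point at a non-default loader.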
vlorentz added a comment to T1731: Intrinsic identifiers for origins.

show the first one by default, and allow the user to pick another one.

May 29 2019, 11:24 AM · Storage manager, Data Model

May 28 2019

zack added a comment to T1731: Intrinsic identifiers for origins.

This sounds like a good idea.
But it has some weird implications for components that use the concept of "origin head" (web UI and metadata indexers), because they'll use radically different content depending on which loader visited last.
But having two VCSs at the same URL is weird in itself, so 🤷

May 28 2019, 5:09 PM · Storage manager, Data Model

May 23 2019

ardumont added a comment to T1731: Intrinsic identifiers for origins.

One way to answer the question of whether to use the hash or the tuple (or plain URL) is to know whether those identifiers are destined to be persistent ones or not.
If they are, the hash would be more consistent with the existing ones (swh:1:ori:<hash>?).
Also, they'd be simpler to use (read/type) in a URL (vs. a URL within a URL).

May 23 2019, 11:07 AM · Storage manager, Data Model

May 22 2019

vlorentz added a comment to T1731: Intrinsic identifiers for origins.

This sounds like a good idea.

May 22 2019, 12:23 PM · Storage manager, Data Model
zack added a comment to T1731: Intrinsic identifiers for origins.

Tangential, but relevant to this discussion: in the past we discussed removing origin types from our notion of origin (there might be a task about it, but I couldn't find it right now).

May 22 2019, 12:01 PM · Storage manager, Data Model
vlorentz triaged T1731: Intrinsic identifiers for origins as Normal priority.
May 22 2019, 11:25 AM · Storage manager, Data Model

Apr 24 2019

anlambert triaged T1687: Add filename as an optional part in persistent identifiers as Normal priority.
Apr 24 2019, 11:24 AM · Data Model

Apr 11 2019

anlambert triaged T1645: Associating multiple authors to a revision as Normal priority.
Apr 11 2019, 4:35 PM · Data Model

Jul 12 2018

zack closed T1135: swh-identify: follow symlink by default for paths given as args as Resolved by committing rDMOD07208f047d18: swh-identify: follow symlinks for CLI arguments (by default).
Jul 12 2018, 4:22 PM · Data Model
zack closed T1134: swh-identify: support multiple path arguments as Resolved by committing rDMOD89f8d114b4f9: swh-identify: add support for passing multiple CLI arguments.
Jul 12 2018, 3:32 PM · Data Model
zack closed T1133: swh-identify: show filename in output as Resolved by committing rDMODf53989093669: swh-identify: show filename in output (by default).
Jul 12 2018, 3:01 PM · Data Model
zack closed T1133: swh-identify: show filename in output, a subtask of T1136: swh-identify: support recursive checksumming of directories, as Resolved.
Jul 12 2018, 3:01 PM · Data Model
zack added a parent task for T1133: swh-identify: show filename in output: T1136: swh-identify: support recursive checksumming of directories.
Jul 12 2018, 2:19 PM · Data Model
zack added a subtask for T1136: swh-identify: support recursive checksumming of directories: T1133: swh-identify: show filename in output.
Jul 12 2018, 2:19 PM · Data Model
zack triaged T1136: swh-identify: support recursive checksumming of directories as Normal priority.
Jul 12 2018, 2:19 PM · Data Model
zack triaged T1135: swh-identify: follow symlink by default for paths given as args as Normal priority.
Jul 12 2018, 2:16 PM · Data Model
zack created T1135: swh-identify: follow symlink by default for paths given as args.
Jul 12 2018, 2:16 PM · Data Model
zack updated the task description for T1133: swh-identify: show filename in output.
Jul 12 2018, 2:04 PM · Data Model