
Implement a MongoDB backend for SWH-provenance
Closed, Migrated


Steps for phase 1

  • Find an initial experimental data model
  • Build an MVP (with the new backend and a sample dataset)
  • Run some experiments
  • Make scaling/performance improvements
  • Change the data model and indexes accordingly
  • Find the hosting strategy (single server vs. multiple instances)
  • Populate with production data and test
  • Deploy

Phase 2:

  • Support working with an incremental data stream

Event Timeline

jayeshv triaged this task as Normal priority.
jayeshv created this task.
zack renamed this task from Implement a MonoDB backend for SWH-provenance to Implement a MongoDB backend for SWH-provenance. Jul 15 2021, 10:52 AM
jayeshv updated the task description.

Mongo engine

"name" : "wiredTiger",
"supportsCommittedReads" : true,
"oldestRequiredTimestampForCrashRecovery" : Timestamp(0, 0),
"supportsPendingDrops" : true,
"dropPendingIdents" : NumberLong(0),
"supportsSnapshotReadConcern" : true,
"readOnly" : false,
"persistent" : true,
"backupCursorOpen" : false,
"supportsResumableIndexBuilds" : true

Data model

content {
        id: sha1
        ts    // optional
        revision: {<ref revision str>: [<ref path>]}
        directory: {<ref directory str>: [<ref path>]}
}

directory {
        id: sha1
        ts    // optional
        revision: {<ref revision str>: [<ref path>]}
}

revision {
        ts    // optional
        preferred: <ref origin>    // optional
        origin: [<ref origin>]
        revision: [<ref revision>]
}

origin {

path {
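
As a rough illustration of the content entry in the model above, here is a minimal sketch of how such a document could be built as a plain Python dict before being stored. The helper names (make_content, add_occurrence) are hypothetical, and representing ts as an integer Unix timestamp is an assumption, since the model does not specify a type.

```python
from typing import Optional


def make_content(sha1: str, ts: Optional[int] = None) -> dict:
    """Build an empty 'content' document shaped like the model sketch.

    'ts' is optional in the model, so the key is only set when a value
    is given (assumed here to be an integer Unix timestamp).
    """
    doc = {"id": sha1, "revision": {}, "directory": {}}
    if ts is not None:
        doc["ts"] = ts
    return doc


def add_occurrence(doc: dict, kind: str, ref: str, path: str) -> dict:
    """Record that the content appears at 'path' in a revision or directory.

    'kind' is "revision" or "directory"; 'ref' is the referenced object's
    id as a string, used as the key, with a list of paths as the value.
    """
    paths = doc[kind].setdefault(ref, [])
    if path not in paths:  # avoid duplicate paths for the same reference
        paths.append(path)
    return doc
```

Keying the nested mapping by the referenced id keeps all paths for one revision or directory together in a single array.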

Python interface
Will use pymongo, the low-level driver, to interface with the database.
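
A minimal sketch of what a write through pymongo could look like for the content collection, not the actual backend code. It assumes the earliest timestamp should be kept (hence $min) and that paths are accumulated per revision with $addToSet; both are assumptions, as are the function names and the database and collection names in the comments. The only driver call used is Collection.update_one with upsert=True.

```python
def content_upsert(sha1: str, ts: int, revision: str, path: str):
    """Return a (filter, update) pair for one content occurrence.

    Assumptions: 'ts' keeps the smallest value seen so far, and each
    revision id (a hex string, so it contains no dots) is a key under
    'revision' holding a set-like array of paths.
    """
    filter_ = {"id": sha1}
    update = {
        "$min": {"ts": ts},
        "$addToSet": {f"revision.{revision}": path},
    }
    return filter_, update


def record_content_in_revision(collection, sha1, ts, revision, path):
    """Apply the upsert on a pymongo Collection (or a compatible object)."""
    filter_, update = content_upsert(sha1, ts, revision, path)
    return collection.update_one(filter_, update, upsert=True)


# Usage against a running server (names are hypothetical):
#   from pymongo import MongoClient
#   coll = MongoClient()["provenance"]["content"]
#   record_content_in_revision(coll, sha1, ts, revision_id, "src/main.c")
```

Doing the whole change in one update_one call avoids a separate read before each write.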

db version v5.0.1
Build Info: {
    "version": "5.0.1",
    "gitVersion": "318fd9cabc59dc9651f3189b622af6e06ab6cd33",
    "openSSLVersion": "OpenSSL 1.1.1f  31 Mar 2020",
    "modules": [],
    "allocator": "tcmalloc",
    "environment": {
        "distmod": "ubuntu2004",
        "distarch": "x86_64",
        "target_arch": "x86_64"
    }
}

gitlab-migration changed the status of subtask T3557: Run experiments against the MongoDB backend from Wontfix to Migrated.
gitlab-migration changed the status of subtask T3561: Implement the initial MonogDB backend with a simple data model from Resolved to Migrated.