
Add LevelDB backend for exporter node sets
Closed, Public

Authored by seirl on Mar 23 2021, 10:13 PM.

Details

Summary

SQLite seems to slow down over time when a lot of data is inserted in
it. This is an attempt at using LevelDB to have more efficient node
sets.

Q: should we make the choice between SQLite and LevelDB configurable, in case we want to support setups where it's not easy to install LevelDB?
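
For illustration, here is one way the configurable option raised in the question could look. This is only a sketch: the `make_node_set` factory, its `backend` argument, the `SQLiteSet` class and the `nodes` table are hypothetical names, not taken from this diff. Only the SQLite branch is implemented, so the example runs without LevelDB installed.

```python
import sqlite3


class SQLiteSet:
    """Hypothetical on-disk set of byte strings backed by SQLite."""

    def __init__(self, db_path: str):
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS nodes (id BLOB PRIMARY KEY) WITHOUT ROWID"
        )

    def add(self, item: bytes) -> bool:
        """Insert item; return True if it was not already present."""
        try:
            with self.db:  # commits on success, rolls back on error
                self.db.execute("INSERT INTO nodes VALUES (?)", (item,))
        except sqlite3.IntegrityError:
            # Primary-key conflict: the item was already in the set.
            return False
        return True

    def close(self) -> None:
        self.db.close()


def make_node_set(backend: str, db_path: str):
    """Pick a node-set implementation from a (hypothetical) config value."""
    if backend == "sqlite":
        return SQLiteSet(db_path)
    if backend == "leveldb":
        # A LevelDB-backed class would be returned here when plyvel is installed.
        raise NotImplementedError("LevelDB backend omitted from this sketch")
    raise ValueError(f"unknown node set backend: {backend!r}")
```

With a factory like this, the exporter would only need a single configuration value to switch backends, and setups without LevelDB could keep using SQLite.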

Diff Detail

Repository
rDDATASET Datasets
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 20277
Build 31474: arc lint + arc unit

Event Timeline

seirl created this revision.
seirl added a reviewer: Reviewers.
vlorentz added a subscriber: vlorentz.

Sorry for the long delay, I didn't see your diff for some reason...

swh/dataset/utils.py
103–105

Use this instead:

try:
    import plyvel
except ImportError:
    plyvel = None

This avoids invoking the import machinery every time you use LevelDBSet.
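
A minimal sketch of how the module-level import could be combined with the class, assuming a `LevelDBSet` that stores each item as a key with a dummy value. The class body, the `add` and `close` methods, and the error message are illustrative assumptions, not the code under review; the `plyvel.DB`, `get` and `put` calls are the standard plyvel API.

```python
try:
    import plyvel  # optional dependency, resolved once at module import time
except ImportError:
    plyvel = None


class LevelDBSet:
    """Illustrative set of byte strings stored in a LevelDB database."""

    def __init__(self, db_path: str):
        # Fail with a clear message only when the backend is actually used.
        if plyvel is None:
            raise ImportError("plyvel is required to use LevelDBSet")
        self.db = plyvel.DB(db_path, create_if_missing=True)

    def add(self, item: bytes) -> bool:
        """Insert item; return True if it was not already present."""
        if self.db.get(item) is not None:
            return False
        self.db.put(item, b"T")  # the value is a placeholder, only keys matter
        return True

    def close(self) -> None:
        self.db.close()
```

The module still imports cleanly when plyvel is missing; the error is deferred until someone actually constructs a `LevelDBSet`.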

This revision is now accepted and ready to land. (Mar 26 2021, 11:21 AM)

Remove phabricator garbage

This revision was landed with ongoing or failed builds. (Mar 26 2021, 12:07 PM)
This revision was automatically updated to reflect the committed changes.

Build is green

Patch application report for D5315 (id=19176)

Rebasing onto af5e4614c8...

Current branch diff-target is up to date.
Changes applied before test
commit 9523be0552d822be617da77bf0d2ca2f479da572
Author: Antoine Pietri <antoine.pietri1@gmail.com>
Date:   Tue Mar 23 23:42:52 2021 +0000

    Model test data: add Release with no author/date
    
    Some releases don't have authors and date fields, this case should be
    checked in the tests.

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/292/ for more details.

This revision is now accepted and ready to land. (Mar 26 2021, 2:27 PM)

Rebase + fix phabricator incorrect ID