Add LevelDB backend for exporter node sets
Closed, Public

Authored by seirl on Mar 23 2021, 10:13 PM.

Details

Summary

SQLite seems to slow down over time as more data is inserted into
it. This is an attempt at using LevelDB to get more efficient node
sets.

Q: Should we make the SQLite/LevelDB choice configurable, in case we want to support setups where LevelDB is not easy to install?
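For illustration only, here is one way a configurable choice could look. The function name `pick_backend` and the values "auto", "sqlite" and "leveldb" are hypothetical and are not part of this diff; the sketch only assumes that LevelDB is accessed through the plyvel package.

```python
import importlib.util


def pick_backend(requested: str = "auto") -> str:
    """Return the node set backend to use ("sqlite" or "leveldb").

    Hypothetical helper: "auto" prefers LevelDB when plyvel is installed
    and falls back to SQLite otherwise.
    """
    if requested not in ("auto", "sqlite", "leveldb"):
        raise ValueError(f"unknown node set backend: {requested!r}")

    # find_spec returns None when the package cannot be imported.
    has_leveldb = importlib.util.find_spec("plyvel") is not None

    if requested == "auto":
        return "leveldb" if has_leveldb else "sqlite"
    if requested == "leveldb" and not has_leveldb:
        raise ImportError("plyvel is not installed, cannot use LevelDB")
    return requested
```

With a helper like this, setups where LevelDB is hard to install could keep working on SQLite without any configuration change, while an explicit "leveldb" request would fail early instead of silently falling back.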

Diff Detail

Repository
rDDATASET Datasets
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

seirl created this revision.
seirl added a reviewer: Reviewers.
vlorentz added a subscriber: vlorentz.

Sorry for the long delay, I didn't see your diff for some reason...

swh/dataset/utils.py
103–105

Use this instead:

try:
    import plyvel
except ImportError:
    plyvel = None

It avoids invoking the import machinery every time you use LevelDBSet.
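A minimal sketch of how the module-level import could be used by the set class. This is not the code from the diff: the class body, the error message and the stored marker value are assumptions, and only `plyvel.DB`, `get`, `put` and `close` are taken from the plyvel API.

```python
import tempfile

try:
    import plyvel  # optional dependency, resolved once at module import
except ImportError:
    plyvel = None


class LevelDBSet:
    """Illustrative on-disk set of byte strings backed by LevelDB."""

    def __init__(self, db_path):
        # Fail early with a clear message when plyvel is unavailable.
        if plyvel is None:
            raise ImportError("plyvel is required to use LevelDBSet")
        self.db_path = str(db_path)
        self.db = None

    def __enter__(self):
        self.db = plyvel.DB(self.db_path, create_if_missing=True)
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.db.close()

    def add(self, value: bytes) -> bool:
        """Insert value; return True only if it was not already present."""
        if self.db.get(value) is not None:
            return False
        self.db.put(value, b"1")  # the stored value is just a marker
        return True
```

When plyvel is installed, `with LevelDBSet(tempfile.mkdtemp()) as s:` followed by two calls to `s.add(b"node")` should return True and then False. The module-level check keeps the cost of the failed import out of every `LevelDBSet` instantiation.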

This revision is now accepted and ready to land. Mar 26 2021, 11:21 AM

Remove Phabricator garbage

This revision was landed with ongoing or failed builds. Mar 26 2021, 12:07 PM
This revision was automatically updated to reflect the committed changes.

Build is green

Patch application report for D5315 (id=19176)

Rebasing onto af5e4614c8...

Current branch diff-target is up to date.
Changes applied before test
commit 9523be0552d822be617da77bf0d2ca2f479da572
Author: Antoine Pietri <antoine.pietri1@gmail.com>
Date:   Tue Mar 23 23:42:52 2021 +0000

    Model test data: add Release with no author/date
    
    Some releases don't have authors and date fields, this case should be
    checked in the tests.

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/292/ for more details.

This revision is now accepted and ready to land. Mar 26 2021, 2:27 PM

Rebase + fix incorrect Phabricator ID