
scanner import db is slow, improve its performances
Closed, Migrated; Edits Locked

Description

As per the title: importing a list of ~30 million SWHIDs into an SQLite database using swh scanner db import currently takes about 30 minutes (tested on granet).
It can very likely be much faster.
We should try the following classic approaches to make bulk import into SQLite faster:

  • use pragma synchronous = off (yes, it is unsafe, but the conversion to SQLite is all or nothing anyway, so it's not a factor here)
  • use pragma journal_mode = off (ditto)
  • wrap all imports in a single transaction
  • verify we are using prepared statements for inserts
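
As a rough illustration, the four points above could be combined along these lines using Python's standard sqlite3 module. This is only a sketch, not the scanner's actual code: the table name swhids, its single swhid column, and the helper names read_swhids and bulk_import are all hypothetical.

```python
import sqlite3
from typing import Iterable, Iterator, Tuple


def read_swhids(lines: Iterable[str]) -> Iterator[Tuple[str]]:
    """Yield one-element parameter tuples, skipping blank lines."""
    for line in lines:
        swhid = line.strip()
        if swhid:
            yield (swhid,)


def bulk_import(db_path: str, lines: Iterable[str]) -> int:
    """Import SWHIDs into a hypothetical swhids table; return its row count."""
    conn = sqlite3.connect(db_path)
    try:
        # Unsafe if the process crashes mid-import, but the import is
        # all or nothing: a failed run is simply redone from scratch.
        conn.execute("PRAGMA synchronous = OFF")
        conn.execute("PRAGMA journal_mode = OFF")
        # Hypothetical schema, assumed for this example only.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS swhids (swhid TEXT PRIMARY KEY)"
        )
        # "with conn" commits once at the end, so every insert runs in a
        # single transaction. executemany() reuses one parameterized
        # statement for all rows instead of re-parsing SQL per row.
        with conn:
            conn.executemany(
                "INSERT OR IGNORE INTO swhids (swhid) VALUES (?)",
                read_swhids(lines),
            )
        (count,) = conn.execute("SELECT COUNT(*) FROM swhids").fetchone()
        return count
    finally:
        conn.close()
```

Passing an open text file as lines streams the input, so the full list of SWHIDs never has to be held in memory. Whether this actually brings the import time down would need to be measured on the real dataset.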

Event Timeline

zack triaged this task as Low priority. Nov 25 2020, 10:00 PM
zack created this task.
zack renamed this task from "scanner: improve SWHID (txt) -> sqlite import time" to "scanner import db is slow, improve its performances". Dec 15 2020, 5:48 PM
zack updated the task description.