
swh scanner db import keeps all input SWHIDs in memory
Closed, Migrated

Description

As per the title. The import keeps every input SWHID in memory to avoid duplicates, which will make it impossible to load very large SWHID lists into an SQLite knowledge base.
Duplicate avoidance is, in fact, pointless at this level: SWHIDs are stored in a primary key column, so if a duplicate insert is attempted, the DB will complain anyway.
We should instead read the input in streaming fashion (possibly in chunks, if that helps insert performance).
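
A minimal sketch of what streaming, chunked import could look like, using only the Python standard library. The table name `swhids`, the column name `swhid`, and the helpers `read_swhids`, `chunked` and `import_swhids` are hypothetical and not taken from the swh-scanner code base. `INSERT OR IGNORE` is also an assumption: it lets the primary key silently drop duplicates, whereas a plain `INSERT` would raise `sqlite3.IntegrityError`.

```python
import sqlite3
from itertools import islice
from typing import Iterable, Iterator, List


def read_swhids(lines: Iterable[str]) -> Iterator[str]:
    """Lazily yield one SWHID per non-empty input line."""
    for line in lines:
        swhid = line.strip()
        if swhid:
            yield swhid


def chunked(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield lists of at most `size` items without materializing the input."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def import_swhids(
    conn: sqlite3.Connection, lines: Iterable[str], chunk_size: int = 1000
) -> int:
    """Insert SWHIDs chunk by chunk and return the number of rows in the table."""
    # Hypothetical schema: the primary key is what enforces uniqueness.
    conn.execute("CREATE TABLE IF NOT EXISTS swhids (swhid TEXT PRIMARY KEY)")
    for chunk in chunked(read_swhids(lines), chunk_size):
        # One transaction per chunk; only `chunk_size` SWHIDs are held in memory.
        with conn:
            conn.executemany(
                "INSERT OR IGNORE INTO swhids (swhid) VALUES (?)",
                ((swhid,) for swhid in chunk),
            )
    return conn.execute("SELECT COUNT(*) FROM swhids").fetchone()[0]
```

Passing an open file object (or `sys.stdin`) as `lines` keeps memory use bounded by `chunk_size`, since file objects iterate line by line. Whether chunking actually improves insert performance would need to be measured.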