
license dataset: add logic to convert/import dataset into a SQL database
Closed, Migrated; Edits Locked

Description

As the dataset is naturally tabular, it is handy to query it SQL-style.
We still want to retain the simplicity of CSV files (as they can be imported separately into, e.g., Pandas), but also make life easier for SQL users.
A way to do that (suggested by @vlorentz) is to provide documentation and/or logic to import CSV data into a SQL database.

With the current version of the dataset, the snippet P1529 does this for all CSV-based information, targeting sqlite3.
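
For illustration only, here is a minimal sketch of what such an import could look like using Python's standard `csv` and `sqlite3` modules. It is not the content of P1529. The function name `import_csv` is hypothetical, and the sketch assumes each CSV file has a header row and that storing every column as TEXT is acceptable.

```python
import csv
import sqlite3


def quote_ident(name):
    """Quote an SQL identifier so unusual table/column names stay valid."""
    return '"{}"'.format(name.replace('"', '""'))


def import_csv(conn, csv_path, table):
    """Load one CSV file (assumed to have a header row) into a SQLite table.

    Every column is created as TEXT; this is a simplifying assumption,
    not something the dataset prescribes. Returns the number of rows inserted.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)  # first row gives the column names
        columns = ", ".join(quote_ident(c) + " TEXT" for c in header)
        placeholders = ", ".join("?" for _ in header)
        conn.execute(
            "CREATE TABLE IF NOT EXISTS {} ({})".format(quote_ident(table), columns)
        )
        cur = conn.executemany(
            "INSERT INTO {} VALUES ({})".format(quote_ident(table), placeholders),
            reader,  # remaining rows are streamed straight from the file
        )
    conn.commit()
    return cur.rowcount
```

Usage would be along the lines of `conn = sqlite3.connect("dataset.db")` followed by one `import_csv(conn, path, table)` call per CSV file, where the database name, paths, and table names are placeholders to be adapted to the actual dataset layout.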

As the logic is a bit cumbersome right now due to T4683, we probably want to wait for that to be fixed before advertising this.

Providing similar logic for Postgres would be nice as well.
(Postgres would also be able to support the full JSON Scancode data, which would be a nice plus.)