
Map a row from the clearcode toolkit to the Software Heritage archive



Build a mechanism to map a row from the clearcode toolkit database to the Software Heritage archive. A row has the fields path (primary key), content (binary data), last_modified_date (timestamp with time zone), map_error (error message recorded while mapping) and uuid. The mapping uses the content table for sha1 hashes and the revision table for sha1_git hashes, and extracts the required information from the row. It returns the list of data that has been mapped together with a mapping status: True if every hash in the row could be mapped, False otherwise. A row that cannot be fully mapped yet can then be stored in a state and mapped again in the future.
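The row layout and the return value described above can be sketched as follows. This is only an illustration, not the code in this commit: the names ClearcodeRow and map_hashes are hypothetical, and a plain set of known hashes stands in for the archive lookup.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Set, Tuple


@dataclass
class ClearcodeRow:
    """Hypothetical in-memory view of one clearcode toolkit row."""

    path: str                      # primary key
    content: bytes                 # binary data
    last_modified_date: datetime   # timestamp with time zone
    map_error: Optional[str]       # error message recorded while mapping
    uuid: str


def map_hashes(hashes: List[str], known: Set[str]) -> Tuple[List[str], bool]:
    """Return the hashes found in `known` and whether all of them were found.

    `known` is an assumption standing in for the archive: the real lookup
    would go through the content and revision tables.
    """
    mapped = [h for h in hashes if h in known]
    # The status is True only if every hash of the row could be mapped.
    return mapped, len(mapped) == len(hashes)
```

A caller could keep rows whose status is False in a pending state and run map_hashes on them again once the archive has ingested the missing objects.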

Add exception classes that can be raised while mapping a row. Check whether the row is a definition or a harvest, and raise an exception if its path is invalid. If the row is a definition, map its data with map_definition; if it is a harvest, map it with map_harvest. Use storage instead of SQL queries when looking up data in the archive. Add tests to cover all the cases and docstrings to explain how every function works.
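A minimal sketch of the dispatch described above, under stated assumptions. The exception names, classify_path and map_row are hypothetical, and so is the path layout used to tell the two kinds apart (six segments for a definition, nine with a "tool" segment for a harvest). map_definition and map_harvest are passed in as callables because their real signatures are not given here.

```python
from typing import Any, Callable, List, Tuple


class MappingError(Exception):
    """Hypothetical base class for errors raised while mapping a row."""


class InvalidPathError(MappingError):
    """Raised when a row path is neither a definition nor a harvest."""


def classify_path(path: str) -> str:
    """Return "definition" or "harvest", assuming the layout noted above."""
    parts = path.split("/")
    if len(parts) == 6 and parts[4] == "revision":
        return "definition"
    if len(parts) == 9 and parts[4] == "revision" and parts[6] == "tool":
        return "harvest"
    raise InvalidPathError(f"invalid path: {path}")


# Assumed mapper shape: (path, content) -> (mapped data, mapping status).
Mapper = Callable[[str, bytes], Tuple[List[Any], bool]]


def map_row(
    path: str, content: bytes, map_definition: Mapper, map_harvest: Mapper
) -> Tuple[List[Any], bool]:
    """Dispatch a row to the matching mapper; invalid paths raise."""
    if classify_path(path) == "definition":
        return map_definition(path, content)
    return map_harvest(path, content)
```

Validating the path before dispatching means an invalid row fails early with a dedicated exception instead of reaching either mapper.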

Signed-off-by: Tushar Goel <>


TG1999 authored on Feb 2 2021, 10:41 AM
TG1999 pushed on Feb 2 2021, 10:47 AM
Differential Revision
D4931: Add mapping of definitions and harvests
rDMFCD6d70494b713e: Add mapping of sha1 with SWH ID
Build Status
Buildable 18938
Build 29347: test-and-build (Jenkins)