luigi: Dynamically list directories instead of using object_types

Description

Before this commit, UploadExportToS3 and DownloadExportFromS3 assumed the
set of object types was the same as the set of directories, which is wrong:

  • for the edges format, there is no origin_visit or origin_visit_status directory
  • for both the edges and orc formats, the set of object types did not include the relational tables.

A possible fix would have been to use the swh.dataset.relational.TABLES
constant and keep ignoring nonexistent directories in the edges format, but
I decided to simply list the directories instead, as this will prevent
future issues if we decide to add directories that do not match any table
in Athena for whatever reason.
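
To illustrate the approach, here is a minimal sketch of listing the directories that actually exist under a local export instead of deriving them from a fixed set of object types. It is not the code from this commit: the function name list_export_dirs and the <export_root>/<format>/<directory> layout are assumptions made for the example.

```python
from pathlib import Path
from typing import List


def list_export_dirs(export_root: Path, fmt: str) -> List[str]:
    """Return the sorted names of the subdirectories of <export_root>/<fmt>.

    Hypothetical helper: the layout <export_root>/<fmt>/<directory> is an
    assumption for this sketch, not something taken from swh.dataset.
    """
    format_dir = export_root / fmt
    # Only directories are kept; stray files next to them are ignored.
    return sorted(p.name for p in format_dir.iterdir() if p.is_dir())
```

With this shape, a directory that exists only for some formats, or that matches no known table, is picked up without any change to a hardcoded list.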

Details