
luigi: Dynamically list directories instead of using object_types

This commit no longer exists in the repository. It may have been part of a branch which was deleted.

Description

luigi: Dynamically list directories instead of using object_types

Before this commit, UploadExportToS3 and DownloadExportFromS3 assumed the
set of object types was the same as the set of directories, which is wrong:

  • for the edges format, there is no origin_visit or origin_visit_status directory;
  • for both the edges and orc formats, the object types did not cover the relational tables' directories.

A possible fix would have been to use the swh.dataset.relational.TABLES
constant and keep ignoring non-existing dirs in the edges, but I decided to
simply list directories instead, as it will prevent future issues if we
decide to add directories that do not match any table in Athena for
whatever reason.
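
The difference between the two approaches can be illustrated with a minimal sketch. This is not the code from this commit: the function names, the ASSUMED_OBJECT_TYPES list, and the local-directory layout are hypothetical and only show why listing what exists on disk is more robust than deriving directory names from a fixed list.

```python
from pathlib import Path
from typing import List

# Hypothetical stand-in for a hard-coded set of object types;
# the real list used by the tasks is not reproduced here.
ASSUMED_OBJECT_TYPES = ["origin", "origin_visit", "origin_visit_status", "snapshot"]


def dirs_from_object_types(format_dir: Path) -> List[Path]:
    """Previous assumption: exactly one directory per object type."""
    return [format_dir / name for name in ASSUMED_OBJECT_TYPES]


def dirs_from_listing(format_dir: Path) -> List[Path]:
    """Approach described above: use whatever directories actually exist."""
    return sorted(p for p in format_dir.iterdir() if p.is_dir())
```

With the listing approach, a directory that matches no object type is still picked up, and a directory that does not exist for a given format is never requested.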

Details

Provenance
vlorentz authored on Dec 16 2022, 3:34 PM
vlorentz pushed on Dec 20 2022, 10:04 AM
Differential Revision
D8965: luigi: Dynamically list directories instead of using object_types
Build Status
Buildable 33312
Build 52212: test-and-build (Jenkins)
