
Add support for Python eggs file processing
Closed, Migrated

Description

Some PyPI origins declare sdist archives as Python egg files, a legacy format used before Python wheels became
the standard for distributing Python modules; see Djangy 0.7 for instance.

Those files are plain ZIP archives containing a PKG-INFO file in an EGG-INFO folder, but the PyPI loader
currently cannot process them.

We should add support for loading this type of artifact into the archive by:

  • adding support for uncompressing files with the .egg extension in swh.core.tarball
  • adapting swh.loader.package.pypi.loader.extract_intrinsic_metadata to parse the EGG-INFO/PKG-INFO file
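
To illustrate the two steps above, here is a minimal sketch using only the Python standard library. It is not the swh.core.tarball or PyPI loader code: read_egg_pkg_info is a hypothetical helper name, and the in-memory egg with its "example" package is made up so that the snippet runs without downloading anything. It assumes the egg is a ZIP archive with metadata at EGG-INFO/PKG-INFO, as described above, and that the metadata is UTF-8 encoded.

```python
import io
import zipfile
from email.parser import HeaderParser


def read_egg_pkg_info(egg_file):
    """Return the EGG-INFO/PKG-INFO headers of an egg as a dict.

    Hypothetical helper for illustration only. Repeated headers
    (e.g. Classifier) collapse to a single value in the returned dict.
    """
    # An egg is a ZIP archive, so zipfile can open it directly.
    with zipfile.ZipFile(egg_file) as archive:
        raw = archive.read("EGG-INFO/PKG-INFO").decode("utf-8")
    # PKG-INFO uses RFC 822 style "Key: value" headers.
    message = HeaderParser().parsestr(raw)
    return dict(message.items())


# Build a tiny in-memory egg (made-up content) to exercise the helper.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(
        "EGG-INFO/PKG-INFO",
        "Metadata-Version: 1.0\nName: example\nVersion: 0.1\n",
    )
buffer.seek(0)

metadata = read_egg_pkg_info(buffer)
print(metadata["Name"], metadata["Version"])
```

A real implementation would presumably also need to handle eggs that lack the EGG-INFO folder or use a different encoding; the sketch ignores those cases.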

Event Timeline

anlambert triaged this task as Normal priority. Sep 15 2021, 1:58 PM
anlambert created this task.
anlambert updated the task description.
anlambert updated the task description.
anlambert claimed this task.
anlambert updated the task description.

Apparently we decided not to archive them, so it is better to filter those files out, as proposed in T3575.