Push of swh-graph to pypi is broken
Closed, Resolved (Public)

Description

The push of the swh-graph build to PyPI is broken:

100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 113.9/113.9 MB • 00:04 • 40.1 MB/s
14:36:23  INFO     Response from https://upload.pypi.org/legacy/:
14:36:23           400 File too large. Limit for project '****.graph' is 100 MB. See       
14:36:23           https://pypi.org/help/#file-size-limit for more information.           
14:36:23  INFO     <html>                                                                 
14:36:23            <head>                                                                
14:36:23             <title>400 File too large. Limit for project '****.graph' is 100 MB.  
14:36:23           See https://pypi.org/help/#file-size-limit for more                    
14:36:23           information.</title>                                                   
14:36:23            </head>                                                               
14:36:23            <body>                                                                
14:36:23             <h1>400 File too large. Limit for project '****.graph' is 100 MB. See 
14:36:23           https://pypi.org/help/#file-size-limit for more information.</h1>      
14:36:23             The server could not comply with the request since it is either      
14:36:23           malformed or otherwise incorrect.<br/><br/>                            
14:36:23           File too large. Limit for project &#x27;****.graph&#x27; is 100 MB. See 
14:36:23           https://pypi.org/help/#file-size-limit for more information.           
14:36:23                                                                                  
14:36:23                                                                                  
14:36:23            </body>                                                               
14:36:23           </html>                                                                
14:36:30  [Pipeline] }
14:36:30  [Pipeline] // withCredentials
14:36:30  [Pipeline] }
14:36:30  [Pipeline] // stage
14:36:30  [Pipeline] stage
14:36:30  [Pipeline] { (Declarative: Post Actions)
14:36:30  [Pipeline] cleanWs
14:36:30  [WS-CLEANUP] Deleting project workspace...
14:36:30  [WS-CLEANUP] Deferred wipeout is used...
14:36:30  [WS-CLEANUP] done
14:36:30  [Pipeline] }
14:36:30  [Pipeline] // stage
14:36:30  [Pipeline] }
14:36:30  $ docker stop --time=1 894d0b35172314a17b73427fe5af6b85ed1d9296641fb15618cbe3064f404ffd
14:36:32  $ docker rm -f 894d0b35172314a17b73427fe5af6b85ed1d9296641fb15618cbe3064f404ffd
14:36:33  [Pipeline] // withDockerContainer
14:36:33  [Pipeline] }
14:36:33  [Pipeline] // withDockerRegistry
14:36:33  [Pipeline] }
14:36:33  [Pipeline] // withEnv
14:36:33  [Pipeline] }
14:36:33  [Pipeline] // node
14:36:33  [Pipeline] End of Pipeline
14:36:33  ERROR: script returned exit code 1
14:36:33  Finished: FAILURE

For example: https://jenkins.softwareheritage.org/view/swh%20master/job/DGRPH/job/pypi-upload/27/console

Event Timeline

vsellier triaged this task as High priority. Jun 8 2022, 2:28 PM
vsellier created this task.

IIRC, we cannot reduce the size; and I think it is unreasonable to ask PyPI for a higher limit.

The right fix is to remove all Python code from the swh-graph server, so the swh-graph PyPI package only contains the client. @seirl is already working on this, I believe.

We've asked for another bump at https://github.com/pypa/pypi-support/issues/1998.

swh.graph, even after the gRPC refactoring, will still ship an aiohttp server shim for the "REST-like" interfaces used by the Python client, so we'll still need to package that as Python.

There are a couple of ideas from the latest IRC exchange on the issue about what could be done to actually reduce the Python package size:

  • properly packaging and publishing the Java artifacts as Maven packages instead of Python packages
    • pros
      • reduces the sdist size
      • we'll need to do that if we want people to use swh.graph as a Java library
    • cons
      • need to implement proper Maven publication scaffolding in Jenkins
      • need to call out to Maven when installing the server components, if we still want them to be pip installable
  • building and distributing the jar file as a downloadable asset instead of shipping it in the sdist
    • pros
      • reduces the sdist size
      • probably a minimal change wrt the status quo: just need to publish the already built jar "somewhere" (instead of in the sdist) and have a mechanism to fetch it in the Python code
    • cons
      • need to serve the asset somehow
      • need to implement authenticated distribution (e.g. record the hash of the jar file in the Python sdist, and verify it on download)
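
For the second option, the fetch-and-verify step could look roughly like the sketch below. This is only an illustration of the "record the hash in the sdist, verify on download" idea, not existing swh.graph code: the names JAR_URL, JAR_SHA256, sha256_of and fetch_jar, the example URL, and the choice of SHA-256 are all assumptions.

```python
import hashlib
import urllib.request
from pathlib import Path

# Hypothetical values: neither the URL nor the digest comes from swh-graph.
# In this scheme the digest would be recorded in the sdist at release time.
JAR_URL = "https://example.org/swh-graph.jar"
JAR_SHA256 = "0" * 64


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch_jar(url: str, expected_sha256: str, dest: Path) -> Path:
    """Download url to dest, unless a verified copy is already present."""
    if dest.exists() and sha256_of(dest) == expected_sha256:
        return dest
    # Download next to the destination, and only move the file into
    # place once its digest matches the recorded one.
    tmp = dest.with_name(dest.name + ".part")
    urllib.request.urlretrieve(url, tmp)
    actual = sha256_of(tmp)
    if actual != expected_sha256:
        tmp.unlink()
        raise ValueError(f"checksum mismatch for {url}: got {actual}")
    tmp.replace(dest)
    return dest
```

The server components would then call something like fetch_jar(JAR_URL, JAR_SHA256, cache_dir / "swh-graph.jar") on first use. Where the jar is hosted and where it is cached are left open here, as in the discussion above.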

For future reference, it looks like we are still "small" players as "big" packages go on PyPI: https://pypi.org/stats/ (e.g., tf-nightly is currently the largest package on PyPI and it weighs 427 GiB).
While it is still not nice to ship a big fat JAR in a PyPI package, our extension requests will likely be granted.