
Investigate if/how we could improve `swh` cli command startup time
Closed, Migrated

Description

Currently, the swh "master" cli tool is pretty slow to start. It does not seem to be very lazy about what it loads. It would be nice to cut its startup time a bit.

In my (venv-based) dev environment, I get:

time swh 
Usage: swh [OPTIONS] COMMAND [ARGS]...

  Command line interface for Software Heritage.

Options:
  -l, --log-level [NOTSET|DEBUG|INFO|WARNING|ERROR|CRITICAL]
                                  Log level (defaults to INFO).
  --log-config FILENAME           Python yaml logging
                                  configuration file.

  --sentry-dsn TEXT               DSN of the Sentry instance to
                                  report to

  -h, --help                      Show this message and exit.

Notes:
  If both options are present, --log-level will override the
  root logger configuration set in --log-config.

  The --log-config YAML must conform to the
  logging.config.dictConfig schema documented at
  https://docs.python.org/3/library/logging.config.html.

Commands:
  auth              Authenticate Software Heritage users with...
  content-replayer  Software Heritage Objstorage tools.
  dataset           Software Heritage Dataset Tools
  db                Software Heritage database generic tools.
  db-init           Initialise a database for the Software...
  deposit           Deposit main command
  graph             Software Heritage graph tools.
  icinga_plugins    Main command for Icinga plugins
  identify          Compute the Software Heritage persistent...
  indexer           Software Heritage Indexer tools.
  lister            Software Heritage Lister tools.
  loader            Loader cli tools
  objstorage        Software Heritage Objstorage tools.
  scanner           Software Heritage Scanner tools.
  scheduler         Software Heritage Scheduler tools.
  search            Software Heritage Search tools.
  storage           Software Heritage Storage tools.
  vault             Software Heritage Vault tools.

real	0m1.694s
user	0m1.766s
sys	0m0.803s

And

PYTHONPROFILEIMPORTTIME=1 swh  2>&1 | grep 'import time:' | wc -l
2622

I'm not convinced we need to import 2,600+ modules just to load most swh cli commands.
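
To see which imports dominate rather than just counting them, the same output can be ranked by cumulative time. The following is only a sketch: the function names are made up for illustration, and it assumes the usual "import time: self [us] | cumulative | imported package" line layout that Python prints to stderr.

```python
def parse_importtime(text):
    """Return (module, self_us, cumulative_us) tuples from import-time output."""
    rows = []
    for line in text.splitlines():
        if not line.startswith("import time:"):
            continue  # ordinary program output, not a profiling line
        fields = line[len("import time:"):].split("|")
        if len(fields) != 3:
            continue
        self_us, cumulative_us, name = (field.strip() for field in fields)
        if not self_us.isdigit():
            continue  # header line ("self [us] | cumulative | imported package")
        rows.append((name, int(self_us), int(cumulative_us)))
    return rows


def slowest(rows, n=10):
    """Return the n entries with the largest cumulative import time."""
    return sorted(rows, key=lambda row: row[2], reverse=True)[:n]


# Tiny hand-written sample in the expected layout (not real measurements).
SAMPLE = """\
import time: self [us] | cumulative | imported package
import time:       120 |        120 |   zipimport
import time:       800 |       2500 | encodings
"""

for name, self_us, cumulative_us in slowest(parse_importtime(SAMPLE)):
    print(f"{cumulative_us:>8} us  {name}")
```

In practice the text would come from the stderr of the command above, saved to a file and read back in. Note that the header line also matches the grep pattern, so the line count is one higher than the number of imported modules.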

Revisions and Commits

rDSCH Scheduling utilities
D3907
rDGRPH Compressed graph representation
D3910
rDSEA Archive search
D3920
rDWCLI Web client
D3923
rDDEP Push deposit
D3909
rDLDBASE Generic VCS/Package Loader
D3915
rDCIDX Metadata indexer
D3913
rDTSCN Code scanner
D3919
rDSTO Storage manager
D3921
rDDATASET Datasets
D3908
rDLS Listers
D3914
rDVAU Software Heritage Vault
D3922
rDMOD Data model
D3916
rDOBJS Object storage
D3917
rDCORE Foundations and core functionalities
D3901
rDICP Icinga plugins
D3912

Event Timeline

douardda triaged this task as Normal priority. Sep 8 2020, 12:44 PM
douardda created this task.

FTR, many of the cli.py modules (or cli/ packages) import way too much stuff by default.
For the swh command to behave half decently, we need to be as lazy as possible about imports, i.e. only import things when they are needed, inside the @cli.command functions.
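
A minimal sketch of that pattern, assuming click is the CLI framework. The `cli` group and the `digest` command are hypothetical, and `hashlib` only stands in for a genuinely heavy dependency; none of this is taken from the actual swh code.

```python
import click


@click.group()
def cli():
    """Hypothetical top-level group, standing in for `swh`."""


@cli.command()
@click.argument("text")
def digest(text):
    """Print the SHA-1 hex digest of TEXT."""
    # Deferred import: executed only when this subcommand is invoked, so
    # `cli --help` and the other subcommands never pay for it. hashlib is
    # a stand-in here for a genuinely heavy dependency.
    import hashlib

    click.echo(hashlib.sha1(text.encode()).hexdigest())
```

The module defining the command then only needs click at import time; everything else is loaded when the command body runs.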

While testing this, I've been able to bring the startup time shown above below 1s (which is still way too much). But a single "faulty" package is enough to ruin the whole lazy-loading effort, so we must chase down every import statement in every cli.py file.
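
One possible way to do that chasing mechanically is to list the module-level imports of each cli.py with the standard `ast` module. This is a sketch, not an existing swh tool: `top_level_imports` and the sample source (including `heavy_dependency`) are made up for illustration.

```python
import ast


def top_level_imports(source):
    """Return the names of modules imported at module level in `source`."""
    names = []
    # Only direct children of the module are inspected, so imports placed
    # inside functions (the lazy ones) are deliberately ignored. Imports
    # nested in top-level `if` or `try` blocks are not reported either.
    for node in ast.parse(source).body:
        if isinstance(node, ast.Import):
            names.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # Keep leading dots so relative imports stay recognisable.
            names.append("." * node.level + (node.module or ""))
    return names


# Hypothetical cli.py content: one eager import and one lazy import.
SAMPLE_SOURCE = '''
import click
from heavy_dependency import helper

def command():
    import another_heavy_dependency
'''

print(top_level_imports(SAMPLE_SOURCE))
```

Running it over every cli.py file and comparing the result against a short allow-list (click and little else) would flag the packages that still import eagerly.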

douardda claimed this task.

Let's consider this done for now. The loading time is still not "great", but it is much better than before.