This is a 40 to 70% speed-up of the overall run time (wall clock).
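Concretely, the diff replaces per-call ExtendedSWHID construction (and its attrs validation) with direct formatting of the "swh:1:<type>:<hex>" string. Below is a minimal sketch of that pattern, not the actual swh/graph/swhid.py code; the function signatures and the ExtendedSWHID import path are assumptions.

```python
from swh.model.hashutil import hash_to_hex
from swh.model.identifiers import ExtendedObjectType, ExtendedSWHID  # import path may differ across swh.model versions


def bytes_to_str_via_swhid(object_type: ExtendedObjectType, object_id: bytes) -> str:
    # Before: build an ExtendedSWHID object (validated on every call), then stringify it.
    return str(ExtendedSWHID(object_type=object_type, object_id=object_id))


def bytes_to_str_direct(object_type: ExtendedObjectType, object_id: bytes) -> str:
    # After: format the SWHID string directly, skipping object construction.
    return f"swh:1:{object_type.value}:{hash_to_hex(object_id)}"
```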
Details
- Reviewers: seirl, douardda
- Group Reviewers: Reviewers
- Commits: rDGRPH424e75a9d0f8: bytes_to_str: Format strings directly, instead of constructing ExtendedSWHID
Diff Detail
- Repository: rDGRPH Compressed graph representation
- Branch: optim
- Lint: No Linters Available
- Unit: No Unit Test Coverage
- Build Status: Buildable 22951
  - Build 35785: Phabricator diff pipeline on jenkins (Jenkins console)
  - Build 35784: arc lint + arc unit
Event Timeline
Build is green
Patch application report for D6073 (id=21987)
Could not rebase; Attempt merge onto a48b5be584...
Updating a48b5be..424e75a
Fast-forward
 swh/graph/server/app.py | 14 ++++++++++++--
 swh/graph/swhid.py      | 13 +++++++++----
 2 files changed, 21 insertions(+), 6 deletions(-)
Changes applied before test
commit 424e75a9d0f888c43c75fa7e9fef8b7d46716514
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Aug 10 12:23:00 2021 +0200

    bytes_to_str: Format strings directly, instead of constructing ExtendedSWHID

    This is a 40 to 70% speed-up of the overall run time (wall clock).

commit b54ed982e2039e0bca87cbe17dd63aa667db6d40
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Aug 10 12:03:36 2021 +0200

    StreamingGraphView: Buffer lines before writing

    Most of the time is spent maxing out the CPU in the Python process. This change has two effects:

    1. lines are joined before being encoded (instead of encoding them one-by-one)
    2. larger network packets are sent, instead of a single packet per line

    I don't know which of the two affects the performance more, but overall this is a consistent 25 to 35% speed-up to the overall run time of SimpleTraversalView.
See https://jenkins.softwareheritage.org/job/DGRPH/job/tests-on-diff/132/ for more details.
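The second commit in the stack (StreamingGraphView: Buffer lines before writing) boils down to batching the encode() and write() calls instead of doing them once per line. A rough sketch of that pattern, assuming an aiohttp StreamResponse-style write() coroutine; the helper name and buffer size are illustrative, not taken from the diff.

```python
from typing import Iterable

from aiohttp import web

LINES_PER_WRITE = 1024  # illustrative batch size, not the value used in the diff


async def write_lines(response: web.StreamResponse, lines: Iterable[str]) -> None:
    """Send one line of output per result, batching the encode and network writes."""
    buffer = []
    for line in lines:
        buffer.append(line)
        if len(buffer) >= LINES_PER_WRITE:
            # Join before encoding: one str.encode() and one write() per batch
            # instead of one per line, which also yields larger network packets.
            await response.write(("\n".join(buffer) + "\n").encode())
            buffer = []
    if buffer:
        await response.write(("\n".join(buffer) + "\n").encode())
```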
swh/graph/swhid.py:97

if you want speed, why not also cut the hash_to_hex call and simply use .hex()? A quick and dirty test showed a roughly 2x factor between the two on my laptop (just a timeit in ipython, building a list of 1k swhids):

    In [21]: %timeit z = [str(ExtendedSWHID(object_type=ExtendedObjectType.REVISION, object_id=v)) for v in h]
    6.14 ms ± 24.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

    In [22]: %timeit z = [f"swh:1:{ExtendedObjectType.REVISION.value}:{hash_to_hex(v)}" for v in h]
    624 µs ± 1.21 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

    In [23]: %timeit z = [f"swh:1:{ExtendedObjectType.REVISION.value}:{v.hex()}" for v in h]
    359 µs ± 15.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
swh/graph/swhid.py:97

hash_to_hex is cached
swh/graph/swhid.py:97

yeah, well, it's currently cached with the default lru_cache maxsize, which is a very small 128, so I'm not sure it's a lifesaver here. And you can just lru_cache this bytes_to_str function :-) Do we have an idea of the average cache-hit ratio when it is used in swh-graph?
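For reference, memoizing the formatting function as suggested above is a one-decorator change with functools.lru_cache. A hypothetical sketch follows; the wrapper name and maxsize are made up, and, as the next comment notes, a low cache-hit ratio would make it pointless.

```python
from functools import lru_cache


@lru_cache(maxsize=1_000_000)  # illustrative size; the default maxsize of 128 is the one questioned above
def cached_swhid_str(object_type_value: str, object_id: bytes) -> str:
    # bytes is hashable, so the (type value, object id) pair can key the cache directly.
    return f"swh:1:{object_type_value}:{object_id.hex()}"
```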
swh/graph/swhid.py:97

I don't, but it's probably very low.