commit 313cc97581a06b3cfe8fc561d4e0b38ccccf152c (HEAD -> faster_validation, D8512) Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Thu Sep 22 14:55:26 2022 +0200 model: inline the call to `_check_swhid` This reduce the number of function call and should be faster. The mashup of blind optimisation in the previous changeset yield some interesting results in total. It would be insightful to measure them individually, but that would take more time than we currently have. When testing all the validator changes on our previous "benchmark" we see quite interesting improvement. swh loader run mercurial https://foss.heptapod.net/mercurial/mercurial-devel directory=/data/repos/mercurial-devel = Median time of 3 run = base: 17 minutes 48 seconds before: 11 minutes 50 seconds after: 11 minutes 11 seconds On a profile of the same run, the `to_model` call of the from_disk's `Directory` class took the following percentage: base: 43% before: 15% after: 11% commit 54a19a2aa522787ec368f3707336824db7f13b76 Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Thu Sep 22 14:48:03 2022 +0200 model: optimization pass on custom validator (This commit is actually doing two things /o\) - we inline the type-checking in the custom validator to reduce the number of function call. - we optimize some of the custom validator by skipping the creation of intermediate tuples. commit 14ba3c3e19f8c0d8505b4428aa158a594cca6f9d Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Thu Sep 22 11:50:39 2022 +0200 model: delete unused validator code Since all `generic_type_validator` are optimized away, the code will no longer be called. So we remove that code to avoid any drifting. A nice "exception" is provided in case this start getting called again in the future. commit b851355ccf71b4d66bfcf11025f40fe8de5275f5 Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Thu Sep 22 11:40:40 2022 +0200 model: remove the try/except Since try/except context are known to be expensive in Python, it seems useful to remove them. commit 6add176754eefb6c7b37c7a0b21695ae1ae8f7b9 Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Tue Sep 20 18:53:01 2022 +0200 model: also optimize combined validator This ensure we don't have any remaining `generic_type_validator` call that have not been optimized away. commit cc61b9d3860de28afc2eb23dc35da372fdaf1964 Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Tue Sep 20 17:09:04 2022 +0200 model: drop the `type_validator()` indirection This indirection seems useless and is probably the remains of some long forgotten rituals. commit bf6fe4be7697ad032cca6901cf51e5c1cc481781 Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Tue Sep 20 16:51:32 2022 +0200 model: implement specialized attribute-validator functions This should reduces function calls and speeds things up. It might be useful to introduce even more specialized validator in the future. It would also be useful to skip the intermediate try/except. Some of this will be done in later changesets. commit 1dfea32477023bec219334eb4cf4921203e76f78 Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Tue Sep 20 15:48:55 2022 +0200 model: prepare the filtering of type_validator into something faster This is currently doing nothing, but prepare for actually changing the generic validator into faster specialized variants.
Details
- Reviewers
olasd vlorentz - Group Reviewers
Reviewers - Maniphest Tasks
- T4595: Improve common tooling for loading
- Commits
- rDMOD3608271a0aaa: model: optimization pass on custom validator
rDMOD2d65a24a5f41: model: inline the call to `_check_swhid`
rDMODb7267a890888: model: remove the try/except
rDMOD3796e5ba30ff: model: delete unused validator code
rDMODcf529cd1622e: model: also optimize combined validator
rDMODedb57fb15359: model: implement specialized attribute-validator functions
rDMOD6ababdebb749: model: drop the `type_validator()` indirection
rDMOD1dfea3247702: model: prepare the filtering of type_validator into something faster
ran tox on each commit, timing and profiling on the full stack.
Diff Detail
- Repository
- rDMOD Data model
- Branch
- faster_validation (branched from master)
- Lint
No Linters Available - Unit
No Unit Test Coverage - Build Status
Buildable 31633 Build 49477: Phabricator diff pipeline on jenkins Jenkins console · Jenkins Build 49476: arc lint + arc unit
Event Timeline
Build has FAILED
Patch application report for D8512 (id=30654)
Could not rebase; Attempt merge onto 9ce6feb9d6...
Updating 9ce6feb..a780d6f Fast-forward CONTRIBUTORS | 1 + swh/model/from_disk.py | 10 +- swh/model/model.py | 401 ++++++++++++++++++++++++++++++------------ swh/model/tests/test_model.py | 38 +++- 4 files changed, 326 insertions(+), 124 deletions(-)
Changes applied before test
commit a780d6f17777b87417c77921adae75531941771d Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Tue Sep 20 17:09:04 2022 +0200 model: drop the `type_validator()` indirection This indirection seems useless and is probably the remains of some long forgotten rituals. commit c632c5539c45acb6598c188dbb06ade13583ec54 Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Tue Sep 20 16:51:32 2022 +0200 model: implement specialized attribute-validator functions This should reduces function calls and speeds things up. It might be useful to introduce even more specialized validator in the future. It would also be useful to skip the intermediate try/except. commit 8801fb08e46d3089b15a0ecfd5e5fc394cb1b5c6 Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Tue Sep 20 15:48:55 2022 +0200 model: prepare the filtering of type_validator into something faster This is currently doing nothing, but prepare for actually changing the generic validator into faster specialized variants. commit 36185705f1805465d565b00b0df917b0e4f1c8e4 Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Tue Sep 20 14:26:17 2022 +0200 from_disk: only build a model object once Before this change, a Directory object was built to compute the `id` of we fed to the Directory object we built for `to_model`.
Link to build: https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/506/
See console output for more information: https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/506/console
Build is green
Patch application report for D8512 (id=30656)
Could not rebase; Attempt merge onto 9ce6feb9d6...
Updating 9ce6feb..f2bbf1f Fast-forward CONTRIBUTORS | 1 + swh/model/from_disk.py | 10 +- swh/model/model.py | 414 ++++++++++++++++++++++++++++++------------ swh/model/tests/test_model.py | 38 +++- 4 files changed, 339 insertions(+), 124 deletions(-)
Changes applied before test
commit f2bbf1fc7adeb7b91296df4cdfc2f8d0c15b79b2 Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Tue Sep 20 18:53:01 2022 +0200 model: also optimize combined validator This should provide use with enver more speedup. commit a780d6f17777b87417c77921adae75531941771d Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Tue Sep 20 17:09:04 2022 +0200 model: drop the `type_validator()` indirection This indirection seems useless and is probably the remains of some long forgotten rituals. commit c632c5539c45acb6598c188dbb06ade13583ec54 Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Tue Sep 20 16:51:32 2022 +0200 model: implement specialized attribute-validator functions This should reduces function calls and speeds things up. It might be useful to introduce even more specialized validator in the future. It would also be useful to skip the intermediate try/except. commit 8801fb08e46d3089b15a0ecfd5e5fc394cb1b5c6 Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Tue Sep 20 15:48:55 2022 +0200 model: prepare the filtering of type_validator into something faster This is currently doing nothing, but prepare for actually changing the generic validator into faster specialized variants. commit 36185705f1805465d565b00b0df917b0e4f1c8e4 Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Tue Sep 20 14:26:17 2022 +0200 from_disk: only build a model object once Before this change, a Directory object was built to compute the `id` of we fed to the Directory object we built for `to_model`.
See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/507/ for more details.
The individual commit should be fairly clean. I'll provides some timing or profile soon™
Build is green
Patch application report for D8512 (id=30686)
Could not rebase; Attempt merge onto 9ce6feb9d6...
Updating 9ce6feb..1ce836a Fast-forward CONTRIBUTORS | 1 + swh/model/from_disk.py | 10 +- swh/model/model.py | 502 +++++++++++++++++++++++++++++------------- swh/model/tests/test_model.py | 12 +- 4 files changed, 363 insertions(+), 162 deletions(-)
Changes applied before test
commit 1ce836a90b3dd15c72f473e07ef6d49b4c635963 Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Thu Sep 22 11:50:39 2022 +0200 model: delete unused code dead code This should avoid dead code drifting. commit 3f243aa5d08940f05922cca59131f47b3b9f01a7 Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Thu Sep 22 11:40:40 2022 +0200 model: remove the try/except This should speeds things up a bit. commit f2bbf1fc7adeb7b91296df4cdfc2f8d0c15b79b2 Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Tue Sep 20 18:53:01 2022 +0200 model: also optimize combined validator This should provide use with enver more speedup. commit a780d6f17777b87417c77921adae75531941771d Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Tue Sep 20 17:09:04 2022 +0200 model: drop the `type_validator()` indirection This indirection seems useless and is probably the remains of some long forgotten rituals. commit c632c5539c45acb6598c188dbb06ade13583ec54 Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Tue Sep 20 16:51:32 2022 +0200 model: implement specialized attribute-validator functions This should reduces function calls and speeds things up. It might be useful to introduce even more specialized validator in the future. It would also be useful to skip the intermediate try/except. commit 8801fb08e46d3089b15a0ecfd5e5fc394cb1b5c6 Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Tue Sep 20 15:48:55 2022 +0200 model: prepare the filtering of type_validator into something faster This is currently doing nothing, but prepare for actually changing the generic validator into faster specialized variants. commit 36185705f1805465d565b00b0df917b0e4f1c8e4 Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Tue Sep 20 14:26:17 2022 +0200 from_disk: only build a model object once Before this change, a Directory object was built to compute the `id` of we fed to the Directory object we built for `to_model`.
See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/508/ for more details.
Build is green
Patch application report for D8512 (id=30688)
Could not rebase; Attempt merge onto 9ce6feb9d6...
Updating 9ce6feb..1698904 Fast-forward CONTRIBUTORS | 1 + swh/model/from_disk.py | 10 +- swh/model/model.py | 626 +++++++++++++++++++++++++++++------------- swh/model/tests/test_model.py | 12 +- 4 files changed, 444 insertions(+), 205 deletions(-)
Changes applied before test
commit 1698904aac7b6104f9c9ca2bf8f4b7bc3cf3e4a1 Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Thu Sep 22 14:48:03 2022 +0200 model: optimization pass on custom validator We inline the type-checking and avoid creating intermediate tuple. commit 1ce836a90b3dd15c72f473e07ef6d49b4c635963 Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Thu Sep 22 11:50:39 2022 +0200 model: delete unused code dead code This should avoid dead code drifting. commit 3f243aa5d08940f05922cca59131f47b3b9f01a7 Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Thu Sep 22 11:40:40 2022 +0200 model: remove the try/except This should speeds things up a bit. commit f2bbf1fc7adeb7b91296df4cdfc2f8d0c15b79b2 Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Tue Sep 20 18:53:01 2022 +0200 model: also optimize combined validator This should provide use with enver more speedup. commit a780d6f17777b87417c77921adae75531941771d Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Tue Sep 20 17:09:04 2022 +0200 model: drop the `type_validator()` indirection This indirection seems useless and is probably the remains of some long forgotten rituals. commit c632c5539c45acb6598c188dbb06ade13583ec54 Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Tue Sep 20 16:51:32 2022 +0200 model: implement specialized attribute-validator functions This should reduces function calls and speeds things up. It might be useful to introduce even more specialized validator in the future. It would also be useful to skip the intermediate try/except. commit 8801fb08e46d3089b15a0ecfd5e5fc394cb1b5c6 Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Tue Sep 20 15:48:55 2022 +0200 model: prepare the filtering of type_validator into something faster This is currently doing nothing, but prepare for actually changing the generic validator into faster specialized variants. commit 36185705f1805465d565b00b0df917b0e4f1c8e4 Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Tue Sep 20 14:26:17 2022 +0200 from_disk: only build a model object once Before this change, a Directory object was built to compute the `id` of we fed to the Directory object we built for `to_model`.
See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/509/ for more details.
Build is green
Patch application report for D8512 (id=30690)
Could not rebase; Attempt merge onto 9ce6feb9d6...
Updating 9ce6feb..64ae080 Fast-forward CONTRIBUTORS | 1 + swh/model/from_disk.py | 10 +- swh/model/model.py | 652 ++++++++++++++++++++++++++++-------------- swh/model/tests/test_model.py | 12 +- 4 files changed, 460 insertions(+), 215 deletions(-)
Changes applied before test
commit 64ae080f93ba770697e46a464d584cb5ca25e363 Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Thu Sep 22 14:55:26 2022 +0200 model: inline _check_swhid I am a compiler. commit 1698904aac7b6104f9c9ca2bf8f4b7bc3cf3e4a1 Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Thu Sep 22 14:48:03 2022 +0200 model: optimization pass on custom validator We inline the type-checking and avoid creating intermediate tuple. commit 1ce836a90b3dd15c72f473e07ef6d49b4c635963 Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Thu Sep 22 11:50:39 2022 +0200 model: delete unused code dead code This should avoid dead code drifting. commit 3f243aa5d08940f05922cca59131f47b3b9f01a7 Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Thu Sep 22 11:40:40 2022 +0200 model: remove the try/except This should speeds things up a bit. commit f2bbf1fc7adeb7b91296df4cdfc2f8d0c15b79b2 Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Tue Sep 20 18:53:01 2022 +0200 model: also optimize combined validator This should provide use with enver more speedup. commit a780d6f17777b87417c77921adae75531941771d Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Tue Sep 20 17:09:04 2022 +0200 model: drop the `type_validator()` indirection This indirection seems useless and is probably the remains of some long forgotten rituals. commit c632c5539c45acb6598c188dbb06ade13583ec54 Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Tue Sep 20 16:51:32 2022 +0200 model: implement specialized attribute-validator functions This should reduces function calls and speeds things up. It might be useful to introduce even more specialized validator in the future. It would also be useful to skip the intermediate try/except. commit 8801fb08e46d3089b15a0ecfd5e5fc394cb1b5c6 Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Tue Sep 20 15:48:55 2022 +0200 model: prepare the filtering of type_validator into something faster This is currently doing nothing, but prepare for actually changing the generic validator into faster specialized variants. commit 36185705f1805465d565b00b0df917b0e4f1c8e4 Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Tue Sep 20 14:26:17 2022 +0200 from_disk: only build a model object once Before this change, a Directory object was built to compute the `id` of we fed to the Directory object we built for `to_model`.
See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/510/ for more details.
- model: prepare the filtering of type_validator into something faster
- model: implement specialized attribute-validator functions
- model: drop the type_validator() indirection
- model: also optimize combined validator
- model: remove the try/except
- model: delete unused code dead code
- model: optimization pass on custom validator
- model: inline _check_swhid
Build is green
Patch application report for D8512 (id=30715)
Could not rebase; Attempt merge onto 9ce6feb9d6...
Updating 9ce6feb..10b150c Fast-forward CONTRIBUTORS | 1 + swh/model/from_disk.py | 33 ++- swh/model/git_objects.py | 14 +- swh/model/model.py | 652 ++++++++++++++++++++++++++++-------------- swh/model/tests/test_model.py | 12 +- 5 files changed, 492 insertions(+), 220 deletions(-)
Changes applied before test
commit 10b150cbd6e35d186ad5a2e5fd5ed38d1d10b474 Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Thu Sep 22 14:55:26 2022 +0200 model: inline _check_swhid I am a compiler. commit e032ee271efc84e7624945e9cc2339c55a1d939b Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Thu Sep 22 14:48:03 2022 +0200 model: optimization pass on custom validator We inline the type-checking and avoid creating intermediate tuple. commit fc9622f51ba18b984220ddc167b5e9e760b79122 Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Thu Sep 22 11:50:39 2022 +0200 model: delete unused code dead code This should avoid dead code drifting. commit cc8a6b770d682f26a488a755730c803c0a538e73 Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Thu Sep 22 11:40:40 2022 +0200 model: remove the try/except This should speeds things up a bit. commit a2900955815ad9bb81cdb1141ba2650bbfbcd694 Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Tue Sep 20 18:53:01 2022 +0200 model: also optimize combined validator This should provide use with enver more speedup. commit 3b47a5ba435859252b5f96f1ee15e971518995b8 Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Tue Sep 20 17:09:04 2022 +0200 model: drop the `type_validator()` indirection This indirection seems useless and is probably the remains of some long forgotten rituals. commit 80e336c19093360aff92c082f2b1049005b483f3 Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Tue Sep 20 16:51:32 2022 +0200 model: implement specialized attribute-validator functions This should reduces function calls and speeds things up. It might be useful to introduce even more specialized validator in the future. It would also be useful to skip the intermediate try/except. commit 923b61ed334613b268229b4789810dd332100ccf Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Tue Sep 20 15:48:55 2022 +0200 model: prepare the filtering of type_validator into something faster This is currently doing nothing, but prepare for actually changing the generic validator into faster specialized variants. commit ad7774bec2b48af2ee74d59e34dc6de02751896a Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Thu Sep 22 18:24:46 2022 +0200 from_disk: skip intermediate dictionnary creation when building model This should speed things up. commit 41500db7ea8856f75812f4fe8d8a66c6fdc43a82 Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Thu Sep 22 18:12:43 2022 +0200 model: avoid another extra creation of Model object Do not create model object while sorting entry before creating model object. commit 36185705f1805465d565b00b0df917b0e4f1c8e4 Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Tue Sep 20 14:26:17 2022 +0200 from_disk: only build a model object once Before this change, a Directory object was built to compute the `id` of we fed to the Directory object we built for `to_model`.
See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/517/ for more details.
Build is green
Patch application report for D8512 (id=30723)
Could not rebase; Attempt merge onto 9ce6feb9d6...
Updating 9ce6feb..10b150c Fast-forward CONTRIBUTORS | 1 + swh/model/from_disk.py | 33 ++- swh/model/git_objects.py | 14 +- swh/model/model.py | 652 ++++++++++++++++++++++++++++-------------- swh/model/tests/test_model.py | 12 +- 5 files changed, 492 insertions(+), 220 deletions(-)
Changes applied before test
commit 10b150cbd6e35d186ad5a2e5fd5ed38d1d10b474 Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Thu Sep 22 14:55:26 2022 +0200 model: inline _check_swhid I am a compiler. commit e032ee271efc84e7624945e9cc2339c55a1d939b Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Thu Sep 22 14:48:03 2022 +0200 model: optimization pass on custom validator We inline the type-checking and avoid creating intermediate tuple. commit fc9622f51ba18b984220ddc167b5e9e760b79122 Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Thu Sep 22 11:50:39 2022 +0200 model: delete unused code dead code This should avoid dead code drifting. commit cc8a6b770d682f26a488a755730c803c0a538e73 Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Thu Sep 22 11:40:40 2022 +0200 model: remove the try/except This should speeds things up a bit. commit a2900955815ad9bb81cdb1141ba2650bbfbcd694 Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Tue Sep 20 18:53:01 2022 +0200 model: also optimize combined validator This should provide use with enver more speedup. commit 3b47a5ba435859252b5f96f1ee15e971518995b8 Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Tue Sep 20 17:09:04 2022 +0200 model: drop the `type_validator()` indirection This indirection seems useless and is probably the remains of some long forgotten rituals. commit 80e336c19093360aff92c082f2b1049005b483f3 Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Tue Sep 20 16:51:32 2022 +0200 model: implement specialized attribute-validator functions This should reduces function calls and speeds things up. It might be useful to introduce even more specialized validator in the future. It would also be useful to skip the intermediate try/except. commit 923b61ed334613b268229b4789810dd332100ccf Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Tue Sep 20 15:48:55 2022 +0200 model: prepare the filtering of type_validator into something faster This is currently doing nothing, but prepare for actually changing the generic validator into faster specialized variants. commit ad7774bec2b48af2ee74d59e34dc6de02751896a Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Thu Sep 22 18:24:46 2022 +0200 from_disk: skip intermediate dictionnary creation when building model This should speed things up. commit 41500db7ea8856f75812f4fe8d8a66c6fdc43a82 Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Thu Sep 22 18:12:43 2022 +0200 model: avoid another extra creation of Model object Do not create model object while sorting entry before creating model object. commit 36185705f1805465d565b00b0df917b0e4f1c8e4 Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Tue Sep 20 14:26:17 2022 +0200 from_disk: only build a model object once Before this change, a Directory object was built to compute the `id` of we fed to the Directory object we built for `to_model`.
See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/521/ for more details.
I am in the process to adding more timing/profile data to the commit message (then to fight to get phabricator to display them)
Build is green
Patch application report for D8512 (id=30728)
Could not rebase; Attempt merge onto 9ce6feb9d6...
Updating 9ce6feb..313cc97 Fast-forward CONTRIBUTORS | 1 + swh/model/from_disk.py | 33 ++- swh/model/git_objects.py | 14 +- swh/model/model.py | 652 ++++++++++++++++++++++++++++-------------- swh/model/tests/test_model.py | 12 +- 5 files changed, 492 insertions(+), 220 deletions(-)
Changes applied before test
commit 313cc97581a06b3cfe8fc561d4e0b38ccccf152c Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Thu Sep 22 14:55:26 2022 +0200 model: inline the call to `_check_swhid` This reduce the number of function call and should be faster. The mashup of blind optimisation in the previous changeset yield some interesting results in total. It would be insightful to measure them individually, but that would take more time than we currently have. When testing all the validator changes on our previous "benchmark" we see quite interesting improvement. swh loader run mercurial https://foss.heptapod.net/mercurial/mercurial-devel directory=/data/repos/mercurial-devel = Median time of 3 run = base: 17 minutes 48 seconds before: 11 minutes 50 seconds after: 11 minutes 11 seconds On a profile of the same run, the `to_model` call of the from_disk's `Directory` class took the following percentage: base: 43% before: 15% after: 11% commit 54a19a2aa522787ec368f3707336824db7f13b76 Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Thu Sep 22 14:48:03 2022 +0200 model: optimization pass on custom validator (This commit is actually doing two things /o\) - we inline the type-checking in the custom validator to reduce the number of function call. - we optimize some of the custom validator by skipping the creation of intermediate tuples. commit 14ba3c3e19f8c0d8505b4428aa158a594cca6f9d Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Thu Sep 22 11:50:39 2022 +0200 model: delete unused validator code Since all `generic_type_validator` are optimized away, the code will no longer be called. So we remove that code to avoid any drifting. A nice "exception" is provided in case this start getting called again in the future. commit b851355ccf71b4d66bfcf11025f40fe8de5275f5 Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Thu Sep 22 11:40:40 2022 +0200 model: remove the try/except Since try/except context are known to be expensive in Python, it seems useful to remove them. commit 6add176754eefb6c7b37c7a0b21695ae1ae8f7b9 Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Tue Sep 20 18:53:01 2022 +0200 model: also optimize combined validator This ensure we don't have any remaining `generic_type_validator` call that have not been optimized away. commit cc61b9d3860de28afc2eb23dc35da372fdaf1964 Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Tue Sep 20 17:09:04 2022 +0200 model: drop the `type_validator()` indirection This indirection seems useless and is probably the remains of some long forgotten rituals. commit bf6fe4be7697ad032cca6901cf51e5c1cc481781 Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Tue Sep 20 16:51:32 2022 +0200 model: implement specialized attribute-validator functions This should reduces function calls and speeds things up. It might be useful to introduce even more specialized validator in the future. It would also be useful to skip the intermediate try/except. Some of this will be done in later changesets. commit 1dfea32477023bec219334eb4cf4921203e76f78 Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Tue Sep 20 15:48:55 2022 +0200 model: prepare the filtering of type_validator into something faster This is currently doing nothing, but prepare for actually changing the generic validator into faster specialized variants. commit a2e8f18c2a685c7d4aad53345c77a7177190854e Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Thu Sep 22 18:24:46 2022 +0200 from_disk: skip intermediate dictionnary creation when building model Before this change we would do the following : 1) translate from_disk's object into `dict`, 2) sort these dict, 3) feed the list to `Directory.from_dict`, 4) create DirectoryEntry from these dict. Skipping the directory creating and directly creating the DirectoryEntries provide us with a small but stable and noticeable performance win. We tested this change on simple information of the Mercurial loader, with a noop-loader stockage: swh loader run mercurial https://foss.heptapod.net/mercurial/mercurial-devel directory=/data/repos/mercurial-devel = Median time of 3 run = before: 11 minute 56 seconds aftere: 11 minute 50 seconds On a profile of the same run, the `to_model` call of the from_disk's `Directory` class took the following percentage: before: 17% after: 15% commit ad3ecac9beae015b9e9144c1da180097958b2827 Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Thu Sep 22 18:12:43 2022 +0200 model: avoid another extra creation of Model object Do not create model object while sorting entry before creating model object. This is another case of "let us create object X to prepare the creation of object X", slowing things down. In practice, we will likely skip this code-path after the next changeset, however this seems useful to get this performance footgun out the way. We tested this change on simple information of the Mercurial loader, with a noop-loader stockage: swh loader run mercurial https://foss.heptapod.net/mercurial/mercurial-devel directory=/data/repos/mercurial-devel = Median time of 3 run = before 12 minutes 59 seconds after: 11 minute 56 seconds On a profile of the same run, the `to_model` call of the from_disk's `Directory` class took the following percentage: before: 24% after: 17% commit 814a6c8416d56f5f8b3e590d419d5aea7a888ab2 Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Tue Sep 20 14:26:17 2022 +0200 from_disk: only build a model object once Before this change, a Directory object was built to compute the `id` of we fed to the Directory object we built for `to_model`. We tested this change on simple information of the Mercurial loader, with a noop-loader stockage: swh loader run mercurial https://foss.heptapod.net/mercurial/mercurial-devel directory=/data/repos/mercurial-devel = Median time of 3 run = before: 17 minutes 48 seconds after: 12 minutes 59 seconds On a profile of the same run, the `to_model` call of the from_disk's `Directory` class took the following percentage: before: 43% after: 24%
See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/525/ for more details.
swh/model/model.py | ||
---|---|---|
155–174 | could you add tests for this? it seems uncovered |
swh/model/model.py | ||
---|---|---|
155–174 | Good point, fixing it in the next update. |
Build is green
Patch application report for D8512 (id=30735)
Could not rebase; Attempt merge onto 9ce6feb9d6...
Updating 9ce6feb..2d65a24 Fast-forward CONTRIBUTORS | 1 + swh/model/from_disk.py | 33 ++- swh/model/git_objects.py | 14 +- swh/model/model.py | 652 ++++++++++++++++++++++++++++-------------- swh/model/tests/test_model.py | 22 +- 5 files changed, 502 insertions(+), 220 deletions(-)
Changes applied before test
commit 2d65a24a5f413f3c647688e5d2ad7aa3268c1919 Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Thu Sep 22 14:55:26 2022 +0200 model: inline the call to `_check_swhid` This reduce the number of function call and should be faster. The mashup of blind optimisation in the previous changeset yield some interesting results in total. It would be insightful to measure them individually, but that would take more time than we currently have. When testing all the validator changes on our previous "benchmark" we see quite interesting improvement. swh loader run mercurial https://foss.heptapod.net/mercurial/mercurial-devel directory=/data/repos/mercurial-devel = Median time of 3 run = base: 17 minutes 48 seconds before: 11 minutes 50 seconds after: 11 minutes 11 seconds On a profile of the same run, the `to_model` call of the from_disk's `Directory` class took the following percentage: base: 43% before: 15% after: 11% commit 3608271a0aaaa3e419389ec1834de74a14bbf59e Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Thu Sep 22 14:48:03 2022 +0200 model: optimization pass on custom validator (This commit is actually doing two things /o\) - we inline the type-checking in the custom validator to reduce the number of function call. - we optimize some of the custom validator by skipping the creation of intermediate tuples. commit 3796e5ba30ff809ce9c75b548e2a6dbe97936fd2 Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Thu Sep 22 11:50:39 2022 +0200 model: delete unused validator code Since all `generic_type_validator` are optimized away, the code will no longer be called. So we remove that code to avoid any drifting. A nice "exception" is provided in case this start getting called again in the future. commit b7267a890888adc07b35282c944450e642133c80 Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Thu Sep 22 11:40:40 2022 +0200 model: remove the try/except Since try/except context are known to be expensive in Python, it seems useful to remove them. commit cf529cd1622ee942950ca487d0dc3e71b132e39f Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Tue Sep 20 18:53:01 2022 +0200 model: also optimize combined validator This ensure we don't have any remaining `generic_type_validator` call that have not been optimized away. commit 6ababdebb749e63b3c11efb59f4c022cab43cd46 Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Tue Sep 20 17:09:04 2022 +0200 model: drop the `type_validator()` indirection This indirection seems useless and is probably the remains of some long forgotten rituals. commit edb57fb15359b20050ae972b59003d7a3c8db772 Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Tue Sep 20 16:51:32 2022 +0200 model: implement specialized attribute-validator functions This should reduces function calls and speeds things up. It might be useful to introduce even more specialized validator in the future. It would also be useful to skip the intermediate try/except. Some of this will be done in later changesets. commit 1dfea32477023bec219334eb4cf4921203e76f78 Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Tue Sep 20 15:48:55 2022 +0200 model: prepare the filtering of type_validator into something faster This is currently doing nothing, but prepare for actually changing the generic validator into faster specialized variants. commit a2e8f18c2a685c7d4aad53345c77a7177190854e Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Thu Sep 22 18:24:46 2022 +0200 from_disk: skip intermediate dictionnary creation when building model Before this change we would do the following : 1) translate from_disk's object into `dict`, 2) sort these dict, 3) feed the list to `Directory.from_dict`, 4) create DirectoryEntry from these dict. Skipping the directory creating and directly creating the DirectoryEntries provide us with a small but stable and noticeable performance win. We tested this change on simple information of the Mercurial loader, with a noop-loader stockage: swh loader run mercurial https://foss.heptapod.net/mercurial/mercurial-devel directory=/data/repos/mercurial-devel = Median time of 3 run = before: 11 minute 56 seconds aftere: 11 minute 50 seconds On a profile of the same run, the `to_model` call of the from_disk's `Directory` class took the following percentage: before: 17% after: 15% commit ad3ecac9beae015b9e9144c1da180097958b2827 Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Thu Sep 22 18:12:43 2022 +0200 model: avoid another extra creation of Model object Do not create model object while sorting entry before creating model object. This is another case of "let us create object X to prepare the creation of object X", slowing things down. In practice, we will likely skip this code-path after the next changeset, however this seems useful to get this performance footgun out the way. We tested this change on simple information of the Mercurial loader, with a noop-loader stockage: swh loader run mercurial https://foss.heptapod.net/mercurial/mercurial-devel directory=/data/repos/mercurial-devel = Median time of 3 run = before 12 minutes 59 seconds after: 11 minute 56 seconds On a profile of the same run, the `to_model` call of the from_disk's `Directory` class took the following percentage: before: 24% after: 17% commit 814a6c8416d56f5f8b3e590d419d5aea7a888ab2 Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org> Date: Tue Sep 20 14:26:17 2022 +0200 from_disk: only build a model object once Before this change, a Directory object was built to compute the `id` of we fed to the Directory object we built for `to_model`. We tested this change on simple information of the Mercurial loader, with a noop-loader stockage: swh loader run mercurial https://foss.heptapod.net/mercurial/mercurial-devel directory=/data/repos/mercurial-devel = Median time of 3 run = before: 17 minutes 48 seconds after: 12 minutes 59 seconds On a profile of the same run, the `to_model` call of the from_disk's `Directory` class took the following percentage: before: 43% after: 24%
See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/529/ for more details.