Page MenuHomeSoftware Heritage

various optimisation to the model validation logic.
ClosedPublic

Authored by marmoute on Sep 20 2022, 5:36 PM.

Details

Summary
commit 313cc97581a06b3cfe8fc561d4e0b38ccccf152c (HEAD -> faster_validation, D8512)
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Thu Sep 22 14:55:26 2022 +0200

    model: inline the call to `_check_swhid`
    
    This reduce the number of function call and should be faster.
    
    The mashup of blind optimisation in the previous changeset yield some
    interesting results in total.
    
    It would be insightful to measure them individually, but that would
    take more time than we currently have.
    
    When testing all the validator changes on our previous "benchmark" we
    see quite interesting improvement.
    
        swh loader run mercurial https://foss.heptapod.net/mercurial/mercurial-devel directory=/data/repos/mercurial-devel
    
    = Median time of 3 run =
    base:   17 minutes 48 seconds
    before: 11 minutes 50 seconds
    after:  11 minutes 11 seconds
    
    On a profile of the same run, the `to_model` call of the from_disk's `Directory` class took the following percentage:
    base:   43%
    before: 15%
    after:  11%

commit 54a19a2aa522787ec368f3707336824db7f13b76
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Thu Sep 22 14:48:03 2022 +0200

    model: optimization pass on custom validator
    
    (This commit is actually doing two things /o\)
    
    - we inline the type-checking in the custom validator to reduce the
      number of function call.
    
    - we optimize some of the custom validator by skipping the creation of
      intermediate tuples.

commit 14ba3c3e19f8c0d8505b4428aa158a594cca6f9d
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Thu Sep 22 11:50:39 2022 +0200

    model: delete unused validator code
    
    Since all `generic_type_validator` are optimized away, the code will no
    longer be called. So we remove that code to avoid any drifting.
    
    A nice "exception" is provided in case this start getting called again
    in the future.

commit b851355ccf71b4d66bfcf11025f40fe8de5275f5
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Thu Sep 22 11:40:40 2022 +0200

    model: remove the try/except
    
    Since try/except context are known to be expensive in Python, it seems
    useful to remove them.

commit 6add176754eefb6c7b37c7a0b21695ae1ae8f7b9
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Tue Sep 20 18:53:01 2022 +0200

    model: also optimize combined validator
    
    This ensure we don't have any remaining `generic_type_validator` call
    that have not been optimized away.

commit cc61b9d3860de28afc2eb23dc35da372fdaf1964
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Tue Sep 20 17:09:04 2022 +0200

    model: drop the `type_validator()` indirection
    
    This indirection seems useless and is probably the remains of some long
    forgotten rituals.

commit bf6fe4be7697ad032cca6901cf51e5c1cc481781
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Tue Sep 20 16:51:32 2022 +0200

    model: implement specialized attribute-validator functions
    
    This should reduces function calls and speeds things up.
    
    It might be useful to introduce even more specialized validator in the
    future. It would also be useful to skip the intermediate try/except.
    
    Some of this will be done in later changesets.

commit 1dfea32477023bec219334eb4cf4921203e76f78
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Tue Sep 20 15:48:55 2022 +0200

    model: prepare the filtering of type_validator into something faster
    
    This is currently doing nothing, but prepare for actually changing the
    generic validator into faster specialized variants.
Test Plan

ran tox on each commit, timing and profiling on the full stack.

Diff Detail

Repository
rDMOD Data model
Branch
faster_validation (branched from master)
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 31635
Build 49481: Phabricator diff pipeline on jenkinsJenkins console · Jenkins
Build 49480: arc lint + arc unit

Event Timeline

Build has FAILED

Patch application report for D8512 (id=30654)

Could not rebase; Attempt merge onto 9ce6feb9d6...

Updating 9ce6feb..a780d6f
Fast-forward
 CONTRIBUTORS                  |   1 +
 swh/model/from_disk.py        |  10 +-
 swh/model/model.py            | 401 ++++++++++++++++++++++++++++++------------
 swh/model/tests/test_model.py |  38 +++-
 4 files changed, 326 insertions(+), 124 deletions(-)
Changes applied before test
commit a780d6f17777b87417c77921adae75531941771d
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Tue Sep 20 17:09:04 2022 +0200

    model: drop the `type_validator()` indirection
    
    This indirection seems useless and is probably the remains of some long
    forgotten rituals.

commit c632c5539c45acb6598c188dbb06ade13583ec54
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Tue Sep 20 16:51:32 2022 +0200

    model: implement specialized attribute-validator functions
    
    This should reduces function calls and speeds things up.
    
    It might be useful to introduce even more specialized validator in the
    future. It would also be useful to skip the intermediate try/except.

commit 8801fb08e46d3089b15a0ecfd5e5fc394cb1b5c6
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Tue Sep 20 15:48:55 2022 +0200

    model: prepare the filtering of type_validator into something faster
    
    This is currently doing nothing, but prepare for actually changing the
    generic validator into faster specialized variants.

commit 36185705f1805465d565b00b0df917b0e4f1c8e4
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Tue Sep 20 14:26:17 2022 +0200

    from_disk: only build a model object once
    
    Before this change, a Directory object was built to compute the `id` of
    we fed to the Directory object we built for `to_model`.

Link to build: https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/506/
See console output for more information: https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/506/console

Harbormaster returned this revision to the author for changes because remote builds failed.Sep 20 2022, 5:39 PM
Harbormaster failed remote builds in B31633: Diff 30654!
  • model: also optimize combined validator

Build is green

Patch application report for D8512 (id=30656)

Could not rebase; Attempt merge onto 9ce6feb9d6...

Updating 9ce6feb..f2bbf1f
Fast-forward
 CONTRIBUTORS                  |   1 +
 swh/model/from_disk.py        |  10 +-
 swh/model/model.py            | 414 ++++++++++++++++++++++++++++++------------
 swh/model/tests/test_model.py |  38 +++-
 4 files changed, 339 insertions(+), 124 deletions(-)
Changes applied before test
commit f2bbf1fc7adeb7b91296df4cdfc2f8d0c15b79b2
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Tue Sep 20 18:53:01 2022 +0200

    model: also optimize combined validator
    
    This should provide use with enver more speedup.

commit a780d6f17777b87417c77921adae75531941771d
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Tue Sep 20 17:09:04 2022 +0200

    model: drop the `type_validator()` indirection
    
    This indirection seems useless and is probably the remains of some long
    forgotten rituals.

commit c632c5539c45acb6598c188dbb06ade13583ec54
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Tue Sep 20 16:51:32 2022 +0200

    model: implement specialized attribute-validator functions
    
    This should reduces function calls and speeds things up.
    
    It might be useful to introduce even more specialized validator in the
    future. It would also be useful to skip the intermediate try/except.

commit 8801fb08e46d3089b15a0ecfd5e5fc394cb1b5c6
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Tue Sep 20 15:48:55 2022 +0200

    model: prepare the filtering of type_validator into something faster
    
    This is currently doing nothing, but prepare for actually changing the
    generic validator into faster specialized variants.

commit 36185705f1805465d565b00b0df917b0e4f1c8e4
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Tue Sep 20 14:26:17 2022 +0200

    from_disk: only build a model object once
    
    Before this change, a Directory object was built to compute the `id` of
    we fed to the Directory object we built for `to_model`.

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/507/ for more details.

  • model: remove the try/except
  • model: delete unused code dead code

The individual commit should be fairly clean. I'll provides some timing or profile soon™

Build is green

Patch application report for D8512 (id=30686)

Could not rebase; Attempt merge onto 9ce6feb9d6...

Updating 9ce6feb..1ce836a
Fast-forward
 CONTRIBUTORS                  |   1 +
 swh/model/from_disk.py        |  10 +-
 swh/model/model.py            | 502 +++++++++++++++++++++++++++++-------------
 swh/model/tests/test_model.py |  12 +-
 4 files changed, 363 insertions(+), 162 deletions(-)
Changes applied before test
commit 1ce836a90b3dd15c72f473e07ef6d49b4c635963
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Thu Sep 22 11:50:39 2022 +0200

    model: delete unused code dead code
    
    This should avoid dead code drifting.

commit 3f243aa5d08940f05922cca59131f47b3b9f01a7
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Thu Sep 22 11:40:40 2022 +0200

    model: remove the try/except
    
    This should speeds things up a bit.

commit f2bbf1fc7adeb7b91296df4cdfc2f8d0c15b79b2
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Tue Sep 20 18:53:01 2022 +0200

    model: also optimize combined validator
    
    This should provide use with enver more speedup.

commit a780d6f17777b87417c77921adae75531941771d
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Tue Sep 20 17:09:04 2022 +0200

    model: drop the `type_validator()` indirection
    
    This indirection seems useless and is probably the remains of some long
    forgotten rituals.

commit c632c5539c45acb6598c188dbb06ade13583ec54
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Tue Sep 20 16:51:32 2022 +0200

    model: implement specialized attribute-validator functions
    
    This should reduces function calls and speeds things up.
    
    It might be useful to introduce even more specialized validator in the
    future. It would also be useful to skip the intermediate try/except.

commit 8801fb08e46d3089b15a0ecfd5e5fc394cb1b5c6
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Tue Sep 20 15:48:55 2022 +0200

    model: prepare the filtering of type_validator into something faster
    
    This is currently doing nothing, but prepare for actually changing the
    generic validator into faster specialized variants.

commit 36185705f1805465d565b00b0df917b0e4f1c8e4
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Tue Sep 20 14:26:17 2022 +0200

    from_disk: only build a model object once
    
    Before this change, a Directory object was built to compute the `id` of
    we fed to the Directory object we built for `to_model`.

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/508/ for more details.

  • model: optimization pass on custom validator

Build is green

Patch application report for D8512 (id=30688)

Could not rebase; Attempt merge onto 9ce6feb9d6...

Updating 9ce6feb..1698904
Fast-forward
 CONTRIBUTORS                  |   1 +
 swh/model/from_disk.py        |  10 +-
 swh/model/model.py            | 626 +++++++++++++++++++++++++++++-------------
 swh/model/tests/test_model.py |  12 +-
 4 files changed, 444 insertions(+), 205 deletions(-)
Changes applied before test
commit 1698904aac7b6104f9c9ca2bf8f4b7bc3cf3e4a1
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Thu Sep 22 14:48:03 2022 +0200

    model: optimization pass on custom validator
    
    We inline the type-checking and avoid creating intermediate tuple.

commit 1ce836a90b3dd15c72f473e07ef6d49b4c635963
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Thu Sep 22 11:50:39 2022 +0200

    model: delete unused code dead code
    
    This should avoid dead code drifting.

commit 3f243aa5d08940f05922cca59131f47b3b9f01a7
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Thu Sep 22 11:40:40 2022 +0200

    model: remove the try/except
    
    This should speeds things up a bit.

commit f2bbf1fc7adeb7b91296df4cdfc2f8d0c15b79b2
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Tue Sep 20 18:53:01 2022 +0200

    model: also optimize combined validator
    
    This should provide use with enver more speedup.

commit a780d6f17777b87417c77921adae75531941771d
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Tue Sep 20 17:09:04 2022 +0200

    model: drop the `type_validator()` indirection
    
    This indirection seems useless and is probably the remains of some long
    forgotten rituals.

commit c632c5539c45acb6598c188dbb06ade13583ec54
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Tue Sep 20 16:51:32 2022 +0200

    model: implement specialized attribute-validator functions
    
    This should reduces function calls and speeds things up.
    
    It might be useful to introduce even more specialized validator in the
    future. It would also be useful to skip the intermediate try/except.

commit 8801fb08e46d3089b15a0ecfd5e5fc394cb1b5c6
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Tue Sep 20 15:48:55 2022 +0200

    model: prepare the filtering of type_validator into something faster
    
    This is currently doing nothing, but prepare for actually changing the
    generic validator into faster specialized variants.

commit 36185705f1805465d565b00b0df917b0e4f1c8e4
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Tue Sep 20 14:26:17 2022 +0200

    from_disk: only build a model object once
    
    Before this change, a Directory object was built to compute the `id` of
    we fed to the Directory object we built for `to_model`.

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/509/ for more details.

  • model: inline _check_swhid

Build is green

Patch application report for D8512 (id=30690)

Could not rebase; Attempt merge onto 9ce6feb9d6...

Updating 9ce6feb..64ae080
Fast-forward
 CONTRIBUTORS                  |   1 +
 swh/model/from_disk.py        |  10 +-
 swh/model/model.py            | 652 ++++++++++++++++++++++++++++--------------
 swh/model/tests/test_model.py |  12 +-
 4 files changed, 460 insertions(+), 215 deletions(-)
Changes applied before test
commit 64ae080f93ba770697e46a464d584cb5ca25e363
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Thu Sep 22 14:55:26 2022 +0200

    model: inline _check_swhid
    
    I am a compiler.

commit 1698904aac7b6104f9c9ca2bf8f4b7bc3cf3e4a1
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Thu Sep 22 14:48:03 2022 +0200

    model: optimization pass on custom validator
    
    We inline the type-checking and avoid creating intermediate tuple.

commit 1ce836a90b3dd15c72f473e07ef6d49b4c635963
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Thu Sep 22 11:50:39 2022 +0200

    model: delete unused code dead code
    
    This should avoid dead code drifting.

commit 3f243aa5d08940f05922cca59131f47b3b9f01a7
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Thu Sep 22 11:40:40 2022 +0200

    model: remove the try/except
    
    This should speeds things up a bit.

commit f2bbf1fc7adeb7b91296df4cdfc2f8d0c15b79b2
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Tue Sep 20 18:53:01 2022 +0200

    model: also optimize combined validator
    
    This should provide use with enver more speedup.

commit a780d6f17777b87417c77921adae75531941771d
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Tue Sep 20 17:09:04 2022 +0200

    model: drop the `type_validator()` indirection
    
    This indirection seems useless and is probably the remains of some long
    forgotten rituals.

commit c632c5539c45acb6598c188dbb06ade13583ec54
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Tue Sep 20 16:51:32 2022 +0200

    model: implement specialized attribute-validator functions
    
    This should reduces function calls and speeds things up.
    
    It might be useful to introduce even more specialized validator in the
    future. It would also be useful to skip the intermediate try/except.

commit 8801fb08e46d3089b15a0ecfd5e5fc394cb1b5c6
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Tue Sep 20 15:48:55 2022 +0200

    model: prepare the filtering of type_validator into something faster
    
    This is currently doing nothing, but prepare for actually changing the
    generic validator into faster specialized variants.

commit 36185705f1805465d565b00b0df917b0e4f1c8e4
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Tue Sep 20 14:26:17 2022 +0200

    from_disk: only build a model object once
    
    Before this change, a Directory object was built to compute the `id` of
    we fed to the Directory object we built for `to_model`.

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/510/ for more details.

  • model: prepare the filtering of type_validator into something faster
  • model: implement specialized attribute-validator functions
  • model: drop the type_validator() indirection
  • model: also optimize combined validator
  • model: remove the try/except
  • model: delete unused code dead code
  • model: optimization pass on custom validator
  • model: inline _check_swhid

Build is green

Patch application report for D8512 (id=30715)

Could not rebase; Attempt merge onto 9ce6feb9d6...

Updating 9ce6feb..10b150c
Fast-forward
 CONTRIBUTORS                  |   1 +
 swh/model/from_disk.py        |  33 ++-
 swh/model/git_objects.py      |  14 +-
 swh/model/model.py            | 652 ++++++++++++++++++++++++++++--------------
 swh/model/tests/test_model.py |  12 +-
 5 files changed, 492 insertions(+), 220 deletions(-)
Changes applied before test
commit 10b150cbd6e35d186ad5a2e5fd5ed38d1d10b474
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Thu Sep 22 14:55:26 2022 +0200

    model: inline _check_swhid
    
    I am a compiler.

commit e032ee271efc84e7624945e9cc2339c55a1d939b
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Thu Sep 22 14:48:03 2022 +0200

    model: optimization pass on custom validator
    
    We inline the type-checking and avoid creating intermediate tuple.

commit fc9622f51ba18b984220ddc167b5e9e760b79122
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Thu Sep 22 11:50:39 2022 +0200

    model: delete unused code dead code
    
    This should avoid dead code drifting.

commit cc8a6b770d682f26a488a755730c803c0a538e73
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Thu Sep 22 11:40:40 2022 +0200

    model: remove the try/except
    
    This should speeds things up a bit.

commit a2900955815ad9bb81cdb1141ba2650bbfbcd694
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Tue Sep 20 18:53:01 2022 +0200

    model: also optimize combined validator
    
    This should provide use with enver more speedup.

commit 3b47a5ba435859252b5f96f1ee15e971518995b8
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Tue Sep 20 17:09:04 2022 +0200

    model: drop the `type_validator()` indirection
    
    This indirection seems useless and is probably the remains of some long
    forgotten rituals.

commit 80e336c19093360aff92c082f2b1049005b483f3
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Tue Sep 20 16:51:32 2022 +0200

    model: implement specialized attribute-validator functions
    
    This should reduces function calls and speeds things up.
    
    It might be useful to introduce even more specialized validator in the
    future. It would also be useful to skip the intermediate try/except.

commit 923b61ed334613b268229b4789810dd332100ccf
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Tue Sep 20 15:48:55 2022 +0200

    model: prepare the filtering of type_validator into something faster
    
    This is currently doing nothing, but prepare for actually changing the
    generic validator into faster specialized variants.

commit ad7774bec2b48af2ee74d59e34dc6de02751896a
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Thu Sep 22 18:24:46 2022 +0200

    from_disk: skip intermediate dictionnary creation when building model
    
    This should speed things up.

commit 41500db7ea8856f75812f4fe8d8a66c6fdc43a82
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Thu Sep 22 18:12:43 2022 +0200

    model: avoid another extra creation of Model object
    
    Do not create model object while sorting entry before creating model
    object.

commit 36185705f1805465d565b00b0df917b0e4f1c8e4
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Tue Sep 20 14:26:17 2022 +0200

    from_disk: only build a model object once
    
    Before this change, a Directory object was built to compute the `id` of
    we fed to the Directory object we built for `to_model`.

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/517/ for more details.

Build is green

Patch application report for D8512 (id=30723)

Could not rebase; Attempt merge onto 9ce6feb9d6...

Updating 9ce6feb..10b150c
Fast-forward
 CONTRIBUTORS                  |   1 +
 swh/model/from_disk.py        |  33 ++-
 swh/model/git_objects.py      |  14 +-
 swh/model/model.py            | 652 ++++++++++++++++++++++++++++--------------
 swh/model/tests/test_model.py |  12 +-
 5 files changed, 492 insertions(+), 220 deletions(-)
Changes applied before test
commit 10b150cbd6e35d186ad5a2e5fd5ed38d1d10b474
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Thu Sep 22 14:55:26 2022 +0200

    model: inline _check_swhid
    
    I am a compiler.

commit e032ee271efc84e7624945e9cc2339c55a1d939b
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Thu Sep 22 14:48:03 2022 +0200

    model: optimization pass on custom validator
    
    We inline the type-checking and avoid creating intermediate tuple.

commit fc9622f51ba18b984220ddc167b5e9e760b79122
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Thu Sep 22 11:50:39 2022 +0200

    model: delete unused code dead code
    
    This should avoid dead code drifting.

commit cc8a6b770d682f26a488a755730c803c0a538e73
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Thu Sep 22 11:40:40 2022 +0200

    model: remove the try/except
    
    This should speeds things up a bit.

commit a2900955815ad9bb81cdb1141ba2650bbfbcd694
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Tue Sep 20 18:53:01 2022 +0200

    model: also optimize combined validator
    
    This should provide use with enver more speedup.

commit 3b47a5ba435859252b5f96f1ee15e971518995b8
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Tue Sep 20 17:09:04 2022 +0200

    model: drop the `type_validator()` indirection
    
    This indirection seems useless and is probably the remains of some long
    forgotten rituals.

commit 80e336c19093360aff92c082f2b1049005b483f3
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Tue Sep 20 16:51:32 2022 +0200

    model: implement specialized attribute-validator functions
    
    This should reduces function calls and speeds things up.
    
    It might be useful to introduce even more specialized validator in the
    future. It would also be useful to skip the intermediate try/except.

commit 923b61ed334613b268229b4789810dd332100ccf
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Tue Sep 20 15:48:55 2022 +0200

    model: prepare the filtering of type_validator into something faster
    
    This is currently doing nothing, but prepare for actually changing the
    generic validator into faster specialized variants.

commit ad7774bec2b48af2ee74d59e34dc6de02751896a
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Thu Sep 22 18:24:46 2022 +0200

    from_disk: skip intermediate dictionnary creation when building model
    
    This should speed things up.

commit 41500db7ea8856f75812f4fe8d8a66c6fdc43a82
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Thu Sep 22 18:12:43 2022 +0200

    model: avoid another extra creation of Model object
    
    Do not create model object while sorting entry before creating model
    object.

commit 36185705f1805465d565b00b0df917b0e4f1c8e4
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Tue Sep 20 14:26:17 2022 +0200

    from_disk: only build a model object once
    
    Before this change, a Directory object was built to compute the `id` of
    we fed to the Directory object we built for `to_model`.

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/521/ for more details.

I am in the process to adding more timing/profile data to the commit message (then to fight to get phabricator to display them)

Build is green

Patch application report for D8512 (id=30728)

Could not rebase; Attempt merge onto 9ce6feb9d6...

Updating 9ce6feb..313cc97
Fast-forward
 CONTRIBUTORS                  |   1 +
 swh/model/from_disk.py        |  33 ++-
 swh/model/git_objects.py      |  14 +-
 swh/model/model.py            | 652 ++++++++++++++++++++++++++++--------------
 swh/model/tests/test_model.py |  12 +-
 5 files changed, 492 insertions(+), 220 deletions(-)
Changes applied before test
commit 313cc97581a06b3cfe8fc561d4e0b38ccccf152c
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Thu Sep 22 14:55:26 2022 +0200

    model: inline the call to `_check_swhid`
    
    This reduce the number of function call and should be faster.
    
    The mashup of blind optimisation in the previous changeset yield some
    interesting results in total.
    
    It would be insightful to measure them individually, but that would
    take more time than we currently have.
    
    When testing all the validator changes on our previous "benchmark" we
    see quite interesting improvement.
    
        swh loader run mercurial https://foss.heptapod.net/mercurial/mercurial-devel directory=/data/repos/mercurial-devel
    
    = Median time of 3 run =
    base:   17 minutes 48 seconds
    before: 11 minutes 50 seconds
    after:  11 minutes 11 seconds
    
    On a profile of the same run, the `to_model` call of the from_disk's `Directory` class took the following percentage:
    base:   43%
    before: 15%
    after:  11%

commit 54a19a2aa522787ec368f3707336824db7f13b76
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Thu Sep 22 14:48:03 2022 +0200

    model: optimization pass on custom validator
    
    (This commit is actually doing two things /o\)
    
    - we inline the type-checking in the custom validator to reduce the
      number of function call.
    
    - we optimize some of the custom validator by skipping the creation of
      intermediate tuples.

commit 14ba3c3e19f8c0d8505b4428aa158a594cca6f9d
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Thu Sep 22 11:50:39 2022 +0200

    model: delete unused validator code
    
    Since all `generic_type_validator` are optimized away, the code will no
    longer be called. So we remove that code to avoid any drifting.
    
    A nice "exception" is provided in case this start getting called again
    in the future.

commit b851355ccf71b4d66bfcf11025f40fe8de5275f5
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Thu Sep 22 11:40:40 2022 +0200

    model: remove the try/except
    
    Since try/except context are known to be expensive in Python, it seems
    useful to remove them.

commit 6add176754eefb6c7b37c7a0b21695ae1ae8f7b9
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Tue Sep 20 18:53:01 2022 +0200

    model: also optimize combined validator
    
    This ensure we don't have any remaining `generic_type_validator` call
    that have not been optimized away.

commit cc61b9d3860de28afc2eb23dc35da372fdaf1964
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Tue Sep 20 17:09:04 2022 +0200

    model: drop the `type_validator()` indirection
    
    This indirection seems useless and is probably the remains of some long
    forgotten rituals.

commit bf6fe4be7697ad032cca6901cf51e5c1cc481781
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Tue Sep 20 16:51:32 2022 +0200

    model: implement specialized attribute-validator functions
    
    This should reduces function calls and speeds things up.
    
    It might be useful to introduce even more specialized validator in the
    future. It would also be useful to skip the intermediate try/except.
    
    Some of this will be done in later changesets.

commit 1dfea32477023bec219334eb4cf4921203e76f78
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Tue Sep 20 15:48:55 2022 +0200

    model: prepare the filtering of type_validator into something faster
    
    This is currently doing nothing, but prepare for actually changing the
    generic validator into faster specialized variants.

commit a2e8f18c2a685c7d4aad53345c77a7177190854e
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Thu Sep 22 18:24:46 2022 +0200

    from_disk: skip intermediate dictionnary creation when building model
    
    Before this change we would do the following :
    
    1) translate from_disk's object into `dict`,
    2) sort these dict,
    3) feed the list to `Directory.from_dict`,
    4) create DirectoryEntry from these dict.
    
    Skipping the directory creating and directly creating the
    DirectoryEntries provide us with a small but stable and noticeable
    performance win.
    
    We tested this change on simple information of the Mercurial loader,
    with a noop-loader stockage:
    
        swh loader run mercurial https://foss.heptapod.net/mercurial/mercurial-devel directory=/data/repos/mercurial-devel
    
    = Median time of 3 run =
    before: 11 minute  56 seconds
    aftere: 11 minute  50 seconds
    
    On a profile of the same run, the `to_model` call of the from_disk's `Directory` class took the following percentage:
    before: 17%
    after:  15%

commit ad3ecac9beae015b9e9144c1da180097958b2827
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Thu Sep 22 18:12:43 2022 +0200

    model: avoid another extra creation of Model object
    
    Do not create model object while sorting entry before creating model
    object.
    
    This is another case of "let us create object X to prepare the creation
    of object X", slowing things down.
    
    In practice, we will likely skip this code-path after the next
    changeset, however this seems useful to get this performance footgun
    out the way.
    
    We tested this change on simple information of the Mercurial loader,
    with a noop-loader stockage:
    
        swh loader run mercurial https://foss.heptapod.net/mercurial/mercurial-devel directory=/data/repos/mercurial-devel
    
    = Median time of 3 run =
    before  12 minutes 59 seconds
    after:  11 minute  56 seconds
    
    On a profile of the same run, the `to_model` call of the from_disk's `Directory` class took the following percentage:
    before: 24%
    after:  17%

commit 814a6c8416d56f5f8b3e590d419d5aea7a888ab2
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Tue Sep 20 14:26:17 2022 +0200

    from_disk: only build a model object once
    
    Before this change, a Directory object was built to compute the `id` of
    we fed to the Directory object we built for `to_model`.
    
    We tested this change on simple information of the Mercurial loader,
    with a noop-loader stockage:
    
        swh loader run mercurial https://foss.heptapod.net/mercurial/mercurial-devel directory=/data/repos/mercurial-devel
    
    = Median time of 3 run =
    before: 17 minutes 48 seconds
    after:  12 minutes 59 seconds
    
    On a profile of the same run, the `to_model` call of the from_disk's `Directory` class took the following percentage:
    before: 43%
    after:  24%

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/525/ for more details.

marmoute retitled this revision from model: prepare the filtering of type_validator into something faster to various optimisation to the model validation logic..Sep 23 2022, 4:54 PM
marmoute edited the summary of this revision. (Show Details)
marmoute edited the test plan for this revision. (Show Details)
marmoute added a reviewer: Reviewers.
vlorentz added inline comments.
swh/model/model.py
157–176

could you add tests for this? it seems uncovered

This revision is now accepted and ready to land.Sep 23 2022, 5:36 PM
swh/model/model.py
157–176

Good point, fixing it in the next update.

Build is green

Patch application report for D8512 (id=30735)

Could not rebase; Attempt merge onto 9ce6feb9d6...

Updating 9ce6feb..2d65a24
Fast-forward
 CONTRIBUTORS                  |   1 +
 swh/model/from_disk.py        |  33 ++-
 swh/model/git_objects.py      |  14 +-
 swh/model/model.py            | 652 ++++++++++++++++++++++++++++--------------
 swh/model/tests/test_model.py |  22 +-
 5 files changed, 502 insertions(+), 220 deletions(-)
Changes applied before test
commit 2d65a24a5f413f3c647688e5d2ad7aa3268c1919
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Thu Sep 22 14:55:26 2022 +0200

    model: inline the call to `_check_swhid`
    
    This reduce the number of function call and should be faster.
    
    The mashup of blind optimisation in the previous changeset yield some
    interesting results in total.
    
    It would be insightful to measure them individually, but that would
    take more time than we currently have.
    
    When testing all the validator changes on our previous "benchmark" we
    see quite interesting improvement.
    
        swh loader run mercurial https://foss.heptapod.net/mercurial/mercurial-devel directory=/data/repos/mercurial-devel
    
    = Median time of 3 run =
    base:   17 minutes 48 seconds
    before: 11 minutes 50 seconds
    after:  11 minutes 11 seconds
    
    On a profile of the same run, the `to_model` call of the from_disk's `Directory` class took the following percentage:
    base:   43%
    before: 15%
    after:  11%

commit 3608271a0aaaa3e419389ec1834de74a14bbf59e
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Thu Sep 22 14:48:03 2022 +0200

    model: optimization pass on custom validator
    
    (This commit is actually doing two things /o\)
    
    - we inline the type-checking in the custom validator to reduce the
      number of function call.
    
    - we optimize some of the custom validator by skipping the creation of
      intermediate tuples.

commit 3796e5ba30ff809ce9c75b548e2a6dbe97936fd2
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Thu Sep 22 11:50:39 2022 +0200

    model: delete unused validator code
    
    Since all `generic_type_validator` are optimized away, the code will no
    longer be called. So we remove that code to avoid any drifting.
    
    A nice "exception" is provided in case this start getting called again
    in the future.

commit b7267a890888adc07b35282c944450e642133c80
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Thu Sep 22 11:40:40 2022 +0200

    model: remove the try/except
    
    Since try/except context are known to be expensive in Python, it seems
    useful to remove them.

commit cf529cd1622ee942950ca487d0dc3e71b132e39f
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Tue Sep 20 18:53:01 2022 +0200

    model: also optimize combined validator
    
    This ensure we don't have any remaining `generic_type_validator` call
    that have not been optimized away.

commit 6ababdebb749e63b3c11efb59f4c022cab43cd46
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Tue Sep 20 17:09:04 2022 +0200

    model: drop the `type_validator()` indirection
    
    This indirection seems useless and is probably the remains of some long
    forgotten rituals.

commit edb57fb15359b20050ae972b59003d7a3c8db772
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Tue Sep 20 16:51:32 2022 +0200

    model: implement specialized attribute-validator functions
    
    This should reduces function calls and speeds things up.
    
    It might be useful to introduce even more specialized validator in the
    future. It would also be useful to skip the intermediate try/except.
    
    Some of this will be done in later changesets.

commit 1dfea32477023bec219334eb4cf4921203e76f78
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Tue Sep 20 15:48:55 2022 +0200

    model: prepare the filtering of type_validator into something faster
    
    This is currently doing nothing, but prepare for actually changing the
    generic validator into faster specialized variants.

commit a2e8f18c2a685c7d4aad53345c77a7177190854e
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Thu Sep 22 18:24:46 2022 +0200

    from_disk: skip intermediate dictionnary creation when building model
    
    Before this change we would do the following :
    
    1) translate from_disk's object into `dict`,
    2) sort these dict,
    3) feed the list to `Directory.from_dict`,
    4) create DirectoryEntry from these dict.
    
    Skipping the directory creating and directly creating the
    DirectoryEntries provide us with a small but stable and noticeable
    performance win.
    
    We tested this change on simple information of the Mercurial loader,
    with a noop-loader stockage:
    
        swh loader run mercurial https://foss.heptapod.net/mercurial/mercurial-devel directory=/data/repos/mercurial-devel
    
    = Median time of 3 run =
    before: 11 minute  56 seconds
    aftere: 11 minute  50 seconds
    
    On a profile of the same run, the `to_model` call of the from_disk's `Directory` class took the following percentage:
    before: 17%
    after:  15%

commit ad3ecac9beae015b9e9144c1da180097958b2827
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Thu Sep 22 18:12:43 2022 +0200

    model: avoid another extra creation of Model object
    
    Do not create model object while sorting entry before creating model
    object.
    
    This is another case of "let us create object X to prepare the creation
    of object X", slowing things down.
    
    In practice, we will likely skip this code-path after the next
    changeset, however this seems useful to get this performance footgun
    out the way.
    
    We tested this change on simple information of the Mercurial loader,
    with a noop-loader stockage:
    
        swh loader run mercurial https://foss.heptapod.net/mercurial/mercurial-devel directory=/data/repos/mercurial-devel
    
    = Median time of 3 run =
    before  12 minutes 59 seconds
    after:  11 minute  56 seconds
    
    On a profile of the same run, the `to_model` call of the from_disk's `Directory` class took the following percentage:
    before: 24%
    after:  17%

commit 814a6c8416d56f5f8b3e590d419d5aea7a888ab2
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Tue Sep 20 14:26:17 2022 +0200

    from_disk: only build a model object once
    
    Before this change, a Directory object was built to compute the `id` of
    we fed to the Directory object we built for `to_model`.
    
    We tested this change on simple information of the Mercurial loader,
    with a noop-loader stockage:
    
        swh loader run mercurial https://foss.heptapod.net/mercurial/mercurial-devel directory=/data/repos/mercurial-devel
    
    = Median time of 3 run =
    before: 17 minutes 48 seconds
    after:  12 minutes 59 seconds
    
    On a profile of the same run, the `to_model` call of the from_disk's `Directory` class took the following percentage:
    before: 43%
    after:  24%

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/529/ for more details.