Do not create model object while sorting entry before creating model
object.
This is another case of "let us create object X to prepare the creation
of object X", slowing things down.
In practice, we will likely skip this code-path after the next
changeset, however this seems useful to get this performance footgun
out the way.
We tested this change on simple information of the Mercurial loader,
with a noop-loader stockage:
swh loader run mercurial https://foss.heptapod.net/mercurial/mercurial-devel directory=/data/repos/mercurial-devel
= Median time of 3 run =
before 12 minutes 59 seconds
after: 11 minute 56 seconds
On a profile of the same run, the `to_model` call of the from_disk's `Directory` class took the following percentage:
before: 24%
after: 17%Details
- Reviewers
olasd vlorentz - Group Reviewers
Reviewers - Maniphest Tasks
- T4595: Improve common tooling for loading
- Commits
- rDMODad3ecac9beae: model: avoid another extra creation of Model object
ran tox, timing and profile
Diff Detail
- Repository
- rDMOD Data model
- Branch
- D8527
- Lint
No Linters Available - Unit
No Unit Test Coverage - Build Status
Buildable 31690 Build 49579: Phabricator diff pipeline on jenkins Jenkins console · Jenkins Build 49578: arc lint + arc unit
Event Timeline
Build is green
Patch application report for D8527 (id=30710)
Could not rebase; Attempt merge onto 9ce6feb9d6...
Updating 9ce6feb..41500db Fast-forward CONTRIBUTORS | 1 + swh/model/from_disk.py | 10 +++++++--- swh/model/git_objects.py | 14 +++++++++----- 3 files changed, 17 insertions(+), 8 deletions(-)
Changes applied before test
commit 41500db7ea8856f75812f4fe8d8a66c6fdc43a82
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date: Thu Sep 22 18:12:43 2022 +0200
model: avoid another extra creation of Model object
Do not create model object while sorting entry before creating model
object.
commit 36185705f1805465d565b00b0df917b0e4f1c8e4
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date: Tue Sep 20 14:26:17 2022 +0200
from_disk: only build a model object once
Before this change, a Directory object was built to compute the `id` of
we fed to the Directory object we built for `to_model`.See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/513/ for more details.
Build is green
Patch application report for D8527 (id=30713)
Could not rebase; Attempt merge onto 9ce6feb9d6...
Updating 9ce6feb..41500db Fast-forward CONTRIBUTORS | 1 + swh/model/from_disk.py | 10 +++++++--- swh/model/git_objects.py | 14 +++++++++----- 3 files changed, 17 insertions(+), 8 deletions(-)
Changes applied before test
commit 41500db7ea8856f75812f4fe8d8a66c6fdc43a82
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date: Thu Sep 22 18:12:43 2022 +0200
model: avoid another extra creation of Model object
Do not create model object while sorting entry before creating model
object.
commit 36185705f1805465d565b00b0df917b0e4f1c8e4
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date: Tue Sep 20 14:26:17 2022 +0200
from_disk: only build a model object once
Before this change, a Directory object was built to compute the `id` of
we fed to the Directory object we built for `to_model`.See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/515/ for more details.
Build is green
Patch application report for D8527 (id=30721)
Could not rebase; Attempt merge onto 9ce6feb9d6...
Updating 9ce6feb..41500db Fast-forward CONTRIBUTORS | 1 + swh/model/from_disk.py | 10 +++++++--- swh/model/git_objects.py | 14 +++++++++----- 3 files changed, 17 insertions(+), 8 deletions(-)
Changes applied before test
commit 41500db7ea8856f75812f4fe8d8a66c6fdc43a82
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date: Thu Sep 22 18:12:43 2022 +0200
model: avoid another extra creation of Model object
Do not create model object while sorting entry before creating model
object.
commit 36185705f1805465d565b00b0df917b0e4f1c8e4
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date: Tue Sep 20 14:26:17 2022 +0200
from_disk: only build a model object once
Before this change, a Directory object was built to compute the `id` of
we fed to the Directory object we built for `to_model`.See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/519/ for more details.
I am in the process to adding more timing/profile data to the commit message (then to fight to get phabricator to display them)
Build is green
Patch application report for D8527 (id=30726)
Could not rebase; Attempt merge onto 9ce6feb9d6...
Updating 9ce6feb..ad3ecac Fast-forward CONTRIBUTORS | 1 + swh/model/from_disk.py | 10 +++++++--- swh/model/git_objects.py | 14 +++++++++----- 3 files changed, 17 insertions(+), 8 deletions(-)
Changes applied before test
commit ad3ecac9beae015b9e9144c1da180097958b2827
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date: Thu Sep 22 18:12:43 2022 +0200
model: avoid another extra creation of Model object
Do not create model object while sorting entry before creating model
object.
This is another case of "let us create object X to prepare the creation
of object X", slowing things down.
In practice, we will likely skip this code-path after the next
changeset, however this seems useful to get this performance footgun
out the way.
We tested this change on simple information of the Mercurial loader,
with a noop-loader stockage:
swh loader run mercurial https://foss.heptapod.net/mercurial/mercurial-devel directory=/data/repos/mercurial-devel
= Median time of 3 run =
before 12 minutes 59 seconds
after: 11 minute 56 seconds
On a profile of the same run, the `to_model` call of the from_disk's `Directory` class took the following percentage:
before: 24%
after: 17%
commit 814a6c8416d56f5f8b3e590d419d5aea7a888ab2
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date: Tue Sep 20 14:26:17 2022 +0200
from_disk: only build a model object once
Before this change, a Directory object was built to compute the `id` of
we fed to the Directory object we built for `to_model`.
We tested this change on simple information of the Mercurial loader,
with a noop-loader stockage:
swh loader run mercurial https://foss.heptapod.net/mercurial/mercurial-devel directory=/data/repos/mercurial-devel
= Median time of 3 run =
before: 17 minutes 48 seconds
after: 12 minutes 59 seconds
On a profile of the same run, the `to_model` call of the from_disk's `Directory` class took the following percentage:
before: 43%
after: 24%See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/523/ for more details.
Build is green
Patch application report for D8527 (id=30733)
Could not rebase; Attempt merge onto 9ce6feb9d6...
Updating 9ce6feb..ad3ecac Fast-forward CONTRIBUTORS | 1 + swh/model/from_disk.py | 10 +++++++--- swh/model/git_objects.py | 14 +++++++++----- 3 files changed, 17 insertions(+), 8 deletions(-)
Changes applied before test
commit ad3ecac9beae015b9e9144c1da180097958b2827
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date: Thu Sep 22 18:12:43 2022 +0200
model: avoid another extra creation of Model object
Do not create model object while sorting entry before creating model
object.
This is another case of "let us create object X to prepare the creation
of object X", slowing things down.
In practice, we will likely skip this code-path after the next
changeset, however this seems useful to get this performance footgun
out the way.
We tested this change on simple information of the Mercurial loader,
with a noop-loader stockage:
swh loader run mercurial https://foss.heptapod.net/mercurial/mercurial-devel directory=/data/repos/mercurial-devel
= Median time of 3 run =
before 12 minutes 59 seconds
after: 11 minute 56 seconds
On a profile of the same run, the `to_model` call of the from_disk's `Directory` class took the following percentage:
before: 24%
after: 17%
commit 814a6c8416d56f5f8b3e590d419d5aea7a888ab2
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date: Tue Sep 20 14:26:17 2022 +0200
from_disk: only build a model object once
Before this change, a Directory object was built to compute the `id` of
we fed to the Directory object we built for `to_model`.
We tested this change on simple information of the Mercurial loader,
with a noop-loader stockage:
swh loader run mercurial https://foss.heptapod.net/mercurial/mercurial-devel directory=/data/repos/mercurial-devel
= Median time of 3 run =
before: 17 minutes 48 seconds
after: 12 minutes 59 seconds
On a profile of the same run, the `to_model` call of the from_disk's `Directory` class took the following percentage:
before: 43%
after: 24%See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/527/ for more details.