Page MenuHomeSoftware Heritage

model: avoid another extra creation of Model object
ClosedPublic

Authored by marmoute on Sep 23 2022, 11:31 AM.

Details

Summary
Do not create model object while sorting entry before creating model
object.

This is another case of "let us create object X to prepare the creation
of object X", slowing things down.

In practice, we will likely skip this code-path after the next
changeset, however this seems useful to get this performance footgun
out the way.

We tested this change on simple information of the Mercurial loader,
with a noop-loader stockage:

    swh loader run mercurial https://foss.heptapod.net/mercurial/mercurial-devel directory=/data/repos/mercurial-devel

= Median time of 3 run =
before  12 minutes 59 seconds
after:  11 minute  56 seconds

On a profile of the same run, the `to_model` call of the from_disk's `Directory` class took the following percentage:
before: 24%
after:  17%
Test Plan

ran tox, timing and profile

Diff Detail

Repository
rDMOD Data model
Branch
D8527
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 31703
Build 49601: Phabricator diff pipeline on jenkinsJenkins console · Jenkins
Build 49600: arc lint + arc unit

Event Timeline

Build is green

Patch application report for D8527 (id=30710)

Could not rebase; Attempt merge onto 9ce6feb9d6...

Updating 9ce6feb..41500db
Fast-forward
 CONTRIBUTORS             |  1 +
 swh/model/from_disk.py   | 10 +++++++---
 swh/model/git_objects.py | 14 +++++++++-----
 3 files changed, 17 insertions(+), 8 deletions(-)
Changes applied before test
commit 41500db7ea8856f75812f4fe8d8a66c6fdc43a82
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Thu Sep 22 18:12:43 2022 +0200

    model: avoid another extra creation of Model object
    
    Do not create model object while sorting entry before creating model
    object.

commit 36185705f1805465d565b00b0df917b0e4f1c8e4
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Tue Sep 20 14:26:17 2022 +0200

    from_disk: only build a model object once
    
    Before this change, a Directory object was built to compute the `id` of
    we fed to the Directory object we built for `to_model`.

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/513/ for more details.

Build is green

Patch application report for D8527 (id=30713)

Could not rebase; Attempt merge onto 9ce6feb9d6...

Updating 9ce6feb..41500db
Fast-forward
 CONTRIBUTORS             |  1 +
 swh/model/from_disk.py   | 10 +++++++---
 swh/model/git_objects.py | 14 +++++++++-----
 3 files changed, 17 insertions(+), 8 deletions(-)
Changes applied before test
commit 41500db7ea8856f75812f4fe8d8a66c6fdc43a82
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Thu Sep 22 18:12:43 2022 +0200

    model: avoid another extra creation of Model object
    
    Do not create model object while sorting entry before creating model
    object.

commit 36185705f1805465d565b00b0df917b0e4f1c8e4
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Tue Sep 20 14:26:17 2022 +0200

    from_disk: only build a model object once
    
    Before this change, a Directory object was built to compute the `id` of
    we fed to the Directory object we built for `to_model`.

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/515/ for more details.

Build is green

Patch application report for D8527 (id=30721)

Could not rebase; Attempt merge onto 9ce6feb9d6...

Updating 9ce6feb..41500db
Fast-forward
 CONTRIBUTORS             |  1 +
 swh/model/from_disk.py   | 10 +++++++---
 swh/model/git_objects.py | 14 +++++++++-----
 3 files changed, 17 insertions(+), 8 deletions(-)
Changes applied before test
commit 41500db7ea8856f75812f4fe8d8a66c6fdc43a82
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Thu Sep 22 18:12:43 2022 +0200

    model: avoid another extra creation of Model object
    
    Do not create model object while sorting entry before creating model
    object.

commit 36185705f1805465d565b00b0df917b0e4f1c8e4
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Tue Sep 20 14:26:17 2022 +0200

    from_disk: only build a model object once
    
    Before this change, a Directory object was built to compute the `id` of
    we fed to the Directory object we built for `to_model`.

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/519/ for more details.

I am in the process to adding more timing/profile data to the commit message (then to fight to get phabricator to display them)

Build is green

Patch application report for D8527 (id=30726)

Could not rebase; Attempt merge onto 9ce6feb9d6...

Updating 9ce6feb..ad3ecac
Fast-forward
 CONTRIBUTORS             |  1 +
 swh/model/from_disk.py   | 10 +++++++---
 swh/model/git_objects.py | 14 +++++++++-----
 3 files changed, 17 insertions(+), 8 deletions(-)
Changes applied before test
commit ad3ecac9beae015b9e9144c1da180097958b2827
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Thu Sep 22 18:12:43 2022 +0200

    model: avoid another extra creation of Model object
    
    Do not create model object while sorting entry before creating model
    object.
    
    This is another case of "let us create object X to prepare the creation
    of object X", slowing things down.
    
    In practice, we will likely skip this code-path after the next
    changeset, however this seems useful to get this performance footgun
    out the way.
    
    We tested this change on simple information of the Mercurial loader,
    with a noop-loader stockage:
    
        swh loader run mercurial https://foss.heptapod.net/mercurial/mercurial-devel directory=/data/repos/mercurial-devel
    
    = Median time of 3 run =
    before  12 minutes 59 seconds
    after:  11 minute  56 seconds
    
    On a profile of the same run, the `to_model` call of the from_disk's `Directory` class took the following percentage:
    before: 24%
    after:  17%

commit 814a6c8416d56f5f8b3e590d419d5aea7a888ab2
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Tue Sep 20 14:26:17 2022 +0200

    from_disk: only build a model object once
    
    Before this change, a Directory object was built to compute the `id` of
    we fed to the Directory object we built for `to_model`.
    
    We tested this change on simple information of the Mercurial loader,
    with a noop-loader stockage:
    
        swh loader run mercurial https://foss.heptapod.net/mercurial/mercurial-devel directory=/data/repos/mercurial-devel
    
    = Median time of 3 run =
    before: 17 minutes 48 seconds
    after:  12 minutes 59 seconds
    
    On a profile of the same run, the `to_model` call of the from_disk's `Directory` class took the following percentage:
    before: 43%
    after:  24%

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/523/ for more details.

marmoute retitled this revision from avoid another extra creation of Model object to model: avoid another extra creation of Model object.Sep 23 2022, 4:51 PM
marmoute edited the summary of this revision. (Show Details)
marmoute added reviewers: olasd, vlorentz.
marmoute edited the summary of this revision. (Show Details)
marmoute edited the test plan for this revision. (Show Details)
marmoute retitled this revision from model: avoid another extra creation of Model object to model: avoid another extra creation of Model object.
marmoute edited the summary of this revision. (Show Details)

batch-update

Build is green

Patch application report for D8527 (id=30733)

Could not rebase; Attempt merge onto 9ce6feb9d6...

Updating 9ce6feb..ad3ecac
Fast-forward
 CONTRIBUTORS             |  1 +
 swh/model/from_disk.py   | 10 +++++++---
 swh/model/git_objects.py | 14 +++++++++-----
 3 files changed, 17 insertions(+), 8 deletions(-)
Changes applied before test
commit ad3ecac9beae015b9e9144c1da180097958b2827
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Thu Sep 22 18:12:43 2022 +0200

    model: avoid another extra creation of Model object
    
    Do not create model object while sorting entry before creating model
    object.
    
    This is another case of "let us create object X to prepare the creation
    of object X", slowing things down.
    
    In practice, we will likely skip this code-path after the next
    changeset, however this seems useful to get this performance footgun
    out the way.
    
    We tested this change on simple information of the Mercurial loader,
    with a noop-loader stockage:
    
        swh loader run mercurial https://foss.heptapod.net/mercurial/mercurial-devel directory=/data/repos/mercurial-devel
    
    = Median time of 3 run =
    before  12 minutes 59 seconds
    after:  11 minute  56 seconds
    
    On a profile of the same run, the `to_model` call of the from_disk's `Directory` class took the following percentage:
    before: 24%
    after:  17%

commit 814a6c8416d56f5f8b3e590d419d5aea7a888ab2
Author: Pierre-Yves David <pierre-yves.david@ens-lyon.org>
Date:   Tue Sep 20 14:26:17 2022 +0200

    from_disk: only build a model object once
    
    Before this change, a Directory object was built to compute the `id` of
    we fed to the Directory object we built for `to_model`.
    
    We tested this change on simple information of the Mercurial loader,
    with a noop-loader stockage:
    
        swh loader run mercurial https://foss.heptapod.net/mercurial/mercurial-devel directory=/data/repos/mercurial-devel
    
    = Median time of 3 run =
    before: 17 minutes 48 seconds
    after:  12 minutes 59 seconds
    
    On a profile of the same run, the `to_model` call of the from_disk's `Directory` class took the following percentage:
    before: 43%
    after:  24%

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/527/ for more details.

This revision is now accepted and ready to land.Sep 26 2022, 1:35 PM