Page MenuHomeSoftware Heritage

Remove dead code from the Python interface
ClosedPublic

Authored by seirl on May 11 2022, 4:13 PM.

Details

Summary
  • api_client: unused CLI command that just queried a server and printed its statistics
  • maps / swhid.py: was used to translate the mappings of the SWHIDs to node IDs. Now one of the mappings has disappeared (MPH is used instead) and this translation entirely happens in the Java side. This entire code is no longer needed.
  • cachemount: can be trivially replaced by a simple systemd service, already how it currently is set up in production:
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=mkdir -p /dev/shm/swh-graph/default
ExecStart=sh -c "ln -s /.../compressed/* /dev/shm/swh-graph/default"
ExecStart=sh -c "cp --remove-destination /.../compressed/graph.graph /dev/shm/swh-graph/default"
ExecStart=sh -c "cp --remove-destination /.../compressed/graph-transposed.graph /dev/shm/swh-graph/default"
ExecStop=rm -rf /dev/shm/swh-graph/default

Prod should decide what are the correct tradeoffs as to what should be
put in ram and what should be symlinked, instead of leaving that
responsibility to the cachemount command.

Diff Detail

Repository
rDGRPH Compressed graph representation
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

seirl edited the summary of this revision. (Show Details)

Build is green

Patch application report for D7814 (id=28220)

Rebasing onto 6a7d2ad35e...

Current branch diff-target is up to date.
Changes applied before test
commit c5a5a76acb28599103df64d62c60e5b3bc3eb728
Author: Antoine Pietri <antoine.pietri1@gmail.com>
Date:   Wed May 11 16:05:28 2022 +0200

    Remove dead code from the Python interface
    
    - api_client: unused CLI command that just queried a server and printed
      its statistics
    
    - maps / swhid.py: was used to translate the mappings of the SWHIDs to
      node IDs. Now one of the mappings has disappeared (MPH is used
      instead) and this translation entirely happens in the Java side. This
      entire code is no longer needed.
    
    - cachemount: can be trivially replaced by a simple systemd service,
      already how it currently is set up in production:
    
            [Service]
            Type=oneshot
            RemainAfterExit=yes
            ExecStart=mkdir -p /dev/shm/swh-graph/default
            ExecStart=sh -c "ln -s /.../compressed/* /dev/shm/swh-graph/default"
            ExecStart=sh -c "cp --remove-destination /.../compressed/graph.graph /dev/shm/swh-graph/default"
            ExecStart=sh -c "cp --remove-destination /.../compressed/graph-transposed.graph /dev/shm/swh-graph/default"
            ExecStop=rm -rf /dev/shm/swh-graph/default
    
      Prod should decide what are the correct tradeoffs as to what should be
      put in ram and what should be symlinked, instead of leaving that
      responsibility to the cachemount command.

See https://jenkins.softwareheritage.org/job/DGRPH/job/tests-on-diff/187/ for more details.

seirl requested review of this revision.May 11 2022, 4:18 PM
This revision is now accepted and ready to land.May 11 2022, 4:57 PM

@seirl: So, for future version deployments, nothing changes, right?

@seirl: So, for future version deployments, nothing changes, right?

Nope, nothing at all. It's literally dead code.

@seirl: So, for future version deployments, nothing changes, right?

Nope, nothing at all. It's literally dead code.

Great, thx.

zack requested changes to this revision.May 11 2022, 10:40 PM

I'm fine with this code cleanup, with one caveat: that we document/ship the systemd startup service (and its meaning, including some intuitions about the trade-offs you mention) somewhere, in replacement of the cachemount command.

This revision now requires changes to proceed.May 11 2022, 10:40 PM

I'm fine with this code cleanup, with one caveat: that we document/ship the systemd startup service (and its meaning, including some intuitions about the trade-offs you mention) somewhere, in replacement of the cachemount command.

right.

In D7814#203336, @zack wrote:

I'm fine with this code cleanup, with one caveat: that we document/ship the systemd startup service (and its meaning, including some intuitions about the trade-offs you mention) somewhere, in replacement of the cachemount command.

It's already in puppet (swh-site/site-modules/profile/templates/swh/deploy/graph/swhgraphshm.service.erb).

I will document the memory tradeoffs, but this will be part of a larger doc section and it can't really be part of this diff.

In D7814#203592, @seirl wrote:
In D7814#203336, @zack wrote:

I'm fine with this code cleanup, with one caveat: that we document/ship the systemd startup service (and its meaning, including some intuitions about the trade-offs you mention) somewhere, in replacement of the cachemount command.

It's already in puppet (swh-site/site-modules/profile/templates/swh/deploy/graph/swhgraphshm.service.erb).

But that is precisely my point. We have users of swh-graph who have no idea about the SWH infrastructure setup, but they might want to be able to charge the graph in memory only once and conduct multiple experiments on it via the Java API. Before this change, although with the limitations of the cachemount command, they had a chance to discover this way of working. With the cachemount command gone and the systemd manifest in an entirely different place, they will have no chance of finding out about this.

Put it otherwise: documenting the systemd service is documentation for *external* users, whereas the puppet manifest is documentation for *internal* (= SWH) needs.

This revision is now accepted and ready to land.May 13 2022, 4:37 PM