Page MenuHomeSoftware Heritage

opam: Share opam root directory even on multiple instances
ClosedPublic

Authored by ardumont on Sep 21 2021, 6:23 PM.

Details

Summary

That avoid having multiple distinct opam root directories per opam lister instance,
hence use less side effects (less disk space, less network) . The current opam commands
used by the lister are actually listing specifically per instance.

Related to P1171
Related to T3590

Test Plan

tox

Diff Detail

Repository
rDLS Listers
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D6316 (id=22950)

Rebasing onto 5ab6b00408...

Current branch diff-target is up to date.
Changes applied before test
commit 9af44c655ab9526854108f07a43a8f986b9d09ca
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Sep 21 18:18:49 2021 +0200

    opam: Share opam root directory even on multiple instances
    
    That avoid having multiple distinct opam root directories per opam lister instance. The
    current opam commands used by the lister are actually listing specifically per instance.
    
    Related to P1171

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/377/ for more details.

Can two listers run in parallel while using the same directory?

Can two listers run in parallel while using the same directory?

yes, but that state should really be dealt with outside the lister.
I'm working on that in D6305.

It's still there though so maybe it's actually runnable within
say the docker dev environment.

Sorry, my question was about opam itself. Are you sure it won't corrupt data, or wait on a global lock when accessing that directory?

Sorry, my question was about opam itself. Are you sure it won't corrupt data, or wait on a global lock when accessing that directory?

That's a good question which i don't know the answer to.
I do plan to actually test this in docker (i need some adaptations in the loader first), then in staging to check if everyting is running smoothly without corruption nor lock.

Ok.

What is the motivation for merging directories?

Ok.

What is the motivation for merging directories?

Use less disk space and less network access during the run per worker.

I think there was already a problem before, but since we have now more chance to hit it, I'd really like the opam_init process to lock the directory when running opam commands.

Following what i said in the loader diff, i'm actually closing this.
Ack on the lock folder but i won't attend to it immediately.

[1] D6318

Following what i said in the loader diff, i'm actually closing this.
Ack on the lock folder but i won't attend to it immediately.

[1] D6318

As i was wrong in my implementation of the loader implementation and @aleo made me realize, i've fixed it.
So now that lister diff becomes relevant again, so claimed it back.

I think there was already a problem before, but since we have now more chance to hit it, I'd really like the opam_init process to lock the directory when running opam commands.

It's a great idea but i've no idea how to actually do that though.

Maybe adding --safe flag [1] during the command that actually list the packages would be enough instead.
I've actually added that for the loader [2] (for the command that also read information)

[1]

--safe, --readonly
    Make sure nothing will be automatically updated or rewritten. Useful for calling from completion scripts, for example. Will fail whenever such an operation is needed ; also avoids waiting for locks, skips interactive
    questions and overrides the $OPAMDEBUG variable. This is equivalent to set environment variable $OPAMSAFE.

[2] D6318

Following what i said in the loader diff, i'm actually closing this.
Ack on the lock folder but i won't attend to it immediately.

[1] D6318

As i was wrong in my implementation of the loader implementation and @aleo made me realize, i've fixed it.
So now that lister diff becomes relevant again, so claimed it back.

I think there was already a problem before, but since we have now more chance to hit it, I'd really like the opam_init process to lock the directory when running opam commands.

It's a great idea but i've no idea how to actually do that though.

Maybe adding --safe flag [1] during the command that actually list the packages would be enough instead.
I've actually added that for the loader [2] (for the command that also read information)

[1]

--safe, --readonly
    Make sure nothing will be automatically updated or rewritten. Useful for calling from completion scripts, for example. Will fail whenever such an operation is needed ; also avoids waiting for locks, skips interactive
    questions and overrides the $OPAMDEBUG variable. This is equivalent to set environment variable $OPAMSAFE.

[2] D6318

You may use fcntl.flock for this

You may use fcntl.flock for this

I mean using an empty (lock) file in the opam_root directory.

You may use fcntl.flock for this

I mean using an empty (lock) file in the opam_root directory.

See also 3rd party libraries like https://pypi.org/project/filelock/

Add tests around mock-init (no lock just yet, maybe in another diff)

Build has FAILED

Patch application report for D6316 (id=23037)

Rebasing onto 5ab6b00408...

Current branch diff-target is up to date.
Changes applied before test
commit 1f43183537afe8f5671357dcee929b41ffc632e0
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Sep 21 18:18:49 2021 +0200

    opam: Share opam root directory even on multiple instances
    
    That avoids having multiple distinct opam root directories per opam lister instance. The
    current opam commands used by the lister are actually listing specifically per instance.
    
    Related to P1171

Link to build: https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/379/
See console output for more information: https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/379/console

Build is green

Patch application report for D6316 (id=23039)

Rebasing onto 5ab6b00408...

Current branch diff-target is up to date.
Changes applied before test
commit e7716c0122e9c4a411ed54dde821b48939eb2ebb
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Sep 21 18:18:49 2021 +0200

    opam: Share opam root directory even on multiple instances
    
    That avoids having multiple distinct opam root directories per opam lister instance. The
    current opam commands used by the lister are actually listing specifically per instance.
    
    Related to P1171

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/380/ for more details.

You may use fcntl.flock for this

I mean using an empty (lock) file in the opam_root directory.

See also 3rd party libraries like https://pypi.org/project/filelock/

@douardda I've started a work [1] on this.

But in the end, i don't think that's the quite needed.
The expected workers for loading will actually be deployed with a concurrency of 1 [2]
(and the lister will run once a day).

What do you think?

[1] D6337

[2] That was sufficiently fast that way.

You may use fcntl.flock for this

I mean using an empty (lock) file in the opam_root directory.

See also 3rd party libraries like https://pypi.org/project/filelock/

@douardda I've started a work [1] on this.

But in the end, i don't think that's the quite needed.
The expected workers for loading will actually be deployed with a concurrency of 1 [2]
(and the lister will run once a day).

What do you think?

[1] D6337

As mentioned in the ^, opam should take care of the lock.

[2] That was sufficiently fast that way.

This revision is now accepted and ready to land.Sep 28 2021, 6:55 PM