Page MenuHomeSoftware Heritage

Google code svn import - 'Too many open file' errors
Closed, ResolvedPublic

Description

On some repositories with supposedly lots of files, an os error about too many open file error is raised.

Steps to reproduce on a local storage with latest swh-loader-svn.

Use /srv/storage/space/mirrors/code.google.com/sources/v2/code.google.com/c/c-semantics/c-semantics-repo.svndump.gz:

dump = 'c-semantics-repo.svndump.gz'
origin_url = 'http://foo/bar/svn'

import logging
logging.basicConfig(level=logging.DEBUG)

from swh.loader.svn.tasks import MountAndLoadSvnRepositoryTsk

t = MountAndLoadSvnRepositoryTsk()
t.run(archive_path=dump, origin_url=origin_url, visit_date='2016-05-03T15:16:32+00:00')

output:

INFO:swh.loader.svn.SvnLoader:Archive to mount and load c-semantics-repo.svndump.gz
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Creating svn origin for http://foo/bar/svn
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Done creating svn origin for http://foo/bar/svn
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Creating svn origin for http://foo/bar/svn
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Done creating svn origin for http://foo/bar/svn
################## ... Strange but not lethal...  (could possibly explain an integrity error i've seen some time ago on origin creation...)
################## oh, yeah, loader-svn needs the origin created prior to the actual base class (loader-core) initializing it
################## possibly, this can be improved but this is not the task at end
################## fixed!
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Creating origin_visit for origin 7 at time 2016-05-03T15:16:32+00:00
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Done Creating origin_visit for origin 7 at time 2016-05-03T15:16:32+00:00
INFO:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:[revision_start-revision_end]: [1-1320]
INFO:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Processing {'local_url': b'/tmp/tmp.v_ezqr26.swh.loader/tmp.a6ouqrqy.swh.loader.svn', 'remote_url': 'file:///tmp/tmp.a6ouqrqy.swh.loader.svn', 'swh-origin': 7, 'uuid': b'9230c010-68a0-4913-9f46-c5e1b1c9998b'}.
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:rev: 1, swhrev: 21cd3d56a4f4fe553d6b0c494579dc2b9590da89, dir: 24e1d3e7645aff42951637d824c0857e91058120
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:rev: 2, swhrev: 482ce1333a76e1f8325ec7cdf9d9d590f3dfbed6, dir: 630bb014fb6ce836ba339a7cf66a7b9c346bd71b
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:rev: 3, swhrev: 0a910abbe53b7d27c961c2c56f65aa6ea4571fea, dir: b310a8dcdfc5e4a6875b0c17a36af372e1acb31c
...
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:rev: 515, swhrev: 3de5df05ced29ce2765ea8007e81e2ec6e469f58, dir: 8e5bbfc7b14f28f8ba2f5acaae081e3f368ce17e
ERROR:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Eventful partial visit. Detail: [Errno 24] Can't open file '/tmp/tmp.a6ouqrqy.swh.loader.svn/db/revs/0/420': Too many open files
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:occ: {'visit': 1, 'target_type': 'revision', 'origin': 7, 'target': b'\x84\xe8\\c^\xba0!\xa5RrzR\xc2\x90H\x9d\xb2\xaaw', 'branch': 'master'}
ERROR:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Loading failure, updating to `partial` status
Traceback (most recent call last):
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-svn/swh/loader/svn/loader.py", line 216, in process_swh_revisions
    self.config['revision_packet_size']):
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-core/swh/core/utils.py", line 40, in grouper
    for _data in itertools.zip_longest(*args, fillvalue=None):
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-svn/swh/loader/svn/loader.py", line 160, in process_svn_revisions
    for rev, nextrev, commit, new_objects, root_directory in gen_revs:
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-svn/swh/loader/svn/svn.py", line 266, in swh_hash_data_per_revision
    objects = self.swhreplay.compute_hashes(rev)
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-svn/swh/loader/svn/ra.py", line 333, in compute_hashes
    self.replay(rev)
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-svn/swh/loader/svn/ra.py", line 318, in replay
    self.conn.replay(rev, rev+1, self.editor)
OSError: [Errno 24] Can't open file '/tmp/tmp.a6ouqrqy.swh.loader.svn/db/revs/0/420': Too many open files

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-core/swh/loader/core/loader.py", line 681, in load
    self.store_data()
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-svn/swh/loader/svn/loader.py", line 308, in store_data
    self.last_known_swh_revision)
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-svn/swh/loader/svn/loader.py", line 496, in process_repository
    svnrepo, revision_start, revision_end, revision_parents)
  File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-svn/swh/loader/svn/loader.py", line 235, in process_swh_revisions
    'id': _id,
##################### raises an eyebrow... where is the rest of the stack trace?
swh.loader.svn.loader.SvnLoaderEventful: [Errno 24] Can't open file '/tmp/tmp.a6ouqrqy.swh.loader.svn/db/revs/0/420': Too many open files
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Updating origin_visit for origin 7 with status partial
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Done updating origin_visit for origin 7 with status partial
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Sending 1 occurrences
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Done sending 1 occurrences
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Clean up temp directory /tmp/tmp.a6ouqrqy.swh.loader.svn for project

The loading fails at revision 515.

Event Timeline

ardumont created this task.Oct 24 2017, 2:03 PM

There is no file descriptor leak.

Everything that's opened is closed at https://forge.softwareheritage.org/source/swh-loader-svn/browse/master/swh/loader/svn/ra.py;8cedf7417dea0b26edca3c6736939bde0a351e35$39.

I'm more inclined to think that the ulimit -n is too low for the loader-svn's case.
The azure workers have one of 65536 (connected and checked).

As i don't see anything in our puppet-environment regarding this, it must be set to the default for our swh-workers (which seems to be 1024).
I cannot check them explicitely, we are in between migration right now, so they are down.
Still, uffizi, getty, prado, etc... are indeed set to 1024 (and i believe they are provisioned the same way).

Anyway, increasing the ulimit -n in my shell (1024 -> 65536), the loader goes far beyond that point.
It then fails on another edge case (about directory already existing - fixed).

ardumont added a comment.EditedOct 25 2017, 10:50 AM

We could use the loader-svn's systemd [service] property LimitNOFILE:

  • 'LimitNOFILE= ulimit -n Number of File Descriptors'.

source:

Looking further into it.
Defining such option for this in our puppet manifest would make this eventually end up in /etc/systemd/system/swh-worker@${SERVICE_NAME}.service.d/parameters.conf.
In this case ${SERVICE_NAME} being swh_loader_svn.

If not provided (the default), the result would be the actual state.

source:

ardumont closed this task as Resolved.Nov 7 2017, 6:38 PM
ardumont claimed this task.
ardumont renamed this task from Google code svn dump - 'Too many open file' errors to Google code svn import - 'Too many open file' errors.Nov 7 2017, 6:45 PM

As usual, to be complete...

uring handling of the above exception, another exception occurred:
...

  1. raises an eyebrow... where is the rest of the stack trace?

I misunderstood the stack trace so it's indeed complete.

It's not perfect but, in that part of the code, we trap all exceptions and raise another one SvnLoaderEventful.
This translates to the error message of one exception was raised within another one.

I found this ugly though but don't know any better.