Out of memory on granet
Started, Work in Progress, High, Public

Description

Granet has been running out of memory every night for the past couple of days.

When the OOM occurs, the graph backend is killed, interrupting the service (among other things):

Sep 08 01:18:43 granet swh[3702779]: INFO:aiohttp.access:192.168.100.31 [08/Sep/2022:01:18:14 +0000] "GET /graph/leaves/swh:1:cnt:aea17e58c32146ba8ab7cd6db067c8effb7a4161?direction=backward&resolve_origins=true&limit=1&max_edges=0 HTTP/1.1" 200 206 "-" "python-requests/2.
Sep 08 01:18:43 granet swh[3702779]: INFO:aiohttp.access:192.168.100.31 [08/Sep/2022:01:18:14 +0000] "GET /graph/leaves/swh:1:cnt:5db7ca17bcd5bc303981444e5d24bb11d9bd9ca1?direction=backward&resolve_origins=true&limit=1&max_edges=0 HTTP/1.1" 200 206 "-" "python-requests/2.
Sep 08 01:18:43 granet swh[3702779]: INFO:aiohttp.access:192.168.100.31 [08/Sep/2022:01:18:13 +0000] "GET /graph/leaves/swh:1:cnt:08b12086f6e478d0ab4523cc5468808185e0bec4?direction=backward&resolve_origins=true&limit=1&max_edges=0 HTTP/1.1" 200 206 "-" "python-requests/2.
Sep 08 01:18:43 granet swh[3702779]: INFO:aiohttp.access:192.168.100.31 [08/Sep/2022:01:18:24 +0000] "GET /graph/leaves/swh:1:cnt:2fa8bdd4f1f4a75b116ae90ddc31f757325f8b61?direction=backward&resolve_origins=true&limit=1&max_edges=0 HTTP/1.1" 200 206 "-" "python-requests/2.
Sep 08 01:18:49 granet systemd[1]: prometheus-node-exporter-ipmitool-sensor.service: Failed to fork: Cannot allocate memory
Sep 08 01:18:49 granet systemd[1]: prometheus-node-exporter-ipmitool-sensor.service: Failed to run 'start' task: Cannot allocate memory
Sep 08 01:18:49 granet systemd[1]: prometheus-node-exporter-ipmitool-sensor.service: Failed with result 'resources'.
Sep 08 01:18:50 granet systemd[1]: Failed to start Collect ipmitool sensor metrics for prometheus-node-exporter.
Sep 08 01:18:50 granet swh[3702779]: INFO:aiohttp.access:192.168.100.31 [08/Sep/2022:01:18:26 +0000] "GET /graph/leaves/swh:1:cnt:c196bd382501941f5fa8bddf1c9e097b75baa64a?direction=backward&resolve_origins=true&limit=1&max_edges=0 HTTP/1.1" 200 4236 "-" "python-requests/2
Sep 08 01:18:51 granet swh[3702779]: INFO:aiohttp.access:192.168.100.31 [08/Sep/2022:01:18:30 +0000] "GET /graph/leaves/swh:1:cnt:bdd879ad9a0b41df6f0a9a6435b14567ecc57fa3?direction=backward&resolve_origins=true&limit=1&max_edges=0 HTTP/1.1" 200 308 "-" "python-requests/2.
Sep 08 01:18:52 granet swh[3702779]: INFO:aiohttp.access:192.168.100.31 [08/Sep/2022:01:18:30 +0000] "GET /graph/leaves/swh:1:cnt:5c1d96699c8fa3fa3d75859eb90c2c5a9312b93c?direction=backward&resolve_origins=true&limit=1&max_edges=0 HTTP/1.1" 200 206 "-" "python-requests/2.
Sep 08 01:18:52 granet swh[3702779]: INFO:aiohttp.access:192.168.100.31 [08/Sep/2022:01:18:30 +0000] "GET /graph/leaves/swh:1:cnt:5b827189f4245d9c485898ab81c69d57703b1658?direction=backward&resolve_origins=true&limit=1&max_edges=0 HTTP/1.1" 200 206 "-" "python-requests/2.
Sep 08 01:18:55 granet swh[3702779]: INFO:aiohttp.access:192.168.100.31 [08/Sep/2022:01:18:30 +0000] "GET /graph/leaves/swh:1:cnt:1d4b62add77313ef18e87faa34776c3c71c3aba5?direction=backward&resolve_origins=true&limit=1&max_edges=0 HTTP/1.1" 200 768 "-" "python-requests/2.
Sep 08 01:19:01 granet swh[3702779]: INFO:aiohttp.access:192.168.100.31 [08/Sep/2022:01:18:34 +0000] "GET /graph/leaves/swh:1:cnt:8cdca2dffa7f1a0ed1afabe81757bc6a3daf886b?direction=backward&resolve_origins=true&limit=1&max_edges=0 HTTP/1.1" 200 308 "-" "python-requests/2.
Sep 08 01:19:02 granet swh[3702779]: INFO:aiohttp.access:192.168.100.31 [08/Sep/2022:01:18:34 +0000] "GET /graph/leaves/swh:1:cnt:97ecfd5b6eaf83587eac967a9aa4d0c24d7bf0c0?direction=backward&resolve_origins=true&limit=1&max_edges=0 HTTP/1.1" 200 206 "-" "python-requests/2.
Sep 08 01:19:11 granet sshd[3958]: error: fork: Cannot allocate memory
Sep 08 01:19:41 granet sshd[3958]: error: fork: Cannot allocate memory
Sep 08 01:19:42 granet swh[3702779]: INFO:aiohttp.access:192.168.100.31 [08/Sep/2022:01:18:40 +0000] "GET /graph/leaves/swh:1:cnt:120dfcd453a1c1e1f4a7c19534f22e66a0de0402?direction=backward&resolve_origins=true&limit=1&max_edges=0 HTTP/1.1" 200 206 "-" "python-requests/2.
Sep 08 01:19:43 granet swh[3702779]: ERROR:root:Cannot write to closing transport
...
Sep 08 01:20:16 granet swh[3702779]:     await self._flush_buffer()
Sep 08 01:20:57 granet kernel: pool-1-thread-1 invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0
Sep 08 01:20:57 granet kernel: pool-1-thread-1 cpuset=/ mems_allowed=0-1
Sep 08 01:20:57 granet kernel: CPU: 16 PID: 3740405 Comm: pool-1-thread-1 Tainted: P           OE     4.19.0-20-amd64 #1 Debian 4.19.235-1
Sep 08 01:20:57 granet kernel: Hardware name: Dell Inc. PowerEdge R740xd/014X06, BIOS 2.13.3 12/13/2021
Sep 08 01:20:57 granet kernel: Call Trace:
Sep 08 01:20:57 granet kernel:  dump_stack+0x66/0x81
Sep 08 01:20:57 granet kernel:  dump_header+0x6b/0x283
Sep 08 01:20:57 granet kernel:  oom_kill_process.cold.30+0xb/0x1cf
Sep 08 01:20:57 granet kernel:  ? oom_badness+0x23/0x140
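
To make the sequence in the excerpt above easier to follow (fork failures in other services first, then the kernel OOM killer), here is a minimal sketch that picks the memory-pressure events out of journal lines. The function name `memory_events` and the event labels are made up for illustration; the sample lines are copied from the log above, and the parsing assumes the syslog-style prefix shown there.

```python
import re

# Syslog-style prefix as seen in the excerpt: "Sep 08 01:18:49 granet unit[pid]: ..."
PREFIX = re.compile(
    r"^(?P<ts>\w{3} \d{2} \d{2}:\d{2}:\d{2}) (?P<host>\S+) (?P<unit>[^:\[\s]+)"
)
# Kernel line emitted when the OOM killer is triggered by some thread.
OOM_INVOKED = re.compile(r"kernel: (?P<comm>\S+) invoked oom-killer")
# Message logged by processes whose fork() failed for lack of memory.
ALLOC_FAILED = "Cannot allocate memory"


def memory_events(lines):
    """Yield (timestamp, unit, kind) for each line that shows memory pressure."""
    for line in lines:
        m = PREFIX.match(line)
        if not m:
            continue  # not a journal line in the expected format
        if OOM_INVOKED.search(line):
            yield m["ts"], m["unit"], "oom-killer"
        elif ALLOC_FAILED in line:
            yield m["ts"], m["unit"], "alloc-failed"


# Sample lines copied from the journal excerpt in this task.
SAMPLE = [
    "Sep 08 01:18:49 granet systemd[1]: prometheus-node-exporter-ipmitool-sensor.service: Failed to fork: Cannot allocate memory",
    "Sep 08 01:19:11 granet sshd[3958]: error: fork: Cannot allocate memory",
    "Sep 08 01:19:43 granet swh[3702779]: ERROR:root:Cannot write to closing transport",
    "Sep 08 01:20:57 granet kernel: pool-1-thread-1 invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0",
]

if __name__ == "__main__":
    for ts, unit, kind in memory_events(SAMPLE):
        print(ts, unit, kind)
```

The same function could be pointed at `sys.stdin` to scan a longer journal dump. It only reports what the log already says and makes no attempt to identify which process is responsible for the memory usage.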

Event Timeline

vsellier triaged this task as High priority. Thu, Sep 8, 9:38 AM
vsellier created this task.

@vlorentz I assigned the task to you because, if I'm not mistaken, you are running some experiments on granet.
I don't know what they are, but you should be gentler with the server.

I'll try reducing -Xmx again...

Nope, I can't lower it.

I guess I'll have to rewrite seirl's FindEarliestRevision.java to use the gRPC protocol instead of loading the graph itself

vlorentz changed the task status from Open to Work in Progress. Fri, Sep 9, 2:36 PM
vlorentz moved this task from Backlog to In progress on the Compressed graph service board.