Page MenuHomeSoftware Heritage

failing worker consumes remaining tasks without processing them
Closed, ResolvedPublic

Description

What is implicit in the parent task T964:

  • out of ram worker is killed (it cannot clean up since it's killed)
  • the node running the worker is then mostly idle for that particular work (in regards to the other sister nodes)
  • so it starts consuming the queue faster than the other workers (since they do actual work)
  • and fails faster
  • resulting in an empty queue in the end

That is what i was trying to solve in T964 (well finding proper solution to implement for the moment).

As I realized it was not explicitly mentioned, opening a dedicated issue for it.

Event Timeline

ardumont created this task.Mar 5 2018, 11:41 AM
ardumont triaged this task as Normal priority.

So far, the solutions i foresee:

|---------------------------------+---------------------------------------+---------------------------------+------------------------------------|
| Possible solution               | Pros                                  | Cons                            | Question                           |
|---------------------------------+---------------------------------------+---------------------------------+------------------------------------|
| 1. Capture OOM event -> force a | - separation of concern (code)        |                                 | - is it possible?                  |
| cleanup (somehow                |                                       |                                 | -> check syslog for killed process |
|                                 |                                       |                                 | -> listen to dedicated event       |
|                                 |                                       |                                 | (per worker)                       |
|                                 |                                       |                                 | -> mem_notify                      |
|                                 |                                       |                                 | https://lwn.net/Articles/267013    |
|---------------------------------+---------------------------------------+---------------------------------+------------------------------------|
| 2. Loader starts -> post a      | - separation of concern (code)        | - edge case can go bad,         | how to determine:                  |
| cleanup message to a queue      |                                       | generating quite some messages  | - which worker to clean?           |
| (as soon as it can)             |                                       | in queue until the current      | -> worker-name as task message     |
|                                 |                                       | clean up takes place            | parameter                          |
|                                 |                                       |                                 | - when to clean?                   |
|                                 |                                       |                                 | -> pid alive check                 |
|                                 |                                       |                                 | (python3-psutil)                   |
|---------------------------------+---------------------------------------+---------------------------------+------------------------------------|
| 3. Loader starts -> check for   | - short-term: pragmatic (current      | - Separation of concern not     |                                    |
| dangling files in a specific    | state)                                | respected (doing much more than |                                    |
| location                        | - can be plugged in loader-core (     | loading                         |                                    |
|                                 | all workers benefit)                  | - must implement this           |                                    |
|                                 | - simpler to implement                | potentially on other layers     |                                    |
|                                 |                                       | (lister, etc                    |                                    |
|---------------------------------+---------------------------------------+---------------------------------+------------------------------------|
| 4. Worker (node?) supervisor    | - separation of concern               |                                 | 1. How to detect falsy behavior:   |
|                                 | - long-term solution                  |                                 | -> checks per worker type:         |
|                                 |                                       |                                 | - disk status                      |
|                                 |                                       |                                 | - ram usage                        |
|                                 |                                       |                                 | - ~task consuming rate             |
|                                 |                                       |                                 | 2. existing techno or ad-hoc?      |
|---------------------------------+---------------------------------------+---------------------------------+------------------------------------|
| 5. Separate nodes for           | - More control over the resource      | - The main issue could still    |                                    |
| specialized workers             | allocation (e.g. more disk for svn    | happen (but probably less       |                                    |
|                                 | and hg workers, less for git ones...) | often though)                   |                                    |
|                                 | - separation of concern (system       | - push the problem at           |                                    |
|                                 | level now)                            | system/provision/deployment     |                                    |
|                                 | - long-term solution                  | level                           |                                    |
|---------------------------------+---------------------------------------+---------------------------------+------------------------------------|

Note: org-mode table

ardumont added a comment.EditedMar 5 2018, 4:36 PM

Still asserting the possibilities (and reading documentation):

|---------------------------------+---------------------------------------+---------------------------------+------------------------------------|
| Possible solution               | Pros                                  | Cons                            | Question                           |
|---------------------------------+---------------------------------------+---------------------------------+------------------------------------|
| 6. Check before scheduling task | - separation of concern               |                                 | How to check node's state?         |
| (same way scheduling is sent to |                                       |                                 |                                    |
| queue or not depending on size) |                                       |                                 |                                    |
|---------------------------------+---------------------------------------+---------------------------------+------------------------------------|
| 7. Chain temporary folder       | - separation of concern (code)        | Same as 2.                      | pre-requisite:                     |
| step creation task + loading    | - can be applied to all loaders, ...  |                                 | force chaining to execute on       |
| task using temporary folder +   |                                       |                                 | the same worker (chains [2])       |
| + cleaning up task (independent |                                       |                                 |                                    |
| from loading task result        |                                       |                                 |                                    |
|---------------------------------+---------------------------------------+---------------------------------+------------------------------------|

[2] http://docs.celeryproject.org/en/latest/userguide/canvas.html#canvas-chain
ardumont added a comment.EditedMar 7 2018, 3:48 PM
|---------------------------------+---------------------------------------+---------------------------------+------------------------------------|
| Possible solution               | Pros                                  | Cons                            | Question                           |
|---------------------------------+---------------------------------------+---------------------------------+------------------------------------|
| 8. Celery signal on postrun     | - separation of concern               | Not working, in OOM kill        | Documentation [3]                  |
|                                 |                                       | scenario, task is killed        |                                    |
|                                 |                                       | (tested)                        |                                    |
|---------------------------------+---------------------------------------+---------------------------------+------------------------------------|


[3] http://docs.celeryproject.org/en/latest/userguide/signals.html#task-postrun
ardumont closed this task as Resolved.Mar 9 2018, 11:17 AM

Well, after much digging documentations and some tryouts. Finally gave in to solution 3.

So, now the loaders can implement a pre_cleanup method (loader-core, does nothing by default) to try and clean up after their typed siblings (svn, mercurial).

The loaders already used temporary folders for their computations (which is now sandboxed at the systemd level, so no collision should happen between loaders).
Now, they still use temporary folders, but those follows patterned names (swh.loader.{type}-{unique-noise}-{pid}).

Logic:
Waking up, a task checks for dangling folders in the upper root temporary location (configuration).
They check the folder name pattern matching (according to type) and pid existence:
If (no folder or nothing matches or name matches and pid live) then do nothing and continue with loading
If name matches and pid does not exist then clean up that folder and continue with loading

This has been implemented for svn and mercurial using a common method installed in the loader-core.