
Content archiver director tests failure
Closed, Migrated

Description

Hi,

While reviewing D40, I have noticed that the following tests for the content archiver fail:

======================================================================
ERROR: A content missing with enough copies shouldn't be archived.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/ndandrim/work/swh-environment/swh-storage/swh/storage/tests/test_archiver.py", line 143, in archive_already_enough
    director.run()
  File "/home/ndandrim/work/swh-environment/swh-storage/swh/storage/archiver/director.py", line 107, in run
    run_fn(batch)
  File "/home/ndandrim/work/swh-environment/swh-storage/swh/storage/archiver/director.py", line 119, in run_sync_worker
    task = app.tasks[task_name]
  File "/usr/lib/python3/dist-packages/celery/app/registry.py", line 26, in __missing__
    raise self.NotRegistered(key)
celery.exceptions.NotRegistered: 'swh.storage.archiver.tasks.SWHArchiverTask'

======================================================================
ERROR: Run archiver on a missing content should archive it.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/ndandrim/work/swh-environment/swh-storage/swh/storage/tests/test_archiver.py", line 123, in archive_missing_content
    self.archiver.run()
  File "/home/ndandrim/work/swh-environment/swh-storage/swh/storage/archiver/director.py", line 107, in run
    run_fn(batch)
  File "/home/ndandrim/work/swh-environment/swh-storage/swh/storage/archiver/director.py", line 119, in run_sync_worker
    task = app.tasks[task_name]
  File "/usr/lib/python3/dist-packages/celery/app/registry.py", line 26, in __missing__
    raise self.NotRegistered(key)
celery.exceptions.NotRegistered: 'swh.storage.archiver.tasks.SWHArchiverTask'

======================================================================
ERROR: A content that is not 'missing' shouldn't be archived.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/ndandrim/work/swh-environment/swh-storage/swh/storage/tests/test_archiver.py", line 133, in archive_present_content
    self.archiver.run()
  File "/home/ndandrim/work/swh-environment/swh-storage/swh/storage/archiver/director.py", line 107, in run
    run_fn(batch)
  File "/home/ndandrim/work/swh-environment/swh-storage/swh/storage/archiver/director.py", line 119, in run_sync_worker
    task = app.tasks[task_name]
  File "/usr/lib/python3/dist-packages/celery/app/registry.py", line 26, in __missing__
    raise self.NotRegistered(key)
celery.exceptions.NotRegistered: 'swh.storage.archiver.tasks.SWHArchiverTask'

----------------------------------------------------------------------

Nothing jumped out at me in the director code (the tasks module is properly imported), so I don't really know what is going wrong.
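For readers unfamiliar with this error: each traceback ends in the registry's __missing__ hook, which runs when a task name is looked up but was never registered. The sketch below reproduces that mechanism with a plain dict subclass. It is a stand-in, not Celery's code: TaskRegistry, lookup and the lambda task are hypothetical names introduced for illustration, and only the task name is taken from the traceback.

```python
class NotRegistered(KeyError):
    """Stand-in for the exception seen in the traceback."""


class TaskRegistry(dict):
    """Toy registry: looking up an unknown task name raises NotRegistered."""

    NotRegistered = NotRegistered

    def __missing__(self, key):
        # dict.__getitem__ calls this hook when the key is absent.
        raise self.NotRegistered(key)

    def register(self, name, task):
        self[name] = task


TASK_NAME = "swh.storage.archiver.tasks.SWHArchiverTask"

registry = TaskRegistry()


def lookup(name):
    """Return the task, or a short diagnostic listing what is registered."""
    try:
        return registry[name]
    except NotRegistered:
        return "not registered; known tasks: %s" % sorted(registry)


# Before registration the lookup fails, as in the failing tests.
before = lookup(TASK_NAME)

# Once something registers the task under that exact name, it succeeds.
registry.register(TASK_NAME, lambda batch: len(batch))
after = lookup(TASK_NAME)
```

Listing the names that are actually registered is often the quickest way to tell a missing registration apart from a misspelled task name.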

Event Timeline

I can't reproduce this bug when I run the archiver tests in asynchronous mode.
Maybe this is a celery configuration issue? It seems the task is not registered with a worker.

@qcampos: I'm running make test in the swh-storage repo. I believe @jbertran had the same test failures when we looked at it yesterday.

Definitely a celery issue: make test runs without any error on my side when the tests run in synchronous mode (using run_sync_worker).

A content missing with enough copies shouldn't be archived. ... ok
Run archiver on a missing content should archive it. ... ok
A content that is not 'missing' shouldn't be archived. ... ok

Also, I noticed that the docstring of a nosetest function is used not only when the test fails, to explain the expected behavior, but also when it succeeds, replacing its name in the list. I don't know whether that's good or not, but it doesn't match the other tests.
Should I change those docstrings into simple comments?
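To illustrate the docstring question with something runnable: the standard library's unittest exposes the first line of a test's docstring through shortDescription(), and returns None when the test only has a comment. This is a minimal sketch using unittest rather than nose itself; the class and method names are hypothetical, and the docstring is borrowed from the test output above.

```python
import unittest


class ArchiverDocstringDemo(unittest.TestCase):
    def test_with_docstring(self):
        """A content that is not 'missing' shouldn't be archived."""
        self.assertTrue(True)

    def test_with_comment(self):
        # Same intent, written as a comment: there is no short description,
        # so a runner can only fall back to the test's name.
        self.assertTrue(True)


# shortDescription() returns the first docstring line, or None without one.
with_doc = ArchiverDocstringDemo("test_with_docstring").shortDescription()
with_comment = ArchiverDocstringDemo("test_with_comment").shortDescription()
```

Turning the docstrings into comments would therefore bring the function names back in the report, at the cost of losing the one-line statement of intent.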

@olasd I don't reproduce this.
make test is ok on my side.
(I'm on the latest commit of every repository from swh-environment.)

In T436#6866, @qcampos wrote:

Definitely a celery issue: make test runs without any error on my side when the tests run in synchronous mode (using run_sync_worker).

What does "when tests run in synchronous mode" mean? Is that a configuration variable? Where is it supposed to be set?

Definitely a celery issue:

I'm sorry, I don't follow, what makes you say that?

make test runs without any error on my side when the tests run in synchronous mode (using run_sync_worker).

Please, @qcampos, refresh my memory on this: how do we run make test in async mode?
Ah yes, by adding the 'asynchronous' flag in some .ini file... I don't remember which one though...
(can you please refresh my memory on this as well?)

FWIW, regarding swh-storage, I only have ~/.swh/storage.ini on my side,
with the bare minimum:

[main]
db=service=swh-dev
storage_base=/home/storage/swh-storage/

So the configuration used for the tests should be the default one embedded in the archiver.
Or am I wrong somewhere?

Also, I noticed that the docstring of a nosetest function is used not only when the test fails, to explain the expected behavior, but also when it succeeds, replacing its name in the list. I don't know whether that's good or not, but it doesn't match the other tests.
Should I change those docstrings into simple comments?

I, for one, did not know about that. It is a nice way to explain the intent of your tests.
Also, this looks more like the way forward. So I'd say leave them as they are ^^

What does "when tests run in synchronous mode" mean?

It's the default: a while True loop that runs indefinitely,
which can be avoided by using celery in asynchronous mode.

https://forge.softwareheritage.org/diffusion/DSTO/browse/master/swh/storage/archiver/director.py;363f3e6bcfce6488e505ca8949fb67b36a7bfa8f$94

Is that a configuration variable?

Yes, 'asynchronous'.

Where is it supposed to be set?

I don't remember...
Ah, yes, you pass the configuration file to the archiver's cli.
https://forge.softwareheritage.org/diffusion/DSTO/browse/master/swh/storage/archiver/director.py;363f3e6bcfce6488e505ca8949fb67b36a7bfa8f$222
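As a rough illustration of how an 'asynchronous' flag could be read from an .ini file with a fallback when it is absent, here is a minimal sketch using the standard library's configparser. It is not the archiver's actual loading code: the read_asynchronous helper, the choice of the [main] section for the flag, and the False fallback are all assumptions made for the example. The sample content is the storage.ini quoted earlier in this thread.

```python
import configparser

# Sample content copied from the storage.ini quoted in this thread.
SAMPLE = """
[main]
db=service=swh-dev
storage_base=/home/storage/swh-storage/
"""


def read_asynchronous(text, default=False):
    """Return the 'asynchronous' flag from [main], or `default` if unset.

    Assumption: the section name and the default value are illustrative.
    """
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser.getboolean("main", "asynchronous", fallback=default)


# The minimal file does not mention the flag, so the fallback applies.
flag_when_absent = read_asynchronous(SAMPLE)

# Appending the key to the same section switches it on.
flag_when_set = read_asynchronous(SAMPLE + "asynchronous=true\n")
```

With a minimal file like the one above, whatever default is embedded in the code decides the mode, which is why two checkouts with different local configs or code could behave differently.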

In T436#6870, @olasd wrote:

What does "when tests run in synchronous mode" mean? Is that a configuration variable? Where is it supposed to be set?

In T436#6871, @ardumont wrote:

make test runs without any error on my side when the tests run in synchronous mode (using run_sync_worker).

Please, @qcampos, refresh my memory on this: how do we run make test in async mode?
Ah yes, by adding the 'asynchronous' flag in some .ini file... I don't remember which one though...
(can you please refresh my memory on this as well?)

[...]

FWIW, regarding swh-storage, I only have ~/.swh/storage.ini on my side,
with the bare minimum:

[main]
db=service=swh-dev
storage_base=/home/storage/swh-storage/

So the configuration used for the tests should be the default one embedded in the archiver.
Or am I wrong somewhere?

This can be set in the archiver config file. However, this is just a flag in the archiver director constructor that is set to false to allow the tests to run without celery.
In the error @olasd copy-pasted, the tests run in synchronous mode, which means the whole task machinery is used, but not the celery queue.

Could you please try running the tests after moving all the softwareheritage-related configs away?
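The distinction above can be sketched as follows: a flag picks the worker function, and the synchronous path still resolves the task by name before calling it directly, so a missing registration breaks it even though no queue is involved. This is a simplified, hypothetical model and not the real director: DirectorSketch, its plain-dict registry and its list-based queue are assumptions. Only run, run_fn, run_sync_worker and the task name come from the traceback.

```python
TASK_NAME = "swh.storage.archiver.tasks.SWHArchiverTask"


class DirectorSketch:
    """Hypothetical director: chooses a worker function from a flag."""

    def __init__(self, tasks, asynchronous=False):
        self.tasks = tasks              # name -> callable (stand-in registry)
        self.asynchronous = asynchronous
        self.queued = []                # stand-in for a message queue

    def run(self, batches):
        run_fn = (self.run_async_worker if self.asynchronous
                  else self.run_sync_worker)
        return [run_fn(batch) for batch in batches]

    def run_sync_worker(self, batch):
        # No queue here, but the task is still looked up by name:
        # an unregistered task fails at this line.
        task = self.tasks[TASK_NAME]
        return task(batch)

    def run_async_worker(self, batch):
        # Assumption: the asynchronous path only enqueues the batch.
        self.queued.append(batch)
        return None


registered = DirectorSketch({TASK_NAME: len})
sync_results = registered.run([["a", "b"], ["c"]])
```

In this model a synchronous run with an empty registry raises KeyError on the lookup, which mirrors where the tracebacks in the description stop.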

This can be set in the archiver config file.
However, this is just a flag in the archiver director constructor that is set to false to allow the tests to run without celery.

yub yub.
Thanks for the refresher ^^

In the error @olasd copy-pasted, the tests run in synchronous mode, which means the whole task machinery is used, but not the celery queue.

Yes, when I reviewed your code, we agreed to make the tests run in sync mode to avoid having to deal with queues in the test context. It's even commented ^^

The question is, how come we have diverging behavior?

Could you please try running the tests after moving all the softwareheritage-related configs away?

Running... and still ok.

I don't know if it's related, but I got a segmentation fault when running the archiver's tests in asynchronous mode.

celery@grand-palais ready.
[2016-06-13 14:58:36,858: INFO/MainProcess] Received task: swh.storage.archiver.tasks.SWHArchiverTask[ad7b8191-a2cb-4281-aaf1-694358d8d520]
Segmentation fault

The test report shows a 400 error for the request:

requests.packages.urllib3.connectionpool: INFO: Starting new HTTP connection (1): 127.0.0.1
requests.packages.urllib3.connectionpool: DEBUG: "POST /content/get HTTP/1.1" 400 71

However:

  • Running the tests in synchronous mode works
  • Running tests/manual_test_archive.py in asynchronous mode (which does exactly the same thing as the failing test: run the archiver asynchronously on a batch of content to a remote storage) works as well.

I don't know what's going on. I thought it might be a problem with the background storage launched for the tests, and a small delay before the task is executed, but adding a retry loop with a timeout around the get that causes the 400 error didn't solve it.
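For reference, a retry loop with a timeout of the kind described above could look like the sketch below. It is not the code that was actually tried: retry_until, FlakyGet and the timing values are hypothetical, and the failing call is simulated instead of hitting a real storage.

```python
import time


def retry_until(fetch, timeout=2.0, interval=0.05):
    """Call fetch() until it stops raising or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            return fetch()
        except Exception as exc:
            if time.monotonic() >= deadline:
                raise TimeoutError("gave up after %.1fs" % timeout) from exc
            time.sleep(interval)


class FlakyGet:
    """Hypothetical stand-in for the storage get: fails a few times first."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def __call__(self):
        self.calls += 1
        if self.calls <= self.failures:
            raise RuntimeError("HTTP 400")
        return b"content"


# Two simulated failures, then success on the third call.
flaky = FlakyGet(failures=2)
result = retry_until(flaky)
```

A loop like this only helps when the failure is transient. If the request keeps failing for a structural reason, it simply ends in a timeout, which is consistent with the retry not solving the problem here.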

Edit: It seems that the segmentation fault is not related to the error, as it occurs even when all the tests succeed. (The test that expects the archiver to ignore a file succeeds because celery segfaulted and didn't archive it, but it is not an error during the test that caused celery to crash.)

I really don't understand those two errors.

Gah. ._. I had some local changes in swh-scheduler and the tasks wouldn't be registered in celery. Sorry for the hassle...