Software Heritage

docker: Use swh-search with memory backend as default search engine
Closed, Public

Authored by anlambert on Sep 15 2022, 3:41 PM.

Details

Summary

In order to have a search behavior close to the production one, add services
related to archive search in the default docker-compose.yml file:

  • swh-search: remote search service using a memory backend
  • swh-search-journal-client-objects: journal client feeding swh-search with objects loaded into the archive (typically origins)
  • swh-search-journal-client-indexed: journal client feeding swh-search with results of swh-indexer processing on archived objects

As a consequence, this simplifies the docker-compose.search.yml file, which is now only needed
to run a swh-search service with the more costly elasticsearch backend.

Related to D8485
Related to D8487
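The data flow behind the three services listed above can be sketched with a toy, in-memory search index in Python. This is only an illustration under stated assumptions: the message shapes, `InMemorySearch`, `on_object`, `on_indexed` and `visit_type_counts` are hypothetical and are not the swh-search API or its document schema. The sketch shows why both journal clients matter: the objects client makes origins findable by URL (and records their visit types), while the indexed client adds metadata that searches can also match.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class OriginDoc:
    """Hypothetical per-origin search document (not the swh-search schema)."""

    url: str
    visit_types: set[str] = field(default_factory=set)
    metadata: dict[str, str] = field(default_factory=dict)


class InMemorySearch:
    """Toy stand-in for a search service backed by a plain dict (hypothetical)."""

    def __init__(self) -> None:
        self._origins: dict[str, OriginDoc] = {}

    def _doc(self, url: str) -> OriginDoc:
        # Create the document on first sight, whichever client reports it first.
        return self._origins.setdefault(url, OriginDoc(url))

    def on_object(self, msg: dict) -> None:
        # Assumed "objects" client message: origin URL plus optional visit type.
        doc = self._doc(msg["url"])
        if "visit_type" in msg:
            doc.visit_types.add(msg["visit_type"])

    def on_indexed(self, msg: dict) -> None:
        # Assumed "indexed" client message: metadata extracted by an indexer.
        self._doc(msg["url"]).metadata.update(msg["metadata"])

    def search(self, pattern: str) -> list[str]:
        # Case-insensitive substring match on the URL or any metadata value.
        needle = pattern.lower()
        return sorted(
            doc.url
            for doc in self._origins.values()
            if needle in doc.url.lower()
            or any(needle in value.lower() for value in doc.metadata.values())
        )

    def visit_type_counts(self) -> dict[str, int]:
        # Number of known origins per visit type.
        counts: Counter[str] = Counter()
        for doc in self._origins.values():
            counts.update(doc.visit_types)
        return dict(counts)


if __name__ == "__main__":
    search = InMemorySearch()
    search.on_object({"url": "https://example.org/repo.git", "visit_type": "git"})
    search.on_object({"url": "https://example.org/pkg/toolbox", "visit_type": "pypi"})
    search.on_indexed({"url": "https://example.org/pkg/toolbox",
                       "metadata": {"description": "A parser library"}})
    print(search.search("parser"))     # ['https://example.org/pkg/toolbox']
    print(search.visit_type_counts())  # {'git': 1, 'pypi': 1}
```

In this toy model, dropping the indexed client would make the `parser` query return nothing, which is the kind of behavior difference that is easier to notice when the search services run by default.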

Diff Detail

Repository
rDENV Development environment
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

> with a more costly

i think you meant *without* ;)

Nope, when using the docker-compose.search.yml, an elasticsearch service is created, which is clearly more costly than a memory backend.

ok

I guess the memory implementation is enough to reproduce the same behavior as the actual search service.

This revision is now accepted and ready to land (Sep 15 2022, 3:46 PM).

> Nope, when using the docker-compose.search.yml, an elasticsearch service is created, which is clearly more costly than a memory backend.

ah yeah, ok got it.

By default, we are not using that file, which is why you integrated the search RPC service into the main docker-compose.yml.
With this ^, we are using the default memory backend.

But if we want to use a real elasticsearch backend, we still can by using -f docker-compose.search.yml which will use a real es container (and backend).
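A minimal sketch, in Python, of the two invocations discussed here. It assumes the standard docker-compose behavior: once `-f` is passed, docker-compose no longer picks up docker-compose.yml implicitly, so the base file has to be listed first, and each later file is merged on top of the earlier ones. The helper name `compose_up_command` is hypothetical.

```python
import shlex


def compose_up_command(with_elasticsearch: bool = False) -> list[str]:
    """Hypothetical helper building the docker-compose invocation for each setup."""
    cmd = ["docker-compose"]
    if with_elasticsearch:
        # Passing -f disables the implicit lookup of docker-compose.yml, so the
        # base file is listed first; docker-compose.search.yml is then merged
        # on top of it, overriding the services it redefines.
        cmd += ["-f", "docker-compose.yml", "-f", "docker-compose.search.yml"]
    # Without -f, docker-compose reads docker-compose.yml (memory-backed
    # swh-search) plus docker-compose.override.yml, if present.
    return cmd + ["up", "-d"]


if __name__ == "__main__":
    print(shlex.join(compose_up_command()))
    # docker-compose up -d
    print(shlex.join(compose_up_command(with_elasticsearch=True)))
    # docker-compose -f docker-compose.yml -f docker-compose.search.yml up -d
```

The list form can be passed directly to `subprocess.run(..., check=True)`. With Compose v2, the same flags apply to `docker compose`.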

> I guess the memory implementation is enough to reproduce the same behavior as the actual search service.

Yes, we took care of it with @vlorentz.

The cool thing about using a swh-search service by default is that the list of visit types for loaded
origins is automatically updated in the webapp (well, once D8487 is accepted and landed).

It should also help spot bugs related to archive search, as the related services are now included by default.

> But if we want to use a real elasticsearch backend, we still can by using -f docker-compose.search.yml which will use a real es container (and backend).

Exactly!

I hope the memory usage won't be an issue :/

Our docker environment already requires a decent amount of memory to run, so this should be fine.