Page MenuHomeSoftware Heritage

Metadata search is failing due to a boolean field in the mapping of the metadata fields
Closed, MigratedEdits Locked

Description

After having ingested the indexed metadata on the production cluster, there is an issue when a search is done :

This is the request executed on ES:

{"query":{"bool":{"must":[{"nested":{"path":"intrinsic_metadata","query":{"multi_match":{"query":"Barkovsky","type":"cross_fields","operator":"and","fields":["intrinsic_metadata.*"]}}}}],"must_not":[{"term":{"blocklisted":true}}]}},"sort":[{"_score":"desc"},{"sha1":"asc"}]}

This is the error:

Error 400.
{"error":{"root_cause":[{"type":"query_shard_exception","reason":"failed to create query: Can't parse boolean value [Barkovsky], expected [true] or [false]","index_uuid":"hZfuv0lVRImjOjO_rYgDzg","index":"origin-production"}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"origin-production","node":"aWtTAVJtTf2Pw8xLPGQt3w","reason":{"type":"query_shard_exception","reason":"failed to create query: Can't parse boolean value [Barkovsky], expected [true] or [false]","index_uuid":"hZfuv0lVRImjOjO_rYgDzg","index":"origin-production","caused_by":{"type":"illegal_argument_exception","reason":"Can't parse boolean value [Barkovsky], expected [true] or [false]"}}}]},"status":400}

The same query is working in staging so a mapping issue is probably at the origin of the problem

Event Timeline

vsellier changed the task status from Open to Work in Progress.Jun 11 2021, 10:29 AM
vsellier triaged this task as Normal priority.
vsellier created this task.

This is the mappings of the staging and production environments.
As expected, the production mapping has more fields than the staging one.

There is only one boolean field in the production mapping, so I suppose it's the problem:

jq '."origin-production".mappings.properties.intrinsic_metadata.properties."http://schema".properties."org/isAccessibleForFree"' /tmp/mapping_production.json      
{
  "properties": {
    "@value": {
      "type": "boolean"
    }
  }
}

vsellier renamed this task from Metadata search error.... to Metadata search is failing due to a boolean field in the mapping of the metadata fields.Jun 11 2021, 10:51 AM

The fix is released in version 'v0.9.0'.
I will deploy it on staging and launch a complete reindexation of the metadata (+origin but needed by other changes on this release)

The metadata indexation is finished, https://webapp1.internal.softwareheritage.org can now search on them via elasticsearch without any issue.
Let's now wait for the lag on origin* topics to recover: https://grafana.softwareheritage.org/goto/iOvBK6gnk?orgId=1

The lag has recovered, the search on webapp1[1] is now fully up-to-date and can be tested before changing the configuration on the main webapp.

[1] https://webapp1.internal.softwareheritage.org/

there are no more errors. The fix will deployed in production with the deployment of swh-search:v0.11.0 (T3433)