Elasticsearch Review

  1. Exact string match without applying an analyzer: define the field type as keyword, not text.
  2. Add data in bulk: curl -H "Content-Type:application/json" -XPUT "127.0.0.1:9200/_bulk?pretty" --data-binary @movies.json
  3. Handle concurrency: for optimistic concurrency control, send if_seq_no together with if_primary_term on a write (e.g. if_seq_no=10&if_primary_term=1), or set retry_on_conflict=5 on an update.
  4. Choose between normalization and denormalization: normalized data minimizes storage and makes the data easy to change. Denormalized data minimizes the number of queries.
  5. Define parent-child relationship:
'{
"mappings":{
"properties":{
"film_to_franchise":{
"type":"join",
"relations":{"franchise":"film"}
}}}}'
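
Item 3 can be pictured with a toy model. The sketch below is not an Elasticsearch client: ToyIndex, VersionConflict and update_with_retry are hypothetical names for an in-memory stand-in that tracks only a sequence number (a real cluster also checks the primary term, which is why if_seq_no is sent together with if_primary_term). It shows a write being rejected when the caller's sequence number is stale, and a retry loop that re-reads and tries again, which is roughly what retry_on_conflict asks the server to do for an update.

```python
class VersionConflict(Exception):
    """Stands in for the HTTP 409 returned on a sequence-number mismatch."""


class ToyIndex:
    """In-memory stand-in for a single index (illustration only)."""

    def __init__(self):
        self._docs = {}      # doc_id -> (seq_no, source)
        self._next_seq = 0   # every successful write gets a new sequence number

    def get(self, doc_id):
        seq_no, source = self._docs[doc_id]
        return seq_no, dict(source)

    def index(self, doc_id, source, if_seq_no=None):
        current = self._docs.get(doc_id)
        # Reject the write if the caller's view of the document is stale.
        if if_seq_no is not None and (current is None or current[0] != if_seq_no):
            raise VersionConflict(doc_id)
        seq_no = self._next_seq
        self._next_seq += 1
        self._docs[doc_id] = (seq_no, dict(source))
        return seq_no


def update_with_retry(index, doc_id, change, retry_on_conflict=5):
    """Read, modify, write; on conflict re-read and try again."""
    for _ in range(retry_on_conflict + 1):
        seq_no, source = index.get(doc_id)
        change(source)
        try:
            return index.index(doc_id, source, if_seq_no=seq_no)
        except VersionConflict:
            continue
    raise VersionConflict(doc_id)
```

Two clients that both read the document at the same sequence number cannot both win: the second write raises VersionConflict and has to start again from a fresh read.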

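6. Search using parent-child relationship: has_parent returns the children (films) whose parent matches the inner query, has_child returns the parent (franchise) that has a matching child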

'{
"query":{
"has_parent":{
"parent_type":"franchise",
"query":{
"match":{
"title":"Star Wars"
}}}}}'
'{
"query":{
"has_child":{
"type":"film",
"query":{
"match":{
"title":"The Force Awakens"
}}}}}'
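
The two queries above only return hits if the documents were indexed with the join field filled in. Here is a minimal sketch of a _bulk body that would do that, combining items 2 and 5. The index name "series", the document ids and the helper name bulk_body are assumptions for illustration. The parts that matter are the film_to_franchise value on each document and the routing value, because a child has to be stored on the same shard as its parent.

```python
import json


def bulk_body(index, franchise_id, franchise_title, films):
    """Build an NDJSON _bulk body: one franchise (parent) plus its films (children).

    `films` maps a film id to its title.
    """
    lines = [
        # Action line, then source line, for the parent document.
        {"index": {"_index": index, "_id": franchise_id, "routing": franchise_id}},
        {"title": franchise_title, "film_to_franchise": {"name": "franchise"}},
    ]
    for film_id, title in films.items():
        # Children are routed by the parent id so they share the parent's shard.
        lines.append({"index": {"_index": index, "_id": film_id, "routing": franchise_id}})
        lines.append({
            "title": title,
            "film_to_franchise": {"name": "film", "parent": franchise_id},
        })
    # One JSON object per line; the body must end with a newline.
    return "".join(json.dumps(line) + "\n" for line in lines)


if __name__ == "__main__":
    # Hypothetical ids; save the output to a file and send it with --data-binary.
    print(bulk_body("series", "1", "Star Wars", {"2": "Star Wars: The Force Awakens"}), end="")
```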

7. Get cluster state

curl -H "Content-Type:application/json" -XGET "http://127.0.0.1:9200/_cluster/state?pretty=true" >> es-cluster-state.json

8. Mapping explosion & flattened datatype: every new field is added to the index mapping, and the mapping is part of the cluster state. All nodes have to sync with the updated cluster state before performing basic operations like indexing and searching. With many fields this can cause memory issues within the nodes and delays. When an Elasticsearch cluster crashes because of too many fields in a mapping, we call this a mapping explosion. You can set "type":"flattened" on an object field in the mapping so that no new fields are added to the mapping when the incoming data contains new keys.

9. Downside of flattened datatype: the values inside a flattened field are treated as keyword. This means limited search and analysis ability, since no analyzer or tokenizer is applied to them. For example, partial matches are not supported: a match for “Beaver” will not find “Bionic Beaver”.
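
To make items 8 and 9 concrete, the sketch below counts the leaf keys of a sample document. With dynamic mapping each of those keys would be added to the mapping, while a flattened mapping keeps a single entry no matter how many keys arrive. The field name host, the sample document and the helper leaf_paths are assumptions for illustration.

```python
def leaf_paths(doc, prefix=""):
    """Return the dotted path of every leaf value in a nested dict."""
    paths = []
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            paths.extend(leaf_paths(value, path))
        else:
            paths.append(path)
    return paths


# Mapping that declares the whole `host` object as one flattened field.
FLATTENED_MAPPING = {
    "mappings": {"properties": {"host": {"type": "flattened"}}}
}

# Hypothetical log document with an open-ended `host` object.
SAMPLE_DOC = {
    "host": {
        "hostname": "bionic",
        "os": {"name": "Ubuntu", "codename": "Bionic Beaver", "version": "18.04"},
    }
}

if __name__ == "__main__":
    dynamic = leaf_paths(SAMPLE_DOC)
    print("keys that dynamic mapping would add:", dynamic)
    print("entries in the flattened mapping:", len(FLATTENED_MAPPING["mappings"]["properties"]))
```

Adding more keys under host makes the first list grow, while the flattened mapping stays at one entry. The trade-off is the one described in item 9: every value under host is matched as a whole keyword.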

10. Supported queries for flattened datatype:

  • term, terms and terms_set
  • prefix
  • range (non-numerical range operations)
  • match and multi_match (the exact keyword has to be supplied)
  • query_string and simple_query_string
  • exists
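
As a rough illustration of the list above, here are request bodies for a few of the supported queries against one key inside a flattened field, addressed with dot notation. The field path host.os.codename and the helper names are assumptions, not taken from a real index.

```python
def term_query(path, value):
    """Exact match on one key of a flattened field, e.g. host.os.codename."""
    return {"query": {"term": {path: value}}}


def prefix_query(path, prefix):
    """Match values that start with `prefix` (still compared as keywords)."""
    return {"query": {"prefix": {path: prefix}}}


def exists_query(path):
    """Match documents where the key is present."""
    return {"query": {"exists": {"field": path}}}


if __name__ == "__main__":
    import json

    # The full keyword has to be supplied: "Bionic Beaver", not "Beaver".
    print(json.dumps(term_query("host.os.codename", "Bionic Beaver")))
    print(json.dumps(prefix_query("host.os.codename", "Bionic")))
    print(json.dumps(exists_query("host.os.codename")))
```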

11. Query vs Filter

  • Filter: ask yes or no questions, faster and cacheable, wrapped in "filter":{}
  • Query: returns data ranked by relevance, wrapped in "query":{}
  • They can be nested inside each other
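
One common way to combine the two is a bool query: the must clause is scored for relevance and the filter clause only answers yes or no. The sketch below builds such a body. The helper name scored_and_filtered is hypothetical, and it assumes the movies index has title and year fields.

```python
def scored_and_filtered(title_text, min_year):
    """Bool query: relevance-scored match on title, unscored range filter on year."""
    return {
        "query": {
            "bool": {
                # Query context: contributes to the relevance score.
                "must": {"match": {"title": title_text}},
                # Filter context: yes/no only, no scoring, cacheable.
                "filter": {"range": {"year": {"gte": min_year}}},
            }
        }
    }


if __name__ == "__main__":
    import json

    print(json.dumps(scored_and_filtered("star wars", 1980), indent=2))
```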

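12. Match phrase: match finds the terms anywhere in the field, match_phrase requires them as a phrase, and slop sets how many positions the terms may be moved apart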

{
"query":{
"match":{
"title":"star wars"
}}}
{
"query":{
"match_phrase":{
"title":{"query":"star wars", "slop":1}
}}}

13. Search

curl -H "Content-Type:application/json" -XGET "127.0.0.1:9200/movies/_search?q=%2Byear%3A%3E1980+%2Btitle%3Astar%20wars&pretty"
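
Percent-encoding the q parameter by hand is easy to get wrong, so here is a small sketch that lets the standard library do it. The helper name build_search_path is hypothetical. urlencode writes spaces as + rather than %20, which decodes to the same query string.

```python
from urllib.parse import parse_qs, urlencode


def build_search_path(index, lucene_query):
    """Return the path and query string for a URI search with an encoded q parameter."""
    return f"/{index}/_search?{urlencode({'q': lucene_query})}&pretty"


if __name__ == "__main__":
    path = build_search_path("movies", "+year:>1980 +title:star wars")
    print(path)
    # Decoding the query string gives back the readable form.
    print(parse_qs(path.split("?", 1)[1])["q"][0])
```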

