ElasticSearch Review

Jan 14, 2021

1. Exact match on a string without applying an analyzer: define the field type as keyword, not text.
2. Add data in bulk: curl -H "Content-Type:application/json" -XPUT "127.0.0.1:9200/_bulk?pretty" --data-binary @movies.json
3. Handle concurrency: use optimistic concurrency control. Pass if_seq_no together with if_primary_term (for example if_seq_no=10&if_primary_term=1) so that a write is rejected if the document changed since you read it, or set retry_on_conflict=5 on an update to retry automatically.
4. Choose between normalization and denormalization: normalized data minimizes storage and makes the data easy to change. Denormalized data minimizes the number of queries.
5. Define a parent-child relationship:

'{
"mappings":{
    "properties":{
        "film_to_franchise":{
             "type":"join",
             "relations":{"franchise":"film"}
}}}}'
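
With a join mapping like this, each document declares its side of the relation in the film_to_franchise field, and a child has to be routed to the same shard as its parent. Below is a minimal sketch in Python (standard library only) that builds a _bulk payload for one franchise and one film. The index name series, the ids, and the helper name bulk_lines are assumptions made for illustration, not part of the original data set.

```python
import json

JOIN_FIELD = "film_to_franchise"  # join field name from the mapping


def bulk_lines(index, franchise_id, franchise_title, films):
    """Build NDJSON for the _bulk API: one parent plus its children.

    `index` and the ids are hypothetical; adapt them to your own data.
    """
    lines = []
    # Parent document: only names its side of the relation.
    lines.append({"create": {"_index": index, "_id": franchise_id, "routing": franchise_id}})
    lines.append({"title": franchise_title, JOIN_FIELD: {"name": "franchise"}})
    for film_id, film_title in films:
        # Child document: routed by the parent id so both land on one shard.
        lines.append({"create": {"_index": index, "_id": film_id, "routing": franchise_id}})
        lines.append({"title": film_title, JOIN_FIELD: {"name": "film", "parent": franchise_id}})
    # _bulk expects newline-delimited JSON that ends with a newline.
    return "\n".join(json.dumps(line) for line in lines) + "\n"


payload = bulk_lines("series", "1", "Star Wars", [("260", "The Force Awakens")])
print(payload)
```

The printed payload could be saved to a file and sent with the same curl --data-binary command used for bulk loading.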

6. Search using parent-child relationship

'{
"query":{
    "has_parent":{
        "parent_type":"franchise",
        "query":{
           "match":{
               "title":"Star Wars"
}}}}}'

'{
"query":{
    "has_child":{
        "type":"film",
        "query":{
            "match":{
                "title":"The Force Awakens"
}}}}}'

7. Get cluster state

curl -H "Content-Type:application/json" -XGET "http://127.0.0.1:9200/_cluster/state?pretty=true" >> es-cluster-state.json

8. Mapping explosion & flattened datatype: with dynamic mapping, every new field is added to the index mapping, which is part of the cluster state, and all nodes have to be in sync with the updated cluster state before they can perform basic operations like indexing and searching. With a very large number of fields this can cause memory issues on the nodes and delays. When an Elasticsearch cluster crashes because of too many fields in a mapping, we call it a mapping explosion. You can use "type":"flattened" in the mapping so that new keys appearing in the data do not add new fields to the mapping.

9. Downside of the flattened datatype: all values are treated as keyword. This limits search and analysis, because no analyzer or tokenizer is applied to those fields. For example, partial matches are not supported, so a match on “Beaver” will not find “Bionic Beaver”.

10. Supported queries for flattened datatype:

term, terms and terms_set
prefix
range (non-numerical range operations)
match and multi_match (we have to supply exact keywords)
query_string and simple_query_string
exists
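
To make the flattened datatype and its exact-match behaviour concrete, here is a hedged sketch in Python that builds a mapping with a flattened field and a term query against one of its keys. The field name host and the keys osVersion and osArchitecture are assumptions chosen for illustration.

```python
import json

# Mapping: everything under "host" is stored as one flattened field, so new
# keys inside it do not add new entries to the mapping.
mapping = {"mappings": {"properties": {"host": {"type": "flattened"}}}}

# A document may carry arbitrary keys under "host" (hypothetical sample data).
doc = {"host": {"osVersion": "Bionic Beaver", "osArchitecture": "x86_64"}}


def term_query(key, value):
    """Exact-match query on one key of the flattened field (dot notation)."""
    return {"query": {"term": {f"host.{key}": value}}}


# The whole value is supplied exactly, so this is the form that can match.
exact = term_query("osVersion", "Bionic Beaver")
# Values are kept as whole keywords, so a single word out of the value is
# not expected to match the sample document.
partial = term_query("osVersion", "Beaver")

print(json.dumps(mapping))
print(json.dumps(exact))
```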

11. Query vs Filter

Filter: asks a yes-or-no question; faster and cacheable; wrapped in "filter":{}
Query: returns data ranked by relevance; wrapped in "query":{}
They can be nested inside each other.
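
A common way to combine the two is a bool query, with scoring clauses under must and yes-or-no clauses under filter. The following is a minimal sketch in Python that builds such a request body. The field names title and year follow the movies examples in this post, and the helper name search_body is hypothetical.

```python
import json


def search_body(text, min_year):
    """Relevance-scored match on title, restricted by a non-scoring year filter."""
    return {
        "query": {
            "bool": {
                # Query context: contributes to the relevance score.
                "must": {"match": {"title": text}},
                # Filter context: yes/no only, no scoring, cacheable.
                "filter": {"range": {"year": {"gte": min_year}}},
            }
        }
    }


body = search_body("star wars", 1980)
print(json.dumps(body, indent=2))
```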

12. Match phrase

{
"query":{
    "match":{
        "title":"star wars"
}}}

{
"query":{
    "match_phrase":{
        "title":{"query":"star wars", "slop":1}
}}}

13. Search

curl -H "Content-Type:application/json" -XGET "127.0.0.1:9200/movies/_search?q=%2Byear%3A%3E1980+%2Btitle%3Astar%20wars&pretty"
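
Hand-encoding the q parameter is error-prone, so it can help to let a library do it. Here is a small sketch using Python's standard urllib.parse. The host and index are the ones used in this post, while the helper name uri_search_url is an assumption.

```python
from urllib.parse import quote_plus


def uri_search_url(query, host="127.0.0.1:9200", index="movies"):
    """Build a URI-search URL with the query string percent-encoded."""
    # quote_plus turns "+" into %2B, ":" into %3A, ">" into %3E, space into "+".
    return f"http://{host}/{index}/_search?q={quote_plus(query)}&pretty"


url = uri_search_url("+year:>1980 +title:star wars")
print(url)
```

The resulting URL can be passed to curl in double quotes, which keeps the shell from interpreting the & character.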

Written by Moon