Commonly Used Elasticsearch Actions with the Python Client

  • match: Scores the similarity between the request sentence and a text field in the documents after both have been processed by the text analyzer.

  • term: Matches a keyword field in the documents; a document matches only if the field has exactly the same content as the requested value.
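
As a minimal sketch of how the two query types look in a request body, the helpers below build the dictionaries that would go under the `query` key of a search request. `build_match_query` and `build_term_query` are illustrative names, not part of the client, and the field names and values are made up for the example.

```python
from typing import Any, Dict


def build_match_query(field: str, text: str) -> Dict[str, Any]:
    """Full-text query: `text` is analyzed before it is compared with the field."""
    return {"match": {field: text}}


def build_term_query(field: str, value: str) -> Dict[str, Any]:
    """Exact-value query: `value` is compared with the stored term as-is."""
    return {"term": {field: value}}


# Hypothetical field names and values, for illustration only.
match_query = build_match_query("title", "quick brown fox")
term_query = build_term_query("status", "published")

print({"query": match_query})
print({"query": term_query})
```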

The properties of a mapping specify the name of each object (field) inside the document together with its field type.

  • text: A text field is analyzed into individual terms (tokens) and is searched with the match query.

  • keyword: Used with the term query, which means the stored keyword must match the request sentence exactly.
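
The difference between the two field types can be illustrated with a toy model. The sketch below is not how Elasticsearch is implemented: `toy_analyze` is a hypothetical stand-in for a text analyzer (it only lowercases and splits on non-alphanumeric characters), and a real match query computes a relevance score rather than a plain yes/no.

```python
import re
from typing import List


def toy_analyze(text: str) -> List[str]:
    """Hypothetical, simplified analyzer: lowercase and split into word tokens."""
    return [token for token in re.split(r"[^0-9a-z]+", text.lower()) if token]


def toy_match(field_value: str, request: str) -> bool:
    """Mimics a match query on a text field: any shared token counts as a hit."""
    return bool(set(toy_analyze(field_value)) & set(toy_analyze(request)))


def toy_term(field_value: str, request: str) -> bool:
    """Mimics a term query on a keyword field: the whole value must be identical."""
    return field_value == request


stored = "The Quick Brown Fox"

print(toy_match(stored, "quick fox"))            # True: tokens overlap after analysis
print(toy_term(stored, "quick fox"))             # False: the strings differ
print(toy_term(stored, "The Quick Brown Fox"))   # True: exactly the same content
```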

from elasticsearch import Elasticsearch

es_client: Elasticsearch  # Placeholder: create the client instance before running

es_client.indices.create(index="<INDEX_NAME>")
es_client.indices.put_mapping(
    index="<INDEX_NAME>",
    body={
        "properties": {
            "<OBJECT_NAME_1>": {
                "type": "text",
            },
            "<OBJECT_NAME_2>": {
                "type": "text",
                "fields": {"keyword": {"type": "keyword"}},
            },
            "<OBJECT_NAME_3>": {
                "type": "nested", # Nested object
                "properties": {
                    "<INNER_OBJECT_NAME_1>": {"type": "keyword"}
                }
            }
        }
    }
)
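
With a mapping like this one, the `keyword` sub-field of a multi-field is addressed as `<field>.keyword`, and fields inside a nested object are searched through a `nested` query. A minimal sketch follows; the helper names and the field names (`title`, `comments`, `author`) are hypothetical stand-ins for the placeholders in the mapping.

```python
from typing import Any, Dict


def term_on_keyword_subfield(field: str, value: str) -> Dict[str, Any]:
    """Exact match against the `keyword` sub-field of a text field."""
    return {"term": {f"{field}.keyword": value}}


def nested_term(path: str, inner_field: str, value: str) -> Dict[str, Any]:
    """Exact match on a keyword field that lives inside a nested object."""
    return {
        "nested": {
            "path": path,
            "query": {"term": {f"{path}.{inner_field}": value}},
        }
    }


# Hypothetical names standing in for <OBJECT_NAME_2>, <OBJECT_NAME_3>
# and <INNER_OBJECT_NAME_1>.
keyword_query = term_on_keyword_subfield("title", "Exact Title")
nested_query = nested_term("comments", "author", "alice")

print(keyword_query)
print(nested_query)
```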
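import logging
import uuid
from typing import Dict, Generator, List

from elasticsearch import Elasticsearch
from elasticsearch.helpers import parallel_bulk

logger = logging.getLogger(__name__)

def gen_es_bulk_create_documents(to_create_docs: List[Dict]) -> Generator:
    # Wrap every document in a bulk "create" action with a random ID.
    for doc in to_create_docs:
        bulk_document = {
            "_op_type": "create",
            "_index": "<INDEX_NAME>",
            "_source": doc,
            "_id": str(uuid.uuid4()),
        }

        yield bulk_document

es_client: Elasticsearch  # Placeholder: create the client instance before running
to_create_docs: List[Dict]  # Placeholder: the documents to index

# parallel_bulk is lazy, so the loop is what actually sends the requests.
for success, info in parallel_bulk(
    client=es_client,
    actions=gen_es_bulk_create_documents(to_create_docs)
):
    if not success:
        logger.error("Elasticsearch bulk create error: " + str(info))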
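
Before sending anything to the cluster, it can help to inspect what the generator yields. The sketch below is a slightly generalized variant of the generator above; the `index_name` parameter, the function name, and the sample documents are assumptions made for illustration, and no running Elasticsearch instance is needed.

```python
import uuid
from typing import Any, Dict, Iterable, Iterator


def gen_bulk_create_actions(
    docs: Iterable[Dict[str, Any]], index_name: str
) -> Iterator[Dict[str, Any]]:
    """Yield one bulk "create" action per document, each with a random string ID."""
    for doc in docs:
        yield {
            "_op_type": "create",  # fails for a document whose ID already exists
            "_index": index_name,
            "_id": str(uuid.uuid4()),
            "_source": doc,
        }


# Hypothetical sample documents and index name.
sample_docs = [{"title": "first"}, {"title": "second"}]
actions = list(gen_bulk_create_actions(sample_docs, "my-index"))

for action in actions:
    print(action["_op_type"], action["_index"], action["_source"])
```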
  • must: The clause (query) must appear in matching documents and contributes to the score.

  • filter: The clause (query) must appear in matching documents. However, unlike must, its score is ignored.

  • should: The clause (query) should appear in matching documents and contributes to the score.

  • must_not: The clause (query) must not appear in matching documents.

Note: A bool query takes a more-matches-is-better approach: the scores of all matching must and should clauses are added together to give the final score of each document, so it is a good idea to combine must with should.
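
The four clause types can be combined in a single bool query. The helper below is a minimal sketch, not part of the client: `build_bool_query` is a hypothetical name, the parameter `filter_` carries a trailing underscore only to avoid shadowing Python's built-in `filter`, and all field names and values are invented.

```python
from typing import Any, Dict, List, Optional

Clause = Dict[str, Any]


def build_bool_query(
    must: Optional[List[Clause]] = None,
    filter_: Optional[List[Clause]] = None,
    should: Optional[List[Clause]] = None,
    must_not: Optional[List[Clause]] = None,
) -> Dict[str, Any]:
    """Assemble a bool query, leaving out clause types that were not given."""
    clauses = {"must": must, "filter": filter_, "should": should, "must_not": must_not}
    return {"bool": {name: value for name, value in clauses.items() if value}}


bool_query = build_bool_query(
    must=[{"match": {"title": "elasticsearch client"}}],  # required, scored
    filter_=[{"term": {"status": "published"}}],          # required, not scored
    should=[{"match": {"title": "python"}}],              # optional, raises the score
    must_not=[{"term": {"status": "archived"}}],          # excluded
)

print({"query": bool_query, "size": 5})
```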

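from elasticsearch import Elasticsearch

es_client: Elasticsearch  # Placeholder: create the client instance before running

query_body = {
  "query": {
    "bool": {
      # Each clause is a list of query objects.
      "must": [{"match": {"<FIELD>": "<MATCH_VALUE>"}}],
      "filter": [{"term": {"<FIELD>": "<MATCH_VALUE>"}}]
    }
  },
  "size": 5  # Return at most 5 hits
}

response = es_client.search(
    body=query_body,
    index="<INDEX_NAME>"
)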
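
The matched documents are found under `hits.hits` in the search response. The sketch below reads them from a hand-written response fragment so that it runs without a cluster; `extract_hits` is a hypothetical helper name and the sample values are invented.

```python
from typing import Any, Dict, List, Tuple


def extract_hits(response: Dict[str, Any]) -> List[Tuple[str, float, Dict[str, Any]]]:
    """Return (document ID, score, source) for every hit in a search response."""
    return [
        (hit["_id"], hit["_score"], hit["_source"])
        for hit in response["hits"]["hits"]
    ]


# Hand-written fragment in the shape of a search response.
fake_response = {
    "hits": {
        "total": {"value": 2, "relation": "eq"},
        "hits": [
            {"_id": "a1", "_score": 1.7, "_source": {"title": "first"}},
            {"_id": "b2", "_score": 0.4, "_source": {"title": "second"}},
        ],
    }
}

for doc_id, score, source in extract_hits(fake_response):
    print(doc_id, score, source)
```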
from elasticsearch import Elasticsearch

es_client: Elasticsearch  # Placeholder: create the client instance before running

# The body alternates between a header line and the search it applies to.
es_query_bodies = [
  {"index": "<INDEX_NAME>"},
  {
    "query": {
      "bool": {
        "must": [{"match": {"<FIELD>": "<MATCH_VALUE>"}}],
        "filter": [{"term": {"<FIELD>": "<MATCH_VALUE>"}}]
      }
    },
    "size": 5
  },
  {"index": "<INDEX_NAME>"},
  {
    "query": ...
  },
  ...
]

response = es_client.msearch(body=es_query_bodies, index="<INDEX_NAME>")
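
The alternating header/body layout is easy to get wrong by hand, so it can be generated from a list of searches. The sketch below also shows how the results come back: one entry per search, in order, under the `responses` key. Both helper names and the sample data are hypothetical; a failed search carries an `error` entry instead of `hits`, which is why the reader falls back to an empty list.

```python
from typing import Any, Dict, List


def build_msearch_body(index_name: str, queries: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Interleave a header line and a search body for every query."""
    body: List[Dict[str, Any]] = []
    for query in queries:
        body.append({"index": index_name})  # header: where to run the search
        body.append(query)                  # body: the search itself
    return body


def hits_per_search(response: Dict[str, Any]) -> List[List[Dict[str, Any]]]:
    """Return the hit list of each search, in the order the searches were sent."""
    return [item.get("hits", {}).get("hits", []) for item in response["responses"]]


queries = [
    {"query": {"match": {"title": "first"}}, "size": 5},
    {"query": {"term": {"status": "published"}}, "size": 5},
]
msearch_body = build_msearch_body("my-index", queries)

# Hand-written fragment in the shape of a multi-search response.
fake_response = {
    "responses": [
        {"hits": {"hits": [{"_id": "a1", "_score": 1.0, "_source": {"title": "first"}}]}},
        {"error": {"type": "index_not_found_exception"}, "status": 404},
    ]
}

print(msearch_body)
print(hits_per_search(fake_response))
```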
from elasticsearch import Elasticsearch

es_client: Elasticsearch  # Placeholder: create the client instance before running

query_body = {
  "query": {
    "bool": {
      "must": [{"match": {"<FIELD>": "<MATCH_VALUE>"}}],
      "filter": [{"term": {"<FIELD>": "<MATCH_VALUE>"}}]
    }
  },
  "size": 5
}

# Deletes the documents that match the query.
es_client.delete_by_query(index="<INDEX_NAME>", body=query_body)
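
The response of a delete-by-query call reports how many documents were removed and whether anything went wrong. The sketch below summarizes a hand-written response fragment; `summarize_delete_by_query` is a hypothetical helper, and treating any failure or version conflict as "not ok" is an assumption about what the caller cares about.

```python
from typing import Any, Dict


def summarize_delete_by_query(response: Dict[str, Any]) -> Dict[str, Any]:
    """Pick out the fields that tell whether the deletion fully succeeded."""
    summary = {
        "deleted": response.get("deleted", 0),
        "version_conflicts": response.get("version_conflicts", 0),
        "failures": response.get("failures", []),
    }
    # Assumption: any failure or version conflict means the run needs attention.
    summary["ok"] = not summary["failures"] and summary["version_conflicts"] == 0
    return summary


# Hand-written response fragment for illustration.
fake_response = {"took": 12, "total": 3, "deleted": 3, "version_conflicts": 0, "failures": []}
delete_summary = summarize_delete_by_query(fake_response)

print(delete_summary)
```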
