Commonly Used Elasticsearch Actions with the Python Client

Yoshi Gao included in Software Engineer Notes

2023-02-12

Query Types

match: By default, runs the requested sentence through the same text analyzer as the text field, then matches the resulting terms against the analyzed field and scores each document by relevance.
term: Matches the exact, unanalyzed value, typically against a keyword field; a document matches only if the field content is exactly the same.
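
As a minimal sketch of how the two query types look as request bodies, the snippet below builds one of each as plain Python dictionaries. The field name `title` and its `title.keyword` sub-field are hypothetical and assume a mapping that defines both.

```python
import json

# Hypothetical field names: "title" is assumed to be a text field and
# "title.keyword" its keyword sub-field.
match_query = {"query": {"match": {"title": "quick brown fox"}}}
term_query = {"query": {"term": {"title.keyword": "Quick Brown Fox"}}}

# Either dictionary can be passed as the body of a search request.
print(json.dumps(match_query, indent=2))
print(json.dumps(term_query, indent=2))
```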

Document in Elasticsearch

Property

A property specifies the name of an object (field) inside the document together with its field type.

Field Type

text: The field is analyzed into individual terms (tokens) and is searched with a match query.
keyword: The field is stored as a single unanalyzed value and is searched with a term query, so the stored value must match the requested value exactly.
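
The difference between the two field types can be illustrated with a toy simulation. This is only a rough stand-in, not Elasticsearch's actual analyzer: `toy_analyze`, `toy_match`, and `toy_term` are hypothetical helpers that assume an analyzer which lowercases the text and splits it on non-alphanumeric characters.

```python
import re
from typing import List


def toy_analyze(text: str) -> List[str]:
    """Rough stand-in for a text analyzer: lowercase, then split into tokens."""
    return [token for token in re.split(r"[^0-9a-z]+", text.lower()) if token]


def toy_match(query: str, field_value: str) -> bool:
    """match-like: true when any analyzed query token appears in the analyzed field."""
    return bool(set(toy_analyze(query)) & set(toy_analyze(field_value)))


def toy_term(query: str, field_value: str) -> bool:
    """term-like on a keyword field: exact comparison without any analysis."""
    return query == field_value


stored = "Quick Brown Fox"
print(toy_match("brown fox", stored))       # True: tokens overlap after analysis
print(toy_term("brown fox", stored))        # False: not the exact stored value
print(toy_term("Quick Brown Fox", stored))  # True: exact value
```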

Setting up the index and properties with a field mapping

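from elasticsearch import Elasticsearch

# Create the client instance. The URL is an assumption; use your cluster's address.
es_client = Elasticsearch("http://localhost:9200")

es_client.indices.create(index="<INDEX_NAME>")
es_client.indices.put_mapping(
    index="<INDEX_NAME>",
    body={
        "properties": {
            # Analyzed full-text field, searched with a match query
            "<OBJECT_NAME_1>": {
                "type": "text",
            },
            # Full-text field with an extra "keyword" sub-field for exact term queries
            "<OBJECT_NAME_2>": {
                "type": "text",
                "fields": {"keyword": {"type": "keyword"}},
            },
            # Nested object
            "<OBJECT_NAME_3>": {
                "type": "nested",
                "properties": {
                    "<INNER_OBJECT_NAME_1>": {"type": "keyword"}
                },
            },
        }
    },
)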
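
The mapping above affects how each field is queried later. As a hedged sketch: a `keyword` sub-field is addressed as `<field>.keyword` in a term query, and a field inside a `nested` object is searched through a `nested` query that names its path. The helper names and the sample field names (`title`, `authors`, `name`) are assumptions made for illustration.

```python
from typing import Any, Dict


def keyword_subfield_term(field: str, value: str) -> Dict[str, Any]:
    """Exact-match clause against the "keyword" sub-field of a text field."""
    return {"term": {f"{field}.keyword": value}}


def nested_term(path: str, inner_field: str, value: str) -> Dict[str, Any]:
    """Exact-match clause against a keyword field inside a nested object."""
    return {
        "nested": {
            "path": path,
            "query": {"term": {f"{path}.{inner_field}": value}},
        }
    }


# Hypothetical field names, for illustration only.
print(keyword_subfield_term("title", "Quick Brown Fox"))
print(nested_term("authors", "name", "Yoshi"))
```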

Bulk create the documents

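import logging
import uuid
from typing import Dict, Generator, List

from elasticsearch import Elasticsearch
from elasticsearch.helpers import parallel_bulk

logger = logging.getLogger(__name__)


def gen_es_bulk_create_documents(to_create_docs: List[Dict]) -> Generator[Dict, None, None]:
    """Yield one bulk "create" action per document."""
    for doc in to_create_docs:
        yield {
            "_op_type": "create",
            "_index": "<INDEX_NAME>",
            "_source": doc,
            "_id": str(uuid.uuid4()),  # Random ID, passed as a string
        }


# The URL and the empty list are placeholders; supply your own cluster and documents.
es_client = Elasticsearch("http://localhost:9200")
to_create_docs: List[Dict] = []

# raise_on_error=False makes the helper yield failures instead of raising on them
for success, info in parallel_bulk(
    client=es_client,
    actions=gen_es_bulk_create_documents(to_create_docs),
    raise_on_error=False,
):
    if not success:
        logger.error("Elasticsearch bulk create error: %s", info)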
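
Because a `create` action fails when a document with the same ID already exists, random IDs mean that re-running the same load adds duplicates. One possible variation, sketched below under the assumption that identical content should map to the same document, derives the ID from the document itself with `uuid.uuid5`. The helper names `stable_doc_id` and `gen_idempotent_create_actions` and the index name `articles` are hypothetical.

```python
import json
import uuid
from typing import Any, Dict, Iterable, Iterator


def stable_doc_id(doc: Dict[str, Any]) -> str:
    """Derive a deterministic ID from the canonical JSON form of the document."""
    canonical = json.dumps(doc, sort_keys=True, separators=(",", ":"))
    return str(uuid.uuid5(uuid.NAMESPACE_URL, canonical))


def gen_idempotent_create_actions(
    docs: Iterable[Dict[str, Any]], index_name: str
) -> Iterator[Dict[str, Any]]:
    """Yield bulk "create" actions whose IDs depend only on the document content."""
    for doc in docs:
        yield {
            "_op_type": "create",
            "_index": index_name,
            "_id": stable_doc_id(doc),
            "_source": doc,
        }


docs = [{"title": "first"}, {"title": "second"}]
actions = list(gen_idempotent_create_actions(docs, "articles"))
print(actions[0]["_id"] == stable_doc_id({"title": "first"}))  # True
```

With IDs like these, a second run of the same load reports conflicts for documents that already exist rather than storing them twice.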

Search the documents with boolean query

must: The clause (query) must appear in matching documents and contributes to the score.
filter: The clause (query) must appear in matching documents. Unlike must, however, the score of the query is ignored.
should: The clause (query) should appear in the matching documents and contributes to the score.
must_not: The clause (query) must not appear in the matching documents.

Note: The bool query takes a more-matches-is-better approach: the scores of all matching must and should clauses are added together to give the final score of each document.
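
The four clause types can be combined in a single request body. The helper below is a small sketch, not part of the Elasticsearch client: `build_bool_query` is a hypothetical function that assembles the body and leaves out any clause type that was not supplied, and the field names are made up.

```python
from typing import Any, Dict, List, Optional

Clause = Dict[str, Any]


def build_bool_query(
    must: Optional[List[Clause]] = None,
    filter_: Optional[List[Clause]] = None,
    should: Optional[List[Clause]] = None,
    must_not: Optional[List[Clause]] = None,
) -> Dict[str, Any]:
    """Assemble a bool query body, skipping clause types that are empty."""
    clauses = {"must": must, "filter": filter_, "should": should, "must_not": must_not}
    return {"query": {"bool": {name: value for name, value in clauses.items() if value}}}


# Hypothetical fields: must and should add to the score, filter only restricts.
query_body = build_bool_query(
    must=[{"match": {"title": "elasticsearch"}}],
    filter_=[{"term": {"status": "published"}}],
    should=[{"match": {"title": "python"}}],
)
print(query_body)
```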

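from elasticsearch import Elasticsearch

# The URL is an assumption; use your cluster's address.
es_client = Elasticsearch("http://localhost:9200")

query_body = {
    "query": {
        "bool": {
            # Each clause is a dict inside the list
            "must": [{"match": {"<FIELD>": "<MATCH_VALUE>"}}],
            "filter": [{"term": {"<FIELD>": "<MATCH_VALUE>"}}],
        }
    },
    "size": 5,  # Return at most 5 hits
}

response = es_client.search(
    body=query_body,
    index="<INDEX_NAME>",
)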
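
The search call returns the matching documents under `hits.hits`, each with its `_id`, `_score`, and `_source`. A minimal sketch of reading them follows; `extract_hits` is a hypothetical helper and `sample_response` is made-up data that only mimics the shape of a real response.

```python
from typing import Any, Dict, List, Tuple


def extract_hits(response: Dict[str, Any]) -> List[Tuple[str, float, Dict[str, Any]]]:
    """Return (id, score, source) for every hit in a search response."""
    return [
        (hit["_id"], hit["_score"], hit["_source"])
        for hit in response["hits"]["hits"]
    ]


# Made-up data shaped like a search response, for illustration only.
sample_response = {
    "hits": {
        "total": {"value": 2, "relation": "eq"},
        "hits": [
            {"_id": "a1", "_score": 1.7, "_source": {"title": "first"}},
            {"_id": "b2", "_score": 0.4, "_source": {"title": "second"}},
        ],
    }
}

for doc_id, score, source in extract_hits(sample_response):
    print(doc_id, score, source)
```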

Multi-search

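from elasticsearch import Elasticsearch

# The URL is an assumption; use your cluster's address.
es_client = Elasticsearch("http://localhost:9200")

# Each search is a pair: a header line followed by a query body
es_query_bodies = [
    {"index": "<INDEX_NAME>"},
    {
        "query": {
            "bool": {
                "must": [{"match": {"<FIELD>": "<MATCH_VALUE>"}}],
                "filter": [{"term": {"<FIELD>": "<MATCH_VALUE>"}}],
            }
        },
        "size": 5,
    },
    # Further header/body pairs follow the same pattern:
    # {"index": "<INDEX_NAME>"},
    # {"query": ...},
    # ...
]

# Pass the arguments by keyword; the index here is the default for headers that omit one
response = es_client.msearch(body=es_query_bodies, index="<INDEX_NAME>")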
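
Since the multi-search body alternates header and query entries, it can be generated from a plain list of queries. The sketch below assumes every query targets the same index; `build_msearch_body` and the sample values are hypothetical.

```python
from typing import Any, Dict, List


def build_msearch_body(index_name: str, queries: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Interleave one header entry before each query body."""
    body: List[Dict[str, Any]] = []
    for query in queries:
        body.append({"index": index_name})  # header for the next search
        body.append(query)                  # the search itself
    return body


# Hypothetical index and field names, for illustration only.
queries = [
    {"query": {"match": {"title": "python"}}, "size": 5},
    {"query": {"term": {"status": "published"}}, "size": 5},
]
msearch_body = build_msearch_body("articles", queries)
print(len(msearch_body))  # 4: two header/body pairs
```

The results come back under the `responses` key in the same order as the submitted searches, so the n-th entry there belongs to the n-th query.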

Delete by query

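from elasticsearch import Elasticsearch

# The URL is an assumption; use your cluster's address.
es_client = Elasticsearch("http://localhost:9200")

query_body = {
    "query": {
        "bool": {
            "must": [{"match": {"<FIELD>": "<MATCH_VALUE>"}}],
            "filter": [{"term": {"<FIELD>": "<MATCH_VALUE>"}}],
        }
    },
    # Delete at most 5 documents; max_docs replaces the deprecated "size" body field
    "max_docs": 5,
}

es_client.delete_by_query(index="<INDEX_NAME>", body=query_body)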
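
Deleting by query is destructive, so it can help to count the matching documents first. The following is one possible sketch, assuming a client object that offers `count` and `delete_by_query` with keyword arguments `index` and `body`, as used elsewhere in this post; `delete_with_preview` is a hypothetical helper.

```python
from typing import Any, Dict


def delete_with_preview(
    client: Any, index_name: str, query: Dict[str, Any], dry_run: bool = True
) -> int:
    """Count the matching documents; delete them only when dry_run is False.

    Returns the number of matched documents in a dry run, otherwise the
    number of documents reported as deleted.
    """
    matched = client.count(index=index_name, body={"query": query})["count"]
    if dry_run:
        return matched
    result = client.delete_by_query(index=index_name, body={"query": query})
    return result["deleted"]


# Usage (requires a reachable cluster, so it is left commented out):
# matched = delete_with_preview(es_client, "<INDEX_NAME>", {"term": {"<FIELD>": "<MATCH_VALUE>"}})
```

Calling it once with the default `dry_run=True` shows how many documents would be affected before the same call is repeated with `dry_run=False`.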

Reference

Boolean Query