Query Types
match: Scores how closely the query text matches a text field in each document, after both the query and the field have been processed by the text analyzer.
term: Matches a keyword field exactly; a document matches only if the field holds exactly the same content as the query value.
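To make the difference concrete, both query types can be written as plain Python dictionaries, which is the form the Python client accepts. This is only a minimal sketch: the helper functions and the field names `title` and `status` are hypothetical and do not come from the original text.

```python
from typing import Any, Dict


def match_query(field: str, text: str) -> Dict[str, Any]:
    """Full-text query: `text` is analyzed before it is compared."""
    return {"query": {"match": {field: text}}}


def term_query(field: str, value: str) -> Dict[str, Any]:
    """Exact query: `value` is compared as-is, without analysis."""
    return {"query": {"term": {field: value}}}


# Hypothetical fields: "title" mapped as text, "status" mapped as keyword
full_text = match_query("title", "quick brown fox")
exact = term_query("status", "published")
```

A `match` on `title` can find documents containing only some of the words, while the `term` on `status` matches only the exact value `published`.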
Document in Elasticsearch
Property
A property specifies the name of an object (field) inside the document together with its field type.
Field Type
Setting Up the index and property with field mapping
```python
from elasticsearch import Elasticsearch

# Need to create the instance; a local node is assumed here
es_client = Elasticsearch("http://localhost:9200")

es_client.indices.create(index="<INDEX_NAME>")
es_client.indices.put_mapping(
    index="<INDEX_NAME>",
    body={
        "properties": {
            # Analyzed full-text field
            "<OBJECT_NAME_1>": {
                "type": "text",
            },
            # Full-text field with an exact-match "keyword" sub-field
            "<OBJECT_NAME_2>": {
                "type": "text",
                "fields": {"keyword": {"type": "keyword"}},
            },
            "<OBJECT_NAME_3>": {
                "type": "nested",  # Nested object
                "properties": {
                    "<INNER_OBJECT_NAME_1>": {"type": "keyword"}
                },
            },
        }
    },
)
```
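With a mapping like the one above, the `keyword` sub-field is addressed as `<field>.keyword`, and fields inside a nested object are searched through a `nested` query with a `path`. The sketch below only builds the query bodies. The helper names and the field names (`title`, `tags`, `name`) are hypothetical stand-ins for the placeholders.

```python
from typing import Any, Dict


def exact_on_subfield(field: str, value: str) -> Dict[str, Any]:
    """Term query against the "keyword" sub-field of a text field."""
    return {"query": {"term": {f"{field}.keyword": value}}}


def nested_term(path: str, inner_field: str, value: str) -> Dict[str, Any]:
    """Term query on a keyword field inside a nested object.

    Fields inside the nested query are referenced by their full path.
    """
    return {
        "query": {
            "nested": {
                "path": path,
                "query": {"term": {f"{path}.{inner_field}": value}},
            }
        }
    }


# Hypothetical names standing in for <OBJECT_NAME_2> and <OBJECT_NAME_3>
by_title = exact_on_subfield("title", "Elasticsearch Basics")
by_tag = nested_term("tags", "name", "search")
```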
Bulk-create the documents
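```python
import logging
import uuid
from typing import Dict, Generator, List

from elasticsearch import Elasticsearch
from elasticsearch.helpers import parallel_bulk

logger = logging.getLogger(__name__)


def gen_es_bulk_create_documents(to_create_docs: List[Dict]) -> Generator[Dict, None, None]:
    for doc in to_create_docs:
        # One bulk "create" action per document
        bulk_document = {
            "_op_type": "create",
            "_index": "<INDEX_NAME>",
            "_source": doc,
            "_id": str(uuid.uuid4()),  # Random ID, stored as a string
        }
        yield bulk_document


# Need to create the instance; a local node is assumed here
es_client = Elasticsearch("http://localhost:9200")
to_create_docs: List[Dict] = []  # Need to fill in the documents to create

# parallel_bulk is lazy, so the loop must consume it for requests to be sent.
# raise_on_error=False reports failures through `success` instead of raising.
for success, info in parallel_bulk(
    client=es_client,
    actions=gen_es_bulk_create_documents(to_create_docs),
    raise_on_error=False,
):
    if not success:
        logger.error("Elasticsearch bulk create error: %s", info)
```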
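Because the action generator is plain Python, the actions and the result handling can be checked without a cluster. The following is a sketch under that assumption: `build_create_actions`, `summarize_bulk_results`, the index name `articles` and the sample documents are all hypothetical.

```python
import uuid
from typing import Any, Dict, Iterable, Iterator, List, Tuple


def build_create_actions(index: str, docs: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Yield one bulk "create" action per document."""
    for doc in docs:
        yield {
            "_op_type": "create",
            "_index": index,
            "_id": str(uuid.uuid4()),
            "_source": doc,
        }


def summarize_bulk_results(
    results: Iterable[Tuple[bool, Dict[str, Any]]]
) -> Tuple[int, List[Dict[str, Any]]]:
    """Count successes and collect the info of every failed action."""
    succeeded = 0
    failures: List[Dict[str, Any]] = []
    for success, info in results:
        if success:
            succeeded += 1
        else:
            failures.append(info)
    return succeeded, failures


# Hypothetical sample data; no cluster is contacted here
actions = list(build_create_actions("articles", [{"title": "a"}, {"title": "b"}]))
```

With a real client, the `(success, info)` pairs yielded by `parallel_bulk` could be passed straight to `summarize_bulk_results`.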
Search the documents with a boolean query
must: The clause (query) must appear in matching documents and will contribute to the score.
filter: The clause (query) must appear in matching documents. However, unlike must, the score of the query will be ignored.
should: The clause (query) should appear in the matching document and will contribute to the score.
must_not: The clause (query) must not appear in the matching documents.
Note: A boolean query takes a more-matches-is-better approach: the scores of all matching must and should clauses are added together, so it is better to combine must with should to produce the final score for each document.
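```python
from elasticsearch import Elasticsearch

# Need to create the instance; a local node is assumed here
es_client = Elasticsearch("http://localhost:9200")

query_body = {
    "query": {
        "bool": {
            # Each clause takes a list of query objects
            "must": [{"match": {"<FIELD>": "<MATCH_VALUE>"}}],
            "filter": [{"term": {"<FIELD>": "<MATCH_VALUE>"}}],
        }
    },
    "size": 5,  # Return at most 5 hits
}
response = es_client.search(
    body=query_body,
    index="<INDEX_NAME>",
)
```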
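The example above uses only must and filter. As a rough sketch of how all four clause types fit into one body, the hypothetical helper below assembles a bool query and leaves out any clause that is empty. The field names (`title`, `status`, `tags`, `archived`) are assumptions made for illustration.

```python
from typing import Any, Dict, List, Optional

Clause = Dict[str, Any]


def bool_query(
    must: Optional[List[Clause]] = None,
    filter_: Optional[List[Clause]] = None,
    should: Optional[List[Clause]] = None,
    must_not: Optional[List[Clause]] = None,
    size: int = 5,
) -> Dict[str, Any]:
    """Build a search body with a bool query, skipping empty clauses."""
    clauses = {
        "must": must,
        "filter": filter_,
        "should": should,
        "must_not": must_not,
    }
    bool_body = {name: value for name, value in clauses.items() if value}
    return {"query": {"bool": bool_body}, "size": size}


query_body = bool_query(
    must=[{"match": {"title": "elasticsearch"}}],      # required, scored
    filter_=[{"term": {"status": "published"}}],       # required, not scored
    should=[{"match": {"tags": "tutorial"}}],          # optional, raises the score
    must_not=[{"term": {"archived": True}}],           # excluded
)
```

Here documents that also match the `should` clause rank higher, while the `filter` and `must_not` clauses only decide whether a document is included.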
Multi-search
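```python
from elasticsearch import Elasticsearch

# Need to create the instance; a local node is assumed here
es_client = Elasticsearch("http://localhost:9200")

# msearch takes alternating entries: a header, then the search body it applies to
es_query_bodies = [
    {"index": "<INDEX_NAME>"},
    {
        "query": {
            "bool": {
                "must": [{"match": {"<FIELD>": "<MATCH_VALUE>"}}],
                "filter": [{"term": {"<FIELD>": "<MATCH_VALUE>"}}],
            }
        },
        "size": 5,
    },
    {"index": "<INDEX_NAME>"},
    {
        "query": ...
    },
    ...
]
response = es_client.msearch(body=es_query_bodies, index="<INDEX_NAME>")
```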
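The alternating header/body layout is easy to get wrong by hand, so it may help to generate it. The sketch below assumes every search targets the same index; `build_msearch_body` and `hits_per_search` are hypothetical helpers, and the response shape used is the standard one, with a `responses` list in the same order as the submitted searches.

```python
from typing import Any, Dict, List


def build_msearch_body(index: str, searches: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Interleave a header line with each search body."""
    body: List[Dict[str, Any]] = []
    for search in searches:
        body.append({"index": index})  # header for the next search
        body.append(search)            # the search itself
    return body


def hits_per_search(response: Dict[str, Any]) -> List[List[Dict[str, Any]]]:
    """Return the hits of each search, or an empty list if that search failed."""
    results: List[List[Dict[str, Any]]] = []
    for item in response["responses"]:
        if "error" in item:
            results.append([])
        else:
            results.append(item["hits"]["hits"])
    return results


# Hypothetical searches against a hypothetical "articles" index
searches = [
    {"query": {"match": {"title": "python"}}, "size": 5},
    {"query": {"term": {"status": "published"}}, "size": 5},
]
msearch_body = build_msearch_body("articles", searches)
```

The resulting list could then be passed as the `body` of `msearch`, and the returned response handed to `hits_per_search`.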
Delete by query
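```python
from elasticsearch import Elasticsearch

# Need to create the instance; a local node is assumed here
es_client = Elasticsearch("http://localhost:9200")

query_body = {
    "query": {
        "bool": {
            "must": [{"match": {"<FIELD>": "<MATCH_VALUE>"}}],
            "filter": [{"term": {"<FIELD>": "<MATCH_VALUE>"}}],
        }
    },
    # Limit the number of deleted documents; recent Elasticsearch versions
    # expect "max_docs" here instead of the search-style "size"
    "max_docs": 5,
}
es_client.delete_by_query(index="<INDEX_NAME>", body=query_body)
```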
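Since a delete by query cannot be undone, one cautious pattern is to count the matching documents first and delete only when the number looks plausible. The sketch below is an assumption about how that could be wired up, not part of the original workflow. `delete_if_within_limit` and its `max_expected` parameter are hypothetical, and `client` is expected to offer the `count` and `delete_by_query` calls of the Python client.

```python
from typing import Any, Dict


def delete_if_within_limit(
    client: Any, index: str, query: Dict[str, Any], max_expected: int
) -> int:
    """Delete documents matching `query` unless too many would be affected.

    Returns the number of deleted documents reported by Elasticsearch.
    """
    # Count first, using the same query that the delete will use
    matched = client.count(index=index, body={"query": query})["count"]
    if matched > max_expected:
        raise ValueError(
            f"{matched} documents match, more than the allowed {max_expected}"
        )
    response = client.delete_by_query(index=index, body={"query": query})
    return response["deleted"]
```

The count and the delete are separate requests, so documents indexed in between can still change the final number.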
Reference
Boolean Query