Highlighters enable you to get highlighted snippets from one or more fields in your search results so you can show users where the query matches are. When you request highlights, the response contains an additional highlight element for each search hit that includes the highlighted fields and the highlighted fragments.
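As a rough illustration of what such a request looks like on the wire, the sketch below assembles a minimal search body with a highlight section using only the standard library. The helper name `highlight_request` is hypothetical and is not part of this crate; the JSON shape (a `query` plus `highlight.fields`) follows the Elasticsearch highlighting API. Inputs are assumed not to need JSON escaping.

```rust
/// Builds a minimal search request body that asks for highlights on the
/// given fields. Hypothetical helper for illustration only: values are
/// inserted verbatim, so they are assumed not to need JSON escaping.
fn highlight_request(query_field: &str, query_text: &str, highlight_fields: &[&str]) -> String {
    // Each highlighted field maps to an empty object, meaning "use defaults".
    let fields: Vec<String> = highlight_fields
        .iter()
        .map(|f| format!(r#""{}":{{}}"#, f))
        .collect();

    format!(
        r#"{{"query":{{"match":{{"{}":"{}"}}}},"highlight":{{"fields":{{{}}}}}}}"#,
        query_field,
        query_text,
        fields.join(",")
    )
}
```

Calling `highlight_request("content", "kimchy", &["content"])` produces a body with a `match` query on `content` and a `highlight` section for the same field. Each hit in the response would then carry a `highlight` element keyed by field name.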
§Offsets Strategy
To create meaningful search snippets from the terms being queried, the highlighter needs to know the start and end character offsets of each word in the original text. These offsets can be obtained from:
- The postings list. If index_options is set to offsets in the mapping, the unified highlighter uses this information to highlight documents without re-analyzing the text. It re-runs the original query directly on the postings and extracts the matching offsets from the index, limiting the collection to the highlighted documents. This is important if you have large fields because it doesn’t require re-analyzing the text to be highlighted. It also requires less disk space than using term_vectors.
- Term vectors. If term_vector information is provided by setting term_vector to with_positions_offsets in the mapping, the unified highlighter automatically uses the term_vector to highlight the field. It is especially fast for large fields (> 1MB) and for highlighting multi-term queries like prefix or wildcard because it can access the dictionary of terms for each document. The fvh highlighter always uses term vectors.
- Plain highlighting. This mode is used by the unified highlighter when there is no other alternative. It creates a tiny in-memory index and re-runs the original query criteria through Lucene’s query execution planner to get access to low-level match information on the current document. This is repeated for every field and every document that needs highlighting. The plain highlighter always uses plain highlighting.
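The selection described above can be sketched as a small decision function. The types `FieldMapping` and `OffsetSource` are hypothetical and exist only for this illustration. The precedence used when both mapping options are set (postings first) is an assumption, not something the text above specifies.

```rust
/// Where the unified highlighter could obtain character offsets from.
/// Hypothetical type for illustration; not part of this crate.
#[derive(Debug, PartialEq)]
enum OffsetSource {
    Postings,
    TermVectors,
    /// Plain highlighting: re-analyze the text in a tiny in-memory index.
    Analysis,
}

/// The two mapping options that matter for offsets (hypothetical struct).
struct FieldMapping<'a> {
    index_options: Option<&'a str>,
    term_vector: Option<&'a str>,
}

/// Picks an offset source from the field mapping.
/// Assumption: postings win when both options are configured.
fn offset_source(mapping: &FieldMapping<'_>) -> OffsetSource {
    if mapping.index_options == Some("offsets") {
        OffsetSource::Postings
    } else if mapping.term_vector == Some("with_positions_offsets") {
        OffsetSource::TermVectors
    } else {
        // No stored offsets: fall back to re-analysis for every
        // field and document that needs highlighting.
        OffsetSource::Analysis
    }
}
```

A field mapped with neither option falls through to `OffsetSource::Analysis`, which is the costly path the warning below refers to.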
Warning
Plain highlighting for large texts may require a substantial amount of time and memory. To protect against this, the maximum number of text characters that will be analyzed has been limited to 1000000. This default limit can be changed for a particular index with the index setting index.highlight.max_analyzed_offset.
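A client can check up front whether a text would run past this limit. The sketch below is a minimal illustration, not Elasticsearch's own check: the function name is hypothetical, and counting Unicode scalar values with `chars()` is an assumption about what "text characters" means here.

```rust
/// Default value of `index.highlight.max_analyzed_offset` as stated above.
const DEFAULT_MAX_ANALYZED_OFFSET: usize = 1_000_000;

/// Returns true when `text` has more characters than the configured limit.
/// Hypothetical helper; characters are counted as Unicode scalar values,
/// which is an assumption about how the limit is measured.
fn exceeds_analysis_limit(text: &str, max_analyzed_offset: usize) -> bool {
    text.chars().count() > max_analyzed_offset
}
```

Fields that regularly exceed the limit are candidates for storing offsets in the postings list or in term vectors, so that no re-analysis is needed.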
https://www.elastic.co/guide/en/elasticsearch/reference/current/highlighting.html
Structs§
- DefaultHighlighter - Highlighting settings can be set on a global level and overridden at the field level.
- FastVectorHighlighter - The fvh highlighter uses the Lucene Fast Vector highlighter. This highlighter can be used on fields with term_vector set to with_positions_offsets in the mapping.
- Highlight - Highlight structure.
- MatchedFields - Re-exports matched fields logic with type conversions.
- PlainHighlighter - The plain highlighter uses the standard Lucene highlighter. It attempts to reflect the query matching logic in terms of understanding word importance and any word positioning criteria in phrase queries.
- PrePostTags - Contains pre_tags and post_tags highlighting values.
- UnifiedHighlighter - The unified highlighter uses the Lucene Unified Highlighter. This highlighter breaks the text into sentences and uses the BM25 algorithm to score individual sentences as if they were documents in the corpus. It also supports accurate phrase and multi-term (fuzzy, prefix, regex) highlighting. This is the default highlighter.
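To make the role of pre_tags and post_tags concrete, the following sketch wraps matched ranges of a text in a pair of tags. It is an illustration only: `wrap_matches` is a hypothetical function, and the offsets are assumed to be sorted, non-overlapping byte offsets on character boundaries. `<em>` and `</em>` are the tags Elasticsearch uses by default.

```rust
/// Wraps each matched range of `text` in `pre_tag` and `post_tag`.
/// Hypothetical helper: `matches` holds (start, end) byte offsets that are
/// assumed to be sorted, non-overlapping, and on char boundaries.
fn wrap_matches(text: &str, matches: &[(usize, usize)], pre_tag: &str, post_tag: &str) -> String {
    let mut out = String::with_capacity(text.len());
    let mut cursor = 0;

    for &(start, end) in matches {
        // Copy the unmatched text before this match, then the tagged match.
        out.push_str(&text[cursor..start]);
        out.push_str(pre_tag);
        out.push_str(&text[start..end]);
        out.push_str(post_tag);
        cursor = end;
    }

    // Copy whatever follows the last match.
    out.push_str(&text[cursor..]);
    out
}
```

For example, `wrap_matches("the quick brown fox", &[(4, 9)], "<em>", "</em>")` yields `the <em>quick</em> brown fox`.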
Enums§
- Encoder - Indicates if the snippet should be HTML encoded.
- Fragmenter - Specifies how text should be broken up in highlight snippets.
- FvhBoundaryScanner - Specifies how to break the highlighted fragments. Defaults to chars for the fvh highlighter.
- Highlighter - Highlighter settings.
- Order - Sorts highlighted fragments by score when set to score. By default, fragments will be output in the order they appear in the field (order: none). Setting this option to score will output the most relevant fragments first. Each highlighter applies its own logic to compute relevancy scores. See the document How highlighters work internally for more details on how different highlighters find the best fragments.
- Tags - Set to styled to use the built-in tag schema or use custom tags.
- UnifiedBoundaryScanner - Specifies how to break the highlighted fragments. Defaults to sentence.
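The difference between the two fragment orderings can be sketched as follows. `Fragment` and `FragmentOrder` are hypothetical types for this illustration and are distinct from the crate's `Order` enum. How a real highlighter computes the score is not modeled here, so the scores are simply taken as given.

```rust
/// A highlighted fragment with its position in the field and a relevancy
/// score. Hypothetical type for illustration.
#[derive(Debug, Clone, PartialEq)]
struct Fragment {
    position: usize,
    score: f32,
    text: &'static str,
}

/// Mirrors the two documented values of the `order` option.
#[derive(Debug, Clone, Copy)]
enum FragmentOrder {
    /// Output fragments in the order they appear in the field.
    None,
    /// Output the most relevant fragments first.
    Score,
}

fn order_fragments(mut fragments: Vec<Fragment>, order: FragmentOrder) -> Vec<Fragment> {
    match order {
        FragmentOrder::None => fragments.sort_by_key(|f| f.position),
        // Descending by score; the stable sort keeps ties in input order.
        FragmentOrder::Score => fragments.sort_by(|a, b| b.score.total_cmp(&a.score)),
    }
    fragments
}
```

With `FragmentOrder::None`, a reader sees snippets in document order; with `FragmentOrder::Score`, the highest-scoring snippet comes first regardless of where it occurs in the field.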