// SPDX-License-Identifier: Apache-2.0
use std::io;
use crate::MemSize;
use crate::Token;
use crate::Field;
use crate::SegmentAccumulator;
use crate::SegmentContext;

/// Indicates whether a consumer wants to receive tokens for a field.
///
/// Returned by [`FieldConsumer::start_field`] so the worker knows
/// which consumers to include in the token loop.
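// NOTE: the enum this doc comment describes is missing from this excerpt.
// A minimal sketch follows: the name `TokenInterest` and the `Ignore`
// variant are assumed; `WantsTokens` matches the call sequence documented
// on the trait below.
#[derive(Clone, Copy, PartialEq, Eq)]
pub enum TokenInterest {
    /// Receive every token the analyzer produces for this field.
    WantsTokens,
    /// Skip the token loop for this field.
    Ignore,
}
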
/// Lifecycle trait for processing document data during indexing.
///
/// Each consumer handles one aspect of field data (postings, stored
/// fields, norms, term vectors, doc values, points). The SegmentWorker
/// calls these methods in order for every document — the consumer
/// decides internally whether to act on a given field or ignore it.
///
/// This trait is the core of the indexing pipeline. All data flows
/// through these lifecycle methods.
///
/// # Segment accumulator
///
/// Methods that accumulate data receive `&mut SegmentAccumulator` —
/// the shared state owned by the worker. This includes memory pools
/// for data accumulation and cross-consumer metadata (e.g., field
/// properties discovered during processing). Only one consumer
/// borrows the accumulator at a time; the worker passes it
/// sequentially.
///
/// # Call sequence per document
///
/// ```text
/// start_document(doc_id)
/// for each field:
///     interest = start_field(field_id, field, &mut accumulator)
///     if tokenized and interest == WantsTokens:
///         for each token from analyzer:
///             add_token(field_id, field, token, &mut accumulator)
///     finish_field(field_id, field, &mut accumulator)
/// finish_document(doc_id, &mut accumulator, &context)
/// ```
///
/// # Flush
///
/// After one or more documents have been processed, `flush` is called
/// to write accumulated data to segment files. Consumers are flushed
/// in the order they appear in the worker's consumer list. This order
/// matters — some consumers may read files written by earlier consumers
/// during their own flush. The consumer is then dropped along with
/// the worker.