typst_library/introspection/locator.rs
1use std::fmt::{self, Debug, Formatter};
2use std::hash::Hash;
3use std::sync::OnceLock;
4
5use comemo::{Track, Tracked};
6use rustc_hash::FxHashMap;
7use typst_syntax::Span;
8
9use crate::diag::{SourceDiagnostic, warning};
10use crate::engine::Engine;
11use crate::introspection::{History, Introspect, Introspector, Location};
12
13/// Provides locations for elements in the document.
14///
15/// A [`Location`] is a unique ID for an element generated during realization.
16///
17/// # How to use this
18/// The same content may yield different results when laid out in different
19/// parts of the document. To reflect this, every layout operation receives a
20/// locator and every layout operation requires a locator. In code:
21///
22/// - all layouters receive an owned `Locator`
23/// - all layout functions take an owned `Locator`
24///
25/// When a layouter only requires a single sublayout call, it can simply pass on
26/// its locator. When a layouter needs to call multiple sublayouters, we need to
27/// make an explicit decision:
28///
29/// - Split: When we're layouting multiple distinct children (or other pieces of
30/// content), we need to split up the locator with [`Locator::split`]. This
31/// allows us to produce multiple new `Locator`s for the sublayouts. When we
32/// split the locator, each sublocator will be a distinct entity and using it
33/// to e.g. layout the same piece of figure content will yield distinctly
34/// numbered figures.
35///
36/// - Relayout: When we're layouting the same content multiple times (e.g. when
37/// measuring something), we can call [`Locator::relayout`] to use the same
38/// locator multiple times. This indicates to the compiler that it's actually
39/// the same content. Using it to e.g. layout the same piece of figure content
40/// will yield the same figure number both times. Typically, when we layout
41/// something multiple times using `relayout`, only one of the outputs
42/// actually ends up in the document, while the other outputs are only used
43/// for measurement and then discarded.
44///
45/// The `Locator` intentionally does not implement `Copy` and `Clone` so that it
46/// can only be used once. This ensures that whenever we are layouting multiple
47/// things, we make an explicit decision whether we want to split or relayout.
48///
49/// # How it works
50/// There are two primary considerations for the assignment of locations:
51///
52/// 1. Locations should match up over multiple layout iterations, so that
53/// elements can be identified as being the same: That's the whole point of
54/// them.
55///
56/// 2. Locations should be as stable as possible across document edits, so that
57/// incremental compilation is effective.
58///
59/// 3. We want to assign them with as little long-lived state as possible to
60/// enable parallelization of the layout process.
61///
62/// Let's look at a few different assignment strategies to get a feeling for
63/// these requirements:
64///
65/// - A very simple way to generate unique IDs would be to just increase a
66/// counter for each element. In this setup, (1) is somewhat satisfied: In
67/// principle, the counter will line up across iterations, but things start to
68/// break down once we generate content dependent on introspection since the
69/// IDs generated for that new content will shift the IDs for all following
70/// elements in the document. (2) is not satisfied since an edit in the middle
71/// of the document shifts all later IDs. (3) is obviously not satisfied.
72/// Conclusion: Not great.
73///
74/// - To make things more robust, we can incorporate some stable knowledge about
75/// the element into the ID. For this, we can use the element's span since it
76/// is already mostly unique: Elements resulting from different source code
77/// locations are guaranteed to have different spans. However, we can also
78/// have multiple distinct elements generated from the same source location:
79/// e.g. `#for _ in range(5) { figure(..) }`. To handle this case, we can then
80/// disambiguate elements with the same span with an increasing counter. In
81/// this setup, (1) is mostly satisfied: Unless we do stuff like generating
82/// colliding counter updates dependent on introspection, things will line up.
83/// (2) is also reasonably well satisfied, as typical edits will only affect
84/// the single element at the currently edited span. Only if we edit inside of
85/// a function, loop, or similar construct, we will affect multiple elements.
86/// (3) is still a problem though, since we count up.
87///
88/// - What's left is to get rid of the mutable state. Note that layout is a
89/// recursive process and has a tree-shaped execution graph. Thus, we can try
90/// to determine an element's ID based on the path of execution taken in this
91/// graph. Something like "3rd element in layer 1, 7th element in layer 2,
92/// ..". This is basically the first approach, but on a per-layer basis. Thus,
93/// we can again apply our trick from the second approach, and use the span +
94/// disambiguation strategy on a per-layer basis: "1st element with span X in
95/// layer 1, 3rd element with span Y in layer 2". The chance for a collision
96/// is now pretty low and our state is wholly local to each level. So, if we
97/// want to parallelize layout within a layer, we can generate the IDs for
98/// that layer upfront and then start forking out. The final remaining
99/// question is how we can compactly encode this information: For this, as
100/// always, we use hashing! We incorporate the ID information from each layer
101/// into a single hash and thanks to the collision resistance of 128-bit
102/// SipHash, we get almost guaranteed unique locations. We don't even store
103/// the full layer information at all, but rather hash _hierarchically:_ Let
104/// `k_x` be our local per-layer ID for layer `x` and `h_x` be the full
105/// combined hash for layer `x`. We compute `h_n = hash(h_(n-1), k_n)`.
106///
107/// So that's what's going on conceptually in this type. For efficient
108/// memoization, we do all of this in a tracked fashion, such that we only
109/// observe the hash for all the layers above us, if we actually need to
110/// generate a [`Location`]. Thus, if we have a piece of content that does not
111/// contain any locatable elements, we can cache its layout even if it occurs in
112/// different places.
113///
114/// # Dealing with measurement
115/// As explained above, any kind of measurement the compiler performs requires a
116/// locator that matches the one used during real layout. This ensures that the
117/// locations assigned during measurement match up exactly with the locations of
118/// real document elements. Without this guarantee, many introspection-driven
119/// features (like counters, state, and citations) don't work correctly (since
120/// they perform queries dependent on concrete locations).
121///
122/// This is all fine and good, but things get really tricky when the _user_
123/// measures such introspecting content since the user isn't kindly managing
124/// locators for us. Our standard `Locator` workflow assigns locations that
125/// depend a lot on the exact placement in the hierarchy of elements. For this
126/// reason, something that is measured, but then placed into something like a
127/// grid will get a location influenced by the grid. Without a locator, we can't
128/// make the connection between the measured content and the real content, so we
129/// can't ensure that the locations match up.
130///
131/// One possible way to deal with this is to force the user to uniquely identify
132/// content before being measured after all. This would mean that the user needs
133/// to come up with an identifier that is unique within the surrounding context
134/// block and attach it to the content in some way. However, after careful
135/// consideration, I have concluded that this is simply too big of an ask from
136/// users: Understanding why this is even necessary is pretty complicated and
137/// how to best come up with a unique ID is even more so.
138///
139/// For this reason, I chose an alternative best-effort approach: The locator
140/// has a custom "measurement mode" (entered through [`LocatorLink::measure`]),
141/// in which it does its best to assign locations that match up. Specifically,
142/// it uses the key hashes of the individual locatable elements in the measured
143/// content (which may not be unique if content is reused) and combines them
144/// with the context's location to find the most likely matching real element.
145/// This approach works correctly almost all of the time (especially for
146/// "normal" hand-written content where the key hashes rarely collide, as
147/// opposed to code-heavy things where they do).
148///
149/// Support for enhancing this with user-provided uniqueness can still be added
150/// in the future. It will most likely anyway be added simply because it's
151/// automatically included when we add a way to "freeze" content for things like
152/// slidehows. But it will be opt-in because it's just too much complication.
153pub struct Locator<'a> {
154 /// A local hash that incorporates all layers since the last memoization
155 /// boundary.
156 local: u128,
157 /// A pointer to an outer cached locator, which contributes the information
158 /// for all the layers beyond the memoization boundary on-demand.
159 outer: Option<&'a LocatorLink<'a>>,
160}
161
162impl<'a> Locator<'a> {
163 /// Create a new root-level locator.
164 ///
165 /// Should typically only be created at the document level, though there
166 /// are a few places where we use it as well that just don't support
167 /// introspection (e.g. tilings).
168 pub fn root() -> Self {
169 Self { local: 0, outer: None }
170 }
171
172 /// Creates a new synthetic locator.
173 ///
174 /// This can be used to create a new dependent layout based on an element.
175 /// This is used for layouting footnote entries based on the location
176 /// of the associated footnote.
177 pub fn synthesize(location: Location) -> Self {
178 Self { local: location.hash(), outer: None }
179 }
180
181 /// Creates a new locator that points to the given link.
182 pub fn link(link: &'a LocatorLink<'a>) -> Self {
183 Self { local: 0, outer: Some(link) }
184 }
185}
186
187impl<'a> Locator<'a> {
188 /// Returns a type that can be used to generate `Locator`s for multiple
189 /// child elements. See the type-level docs for more details.
190 pub fn split(self) -> SplitLocator<'a> {
191 SplitLocator {
192 local: self.local,
193 outer: self.outer,
194 disambiguators: FxHashMap::default(),
195 }
196 }
197
198 /// Creates a copy of this locator for measurement or relayout of the same
199 /// content. See the type-level docs for more details.
200 ///
201 /// This is effectively just `Clone`, but the `Locator` doesn't implement
202 /// `Clone` to make this operation explicit.
203 pub fn relayout(&self) -> Self {
204 Self { local: self.local, outer: self.outer }
205 }
206}
207
208#[comemo::track]
209#[allow(clippy::needless_lifetimes)]
210impl<'a> Locator<'a> {
211 /// Resolves the locator based on its local and the outer information.
212 fn resolve(&self) -> Resolved {
213 match self.outer {
214 None => Resolved::Hash(self.local),
215 Some(outer) => match outer.resolve() {
216 Resolved::Hash(outer) => {
217 Resolved::Hash(typst_utils::hash128(&(self.local, outer)))
218 }
219 Resolved::Measure(base, span) => Resolved::Measure(base, span),
220 },
221 }
222 }
223}
224
225impl Debug for Locator<'_> {
226 fn fmt(&self, f: &mut Formatter) -> fmt::Result {
227 write!(f, "Locator({:?})", self.resolve())
228 }
229}
230
231/// The fully resolved value of a locator.
232#[derive(Debug, Copy, Clone, Hash)]
233enum Resolved {
234 /// The full hash, incorporating the local and all outer information.
235 Hash(u128),
236 /// Indicates that the locator is in measurement mode, with the given base
237 /// location.
238 Measure(Location, Span),
239}
240
241/// A type that generates unique sublocators.
242pub struct SplitLocator<'a> {
243 /// A local hash that incorporates all layers since the last memoization
244 /// boundary.
245 local: u128,
246 /// A pointer to an outer cached locator, which contributes the information
247 /// for all the layers beyond the memoization boundary on-demand.
248 outer: Option<&'a LocatorLink<'a>>,
249 /// Simply counts up the number of times we've seen each local hash.
250 disambiguators: FxHashMap<u128, usize>,
251}
252
253impl<'a> SplitLocator<'a> {
254 /// Produces a sublocator for a subtree keyed by `key`. The keys do *not*
255 /// need to be unique among the `next()` calls on this split locator. (They
256 /// can even all be `&()`.)
257 ///
258 /// However, stable & mostly unique keys lead to more stable locations
259 /// throughout edits, improving incremental compilation performance.
260 ///
261 /// A common choice for a key is the span of the content that will be
262 /// layouted with this locator.
263 pub fn next<K: Hash>(&mut self, key: &K) -> Locator<'a> {
264 self.next_inner(typst_utils::hash128(key))
265 }
266
267 /// Produces a sublocator for a subtree.
268 pub fn next_inner(&mut self, key: u128) -> Locator<'a> {
269 // Produce a locator disambiguator, for elements with the same key
270 // within this `SplitLocator`.
271 let disambiguator = {
272 let slot = self.disambiguators.entry(key).or_default();
273 std::mem::replace(slot, *slot + 1)
274 };
275
276 // Combine the key, disambiguator and local hash into a sub-local hash.
277 // The outer information is not yet merged into this, it is added
278 // on-demand in `Locator::resolve`.
279 let local = typst_utils::hash128(&(key, disambiguator, self.local));
280
281 Locator { outer: self.outer, local }
282 }
283
284 /// Produces a unique location for an element.
285 pub fn next_location(
286 &mut self,
287 engine: &mut Engine,
288 key: u128,
289 elem_span: Span,
290 ) -> Location {
291 match self.next_inner(key).resolve() {
292 Resolved::Hash(hash) => Location::new(hash),
293 Resolved::Measure(base, measure_span) => {
294 let introspection =
295 MeasureIntrospection { key, base, elem_span, measure_span };
296
297 // If we aren't able to find a matching element in the document,
298 // default to the base location, so that it's at least remotely
299 // in the right area (so that counters can be resolved).
300 engine.introspect(introspection).unwrap_or(base)
301 }
302 }
303 }
304}
305
306/// Tries to find the closest matching element in the document during
307/// measurement.
308#[derive(Debug, Clone, PartialEq, Hash)]
309struct MeasureIntrospection {
310 key: u128,
311 base: Location,
312 measure_span: Span,
313 elem_span: Span,
314}
315
316impl Introspect for MeasureIntrospection {
317 type Output = Option<Location>;
318
319 fn introspect(
320 &self,
321 _: &mut Engine,
322 introspector: Tracked<dyn Introspector + '_>,
323 ) -> Self::Output {
324 introspector.locator(self.key, self.base)
325 }
326
327 fn diagnose(&self, _: &History<Self::Output>) -> SourceDiagnostic {
328 let mut diag = warning!(
329 self.measure_span, "a measured element did not stabilize";
330 hint: "measurement tries to resolve introspections by finding the \
331 closest matching elements in the real document";
332 );
333 if !self.elem_span.is_detached() {
334 diag.spanned_hint(
335 "the closest match for this element did not stabilize",
336 self.elem_span,
337 );
338 }
339 diag
340 }
341}
342
343/// A locator can be linked to this type to only access information across the
344/// memoization boundary on-demand, improving the cache hit chance.
345pub struct LocatorLink<'a> {
346 /// The link itself.
347 kind: LinkKind<'a>,
348 /// The cached resolved link.
349 resolved: OnceLock<Resolved>,
350}
351
352/// The different kinds of locator links.
353enum LinkKind<'a> {
354 /// An outer `Locator`, which we can resolved if necessary.
355 ///
356 /// We need to override the constraint's lifetime here so that `Tracked` is
357 /// covariant over the constraint. If it becomes invariant, we're in for a
358 /// world of lifetime pain.
359 Outer(Tracked<'a, Locator<'a>, <Locator<'static> as Track>::Call>),
360 /// A link which indicates that we are in measurement mode.
361 Measure(Location, Span),
362}
363
364impl<'a> LocatorLink<'a> {
365 /// Create a locator link.
366 pub fn new(outer: Tracked<'a, Locator<'a>>) -> Self {
367 LocatorLink {
368 kind: LinkKind::Outer(outer),
369 resolved: OnceLock::new(),
370 }
371 }
372
373 /// Creates a link that puts any linked downstream locator into measurement
374 /// mode.
375 ///
376 /// Read the "Dealing with measurement" section of the [`Locator`] docs for
377 /// more details.
378 pub fn measure(base: Location, span: Span) -> Self {
379 LocatorLink {
380 kind: LinkKind::Measure(base, span),
381 resolved: OnceLock::new(),
382 }
383 }
384
385 /// Resolve the link.
386 ///
387 /// The result is cached in this link, so that we don't traverse the link
388 /// chain over and over again.
389 fn resolve(&self) -> Resolved {
390 *self.resolved.get_or_init(|| match self.kind {
391 LinkKind::Outer(outer) => outer.resolve(),
392 LinkKind::Measure(base, span) => Resolved::Measure(base, span),
393 })
394 }
395}