typst_library/introspection/
locator.rs

1use std::fmt::{self, Debug, Formatter};
2use std::hash::Hash;
3use std::sync::OnceLock;
4
5use comemo::{Track, Tracked};
6use rustc_hash::FxHashMap;
7
8use crate::introspection::{Introspector, Location};
9
10/// Provides locations for elements in the document.
11///
12/// A [`Location`] is a unique ID for an element generated during realization.
13///
14/// # How to use this
15/// The same content may yield different results when laid out in different
16/// parts of the document. To reflect this, every layout operation receives a
17/// locator and every layout operation requires a locator. In code:
18///
19/// - all layouters receive an owned `Locator`
20/// - all layout functions take an owned `Locator`
21///
22/// When a layouter only requires a single sublayout call, it can simply pass on
23/// its locator. When a layouter needs to call multiple sublayouters, we need to
24/// make an explicit decision:
25///
26/// - Split: When we're layouting multiple distinct children (or other pieces of
27///   content), we need to split up the locator with [`Locator::split`]. This
28///   allows us to produce multiple new `Locator`s for the sublayouts. When we
29///   split the locator, each sublocator will be a distinct entity and using it
30///   to e.g. layout the same piece of figure content will yield distinctly
31///   numbered figures.
32///
33/// - Relayout: When we're layouting the same content multiple times (e.g. when
34///   measuring something), we can call [`Locator::relayout`] to use the same
35///   locator multiple times. This indicates to the compiler that it's actually
36///   the same content. Using it to e.g. layout the same piece of figure content
37///   will yield the same figure number both times. Typically, when we layout
38///   something multiple times using `relayout`, only one of the outputs
39///   actually ends up in the document, while the other outputs are only used
40///   for measurement and then discarded.
41///
42/// The `Locator` intentionally does not implement `Copy` and `Clone` so that it
43/// can only be used once. This ensures that whenever we are layouting multiple
44/// things, we make an explicit decision whether we want to split or relayout.
45///
46/// # How it works
47/// There are two primary considerations for the assignment of locations:
48///
49/// 1. Locations should match up over multiple layout iterations, so that
50///    elements can be identified as being the same: That's the whole point of
51///    them.
52///
53/// 2. Locations should be as stable as possible across document edits, so that
54///    incremental compilation is effective.
55///
56/// 3. We want to assign them with as little long-lived state as possible to
57///    enable parallelization of the layout process.
58///
59/// Let's look at a few different assignment strategies to get a feeling for
60/// these requirements:
61///
62/// - A very simple way to generate unique IDs would be to just increase a
63///   counter for each element. In this setup, (1) is somewhat satisfied: In
64///   principle, the counter will line up across iterations, but things start to
65///   break down once we generate content dependent on introspection since the
66///   IDs generated for that new content will shift the IDs for all following
67///   elements in the document. (2) is not satisfied since an edit in the middle
68///   of the document shifts all later IDs. (3) is obviously not satisfied.
69///   Conclusion: Not great.
70///
71/// - To make things more robust, we can incorporate some stable knowledge about
72///   the element into the ID. For this, we can use the element's span since it
73///   is already mostly unique: Elements resulting from different source code
74///   locations are guaranteed to have different spans. However, we can also
75///   have multiple distinct elements generated from the same source location:
76///   e.g. `#for _ in range(5) { figure(..) }`. To handle this case, we can then
77///   disambiguate elements with the same span with an increasing counter. In
78///   this setup, (1) is mostly satisfied: Unless we do stuff like generating
79///   colliding counter updates dependent on introspection, things will line up.
80///   (2) is also reasonably well satisfied, as typical edits will only affect
81///   the single element at the currently edited span. Only if we edit inside of
82///   a function, loop, or similar construct, we will affect multiple elements.
83///   (3) is still a problem though, since we count up.
84///
85/// - What's left is to get rid of the mutable state. Note that layout is a
86///   recursive process and has a tree-shaped execution graph. Thus, we can try
87///   to determine an element's ID based on the path of execution taken in this
88///   graph. Something like "3rd element in layer 1, 7th element in layer 2,
89///   ..". This is basically the first approach, but on a per-layer basis. Thus,
90///   we can again apply our trick from the second approach, and use the span +
91///   disambiguation strategy on a per-layer basis: "1st element with span X in
92///   layer 1, 3rd element with span Y in layer 2". The chance for a collision
93///   is now pretty low and our state is wholly local to each level. So, if we
94///   want to parallelize layout within a layer, we can generate the IDs for
95///   that layer upfront and then start forking out. The final remaining
96///   question is how we can compactly encode this information: For this, as
97///   always, we use hashing! We incorporate the ID information from each layer
98///   into a single hash and thanks to the collision resistance of 128-bit
99///   SipHash, we get almost guaranteed unique locations. We don't even store
100///   the full layer information at all, but rather hash _hierarchically:_ Let
101///   `k_x` be our local per-layer ID for layer `x` and `h_x` be the full
102///   combined hash for layer `x`. We compute `h_n = hash(h_(n-1), k_n)`.
103///
104/// So that's what's going on conceptually in this type. For efficient
105/// memoization, we do all of this in a tracked fashion, such that we only
106/// observe the hash for all the layers above us, if we actually need to
107/// generate a [`Location`]. Thus, if we have a piece of content that does not
108/// contain any locatable elements, we can cache its layout even if it occurs in
109/// different places.
110///
111/// # Dealing with measurement
112/// As explained above, any kind of measurement the compiler performs requires a
113/// locator that matches the one used during real layout. This ensures that the
114/// locations assigned during measurement match up exactly with the locations of
115/// real document elements. Without this guarantee, many introspection-driven
116/// features (like counters, state, and citations) don't work correctly (since
117/// they perform queries dependent on concrete locations).
118///
119/// This is all fine and good, but things get really tricky when the _user_
120/// measures such introspecting content since the user isn't kindly managing
121/// locators for us. Our standard `Locator` workflow assigns locations that
122/// depend a lot on the exact placement in the hierarchy of elements. For this
123/// reason, something that is measured, but then placed into something like a
124/// grid will get a location influenced by the grid. Without a locator, we can't
125/// make the connection between the measured content and the real content, so we
126/// can't ensure that the locations match up.
127///
128/// One possible way to deal with this is to force the user to uniquely identify
129/// content before being measured after all. This would mean that the user needs
130/// to come up with an identifier that is unique within the surrounding context
131/// block and attach it to the content in some way. However, after careful
132/// consideration, I have concluded that this is simply too big of an ask from
133/// users: Understanding why this is even necessary is pretty complicated and
134/// how to best come up with a unique ID is even more so.
135///
136/// For this reason, I chose an alternative best-effort approach: The locator
137/// has a custom "measurement mode" (entered through [`LocatorLink::measure`]),
138/// in which it does its best to assign locations that match up. Specifically,
139/// it uses the key hashes of the individual locatable elements in the measured
140/// content (which may not be unique if content is reused) and combines them
141/// with the context's location to find the most likely matching real element.
142/// This approach works correctly almost all of the time (especially for
143/// "normal" hand-written content where the key hashes rarely collide, as
144/// opposed to code-heavy things where they do).
145///
146/// Support for enhancing this with user-provided uniqueness can still be added
147/// in the future. It will most likely anyway be added simply because it's
148/// automatically included when we add a way to "freeze" content for things like
149/// slidehows. But it will be opt-in because it's just too much complication.
150pub struct Locator<'a> {
151    /// A local hash that incorporates all layers since the last memoization
152    /// boundary.
153    local: u128,
154    /// A pointer to an outer cached locator, which contributes the information
155    /// for all the layers beyond the memoization boundary on-demand.
156    outer: Option<&'a LocatorLink<'a>>,
157}
158
159impl<'a> Locator<'a> {
160    /// Create a new root-level locator.
161    ///
162    /// Should typically only be created at the document level, though there
163    /// are a few places where we use it as well that just don't support
164    /// introspection (e.g. tilings).
165    pub fn root() -> Self {
166        Self { local: 0, outer: None }
167    }
168
169    /// Creates a new synthetic locator.
170    ///
171    /// This can be used to create a new dependent layout based on an element.
172    /// This is used for layouting footnote entries based on the location
173    /// of the associated footnote.
174    pub fn synthesize(location: Location) -> Self {
175        Self { local: location.hash(), outer: None }
176    }
177
178    /// Creates a new locator that points to the given link.
179    pub fn link(link: &'a LocatorLink<'a>) -> Self {
180        Self { local: 0, outer: Some(link) }
181    }
182}
183
184impl<'a> Locator<'a> {
185    /// Returns a type that can be used to generate `Locator`s for multiple
186    /// child elements. See the type-level docs for more details.
187    pub fn split(self) -> SplitLocator<'a> {
188        SplitLocator {
189            local: self.local,
190            outer: self.outer,
191            disambiguators: FxHashMap::default(),
192        }
193    }
194
195    /// Creates a copy of this locator for measurement or relayout of the same
196    /// content. See the type-level docs for more details.
197    ///
198    /// This is effectively just `Clone`, but the `Locator` doesn't implement
199    /// `Clone` to make this operation explicit.
200    pub fn relayout(&self) -> Self {
201        Self { local: self.local, outer: self.outer }
202    }
203}
204
205#[comemo::track]
206#[allow(clippy::needless_lifetimes)]
207impl<'a> Locator<'a> {
208    /// Resolves the locator based on its local and the outer information.
209    fn resolve(&self) -> Resolved {
210        match self.outer {
211            None => Resolved::Hash(self.local),
212            Some(outer) => match outer.resolve() {
213                Resolved::Hash(outer) => {
214                    Resolved::Hash(typst_utils::hash128(&(self.local, outer)))
215                }
216                Resolved::Measure(anchor) => Resolved::Measure(anchor),
217            },
218        }
219    }
220}
221
222impl Debug for Locator<'_> {
223    fn fmt(&self, f: &mut Formatter) -> fmt::Result {
224        write!(f, "Locator({:?})", self.resolve())
225    }
226}
227
228/// The fully resolved value of a locator.
229#[derive(Debug, Copy, Clone, Hash)]
230enum Resolved {
231    /// The full hash, incorporating the local and all outer information.
232    Hash(u128),
233    /// Indicates that the locator is in measurement mode, with the given anchor
234    /// location.
235    Measure(Location),
236}
237
238/// A type that generates unique sublocators.
239pub struct SplitLocator<'a> {
240    /// A local hash that incorporates all layers since the last memoization
241    /// boundary.
242    local: u128,
243    /// A pointer to an outer cached locator, which contributes the information
244    /// for all the layers beyond the memoization boundary on-demand.
245    outer: Option<&'a LocatorLink<'a>>,
246    /// Simply counts up the number of times we've seen each local hash.
247    disambiguators: FxHashMap<u128, usize>,
248}
249
250impl<'a> SplitLocator<'a> {
251    /// Produces a sublocator for a subtree keyed by `key`. The keys do *not*
252    /// need to be unique among the `next()` calls on this split locator. (They
253    /// can even all be `&()`.)
254    ///
255    /// However, stable & mostly unique keys lead to more stable locations
256    /// throughout edits, improving incremental compilation performance.
257    ///
258    /// A common choice for a key is the span of the content that will be
259    /// layouted with this locator.
260    pub fn next<K: Hash>(&mut self, key: &K) -> Locator<'a> {
261        self.next_inner(typst_utils::hash128(key))
262    }
263
264    /// Produces a sublocator for a subtree.
265    pub fn next_inner(&mut self, key: u128) -> Locator<'a> {
266        // Produce a locator disambiguator, for elements with the same key
267        // within this `SplitLocator`.
268        let disambiguator = {
269            let slot = self.disambiguators.entry(key).or_default();
270            std::mem::replace(slot, *slot + 1)
271        };
272
273        // Combine the key, disambiguator and local hash into a sub-local hash.
274        // The outer information is not yet merged into this, it is added
275        // on-demand in `Locator::resolve`.
276        let local = typst_utils::hash128(&(key, disambiguator, self.local));
277
278        Locator { outer: self.outer, local }
279    }
280
281    /// Produces a unique location for an element.
282    pub fn next_location(
283        &mut self,
284        introspector: Tracked<Introspector>,
285        key: u128,
286    ) -> Location {
287        match self.next_inner(key).resolve() {
288            Resolved::Hash(hash) => Location::new(hash),
289            Resolved::Measure(anchor) => {
290                // If we aren't able to find a matching element in the document,
291                // default to the anchor, so that it's at least remotely in
292                // the right area (so that counters can be resolved).
293                introspector.locator(key, anchor).unwrap_or(anchor)
294            }
295        }
296    }
297}
298
299/// A locator can be linked to this type to only access information across the
300/// memoization boundary on-demand, improving the cache hit chance.
301pub struct LocatorLink<'a> {
302    /// The link itself.
303    kind: LinkKind<'a>,
304    /// The cached resolved link.
305    resolved: OnceLock<Resolved>,
306}
307
308/// The different kinds of locator links.
309enum LinkKind<'a> {
310    /// An outer `Locator`, which we can resolved if necessary.
311    ///
312    /// We need to override the constraint's lifetime here so that `Tracked` is
313    /// covariant over the constraint. If it becomes invariant, we're in for a
314    /// world of lifetime pain.
315    Outer(Tracked<'a, Locator<'a>, <Locator<'static> as Track>::Call>),
316    /// A link which indicates that we are in measurement mode.
317    Measure(Location),
318}
319
320impl<'a> LocatorLink<'a> {
321    /// Create a locator link.
322    pub fn new(outer: Tracked<'a, Locator<'a>>) -> Self {
323        LocatorLink {
324            kind: LinkKind::Outer(outer),
325            resolved: OnceLock::new(),
326        }
327    }
328
329    /// Creates a link that puts any linked downstream locator into measurement
330    /// mode.
331    ///
332    /// Read the "Dealing with measurement" section of the [`Locator`] docs for
333    /// more details.
334    pub fn measure(anchor: Location) -> Self {
335        LocatorLink {
336            kind: LinkKind::Measure(anchor),
337            resolved: OnceLock::new(),
338        }
339    }
340
341    /// Resolve the link.
342    ///
343    /// The result is cached in this link, so that we don't traverse the link
344    /// chain over and over again.
345    fn resolve(&self) -> Resolved {
346        *self.resolved.get_or_init(|| match self.kind {
347            LinkKind::Outer(outer) => outer.resolve(),
348            LinkKind::Measure(anchor) => Resolved::Measure(anchor),
349        })
350    }
351}
typst_library/introspection/locator.rs

typst_library/introspection/
locator.rs