typst_library/introspection/locator.rs
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351
use std::collections::HashMap;
use std::fmt::{self, Debug, Formatter};
use std::hash::Hash;
use std::sync::OnceLock;
use comemo::{Tracked, Validate};
use crate::introspection::{Introspector, Location};
/// Provides locations for elements in the document.
///
/// A [`Location`] is a unique ID for an element generated during realization.
///
/// # How to use this
/// The same content may yield different results when laid out in different
/// parts of the document. To reflect this, every layout operation receives a
/// locator and every layout operation requires a locator. In code:
///
/// - all layouters receive an owned `Locator`
/// - all layout functions take an owned `Locator`
///
/// When a layouter only requires a single sublayout call, it can simply pass on
/// its locator. When a layouter needs to call multiple sublayouters, we need to
/// make an explicit decision:
///
/// - Split: When we're layouting multiple distinct children (or other pieces of
/// content), we need to split up the locator with [`Locator::split`]. This
/// allows us to produce multiple new `Locator`s for the sublayouts. When we
/// split the locator, each sublocator will be a distinct entity and using it
/// to e.g. layout the same piece of figure content will yield distinctly
/// numbered figures.
///
/// - Relayout: When we're layouting the same content multiple times (e.g. when
/// measuring something), we can call [`Locator::relayout`] to use the same
/// locator multiple times. This indicates to the compiler that it's actually
/// the same content. Using it to e.g. layout the same piece of figure content
/// will yield the same figure number both times. Typically, when we layout
/// something multiple times using `relayout`, only one of the outputs
/// actually ends up in the document, while the other outputs are only used
/// for measurement and then discarded.
///
/// The `Locator` intentionally does not implement `Copy` and `Clone` so that it
/// can only be used once. This ensures that whenever we are layouting multiple
/// things, we make an explicit decision whether we want to split or relayout.
///
/// # How it works
/// There are two primary considerations for the assignment of locations:
///
/// 1. Locations should match up over multiple layout iterations, so that
/// elements can be identified as being the same: That's the whole point of
/// them.
///
/// 2. Locations should be as stable as possible across document edits, so that
/// incremental compilation is effective.
///
/// 3. We want to assign them with as little long-lived state as possible to
/// enable parallelization of the layout process.
///
/// Let's look at a few different assignment strategies to get a feeling for
/// these requirements:
///
/// - A very simple way to generate unique IDs would be to just increase a
/// counter for each element. In this setup, (1) is somewhat satisfied: In
/// principle, the counter will line up across iterations, but things start to
/// break down once we generate content dependent on introspection since the
/// IDs generated for that new content will shift the IDs for all following
/// elements in the document. (2) is not satisfied since an edit in the middle
/// of the document shifts all later IDs. (3) is obviously not satisfied.
/// Conclusion: Not great.
///
/// - To make things more robust, we can incorporate some stable knowledge about
/// the element into the ID. For this, we can use the element's span since it
/// is already mostly unique: Elements resulting from different source code
/// locations are guaranteed to have different spans. However, we can also
/// have multiple distinct elements generated from the same source location:
/// e.g. `#for _ in range(5) { figure(..) }`. To handle this case, we can then
/// disambiguate elements with the same span with an increasing counter. In
/// this setup, (1) is mostly satisfied: Unless we do stuff like generating
/// colliding counter updates dependent on introspection, things will line up.
/// (2) is also reasonably well satisfied, as typical edits will only affect
/// the single element at the currently edited span. Only if we edit inside of
/// a function, loop, or similar construct, we will affect multiple elements.
/// (3) is still a problem though, since we count up.
///
/// - What's left is to get rid of the mutable state. Note that layout is a
/// recursive process and has a tree-shaped execution graph. Thus, we can try
/// to determine an element's ID based on the path of execution taken in this
/// graph. Something like "3rd element in layer 1, 7th element in layer 2,
/// ..". This is basically the first approach, but on a per-layer basis. Thus,
/// we can again apply our trick from the second approach, and use the span +
/// disambiguation strategy on a per-layer basis: "1st element with span X in
/// layer 1, 3rd element with span Y in layer 2". The chance for a collision
/// is now pretty low and our state is wholly local to each level. So, if we
/// want to parallelize layout within a layer, we can generate the IDs for
/// that layer upfront and then start forking out. The final remaining
/// question is how we can compactly encode this information: For this, as
/// always, we use hashing! We incorporate the ID information from each layer
/// into a single hash and thanks to the collision resistance of 128-bit
/// SipHash, we get almost guaranteed unique locations. We don't even store
/// the full layer information at all, but rather hash _hierarchically:_ Let
/// `k_x` be our local per-layer ID for layer `x` and `h_x` be the full
/// combined hash for layer `x`. We compute `h_n = hash(h_(n-1), k_n)`.
///
/// So that's what's going on conceptually in this type. For efficient
/// memoization, we do all of this in a tracked fashion, such that we only
/// observe the hash for all the layers above us, if we actually need to
/// generate a [`Location`]. Thus, if we have a piece of content that does not
/// contain any locatable elements, we can cache its layout even if it occurs in
/// different places.
///
/// # Dealing with measurement
/// As explained above, any kind of measurement the compiler performs requires a
/// locator that matches the one used during real layout. This ensures that the
/// locations assigned during measurement match up exactly with the locations of
/// real document elements. Without this guarantee, many introspection-driven
/// features (like counters, state, and citations) don't work correctly (since
/// they perform queries dependent on concrete locations).
///
/// This is all fine and good, but things get really tricky when the _user_
/// measures such introspecting content since the user isn't kindly managing
/// locators for us. Our standard `Locator` workflow assigns locations that
/// depend a lot on the exact placement in the hierarchy of elements. For this
/// reason, something that is measured, but then placed into something like a
/// grid will get a location influenced by the grid. Without a locator, we can't
/// make the connection between the measured content and the real content, so we
/// can't ensure that the locations match up.
///
/// One possible way to deal with this is to force the user to uniquely identify
/// content before being measured after all. This would mean that the user needs
/// to come up with an identifier that is unique within the surrounding context
/// block and attach it to the content in some way. However, after careful
/// consideration, I have concluded that this is simply too big of an ask from
/// users: Understanding why this is even necessary is pretty complicated and
/// how to best come up with a unique ID is even more so.
///
/// For this reason, I chose an alternative best-effort approach: The locator
/// has a custom "measurement mode" (entered through [`LocatorLink::measure`]),
/// in which it does its best to assign locations that match up. Specifically,
/// it uses the key hashes of the individual locatable elements in the measured
/// content (which may not be unique if content is reused) and combines them
/// with the context's location to find the most likely matching real element.
/// This approach works correctly almost all of the time (especially for
/// "normal" hand-written content where the key hashes rarely collide, as
/// opposed to code-heavy things where they do).
///
/// Support for enhancing this with user-provided uniqueness can still be added
/// in the future. It will most likely anyway be added simply because it's
/// automatically included when we add a way to "freeze" content for things like
/// slidehows. But it will be opt-in because it's just too much complication.
pub struct Locator<'a> {
/// A local hash that incorporates all layers since the last memoization
/// boundary.
local: u128,
/// A pointer to an outer cached locator, which contributes the information
/// for all the layers beyond the memoization boundary on-demand.
outer: Option<&'a LocatorLink<'a>>,
}
impl<'a> Locator<'a> {
/// Create a new root-level locator.
///
/// Should typically only be created at the document level, though there
/// are a few places where we use it as well that just don't support
/// introspection (e.g. tilings).
pub fn root() -> Self {
Self { local: 0, outer: None }
}
/// Creates a new synthetic locator.
///
/// This can be used to create a new dependent layout based on an element.
/// This is used for layouting footnote entries based on the location
/// of the associated footnote.
pub fn synthesize(location: Location) -> Self {
Self { local: location.hash(), outer: None }
}
/// Creates a new locator that points to the given link.
pub fn link(link: &'a LocatorLink<'a>) -> Self {
Self { local: 0, outer: Some(link) }
}
}
impl<'a> Locator<'a> {
/// Returns a type that can be used to generate `Locator`s for multiple
/// child elements. See the type-level docs for more details.
pub fn split(self) -> SplitLocator<'a> {
SplitLocator {
local: self.local,
outer: self.outer,
disambiguators: HashMap::new(),
}
}
/// Creates a copy of this locator for measurement or relayout of the same
/// content. See the type-level docs for more details.
///
/// This is effectively just `Clone`, but the `Locator` doesn't implement
/// `Clone` to make this operation explicit.
pub fn relayout(&self) -> Self {
Self { local: self.local, outer: self.outer }
}
}
#[comemo::track]
#[allow(clippy::needless_lifetimes)]
impl<'a> Locator<'a> {
/// Resolves the locator based on its local and the outer information.
fn resolve(&self) -> Resolved {
match self.outer {
None => Resolved::Hash(self.local),
Some(outer) => match outer.resolve() {
Resolved::Hash(outer) => {
Resolved::Hash(typst_utils::hash128(&(self.local, outer)))
}
Resolved::Measure(anchor) => Resolved::Measure(anchor),
},
}
}
}
impl Debug for Locator<'_> {
fn fmt(&self, f: &mut Formatter) -> fmt::Result {
write!(f, "Locator({:?})", self.resolve())
}
}
/// The fully resolved value of a locator.
#[derive(Debug, Copy, Clone, Hash)]
enum Resolved {
/// The full hash, incorporating the local and all outer information.
Hash(u128),
/// Indicates that the locator is in measurement mode, with the given anchor
/// location.
Measure(Location),
}
/// A type that generates unique sublocators.
pub struct SplitLocator<'a> {
/// A local hash that incorporates all layers since the last memoization
/// boundary.
local: u128,
/// A pointer to an outer cached locator, which contributes the information
/// for all the layers beyond the memoization boundary on-demand.
outer: Option<&'a LocatorLink<'a>>,
/// Simply counts up the number of times we've seen each local hash.
disambiguators: HashMap<u128, usize>,
}
impl<'a> SplitLocator<'a> {
/// Produces a sublocator for a subtree keyed by `key`. The keys do *not*
/// need to be unique among the `next()` calls on this split locator. (They
/// can even all be `&()`.)
///
/// However, stable & mostly unique keys lead to more stable locations
/// throughout edits, improving incremental compilation performance.
///
/// A common choice for a key is the span of the content that will be
/// layouted with this locator.
pub fn next<K: Hash>(&mut self, key: &K) -> Locator<'a> {
self.next_inner(typst_utils::hash128(key))
}
/// Produces a sublocator for a subtree.
pub fn next_inner(&mut self, key: u128) -> Locator<'a> {
// Produce a locator disambiguator, for elements with the same key
// within this `SplitLocator`.
let disambiguator = {
let slot = self.disambiguators.entry(key).or_default();
std::mem::replace(slot, *slot + 1)
};
// Combine the key, disambiguator and local hash into a sub-local hash.
// The outer information is not yet merged into this, it is added
// on-demand in `Locator::resolve`.
let local = typst_utils::hash128(&(key, disambiguator, self.local));
Locator { outer: self.outer, local }
}
/// Produces a unique location for an element.
pub fn next_location(
&mut self,
introspector: Tracked<Introspector>,
key: u128,
) -> Location {
match self.next_inner(key).resolve() {
Resolved::Hash(hash) => Location::new(hash),
Resolved::Measure(anchor) => {
// If we aren't able to find a matching element in the document,
// default to the anchor, so that it's at least remotely in
// the right area (so that counters can be resolved).
introspector.locator(key, anchor).unwrap_or(anchor)
}
}
}
}
/// A locator can be linked to this type to only access information across the
/// memoization boundary on-demand, improving the cache hit chance.
pub struct LocatorLink<'a> {
/// The link itself.
kind: LinkKind<'a>,
/// The cached resolved link.
resolved: OnceLock<Resolved>,
}
/// The different kinds of locator links.
enum LinkKind<'a> {
/// An outer `Locator`, which we can resolved if necessary.
///
/// We need to override the constraint's lifetime here so that `Tracked` is
/// covariant over the constraint. If it becomes invariant, we're in for a
/// world of lifetime pain.
Outer(Tracked<'a, Locator<'a>, <Locator<'static> as Validate>::Constraint>),
/// A link which indicates that we are in measurement mode.
Measure(Location),
}
impl<'a> LocatorLink<'a> {
/// Create a locator link.
pub fn new(outer: Tracked<'a, Locator<'a>>) -> Self {
LocatorLink {
kind: LinkKind::Outer(outer),
resolved: OnceLock::new(),
}
}
/// Creates a link that puts any linked downstream locator into measurement
/// mode.
///
/// Read the "Dealing with measurement" section of the [`Locator`] docs for
/// more details.
pub fn measure(anchor: Location) -> Self {
LocatorLink {
kind: LinkKind::Measure(anchor),
resolved: OnceLock::new(),
}
}
/// Resolve the link.
///
/// The result is cached in this link, so that we don't traverse the link
/// chain over and over again.
fn resolve(&self) -> Resolved {
*self.resolved.get_or_init(|| match self.kind {
LinkKind::Outer(outer) => outer.resolve(),
LinkKind::Measure(anchor) => Resolved::Measure(anchor),
})
}
}