jumprope/lib.rs
1//! # JumpRope
2//!
3//! A small, fast rope library for rust built on a skip list of gap buffers
4//!
5//! This library enables super fast in-memory string editing, where an edit might insert, delete
6//! or modify text from anywhere in the string. Unlike inserting and deleting in a String directly,
7//! jumprope avoids expensive memcopy / memmove operations. All editing operations are O(log n)
8//! based on the size of the string.
9//!
10//! ## Example
11//!
12//! ```
13//! use jumprope::JumpRope;
14//!
15//! let mut rope = JumpRope::from("Some large text document");
16//! rope.insert(5, "really "); // "Some really large text document"
17//! rope.replace(0..4, "My rad"); // "My rad really large text document"
18//! assert_eq!(rope, "My rad really large text document");
19//!
20//! // Extract to a string
21//! let s: String = rope.to_string();
22//! assert_eq!(s, "My rad really large text document");
23//! ```
24//!
25//! See the [`JumpRope`] type for more usage details.
26//!
27//! # Random numbers, Determinism and DoS protection
28//!
29//! Jumprope is built on top of [skip lists](https://en.wikipedia.org/wiki/Skip_list), which are a
30//! probabilistic data structure. Each node in the list uses a random number generator to decide
31//! its "height". To do this well, skip lists depend on a random number generator for performance.
32//! If a pathologically bad RNG source was used, the skip list would degrade to a linked list (with
33//! `O(n)` performance).
34//!
35//! ## Security
36//!
37//! We have plenty of high quality RNGs available in rust. However, the bad news is that if a
38//! malicious actor can:
39//!
40//! - Predict the sequence of random numbers, and
41//! - Control a sequence of insert & removal operations in the rope
42//!
43//! Then they can *force* the rope to degrade to `O(n)` performance.
44//!
45//! The obvious protection against this is to use a good RNG, seeded with a good entropy source.
46//! This makes the random sequence impossible to predict. Luckily jumprope isn't sensitive to the
47//! performance of the RNG used. The only downside is that using a CSRNG + a good entropy source
48//! makes the compiled binary bigger.
49//!
50//! So there's a feature flag: `["ddos_protection"]`. This flag configures jumprope to use a larger
51//! CSRNG instead of a PRNG. To disable it (eg for WASM), you need to compile jumprope with default
52//! features turned off:
53//!
54//! ```toml
55//! jumprope = { default-features = false }
56//! ```
57//!
58//!
59//!
60//! # A rant on character lengths
61//!
62//! There are 3 different, useful ways to measure string lengths. All of them are useful in certain
63//! situations:
64//!
65//! - The number of bytes needed to represent the string, in some specific encoding (eg UTF8)
66//! - The number of unicode characters contained within
67//! - The number of grapheme clusters in the string. This is the number of characters drawn to
68//! the screen.
69//!
70//! For example, the unicode polar bear ("🐻❄️") has a single grapheme cluster (only one
71//! character is drawn). It contains 4 unicode characters (Bear emoji + zero width joiner + snow
72//! emoji + variation selector). And it takes 16 bytes to store in UTF8.
73//!
74//! ```
75//! # use jumprope::*;
76//! assert_eq!("🐻❄️".len(), 13);
77//! assert_eq!("🐻❄️".chars().count(), 4);
78//!
79//! let rope = JumpRope::from("🐻❄️"); // One grapheme cluster
80//! assert_eq!(rope.len_bytes(), 13); // 13 UTF8 bytes
81//! assert_eq!(rope.len_chars(), 4); // 4 unicode characters
82//! ```
83//!
84//! Worse, many popular languages (including javascript and C#) use UCS2 internally and thus their
85//! `string.length` property doesn't give you a useful value for any application. Javascript reports
86//! a snowman's length as 5 - which is useless:
87//!
88//! ```shell
89//! $ node
90//! Welcome to Node.js v16.6.1.
91//! > "🐻❄️".length
92//! 5
93//! ```
94//!
95//! But there is no perfect "length" property for a string anyway:
96//!
97//! - The number of bytes is encoding-specific. The polar bear takes 16 bytes in UTF8, but only 10
98//! bytes in UTF16.
99//! - The number of grapheme clusters varies by device, font and software version. The conversion
100//! from characters to grapheme clusters is complex, and changes all the time. The polar bear
101//! icon was only added in May 2019. If your software is older than that (or uses a text library
102//! older than that), you will just see "🐻❄️".
103//!
104//! Most CRDTs and OT systems are slowly standardizing on counting unicode character positions as
105//! the default "length" property. The number of unicode characters isn't human-meaningful, but it
106//! has a number of useful properties:
107//!
108//! - Its simple and easy to define
109//! - Its stable across time (unlike grapheme clusters)
110//! - Its rarely convenient, but its very portable across different programming languages,
111//! regardless of that language's character encoding system.
112//!
113//! Jumprope follows this approach, using unicode character positions everywhere internally:
114//!
115//! ```
116//! # use jumprope::*;
117//! let mut rope = JumpRope::from("🐻❄️");
118//! rope.remove(1..4); // Remove "polar" from our polar bear
119//! assert_eq!(rope, "🐻");
120//! ```
121
122#![cfg_attr(doc_cfg, feature(doc_cfg))]
123
124mod jumprope;
125mod gapbuffer;
126mod utils;
127mod iter;
128mod fast_str_tools;
129
130pub use crate::jumprope::JumpRope;
131
132mod buffered;
133pub use crate::buffered::JumpRopeBuf;