lsp-document
Helpers to convert between LSP representations of text documents and Rust strings.
TL;DR:
LSP uses UTF-16-encoded strings while Rust's strings are UTF-8-encoded. This means that text offsets in LSP and in Rust differ:
- LSP offsets are counted in 16-bit code units, and each character takes 1 or 2 of them,
- Rust strings are indexed in bytes, and each character takes 1 to 4 bytes.
To ensure that an LSP client and server "talk" about the same part of a text document, we need a translation layer.
This crate provides such a layer.
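The difference between the two encodings can be checked with the standard library alone. The following is a minimal sketch, not part of this crate: the helper names `widths` and `offsets_of` are hypothetical and exist only for illustration.

```rust
/// Hypothetical helper (not part of this crate): returns the number of
/// (UTF-8 bytes, UTF-16 code units) needed to encode `ch`.
fn widths(ch: char) -> (usize, usize) {
    (ch.len_utf8(), ch.len_utf16())
}

/// Hypothetical helper (not part of this crate): returns the
/// (byte offset, UTF-16 offset) of the first occurrence of `needle` in `line`.
fn offsets_of(line: &str, needle: char) -> Option<(usize, usize)> {
    // `str::find` returns a byte offset, which is how Rust indexes strings.
    let byte_offset = line.find(needle)?;
    // Re-encode the prefix as UTF-16 and count code units to get the LSP-style offset.
    let utf16_offset = line[..byte_offset].encode_utf16().count();
    Some((byte_offset, utf16_offset))
}
```

For the line `This is 💣!`, `offsets_of(line, '!')` yields a byte offset of 12 but a UTF-16 offset of 10, because 💣 takes 4 bytes in UTF-8 and 2 code units in UTF-16.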
Example usage
See the docs for more details.
use ;
use Position;
// Character width
// U16: 1111111111111 1111111111 1 11 1 1 111111111 21
// U8: 1111111111111 1222122221 1 13 3 3 111111111 41
// U8 offset
// 0 1 2 3 4 5
// 0123456789012 3468013579 0 12 5 8 123456789 04
let text = "Hello, world!\nКак дела?\r\n做得好\nThis is 💣!";
let text = new;
//
// Examples of using TextMap methods
//
// Pos of 💣 from its offset
assert_eq!;
// Raw line range info
assert_eq!;
// Extracting part of text between two positions
assert_eq!;
//
// Example of using TextAdapter methods
//
// Pos of `!` after 💣
assert_eq!;
assert_eq!;
Using Strings for text manipulation
Currently, the crate works with str-like representations of text. UTF-8-encoded strings are efficiently packed in memory, which means:
- 👍 Storing these strings has low memory overhead.
- 👍 The contents are contiguous in memory, hence random access and iteration over chars are fast (the latter is important for conversion between LSP and native positions).
- 👎 Making changes to strings is slow, as it requires time proportional to the length of the string.
Most likely, the performance impact of the last point (slow edits) won't be a problem, as we query
data much more often than we change it (in the context of LSP servers). So,
using Strings should be just fine for a lot of applications.
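To illustrate why fast iteration over chars matters for position conversion, here is a minimal sketch of turning a UTF-16 column into a byte offset within a single line. It is not this crate's implementation: the function name `utf16_col_to_byte_offset` is hypothetical, and the sketch assumes the line has already been located in the document.

```rust
/// Hypothetical helper (not part of this crate): converts a UTF-16 column
/// into a byte offset within `line`. Returns `None` if the column is past
/// the end of the line or falls in the middle of a surrogate pair.
fn utf16_col_to_byte_offset(line: &str, utf16_col: usize) -> Option<usize> {
    let mut units = 0; // UTF-16 code units consumed so far
    for (byte_idx, ch) in line.char_indices() {
        if units == utf16_col {
            return Some(byte_idx);
        }
        if units > utf16_col {
            // The requested column splits a 2-unit character.
            return None;
        }
        units += ch.len_utf16();
    }
    // A column equal to the line's UTF-16 length points just past the last character.
    if units == utf16_col {
        Some(line.len())
    } else {
        None
    }
}
```

The loop is a single linear pass over contiguous memory, so each conversion costs time proportional to the line length rather than the document length.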
However, having an implementation backed by a Rope wouldn't hurt (although this is not a priority at the moment).