pub struct Snippet { /* private fields */ }
A snippet of source code prepared for annotated rendering.
§Source units and spans
Annotation spans are Range<usize> indices into the snippet’s source unit
sequence. The exact meaning of a “unit” depends on how you create the
snippet:
- Snippet::with_utf8(), Snippet::with_utf8_bytes() and Snippet::with_latin1() treat a unit as a byte in the original byte sequence. In the UTF-8 case, a valid printable character may correspond to 1 to 4 source units.
- Snippet::with_utf16_words() treats a unit as a 16-bit word in the original UTF-16 sequence. A valid printable character may correspond to 1 or 2 source units.
- Snippet::with_chars() treats a unit as a char in the original character sequence.
- Snippet::builder() allows units to be defined by the caller.
Because the snippet may render replacements (expanded tabs, control-picture
glyphs, <XX> escapes, etc.), source-unit indices are not indices into the
final rendered UTF-8 text. The snippet keeps the necessary mapping so spans
still line up with what is shown.
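To make the difference between unit definitions concrete, the following sketch counts how many units the same text occupies as UTF-8 bytes, UTF-16 words, and chars. It uses only the standard library; the helper name unit_counts is hypothetical and is not part of this crate.

```rust
/// Counts how many source units `text` occupies under three unit
/// definitions: UTF-8 bytes, UTF-16 words, and chars.
/// (Hypothetical helper for illustration, not part of the crate.)
fn unit_counts(text: &str) -> (usize, usize, usize) {
    // A `str` is stored as UTF-8, so its length is the byte count.
    let utf8_bytes = text.len();
    // Re-encode as UTF-16 and count the 16-bit words.
    let utf16_words = text.encode_utf16().count();
    // One unit per Unicode scalar value.
    let chars = text.chars().count();
    (utf8_bytes, utf16_words, chars)
}
```

For instance, U+00E9 (é) counts as 2 bytes, 1 word and 1 char, while U+1F600 counts as 4 bytes, 2 words and 1 char, so a span covering the same character has a different length in each unit system.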
§Alternate text
Some rendered fragments can be marked as “alternate” (for example, a control-character replacement). Renderers can use this to present those fragments differently (e.g., highlight them).
Implementations§
impl Snippet
pub fn with_chars<I>(
    start_line: usize,
    source: I,
    tab_width: usize,
    control_char_style: ControlCharStyle,
    control_char_alt: bool,
) -> Self
where
    I: IntoIterator<Item = char>,
Creates a Snippet from a char sequence.
§Source units and spans
The source unit for this builder is a char of the original
source. Any annotation span you pass later (a Range<usize>) is
interpreted as char indices into this original source.
§Line breaks
- \n and \r\n are treated as line breaks.
- A lone \r is not a line break; it is handled like any other control character.
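The line-break rule above can be sketched with the standard library alone. This is only an illustration of the rule as stated, not the crate's implementation, and the function name split_lines is hypothetical.

```rust
/// Splits `text` into lines, treating `\n` and `\r\n` as line breaks.
/// A lone `\r` is left inside its line, to be handled later like any
/// other control character. (Hypothetical helper for illustration.)
fn split_lines(text: &str) -> Vec<&str> {
    let mut lines = Vec::new();
    let mut rest = text;
    while let Some(pos) = rest.find('\n') {
        let line = &rest[..pos];
        // Drop the `\r` only when it directly precedes the `\n`.
        lines.push(line.strip_suffix('\r').unwrap_or(line));
        rest = &rest[pos + 1..];
    }
    // Whatever follows the last line break is the final line.
    lines.push(rest);
    lines
}
```

With this sketch, "a\r\nb\rc\nd" splits into three lines, and the second line still contains its lone \r.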
§Control characters
Control characters are those for which char_should_be_replaced() returns true.
- Tabs (U+0009) are replaced with tab_width spaces.
- ZERO WIDTH JOINER (U+200D) is replaced with nothing (but still accounts for its original source unit length).
- When control_char_style is ControlCharStyle::Replacement, C0 controls (U+0000 to U+001F, excluding tab) and DEL (U+007F) are replaced with their Unicode Control Pictures (␀, ␁, …).
- Any other control character, and C0 controls when control_char_style is ControlCharStyle::Codepoint, are represented with the hexadecimal value of their code point, in angle brackets, with at least four digits (<U+XXXX>).
Control characters are rendered as alternate text when control_char_alt is
true, with the exception of tabs, which are never marked as alternate text.
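The replacement rules above can be approximated by the following standalone sketch. It rests on several assumptions: the Style enum is a hypothetical stand-in for ControlCharStyle, char::is_control() is used as a rough substitute for char_should_be_replaced() (whose exact set of characters is not reproduced here), and uppercase hexadecimal digits are assumed.

```rust
/// Hypothetical stand-in for the two control-character styles.
#[derive(Clone, Copy)]
enum Style {
    Replacement,
    Codepoint,
}

/// Returns the text rendered in place of `c`, or `None` when this
/// sketch does not treat `c` as a control character.
fn render_control(c: char, tab_width: usize, style: Style) -> Option<String> {
    match (c, style) {
        // Tabs become `tab_width` spaces under either style.
        ('\t', _) => Some(" ".repeat(tab_width)),
        // ZERO WIDTH JOINER renders as nothing.
        ('\u{200D}', _) => Some(String::new()),
        // C0 controls map onto the Control Pictures block at U+2400.
        ('\u{0}'..='\u{1F}', Style::Replacement) => {
            char::from_u32(0x2400 + c as u32).map(String::from)
        }
        // DEL has its own picture, U+2421.
        ('\u{7F}', Style::Replacement) => Some('\u{2421}'.to_string()),
        // Assumption: every remaining control gets the <U+XXXX> form.
        _ if c.is_control() => Some(format!("<U+{:04X}>", c as u32)),
        _ => None,
    }
}
```

Under these assumptions, ESC (U+001B) renders as ␛ with Style::Replacement and as <U+001B> with Style::Codepoint.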
§Examples
If source is a char slice (named chars here):
let snippet = sourceannot::Snippet::with_chars(
    1,
    chars.iter().copied(),
    4,
    sourceannot::ControlCharStyle::Codepoint,
    true,
);
If source is a UTF-8 (str) slice, but you want source units to be
chars instead of bytes:
let snippet = sourceannot::Snippet::with_chars(
    1,
    source.chars(),
    4,
    sourceannot::ControlCharStyle::Codepoint,
    true,
);
impl Snippet
pub fn with_latin1(
    start_line: usize,
    source: &[u8],
    tab_width: usize,
    control_char_style: ControlCharStyle,
    control_char_alt: bool,
) -> Self
Creates a Snippet from a Latin-1 (ISO 8859-1) source.
This builder interprets each byte of source as a Unicode scalar value
in the range U+0000 to U+00FF.
§Source units and spans
The source unit for this builder is a byte of the original source.
Any annotation span you pass later (a Range<usize>) is interpreted as
byte offsets into this original source slice.
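As an illustration of the byte-to-scalar-value mapping described above, the sketch below decodes Latin-1 bytes with the standard library. The helper name latin1_to_string is hypothetical; the crate performs its own decoding internally.

```rust
/// Decodes Latin-1 bytes: each byte becomes the Unicode scalar value
/// with the same number (U+0000 to U+00FF).
/// (Hypothetical helper for illustration, not part of the crate.)
fn latin1_to_string(bytes: &[u8]) -> String {
    // `char::from(u8)` performs exactly this one-to-one mapping.
    bytes.iter().map(|&b| char::from(b)).collect()
}
```

Because every byte yields exactly one character, a byte span into the original Latin-1 slice covers the same number of characters, even though the decoded String may need two UTF-8 bytes for values above 0x7F.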
§Line breaks
- \n and \r\n are treated as line breaks.
- A lone \r is not a line break; it is handled like any other control character.
§Control characters
Control characters are those for which char_should_be_replaced() returns true.
- Tabs (0x09) are replaced with tab_width spaces.
- When control_char_style is ControlCharStyle::Replacement, C0 controls (0x00 to 0x1F, excluding tab) and DEL (0x7F) are replaced with their Unicode Control Pictures (␀, ␁, …).
- Any other control character, and C0 controls when control_char_style is ControlCharStyle::Codepoint, are represented with the hexadecimal value of their code point, in angle brackets, with at least four digits (<U+XXXX>).
Control characters are rendered as alternate text when control_char_alt is
true, with the exception of tabs, which are never marked as alternate text.
impl Snippet
pub fn with_utf16_words<I>(
    start_line: usize,
    source: I,
    tab_width: usize,
    control_char_style: ControlCharStyle,
    control_char_alt: bool,
    invalid_seq_style: InvalidSeqStyle,
    invalid_seq_alt: bool,
) -> Self
where
    I: IntoIterator<Item = u16>,
Creates a Snippet from a UTF-16 (possibly invalid) source.
§Source units and spans
The source unit for this builder is a 16-bit word of the original
source. Any annotation span you pass later (a Range<usize>) is
interpreted as word offsets into this original source sequence.
§Line breaks
- \n and \r\n are treated as line breaks.
- A lone \r is not a line break; it is handled like any other control character.
§Control characters
Control characters are those for which char_should_be_replaced() returns true.
- Tabs (U+0009) are replaced with tab_width spaces.
- ZERO WIDTH JOINER (U+200D) is replaced with nothing (but still accounts for its original source unit length).
- When control_char_style is ControlCharStyle::Replacement, C0 controls (U+0000 to U+001F, excluding tab) and DEL (U+007F) are replaced with their Unicode Control Pictures (␀, ␁, …).
- Any other control character, and C0 controls when control_char_style is ControlCharStyle::Codepoint, are represented with the hexadecimal value of their code point, in angle brackets, with at least four digits (<U+XXXX>).
Control characters are rendered as alternate text when control_char_alt is
true, with the exception of tabs, which are never marked as alternate text.
§Invalid UTF-16
- When invalid_seq_style is InvalidSeqStyle::Replacement, each unpaired surrogate word is replaced with the Unicode Replacement Character (U+FFFD, �).
- When invalid_seq_style is InvalidSeqStyle::Hexadecimal, each unpaired surrogate word is represented with its hexadecimal value, in angle brackets, with four digits (<XXXX>).
If invalid_seq_alt is true, the replacement fragments are marked as
“alternate” text.
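The handling of unpaired surrogates can be sketched with the standard library's char::decode_utf16. This is an illustration of the two styles as described, not the crate's code; the InvalidStyle enum and the function name are hypothetical, and uppercase hexadecimal digits are an assumption.

```rust
/// Hypothetical stand-in for the two invalid-sequence styles.
#[derive(Clone, Copy)]
enum InvalidStyle {
    Replacement,
    Hexadecimal,
}

/// Decodes possibly invalid UTF-16, rendering each unpaired surrogate
/// word according to `style`.
fn render_utf16(words: &[u16], style: InvalidStyle) -> String {
    let mut out = String::new();
    for decoded in char::decode_utf16(words.iter().copied()) {
        match decoded {
            // Valid scalar value (one word, or a surrogate pair).
            Ok(c) => out.push(c),
            // One error is reported per unpaired surrogate word.
            Err(err) => match style {
                InvalidStyle::Replacement => out.push('\u{FFFD}'),
                InvalidStyle::Hexadecimal => {
                    out.push_str(&format!("<{:04X}>", err.unpaired_surrogate()))
                }
            },
        }
    }
    out
}
```

For the words 0x0061, 0xD800, 0x0062, this sketch produces a�b with InvalidStyle::Replacement and a<D800>b with InvalidStyle::Hexadecimal.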
impl Snippet
pub fn with_utf8(
    start_line: usize,
    source: &str,
    tab_width: usize,
    control_char_style: ControlCharStyle,
    control_char_alt: bool,
) -> Self
Creates a Snippet from a valid UTF-8 source.
§Source units and spans
The source unit for this builder is a byte of the original source.
Any annotation span you pass later (a Range<usize>) is interpreted as
byte offsets into this original source slice.
If you want the source units to be chars instead of bytes, use
Snippet::with_chars(), passing source.chars() as
source.
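If spans already exist in char indices and the snippet must stay byte-based, the spans can be converted instead. The following is a minimal sketch using only the standard library; the helper name char_span_to_byte_span is hypothetical and not part of this crate.

```rust
use std::ops::Range;

/// Converts a span given in char indices into the equivalent byte span
/// of `text`. Returns `None` if the span is reversed or extends past
/// the end of the text. (Hypothetical helper for illustration.)
fn char_span_to_byte_span(text: &str, span: Range<usize>) -> Option<Range<usize>> {
    if span.start > span.end {
        return None;
    }
    // Byte offset of every char boundary, plus the end of the text.
    let boundaries: Vec<usize> = text
        .char_indices()
        .map(|(offset, _)| offset)
        .chain(std::iter::once(text.len()))
        .collect();
    let start = *boundaries.get(span.start)?;
    let end = *boundaries.get(span.end)?;
    Some(start..end)
}
```

For the text "héllo", the char span 1..3 (covering "él") corresponds to the byte span 1..4, because é occupies two bytes in UTF-8.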
§Line breaks
- \n and \r\n are treated as line breaks.
- A lone \r is not a line break; it is handled like any other control character.
§Control characters
Control characters are those for which char_should_be_replaced() returns true.
- Tabs (U+0009) are replaced with tab_width spaces.
- ZERO WIDTH JOINER (U+200D) is replaced with nothing (but still accounts for its original source unit length).
- When control_char_style is ControlCharStyle::Replacement, C0 controls (U+0000 to U+001F, excluding tab) and DEL (U+007F) are replaced with their Unicode Control Pictures (␀, ␁, …).
- Any other control character, and C0 controls when control_char_style is ControlCharStyle::Codepoint, are represented with the hexadecimal value of their code point, in angle brackets, with at least four digits (<U+XXXX>).
Control characters are rendered as alternate text when control_char_alt is
true, with the exception of tabs, which are never marked as alternate text.
pub fn with_utf8_bytes(
    start_line: usize,
    source: &[u8],
    tab_width: usize,
    control_char_style: ControlCharStyle,
    control_char_alt: bool,
    invalid_seq_style: InvalidSeqStyle,
    invalid_seq_alt: bool,
) -> Self
Creates a Snippet from a UTF-8 (possibly invalid) source.
§Source units and spans
The source unit for this builder is a byte of the original source.
Any annotation span you pass later (a Range<usize>) is interpreted as
byte offsets into this original source slice.
§Line breaks
- \n and \r\n are treated as line breaks.
- A lone \r is not a line break; it is handled like any other control character.
§Control characters
Control characters are those for which char_should_be_replaced() returns true.
- Tabs (U+0009) are replaced with tab_width spaces.
- ZERO WIDTH JOINER (U+200D) is replaced with nothing (but still accounts for its original source unit length).
- When control_char_style is ControlCharStyle::Replacement, C0 controls (U+0000 to U+001F, excluding tab) and DEL (U+007F) are replaced with their Unicode Control Pictures (␀, ␁, …).
- Any other control character, and C0 controls when control_char_style is ControlCharStyle::Codepoint, are represented with the hexadecimal value of their code point, in angle brackets, with at least four digits (<U+XXXX>).
Control characters are rendered as alternate text when control_char_alt is
true, with the exception of tabs, which are never marked as alternate text.
§Invalid UTF-8
- When invalid_seq_style is InvalidSeqStyle::Replacement, each invalid UTF-8 sequence is replaced with the Unicode Replacement Character (U+FFFD, �).
- When invalid_seq_style is InvalidSeqStyle::Hexadecimal, each byte of an invalid UTF-8 sequence is represented with its hexadecimal value, in angle brackets, with two digits (<XX>).
If invalid_seq_alt is true, the replacement fragments are marked as
“alternate” text.
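The hexadecimal style can be sketched with std::str::from_utf8 and the offsets reported by Utf8Error. This is an approximation under assumptions: the function name render_utf8_hex is hypothetical, uppercase digits are assumed, and where one invalid sequence ends follows the standard library's error reporting, which may differ from the crate's own rule.

```rust
/// Renders possibly invalid UTF-8, showing each byte of an invalid
/// sequence as `<XX>`. (Hypothetical helper for illustration.)
fn render_utf8_hex(bytes: &[u8]) -> String {
    let mut out = String::new();
    let mut rest = bytes;
    loop {
        match std::str::from_utf8(rest) {
            Ok(valid) => {
                out.push_str(valid);
                return out;
            }
            Err(err) => {
                let (valid, after) = rest.split_at(err.valid_up_to());
                // The prefix was just validated, so this cannot fail.
                out.push_str(std::str::from_utf8(valid).unwrap());
                // `error_len()` is `None` when the input ends in the
                // middle of a sequence; treat the remainder as invalid.
                let bad_len = err.error_len().unwrap_or(after.len());
                for byte in &after[..bad_len] {
                    out.push_str(&format!("<{:02X}>", byte));
                }
                rest = &after[bad_len..];
            }
        }
    }
}
```

For the bytes of "a", 0xFF, "b", this sketch produces a<FF>b. A truncated sequence such as 0xE2 0x82 at the end of the input produces <E2><82>.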
impl Snippet
pub fn builder(start_line: usize) -> SnippetBuilder
Creates a builder for manually constructing a snippet.
start_line is the line number to associate with the first rendered
line of the snippet. This is typically 1 when the snippet corresponds
to a whole source file.
See SnippetBuilder for details on how to use the builder.
pub fn src_pos_to_line_col(&self, pos: usize) -> (usize, usize)
Converts a source position to a (line, column) pair.
Both the line and the column numbers are zero-based. The column is calculated from the rendered text, not from the original source units.