RustDOT is mostly the Graphviz DOT language, lightly rustified. It can be embedded as a macro or parsed from a string or file. Its purpose is to extract the graph structure. Layout hints are currently out of scope.
let g1 = rust_dot! {
    graph {
        A -- B -- C; /* semicolon is optional */
        "B" -- D // quotes not needed here
    }
};
println!("{} {} \"{}\" {:?} {:?}", g1.strict, g1.directed, g1.name, g1.nodes, g1.edges);
// false false "" ["A", "B", "C", "D"] [(0, 1), (1, 2), (1, 3)]

let g2 = parse_string("digraph Didi { -1 -> 2 -> .3 2 -> 4.2 }");
println!("{} {} \"{}\" {:?} {:?}", g2.strict, g2.directed, g2.name, g2.nodes, g2.edges);
// false true "Didi" ["-1", "2", ".3", "4.2"] [(0, 1), (1, 2), (1, 3)]
The return values can be fed to crates such as petgraph:
let mut petgraph = petgraph::graph::Graph::new();
// One petgraph node per RustDOT node, in the same order, so indices line up.
let nodes: Vec<_> = rust_dot_graph.nodes
    .iter()
    .map(|node| petgraph.add_node(node))
    .collect();
for edge in rust_dot_graph.edges {
    petgraph.add_edge(nodes[edge.0], nodes[edge.1], ());
}
or graph/graph_builder:
use graph::prelude::*;
let graph: DirectedCsrGraph<usize> = GraphBuilder::new()
    .csr_layout(CsrLayout::Sorted)
    .edges(rust_dot_graph.edges)
    .build();
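If neither crate is wanted, the same two fields can be turned into a plain adjacency list with the standard library alone. The following is a minimal sketch, not part of RustDOT: the function name adjacency is hypothetical, and it assumes (as the printed output above suggests) that nodes holds string-like names and edges holds pairs of indices into nodes.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: build an adjacency list keyed by node name.
/// `edges` are index pairs into `nodes`; an out-of-range index panics.
fn adjacency<'a>(
    nodes: &[&'a str],
    edges: &[(usize, usize)],
    directed: bool,
) -> BTreeMap<&'a str, Vec<&'a str>> {
    // Start with every node present, so isolated nodes are not lost.
    let mut adj: BTreeMap<&str, Vec<&str>> =
        nodes.iter().map(|&n| (n, Vec::new())).collect();
    for &(from, to) in edges {
        adj.entry(nodes[from]).or_default().push(nodes[to]);
        // Undirected graphs get the reverse direction too (once for self-loops).
        if !directed && from != to {
            adj.entry(nodes[to]).or_default().push(nodes[from]);
        }
    }
    adj
}

fn main() {
    // The data printed for g1 above, written out by hand.
    let nodes = ["A", "B", "C", "D"];
    let edges = [(0, 1), (1, 2), (1, 3)];
    for (node, neighbours) in adjacency(&nodes, &edges, false) {
        println!("{node}: {neighbours:?}");
    }
}
```

Passing the directed flag of the parsed graph as the third argument keeps the edge direction for digraphs.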
This is work in progress. Nothing is stabilised!
§Todo
- Implement strict, it is currently ignored/skipped
- Return Err instead of panicking on wrong input
- Put Spans on Lexemes, based on their input, maybe using crate macroex
- Separate return type (currently Parser, which should be internal)
- Implement node attributes, they are currently ignored/skipped
- Implement node defaults
- Implement edge attributes, they are currently ignored/skipped
- Implement edge defaults
- Deal with graph attributes, with and without keyword graph
- Reimplement rust_dot as a proc-macro, transforming its input as const at compile time
- As an extension to DOT, allow label or weight to come from a Rust expression
- As an extension to DOT, allow label or weight to come from invoking a closure
§Limitations
Rust macros are tokenised by the Rust lexer, which is subtly different from Graphviz. For consistency (and ease of implementation) the parse_* functions use the same lexer. These are the consequences:
- Macros must be in UTF-8, while the input to the parse_* functions may also be UTF-16 or Latin-1. You must deal with other encodings yourself.
- Double quotes, parentheses, braces and brackets must be balanced and some characters are not allowed. As a workaround you can change something like the following first line into the second. The commented quotes are seen by Rust, but ignored as HTML (once that is implemented):
  <<I>"</I> <B> )}] [{( </B> \\>
  <<I>"<!--"--></I> <B><!--"--> )}] [{( <!--"--></B> <!--"-->\\<!--"-->>
- HTML is partially a space-aware language, whereas Rust is not. So on the macro side it’s impossible to get spacing right, and on run time input it would be quite some effort. Instead this uses a heuristic of a space between everything, except inside tags and entities and before [,;.:!?] (incomplete, and wrong for some languages).
- Strings are not yet unescaped when we get them, yet the Rust lexer validates them. The parse_* functions work around this, but in rust_dot! you must use raw strings like r"\N" when they contain unrusty backslash sequences.
- Comments are exactly Rust comments. They differ from DOT in that block comments can nest.
- Not officially comments, but everything after # on the same line is also discarded. Unlike real comments, these are handled by RustDOT, after lexical analysis. This means that the rest of the line, like the 2nd point above, must be balanced. And it will only end after the closing delimiter, so you should put that on the same line! In rust_dot! you must use // instead! (Only the nightly compiler gives access to line numbers in macros.)
- Valid identifiers should be accepted by Rust. Though (only in rust_dot!) confusable letters like Cyrillic ‘о’ or rare scripts like runic give warnings.
- Valid numbers should be accepted by Rust. And floats do not need a leading zero before the decimal dot.
- RustDOT returns one graph, so it wants one in the input. The DOT grammar doesn’t clarify multiple graphs per file, but Graphviz accepts them. However they lead to 2 SVGs invalidly concatenated in one file or a PNG displaying only the first. Likewise Graphviz accepts an empty document, whereas RustDOT does not.
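The points about raw strings and nesting comments rest on plain Rust lexer behaviour, which can be checked without RustDOT at all. A small sketch using only the standard library (the function names are made up for illustration):

```rust
/// The DOT escape \N written as a raw string: the backslash is kept verbatim.
fn raw_label() -> &'static str {
    r"\N"
}

/// The same two characters as a regular string need a doubled backslash,
/// because the Rust lexer rejects "\N" as an unknown escape.
fn escaped_label() -> &'static str {
    "\\N"
}

fn main() {
    /* Rust block comments nest: /* inner */ this is still inside the outer one */
    assert_eq!(raw_label(), escaped_label());
    println!("{} has {} bytes", raw_label(), raw_label().len());
}
```

In DOT the inner */ would already have ended the comment, which is the difference noted above.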
Macros§
- rust_dot: Embed RustDOT as a sub-language in Rust.
Functions§
- _parse_token_stream: The internal work horse, exposed only for rust_dot!.
- parse_bytes: Transform RustDOT from input at run time. If input is not UTF-8 and has a UTF-16 BOM, or (given that graphs must start with ASCII) a NUL as 1st or 2nd byte, it is treated as UTF-16, either BE or LE. Otherwise it is treated as Latin-1.
- parse_file: Transform RustDOT from file at run time. Input is tried as UTF-8, UTF-16 and Latin-1 (see details at parse_bytes()).
- parse_read: Transform RustDOT from Read object at run time (generalised parse_file()).
- parse_string: Transform RustDOT from input at run time.
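The encoding rule described for parse_bytes can be sketched with the standard library alone. This is an illustration of the rule as worded above, not the crate's actual code: the Encoding enum and the functions guess_encoding, decode and decode_utf16 are hypothetical names. Note one consequence of checking UTF-8 first: BOM-less UTF-16 that contains only ASCII characters is also valid UTF-8, so this sketch classifies it as UTF-8. How RustDOT itself treats that case is not stated here.

```rust
/// Hypothetical result type for this sketch; not part of RustDOT.
#[derive(Debug, PartialEq)]
enum Encoding {
    Utf8,
    Utf16Be,
    Utf16Le,
    Latin1,
}

/// Apply the rule in the order described: UTF-8 first, then a UTF-16 BOM
/// or a NUL as 1st or 2nd byte, otherwise Latin-1.
fn guess_encoding(input: &[u8]) -> Encoding {
    if std::str::from_utf8(input).is_ok() {
        return Encoding::Utf8;
    }
    match input {
        [0xFE, 0xFF, ..] => Encoding::Utf16Be, // big-endian BOM
        [0xFF, 0xFE, ..] => Encoding::Utf16Le, // little-endian BOM
        [0, _, ..] => Encoding::Utf16Be,       // ASCII start: high byte first
        [_, 0, ..] => Encoding::Utf16Le,       // ASCII start: low byte first
        _ => Encoding::Latin1,
    }
}

/// Combine byte pairs into UTF-16 units and decode them, dropping a leading BOM.
fn decode_utf16(input: &[u8], to_u16: fn([u8; 2]) -> u16) -> String {
    let units: Vec<u16> = input
        .chunks_exact(2)
        .map(|pair| to_u16([pair[0], pair[1]]))
        .collect();
    String::from_utf16_lossy(&units)
        .trim_start_matches('\u{FEFF}')
        .to_string()
}

/// Decode the input to a String according to the guessed encoding.
fn decode(input: &[u8]) -> String {
    match guess_encoding(input) {
        Encoding::Utf8 => String::from_utf8_lossy(input).into_owned(),
        Encoding::Utf16Be => decode_utf16(input, u16::from_be_bytes),
        Encoding::Utf16Le => decode_utf16(input, u16::from_le_bytes),
        // Latin-1 bytes map one-to-one onto U+0000..=U+00FF.
        Encoding::Latin1 => input.iter().map(|&b| b as char).collect(),
    }
}

fn main() {
    let latin1 = [b'g', 0xE9]; // "gé" in Latin-1, invalid as UTF-8
    println!("{:?} {}", guess_encoding(&latin1), decode(&latin1));
}
```

The resulting String could then be handed to parse_string, although parse_bytes already covers this for the three encodings it supports.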