rivescript 0.2.0

Implementation of a RiveScript chatbot interpreter for Rust.
# RiveScript in Rust

This is a port of the RiveScript interpreter for the Rust programming language.

RiveScript is a scripting language for authoring the classic "canned responses" type of chatbots, making it easy for bot authors to program triggers and responses to build a chatbot's personality. See [rivescript.com](https://www.rivescript.com) for details.

> **Current Status: Beta**
>
> This port of RiveScript is "feature complete" and functional, implementing all of the commands and tags of RiveScript, but it has not been extensively field tested and is lacking a comprehensive unit test suite.
>
> The "stable 1.0.0" version of rivescript-rs will be released when:
>
> 1. The [RiveScript Test Suite (rsts)][rsts] has been implemented to verify that the Rust port is _at least_ as accurate as the other 5 official RiveScript ports are.
> 2. A JavaScript engine for RiveScript Object Macros has been implemented, to verify that the interface for foreign language macro handlers is correctly done.
> 3. A Redis driver for [User Variable Session Management](#user-variable-session-adapters) is implemented, to verify that the trait for that works as intended.

# Usage

This crate provides both a library and a stand-alone executable, the latter of which is an interactive command line shell for testing your RiveScript bot. Run the program with the path to a folder (or file) on disk that contains your RiveScript documents. Example:

```bash
$ rivescript ./eg/brain
```

See `rivescript --help` for the options it accepts, including debug mode and UTF-8 mode.

When used as a library for writing your own chatbot in Rust, the synopsis is as follows:

```rust
use rivescript::RiveScript;
use std::io::{self, Write}; // `Write` is needed for stdout().flush()

#[tokio::main]
async fn main() {

    // Create a RiveScript bot instance.
    let mut bot = RiveScript::new();

    // Enable UTF-8 mode to support non-English chatbots.
    // See "UTF-8 Support" in the README for details.
    bot.utf8 = true;

    // Load a directory of RiveScript documents (.rive files)
    bot.load_directory("./eg/brain").expect("Error loading files!");

    // Load additional replies from a single .rive file.
    bot.load_file("./replies.rive").expect("Error loading file!");

    // Load RiveScript source from a string value instead of files.
    bot.stream("
        + hello bot
        - Hello, human!
    ").expect("Error parsing the streamed code!");

    // After loading your RiveScript sources, be sure to sort the triggers!
    // This populates internal sort structures to match a user's message with
    // the most optimal triggers in your bot's brain.
    bot.sort_triggers();

    // Enter a main loop to chat with the bot in your terminal.
    loop {

        // Print the prompt.
        print!("You> ");
        io::stdout().flush().expect("oops");

        // Read user input.
        let mut message = String::new();
        io::stdin()
            .read_line(&mut message)
            .expect("Failed to read line");

        // Get the reply.
        match bot.reply("local-user", &message).await {
            Ok(reply) => println!("Bot> {reply}"),
            Err(e) => println!("Error> {e}"),
        };

    }
}
```

# Configuration

After calling `RiveScript::new()` you may configure the object to customize its behavior by setting the following attributes:

* `debug: bool` to enable debug mode. This will use `log::debug` and `log::warn` to print details about RiveScript's inner execution to your console. Note: the debug output is _very_ verbose!
* `utf8: bool` can enable [UTF-8 mode](#utf-8-support).
* `depth: usize` will set the recursion depth limit (default 50). This limit protects your bot from infinite recursion errors, in case two triggers redirect to each other.
* `case_sensitive: bool` can make user messages case sensitive. The default is false, and user messages are made lowercase before matching against your triggers. If you set a true value, their message will not be made lowercase.

The `rivescript` command-line program can set some of these options with flags like `--debug` and `--utf8`. See `rivescript --help` for full details.

The recursion depth limit can also be overridden in your RiveScript brain using the `! global` command like so:

```rivescript
! global depth = 256
```
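
To illustrate why a depth limit matters, here is a minimal sketch of following redirects with a hop counter. It is not the crate's implementation: `resolve_redirect` and the redirect table are hypothetical names invented for this example, and real redirects go through full trigger matching rather than a plain map lookup.

```rust
use std::collections::HashMap;

/// Follow a chain of redirects until reaching an entry that does not
/// redirect any further. Gives up with an error if more than `max_depth`
/// hops would be needed. (Hypothetical helper, for illustration only.)
fn resolve_redirect<'a>(
    redirects: &HashMap<&'a str, &'a str>,
    start: &'a str,
    max_depth: usize,
) -> Result<&'a str, String> {
    let mut current = start;
    let mut hops = 0;
    while let Some(&next) = redirects.get(current) {
        if hops >= max_depth {
            return Err(format!("recursion depth limit ({max_depth}) reached"));
        }
        current = next;
        hops += 1;
    }
    Ok(current)
}

fn main() {
    let mut redirects = HashMap::new();
    redirects.insert("hi", "hello"); // "hi" redirects to "hello"
    redirects.insert("ping", "pong"); // these two redirect to each other
    redirects.insert("pong", "ping");

    println!("{:?}", resolve_redirect(&redirects, "hi", 50));
    println!("{:?}", resolve_redirect(&redirects, "ping", 50));
}
```

With the `ping`/`pong` pair, the lookup stops with an error once the limit is reached instead of looping forever.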

# Async API

The main `rivescript.reply()` function is an async function, so you will need to use an async runtime such as `tokio` to use this library. The example above uses an `async fn main()` using tokio.

Historically, most of the other implementations of RiveScript (written in Perl, Python, Java, and Go) were written in a synchronous (procedural) manner, where the reply() function was not async. This was acceptable because those languages were not generally async-aware: common libraries for things like SQL databases and HTTP requests had blocking (synchronous) API calls. An [Object Macro](#rust-object-macros) could call these APIs and get its answer synchronously, so the main reply() function could be synchronous to match. Similarly, [User Variable Session Adapters](#user-variable-session-adapters) could get and set variables in a Redis cache or SQL database using the synchronous APIs common to those languages.

This model led to some friction with the JavaScript port, because JavaScript is a heavily async language and all of the useful libraries (for web requests, SQL, etc.) were asynchronous, so RiveScript wasn't able to stop and await them during the reply() phase. Eventually, when async/await support landed in JavaScript, RiveScript.js was able to await these calls while still keeping its overall logic in line with the other ports.

For the Rust port, async/await was built in from the beginning in case you want to call async crates from within a RiveScript reply.

# UTF-8 Support

RiveScript was not originally designed with UTF-8 in mind. All ports of RiveScript provide a "UTF-8 mode," however, which is labeled as an 'experimental' feature of RiveScript (because its use may affect trigger matching behavior in subtle ways).

By default (without UTF-8 mode enabled), RiveScript triggers are only allowed to contain basic ASCII characters (no foreign characters), and the user's input message will be stripped of all characters except for letters, numbers and spaces. Note: this stripping happens after substitutions are run, so you can `! sub what's = what is` to normalize and process their message first (adding substitutions for these kinds of contractions is recommended practice).

When UTF-8 mode is enabled, these restrictions are lifted:

* Triggers in RiveScript sources are only forbidden from containing certain metacharacters, such as backslashes.
* The user's message is only stripped of backslashes and HTML angle brackets (to protect against obvious XSS attacks if you use RiveScript in a web application).

    Additionally, common punctuation characters will be stripped from the user's message. The default set is `/[.,!?;:]/`, which can be overridden by providing a regexp of your own (`RiveScript.set_unicode_punctuation()`).

The `<star>` tags in RiveScript can therefore capture the user's "raw" input strings (with non-ASCII characters preserved).
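
The order of operations described above (substitutions first, then stripping) can be sketched roughly as follows. This is an illustration only, not the crate's code: `normalize` is a hypothetical function, substitutions are simplified to whole-word replacement, and lowercasing assumes the default `case_sensitive = false`.

```rust
/// Normalize a user message: lowercase it, apply whole-word substitutions,
/// then strip characters according to the mode. (Hypothetical sketch.)
fn normalize(message: &str, subs: &[(&str, &str)], utf8: bool) -> String {
    let lowered = message.to_lowercase();

    // Substitutions run *before* stripping, so "what's" can still be matched.
    let substituted: Vec<String> = lowered
        .split_whitespace()
        .map(|word| {
            subs.iter()
                .find(|(from, _)| *from == word)
                .map(|(_, to)| to.to_string())
                .unwrap_or_else(|| word.to_string())
        })
        .collect();
    let text = substituted.join(" ");

    let stripped: String = if utf8 {
        // UTF-8 mode: drop backslashes, angle brackets and the default
        // punctuation set, but keep non-ASCII characters.
        text.chars()
            .filter(|c| !matches!(c, '\\' | '<' | '>'))
            .filter(|c| !matches!(c, '.' | ',' | '!' | '?' | ';' | ':'))
            .collect()
    } else {
        // Default mode: keep only ASCII letters, numbers and spaces.
        text.chars()
            .filter(|c| c.is_ascii_alphanumeric() || *c == ' ')
            .collect()
    };

    // Collapse any runs of whitespace left behind by the stripping.
    stripped.split_whitespace().collect::<Vec<_>>().join(" ")
}

fn main() {
    let subs = [("what's", "what is")];
    println!("{}", normalize("What's up?", &subs, false));
    println!("{}", normalize("Héllo, <wörld>!", &[], true));
    println!("{}", normalize("Héllo, <wörld>!", &[], false));
}
```

Without the substitution, the apostrophe in "what's" would simply be stripped in the default mode, leaving "whats", which is why substitutions for contractions are worth defining.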

# Rust Object Macros

RiveScript has a feature called "object macros" that enable you to write custom program code to provide a dynamic response in your chatbot. For example, your bot can have a trigger for "what is the weather like in Los Angeles?" which could run custom code to fetch the answer from a weather API or similar.

All RiveScript interpreters support object macros written in their native programming language, and the Rust port is no exception!

Here is an example of how to define a custom object macro subroutine in Rust:

```rust
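use futures::FutureExt; // assumption: `.boxed()` below comes from the futures crate
use rivescript::RiveScript;

#[tokio::main]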
async fn main() {
    let mut bot = RiveScript::new();

    // Define an object macro named "hello-rust"
    bot.set_subroutine("hello-rust", |proxy, args| {
        async move {
            if args.len() >= 1 {
                let value = args.join(" ");
                return proxy.finish(format!("Hello, {value}!"));
            }
            proxy.finish("Hello, rust!".to_string())
        }.boxed()
    });

    // Example RiveScript document to call this macro.
    bot.stream("
        + hello rust
        - <call>hello-rust</call>

        + hello *
        - <call>hello-rust <star></call>
    ").expect("Failed to parse");

    bot.sort_triggers();

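    // reply() returns a Result, so unwrap it before comparing.
    let reply = bot.reply("username", "hello rust").await.expect("Failed to get a reply");
    assert_eq!(reply, "Hello, rust!");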
}
```

## RiveScript Proxy for Object Macro Subroutines

If you are familiar with the other RiveScript ports, the Rust version has some unique nuances due to the borrow checker: usually, object macro subroutines would receive a pointer to the master RiveScript struct and a string array of parameters, but in Rust it wouldn't be possible to pass a mutable borrow of RiveScript to the subroutine.

Instead, a `rivescript::macros::Proxy` is passed in. The Proxy exposes a subset of RiveScript functions (such as `get_uservar` and `set_uservar`) that subroutines most commonly need. This allows object macros to get and set user and bot variables. When getting variables, the master RiveScript struct can provide their values. When setting variables, the Proxy holds a local HashMap of 'staged' data which is committed after your subroutine returns. If you set and then get a variable within your subroutine, you will get back the 'staged' copy from the Proxy.

Here is an example subroutine that gets and sets a user variable:

```rust
bot.set_subroutine("rust-set", |proxy, args| {
    async move {
        if args.len() >= 2 {
            let username = proxy.current_username().unwrap_or(String::new());

            let name = args.get(0).unwrap();
            let value = args.get(1).unwrap();
            let orig_value = proxy.get_uservar(&name).await;

            proxy.set_uservar(name, value).await;
            let staged_value = proxy.get_uservar(&name).await;

            return proxy.finish(format!("For username {username}: The original variable '{name}' was '{orig_value}' and I have updated it to '{value}' (staged value: '{staged_value}')"));
        }
        proxy.finish("Usage: rust-set name value".to_string())
    }.boxed()
});
```

And its usage from RiveScript:

```rivescript
+ rust set * *
- <call>rust-set <star1> "<star2>"</call>
```
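
The staging behavior described for the Proxy can be sketched with plain HashMaps. This is a simplified, synchronous illustration and not the crate's `Proxy`: the `Store` and `StagingProxy` types and their methods are hypothetical, and the `"undefined"` fallback for unset variables is a placeholder chosen for this sketch.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the bot's committed user variables.
struct Store {
    vars: HashMap<String, String>,
}

/// Hypothetical proxy: reads fall through to the store, writes are staged.
struct StagingProxy<'a> {
    store: &'a Store,
    staged: HashMap<String, String>,
}

impl<'a> StagingProxy<'a> {
    fn new(store: &'a Store) -> Self {
        Self { store, staged: HashMap::new() }
    }

    /// Prefer a staged value; otherwise read the committed one.
    fn get(&self, name: &str) -> String {
        self.staged
            .get(name)
            .or_else(|| self.store.vars.get(name))
            .cloned()
            .unwrap_or_else(|| "undefined".to_string())
    }

    /// Stage a write without touching the store.
    fn set(&mut self, name: &str, value: &str) {
        self.staged.insert(name.to_string(), value.to_string());
    }

    /// Hand the staged writes back so the owner can commit them.
    fn finish(self) -> HashMap<String, String> {
        self.staged
    }
}

/// Returns (original value, staged value, committed value).
fn run_demo() -> (String, String, String) {
    let mut store = Store { vars: HashMap::new() };
    store.vars.insert("name".into(), "Alice".into());

    let mut proxy = StagingProxy::new(&store);
    let original = proxy.get("name");
    proxy.set("name", "Bob");
    let staged = proxy.get("name");

    // The proxy is consumed here, which releases its shared borrow of the
    // store and lets the owner apply the staged writes.
    let pending = proxy.finish();
    store.vars.extend(pending);
    let committed = store.vars["name"].clone();

    (original, staged, committed)
}

fn main() {
    println!("{:?}", run_demo());
}
```

Because the proxy only ever holds a shared reference to the store, the owner keeps exclusive control over when the writes are applied, which is the property the borrow checker requires.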

# User Variable Session Adapters

By default, RiveScript stores user variables in memory using a HashMap keyed by the username passed in to the reply() function. You can import and export user variables with functions like get_uservars() and set_uservars().

Like most of the other RiveScript implementations, this crate also provides support for pluggable User Variable Session Adapters so you may persist user variables proactively into something like a Redis cache or SQL database.

Examples coming soon!
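
Until then, the general idea of a pluggable store can be sketched as a trait with an in-memory implementation. Everything here is hypothetical: the `SessionStore` trait, its method names, and `MemoryStore` are invented for illustration and are not the crate's actual trait, which may well be async and shaped differently.

```rust
use std::collections::HashMap;

/// Hypothetical adapter trait; the crate's real trait may look different.
trait SessionStore {
    fn set(&mut self, username: &str, name: &str, value: &str);
    fn get(&self, username: &str, name: &str) -> Option<String>;
}

/// In-memory store: a HashMap keyed by username, where each entry holds
/// that user's variables.
#[derive(Default)]
struct MemoryStore {
    users: HashMap<String, HashMap<String, String>>,
}

impl SessionStore for MemoryStore {
    fn set(&mut self, username: &str, name: &str, value: &str) {
        self.users
            .entry(username.to_string())
            .or_default()
            .insert(name.to_string(), value.to_string());
    }

    fn get(&self, username: &str, name: &str) -> Option<String> {
        self.users.get(username)?.get(name).cloned()
    }
}

/// Code written against the trait works with any backend that implements it.
fn remember_name(store: &mut dyn SessionStore, username: &str, name: &str) {
    store.set(username, "name", name);
}

fn main() {
    let mut store = MemoryStore::default();
    remember_name(&mut store, "local-user", "Alice");
    println!("{:?}", store.get("local-user", "name"));
}
```

A Redis or SQL backend would implement the same trait and replace the HashMap operations with calls to its client library.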

# Testing It

Git clone this project and run: `cargo run -- eg/brain`

For help: `cargo run -- --help`

# Building

Install [Rust](https://www.rust-lang.org/) and build and test this project
with commands like the following:

* `cargo build`

    Builds the rivescript(.exe) binary.

# Features Supported

This port of RiveScript is "feature complete" and implements all of the commands and tags of RiveScript. The checklist below, used during the development of this crate, lays out all of the tasks that a RiveScript interpreter must fulfill.

- [ ] Read and parse RiveScript source documents into memory.
    - [x] load_directory(), load_file() and stream() can access RiveScript sources.
    - [x] Parse document into complete 'abstract syntax tree' mapping out topics,
          triggers and replies.
    - [x] Support all RiveScript **commands**:
        - [x] `! DEFINITION`
        - [x] `> LABEL`
        - [x] `+ TRIGGER`
        - [x] `- RESPONSE`
        - [x] `% PREVIOUS`
        - [x] `^ CONTINUE`
        - [x] `@ REDIRECT`
        - [x] `* CONDITION`
        - [x] `// COMMENT` and `/* multiline comments */`
        - [x] Object macros (collecting names, languages, source code)
    - [x] `! local concat = none|space|newline`
    - [x] `! global depth = 25` can change recursion depth
    - [ ] Syntax checking and strict mode
- [x] Sorting the replies
    - [x] Sorting +Triggers
    - [x] Sorting %Previous
    - [x] Sorting substitution lists
    - [x] Topic inherits/includes.
- [ ] Fetch a reply for the user
    - [x] User variable storage
    - [x] Substitutions (`! sub`)
    - [x] `> begin` blocks
    - [x] -Reply, and (weighted) random responses.
    - [x] @Redirect
    - [x] %Previous
    - [x] *Condition
    - [x] Trigger Tags:
        - [x] `[optionals]`
        - [x] `@arrays`
        - [x] `<bot>` and `<get>` user vars
        - [x] `<input>` and `<reply>` tags
    - [x] Reply Tags:
        - [x] `<star>, <star1> - <starN>`
        - [x] `<botstar>, <botstar1> - <botstarN>` (%Previous)
        - [x] `<input1> - <input9>` (user vars)
        - [x] `<reply1> - <reply9>` (user vars)
        - [x] `<id>`
        - [x] `<noreply>`
        - [x] `<bot>`, `<bot name=value>`
        - [x] `<env>`, `<env name=value>`
        - [x] `<get>, <set>` (user vars)
        - [x] `<add>, <sub>, <mult>, <div>` (user vars)
        - [x] `{topic=...}` (partially; needs user var storage)
        - [x] `{weight=...}`
        - [x] `{@...}, <@>`
        - [ ] `{!...}` (~~DEPRECATED~~)
        - [x] `{random}` and `@(arrays)`
        - [x] `{person}, <person>`
        - [x] `{formal}, <formal>`
        - [x] `{sentence}, <sentence>`
        - [x] `{uppercase}, <uppercase>`
        - [x] `{lowercase}, <lowercase>`
        - [x] `<call>` (object macros)
        - [x] `{ok}`
        - [x] `\s`
        - [x] `\n`
        - [x] `\/`
        - [x] `\#`
- [ ] Make it pass the [RiveScript Test Suite][rsts] to verify it is _at least_ as accurate as the other 5 implementations.
- [ ] Followup niceties:
    - [ ] A JavaScript interpreter for built-in support for JS object macros.
    - [ ] Pluggable user variable session drivers (with e.g. Redis implementation).

# Developer Notes

This may be put somewhere else when the module is closer to "done."

Just some notes about integrating this module as compared to the
other programming languages RiveScript was written in:

* For Rust borrowing/ownership, when the parser finds a +Trigger it
  cannot "give" it to the AST immediately like it does in most other
  implementations, because -Reply or *Condition need to write into the
  Trigger reference, which they can't do once the AST owns it. So the
  buffer for the current Trigger is given to the AST when:
    * Another +Trigger command is found which starts a new trigger;
      the current trigger is given to AST before starting the new one.
    * When a `> begin` or `> topic` is started; any trigger-in-progress
      for the old topic is committed to AST.
    * At the end of the parse phase: if one final trigger was being
      populated it is given to AST before returning.
* In the parser: most implementations do a look-ahead scan both to
  collect `^Continues` (append them to the current line) and to peek
  for `%Previous` underneath triggers. In rivescript-rs we only look
  ahead for `^Continue` and process `%Previous` in the normal command
  switch similar to `@Redirect` or `*Condition`.
* A long-standing bug with topic inheritance/includes was uncovered!

    In the eg/brain/rpg.rive `rpg demo` that demonstrates the feature, the
    game would get stuck in topic `puzzle1` because of a conflict with the
    included topic `puzzle` having a duplicate trigger for "west" which
    caused the user to always be taken back to the beginning of the puzzle.

    A very long time ago, RiveScript implementations kept the sorted list
    of triggers in-memory as being a simple list of strings (`Vec<String>`),
    and when the user matched a trigger, the reply details for it were looked
    up from a HashMap. However, that HashMap approach made it impossible to
    have duplicate triggers (as you might want to have when using %Previous,
    e.g. the bot could ask multiple yes/no questions and you could program a
    trigger for `yes` having a %Previous pointing to the bot's question, but
    multiple `yes` triggers would trample over that).

    Somewhere between 2012-2014, in the "v1.0" era of the JavaScript and
    Python ports to RiveScript, the sorted trigger set was changed to hold the
    full response data too, but this introduced a bug in the way that the
    topic inherits/includes feature worked.

    With "included" topics, the sets of triggers for all topics are treated
    as equals and sorted amongst themselves, with only "inherited" topics
    having their own priority. Anyway, since `puzzle1` had a "duplicate"
    trigger "west" shared by included topic `puzzle`, and with the ordering,
    both triggers were added to the sort list but not in the correct order
    (letting puzzle2's version match first).

    This is fixed in the Rust port, by having `inherits::get_topic_triggers`
    prioritize adding the local topic's triggers _first_, while de-duplicating
    copies of those triggers from included topics (allowing the local topic's
    trigger to "shadow" the included one's), before finally mixing in the
    inherited topics. The result is that when sort_replies() is finally doing
    the final sort (by {weight} and inheritance level, etc.), only one copy of
    the `west` trigger exists (from `puzzle1` which over-shadowed `puzzle`)
    resulting in the Rust port of RiveScript being the only one in the last
    14 years that can play the RPG demo correctly.

    Apparently also, the [RiveScript Test Suite][rsts] doesn't exercise this
    feature of RiveScript at all so the bug went unnoticed for many years!

    The JavaScript, Python and Go ports of RiveScript all shared the bug, with
    only the original Perl version (with its legacy implementation) working
    correctly.
* Object macros (Rust subroutines) posed an interesting dilemma!

    In most RiveScript implementations, Subroutines can be defined in the
    native programming language and they tended to accept a reference to the
    master RiveScript struct as their first parameter (so they could get/set
    user variables or manipulate the bot's inner state).

    (Subroutines with a function signature like `(*RiveScript, []args)` can
    be invoked from a RiveScript reply with the `<call>name args...</call>`
    syntax and have the subroutine's result substituted in its place.)

    For the Rust borrow checker, it wasn't possible to share a mutable
    RiveScript with the Subroutine, so instead a Proxy object is sent in.
    The Proxy has a subset of RiveScript functions that the subroutine might
    want (to get/set variables, etc.), and when reading a variable it will
    come directly from RiveScript or its user variable session store. The
    Proxy also stages writes to variables using its own HashMap, so if you
    set_uservar and then get_uservar you will get the staged copy while within
    your Subroutine, and then RiveScript will commit the staged changes after
    your Subroutine returns.

    See src/main.rs and eg/brain/rust.rive for examples and details.
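
The "shadowing" fix for included topics described in the notes above can be sketched as follows. This is a simplified illustration and not the code in `inherits::get_topic_triggers`: the `Trigger` struct and `merge_triggers` function are hypothetical, and inherited topics and the final sort are left out.

```rust
use std::collections::HashSet;

/// A trigger pattern tagged with the topic that defined it.
#[derive(Debug, Clone, PartialEq)]
struct Trigger {
    pattern: String,
    topic: String,
}

/// Small constructor to keep the examples short.
fn trig(pattern: &str, topic: &str) -> Trigger {
    Trigger { pattern: pattern.to_string(), topic: topic.to_string() }
}

/// Add the local topic's triggers first, then add triggers from included
/// topics only when the local topic has not defined the same pattern.
fn merge_triggers(local: &[Trigger], included: &[Trigger]) -> Vec<Trigger> {
    let local_patterns: HashSet<&str> =
        local.iter().map(|t| t.pattern.as_str()).collect();

    let mut merged: Vec<Trigger> = local.to_vec();
    merged.extend(
        included
            .iter()
            .filter(|t| !local_patterns.contains(t.pattern.as_str()))
            .cloned(),
    );
    merged
}

fn main() {
    let local = vec![trig("west", "puzzle1"), trig("east", "puzzle1")];
    let included = vec![trig("west", "puzzle"), trig("help", "puzzle")];

    for t in merge_triggers(&local, &included) {
        println!("{} (from {})", t.pattern, t.topic);
    }
}
```

Only one `west` trigger survives the merge, and it is the one from `puzzle1`, so the later sort never sees the included topic's copy.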

# License

```
The MIT License (MIT)

Copyright (c) 2022-2026 Noah Petherbridge

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
```

[rsts]: https://github.com/aichaos/rsts