Ergonomic, garbage-collected strings for Rust.
EZString is similar to the strings in high-level languages such as
Python and Java. It is designed to be as easy to use as possible by always returning owned values,
using reference counting and copy-on-write under the hood to make this efficient.
# Getting Started
[easy_strings is available on crates.io](https://crates.io/crates/easy_strings).
Add the following dependency to your Cargo manifest.
```toml
[dependencies]
easy_strings = "0.2"
```
Then import it in your code.
```rust
extern crate easy_strings;
use easy_strings::{EZString, ez};
```
# Creation
The most common way to create an EZString is from a string literal, using the ez() helper
function. This interns the string so that calling it multiple times with the same string literal
won't result in multiple copies or allocations. (It still requires locking and querying the
interned string table each time.)
```rust
use easy_strings::{ez};
let s = ez("Hello, world!");
```
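As a rough illustration of what interning involves, here is a minimal sketch built only on the standard library. The `TABLE` static and the `intern` function are hypothetical names invented for this example, and `Arc<str>` stands in for the crate's internal buffer type; the actual implementation of ez() may differ.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Hypothetical global intern table; not the crate's actual data structure.
static TABLE: Mutex<Option<HashMap<&'static str, Arc<str>>>> = Mutex::new(None);

/// Returns a shared buffer for `lit`, allocating only on the first call
/// for a given literal. Every call still takes the lock and does a lookup.
fn intern(lit: &'static str) -> Arc<str> {
    let mut guard = TABLE.lock().unwrap();
    let table = guard.get_or_insert_with(HashMap::new);
    table.entry(lit).or_insert_with(|| Arc::from(lit)).clone()
}
```

Calling `intern("hello")` twice returns two handles to the same allocation, which is the property described above: no extra copy, but one lock and one hash lookup per call.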
You can also create an EZString from existing Strings or &strs.
```rust
use easy_strings::{EZString};
let s = EZString::from("foo");
let s = EZString::from("foo".to_string());
```
# Concatenation
To concatenate strings, write `&a + &b`. This syntax works regardless of the types of a and b,
whether they are EZString, &EZString, String, &String, or &str, as long as either a or b is
an EZString or &EZString.
```rust
let e = ez("E");
let re = &e;
let s = "s".to_string();
let rs = &s;
let lit = "lit";
assert_eq!(&e + &e, "EE");
assert_eq!(&e + &re, "EE");
assert_eq!(&e + &s, "Es");
assert_eq!(&e + &rs, "Es");
assert_eq!(&e + &lit, "Elit");
assert_eq!(&lit + &e, "litE");
assert_eq!(&lit + &re, "litE");
assert_eq!(&s + &re, "sE");
assert_eq!(&rs + &e, "sE");
```
Note: If you're using Clippy, you should `#[allow(clippy::needless_borrow)]` or you'll get a lot of warnings.
You can also concatenate multiple strings this way, as long as at least one of the first two is an EZString
or &EZString.
```rust
assert_eq!(&lit + &re + &s + &e + &e + &rs, "litEsEEs");
```
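To see why one operand has to be an EZString, it can help to look at how such operators are typically defined. The sketch below uses a hypothetical `Text` type (not EZString itself) and implements only two of the many combinations; it assumes nothing about how easy_strings is actually written.

```rust
use std::ops::Add;

// `Text` is a hypothetical stand-in for an owned string type; it is not EZString.
#[derive(Debug, PartialEq)]
struct Text(String);

// `&Text + &str`: the custom type is on the left.
impl<'a, 'b> Add<&'b str> for &'a Text {
    type Output = Text;
    fn add(self, rhs: &'b str) -> Text {
        Text(format!("{}{}", self.0, rhs))
    }
}

// `&&str + &Text`: the custom type is on the right. Writing `&lit` where
// `lit: &str` produces a `&&str`, which is why this impl targets `&&str`.
impl<'a, 'b, 'c> Add<&'c Text> for &'a &'b str {
    type Output = Text;
    fn add(self, rhs: &'c Text) -> Text {
        Text(format!("{}{}", self, rhs.0))
    }
}
```

Rust's coherence rules only allow a crate to implement `Add` when one of the operand types is its own, so a library cannot make `&lit + &s` work when both sides are standard library types.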
You can also use the += operator. This is optimized to only copy the left hand string when it is not
uniquely owned. This means that the following loop is O(n) rather than O(n^2) and there is no
need for a separate StringBuilder type like there is in Java.
```rust
let mut s = ez("Some numbers: ");
for i in 0..5 {
s += &i.to_string();
s += &", ";
}
assert_eq!(s, "Some numbers: 0, 1, 2, 3, 4, ");
```
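The copy-only-when-shared behavior can be approximated with the standard library's `Rc::make_mut`. This is a sketch of the general technique, assuming an `Rc<String>` as the backing store; the `append` and `numbers` helpers are hypothetical and the crate's internals may look different.

```rust
use std::rc::Rc;

/// Appends `tail` to a reference-counted string. `Rc::make_mut` clones the
/// buffer only when another handle shares it; otherwise it mutates in place.
fn append(s: &mut Rc<String>, tail: &str) {
    Rc::make_mut(s).push_str(tail);
}

/// Builds a string in a loop. The handle stays uniquely owned, so each
/// append is an amortized in-place `push_str`, not a full copy.
fn numbers(n: u32) -> Rc<String> {
    let mut s = Rc::new(String::from("Some numbers: "));
    for i in 0..n {
        append(&mut s, &i.to_string());
        append(&mut s, ", ");
    }
    s
}
```

If a second handle exists when `append` is called, the buffer is copied once and the other handle keeps seeing the old contents.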
# Slicing
Slicing is done via the substr() method. Note that the indices are by byte, not code point. If
the provided indices are not on a code point boundary, substr() will panic.
```rust
let mut a = ez("Hello, world!");
assert_eq!(a.substr(1..), "ello, world!");
assert_eq!(a.substr(..6), "Hello,");
assert_eq!(a.substr(1..6), "ello,");
assert_eq!(a.substr(1..a.len()-1), "ello, world");
let b = a.substr(1..3);
a += &b; // b is a copy, so we can freely mutate a
```
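Because the indices are byte offsets, strings containing non-ASCII text need some care. The helpers below are a sketch using plain `&str` from the standard library rather than EZString; `checked_substr` and `byte_index` are hypothetical names, not part of the crate.

```rust
/// Returns `s[start..end]` as an owned String, or `None` if the range is out
/// of bounds or does not fall on UTF-8 code point boundaries.
fn checked_substr(s: &str, start: usize, end: usize) -> Option<String> {
    s.get(start..end).map(str::to_owned)
}

/// Converts a code point index into the byte index that slicing expects.
/// An index equal to the number of code points maps to `s.len()`.
fn byte_index(s: &str, char_idx: usize) -> Option<usize> {
    s.char_indices()
        .map(|(i, _)| i)
        .chain(std::iter::once(s.len()))
        .nth(char_idx)
}
```

For example, in "héllo" the letter é occupies bytes 1 and 2, so a range ending at byte 2 is rejected, while the third code point starts at byte 3.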
substr() returns the substring as a new EZString. If you want a borrowed slice instead, you
can use []. This avoids the extra copy and allocation, at the expense of forcing you to worry
about lifetimes, which easy_strings was designed to avoid.
```rust
let b = &a[1..3];
assert_eq!(b, "el");
// a += &b; // compile error because b borrowed a
```
# Equality
Equality testing between multiple EZStrings or &EZStrings just works. If you want to compare to
a String or &str, the EZString should be on the left. If it is on the right, you'll have to
prefix it with * (or ** for &EZString).
```rust
let e = ez("AAA");
let er = &e;
let s = String::from("AAA");
let sr = &s;
let lit = "AAA";
assert!(e == e);
assert!(er == er);
assert!(e == er);
assert!(er == e);
assert!(e == s);
assert!(e == sr);
assert!(e == lit);
assert!(er == s);
assert!(er == sr);
assert!(er == lit);
assert!(s == *e);
assert!(*sr == *e);
assert!(lit == *e);
assert!(s == **er);
assert!(*sr == **er);
assert!(lit == **er);
```
# Cloning
EZString is not Copy, which means you must clone it whenever you want to reuse it _by value_.
To work around this, it is recommended that your functions always take EZString parameters by
reference and return owned EZStrings. This provides maximum flexibility to the caller and avoids
requiring clone()s everywhere. EZString's own methods, such as trim() here, already do this.
```rust
// bad: requires caller to clone() argument
fn foo(s: EZString) -> EZString { s.trim() }
// good
fn bar(s: &EZString) -> EZString { s.trim() }
```
That being said, sometimes taking by value is unavoidable. In this case, you need to clone your
string. Remember, this doesn't actually copy the string, it just increments the reference count.
The simplest and most standard way is to call .clone(). However, if this is too verbose for your
taste, there is also a shorthand .c() method. c() also has the advantage of always cloning the
underlying EZString, even if you call it on nested references (clone() clones the reference
instead in this case).
```rust
let mut v: Vec<EZString> = Vec::new();
let s = ez("foo");
let rs = &s;
let rrs = &rs;
v.push(s.clone());
v.push(s.c());
v.push(rs.clone());
v.push(rs.c());
// v.push(rrs.clone()); // compile error
v.push(rrs.c());
```
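The difference between clone() and c() on nested references comes down to method resolution. The sketch below reproduces it with a hypothetical `Handle` type wrapping an `Rc<str>`; it is an assumption about how such a shorthand could be written, not the crate's source.

```rust
use std::rc::Rc;

// `Handle` is a hypothetical reference-counted string, standing in for EZString.
#[derive(Clone)]
struct Handle(Rc<str>);

impl Handle {
    /// Shorthand clone. As an inherent method taking `&self`, it is found by
    /// auto-deref through any number of references and always yields a `Handle`.
    fn c(&self) -> Handle {
        Handle(Rc::clone(&self.0))
    }
}

/// Takes a nested reference and returns an owned handle sharing the buffer.
fn own(rrs: &&Handle) -> Handle {
    // `rrs.clone()` would resolve to `Clone for &Handle` and yield a
    // `&Handle`, which does not type-check as the return value here.
    rrs.c()
}
```

Since references themselves implement `Clone`, the compiler picks that impl first for `&&Handle`; an inherent method has no such competing candidate.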
# Coercions
Most libraries operate on Strings and &strs, rather than EZStrings. Luckily, EZString Derefs to
&str, so in most cases, you can pass &s in and it will just work.
```rust
fn take_str(_: &str) {}
let s = ez("");
let rs = &s;
take_str(&s);
take_str(&rs);
```
In complicated cases, such as with generic functions, inference may not work. In that case, you
can explicitly get a &str via as_str().
```rust
take_str(s.as_str());
take_str(rs.as_str());
```
If a function requires an owned String, you can use the to_string() method.
```rust
fn take_string(_: String) {}
take_string(s.to_string());
```
# String searching
The contains(), starts_with(), ends_with(), find(), and rfind() methods are generic, meaning
that you'll get a confusing compile error if you naively pass in an EZString. The easiest
solution is to use as_str() as mentioned in the previous section. Alternatively, you can write
`&*s` for EZStrings and `&**s` for &EZStrings. No special syntax is required to pass in a literal.
```rust
let s = ez("Hello, world!");
assert!(s.contains("o, wo"));
assert!(s.starts_with("Hello"));
assert!(s.ends_with("world!"));
assert!(!s.ends_with("worl"));
assert_eq!(s.find("ld"), Some(10));
assert_eq!(s.find("l"), Some(2));
assert_eq!(s.rfind("l"), Some(10));
let p = ez("wor");
let r = &p;
assert!(s.contains(&*p));
assert!(s.contains(&**r));
assert!(s.contains(p.as_str()));
assert!(s.contains(r.as_str()));
```
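The reason `&*s` is needed here but not when calling an ordinary function is that deref coercion only applies when the target type is known. The following sketch shows the distinction with a hypothetical `Wrapper` type that derefs to `str`; it is an illustration under that assumption and says nothing about how EZString is defined.

```rust
use std::ops::Deref;

// `Wrapper` is a hypothetical string type that derefs to `str`.
struct Wrapper(String);

impl Deref for Wrapper {
    type Target = str;
    fn deref(&self) -> &str {
        &self.0
    }
}

fn takes_str(_: &str) -> bool {
    true
}

fn demo(haystack: &str, needle: &Wrapper) -> bool {
    // Deref coercion applies because the parameter type is exactly `&str`.
    takes_str(needle);
    // `contains` is generic over its pattern, so there is no single target
    // type to coerce to; `haystack.contains(needle)` would not compile.
    // Reborrowing through the deref produces a real `&str` instead.
    haystack.contains(&**needle)
}
```

Here `needle` is a `&Wrapper`, so two dereferences are needed to reach `str`, matching the `&**s` form described above for references.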
Note that find() and rfind() return an Option. To get behavior similar to Python's str.index(),
which throws if the substring isn't present, just call unwrap() on the result.
```rust
assert_eq!(s.find("ld").unwrap(), 10);
```
# String splitting
You can split by newlines, whitespace, or a provided substring. The returned iterators wrap
the results in new EZStrings.
```rust
let s = ez(" Hello, world!\nLine two. ");
assert_eq!(s.lines().collect::<Vec<_>>(), vec![ez(" Hello, world!"), ez("Line two. ")]);
assert_eq!(s.split_whitespace().collect::<Vec<_>>(),
vec![ez("Hello,"), ez("world!"), ez("Line"), ez("two.")]);
let s = ez("aaa-bbb-ccc");
assert_eq!(s.split("-").collect::<Vec<_>>(), vec![ez("aaa"), ez("bbb"), ez("ccc")]);
assert_eq!(s.rsplit("-").collect::<Vec<_>>(), vec![ez("ccc"), ez("bbb"), ez("aaa")]);
```
You can also limit the number of splits via splitn().
```rust
let s = ez("aaa-bbb-ccc");
assert_eq!(s.splitn(2, "-").collect::<Vec<_>>(), vec![ez("aaa"), ez("bbb-ccc")]);
assert_eq!(s.rsplitn(2, "-").collect::<Vec<_>>(), vec![ez("ccc"), ez("aaa-bbb")]);
```
split_terminator() and rsplit_terminator() are the same as split()/rsplit() except that
if the final substring is empty, it is skipped. This is useful if the string is
terminated, rather than separated, by a separator.
```rust
let s = ez("aaa-bbb-");
assert_eq!(s.split("-").collect::<Vec<_>>(), vec![ez("aaa"), ez("bbb"), ez("")]);
assert_eq!(s.split_terminator("-").collect::<Vec<_>>(), vec![ez("aaa"), ez("bbb")]);
assert_eq!(s.rsplit_terminator("-").collect::<Vec<_>>(), vec![ez("bbb"), ez("aaa")]);
let s = ez("aaa-bbb");
assert_eq!(s.split("-").collect::<Vec<_>>(), vec![ez("aaa"), ez("bbb")]);
assert_eq!(s.split_terminator("-").collect::<Vec<_>>(), vec![ez("aaa"), ez("bbb")]);
```
Although the iterators are lazy, they hold a reference to a copy of the string at time of
creation. Therefore, if you later modify the string, the iteration results don't change.
```rust
let mut s = ez("aaa-bbb-ccc");
let it = s.split("-");
s += &"-ddd";
assert_eq!(it.collect::<Vec<_>>(), vec![ez("aaa"), ez("bbb"), ez("ccc")]);
let it2 = s.split("-");
assert_eq!(it2.collect::<Vec<_>>(), vec![ez("aaa"), ez("bbb"), ez("ccc"), ez("ddd")]);
```
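One way an iterator can be lazy and still unaffected by later changes is to hold its own reference-counted handle to the text. The sketch below shows that idea with `Rc<str>`; `OwnedSplit` and `owned_split` are hypothetical, yield plain Strings, and assume a non-empty separator. The crate's real iterator types may be built differently.

```rust
use std::rc::Rc;

// Hypothetical owning split iterator; the crate's real iterator types may differ.
struct OwnedSplit {
    buf: Rc<str>, // shared handle to the text as it was at creation time
    sep: String,
    pos: usize, // byte offset of the next unread piece
    done: bool,
}

fn owned_split(buf: &Rc<str>, sep: &str) -> OwnedSplit {
    assert!(!sep.is_empty(), "this sketch assumes a non-empty separator");
    OwnedSplit { buf: Rc::clone(buf), sep: sep.to_owned(), pos: 0, done: false }
}

impl Iterator for OwnedSplit {
    type Item = String;

    fn next(&mut self) -> Option<String> {
        if self.done {
            return None;
        }
        let rest = &self.buf[self.pos..];
        match rest.find(self.sep.as_str()) {
            Some(i) => {
                // Copy out the piece before the separator and skip past it.
                let piece = rest[..i].to_owned();
                self.pos += i + self.sep.len();
                Some(piece)
            }
            None => {
                // No separator left: the remainder is the final piece.
                self.done = true;
                Some(rest.to_owned())
            }
        }
    }
}
```

Reassigning the caller's variable to a new buffer leaves the iterator's handle pointing at the old one, so the pieces it yields do not change.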
# Returning Iterators
Every iteration method returns a distinct type. If you want to return one of several iterators,
you need to either box them or eagerly evaluate them.
For example, suppose you wanted to emulate Python's str.split() method, which splits on a
substring if one is passed in and splits on whitespace if no argument is passed. The naive
approach doesn't work as EZString::split() and EZString::split_whitespace() return distinct
types. One solution is to eagerly evaluate them and return a list of strings.
```rust
fn split<'a, P: Into<Option<&'a str>>>(s: &EZString, sep: P) -> Vec<EZString> {
match sep.into() {
Some(sep) => s.split(sep).collect(),
None => s.split_whitespace().collect(),
}
}
let s = ez("x x-x 77x");
assert_eq!(split(&s, "x"), vec![ez(""), ez(" "), ez("-"), ez(" 77"), ez("")]);
assert_eq!(split(&s, None), vec![ez("x"), ez("x-x"), ez("77x")]);
```
Alternatively, you can box the iterators, thus preserving the laziness.
```rust
fn split<'a, P: Into<Option<&'a str>>>(s: &EZString, sep: P) -> Box<dyn Iterator<Item=EZString>> {
match sep.into() {
Some(sep) => Box::new(s.split(sep)),
None => Box::new(s.split_whitespace()),
}
}
```
# Trimming
The trim(), trim_left(), and trim_right() methods trim whitespace from the ends of the string.
```rust
assert_eq!(ez(" hello \n ").trim(), "hello");
let s = ez(" hello \n ").trim_right();
assert_eq!(s, " hello");
assert_eq!(s.trim_left(), "hello");
```
trim_left_matches() and trim_right_matches() trim matches of a given substring from the ends of
the string. Unlike Python's strip methods, they take a substring to trim rather than a set of
characters. trim_matches() differs from the other methods: it takes a char rather than
a substring.
```rust
assert_eq!(ez(" hello ").trim_matches(' '), "hello");
let s = ez(" x xhello x x x").trim_right_matches(" x");
assert_eq!(s, " x xhello");
assert_eq!(s.trim_left_matches(" x"), "hello");
```
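If you do want Python-style trimming of a set of characters, one option is to fall back to the standard library's `str::trim_matches`, which accepts a closure. The sketch below works on a plain `&str` (obtainable via as_str()); `strip_chars` is a hypothetical helper, not a method of EZString.

```rust
/// Trims any leading or trailing characters that appear in `chars`,
/// similar to Python's `str.strip(chars)`. Operates on std `&str`.
fn strip_chars<'a>(s: &'a str, chars: &str) -> &'a str {
    s.trim_matches(|c: char| chars.contains(c))
}
```

The result is a borrowed slice, so it would need to be converted back with EZString::from() if an owned EZString is required.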
# String replacement
You can replace one substring with another via .replace().
```rust
let s = ez("one fish two fish, old fish, new fish");
assert_eq!(s.replace("fish", "bush"), "one bush two bush, old bush, new bush");
assert_eq!(s.replace(&ez("fish"), &ez("bush")), "one bush two bush, old bush, new bush");
```
You can also replace the first n occurrences of a substring via .replacen().
```rust
let s = ez("one fish two fish, old fish, new fish");
assert_eq!(s.replacen("fish", "bush", 3), "one bush two bush, old bush, new fish");
assert_eq!(s.replacen(&ez("fish"), &ez("bush"), 2), "one bush two bush, old fish, new fish");
```
# Other methods
to_lowercase(), to_uppercase(), and repeat() are pretty much self-explanatory.
```rust
let s = ez("Hello, World!");
assert_eq!(s.to_lowercase(), "hello, world!");
assert_eq!(s.to_uppercase(), "HELLO, WORLD!");
assert_eq!(s.repeat(3), "Hello, World!Hello, World!Hello, World!");
```
Note that to_lowercase() and to_uppercase() are Unicode aware, but locale independent,
i.e. there is no way to get Turkish capitalization for 'i'.
```rust
let s = ez("ὈΔΥΣΣΕΎΣ");
assert_eq!(s.to_lowercase(), "ὀδυσσεύς");
```
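If locale-specific casing is needed, it has to be layered on top. As a minimal sketch using only the standard library, the hypothetical `turkish_upper` helper below special-cases the dotted and dotless i and defers to the default Unicode mapping for everything else. It is an illustration of the gap, not a complete Turkish casing implementation.

```rust
/// Uppercases `s` using the Turkish dotted/dotless i mapping.
/// All other characters use the default, locale-independent mapping.
fn turkish_upper(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            'i' => out.push('İ'), // U+0130, capital I with dot above
            'ı' => out.push('I'), // dotless i maps to plain capital I
            _ => out.extend(c.to_uppercase()),
        }
    }
    out
}
```

With the default mapping, "istanbul" uppercases to "ISTANBUL"; with the helper it becomes "İSTANBUL".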
# Pointer equality
The == operator tests for _value_ equality, that is whether the given strings contain the same
bytes. If you want to test whether two EZStrings share the same underlying buffer, you can use the
ptr_eq() method. Note that since EZString is copy-on-write, there is no observable effect of
sharing buffers, apart from reduced memory usage. Therefore, this method is rarely useful.
```rust
let a = ez("xxx");
let mut b = a.clone();
let c = &ez("xx") + &ez("x");
assert!(a.ptr_eq(&b));
assert!(b == c && !b.ptr_eq(&c));
b += &"foo";
// += is copy on write, so b no longer points to a
assert!(!a.ptr_eq(&b));
assert!(a == "xxx");
assert!(b == "xxxfoo");
```