rust-string-manipulation-utf8
A Rust library with string manipulation functions using character indexing (UTF-8)
Library name: string_manipulation_utf8
An implementation of string manipulation functions using character indexing instead of bytes. It uses UTF-8 encoded strings as implemented in Rust.
This library also has common string functions like indexof, substr and substring that exist in other programming languages.
It can be used as free functions or as methods on the 'str' (string slice) and 'String' types.
Library functions:
- indexof : get the character position of one string within another
- substr : get a substring of a string using start index and length (signed values)
- substru : get a substring of a string using start index and length (unsigned values)
- substr_end : get a substring from start index until the end of the string
- substring : get a substring of a string using start and end index (end index excluded)
- str_remove : remove a substring from a string
- str_concat! : macro to concatenate multiple strings
Standard Rust functions:
These standard Rust methods do not depend on character or byte indexing.
- replace : replaces all matches of a pattern with another string
- replacen : replaces the first N matches of a pattern with another string
- strip_prefix : returns a string slice with the prefix removed
- strip_suffix : returns a string slice with the suffix removed
- contains : check if a string contains another string
- starts_with : check if a string starts with another string
- ends_with : check if a string ends with another string
- is_empty : check if a String has a length of zero
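The standard methods listed above can be tried on a non-ASCII string with std only. This is a small illustrative sketch; the helper name 'standard_methods_demo' and the sample text are hypothetical and not part of string_manipulation_utf8.

```rust
/// Hypothetical demo helper (not part of the library): exercises the
/// standard str methods listed above on a string with non-ASCII characters.
fn standard_methods_demo() {
    let s = "Grüße, Welt! Grüße!";

    // replace: every match is replaced.
    assert_eq!(s.replace("Grüße", "Hallo"), "Hallo, Welt! Hallo!");
    // replacen: only the first N matches are replaced.
    assert_eq!(s.replacen("Grüße", "Hallo", 1), "Hallo, Welt! Grüße!");

    // strip_prefix / strip_suffix return Option<&str>: None if there is no match.
    assert_eq!(s.strip_prefix("Grüße, "), Some("Welt! Grüße!"));
    assert_eq!(s.strip_suffix("Grüße!"), Some("Grüße, Welt! "));
    assert_eq!(s.strip_prefix("Welt"), None);

    // contains / starts_with / ends_with accept &str or char patterns.
    assert!(s.contains("Welt"));
    assert!(s.starts_with("Grüße"));
    assert!(s.ends_with('!'));

    // is_empty is true only for a zero-length string.
    assert!(!s.is_empty());
    assert!(String::new().is_empty());
}
```

None of these calls take an index, so the result is the same whether the text is viewed as characters or bytes.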
The Rust standard library does not segment text into Unicode grapheme clusters, where multiple code points (for example a letter followed by a combining diacritical mark) form one user-perceived character.
Example:
e + combining acute accent = e + ´ = \u{0065}\u{0301} = é (two code points, 3 bytes: hex 65 CC 81)
versus the precomposed character é = \u{00E9} (one code point, 2 bytes: hex C3 A9).
This library uses the Rust standard library and therefore counts such a combined character as multiple characters.
See section 'Using byte positioning' for examples with native byte indexing.
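The difference between the two forms of é can be checked with std only: chars() counts code points and len() counts UTF-8 bytes. The helper names below are hypothetical and only used for illustration.

```rust
/// Hypothetical helper: returns (number of code points, number of UTF-8 bytes).
fn code_points_and_bytes(s: &str) -> (usize, usize) {
    // chars() yields Unicode scalar values (code points); len() is the byte length.
    (s.chars().count(), s.len())
}

/// Hypothetical helper: the raw UTF-8 bytes as a Vec, for easy inspection.
fn utf8_bytes(s: &str) -> Vec<u8> {
    s.as_bytes().to_vec()
}
```

Both strings render as é, but "e\u{0301}" counts as two characters and "\u{00E9}" as one, and the two strings do not compare equal.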
Simple benchmarking code was used to choose the faster algorithms; see the GitHub repository rust-string-manip-benchmark.
To compile and run the example code in examples/main.rs:
cargo run --example main
To compile and run the tests in tests/tests.rs:
cargo test
Using character positioning
indexof
Get the character position of one string within another, starting the search at character index 'start_index'. Returns None if not found. Index of the first character is 0.
Syntax:
str.indexof(searchstring: &str, start_index: usize) -> Option<usize>
string.indexof(searchstring: &str, start_index: usize) -> Option<usize>
indexof(s: &str, searchstring: &str, start_index: usize) -> Option<usize>
Example:
Return the character index of "test" in the given string, starting the search at the beginning of the string. The result is position 0 because "test" starts at the beginning of the string.
use string_manipulation_utf8::CharString; // String and str methods
use string_manipulation_utf8::indexof; // str function
Return the character index of "test" in the given string, starting the search at character index 6. The result is position 14.
use string_manipulation_utf8::indexof; // str function
use string_manipulation_utf8::CharString; // String and str methods
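To show how a character index differs from the byte index that str::find returns, here is a minimal std-only sketch of the indexof behaviour described above. It is not the library's implementation; the name 'indexof_sketch' and the sample string are hypothetical, and returning None for a 'start_index' past the end is an assumption.

```rust
/// Hypothetical std-only sketch of the documented `indexof` behaviour
/// (not the string_manipulation_utf8 implementation).
/// Returns the *character* index of `search` in `s`, searching from
/// character index `start_index`.
fn indexof_sketch(s: &str, search: &str, start_index: usize) -> Option<usize> {
    // Byte offset of every character, followed by s.len() as "one past the end".
    // nth() returns None when start_index is beyond the end (assumed behaviour).
    let start_byte = s
        .char_indices()
        .map(|(byte_pos, _)| byte_pos)
        .chain(std::iter::once(s.len()))
        .nth(start_index)?;
    // str::find returns a byte offset relative to the searched slice.
    let byte_pos = start_byte + s[start_byte..].find(search)?;
    // Convert the byte offset back into a character index.
    Some(s[..byte_pos].chars().count())
}
```

With the sample string "test: ünïcödé test", the second "test" starts at byte 18 but at character 14, because each of ü, ï, ö and é takes two bytes.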
substr
Get a substring of a string, beginning at character index 'start_index' and taking 'length' characters.
Negative numbers count backwards:
'start_index' from the end of the string.
'length' from 'start_index'.
If 'start_index' is outside the string boundaries, an empty string is returned. (The parameters are similar to C++ std::string::substr() and C# String.Substring.)
'length' can be isize::MAX or isize::MIN to get the substring up to the end (positive length) or the beginning (negative length) of the string without calculating the length. (Alternatively, see substr_end in this library.)
Index of the first character is 0.
If 'start_index' and 'length' are both non-negative, substru is slightly faster; it is essentially string.chars().skip(start_index).take(length).collect(). See substru and the section 'Standard Rust methods' for examples.
Syntax:
str.substr(start_index: isize, length: isize) -> String
string.substr(start_index: isize, length: isize) -> String
substr(s: &str, start_index: isize, length: isize) -> String
Example:
use string_manipulation_utf8::CharString; // String and str methods
Example:
use string_manipulation_utf8::substr; // str function
use string_manipulation_utf8::CharString; // String and str methods
Remark:
To get a substring from 'start_index' until the end of the string:
substr(string, start_index, isize::MAX)
substr_end(string, start_index)
substr(string, start_index, string.chars().count() as isize - start_index)
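A minimal std-only sketch of the substr rules above, covering a possibly negative 'start_index' and a non-negative 'length'. Assumptions: start_index -1 refers to the last character, and a start index outside the string (in either direction) gives an empty string. Negative lengths are deliberately not covered, and usize::MAX stands in for isize::MAX. The name 'substr_sketch' is hypothetical, not the library's function.

```rust
/// Hypothetical std-only sketch of `substr` (not the library's implementation).
/// Supports a negative `start_index` (counted from the end, -1 = last character,
/// an assumption) but only a non-negative `length`.
fn substr_sketch(s: &str, start_index: isize, length: usize) -> String {
    let count = s.chars().count() as isize;
    // A negative start index counts backwards from the end of the string.
    let start = if start_index < 0 { count + start_index } else { start_index };
    // Outside the string boundaries: return an empty string.
    if start < 0 || start >= count {
        return String::new();
    }
    // skip/take operate on characters, never on bytes.
    s.chars().skip(start as usize).take(length).collect()
}
```

Passing usize::MAX as the length takes everything up to the end of the string, which mirrors the substr(string, start_index, isize::MAX) form in the remark.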
substru
Same as substr, but only accepts unsigned values for 'start_index' and 'length'.
For non-negative values it is slightly faster than substr.
It is essentially equivalent to: s.chars().skip(start_index).take(length).collect::<String>()
Syntax:
str.substru(start_index: usize, length: usize) -> String
string.substru(start_index: usize, length: usize) -> String
substru(s: &str, start_index: usize, length: usize) -> String
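The equivalent expression given for substru can be wrapped in a function as a std-only sketch. The name 'substru_sketch' is hypothetical; it simply reproduces the chars().skip().take() expression quoted above.

```rust
/// Hypothetical std-only equivalent of `substru` as described above:
/// skip `start_index` characters, then take at most `length` characters.
fn substru_sketch(s: &str, start_index: usize, length: usize) -> String {
    s.chars().skip(start_index).take(length).collect::<String>()
}
```

Because skip() and take() simply stop at the end of the iterator, a start index or length past the end never panics; it only yields fewer (or no) characters.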
substring
Get a substring of a string beginning at character index 'start_index' up to and excluding the character index 'end_index'.
Equivalent to JavaScript's substring() called with two arguments.
If 'start_index' is equal to 'end_index', substring() returns an empty string.
If 'start_index' is greater than 'end_index', swap 'start_index' and 'end_index'.
Any argument value that is less than 0 is treated as if it were 0.
Any argument value that is greater than string length is treated as if it were string length.
Index of the first character is 0.
Syntax:
str.substring(start_index: isize, end_index: isize) -> String
string.substring(start_index: isize, end_index: isize) -> String
substring(s: &str, start_index: isize, end_index: isize) -> String
Example:
use string_manipulation_utf8::CharString; // String and str methods
use string_manipulation_utf8::substring; // str function
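The clamping and swapping rules listed for substring can be sketched with std only. This is an illustration of those rules, not the library's code; the name 'substring_sketch' is hypothetical.

```rust
/// Hypothetical std-only sketch of the `substring` rules described above.
fn substring_sketch(s: &str, start_index: isize, end_index: isize) -> String {
    let len = s.chars().count() as isize;
    // Values below 0 become 0; values above the string length become the length.
    let a = start_index.clamp(0, len);
    let b = end_index.clamp(0, len);
    // If start is greater than end, the two are swapped.
    let (start, end) = if a > b { (b, a) } else { (a, b) };
    // The character at `end` is excluded.
    s.chars()
        .skip(start as usize)
        .take((end - start) as usize)
        .collect()
}
```

Clamping before swapping keeps the result well-defined for any pair of isize values, including very large or negative ones.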
substr_end
Get a substring from character index 'start_index' until the end of the string.
'start_index' can be negative to count backwards from the end of the string.
If 'start_index' is outside the string boundaries, an empty string is returned.
(The parameter is similar to the single-argument C++ std::string::substr() and C# String.Substring.)
Index of the first character is 0.
Because Rust does not support default values for function parameters, substr_end(string, start_index) takes the place of a two-argument substr(string, start_index) or string.substr(start_index).
The same result can be obtained with: substr(string, start_index, isize::MAX)
Syntax:
substr_end(s: &str, start_index: isize) -> String
string.substr_end(start_index: isize) -> String
str.substr_end(start_index: isize) -> String
Example:
use string_manipulation_utf8::substr_end; // str function
use string_manipulation_utf8::CharString; // String and str methods
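A std-only sketch of the substr_end rules above, assuming that -1 refers to the last character and that a start index outside the string in either direction returns an empty string. The name 'substr_end_sketch' is hypothetical and not the library's function.

```rust
/// Hypothetical std-only sketch of `substr_end` (not the library's implementation).
/// Assumes -1 means the last character and out-of-range starts give "".
fn substr_end_sketch(s: &str, start_index: isize) -> String {
    let count = s.chars().count() as isize;
    // Negative indices count backwards from the end of the string.
    let start = if start_index < 0 { count + start_index } else { start_index };
    if start < 0 || start >= count {
        return String::new();
    }
    // Take every remaining character from `start` to the end.
    s.chars().skip(start as usize).collect()
}
```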
str_remove
Remove a substring from a string, beginning at character index 'start_index' and removing 'length' characters.
Index of the first character is 0.
Syntax:
str.str_remove(start_index: usize, length: usize) -> String
string.str_remove(start_index: usize, length: usize) -> String
str_remove(s: &str, start_index: usize, length: usize) -> String
Examples:
use string_manipulation_utf8::str_remove; // str function
use string_manipulation_utf8::CharString; // String and str methods
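Removing characters by character index can be sketched with std only by filtering on the character position. This is an illustration of the described behaviour, not the library's implementation; the name 'str_remove_sketch' is hypothetical, and leaving the string unchanged for an out-of-range 'start_index' is an assumption.

```rust
/// Hypothetical std-only sketch of `str_remove`: drops `length` characters
/// starting at character index `start_index` (not the library's code).
fn str_remove_sketch(s: &str, start_index: usize, length: usize) -> String {
    // saturating_add avoids overflow when length is very large (e.g. usize::MAX).
    let end = start_index.saturating_add(length);
    s.chars()
        .enumerate()
        // Keep only characters whose index lies outside start_index..end.
        .filter(|&(i, _)| i < start_index || i >= end)
        .map(|(_, c)| c)
        .collect()
}
```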
str_concat
Macro to concatenate multiple strings.
All strings are borrowed.
The macro first allocates the required capacity, then appends the strings.
Syntax:
str_concat!(&str1, &str2, ...)
Examples:
use string_manipulation_utf8::str_concat;
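The two-step approach described for str_concat (allocate the total capacity first, then append) can be sketched as a declarative macro. This is not the library's macro; the name 'concat_sketch!' is hypothetical.

```rust
/// Hypothetical macro sketch (not the library's `str_concat!`):
/// sums the byte lengths first, allocates once, then appends each borrowed string.
macro_rules! concat_sketch {
    ($($s:expr),* $(,)?) => {{
        // First pass: total number of bytes needed.
        let mut capacity: usize = 0;
        $( capacity += AsRef::<str>::as_ref($s).len(); )*
        // Single allocation of the required capacity.
        let mut out = String::with_capacity(capacity);
        // Second pass: append every string slice.
        $( out.push_str(AsRef::<str>::as_ref($s)); )*
        out
    }};
}
```

Each argument expression is expanded twice, so this sketch is meant for plain borrows such as &s1 or string literals, not for expressions with side effects.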
Alternatives with Rust statements.
The Rust std::concat! macro only works with literals, e.g. concat!("test", 10, 'b', true) produces "test10btrue".
Using the std::format! macro:
format!("{}{}{}", s1, s2, s3)
When adding strings with the + operator, the first String is moved (ownership is transferred) and the following strings are borrowed:
s1.clone() + &s2 + &s3
s1.to_owned() + &s2 + &s3
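The ownership rules of the alternatives above can be seen side by side in a small std-only function. The function name 'concat_alternatives' and its parameters are hypothetical and chosen for illustration.

```rust
/// Hypothetical helper comparing std concatenation alternatives.
fn concat_alternatives(s1: String, s2: &str, s3: &str) -> [String; 3] {
    // format! only borrows its arguments.
    let a = format!("{}{}{}", s1, s2, s3);
    // `+` takes the left operand by value; clone() keeps s1 usable afterwards.
    let b = s1.clone() + s2 + s3;
    // Without clone(), s1 is moved into the result and cannot be used again.
    let c = s1 + s2 + s3;
    [a, b, c]
}
```

Only the last expression consumes s1; the right-hand operands are always borrowed as &str.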
Standard Rust methods
Standard Rust methods that do not depend on character or byte indexing.
- replace : Replaces all matches of a pattern with another string.
- replacen : Replaces the first N matches of a pattern with another string.
- strip_prefix : Returns a string slice with the prefix removed if the search string is found at the beginning of the string.
- strip_suffix : Returns a string slice with the suffix removed if the search string is found at the end of the string.
- contains : Check if the given pattern matches a sub-slice of this string slice.
- starts_with : Check if the given pattern matches a prefix of this string slice.
- ends_with : Check if the given pattern matches a suffix of this string slice.
- is_empty : Check if this String has a length of zero.
- chars() : Get a substring with the chars iterator.
Examples:
Get a substring with the Rust chars() method, which returns an iterator over the characters of the string. skip(), take() and count() consume the chars iterator, so a new iterator is needed for each expression.
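A couple of common chars() patterns, as a std-only sketch; the helper names 'last_chars' and 'char_range' are hypothetical. Because count() consumes its iterator, 'last_chars' calls chars() a second time for skip().

```rust
/// Hypothetical helper: the last `n` characters of `s` (all of `s` if shorter).
fn last_chars(s: &str, n: usize) -> String {
    // count() consumes this first iterator...
    let count = s.chars().count();
    // ...so a fresh chars() iterator is created for skip().
    s.chars().skip(count.saturating_sub(n)).collect()
}

/// Hypothetical helper: characters from `start` (inclusive) to `end` (exclusive).
fn char_range(s: &str, start: usize, end: usize) -> String {
    s.chars().skip(start).take(end.saturating_sub(start)).collect()
}
```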
Using byte positioning
Get a substring using byte positions with standard Rust methods.
Using a string slice:
use string_manipulation_utf8::str_concat;
Using a string:
use string_manipulation_utf8::str_concat;
Shorter version:
use string_manipulation_utf8::str_concat;
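With byte positions, a range that ends inside a multi-byte character makes &s[a..b] panic; str::get returns None instead, and is_char_boundary reports valid cut points. The helper names below are hypothetical; 'byte_index' converts a character index into a byte index using char_indices().

```rust
/// Hypothetical helper: byte index of character `char_index`
/// (s.len() for one past the last character, None if further out).
fn byte_index(s: &str, char_index: usize) -> Option<usize> {
    s.char_indices()
        .map(|(byte_pos, _)| byte_pos)
        .chain(std::iter::once(s.len()))
        .nth(char_index)
}

/// Hypothetical helper: byte-based slicing that returns None instead of
/// panicking when a position is out of range or not on a character boundary.
fn byte_slice(s: &str, start: usize, end: usize) -> Option<&str> {
    // str::get checks the range and the UTF-8 character boundaries.
    s.get(start..end)
}
```

In "Grüße", ü occupies bytes 2 and 3, so byte 3 is not a character boundary and byte_slice(s, 0, 3) returns None, while &s[0..4] is "Grü".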