1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
/*!
The afsort crate implements a sorting algorithm based on
[American Flag sort](https://en.wikipedia.org/wiki/American_flag_sort). The implementation is
currently limited to sort byte slices, e.g. Strings. The main motivation is to sort strings of
text, so most of the benchmarks are based on English text strings. When sorting English words,
this implementation seems to be about 40% faster than `sort_unstable` from the Rust standard
library.
For small input, this method falls back to the standard library.
# Installation
Add the depndency to your `Cargo.toml`:
```ignore
[dependencies]
afsort = "0.1"
```
In your crate root:
```ignore
extern crate afsort;
```
# Usage
You can now use afsort to e.g. sort arrays of strings or string slices.
```rust
use afsort;
let mut strings = vec!("red", "green", "blue");
afsort::sort_unstable(&mut strings);
assert_eq!(strings, vec!["blue", "green", "red"]);
```
You can also sort by an extractor function, e.g.:
```rust
use afsort;
let mut tuples = vec![("b", 2), ("a", 1)];
afsort::sort_unstable_by(&mut tuples, |t: &(&str, _) | t.0.as_bytes());
assert_eq!(tuples, vec![("a", 1), ("b", 2)]);
```
# Motivation
Essentially, I noticed that sorting of strings took a long time when using the
[fst](https://github.com/BurntSushi/fst) crate, since it requires the input to be ordered.
Since sorting strings is a general problem, this is now a crate.
# Performance
As mentioned, this implementation seems to be about 40% faster than the sort in the standard
library, when sorting strings of random English words. The implementation is fairly naive,
so I would not be surprised if it could be improved further.
You can run the benchmark tests using `cargo bench` (currently requires nightly rust), like this:
```ignore
% cargo bench
Compiling afsort v0.1.0 (file:///home/anton/dev/off/afsort)
Finished release [optimized] target(s) in 5.66 secs
Running target/release/deps/afsort-ca28db3ba0643253
running 12 tests
test tests::sorts_strings_same_as_unstable ... ignored
test tests::sorts_tuples_same_as_unstable ... ignored
test tests::sort_100000_en_af ... bench: 20,402,968 ns/iter (+/- 1,512,907)
test tests::sort_100000_en_std ... bench: 30,132,067 ns/iter (+/- 1,698,223)
test tests::sort_10000_en_af ... bench: 1,377,303 ns/iter (+/- 114,481)
test tests::sort_10000_en_lower_af ... bench: 1,371,022 ns/iter (+/- 95,391)
test tests::sort_10000_en_lower_std ... bench: 2,227,486 ns/iter (+/- 127,281)
test tests::sort_10000_en_sorted_af ... bench: 878,665 ns/iter (+/- 545,256)
test tests::sort_10000_en_sorted_std ... bench: 618,329 ns/iter (+/- 536,338)
test tests::sort_10000_en_std ... bench: 2,221,089 ns/iter (+/- 157,461)
test tests::sort_1000_en_af ... bench: 101,625 ns/iter (+/- 6,946)
test tests::sort_1000_en_std ... bench: 171,655 ns/iter (+/- 9,844)
test result: ok. 0 passed; 0 failed; 2 ignored; 10 measured; 0 filtered out
```
# Limitations
Currently, this crate only supports sorting elements of `AsRef<[u8]>`. I think that this is a
good first step, since it supports sorting of Strings. There is however no reason why American
Flag sorting should be limited to this data type. Any kind of element that can deliver a radix
to a certain digit/depth can be sorted using this technique.
The American Flag algorithm is unstable, in the same way that sort_unstable in the standard
library. That is, equal elements might be re-ordered.
This crate can _only_ sort strings based on their `utf-8` byte values. For many problems, this
is fine. However, if you want to sort strings for display to a user, Locale might matter. This
crate does not try to address this issue.
# Testing
Testing is done using the [quickcheck](https://github.com/BurntSushi/quickcheck) crate. We run
about 50k different variations of input strings. We treat the standard library's sort_unstable
as the gold standard.
*/
extern crate quickcheck;
/// Enhances slices of e.g. Strings to have a `af_sort_unstable` method, as a more idiomatic
/// way to call sort.
///
/// #Example
///
/// ```rust
/// use afsort::AFSortable;
///
/// let mut strings = vec!["c", "a", "b"];
/// strings.af_sort_unstable();
/// assert_eq!(strings, vec!["a", "b", "c"]);
/// ```
/// Main sort method.
///
/// #Example
///
/// ```rust
/// let mut strings = vec!["c", "a", "b"];
/// afsort::sort_unstable(&mut strings);
/// assert_eq!(strings, vec!["a", "b", "c"]);
/// ```
/// Sort method which accepts function to convert elements to &[u8].
///
/// #Example
///
/// ```rust
/// let mut tuples = vec![("b", 2), ("a", 1)];
///afsort::sort_unstable_by(&mut tuples, |t: &(&str, _) | t.0.as_bytes());
///assert_eq!(tuples, vec![("a", 1), ("b", 2)]);
/// ```
///
/// Footnote: The explicit type annotacion in the closure seems to be needed (even though it should
/// not). See
/// [this discussion](https://users.rust-lang.org/t/lifetime-issue-with-str-in-closure/13137).