sqlite-regex 0.2.3

A SQLite extension for working with regular expressions
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
# sqlite-regex Documentation

A full reference to every function and module that sqlite-regex offers.

As a reminder, sqlite-regex follows semver and is pre v1, so breaking changes are to be expected.

## API Reference

<h3 name="regexp"><code>regexp()</code></h3>

An implementation of the `REGEXP()` operator for SQLite, described here:

> _The REGEXP operator is a special syntax for the regexp() user function. No regexp() user function is defined by default and so use of the REGEXP operator will normally result in an error message. If an application-defined SQL function named "regexp" is added at run-time, then the "X REGEXP Y" operator will be implemented as a call to "regexp(Y,X)"._ >

<small><i><a href="https://www.sqlite.org/lang_expr.html">SQL Language Expressions</a></i>, on sqlite.org</small>

This can be used with the `text regexp pattern` or `regexp(pattern, text)` syntax. See the [regex crate documentation](https://docs.rs/regex/latest/regex/struct.Regex.html) for allowed syntax/features in the regex pattern string.

```sql
select regexp('[abc]', 'a'); -- 1
select regexp('[abc]', 'x'); -- 0

select 'a' regexp '[abc]'; -- 1
select 'x' regexp '[abc]'; -- 0


--
```

<h3 name="regex"><code>regex(pattern)</code></h3>

Creates a regex "object" with the given pattern, using [SQLite's pointer passing interface](https://www.sqlite.org/bindptr.html). Useful when caching regex patterns in heavy queries that use `sqlite-regex` table functions, like [`regex_split()`](#regex_split) or [`regex_find_all()`](#regex_find_all).

Note that the return value will appear to be `NULL` because of SQLite pointer passing interface. To debug, use [`regex_print()`](#regex_print) to print the pattern string of a regex object.

```sql
select regex('[abc]'); -- NULL, but is still a regex "object"
select regex("[abc"); -- Errors with 'Error parsing pattern as regex: ...'

select regex_print(regex('[abc]')); -- '[abc]'
```

<h3 name="regex_print"><code>regex_print(regex)</code></h3>

Prints the pattern of a regex object created with [`regex()`](#regex).

```sql
select regex_print(regex('[abc]')); -- '[abc]'
--
```

<h3 name="regex_valid"><code>regex_valid(pattern)</code></h3>

Returns 1 if the given pattern is a valid regular expression, 0 otherwise.

```sql
select regex_valid('abc'); -- 1
select regex_valid('[abc]'); -- 1
select regex_valid('[abc'); -- 0
select regex_valid(''); -- 1
--
```

<h3 name="regex_find"><code>regex_find(pattern, text)</code></h3>

Find and return the text of the given pattern in the string, or NULL otherwise. Errors if `pattern` is not legal regex. Based on [`Regex.find()`](https://docs.rs/regex/latest/regex/struct.Regex.html#method.find).

```sql
select regex_find(
  '[0-9]{3}-[0-9]{3}-[0-9]{4}',
  'phone: 111-222-3333'
);
-- '111-222-3333'
```

<h3 name="regex_find_all"><code>select * from regex_find_all(pattern, text)</code></h3>

Find all instances of a pattern in the given text. Based on [`Regex.find_iter()`](https://docs.rs/regex/latest/regex/struct.Regex.html#method.find_iter).

The returned columns:

- `rowid`: The 0-based index of the match.
- `start`: The 0-based index of the starting character of the match inside the text.
- `end`: The 0-based index of the ending character of the match inside the text.
- `match`: The full string match.

For faster results, wrap the pattern with the [`regex()`](#regex) function for caching.

```sql
select rowid, *
from regex_find_all(
  regex('\b\w{13}\b'),
  'Retroactively relinquishing remunerations is reprehensible.'
);
/*
┌───────┬───────┬─────┬───────────────┐
│ rowid │ start │ end │     match     │
├───────┼───────┼─────┼───────────────┤
│ 0     │ 0     │ 13  │ Retroactively │
│ 1     │ 14    │ 27  │ relinquishing │
│ 2     │ 28    │ 41  │ remunerations │
│ 3     │ 45    │ 58  │ reprehensible │
└───────┴───────┴─────┴───────────────┘
```

<h3 name="regex_capture"><code>regex_capture(pattern, text, group)</code></h3>

Returns the text of the capture group with the specific `group` index or name, or NULL otherwise. Errors if `pattern` is not legal regex. Based on [`Regex.captures()`](https://docs.rs/regex/latest/regex/struct.Regex.html#method.captures).

If `group` is a number, then the N-th capture group is returned, where `0` refers to the entire match, `1` refers to the first left-most capture group in the match, `2` the second, and so on. If the provided group number "overflows', then NULL is returned.

```sql
select regex_capture(
  "'(?P<title>[^']+)'\s+\((?P<year>\d{4})\)",
  "Not my favorite movie: 'Citizen Kane' (1941).",
  0
);
-- "'Citizen Kane' (1941)"

select regex_capture(
  "'(?P<title>[^']+)'\s+\((?P<year>\d{4})\)",
  "Not my favorite movie: 'Citizen Kane' (1941).",
  1
);
-- "Citizen Kane"


select regex_capture(
  "'(?P<title>[^']+)'\s+\((?P<year>\d{4})\)",
  "Not my favorite movie: 'Citizen Kane' (1941).",
  2
);
-- "1941"

select regex_capture(
  "'(?P<title>[^']+)'\s+\((?P<year>\d{4})\)",
  "Not my favorite movie: 'Citizen Kane' (1941).",
  3
);
-- NULL
```

If group is a string, then the value of the capture group with the same name is returned. If there is no matching capture group with the name, or the group was not captured, then NULL is returned.

```sql
select regex_capture(
  "'(?P<title>[^']+)'\s+\((?P<year>\d{4})\)",
  "Not my favorite movie: 'Citizen Kane' (1941).",
  'title'
);
-- "Citizen Kane"


select regex_capture(
  "'(?P<title>[^']+)'\s+\((?P<year>\d{4})\)",
  "Not my favorite movie: 'Citizen Kane' (1941).",
  'year'
);
-- "1941"

select regex_capture(
  "'(?P<title>[^']+)'\s+\((?P<year>\d{4})\)",
  "Not my favorite movie: 'Citizen Kane' (1941).",
  'not_exist'
);
-- NULL
```

Note that there is a version of `regex_capture()` that only have two parameters: `captures` and `group`. This can only be used with the [`regex_captures`](#regex_captures) table function, with the special `captures` column like so:

```sql
select
  regex_capture(captures, 'title')      as title,
  regex_capture(captures, 'year')       as year,
  regex_capture(captures, 'not_exist')  as not_exist
from regex_captures(
  regex("'(?P<title>[^']+)'\s+\((?P<year>\d{4})\)"),
  "'Citizen Kane' (1941), 'The Wizard of Oz' (1939), 'M' (1931)."
);
/*
┌──────────────────┬──────┬───────────┐
│      title       │ year │ not_exist │
├──────────────────┼──────┼───────────┤
│ Citizen Kane     │ 1941 │           │
│ The Wizard of Oz │ 1939 │           │
│ M                │ 1931 │           │
└──────────────────┴──────┴───────────┘
*/
```

<h3 name="regex_captures"><code>select * from regex_captures(pattern, text)</code></h3>

Returns all non-overlapping capture groups in the given text. Similar to [`regex_find_all`](#regex_find_all), but allows for extracting capture information. Must use with the [`regex_capture`](#regex_capture) function to extract capture group values. Based on [`Regex.captures_iter()`](https://docs.rs/regex/latest/regex/struct.Regex.html#method.captures_iter).

The returned columns:

- `rowid`: The 0-based index of the match. `0` is the entire match, `1` the first matching capture group, `2` the second, etc.
- `captures`: A special value that's meant to be passed into [`regex_capture()`]#regex_capture. Will appear NULL through direct access.

For faster results, wrap the pattern with the [`regex()`](#regex) function for caching.

```sql
select
  rowid,
  captures,
  regex_capture(captures, 0)        as "0",
  regex_capture(captures, 1)        as "1",
  regex_capture(captures, 2)        as "2",
  regex_capture(captures, 3)        as "3"
from regex_captures(
  regex("'(?P<title>[^']+)'\s+\((?P<year>\d{4})\)"),
  "'Citizen Kane' (1941), 'The Wizard of Oz' (1939), 'M' (1931)."
);
/*
┌───────┬──────────┬───────────────────────────┬──────────────────┬──────┬───┐
│ rowid │ captures │             0             │        1         │  2   │ 3 │
├───────┼──────────┼───────────────────────────┼──────────────────┼──────┼───┤
│ 0     │          │ 'Citizen Kane' (1941)     │ Citizen Kane     │ 1941 │   │
│ 1     │          │ 'The Wizard of Oz' (1939) │ The Wizard of Oz │ 1939 │   │
│ 2     │          │ 'M' (1931)                │ M                │ 1931 │   │
└───────┴──────────┴───────────────────────────┴──────────────────┴──────┴───┘
*/
```

```sql
select
  rowid,
  captures,
  regex_capture(captures, 'title')  as title,
  regex_capture(captures, 'year')  as year,
  regex_capture(captures, 'blah')  as blah
from regex_captures(
  regex("'(?P<title>[^']+)'\s+\((?P<year>\d{4})\)"),
  "'Citizen Kane' (1941), 'The Wizard of Oz' (1939), 'M' (1931)."
);
/*
┌───────┬──────────┬──────────────────┬──────┬──────┐
│ rowid │ captures │      title       │ year │ blah │
├───────┼──────────┼──────────────────┼──────┼──────┤
│ 0     │          │ Citizen Kane     │ 1941 │      │
│ 1     │          │ The Wizard of Oz │ 1939 │      │
│ 2     │          │ M                │ 1931 │      │
└───────┴──────────┴──────────────────┴──────┴──────┘
*/
```

<h3 name="regex_replace"><code>regex_replace(pattern, text, replacement)</code></h3>

Replace the **first** instance of `pattern` inside `text` with the given `replacement` text. Supports the [replacment string syntax](https://docs.rs/regex/latest/regex/struct.Regex.html#replacement-string-syntax). Based on [`Regex.replace()`](https://docs.rs/regex/latest/regex/struct.Regex.html#method.replace)

```sql

select regex_replace(
  '[^01]+',
  '1078910',
  ''
);
-- '1010'

select regex_replace(
  '(?P<last>[^,\s]+),\s+(?P<first>\S+)',
  'Springsteen, Bruce',
  '$first $last'
);
-- 'Bruce Springsteen'
```

<h3 name="regex_replace_all"><code>regex_replace_all(pattern, text, replacement)</code></h3>

Replace **all** instance of `pattern` inside `text` with the given `replacement` text. Supports the [replacment string syntax](https://docs.rs/regex/latest/regex/struct.Regex.html#replacement-string-syntax). Based on [`Regex.replace_all()`](https://docs.rs/regex/latest/regex/struct.Regex.html#method.replace_all)

```sql

select regex_replace_all(
  'dog',
  'cat dog mouse dog',
  'monkey'
)
-- 'cat monkey mouse monkey'
```

<h3 name="regex_split"><code>select * from regex_split(pattern, text)</code></h3>

Split the given text on each instance of the given pattern. Based on [`Regex.split()`](https://docs.rs/regex/latest/regex/struct.Regex.html#method.split).

The returned columns:

- `rowid`: The 0-based index of the split item.
- `item`: The individual split item, as text.

For faster results, wrap the pattern with the [`regex()`](#regex) function for caching.

```sql
select rowid, *
from regex_split(
  regex('[ \\t]+'),
  'a b \t  c\td    e'
);
/*
┌───────┬──────┐
│ rowid │ item │
├───────┼──────┤
│ 0     │ a    │
│ 1     │ b    │
│ 2     │ c    │
│ 3     │ d    │
│ 4     │ e    │
└───────┴──────┘
*/
```

<h3 name="regexset"><code>regexset(pattern1, patern2, ...)</code></h3>

Creates a regexset "object" with the given pattern, using [SQLite's pointer passing interface](https://www.sqlite.org/bindptr.html). Required when using `regexset_is_match` and `regexset_matches`. Based on [`RegexSet`](https://docs.rs/regex/latest/regex/struct.RegexSet.html).

Note that the return value will appear to be `NULL` because of SQLite pointer passing interface. To debug, use [`regexset_print()`](#regexset_print) to print the pattern string of a regex object.

```sql
select regexset(
  "bar",
  "foo",
  "barfoo"
);
-- NULL, but is still a regexset "object"

select regexset("[abc"); --errors

select regexset_print(regexset('abc', 'xyz')); -- '["abc","xyz"]'
```

<h3 name="regexset_print"><code>regexset_print()</code></h3>

Prints the patterns of a regexset object created with [`regexset()`](#regexset).

```sql
select regexset_print(regexset('abc', 'xyz')); -- '["abc","xyz"]'
```

<h3 name="regexset_is_match"><code>regexset_is_match(regexset, text)</code></h3>

Returns 1 if any of the patterns in `regexset` matches `text`. Based on [`RegexSet.is_match()`](https://docs.rs/regex/latest/regex/struct.RegexSet.html#method.is_match).

```sql
select regexset_is_match(
  regexset(
    "bar",
    "foo",
    "barfoo"
  ),
  'foobar'
); -- 1

select regexset_is_match(
  regexset(
    "bar",
    "foo",
    "barfoo"
  ),
  'xxx'
); -- 0

```

<h3 name="regexset_matches"><code>select * from regexset_matches(regexset, text)</code></h3>

Returns all the matching patterns inside `regexset` found inside `text`. Note that this doesn't return rows for each of the matches themselves, only if there was at least 1 match for each patten. Based on [`RegexSet.matches()`](https://docs.rs/regex/latest/regex/struct.RegexSet.html#method.matches).

```sql
select
  key,
  pattern
from regexset_matches(
  regexset(
    '\w+',
    '\d+',
    '\pL+',
    'foo',
    'bar',
    'barfoo',
    'foobar'
  ),
  'foobar'
);
/*
┌─────┬─────────┐
│ key │ pattern │
├─────┼─────────┤
│ 0   │ \w+     │
│ 2   │ \pL+    │
│ 3   │ foo     │
│ 4   │ bar     │
│ 6   │ foobar  │
└─────┴─────────┘
*/
```

<h3 name="regex_version"><code>regex_version()</code></h3>

Returns the semver version string of the current version of sqlite-regex.

```sql
select regex_version();
-- "v0.1.0"
```

<h3 name="regex_debug"><code>regex_debug()</code></h3>

Returns a debug string of various info about sqlite-regex, including
the version string, build date, and commit hash.

```sql
select regex_debug();
/*
Version: v0.0.0-alpha.4
Source: 85fd18bea80c42782f35975351ea3760d4396eb6
*/
```