# Edgesearch
Build a full text search API using Cloudflare Workers and WebAssembly.
## Features
- Uses an [inverted index](https://en.wikipedia.org/wiki/Inverted_index) and [compressed bit sets](https://roaringbitmap.org/).
- No servers or databases to create, manage, or scale.
- Packs large amounts of data into relatively few [KV entries](https://www.cloudflare.com/products/workers-kv/).
- Runs fast [WASM](https://webassembly.org/) code at Cloudflare edge PoPs for low-latency requests.
## Demos
Check out the [demo](./demo) folder for live demos with source code.
## How it works
Edgesearch builds an inverted index that maps each term to a compressed bit set (a Roaring Bitmap) of the IDs of the documents containing that term. It then generates a custom worker script and the data to upload to Cloudflare Workers.
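As a rough illustration of the indexing step, the sketch below builds a term-to-documents mapping in TypeScript. It is not Edgesearch's actual builder: `buildInvertedIndex` and `sortedEntries` are hypothetical names, and sorted arrays of document IDs stand in for the compressed bit sets.

```typescript
// Hypothetical sketch: map each term to the IDs of the documents that contain it.
// A sorted number[] stands in for the compressed bit set used by Edgesearch.
function buildInvertedIndex(documentTerms: string[][]): Map<string, number[]> {
  const index = new Map<string, number[]>();
  documentTerms.forEach((terms, docId) => {
    // Deduplicate terms so each document ID is recorded once per term.
    new Set(terms).forEach((term) => {
      const ids = index.get(term);
      if (ids) {
        ids.push(docId); // docIds arrive in increasing order, so the array stays sorted
      } else {
        index.set(term, [docId]);
      }
    });
  });
  return index;
}

// Return [term, documentIds] pairs sorted by term (UTF-16 code unit order).
function sortedEntries(index: Map<string, number[]>): [string, number[]][] {
  return Array.from(index.entries()).sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
}

// Document 0 contains "hello" and "world", document 1 contains "hello" and "there", and so on.
const exampleIndex = buildInvertedIndex([
  ["hello", "world"],
  ["hello", "there"],
  ["world", "world"],
]);
console.log(sortedEntries(exampleIndex));
```

The document ID here is simply the position of the document in the input array, which is an assumption made for the example.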
### Data
An array of term-documents pairs sorted by term is built, where **term** is a string and **documents** is a compressed bit set.
This array is then split into chunks of up to 10 MiB, as each Cloudflare Workers KV entry can hold a value up to 10 MiB in size.
To find the documents bit set associated with a term, a binary search is done to find the appropriate chunk, and then the pair within the chunk.
The same structure and process are used to store and retrieve document contents.
Packing multiple bit sets/documents reduces read/write costs and deploy times, and improves caching.
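The chunking and two-level binary search described above can be sketched as follows. This is a simplified in-memory model, not the worker's real storage format: the `Entry` and `Chunk` types, the `packChunks` and `find` functions, and the size estimate passed to `packChunks` are all assumptions made for illustration.

```typescript
// Hypothetical layout: chunks are ordered, and each chunk holds entries sorted by term,
// so the first term of each chunk is enough to locate the chunk for any term.
type Entry<V> = { term: string; value: V };
type Chunk<V> = { firstTerm: string; entries: Entry<V>[] };

// Split term-sorted entries into chunks whose estimated size stays within maxBytes.
// A single entry larger than maxBytes still gets a chunk of its own.
function packChunks<V>(
  sorted: Entry<V>[],
  sizeOf: (entry: Entry<V>) => number,
  maxBytes: number,
): Chunk<V>[] {
  const chunks: Chunk<V>[] = [];
  let current: Entry<V>[] = [];
  let currentBytes = 0;
  for (const entry of sorted) {
    const bytes = sizeOf(entry);
    if (current.length > 0 && currentBytes + bytes > maxBytes) {
      chunks.push({ firstTerm: current[0].term, entries: current });
      current = [];
      currentBytes = 0;
    }
    current.push(entry);
    currentBytes += bytes;
  }
  if (current.length > 0) {
    chunks.push({ firstTerm: current[0].term, entries: current });
  }
  return chunks;
}

// Binary search: index of the last item whose key is <= target, or -1 if there is none.
function lastAtOrBefore<T>(items: T[], keyOf: (item: T) => string, target: string): number {
  let lo = 0;
  let hi = items.length; // the search window is [lo, hi)
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (keyOf(items[mid]) <= target) lo = mid + 1;
    else hi = mid;
  }
  return lo - 1;
}

// Find the chunk that could hold the term, then the entry within that chunk.
function find<V>(chunks: Chunk<V>[], term: string): V | undefined {
  const c = lastAtOrBefore(chunks, (chunk) => chunk.firstTerm, term);
  if (c < 0) return undefined;
  const entries = chunks[c].entries;
  const e = lastAtOrBefore(entries, (entry) => entry.term, term);
  return e >= 0 && entries[e].term === term ? entries[e].value : undefined;
}
```

In a deployed worker only the chunk that may contain the term would need to be read from KV, which is where the saving in reads comes from; the sketch keeps every chunk in memory for simplicity.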
### Searching
Search terms have an associated mode. There are three modes that match documents in different ways:
|Mode|Document match condition|
|---|---|
|Require|Has all terms with this mode.|
|Contain|Has at least one term with this mode.|
|Exclude|Has none of the terms with this mode.|
For example, a document with terms `a`, `b`, `c`, `d`, and `e` would match the query `require (d, a) contain (g, b, f) exclude (h, i)`.
The results are generated by doing bitwise operations across multiple bit sets.
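To make the bitwise step concrete, here is a minimal TypeScript sketch that uses a 32-bit integer as a stand-in for a compressed bit set (bit `i` set means document `i` contains the term). The `search` function, the `allDocs` parameter, and the choice to treat an empty Require or Contain group as "no restriction" are assumptions for illustration, not necessarily how the Edgesearch worker handles those cases.

```typescript
// Stand-in for a compressed bit set, limited to document IDs 0..31.
type BitSet = number;

const bitSetOf = (...docIds: number[]): BitSet =>
  docIds.reduce((bits, id) => (bits | (1 << id)) >>> 0, 0);

const docIdsOf = (bits: BitSet): number[] => {
  const ids: number[] = [];
  for (let id = 0; id < 32; id++) {
    if ((bits >>> id) & 1) ids.push(id);
  }
  return ids;
};

type Query = { require: string[]; contain: string[]; exclude: string[] };

function search(index: Map<string, BitSet>, allDocs: BitSet, query: Query): BitSet {
  // A term that is not in the index matches no documents.
  const lookup = (term: string): BitSet => index.get(term) ?? 0;
  // Require: intersect every bit set (assumption: no terms means no restriction).
  const required = query.require.reduce((acc, term) => acc & lookup(term), allDocs);
  // Contain: union of the bit sets (assumption: no terms means no restriction).
  const contained =
    query.contain.length === 0
      ? allDocs
      : query.contain.reduce((acc, term) => acc | lookup(term), 0);
  // Exclude: union of the bit sets, removed from the result.
  const excluded = query.exclude.reduce((acc, term) => acc | lookup(term), 0);
  return (required & contained & ~excluded) >>> 0;
}

// Document 0 has terms a, b, c, d, e; document 1 has a, b, d, h; document 2 has a, g.
const index = new Map<string, BitSet>([
  ["a", bitSetOf(0, 1, 2)],
  ["b", bitSetOf(0, 1)],
  ["c", bitSetOf(0)],
  ["d", bitSetOf(0, 1)],
  ["e", bitSetOf(0)],
  ["g", bitSetOf(2)],
  ["h", bitSetOf(1)],
]);

const matches = search(index, bitSetOf(0, 1, 2), {
  require: ["d", "a"],
  contain: ["g", "b", "f"],
  exclude: ["h", "i"],
});
console.log(docIdsOf(matches)); // [0]
```

Document 1 satisfies the Require and Contain groups but is dropped because it contains the excluded term `h`, and document 2 is dropped because it lacks the required term `d`.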
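In general, a document matches if it is in every Require bit set, in at least one Contain bit set, and in none of the Exclude bit sets: the Require sets are intersected, the Contain sets are unioned, and the union of the Exclude sets is subtracted from the result.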
### Deploying the worker
The following command uploads the worker script and associated WASM to Cloudflare Workers, and writes every key to Cloudflare Workers KV:
```bash
edgesearch deploy \
  --default-results default-results.json \
  --account-id CF_ACCOUNT_ID \
  --account-email me@email.com \
  --global-api-key CF_GLOBAL_API_KEY \
  --name my-edgesearch \
  --output-dir dist/worker/ \
  --namespace CF_KV_NAMESPACE_ID \
  --upload-data
```
### Calling the API
A JavaScript [client](./client/) for the browser and Node.js is available for using a deployed Edgesearch worker:
```typescript
import * as Edgesearch from 'edgesearch-client';
type Document = {
  title: string;
  artist: string;
  year: number;
};
const client = new Edgesearch.Client<Document>('https://my-edgesearch.me.workers.dev');
const query = new Edgesearch.Query();
query.add(Edgesearch.Mode.REQUIRE, 'world');
query.add(Edgesearch.Mode.CONTAIN, 'hello', 'welcome', 'greetings');
query.add(Edgesearch.Mode.EXCLUDE, 'bye', 'goodbye');
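// Fetch the first page of results.
let response = await client.search(query);
// Pass the continuation from the previous response to fetch the next page.
query.setContinuation(response.continuation);
response = await client.search(query);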
```
## Performance
Searches that retrieve entries not cached at edge locations will be slow. To reduce cache misses, ensure that there is consistent traffic.