es-public-proxy 0.2.6

simple read-only HTTP reverse-proxy for exposing an Elasticsearch node to the public internet

  • type-safe de-serialization and re-serialization of all user data
  • single-binary, easy to install
  • simple configuration with sane defaults
  • low-overhead in network latency and compute resources
  • optional CORS headers for direct browser requests
  • SSL, transport compression, load-balancing, observability, and rate-limiting are left to other tools like nginx, Caddy, or HAProxy
  • free software forever: AGPLv3+ license

The Elasticsearch REST API is powerful, well documented, and has client library implementations for many programming languages. For datasets and services which contain only public information, it would be convenient to give anybody direct access to at least a subset of the API. The Elasticsearch maintainers warn against this practice, on the basis that the API is not designed for public use.

Recent versions of Elasticsearch have an authentication/authorization subsystem, and there are third-party plugins for read-only access (such as ReadonlyREST), but these solutions require careful configuration and knowledge of which endpoints are "safe" for users. Because Elasticsearch accepts request bodies on GET requests, another proposed solution is to use a reverse proxy like nginx to allow only GET requests. However, some safe endpoints (such as deleting scroll objects) require other HTTP verbs, and most browsers do not support GET bodies, so this is only a partial hack.

es-public-proxy is intended to be a simple and reliable alternative for the use case of exposing popular search queries on specific indices to the public web. HTTP requests are parsed and filtered in a safe, compiled language (Rust), then only safe queries are re-serialized and forwarded to the backend search instance listening on a different port.
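The filtering idea described above can be illustrated with a minimal sketch. This is not the proxy's actual code: the `Decision` type, the `filter_request` function, and the particular set of permitted endpoints are hypothetical, and the real proxy also parses and re-serializes request bodies, which is omitted here. It only shows the allowlist principle, where anything not explicitly recognized is rejected.

```rust
use std::collections::HashSet;

/// Outcome of inspecting one incoming request (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum Decision {
    Forward,
    Reject(&'static str),
}

/// Hypothetical allowlist filter: only a few read-only endpoints on
/// explicitly configured indices would be forwarded to the backend.
fn filter_request(method: &str, path: &str, allowed_indices: &HashSet<&str>) -> Decision {
    // Drop any query string, then split the path into non-empty segments.
    let path_only = path.split('?').next().unwrap_or("");
    let segments: Vec<&str> = path_only.split('/').filter(|s| !s.is_empty()).collect();

    // This sketch only understands paths of the form /<index>/<endpoint>.
    let (index, endpoint) = match segments.as_slice() {
        [index, endpoint] => (*index, *endpoint),
        _ => return Decision::Reject("unsupported path"),
    };

    if !allowed_indices.contains(&index) {
        return Decision::Reject("index not allowed");
    }

    // Assumed set of safe endpoints; everything else is refused by default.
    match (method, endpoint) {
        ("GET", "_search") | ("POST", "_search") => Decision::Forward,
        ("GET", "_count") | ("POST", "_count") => Decision::Forward,
        _ => Decision::Reject("endpoint not allowed"),
    }
}
```

The design choice worth noting is the default-deny final match arm: a new or unknown Elasticsearch endpoint is rejected until it is deliberately added, rather than allowed until someone notices it.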

Note that clients can still submit "expensive" queries of various kinds which will slow down the host. Some of these can be disabled in the Elasticsearch configuration (this disables those queries for all connections, not just those made via the proxy). Some query types are simply not supported by this proxy. In the future the proxy could gain configuration parameters and smarter parsing of some query types (like query_string) to prevent even more expensive queries.

Installation

On Debian/Ubuntu Linux systems, the easiest way to get started is to download and install an unsigned .deb from https://archive.org/download/es-public-proxy-deb. This will include a manpage, configuration file, and systemd unit file. After installing, edit the configuration file (/etc/es-public-proxy.toml) and start the service like:

sudo systemctl start es-public-proxy
sudo systemctl enable es-public-proxy

On other platforms you can install and run on a per-user basis using the Rust toolchain with:

cargo install es-public-proxy
es-public-proxy --example-config > example.toml

# edit the configuration file

es-public-proxy --config example.toml

There is also a Dockerfile, but it isn't actively used and no image has been pushed to any image repository. For example, it is not yet clear how best to inject configuration into a Docker image. You can build the image with:

docker build -f extra/Dockerfile .

Configuration

In all cases you will want to explicitly enumerate every index that should have public access. There is an unsafe_all_indices setting intended for prototyping, but it may allow access to additional non-index API endpoints.

One simple deployment pattern is to put nginx, es-public-proxy, and Elasticsearch all on the same server. In this configuration, nginx would listen on all network interfaces on ports 80 and 443, redirect plain HTTP requests on port 80 to HTTPS on port 443, add transport compression, restrict client body payload sizes, etc. es-public-proxy would listen on localhost port 9292, and connect back to Elasticsearch on localhost port 9200.

Limitations

Not all of the Elasticsearch API has been implemented yet. In general, this service is likely to be stricter than Elasticsearch itself in parsing and in corner cases. For example:

  • URL query parameters like ?human must be expanded into a boolean like ?human=true
  • In some cases where Elasticsearch allows a full object to be abbreviated as a string, this proxy requires the full object format
  • Index patterns in configuration are not supported
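For clients that build URLs by hand, the first limitation above can be handled with a small helper that rewrites bare flags before the request is sent. The sketch below is a hypothetical client-side function (`expand_bare_flags` is not part of es-public-proxy), uses only the standard library, and does not attempt percent-decoding.

```rust
/// Rewrites a URL query string so that bare flags such as `human` become
/// explicit booleans (`human=true`). Parameters that already carry a value
/// are left untouched. Hypothetical helper; no percent-decoding is attempted.
fn expand_bare_flags(query: &str) -> String {
    query
        .split('&')
        // Skip empty pieces produced by an empty query or a stray `&`.
        .filter(|part| !part.is_empty())
        .map(|part| {
            if part.contains('=') {
                part.to_string()
            } else {
                format!("{}=true", part)
            }
        })
        .collect::<Vec<String>>()
        .join("&")
}
```

For example, passing `human&size=10` yields `human=true&size=10`, and an empty query string is returned unchanged.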

Development

To build this package you need the Rust toolchain installed. We target stable Rust, 2018 edition, version 1.45+.

Re-compiling the manpage requires scdoc.

Building a Debian package (.deb) requires the cargo-deb plugin, which you can install with: cargo install cargo-deb

A Makefile is included to wrap common development commands, for example:

make test
make lint
make deb

Contributions are welcome! We would prefer to keep the number of dependent crates low (for example, we don't currently use a CLI argument parsing library), but are open to discussion. When sending patches or merge requests, it is helpful (but not required) if you can include test coverage, re-run cargo fmt, and acknowledge the license terms ahead of time.