# Cido
Cido is a framework for indexing events from blockchains and other services with great scalability and an easy-to-use GraphQL API. Cido is a hosted service that takes care of deploying and managing indexers and GraphQL APIs for you.
## Table of contents

- [Getting started](#getting-started)
  - [Setting up a project](#setting-up-a-project)
  - [Working with generated code](#working-with-generated-code)
  - [Implementing Cidomap](#implementing-cidomap)
- [Handling events](#handling-events)
  - [Defining an event handler](#defining-an-event-handler)
  - [Event handler functions](#event-handler-functions)
    - [Handler function](#handler-function)
    - [Generator function](#generator-function)
    - [Preprocessor function](#preprocessor-function)
- [Entities and events](#entities-and-events)
  - [Available types](#available-types)
## Getting started

### Setting up a project
This crate contains all the core interfaces and implementations needed to index. It also needs an implementation of the `Network` trait. The only current implementation is for Ethereum (which includes all geth API compatible networks) in the cido-ethereum crate. We plan on adding support for more networks soon. The minimum dependencies to get started with indexing are this crate, a network crate, and the async-graphql crate (which we use for serving the GraphQL API). async-graphql must be a direct dependency because our generated code adds async-graphql derives that are expanded inside your crate.
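The dependency section of your `Cargo.toml` might look like the following sketch (the version numbers are placeholders, not real releases; check the registry for current versions):

```toml
[dependencies]
cido = "0.1"
# Network crate: Ethereum and geth-compatible networks
cido-ethereum = "0.1"
# Required directly: the generated code emits async-graphql derives
async-graphql = "7"
```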
### Working with generated code
Due to the way things have been implemented, we generate a lot of code. Everything is done through the `cidomap`, `event_handler`, `entity`, and `event` attribute macros. If you are ever running into issues and it's not clear what code is being generated or why something isn't working, each of the macros accepts a top-level flag called `embed_generated_code`. This causes the code to be written to disk and included, so that the compiler gives better error messages instead of pointing at the annotation.
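For example, the flag might be passed like this (a sketch; the exact attribute syntax shown here is an assumption):

```rust
// Hypothetical usage: the generated code is written to disk and
// `include!`d, so errors point at real source lines.
#[cidomap(embed_generated_code)]
struct Uniswap {
    // ...
}
```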
While you can get roughly the same output with cargo-expand, it requires rebuilding proc-macro2, which in turn requires rebuilding most of the project, and it expands everything including `format!` and `tracing::info!` macros (which can be enormous and hard to ignore).
### Implementing Cidomap
Once the project is set up, you need an implementation of the `Cidomap` trait. This can be done by declaring a struct and annotating it with the `cidomap` attribute. There are several required fields, as documented below. Unlike other indexing frameworks you may have worked with, we don't use any YAML files. Everything is in Rust code, which can really cut down on boilerplate: when the contracts are identical across different chains, you only change out the few things that differ, like the starting block, contract addresses, etc.
The following is a reconstruction of the example; the struct name, block number, and exact signatures are illustrative sketches rather than the definitive API:

```rust
// cido_ethereum::prelude contains everything in
// cido::prelude plus the ethereum specific items
use cido_ethereum::prelude::*;

#[cidomap]
struct Uniswap {
    // Block to start indexing from. This is tied to the network.
    const START_BLOCK: EthereumBlockNumber = EthereumBlockNumber::new(10_000_835);

    // Filters of starting contracts. These are tied to the network. The function
    // must return a `Result<Vec<Network::TriggerFilter>, Network::Error>`
    fn filters(context: &Context<Self>) -> Result<Vec<EthereumTriggerFilter>, EthereumError> {
        todo!()
    }

    // init is any async function that takes a `Context`
    // and returns a `Result<(), Cidomap::Error>`
    async fn init(context: Context<Self>) -> Result<(), Error> {
        Ok(())
    }

    // create is any idempotent async function that takes a `Context`
    // and returns a `Result<(), Cidomap::Error>`
    async fn create(context: Context<Self>) -> Result<(), Error> {
        Ok(())
    }
}
```
## Handling events

### Processing order
Cido handles events in batches of blocks until it has caught up to the latest block. There are multiple steps in processing that can happen in parallel. The only two steps you need to be concerned with are what we call the preprocessing and sync steps. The sync step is where the main processing logic is handled; it processes one event at a time, serially, just like the blockchain does. This prevents any inconsistencies between runs, but it can also become a bottleneck if you're waiting on database reads or on I/O over the network. The preprocessing step allows any I/O to be completed before the sync; those results are "cached" and then made available during the generator step and the sync step.
The difference between the generator and sync steps is that when new events are being searched for, we need to run through all the steps multiple times. The generator function is called every time except the last; the sync step runs the last time through, once all the events have been gathered. All events are processed in blockchain order in both the generator and sync steps. The generator and sync functions do not need to be `Send` because they are always run on the same thread. This may change in the future when there are partitioned blockchains and we can process events in parallel, like the blockchain does.
### Defining an event handler
To handle events you need to create an enum that contains all of the events you're interested in processing. The `event_handler` macro does all the necessary implementations for you. It looks something like this (the enum and event names here are illustrative):

```rust
// The `cidomap` field allows us to reuse the `Network`
// and other definitions from the `Cidomap` struct.
#[event_handler(cidomap = Uniswap)]
enum UniswapEvents {
    PairCreated(PairCreated),
}
```
### Event handler functions
Each of the handler functions takes roughly the same types, with minor differences. They've been designed after web frameworks like Axum, so you can change the order or kind of values your function accepts and avoid having ignored parameters. If the signature doesn't match what is expected, the `event_handler` annotation will report errors about incorrect arguments: borrowing in the handler, or expecting something owned in the other functions. You can also get an error like

```
the trait `Handler<_, _, Cidomap>` is not implemented for fn item
```

if the types don't implement the necessary traits. In that case, make sure you are wrapping the event and cache types with the `Event` and `Cache` wrappers.
#### Handler function
The handler function is expecting the path to a function that does not need to be `Send`, with a signature that contains at least one of the parameters, in any order, like the following sketch (the event, cache, and error type names are illustrative):

```rust
async fn handle_pair_created(
    event: Event<PairCreated>,
    cache: Cache<TokenCache>,
    context: Context<Uniswap>,
) -> Result<(), Error> {
    // Main processing logic goes here.
    Ok(())
}
```
The event and cache values come wrapped in their own types so that the compiler can be convinced that the impls allowing you to swap the ordering of the parameters don't conflict. This function is also somewhat different from the other two because it takes ownership of the values: once it has run, there is no need to keep any of the event or block information around.
#### Generator function
The generator function is expecting the path to a function that does not need to be `Send`, with a signature that contains at least one of the parameters, in any order, like the following sketch (type names besides the wrappers are illustrative):

```rust
async fn generate_pair_created(
    event: &Event<PairCreated>,
    cache: &Cache<TokenCache>,
    context: &GeneratorContext<Uniswap>,
) -> Result<(), Error> {
    // Spawn additional event filters here if needed.
    Ok(())
}
```
This function borrows each of the types because they will be used in the handler function later. The `GeneratorContext` only has access to the network and to spawning more event filters. This is the only place that can happen, to prevent subtle bugs from spawning event filters too late in the process.
#### Preprocessor function
The preprocessor function is expecting the path to a function that must be `Send`, with a signature that contains at least one of the parameters, in any order, like the following sketch (the event, cache, and error type names are illustrative):

```rust
async fn preprocess_pair_created(
    event: Event<PairCreated>,
    context: PreprocessingContext<Uniswap>,
) -> Result<TokenCache, Error> {
    // Perform I/O here; the returned value is cached for later steps.
    todo!()
}
```
The return value needs to match the cache type in the annotation. Because this function produces the cache, the cache is not available as a parameter. The `PreprocessingContext` has access to the network and to a synchronization primitive that can be used to ensure consistent results on every run.
## Entities and events
There are currently two classifications of structs that can be stored in the database. Entities are structs that can change over time, so we keep track of the blocks where they change and create new rows, which allows historical point-in-time queries. Events are stored once and never change: once an event has been created, any attempt to update it will fail after the block it was created on has finished processing. Annotating a struct with either `entity` or `event` will implement the required `Transformer` traits. Both annotations mostly use the same underlying code generation, but they are different enough that we believe they warranted top-level annotations instead of just an extra option like `immutable = true`.
Here is an example of creating the `Pair` struct mentioned in the `Uniswap` example above:
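The original code example is sketched below; the field types, the `Token` entity, and the exact annotation syntax for referenced fields are illustrative assumptions:

```rust
#[entity(cidomap = Uniswap)]
struct Pair {
    // An id field is required, either named `id` or annotated with `#[id]`.
    id: cido::Bytes,
    // Referenced fields are annotated with the entity type they point at;
    // the stored value is the id of the related entity.
    #[entity(Token)]
    token0: cido::Bytes,
    #[entity(Token)]
    token1: cido::Bytes,
    // Only annotate fields you will filter on in queries.
    #[indexed]
    created_block: i64,
}
```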
Only the `cidomap` field is required in the annotation. There are several more options to customize functionality and naming in the GraphQL API. An id field is required, either by naming it `id` or by annotating it with `#[id]`.

Any referenced fields (in this case `token0` and `token1`) need to be annotated with the type they reference. This allows filtering on the referenced type in the GraphQL API and makes code generation use the correct type (the id of the related type). Cido only indexes the fields you indicate, so that we can better manage the cost of inserts/updates and of keeping data long term. Any field annotated with `#[entity]` or `#[event]` implies the `#[indexed]` annotation. Only index fields that you will be using as filters in queries.

Any fields annotated with `#[derived_from]` are not actually available in the struct; they are resolved in the GraphQL API, and the annotation tells us how to tie the queries together.
### Available types
For a type to be used for indexing, it must implement the necessary async-graphql, sqlx, and stable-hash traits. The following is an incomplete table of supported types:
| Type |
|---|
| bool |
| i16 |
| i32 |
| i64 |
| String |
| cido::H<N> |
| cido::U<N> |
| cido::BigDecimal |
| cido::BigInt |
| cido::Bytes |
| chrono::DateTime |
| uuid::Uuid |