§Motiva
From the Greek μοτίβα, meaning patterns, or the recognition of similar features between objects.
This is a scoped-down reimplementation of Yente and nomenklatura, used to match entities against sanctions lists.
Most of the algorithms are taken directly from those repositories and simply reimplemented; credit should go to the Open Sanctions team.
Note that this software requires Yente to run beside it, including Elasticsearch and a valid, licensed collection of datasets obtained from Open Sanctions.
Work in progress
§Scope and goals
Not all of Yente is going to be implemented here. Notably, none of the index update features will make their way into this repository. We will focus on the request part (search and matching).
Even though we will strive to produce matching scores in the vicinity of Yente’s, exact scores are not a goal. In particular, the Rust implementations of some algorithms will produce slightly different results, resulting in different overall scores.
All implemented algorithms will feature an integration test comparing Motiva’s score with Yente’s and checking that they are within a reasonable epsilon of each other.
If at all possible, this project will try to use only Rust-native dependencies, and steer clear of integrating with C libraries through FFI.
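The epsilon check described above can be sketched as follows. This is a minimal illustration, not the project's actual test harness: the helper names (`within_epsilon`, `mismatches`) and any tolerance value passed to them are assumptions made for the example.

```rust
/// Returns true when two scores differ by at most `epsilon`.
/// (Hypothetical helper, not part of Motiva's API.)
fn within_epsilon(ours: f64, reference: f64, epsilon: f64) -> bool {
    (ours - reference).abs() <= epsilon
}

/// Given `(case name, our score, reference score)` triples, returns the
/// names of the cases whose scores are further apart than `epsilon`.
fn mismatches<'a>(cases: &[(&'a str, f64, f64)], epsilon: f64) -> Vec<&'a str> {
    cases
        .iter()
        .filter(|(_, ours, reference)| !within_epsilon(*ours, *reference, epsilon))
        .map(|(name, _, _)| *name)
        .collect()
}
```

An integration test could then assert that `mismatches(...)` is empty and print the offending case names when it is not.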
Some liberty was taken to adapt some logic and algorithms from Yente, so do not expect fully-compliant API or behavior.
§Implementation matrix
- POST /match/{dataset}
- GET /entities/{id}
- GET /catalog (proxy)
- name-based
- name-qualified
- logic-v1 [1]
[1]: Features that are disabled by default were omitted for now.
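As an illustration of how the three algorithm identifiers listed above might be turned into a typed value, here is a small sketch. The `AlgorithmName` enum is hypothetical and is not the crate's own `Algorithm` type; only the identifier strings come from the list above.

```rust
use std::str::FromStr;

/// Hypothetical enum mirroring the algorithm identifiers listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AlgorithmName {
    NameBased,
    NameQualified,
    LogicV1,
}

impl FromStr for AlgorithmName {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "name-based" => Ok(Self::NameBased),
            "name-qualified" => Ok(Self::NameQualified),
            "logic-v1" => Ok(Self::LogicV1),
            // Anything else is reported as unsupported rather than guessed.
            other => Err(format!("unsupported algorithm: {other}")),
        }
    }
}
```

With this in place, `"logic-v1".parse::<AlgorithmName>()` yields `Ok(AlgorithmName::LogicV1)`, while an identifier outside the list yields an error.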
§Configuration
Motiva is configured via environment variables. The following variables are supported:
Variable | Description | Default / Example |
---|---|---|
ENV | Environment (dev or production) | dev |
LISTEN_ADDR | Address to bind the API server | 0.0.0.0:8000 |
INDEX_URL | Elasticsearch URL | http://localhost:9200 |
INDEX_AUTH_METHOD | Elasticsearch authentication (none, basic, bearer, api_key, encoded_api_key) | none |
INDEX_CLIENT_ID | Elasticsearch client ID (required for basic or api_key) | (none) |
INDEX_CLIENT_SECRET | Elasticsearch client secret (required for basic, api_key or encoded_api_key) | (none) |
YENTE_URL | Optional URL to a Yente instance for score comparison | (none) |
CATALOG_URL | Optional URL to a catalog service | (none) |
MATCH_CANDIDATES | Number of candidates to consider for matching | 10 |
ENABLE_PROMETHEUS | Enable Prometheus metrics collection and /metrics endpoint | 0 |
ENABLE_TRACING | Set to 1 to enable tracing | (none) |
TRACING_EXPORTER | Tracing exporter kind (otlp, or gcp if compiled with the gcp feature) | otlp |
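To make the defaults in the table concrete, the sketch below reads a few of these variables and falls back to the documented defaults. The `Settings` struct and both functions are hypothetical names, not Motiva's real configuration code, and treating `ENABLE_PROMETHEUS=1` as "enabled" is an assumption based on the `0` default.

```rust
use std::env;

/// Hypothetical subset of the settings described in the table above.
#[derive(Debug, PartialEq)]
struct Settings {
    listen_addr: String,
    index_url: String,
    match_candidates: usize,
    enable_prometheus: bool,
}

/// Builds settings from any lookup function, so tests do not need to
/// touch the process environment.
fn settings_from(lookup: impl Fn(&str) -> Option<String>) -> Result<Settings, String> {
    let match_candidates = match lookup("MATCH_CANDIDATES") {
        Some(raw) => raw
            .parse::<usize>()
            .map_err(|e| format!("invalid MATCH_CANDIDATES {raw:?}: {e}"))?,
        None => 10, // documented default
    };

    Ok(Settings {
        listen_addr: lookup("LISTEN_ADDR").unwrap_or_else(|| "0.0.0.0:8000".to_string()),
        index_url: lookup("INDEX_URL").unwrap_or_else(|| "http://localhost:9200".to_string()),
        match_candidates,
        // Assumption: only the literal value "1" enables the feature.
        enable_prometheus: lookup("ENABLE_PROMETHEUS").as_deref() == Some("1"),
    })
}

/// Reads the same settings from the real process environment.
fn settings_from_env() -> Result<Settings, String> {
    settings_from(|key| env::var(key).ok())
}
```

Taking the lookup as a closure keeps the parsing logic testable without mutating global state; `settings_from_env` is the thin wrapper a binary would call at startup.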
YENTE_URL is required if your client needs to retrieve the actual catalog through motiva. The /catalog request will be proxied to Yente.
You might want to use CATALOG_URL if you customized Yente’s catalog in any way, so motiva can pull it regularly instead of Open Sanctions’ default catalog.
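The credential requirements stated in the INDEX_AUTH_METHOD, INDEX_CLIENT_ID and INDEX_CLIENT_SECRET rows of the table can be expressed as a small validation step. This is a sketch under assumptions: the `AuthMethod` enum and `check_credentials` function are hypothetical (the crate exposes its own `EsAuthMethod`), and because the table lists no requirement for `bearer`, the sketch requires nothing for it.

```rust
/// Hypothetical mirror of the authentication methods listed in the table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AuthMethod {
    NoAuth,
    Basic,
    Bearer,
    ApiKey,
    EncodedApiKey,
}

impl AuthMethod {
    /// Parses the value of INDEX_AUTH_METHOD.
    fn parse(raw: &str) -> Option<Self> {
        match raw {
            "none" => Some(Self::NoAuth),
            "basic" => Some(Self::Basic),
            "bearer" => Some(Self::Bearer),
            "api_key" => Some(Self::ApiKey),
            "encoded_api_key" => Some(Self::EncodedApiKey),
            _ => None,
        }
    }

    /// INDEX_CLIENT_ID is documented as required for basic and api_key.
    fn needs_client_id(self) -> bool {
        matches!(self, Self::Basic | Self::ApiKey)
    }

    /// INDEX_CLIENT_SECRET is documented as required for basic, api_key
    /// and encoded_api_key.
    fn needs_client_secret(self) -> bool {
        matches!(self, Self::Basic | Self::ApiKey | Self::EncodedApiKey)
    }
}

/// Fails early when a documented requirement is not met.
fn check_credentials(
    method: AuthMethod,
    client_id: Option<&str>,
    client_secret: Option<&str>,
) -> Result<(), String> {
    if method.needs_client_id() && client_id.is_none() {
        return Err(format!("INDEX_CLIENT_ID is required for {method:?}"));
    }
    if method.needs_client_secret() && client_secret.is_none() {
        return Err(format!("INDEX_CLIENT_SECRET is required for {method:?}"));
    }
    Ok(())
}
```

Validating at startup like this surfaces a misconfiguration before the first request reaches Elasticsearch.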
§Run
Right now, there is no configuration possible for this project, and it will remain that way until it is in good enough shape to be used widely.
$ cargo run --release
$ echo '{"queries":{"test":{"schema":"Person","properties":{"name":["Vladimir Putin"]}}}}' | curl -XPOST 127.0.0.1:8080/match/sanctions -H content-type:application/json -d @-
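For clients written in Rust, the JSON body sent by the curl command above can be assembled with the standard library alone. The sketch below is only an illustration of the payload shape shown in that command; `json_escape` and `match_body` are hypothetical helpers, and a real client would more likely use a JSON library.

```rust
/// Escapes the characters that must be escaped inside a JSON string.
fn json_escape(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Builds a single-query body with the same shape as the curl example:
/// one named query, a schema, and a list of names.
fn match_body(query_id: &str, schema: &str, names: &[&str]) -> String {
    let names: Vec<String> = names
        .iter()
        .map(|n| format!("\"{}\"", json_escape(n)))
        .collect();
    format!(
        r#"{{"queries":{{"{}":{{"schema":"{}","properties":{{"name":[{}]}}}}}}}}"#,
        json_escape(query_id),
        json_escape(schema),
        names.join(",")
    )
}
```

Calling `match_body("test", "Person", &["Vladimir Putin"])` reproduces the body piped to curl in the command above.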
§Development
§Test suite
To run the tests, a Python environment must be set up with the required dependencies (this includes libicu). You can install them in a virtualenv by using the Poetry file at the root of this repository and (manually) setting the PYTHONPATH:
$ poetry install
$ export PYTHONPATH="$(pwd)/.venv/lib/python3.13/site-packages"
$ cargo test
One quite lengthy test (scoring the cartesian product of 50x50 entities against each other and comparing the results against nomenklatura) is ignored by default. You can still run it with cargo test -- --include-ignored.
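The shape of such an ignored test can be sketched as below. This is not the repository's actual test: `max_score_gap` is a hypothetical helper, and the entities and scorers are stand-ins for real entities and for the Motiva and nomenklatura scoring calls.

```rust
/// Scores every ordered pair of `entities` with both scorers and returns
/// the largest absolute gap between them.
fn max_score_gap<T>(
    entities: &[T],
    ours: impl Fn(&T, &T) -> f64,
    reference: impl Fn(&T, &T) -> f64,
) -> f64 {
    let mut worst = 0.0_f64;
    for left in entities {
        for right in entities {
            let gap = (ours(left, right) - reference(left, right)).abs();
            worst = worst.max(gap);
        }
    }
    worst
}

// `#[ignore]` keeps the test out of a plain `cargo test` run; it is only
// executed with `cargo test -- --include-ignored`.
#[test]
#[ignore = "slow: scores every pair of the sample"]
fn scores_stay_close_over_cartesian_product() {
    // Stand-in data and scorers: a real test would load 50 entities and
    // call both implementations here.
    let entities: Vec<f64> = (0..50).map(|i| i as f64 / 50.0).collect();
    let ours = |a: &f64, b: &f64| 1.0 - (a - b).abs();
    let reference = |a: &f64, b: &f64| 1.0 - (a - b).abs() + 0.001;

    assert!(max_score_gap(&entities, ours, reference) < 0.01);
}
```

Reporting the single worst gap keeps the assertion simple; a fuller version might also record which pair produced it.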
§Contributing
Motiva is a work in progress. Contributions and feedback are welcome!
Modules§
- prelude
- Module including most features needed to use the library.
Structs§
- ElasticsearchProvider
- Main index provider using Elasticsearch
- Entity
- An Entity returned from the index
- LogicV1
- Default matching algorithm
- MatchParams
- Settings for a search
- Motiva
- The main entrypoint for using the Motiva library.
- NameBased
- Simple matching algorithm using name similarity
- NameQualified
- Simple matching algorithm using name similarity, and penalty for disjoint attributes
- SearchEntity
- Search terms
Enums§
- Algorithm
- Matching algorithms supported by motiva
- EntityHandle
- Reference to an entity
- EsAuthMethod
- Authentication method to Elasticsearch
- GetEntityBehavior
- Whether to fetch related entities.
- MotivaError
Traits§
- Feature
- A scoring facet composed into a MatchingAlgorithm
- HasProperties
- IndexProvider
- MatchingAlgorithm
- Algorithm used to score a SearchEntity against an Entity
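To illustrate the idea of scoring facets composed into an algorithm, here is a simplified sketch in the spirit of "name similarity, and penalty for disjoint attributes". Everything in it is hypothetical: `ScoringFeature`, `WeightedAlgorithm`, the two features and the weights are stand-ins and do not reproduce the signatures of the crate's `Feature` or `MatchingAlgorithm` traits, nor the real scoring logic.

```rust
use std::collections::HashMap;

/// Simplified stand-in for an entity's properties.
type Properties = HashMap<String, Vec<String>>;

/// Hypothetical scoring facet returning a value in [0.0, 1.0].
trait ScoringFeature {
    fn score(&self, query: &Properties, candidate: &Properties) -> f64;
}

/// Lowercased values of one property, or an empty list if absent.
fn lowered(props: &Properties, key: &str) -> Vec<String> {
    props
        .get(key)
        .map(|values| values.iter().map(|v| v.to_lowercase()).collect())
        .unwrap_or_default()
}

/// 1.0 when the two entities share a name (case-insensitive), else 0.0.
struct ExactNameMatch;

impl ScoringFeature for ExactNameMatch {
    fn score(&self, query: &Properties, candidate: &Properties) -> f64 {
        let candidate_names = lowered(candidate, "name");
        let hit = lowered(query, "name").iter().any(|n| candidate_names.contains(n));
        if hit { 1.0 } else { 0.0 }
    }
}

/// 1.0 when both entities declare countries and none are shared.
struct CountryMismatch;

impl ScoringFeature for CountryMismatch {
    fn score(&self, query: &Properties, candidate: &Properties) -> f64 {
        let (q, c) = (lowered(query, "country"), lowered(candidate, "country"));
        let disjoint = !q.is_empty() && !c.is_empty() && !q.iter().any(|v| c.contains(v));
        if disjoint { 1.0 } else { 0.0 }
    }
}

/// Sums weighted feature scores and clamps the result to [0.0, 1.0].
struct WeightedAlgorithm {
    features: Vec<(Box<dyn ScoringFeature>, f64)>,
}

impl WeightedAlgorithm {
    fn new() -> Self {
        Self { features: Vec::new() }
    }

    /// Adds a feature; a negative weight turns it into a penalty.
    fn with(mut self, feature: impl ScoringFeature + 'static, weight: f64) -> Self {
        let boxed: Box<dyn ScoringFeature> = Box::new(feature);
        self.features.push((boxed, weight));
        self
    }

    fn score(&self, query: &Properties, candidate: &Properties) -> f64 {
        let total: f64 = self
            .features
            .iter()
            .map(|(feature, weight)| weight * feature.score(query, candidate))
            .sum();
        total.clamp(0.0, 1.0)
    }
}
```

An algorithm in this sketch would be assembled as `WeightedAlgorithm::new().with(ExactNameMatch, 1.0).with(CountryMismatch, -0.1)`; keeping each facet behind a trait lets new signals be added without touching the combination logic.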