Crate libmotiva

§Motiva

From the Greek μοτίβα, meaning patterns, or the recognition of similar features between objects.

This is a scoped-down reimplementation of Yente and nomenklatura, used to match entities against sanctions lists.

Most of the algorithms are taken directly from those repositories and simply reimplemented; the credit for them goes to the Open Sanctions team.

Note that this software requires Yente to run beside it, including Elasticsearch and a valid, licensed collection of datasets obtained from Open Sanctions.

Work in progress

§Scope and goals

Not all of Yente will be implemented here. Notably, none of the index update features will make their way into this repository; we will focus on the request side (search and matching).

Even though we strive to produce matching scores close to those of Yente, exact score parity is not a goal. In particular, the Rust implementations of some algorithms produce slightly different results, which leads to different overall scores.

Every implemented algorithm will have an integration test that compares Motiva's score with Yente's and checks that they are within a reasonable epsilon of each other.
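Such a test boils down to a tolerance check between two scores. A minimal sketch, assuming both scores are f64 values; the helper name scores_agree and the 0.01 tolerance are hypothetical and not taken from Motiva's test suite:

```rust
/// Hypothetical helper: true when two scores differ by at most `epsilon`.
fn scores_agree(motiva: f64, yente: f64, epsilon: f64) -> bool {
    (motiva - yente).abs() <= epsilon
}

fn main() {
    // Hypothetical scores for the same query/candidate pair.
    let motiva_score = 0.912;
    let yente_score = 0.905;
    // Hypothetical tolerance; the real suite may use a different value per algorithm.
    let epsilon = 0.01;
    assert!(scores_agree(motiva_score, yente_score, epsilon));
    println!("difference: {:.4}", (motiva_score - yente_score).abs());
}
```

Using an absolute rather than relative tolerance keeps the check meaningful near zero, where low-scoring non-matches cluster.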

If at all possible, this project will try to use only Rust-native dependencies, and stay clear of integrating with C libraries through FFI.

Some liberties were taken in adapting logic and algorithms from Yente, so do not expect a fully compliant API or identical behavior.

§Implementation matrix

  • Endpoints:
    • POST /match/{dataset}
    • GET /entities/{id}
    • GET /catalog (proxy)
  • Matching algorithms:
    • name-based
    • name-qualified
    • logic-v1 [1]

[1]: Features that are disabled by default have been omitted for now.

§Configuration

Motiva is configured via environment variables. The following variables are supported:

  • ENV: environment, dev or production (default: dev)
  • LISTEN_ADDR: address to bind the API server (default: 0.0.0.0:8000)
  • INDEX_URL: Elasticsearch URL (default: http://localhost:9200)
  • INDEX_AUTH_METHOD: Elasticsearch authentication method, one of none, basic, bearer, api_key or encoded_api_key (default: none)
  • INDEX_CLIENT_ID: Elasticsearch client ID, required for basic or api_key (default: unset)
  • INDEX_CLIENT_SECRET: Elasticsearch client secret, required for basic, api_key or encoded_api_key (default: unset)
  • YENTE_URL: optional URL of a Yente instance, used for score comparison (default: unset)
  • CATALOG_URL: optional URL of a catalog service (default: unset)
  • MATCH_CANDIDATES: number of candidates to consider for matching (default: 10)
  • ENABLE_PROMETHEUS: enable Prometheus metrics collection and the /metrics endpoint (default: 0)
  • ENABLE_TRACING: set to 1 to enable tracing (default: unset)
  • TRACING_EXPORTER: tracing exporter kind, otlp, or gcp if compiled with the gcp feature (default: otlp)
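A minimal sketch of how a service could read a few of these variables with their documented defaults, using only the standard library. The Settings struct, its field names, and the rule that only the value 1 enables a flag are assumptions for illustration, not Motiva's actual configuration code:

```rust
use std::collections::HashMap;
use std::net::SocketAddr;

/// Hypothetical subset of Motiva's settings; field names are illustrative only.
#[derive(Debug)]
struct Settings {
    listen_addr: SocketAddr,
    index_url: String,
    match_candidates: usize,
    enable_prometheus: bool,
}

impl Settings {
    /// Reads settings through `get`, falling back to the documented defaults.
    /// Taking a lookup closure instead of calling `std::env::var` directly keeps it testable.
    fn from_lookup<F: Fn(&str) -> Option<String>>(get: F) -> Result<Self, String> {
        let listen_addr = get("LISTEN_ADDR")
            .unwrap_or_else(|| "0.0.0.0:8000".to_string())
            .parse::<SocketAddr>()
            .map_err(|e| format!("invalid LISTEN_ADDR: {e}"))?;
        let index_url = get("INDEX_URL").unwrap_or_else(|| "http://localhost:9200".to_string());
        let match_candidates = get("MATCH_CANDIDATES")
            .unwrap_or_else(|| "10".to_string())
            .parse::<usize>()
            .map_err(|e| format!("invalid MATCH_CANDIDATES: {e}"))?;
        // Assumption: only the value "1" enables the flag; unset or anything else disables it.
        let enable_prometheus = get("ENABLE_PROMETHEUS").as_deref() == Some("1");
        Ok(Settings { listen_addr, index_url, match_candidates, enable_prometheus })
    }
}

fn main() {
    // Read from the real process environment.
    match Settings::from_lookup(|key| std::env::var(key).ok()) {
        Ok(settings) => println!("{settings:?}"),
        Err(err) => eprintln!("configuration error: {err}"),
    }

    // The same logic driven by an explicit map, e.g. in tests.
    let mut vars: HashMap<&str, String> = HashMap::new();
    vars.insert("MATCH_CANDIDATES", "25".to_string());
    let custom = Settings::from_lookup(|key| vars.get(key).cloned()).expect("valid settings");
    assert_eq!(custom.match_candidates, 25);
}
```

Parsing eagerly and returning an error means a malformed value such as MATCH_CANDIDATES=ten is reported at startup rather than silently replaced by the default.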

YENTE_URL is required if your clients need to retrieve the actual catalog through Motiva, since /catalog requests are proxied to Yente.

You might want to set CATALOG_URL if you customized Yente's catalog in any way, so that Motiva pulls it regularly instead of Open Sanctions' default catalog.

§Run

Beyond the environment variables above, no further configuration is possible for now, and this will remain the case until the project is in good enough shape to be used widely.

$ cargo run --release
$ echo '{"queries":{"test":{"schema":"Person","properties":{"name":["Vladimir Putin"]}}}}' | curl -XPOST 127.0.0.1:8000/match/sanctions -H content-type:application/json -d @-
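The request body above can also be generated programmatically. A minimal standard-library sketch, assuming only the payload shape shown in the curl example (a queries map from query IDs to a schema and its properties); build_match_body and json_escape are hypothetical helpers, and a real client would more likely use a JSON library such as serde_json:

```rust
/// Escapes a string for use inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \u00XX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Hypothetical helper: builds a match request body with one query carrying `name` values.
fn build_match_body(query_id: &str, schema: &str, names: &[&str]) -> String {
    let names_json: Vec<String> = names
        .iter()
        .map(|n| format!("\"{}\"", json_escape(n)))
        .collect();
    format!(
        "{{\"queries\":{{\"{}\":{{\"schema\":\"{}\",\"properties\":{{\"name\":[{}]}}}}}}}}",
        json_escape(query_id),
        json_escape(schema),
        names_json.join(",")
    )
}

fn main() {
    // Prints the same payload as the echo in the curl example.
    println!("{}", build_match_body("test", "Person", &["Vladimir Putin"]));
}
```

Its output can be piped into the same curl command in place of the echo.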

§Development

§Test suite

To run the tests, a Python environment must be set up with the required dependencies (including libicu). You can install it in a virtualenv using the Poetry file at the root of this repository, then set PYTHONPATH manually:

$ poetry install
$ export PYTHONPATH="$(pwd)/.venv/lib/python3.13/site-packages"
$ cargo test

One lengthy test, which scores the Cartesian product of 50x50 entities against each other and compares the results against nomenklatura, is ignored by default. You can still run it with cargo test -- --include-ignored.

§Contributing

Motiva is a work in progress. Contributions and feedback are welcome!

Modules§

prelude
Module including most features needed to use the library.

Structs§

ElasticsearchProvider
Main index provider using Elasticsearch
Entity
An Entity returned from the index
LogicV1
Default matching algorithm
MatchParams
Settings for a search
Motiva
The main entrypoint for using the Motiva library.
NameBased
Simple matching algorithm using name similarity
NameQualified
Simple matching algorithm using name similarity, and penalty for disjoint attributes
SearchEntity
Search terms
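The NameBased and NameQualified descriptions suggest a name similarity score, optionally reduced when the two entities disagree on some attribute. A minimal sketch of that idea, assuming a normalized Levenshtein similarity, a single set-valued attribute (such as birth dates), and a fixed penalty; all function names and the penalty value are hypothetical, and Motiva's actual algorithms, ported from nomenklatura, use their own comparators and weights:

```rust
use std::collections::HashSet;

/// Levenshtein edit distance, computed over Unicode scalar values with two rows.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // Before row i is computed, prev[j] is the distance between a[..i-1] and b[..j].
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for i in 1..=a.len() {
        let mut curr = vec![0; b.len() + 1];
        curr[0] = i;
        for j in 1..=b.len() {
            let cost = if a[i - 1] == b[j - 1] { 0 } else { 1 };
            curr[j] = (prev[j] + 1).min(curr[j - 1] + 1).min(prev[j - 1] + cost);
        }
        prev = curr;
    }
    prev[b.len()]
}

/// Similarity in [0, 1]: one minus the edit distance divided by the longer name's length.
fn name_similarity(a: &str, b: &str) -> f64 {
    let (a, b) = (a.to_lowercase(), b.to_lowercase());
    let longest = a.chars().count().max(b.chars().count());
    if longest == 0 {
        return 0.0; // two empty names carry no evidence of a match
    }
    1.0 - levenshtein(&a, &b) as f64 / longest as f64
}

/// Hypothetical name-qualified score: best pairwise name similarity, minus `penalty`
/// when both sides state values for the attribute and those values are disjoint.
fn name_qualified_score(
    query_names: &[&str],
    candidate_names: &[&str],
    query_attr: &HashSet<&str>,
    candidate_attr: &HashSet<&str>,
    penalty: f64,
) -> f64 {
    let mut best = 0.0_f64;
    for q in query_names {
        for c in candidate_names {
            best = best.max(name_similarity(q, c));
        }
    }
    let disjoint = !query_attr.is_empty()
        && !candidate_attr.is_empty()
        && query_attr.is_disjoint(candidate_attr);
    if disjoint { (best - penalty).max(0.0) } else { best }
}

fn main() {
    // Hypothetical birth dates that disagree, so the penalty applies.
    let birth_q: HashSet<&str> = ["1950-01-01"].into_iter().collect();
    let birth_c: HashSet<&str> = ["1970-01-01"].into_iter().collect();
    let score = name_qualified_score(&["John Smith"], &["Jon Smith"], &birth_q, &birth_c, 0.2);
    println!("score = {score:.3}");
}
```

Only penalizing when both sides state a value reflects the idea that missing data is not evidence of a mismatch, while contradicting data is.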

Enums§

Algorithm
Matching algorithms supported by motiva
EntityHandle
Reference to an entity
EsAuthMethod
Authentication method to Elasticsearch
GetEntityBehavior
Whether to fetch related entities.
MotivaError

Traits§

Feature
A scoring facet composed into a MatchingAlgorithm
HasProperties
IndexProvider
MatchingAlgorithm
Algorithm used to score a SearchEntity against an Entity
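Feature and MatchingAlgorithm describe scoring facets composed into an algorithm that scores a search against an entity. A minimal sketch of that composition, assuming a weighted sum of per-facet scores clamped to [0, 1]; Record, Facet, WeightedAlgorithm, the two facets, and their weights are all hypothetical and do not reflect libmotiva's actual trait signatures, which are documented on the items above:

```rust
/// Hypothetical, simplified stand-in for an entity: only names and country codes.
struct Record {
    names: Vec<String>,
    countries: Vec<String>,
}

/// Hypothetical scoring facet, loosely mirroring the role of a Feature.
trait Facet {
    fn name(&self) -> &'static str;
    /// Returns a score in [0, 1] for one aspect of the comparison.
    fn score(&self, query: &Record, candidate: &Record) -> f64;
}

/// 1.0 when any query name equals any candidate name, ignoring case.
struct ExactName;

impl Facet for ExactName {
    fn name(&self) -> &'static str {
        "exact_name"
    }
    fn score(&self, query: &Record, candidate: &Record) -> f64 {
        let hit = query.names.iter().any(|q| {
            candidate.names.iter().any(|c| q.to_lowercase() == c.to_lowercase())
        });
        if hit { 1.0 } else { 0.0 }
    }
}

/// 1.0 when both sides list countries and none overlap; meant to carry a negative weight.
struct CountryMismatch;

impl Facet for CountryMismatch {
    fn name(&self) -> &'static str {
        "country_mismatch"
    }
    fn score(&self, query: &Record, candidate: &Record) -> f64 {
        if query.countries.is_empty() || candidate.countries.is_empty() {
            return 0.0;
        }
        let overlap = query.countries.iter().any(|c| candidate.countries.contains(c));
        if overlap { 0.0 } else { 1.0 }
    }
}

/// Hypothetical algorithm: weighted sum of facet scores, clamped to [0, 1].
struct WeightedAlgorithm {
    facets: Vec<(Box<dyn Facet>, f64)>,
}

impl WeightedAlgorithm {
    /// Weighted contribution of each facet, useful for explaining a score.
    fn explain(&self, query: &Record, candidate: &Record) -> Vec<(&'static str, f64)> {
        self.facets
            .iter()
            .map(|(facet, weight)| (facet.name(), weight * facet.score(query, candidate)))
            .collect()
    }

    fn score(&self, query: &Record, candidate: &Record) -> f64 {
        let total: f64 = self.explain(query, candidate).iter().map(|(_, v)| v).sum();
        total.clamp(0.0, 1.0)
    }
}

fn main() {
    // Hypothetical weights: an exact name match counts for a lot, a country mismatch subtracts.
    let algo = WeightedAlgorithm {
        facets: vec![
            (Box::new(ExactName) as Box<dyn Facet>, 0.9),
            (Box::new(CountryMismatch) as Box<dyn Facet>, -0.2),
        ],
    };
    let query = Record { names: vec!["Jane Doe".into()], countries: vec!["fr".into()] };
    let candidate = Record { names: vec!["JANE DOE".into()], countries: vec!["de".into()] };
    for (facet, contribution) in algo.explain(&query, &candidate) {
        println!("{facet}: {contribution:+.2}");
    }
    println!("total: {:.2}", algo.score(&query, &candidate));
}
```

Keeping each facet behind a trait object lets an algorithm be assembled from reusable pieces, and the explain method shows how per-facet contributions can be surfaced alongside the final score.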