newton-data-provider 0.1.31

newton data provider

Data Provider: Secrets management and provisioning flow

Related Documentation: For the complete secrets management data model including database schema, on-chain contract relationships, API endpoints, and ownership model, see ../gateway/src/rpc/api/README.md.

Scope and goals

This document describes how crates/data-provider loads, validates, and exposes encrypted secrets to policy WASM execution. It focuses on the runtime boundary between the host (gateway/data-provider) and the WASM guest, and on the storage model that makes secrets addressable and safe in a multi-tenant system.

Storage model: one blob per (policy_client, policy_data)

Secrets are stored as a single encrypted JSON blob per (policy_client_address, policy_data_address) pair. The blob is stored as ciphertext bytes (Postgres BYTEA) and is decrypted only inside the host runtime. The guest never receives ciphertext and never supplies the keys used to look up the blob.

The database schema is owned by gateway migrations, but it is consumed by the data-provider runtime through newton_prover_core::database repositories. The final schema, as it appears on a freshly migrated database, is:

policy_client_owner(policy_client_address PK, user_id FK -> api_keys.user_id)
policy_client_secret(policy_client_address, policy_data_address, secrets BYTEA, UNIQUE(policy_client_address, policy_data_address))
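
As a mental model only, the uniqueness constraint behaves like a map keyed by the address pair, where an upsert replaces the blob for an existing pair. The sketch below is an in-memory stand-in under that assumption; `SecretStore`, `SecretKey`, and their methods are hypothetical names, not types from newton_prover_core::database.

```rust
use std::collections::HashMap;

/// Composite key mirroring UNIQUE(policy_client_address, policy_data_address).
/// Addresses are plain strings here purely for illustration.
type SecretKey = (String, String);

/// Hypothetical in-memory stand-in for the `policy_client_secret` table.
#[derive(Default)]
struct SecretStore {
    rows: HashMap<SecretKey, Vec<u8>>,
}

impl SecretStore {
    /// Upsert: inserts a ciphertext blob, or replaces the existing one for the pair.
    /// Returns true when an existing row was replaced.
    fn upsert(&mut self, policy_client: &str, policy_data: &str, ciphertext: Vec<u8>) -> bool {
        self.rows
            .insert((policy_client.to_string(), policy_data.to_string()), ciphertext)
            .is_some()
    }

    /// Returns the single ciphertext blob stored for the pair, if any.
    fn get(&self, policy_client: &str, policy_data: &str) -> Option<&[u8]> {
        self.rows
            .get(&(policy_client.to_string(), policy_data.to_string()))
            .map(|v| v.as_slice())
    }
}
```

Because the key is the full pair, the same policy client can hold different secrets for different PolicyData contracts without any of them colliding.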

Ownership is enforced at the gateway boundary. The data-provider assumes it is executing within a request context that has already passed authorization checks for the policy client being evaluated.

PolicyData-driven schema binding

Each PolicyData contract publishes a secretsSchemaCid (IPFS CID) that defines a JSON Schema for the secrets blob expected by that PolicyData. The system treats this schema as authoritative and validates the decrypted secrets JSON against it strictly, including required and additionalProperties semantics.
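
To make the two named keywords concrete, here is a deliberately simplified sketch that checks only top-level key names. It is not a JSON Schema validator and not the crate's implementation; `check_keys` is a hypothetical helper that assumes the schema sets `additionalProperties` to false.

```rust
use std::collections::BTreeSet;

/// Simplified stand-in for `required` and `additionalProperties: false`.
/// `present` holds the keys of the decrypted object, `required` the keys that
/// must exist, and `properties` the keys the schema declares.
fn check_keys(present: &[&str], required: &[&str], properties: &[&str]) -> Result<(), String> {
    let present: BTreeSet<&str> = present.iter().copied().collect();
    // `required`: every listed key must be present.
    for key in required {
        if !present.contains(key) {
            return Err(format!("missing required key '{key}'"));
        }
    }
    // `additionalProperties: false`: no key outside the declared set is allowed.
    let declared: BTreeSet<&str> = properties.iter().copied().collect();
    for key in &present {
        if !declared.contains(key) {
            return Err(format!("unexpected key '{key}'"));
        }
    }
    Ok(())
}
```

Note that the error strings mention key names only, never values, so they stay safe to surface.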

If secretsSchemaCid is absent or empty for a given PolicyData, the system treats that PolicyData as not requiring secrets and will not attempt any database/KMS operations to load secrets.

Upload-time flow (gateway responsibility)

This section describes the upload pipeline because it defines invariants the data-provider relies on at runtime. Upload happens through the gateway RPC newt_storeEncryptedSecrets, which receives a base64 string representing ciphertext bytes and persists the ciphertext only after strict validation succeeds.

The gateway-side validator performs four steps in sequence: it fetches secretsSchemaCid from the PolicyData contract, downloads the schema JSON from IPFS, decrypts the uploaded ciphertext using AWS KMS (RSA OAEP SHA-256), and validates the decrypted plaintext JSON against the schema. Only after all four succeed does it upsert the ciphertext into policy_client_secret, scoped by (policy_client, policy_data).
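
The ordering matters because persistence must come last. A minimal sketch of that validate-before-persist shape follows. Each stage is passed in as a closure, and all of them are hypothetical stand-ins for the contract read, the IPFS download, the KMS call, the schema validator, and the database upsert; `store_encrypted_secrets` here is not the gateway's actual handler.

```rust
/// Runs the upload stages in order and persists the ciphertext only when
/// every earlier stage has succeeded. The plaintext is never persisted.
fn store_encrypted_secrets(
    ciphertext: &[u8],
    fetch_schema_cid: impl FnOnce() -> Result<String, String>,
    download_schema: impl FnOnce(&str) -> Result<String, String>,
    decrypt: impl FnOnce(&[u8]) -> Result<Vec<u8>, String>,
    validate: impl FnOnce(&str, &[u8]) -> Result<(), String>,
    persist: impl FnOnce(&[u8]) -> Result<(), String>,
) -> Result<(), String> {
    let cid = fetch_schema_cid()?;
    let schema = download_schema(&cid)?;
    let plaintext = decrypt(ciphertext)?;
    validate(&schema, &plaintext)?;
    // Reached only if validation passed; the stored value is the ciphertext.
    persist(ciphertext)
}
```

Any early return via `?` skips `persist`, which is what lets the runtime assume stored rows were schema-valid at upload time.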

This upload-time validation ensures the database contains only ciphertexts that decrypt to schema-valid JSON for that PolicyData. The runtime still re-validates defensively on every load.

Runtime flow (data-provider responsibility)

At runtime, the data-provider executes a PolicyData WASM module via the WasmExecutor and provides host imports for HTTP and secrets. The host binds the secrets scope (policy_client_address, policy_data_address) into the execution context; the guest cannot provide or override those addresses.

The runtime path to secrets is:

DataProvider.get_policy_task_data_for_client
  -> DataProvider.execute_policy_data_wasm
    -> WasmExecutor.execute_wasm_bytes
      -> SecretsProvider (host import impl)
        -> load ciphertext from DB by (policy_client, policy_data)
        -> decrypt with AWS KMS
        -> parse plaintext JSON
        -> strict schema validation using secretsSchemaCid
        -> return secrets JSON bytes to the guest

The data-provider attempts to load secrets only when secrets_schema_cid is configured for the current PolicyData. When no schema CID exists, the secrets host import returns the UTF-8 bytes of {} and avoids any DB/KMS operations.
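
The host-bound scope and the no-schema short circuit can be sketched together. This is an illustration under stated assumptions, not the crate's SecretsProvider: `SecretsContext` is a hypothetical type, and the `load` closure stands in for the DB lookup, KMS decryption, and schema validation stages listed above.

```rust
/// Hypothetical host-side context. The host fills in the scope before the
/// guest runs, so the guest has no way to choose or override the addresses.
struct SecretsContext {
    policy_client: String,
    policy_data: String,
    schema_cid: Option<String>,
}

impl SecretsContext {
    /// Body of the host import. It takes no guest-supplied arguments.
    fn get(
        &self,
        load: impl FnOnce(&str, &str, &str) -> Result<Vec<u8>, String>,
    ) -> Result<Vec<u8>, String> {
        match self.schema_cid.as_deref() {
            // No schema CID (absent or empty): return `{}` without touching DB or KMS.
            None | Some("") => Ok(b"{}".to_vec()),
            // Schema declared: run the full load path for the host-bound scope.
            Some(cid) => load(&self.policy_client, &self.policy_data, cid),
        }
    }
}
```

Keeping the scope on the context, rather than in the call arguments, is what makes the guest-facing call parameterless.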

WASM interface contract (WIT)

The secrets host import is defined in crates/data-provider/wit/newton-provider.wit. The guest calls secrets.get() and receives a secret-response whose value is the UTF-8 bytes of the full decrypted JSON object.

The guest should treat this as an injected “secrets document” for the current PolicyData and parse it once, rather than calling a per-key API. This aligns with the storage model (one blob per PolicyData) and avoids leaking schema structure into host APIs.

Guest-side consumption pattern

This section shows a typical consumption approach in the guest. The goal is to keep the guest logic explicit about parsing and field access, while keeping lookup scoping strictly on the host side.

// Sketch of a WASM guest that uses the imported secrets interface.
//
// Assumes bindings generated from `newton-provider.wit`, where `secrets::get()`
// returns a `Result` whose success value is `{ value: list<u8> }` (UTF-8 JSON
// bytes) and whose error type implements `ToString`.

use serde_json::Value;

fn load_secrets_json() -> Result<Value, String> {
    // A single `?` suffices: `map_err` already yields `Result<_, String>`.
    let resp = newton_provider::secrets::get().map_err(|e| e.to_string())?;
    let s = String::from_utf8(resp.value).map_err(|e| format!("secrets not utf-8: {e}"))?;
    serde_json::from_str::<Value>(&s).map_err(|e| format!("secrets not valid json: {e}"))
}

fn get_required_string(secrets: &Value, key: &str) -> Result<String, String> {
    secrets
        .get(key)
        .and_then(|v| v.as_str())
        .map(|s| s.to_string())
        .ok_or_else(|| format!("missing required secret '{key}'"))
}

The host already validates the full object against the PolicyData schema, but the guest should still handle missing keys gracefully because policy logic may evolve independently from deployment state.

Schema fetch and caching

Schema JSON is fetched from IPFS using secretsSchemaCid. To avoid repeated network calls, the data-provider uses an in-process, shared cache keyed by CID. This cache is global to the process and reused across WASM executions.

This cache is intentionally in-memory rather than Redis because schema documents are small, immutable, and naturally content-addressed (CID). A distributed cache can be added later if multiple gateway/data-provider instances must share warm schema state, but correctness does not depend on caching.
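
One way such a process-wide cache could look is sketched below using only the standard library. The names `SCHEMA_CACHE` and `schema_for_cid` are hypothetical, and the `fetch` closure stands in for the IPFS download; the crate's actual cache may be structured differently.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, OnceLock};

/// Process-wide schema cache keyed by CID. Because a CID addresses immutable
/// content, entries never need invalidation.
static SCHEMA_CACHE: OnceLock<Mutex<HashMap<String, Arc<str>>>> = OnceLock::new();

/// Returns the schema for `cid`, calling `fetch` only on a cache miss.
fn schema_for_cid(
    cid: &str,
    fetch: impl FnOnce(&str) -> Result<String, String>,
) -> Result<Arc<str>, String> {
    let cache = SCHEMA_CACHE.get_or_init(|| Mutex::new(HashMap::new()));
    if let Some(hit) = cache.lock().map_err(|e| e.to_string())?.get(cid) {
        return Ok(Arc::clone(hit));
    }
    // Fetch without holding the lock so a slow download does not block others.
    let schema: Arc<str> = Arc::from(fetch(cid)?);
    let mut guard = cache.lock().map_err(|e| e.to_string())?;
    // If another thread inserted first, keep its entry; the content is identical.
    let entry = guard.entry(cid.to_string()).or_insert(schema);
    Ok(entry.clone())
}
```

Two concurrent misses for the same CID may both fetch, which costs a redundant download but cannot produce inconsistent state, since both results are the same content.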

Failure modes and error surfaces

The secrets runtime has three common failure classes: missing secrets (no row for the (policy_client, policy_data) pair), decryption failures (KMS errors, invalid ciphertext), and schema validation failures (plaintext JSON doesn’t match the schema fetched from secretsSchemaCid). Errors are returned as strings across the WASM host boundary; avoid including plaintext secrets in errors or logs.
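
The three classes could be modeled as an enum that is flattened to a string at the boundary. The sketch below is hypothetical (`SecretsError` and `to_guest_error` are illustrative names, not the crate's types) and assumes callers build the `reason` fields from error kinds and key names only, never from secret values.

```rust
use std::fmt;

/// Hypothetical error classes for the secrets host import.
#[derive(Debug, PartialEq)]
enum SecretsError {
    /// No row exists for the (policy_client, policy_data) pair.
    Missing { policy_client: String, policy_data: String },
    /// KMS error or invalid ciphertext.
    Decrypt(String),
    /// Plaintext JSON does not match the schema behind the CID.
    SchemaInvalid { schema_cid: String, reason: String },
}

impl fmt::Display for SecretsError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SecretsError::Missing { policy_client, policy_data } => write!(
                f,
                "no secrets stored for policy_client={policy_client} policy_data={policy_data}"
            ),
            SecretsError::Decrypt(reason) => write!(f, "failed to decrypt secrets: {reason}"),
            SecretsError::SchemaInvalid { schema_cid, reason } => {
                write!(f, "secrets do not match schema {schema_cid}: {reason}")
            }
        }
    }
}

/// Flattens the error to the string form that crosses the WASM host boundary.
fn to_guest_error(err: &SecretsError) -> String {
    err.to_string()
}
```

The messages carry only identifiers (addresses, CID) and a short reason, which keeps them useful for debugging without exposing plaintext.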

When secretsSchemaCid is not set, secrets are treated as not required and the host returns {}. This makes “no secrets expected” an explicit, cheap path and prevents accidental DB access in policies that do not declare a schema.