IngestConfig

Struct IngestConfig 

pub struct IngestConfig {
    pub version: u32,
    pub default_tenant_id: String,
    pub doc_id_namespace: Uuid,
    pub strip_control_chars: bool,
    pub metadata_policy: MetadataPolicy,
    pub max_payload_bytes: Option<usize>,
    pub max_normalized_bytes: Option<usize>,
}

Runtime configuration for ingest behavior.

IngestConfig controls all aspects of the ingest pipeline including validation, normalization, size limits, and ID generation. It is designed to be cheap to clone and serializable for configuration management.

§Fields

  • version: Version number (a single u32) for tracking configuration changes
  • default_tenant_id: Fallback tenant when metadata doesn’t specify one
  • doc_id_namespace: UUID namespace for deterministic document ID generation
  • strip_control_chars: Whether to remove control characters from metadata
  • metadata_policy: Fine-grained metadata validation rules
  • max_payload_bytes: Maximum raw payload size (optional)
  • max_normalized_bytes: Maximum normalized text size (optional)

§Serialization

This struct implements Serialize and Deserialize, so it can be stored as JSON, TOML, or YAML. A JSON example:

{
  "version": 1,
  "default_tenant_id": "default",
  "doc_id_namespace": "6ba7b812-9dad-11d1-80b4-00c04fd430c8",
  "strip_control_chars": true,
  "max_payload_bytes": 52428800,
  "max_normalized_bytes": 10485760,
  "metadata_policy": {
    "required_fields": ["TenantId", "DocId"],
    "max_attribute_bytes": 1048576,
    "reject_future_timestamps": true
  }
}

§Examples

§Default Configuration

use ingest::IngestConfig;

let config = IngestConfig::default();

assert_eq!(config.version, 1);
assert_eq!(config.default_tenant_id, "default");
assert!(config.strip_control_chars);
assert!(config.max_payload_bytes.is_none());
assert!(config.max_normalized_bytes.is_none());

§Custom Configuration

use ingest::{IngestConfig, MetadataPolicy, RequiredField};
use uuid::Uuid;

let config = IngestConfig {
    version: 2,
    default_tenant_id: "my-app".to_string(),
    doc_id_namespace: Uuid::new_v5(
        &Uuid::NAMESPACE_DNS,
        b"my-app.example.com"
    ),
    strip_control_chars: true,
    metadata_policy: MetadataPolicy {
        required_fields: vec![RequiredField::TenantId],
        max_attribute_bytes: Some(65536),
        reject_future_timestamps: true,
    },
    max_payload_bytes: Some(10 * 1024 * 1024),
    max_normalized_bytes: Some(5 * 1024 * 1024),
};

assert!(config.validate().is_ok());

Fields§

§version: u32

Version number of the ingest configuration.

This version number helps track configuration changes and can be used for schema migration or feature flagging. Increment this when making breaking changes to ingest behavior.

Default: 1

§default_tenant_id: String

Default tenant ID to use when metadata doesn’t specify one.

This ensures every canonical record has a tenant identifier, enabling multi-tenant isolation even when callers omit the tenant field.

Default: "default"
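The fallback described above can be pictured as a single lookup. This is a minimal sketch, not the crate's code: `resolve_tenant` is a hypothetical helper, and it assumes the tenant arrives from metadata as an optional string.

```rust
/// Picks the tenant from metadata, or falls back to the configured default.
/// `resolve_tenant` is a hypothetical name used only for this illustration.
fn resolve_tenant<'a>(metadata_tenant: Option<&'a str>, default_tenant_id: &'a str) -> &'a str {
    // When the caller omitted the tenant field, use the configured fallback.
    metadata_tenant.unwrap_or(default_tenant_id)
}
```

Calling `resolve_tenant(None, "default")` yields the fallback, while `resolve_tenant(Some("acme"), "default")` keeps the caller's tenant.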

§doc_id_namespace: Uuid

Namespace UUID for deterministic document ID generation.

When doc_id is not provided in metadata, a UUIDv5 is derived using: UUIDv5(doc_id_namespace, tenant_id + "\0" + record_id)

Using a consistent namespace ensures that:

  • The same tenant ID and record ID always produce the same document ID (deterministic)
  • Different applications don’t collide (namespace isolation)
  • Re-ingesting content is idempotent

Default: Uuid::NAMESPACE_OID
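The name that is hashed under the namespace can be sketched as follows. This assumes the "+" in the formula above means plain byte concatenation; `doc_id_name` is a hypothetical helper, not part of the crate.

```rust
/// Builds the name bytes that would be hashed under `doc_id_namespace`.
/// Assumption: tenant_id, a NUL byte, and record_id are concatenated as raw bytes.
fn doc_id_name(tenant_id: &str, record_id: &str) -> Vec<u8> {
    let mut name = Vec::with_capacity(tenant_id.len() + 1 + record_id.len());
    name.extend_from_slice(tenant_id.as_bytes());
    // The NUL separator keeps ("ab", "c") distinct from ("a", "bc").
    name.push(0);
    name.extend_from_slice(record_id.as_bytes());
    name
}

// With the `uuid` crate (feature "v5"), the ID could then be derived as:
//     Uuid::new_v5(&config.doc_id_namespace, &doc_id_name(tenant_id, record_id))
```

The separator only removes ambiguity if neither part can itself contain a NUL byte, which is one more reason to leave strip_control_chars enabled.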

§strip_control_chars: bool

Whether to strip ASCII control characters from metadata strings.

When true, control characters (0x00-0x1F and 0x7F) are removed from:

  • tenant_id
  • doc_id
  • original_source
  • id (record ID)

This helps prevent log injection attacks and keeps metadata safe for downstream systems. It is strongly recommended to keep this enabled.

Default: true
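A minimal sketch of the stripping step, assuming it is a plain character filter; the function name is hypothetical and the crate's actual implementation may differ.

```rust
/// Removes ASCII control characters (0x00-0x1F and 0x7F) from a metadata string.
/// Non-ASCII text is left untouched.
fn strip_control_chars(input: &str) -> String {
    // `char::is_ascii_control` matches exactly U+0000..=U+001F and U+007F.
    input.chars().filter(|c| !c.is_ascii_control()).collect()
}
```

Note that tabs and newlines are control characters too, so a value such as "tenant\n-a" becomes "tenant-a".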

§metadata_policy: MetadataPolicy

Additional metadata validation policies.

Controls which fields are required, attribute size limits, and timestamp validation rules.

Default: MetadataPolicy::default()

§max_payload_bytes: Option<usize>

Maximum raw payload byte length allowed.

If set, payloads exceeding this limit are rejected with IngestError::PayloadTooLarge before any processing.

This check is performed on the raw payload size before normalization (whitespace collapsing, UTF-8 decoding, etc.).

§Size Recommendations

  • Small text: 1-10 MB
  • Documents: 50-100 MB
  • Large files: 500 MB - 1 GB (if memory allows)

Default: None (unlimited)
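The raw-size check can be sketched like this. `SizeError` and `check_raw_size` are hypothetical stand-ins; only the variant name PayloadTooLarge comes from the documentation above, and its fields are assumed for illustration.

```rust
/// Hypothetical stand-in for the crate's error type.
#[derive(Debug, PartialEq)]
enum SizeError {
    PayloadTooLarge { actual: usize, limit: usize },
}

/// Rejects a raw payload whose byte length exceeds the optional limit.
/// `None` means unlimited, matching the documented default.
fn check_raw_size(payload: &[u8], max_payload_bytes: Option<usize>) -> Result<(), SizeError> {
    match max_payload_bytes {
        // Only the length is inspected, so the check runs before any decoding.
        Some(limit) if payload.len() > limit => Err(SizeError::PayloadTooLarge {
            actual: payload.len(),
            limit,
        }),
        _ => Ok(()),
    }
}
```

A payload exactly at the limit passes; only payloads strictly larger than the limit are rejected in this sketch.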

§max_normalized_bytes: Option<usize>

Maximum normalized payload byte length allowed.

If set, text payloads exceeding this limit after whitespace normalization are rejected with IngestError::PayloadTooLarge.

This is useful for enforcing limits on processed content size, which may differ from raw size due to whitespace collapsing.

§Constraint

When both limits are set, this value must be less than or equal to max_payload_bytes (validated by IngestConfig::validate()).

Default: None (unlimited)
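To illustrate why the normalized size can differ from the raw size, here is a sketch that assumes normalization collapses each run of whitespace into one space and trims the ends. Both function names are hypothetical, and the crate's normalization may do more.

```rust
/// Collapses every run of whitespace into a single space and trims the ends.
/// Assumption: this approximates the crate's whitespace normalization.
fn collapse_whitespace(text: &str) -> String {
    text.split_whitespace().collect::<Vec<_>>().join(" ")
}

/// Returns true when the normalized text fits within the optional limit.
/// `None` means unlimited.
fn normalized_fits(text: &str, max_normalized_bytes: Option<usize>) -> bool {
    let normalized = collapse_whitespace(text);
    max_normalized_bytes.map_or(true, |limit| normalized.len() <= limit)
}
```

Under these assumptions, an 8-byte input such as "a      b" normalizes to the 3-byte "a b" and fits a limit of 3, even though its raw size does not.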

Implementations§

impl IngestConfig

pub fn validate(&self) -> Result<(), ConfigError>

Validates internal consistency of this configuration.

This method checks for logical errors in the configuration that would cause runtime issues. It is inexpensive and should be called at process start-up to catch misconfigurations before handling live ingest traffic.

§Validation Rules

  1. max_normalized_bytes must be ≤ max_payload_bytes (if both are set)

§Returns

  • Ok(()) if configuration is valid
  • Err(ConfigError) describing the validation failure

§Performance

This method performs only in-memory checks with O(1) complexity. No I/O is performed.
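The single documented rule can be sketched as a constant-time comparison. `Limits` and `validate_limits` are hypothetical stand-ins that carry only the two size fields, and a String replaces the real ConfigError.

```rust
/// Hypothetical stand-in carrying only the two size limits.
struct Limits {
    max_payload_bytes: Option<usize>,
    max_normalized_bytes: Option<usize>,
}

/// Returns an error message when the normalized limit exceeds the raw limit.
/// If either limit is unset there is nothing to compare, so the check passes.
fn validate_limits(limits: &Limits) -> Result<(), String> {
    match (limits.max_payload_bytes, limits.max_normalized_bytes) {
        (Some(raw), Some(normalized)) if normalized > raw => Err(format!(
            "max_normalized_bytes ({normalized}) exceeds max_payload_bytes ({raw})"
        )),
        _ => Ok(()),
    }
}
```

Equal limits are accepted, which matches the "less than or equal" wording of the rule.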

§Examples

§Valid Configuration

use ingest::IngestConfig;

let config = IngestConfig::default();
assert!(config.validate().is_ok());

§Invalid Configuration

use ingest::IngestConfig;

let invalid_config = IngestConfig {
    max_payload_bytes: Some(100),
    max_normalized_bytes: Some(200), // Invalid: exceeds max_payload_bytes
    ..Default::default()
};

assert!(invalid_config.validate().is_err());

§Production Usage

use ingest::IngestConfig;
use std::error::Error;

fn main() -> Result<(), Box<dyn Error>> {
    let config = load_config()?;
    // Fail fast at start-up if the configuration is inconsistent.
    config.validate()?;
    // Continue with valid config...
    Ok(())
}

fn load_config() -> Result<IngestConfig, Box<dyn Error>> {
    // Load from file, env vars, etc.
    Ok(IngestConfig::default())
}

Trait Implementations§

impl Clone for IngestConfig

fn clone(&self) -> IngestConfig

Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for IngestConfig

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Default for IngestConfig

fn default() -> Self

Creates a default IngestConfig suitable for development.

§Defaults

  • version: 1
  • default_tenant_id: "default"
  • doc_id_namespace: Uuid::NAMESPACE_OID
  • strip_control_chars: true
  • metadata_policy: default (no required fields, no limits)
  • max_payload_bytes: None (unlimited)
  • max_normalized_bytes: None (unlimited)

§Example

use ingest::IngestConfig;

let config = IngestConfig::default();
assert_eq!(config.version, 1);
assert_eq!(config.default_tenant_id, "default");
assert!(config.strip_control_chars);

impl<'de> Deserialize<'de> for IngestConfig

fn deserialize<__D>(__deserializer: __D) -> Result<Self, __D::Error>
where __D: Deserializer<'de>,

Deserialize this value from the given Serde deserializer.

impl Serialize for IngestConfig

fn serialize<__S>(&self, __serializer: __S) -> Result<__S::Ok, __S::Error>
where __S: Serializer,

Serialize this value into the given Serde serializer.

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬 This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T> Instrument for T

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper.

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<T> WithSubscriber for T

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a WithDispatch wrapper.

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a WithDispatch wrapper.

impl<T> DeserializeOwned for T
where T: for<'de> Deserialize<'de>,