pub struct IngestConfig {
    pub version: u32,
    pub default_tenant_id: String,
    pub doc_id_namespace: Uuid,
    pub strip_control_chars: bool,
    pub metadata_policy: MetadataPolicy,
    pub max_payload_bytes: Option<usize>,
    pub max_normalized_bytes: Option<usize>,
}
Runtime configuration for ingest behavior.
IngestConfig controls all aspects of the ingest pipeline, including validation,
normalization, size limits, and ID generation. It is designed to be cheap to clone
and serializable for configuration management.
§Fields
- version: Configuration version number, for tracking configuration changes
- default_tenant_id: Fallback tenant when metadata doesn’t specify one
- doc_id_namespace: UUID namespace for deterministic document ID generation
- strip_control_chars: Whether to remove control characters from metadata
- metadata_policy: Fine-grained metadata validation rules
- max_payload_bytes: Maximum raw payload size (optional)
- max_normalized_bytes: Maximum normalized text size (optional)
§Serialization
This struct supports JSON, TOML, and YAML serialization:
{
  "version": 1,
  "default_tenant_id": "default",
  "strip_control_chars": true,
  "max_payload_bytes": 52428800,
  "max_normalized_bytes": 10485760,
  "metadata_policy": {
    "required_fields": ["TenantId", "DocId"],
    "max_attribute_bytes": 1048576,
    "reject_future_timestamps": true
  }
}
§Examples
§Default Configuration
use ingest::IngestConfig;
let config = IngestConfig::default();
assert_eq!(config.version, 1);
assert_eq!(config.default_tenant_id, "default");
assert!(config.strip_control_chars);
assert!(config.max_payload_bytes.is_none());
assert!(config.max_normalized_bytes.is_none());
§Custom Configuration
use ingest::{IngestConfig, MetadataPolicy, RequiredField};
use uuid::Uuid;
let config = IngestConfig {
    version: 2,
    default_tenant_id: "my-app".to_string(),
    doc_id_namespace: Uuid::new_v5(
        &Uuid::NAMESPACE_DNS,
        b"my-app.example.com"
    ),
    strip_control_chars: true,
    metadata_policy: MetadataPolicy {
        required_fields: vec![RequiredField::TenantId],
        max_attribute_bytes: Some(65536),
        reject_future_timestamps: true,
    },
    max_payload_bytes: Some(10 * 1024 * 1024),
    max_normalized_bytes: Some(5 * 1024 * 1024),
};
assert!(config.validate().is_ok());
Fields
version: u32
Version number of the ingest configuration.
This version number helps track configuration changes and can be used for schema migration or feature flagging. Increment this when making breaking changes to ingest behavior.
Default: 1
default_tenant_id: String
Default tenant ID to use when metadata doesn’t specify one.
This ensures every canonical record has a tenant identifier, enabling multi-tenant isolation even when callers omit the tenant field.
Default: "default"
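A minimal sketch of that fallback, assuming the tenant from metadata arrives as an Option<&str>. resolve_tenant is a hypothetical helper for illustration, not part of the crate's API.

```rust
/// Hypothetical helper: use the tenant from metadata when present,
/// otherwise fall back to the configured default_tenant_id.
fn resolve_tenant<'a>(metadata_tenant: Option<&'a str>, default_tenant_id: &'a str) -> &'a str {
    metadata_tenant.unwrap_or(default_tenant_id)
}

fn main() {
    // Caller supplied a tenant: it wins.
    assert_eq!(resolve_tenant(Some("acme"), "default"), "acme");
    // Caller omitted the tenant: the configured default is used.
    assert_eq!(resolve_tenant(None, "default"), "default");
}
```

The sketch treats only a missing value (None) as absent; whether an empty tenant string also triggers the fallback is not stated here.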
doc_id_namespace: Uuid
Namespace UUID for deterministic document ID generation.
When doc_id is not provided in metadata, a UUIDv5 is derived using:
UUIDv5(doc_id_namespace, tenant_id + "\0" + record_id)
Using a consistent namespace ensures that:
- The same tenant and record ID always map to the same document ID (deterministic)
- Applications using different namespaces don’t collide (namespace isolation)
- Re-ingesting the same record is idempotent
Default: Uuid::NAMESPACE_OID
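The derivation can be sketched with the standard library alone. The crate presumably calls Uuid::new_v5 from the uuid crate; to stay dependency-free, this sketch re-implements RFC 4122 version-5 UUIDs directly: SHA-1 over the 16 namespace bytes followed by the name, truncated to 16 bytes, with the version and variant bits set. It assumes the name is the UTF-8 bytes of tenant_id, one NUL byte, then record_id. The helpers sha1, uuid_v5, to_hyphenated, and derive_doc_id are hypothetical.

```rust
/// Minimal SHA-1 (FIPS 180-4), used here only to derive UUIDv5 values.
fn sha1(data: &[u8]) -> [u8; 20] {
    let mut h: [u32; 5] = [0x6745_2301, 0xEFCD_AB89, 0x98BA_DCFE, 0x1032_5476, 0xC3D2_E1F0];
    // Padding: a 0x80 byte, zeros up to 56 mod 64, then the bit length (big-endian u64).
    let mut msg = data.to_vec();
    let bit_len = (data.len() as u64).wrapping_mul(8);
    msg.push(0x80);
    while msg.len() % 64 != 56 {
        msg.push(0);
    }
    msg.extend_from_slice(&bit_len.to_be_bytes());

    for block in msg.chunks(64) {
        // Message schedule: 16 big-endian words expanded to 80.
        let mut w = [0u32; 80];
        for i in 0..16 {
            w[i] = u32::from_be_bytes([block[4 * i], block[4 * i + 1], block[4 * i + 2], block[4 * i + 3]]);
        }
        for i in 16..80 {
            w[i] = (w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16]).rotate_left(1);
        }
        let [mut a, mut b, mut c, mut d, mut e] = h;
        for (i, &wi) in w.iter().enumerate() {
            let (f, k): (u32, u32) = match i {
                0..=19 => ((b & c) | (!b & d), 0x5A82_7999),
                20..=39 => (b ^ c ^ d, 0x6ED9_EBA1),
                40..=59 => ((b & c) | (b & d) | (c & d), 0x8F1B_BCDC),
                _ => (b ^ c ^ d, 0xCA62_C1D6),
            };
            let t = a.rotate_left(5).wrapping_add(f).wrapping_add(e).wrapping_add(k).wrapping_add(wi);
            e = d;
            d = c;
            c = b.rotate_left(30);
            b = a;
            a = t;
        }
        for (slot, v) in h.iter_mut().zip([a, b, c, d, e]) {
            *slot = slot.wrapping_add(v);
        }
    }
    let mut out = [0u8; 20];
    for (i, word) in h.iter().enumerate() {
        out[4 * i..4 * i + 4].copy_from_slice(&word.to_be_bytes());
    }
    out
}

/// RFC 4122 version-5 UUID: SHA-1(namespace || name), first 16 bytes,
/// version nibble set to 5 and variant bits set to 10xx.
fn uuid_v5(namespace: &[u8; 16], name: &[u8]) -> [u8; 16] {
    let mut input = namespace.to_vec();
    input.extend_from_slice(name);
    let hash = sha1(&input);
    let mut bytes = [0u8; 16];
    bytes.copy_from_slice(&hash[..16]);
    bytes[6] = (bytes[6] & 0x0F) | 0x50; // version 5
    bytes[8] = (bytes[8] & 0x3F) | 0x80; // RFC 4122 variant
    bytes
}

/// Formats 16 bytes as the usual 8-4-4-4-12 lowercase hex string.
fn to_hyphenated(b: &[u8; 16]) -> String {
    let hex: String = b.iter().map(|x| format!("{x:02x}")).collect();
    format!("{}-{}-{}-{}-{}", &hex[0..8], &hex[8..12], &hex[12..16], &hex[16..20], &hex[20..32])
}

/// Uuid::NAMESPACE_OID (6ba7b812-9dad-11d1-80b4-00c04fd430c8) as raw bytes.
const NAMESPACE_OID: [u8; 16] = [
    0x6b, 0xa7, 0xb8, 0x12, 0x9d, 0xad, 0x11, 0xd1, 0x80, 0xb4, 0x00, 0xc0, 0x4f, 0xd4, 0x30, 0xc8,
];

/// Hypothetical: doc ID = UUIDv5(namespace, tenant_id + "\0" + record_id).
fn derive_doc_id(namespace: &[u8; 16], tenant_id: &str, record_id: &str) -> String {
    let mut name = Vec::with_capacity(tenant_id.len() + 1 + record_id.len());
    name.extend_from_slice(tenant_id.as_bytes());
    name.push(0); // NUL separator between tenant and record ID
    name.extend_from_slice(record_id.as_bytes());
    to_hyphenated(&uuid_v5(namespace, &name))
}

fn main() {
    let first = derive_doc_id(&NAMESPACE_OID, "acme", "record-42");
    let again = derive_doc_id(&NAMESPACE_OID, "acme", "record-42");
    assert_eq!(first, again); // deterministic: re-ingesting yields the same ID
    assert_ne!(first, derive_doc_id(&NAMESPACE_OID, "globex", "record-42")); // tenant-scoped
    println!("{first}");
}
```

If stripping runs before derivation, strip_control_chars removes NUL from tenant_id and id, so the separator cannot occur inside either part and a pair like ("ab", "c") cannot be confused with ("a", "bc"). The SHA-1 code here is only for ID derivation and should not be reused for anything security-sensitive.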
strip_control_chars: bool
Whether to strip ASCII control characters from metadata strings.
When true, control characters (0x00-0x1F and 0x7F) are removed from:
- tenant_id
- doc_id
- original_source
- id (record ID)
This guards against log injection attacks and keeps control characters out of downstream systems. It is strongly recommended to keep this enabled.
Default: true
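A sketch of the stripping rule, assuming it is applied per character to each affected metadata string. The helper name is hypothetical; Rust's char::is_ascii_control matches exactly the range given above (U+0000 through U+001F, plus U+007F).

```rust
/// Hypothetical helper mirroring strip_control_chars: drops U+0000..=U+001F
/// and U+007F, which is exactly the set char::is_ascii_control accepts.
fn strip_control_chars(s: &str) -> String {
    s.chars().filter(|c| !c.is_ascii_control()).collect()
}

fn main() {
    // A forged log line smuggled into a tenant ID: newline plus an ANSI escape.
    let tenant = "acme\n2024-01-01 INFO admin login ok\x1b[0m";
    let cleaned = strip_control_chars(tenant);
    assert!(!cleaned.contains('\n'));
    println!("{cleaned}");
}
```

Under this range, C1 control characters (U+0080 through U+009F) and Unicode separators such as U+2028 pass through unchanged, and only the ESC byte of an ANSI sequence is removed, leaving the trailing "[0m" text in place.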
metadata_policy: MetadataPolicy
Additional metadata validation policies.
Controls which fields are required, attribute size limits, and timestamp validation rules.
Default: MetadataPolicy::default()
max_payload_bytes: Option<usize>
Maximum raw payload byte length allowed.
If set, payloads exceeding this limit are rejected with
IngestError::PayloadTooLarge before any processing.
This check is performed on the raw payload size before normalization (whitespace collapsing, UTF-8 decoding, etc.).
§Size Recommendations
- Small text: 1-10 MB
- Documents: 50-100 MB
- Large files: 500 MB - 1 GB (if memory allows)
Default: None (unlimited)
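A minimal sketch of the raw-size gate, assuming the payload is available as a byte slice before any decoding. PayloadTooLarge and check_raw_size are hypothetical stand-ins for IngestError::PayloadTooLarge and the crate's internal check. A payload exactly at the limit is accepted, since only payloads exceeding it are rejected.

```rust
/// Hypothetical stand-in for IngestError::PayloadTooLarge.
#[derive(Debug, PartialEq)]
struct PayloadTooLarge {
    size: usize,
    limit: usize,
}

/// Checks raw byte length before any decoding or normalization.
/// None means unlimited, matching the field's default.
fn check_raw_size(payload: &[u8], max_payload_bytes: Option<usize>) -> Result<(), PayloadTooLarge> {
    match max_payload_bytes {
        Some(limit) if payload.len() > limit => Err(PayloadTooLarge { size: payload.len(), limit }),
        _ => Ok(()),
    }
}

fn main() {
    const MIB: usize = 1024 * 1024;
    let payload = vec![b'x'; 2 * MIB];
    assert!(check_raw_size(&payload, None).is_ok()); // unlimited
    assert!(check_raw_size(&payload, Some(2 * MIB)).is_ok()); // equal to limit: accepted
    assert_eq!(
        check_raw_size(&payload, Some(MIB)),
        Err(PayloadTooLarge { size: 2 * MIB, limit: MIB })
    );
}
```

Because the check only reads the slice length, it costs the same regardless of payload size and can reject oversized input before any allocation for decoded text.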
max_normalized_bytes: Option<usize>
Maximum normalized payload byte length allowed.
If set, text payloads exceeding this limit after whitespace normalization
are rejected with IngestError::PayloadTooLarge.
This is useful for enforcing limits on processed content size, which may differ from raw size due to whitespace collapsing.
§Constraint
When both limits are set, this must be less than or equal to max_payload_bytes (validated by
IngestConfig::validate()).
Default: None (unlimited)
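The normalized check can be sketched the same way. This assumes normalization collapses each run of whitespace to a single space and trims the ends (split_whitespace followed by join); the crate's actual normalization may do more. PayloadTooLarge and check_normalized are hypothetical. The example input is 19 raw bytes but only 11 after normalization, so a normalized limit of 11 accepts it even though the raw text is larger.

```rust
/// Hypothetical stand-in for IngestError::PayloadTooLarge.
#[derive(Debug, PartialEq)]
struct PayloadTooLarge {
    size: usize,
    limit: usize,
}

/// Assumed normalization: collapse every whitespace run to one space, trim the ends.
fn normalize_whitespace(text: &str) -> String {
    text.split_whitespace().collect::<Vec<_>>().join(" ")
}

/// Normalizes first, then applies the optional byte limit to the result.
fn check_normalized(text: &str, max_normalized_bytes: Option<usize>) -> Result<String, PayloadTooLarge> {
    let normalized = normalize_whitespace(text);
    match max_normalized_bytes {
        Some(limit) if normalized.len() > limit => Err(PayloadTooLarge { size: normalized.len(), limit }),
        _ => Ok(normalized),
    }
}

fn main() {
    let raw = "hello    \n\n   world"; // 19 raw bytes
    assert_eq!(raw.len(), 19);
    // "hello world" is 11 bytes, so a limit of 11 passes.
    assert_eq!(check_normalized(raw, Some(11)), Ok("hello world".to_string()));
    assert!(check_normalized(raw, Some(10)).is_err());
}
```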
Implementations
impl IngestConfig
pub fn validate(&self) -> Result<(), ConfigError>
Validates internal consistency of this configuration.
This method checks for logical errors in the configuration that would cause runtime issues. It is inexpensive and should be called at process start-up to catch misconfigurations before handling live ingest traffic.
§Validation Rules
- max_normalized_bytes must be ≤ max_payload_bytes (if both are set)
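The single rule reduces to a comparison that only applies when both limits are present. This is a sketch over the two Option<usize> fields; LimitError and validate_limits are hypothetical and do not reproduce the crate's ConfigError.

```rust
/// Hypothetical error type; the real ConfigError is not reproduced here.
#[derive(Debug, PartialEq)]
enum LimitError {
    NormalizedExceedsPayload { normalized: usize, payload: usize },
}

/// Rejects a normalized limit larger than the raw limit; O(1), no I/O.
fn validate_limits(max_payload_bytes: Option<usize>, max_normalized_bytes: Option<usize>) -> Result<(), LimitError> {
    match (max_payload_bytes, max_normalized_bytes) {
        // Only checked when both limits are set; either being None is accepted.
        (Some(payload), Some(normalized)) if normalized > payload => {
            Err(LimitError::NormalizedExceedsPayload { normalized, payload })
        }
        _ => Ok(()),
    }
}

fn main() {
    // Mirrors the invalid example: 200 normalized bytes allowed but only 100 raw.
    assert_eq!(
        validate_limits(Some(100), Some(200)),
        Err(LimitError::NormalizedExceedsPayload { normalized: 200, payload: 100 })
    );
    assert!(validate_limits(None, Some(200)).is_ok());
}
```

A lone max_normalized_bytes with max_payload_bytes left as None passes, matching "if both are set".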
§Returns
- Ok(()) if the configuration is valid
- Err(ConfigError) describing the validation failure
§Performance
This method performs only in-memory checks with O(1) complexity. No I/O is performed.
§Examples
§Valid Configuration
use ingest::IngestConfig;
let config = IngestConfig::default();
assert!(config.validate().is_ok());
§Invalid Configuration
use ingest::IngestConfig;
let invalid_config = IngestConfig {
    max_payload_bytes: Some(100),
    max_normalized_bytes: Some(200), // Invalid: exceeds max_payload_bytes
    ..Default::default()
};
assert!(invalid_config.validate().is_err());
§Production Usage
use ingest::IngestConfig;
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = load_config()?;
    // Fail fast on misconfiguration before handling any ingest traffic.
    config.validate()?;
    // Continue with valid config...
    Ok(())
}
fn load_config() -> Result<IngestConfig, Box<dyn std::error::Error>> {
    // Load from file, env vars, etc.
    Ok(IngestConfig::default())
}
Trait Implementations
impl Clone for IngestConfig
fn clone(&self) -> IngestConfig
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
impl Debug for IngestConfig
impl Default for IngestConfig
fn default() -> Self
Creates a default IngestConfig suitable for development.
§Defaults
- version: 1
- default_tenant_id: "default"
- doc_id_namespace: Uuid::NAMESPACE_OID
- strip_control_chars: true
- metadata_policy: default (no required fields, no limits)
- max_payload_bytes: None (unlimited)
- max_normalized_bytes: None (unlimited)
§Example
use ingest::IngestConfig;
let config = IngestConfig::default();
assert_eq!(config.version, 1);
assert_eq!(config.default_tenant_id, "default");
assert!(config.strip_control_chars);