
Struct CanonicalIngestRecord 

pub struct CanonicalIngestRecord {
    pub id: String,
    pub tenant_id: String,
    pub doc_id: String,
    pub received_at: DateTime<Utc>,
    pub original_source: Option<String>,
    pub source: IngestSource,
    pub normalized_payload: Option<CanonicalPayload>,
    pub attributes: Option<Value>,
}

Normalized record produced by ingest.

CanonicalIngestRecord is the output of the ingest pipeline. It represents a cleaned, validated, and deterministic version of the input that downstream stages can rely on.

§Guarantees

  • All required fields are present (tenant_id, doc_id, received_at)
  • Metadata is sanitized (control characters stripped)
  • Payload is normalized (text: leading and trailing whitespace trimmed and internal runs collapsed to a single space; binary: preserved unchanged)
  • Document ID is stable (derived deterministically if not provided)
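
The sanitization guarantee can be made concrete with a std-only sketch. It assumes that "control characters" means what Rust's char::is_control reports (Unicode category Cc, which also covers \n, \r and \t). The crate's exact rule and the helper names strip_control_chars and sanitize_optional are hypothetical.

```rust
/// Hypothetical helper: removes every character in Unicode category Cc.
/// Assumption: the real ingest sanitizer may treat `\n` or `\t` differently.
pub fn strip_control_chars(input: &str) -> String {
    input.chars().filter(|c| !c.is_control()).collect()
}

/// Hypothetical helper showing how an optional metadata field such as
/// `original_source` could be sanitized: `None` stays `None`.
pub fn sanitize_optional(value: Option<&str>) -> Option<String> {
    value.map(strip_control_chars)
}
```

Because the filter works on chars rather than bytes, multi-byte UTF-8 text passes through intact.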

§Examples

use ingest::{ingest, IngestConfig, RawIngestRecord, CanonicalPayload};
use ingest::{IngestMetadata, IngestSource, IngestPayload};

let config = IngestConfig::default();
let record = RawIngestRecord {
    id: "test-001".to_string(),
    source: IngestSource::RawText,
    metadata: IngestMetadata {
        tenant_id: Some("tenant".to_string()),
        doc_id: None, // Will be derived
        received_at: None, // Will default to now
        original_source: None,
        attributes: None,
    },
    payload: Some(IngestPayload::Text("  Hello   world  ".to_string())),
};

let canonical = ingest(record, &config).unwrap();

// All fields are guaranteed present
assert!(!canonical.tenant_id.is_empty());
assert!(!canonical.doc_id.is_empty());

// Text is normalized
match &canonical.normalized_payload {
    Some(CanonicalPayload::Text(text)) => {
        assert_eq!(text, "Hello world");
    }
    _ => panic!("Expected text payload"),
}

Fields

§id: String

Unique identifier for this ingest operation (mirrors RawIngestRecord::id).

This is the sanitized version of the original ID (control characters stripped).

§tenant_id: String

Tenant identifier for multi-tenant isolation.

This is the effective tenant ID after applying defaults:

  • If provided and non-empty: the sanitized provided value
  • Otherwise: IngestConfig::default_tenant_id
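
A sketch of the tenant defaulting rule, with IngestConfig::default_tenant_id passed in as a plain &str. The function name effective_tenant_id is hypothetical. Checking emptiness after sanitization is an assumption, not documented behavior: under it, a value made only of control characters falls back to the default.

```rust
/// Hypothetical sketch of the tenant defaulting rule described above.
/// Assumption: emptiness is checked after control characters are stripped.
pub fn effective_tenant_id(provided: Option<&str>, default_tenant_id: &str) -> String {
    // Sanitize first, so a value consisting only of control characters counts as empty.
    let sanitized: Option<String> =
        provided.map(|s| s.chars().filter(|c| !c.is_control()).collect());
    match sanitized {
        Some(s) if !s.is_empty() => s,
        _ => default_tenant_id.to_string(),
    }
}
```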

§doc_id: String

Document identifier.

This is the effective document ID after derivation:

  • If provided and non-empty: the sanitized provided value
  • Otherwise: UUIDv5 derived from tenant + record ID
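
The derivation can be illustrated with a dependency-free UUIDv5. Under RFC 4122, a UUIDv5 is the SHA-1 hash of the namespace bytes followed by the name, truncated to 16 bytes, with the version and variant bits set. Two details here are assumptions, because the crate does not document which values it uses: the namespace (the RFC's OID namespace serves as a placeholder) and the tenant_id:record_id name layout. A production build would normally depend on a UUID library rather than a hand-rolled SHA-1.

```rust
/// RFC 4122 DNS namespace, used only to check the sketch against a known UUIDv5.
pub const NAMESPACE_DNS: [u8; 16] = [
    0x6b, 0xa7, 0xb8, 0x10, 0x9d, 0xad, 0x11, 0xd1, 0x80, 0xb4, 0x00, 0xc0, 0x4f, 0xd4, 0x30, 0xc8,
];

/// Placeholder namespace (RFC 4122 OID namespace); the crate's real namespace is unknown.
pub const DOC_ID_NAMESPACE: [u8; 16] = [
    0x6b, 0xa7, 0xb8, 0x12, 0x9d, 0xad, 0x11, 0xd1, 0x80, 0xb4, 0x00, 0xc0, 0x4f, 0xd4, 0x30, 0xc8,
];

/// Minimal SHA-1 (FIPS 180-4); adequate for UUIDv5, not for security use.
fn sha1(data: &[u8]) -> [u8; 20] {
    let mut h: [u32; 5] = [0x6745_2301, 0xEFCD_AB89, 0x98BA_DCFE, 0x1032_5476, 0xC3D2_E1F0];
    // Pad: 0x80, zeros up to 56 mod 64, then the bit length as a big-endian u64.
    let mut msg = data.to_vec();
    let bit_len = (data.len() as u64).wrapping_mul(8);
    msg.push(0x80);
    while msg.len() % 64 != 56 {
        msg.push(0);
    }
    msg.extend_from_slice(&bit_len.to_be_bytes());

    for block in msg.chunks(64) {
        // Message schedule: 16 big-endian words expanded to 80.
        let mut w = [0u32; 80];
        for i in 0..16 {
            w[i] = u32::from_be_bytes([block[4 * i], block[4 * i + 1], block[4 * i + 2], block[4 * i + 3]]);
        }
        for i in 16..80 {
            w[i] = (w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16]).rotate_left(1);
        }
        let (mut a, mut b, mut c, mut d, mut e) = (h[0], h[1], h[2], h[3], h[4]);
        for (i, &wi) in w.iter().enumerate() {
            let (f, k): (u32, u32) = match i {
                0..=19 => ((b & c) | (!b & d), 0x5A82_7999),
                20..=39 => (b ^ c ^ d, 0x6ED9_EBA1),
                40..=59 => ((b & c) | (b & d) | (c & d), 0x8F1B_BCDC),
                _ => (b ^ c ^ d, 0xCA62_C1D6),
            };
            let temp = a.rotate_left(5).wrapping_add(f).wrapping_add(e).wrapping_add(k).wrapping_add(wi);
            e = d;
            d = c;
            c = b.rotate_left(30);
            b = a;
            a = temp;
        }
        for (hv, v) in h.iter_mut().zip([a, b, c, d, e]) {
            *hv = hv.wrapping_add(v);
        }
    }
    let mut out = [0u8; 20];
    for (i, word) in h.iter().enumerate() {
        out[4 * i..4 * i + 4].copy_from_slice(&word.to_be_bytes());
    }
    out
}

/// UUIDv5 per RFC 4122: SHA-1(namespace || name), first 16 bytes, version and variant bits set.
pub fn uuid_v5(namespace: &[u8; 16], name: &str) -> String {
    let mut input = namespace.to_vec();
    input.extend_from_slice(name.as_bytes());
    let hash = sha1(&input);
    let mut bytes = [0u8; 16];
    bytes.copy_from_slice(&hash[..16]);
    bytes[6] = (bytes[6] & 0x0f) | 0x50; // version 5
    bytes[8] = (bytes[8] & 0x3f) | 0x80; // RFC 4122 variant (10xx)
    let hex: String = bytes.iter().map(|b| format!("{b:02x}")).collect();
    format!("{}-{}-{}-{}-{}", &hex[0..8], &hex[8..12], &hex[12..16], &hex[16..20], &hex[20..32])
}

/// Hypothetical doc-ID derivation: the same tenant and record ID always yield the same UUID.
pub fn derive_doc_id(tenant_id: &str, record_id: &str) -> String {
    // The ':' separator avoids the trivial collision of plain concatenation
    // ("ab" + "c" vs "a" + "bc"), but IDs that contain ':' could still collide.
    uuid_v5(&DOC_ID_NAMESPACE, &format!("{tenant_id}:{record_id}"))
}
```

Determinism is the property that matters here. Re-ingesting the same record under the same tenant reproduces the same doc_id, so downstream stages can deduplicate or upsert on it.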

§received_at: DateTime<Utc>

Timestamp when the record was received.

This is the effective timestamp after applying defaults:

  • If provided: the sanitized provided value
  • Otherwise: current UTC time at ingest
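
A sketch of the timestamp default. It uses std::time::SystemTime as a stand-in for chrono's DateTime<Utc> so it runs without dependencies; with chrono, the same shape would read provided.unwrap_or_else(Utc::now). The function name effective_received_at is hypothetical.

```rust
use std::time::SystemTime;

/// Hypothetical sketch of the timestamp defaulting rule.
/// Assumption: std's SystemTime stands in for chrono's DateTime<Utc>.
pub fn effective_received_at(provided: Option<SystemTime>) -> SystemTime {
    // `unwrap_or_else` reads the clock only when no timestamp was supplied.
    provided.unwrap_or_else(SystemTime::now)
}
```

Because the fallback reads the wall clock, received_at is the one field that is not reproducible across re-ingests when no timestamp is supplied.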

§original_source: Option<String>

Original source information if provided.

Sanitized version of IngestMetadata::original_source with control characters stripped. None if not provided.

§source: IngestSource

Source of the content (mirrors RawIngestRecord::source).

§normalized_payload: Option<CanonicalPayload>

Normalized payload ready for downstream stages.

  • For text: whitespace collapsed, size limits enforced
  • For binary: preserved unchanged, but verified to be non-empty
  • None if no payload was provided
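
The per-variant rules above can be sketched as a single function. Payload, NormalizeError and the max_text_bytes limit are hypothetical stand-ins. In particular, the source does not say whether the real size limit is measured in bytes or characters, or whether it applies before or after whitespace collapsing. The whitespace rule reproduces the type-level example ("  Hello   world  " becomes "Hello world").

```rust
/// Hypothetical stand-in for the crate's payload type.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Payload {
    Text(String),
    Binary(Vec<u8>),
}

/// Hypothetical error type; the real crate's errors may differ.
#[derive(Debug, PartialEq, Eq)]
pub enum NormalizeError {
    TextTooLarge { len: usize, max: usize },
    EmptyBinary,
}

/// Sketch of the per-variant normalization rules.
/// `max_text_bytes` is a hypothetical limit; the real one comes from configuration.
pub fn normalize_payload(
    payload: Option<Payload>,
    max_text_bytes: usize,
) -> Result<Option<Payload>, NormalizeError> {
    match payload {
        // No payload in, no payload out.
        None => Ok(None),
        Some(Payload::Text(raw)) => {
            // Trim both ends and collapse each run of Unicode whitespace to one space.
            let text = raw.split_whitespace().collect::<Vec<_>>().join(" ");
            // Assumption: the limit is checked on the normalized text, in bytes.
            if text.len() > max_text_bytes {
                return Err(NormalizeError::TextTooLarge { len: text.len(), max: max_text_bytes });
            }
            Ok(Some(Payload::Text(text)))
        }
        Some(Payload::Binary(bytes)) => {
            // Binary content passes through untouched but must not be empty.
            if bytes.is_empty() {
                Err(NormalizeError::EmptyBinary)
            } else {
                Ok(Some(Payload::Binary(bytes)))
            }
        }
    }
}
```

In this sketch, returning a Result rather than panicking lets a caller reject one malformed record without aborting a whole batch.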

§attributes: Option<Value>

JSON attributes, preserved for downstream use.

This is the sanitized and size-checked version of IngestMetadata::attributes. None if not provided.

Implementations

impl CanonicalIngestRecord

pub fn has_text_payload(&self) -> bool

Returns true if this record has a text payload.

§Example
use ingest::{CanonicalIngestRecord, CanonicalPayload};

let record = CanonicalIngestRecord {
    id: "test".to_string(),
    tenant_id: "tenant".to_string(),
    doc_id: "doc".to_string(),
    received_at: chrono::Utc::now(),
    original_source: None,
    source: ingest::IngestSource::RawText,
    normalized_payload: Some(CanonicalPayload::Text("hello".to_string())),
    attributes: None,
};

assert!(record.has_text_payload());

pub fn has_binary_payload(&self) -> bool

Returns true if this record has a binary payload.

§Example
use ingest::{CanonicalIngestRecord, CanonicalPayload};

let record = CanonicalIngestRecord {
    id: "test".to_string(),
    tenant_id: "tenant".to_string(),
    doc_id: "doc".to_string(),
    received_at: chrono::Utc::now(),
    original_source: None,
    source: ingest::IngestSource::File {
        filename: "test.bin".to_string(),
        content_type: None,
    },
    normalized_payload: Some(CanonicalPayload::Binary(vec![1, 2, 3])),
    attributes: None,
};

assert!(record.has_binary_payload());

pub fn text_payload(&self) -> Option<&str>

Returns the text payload if present, otherwise None.

§Example
use ingest::{CanonicalIngestRecord, CanonicalPayload};

let record = CanonicalIngestRecord {
    id: "test".to_string(),
    tenant_id: "tenant".to_string(),
    doc_id: "doc".to_string(),
    received_at: chrono::Utc::now(),
    original_source: None,
    source: ingest::IngestSource::RawText,
    normalized_payload: Some(CanonicalPayload::Text("hello world".to_string())),
    attributes: None,
};

assert_eq!(record.text_payload(), Some("hello world"));

pub fn binary_payload(&self) -> Option<&[u8]>

Returns the binary payload if present, otherwise None.
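
binary_payload has no example of its own. The sketch below shows how the four accessors plausibly relate to one another. It is written against hypothetical stand-ins (Record and a two-variant CanonicalPayload) because the real types live in the ingest crate. Against the real type, the record built in the has_binary_payload example would presumably give record.binary_payload() == Some(&[1, 2, 3][..]).

```rust
/// Hypothetical mirror of `CanonicalPayload`, assuming exactly two variants.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum CanonicalPayload {
    Text(String),
    Binary(Vec<u8>),
}

/// Hypothetical trimmed-down record carrying only the field the accessors read.
pub struct Record {
    pub normalized_payload: Option<CanonicalPayload>,
}

impl Record {
    pub fn has_text_payload(&self) -> bool {
        matches!(self.normalized_payload, Some(CanonicalPayload::Text(_)))
    }

    pub fn has_binary_payload(&self) -> bool {
        matches!(self.normalized_payload, Some(CanonicalPayload::Binary(_)))
    }

    /// Borrows the text without cloning; `None` for binary or missing payloads.
    pub fn text_payload(&self) -> Option<&str> {
        match &self.normalized_payload {
            Some(CanonicalPayload::Text(text)) => Some(text.as_str()),
            _ => None,
        }
    }

    /// Borrows the bytes as a slice; `None` for text or missing payloads.
    pub fn binary_payload(&self) -> Option<&[u8]> {
        match &self.normalized_payload {
            Some(CanonicalPayload::Binary(bytes)) => Some(bytes.as_slice()),
            _ => None,
        }
    }
}
```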

Trait Implementations

impl Clone for CanonicalIngestRecord

fn clone(&self) -> CanonicalIngestRecord

Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for CanonicalIngestRecord

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl<'de> Deserialize<'de> for CanonicalIngestRecord

fn deserialize<__D>(__deserializer: __D) -> Result<Self, __D::Error>
where __D: Deserializer<'de>,

Deserialize this value from the given Serde deserializer.

impl PartialEq for CanonicalIngestRecord

fn eq(&self, other: &CanonicalIngestRecord) -> bool

Tests for self and other values to be equal, and is used by ==.

fn ne(&self, other: &Rhs) -> bool

Tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason.

impl Serialize for CanonicalIngestRecord

fn serialize<__S>(&self, __serializer: __S) -> Result<__S::Ok, __S::Error>
where __S: Serializer,

Serialize this value into the given Serde serializer.

impl Eq for CanonicalIngestRecord

impl StructuralPartialEq for CanonicalIngestRecord
