Struct DataPoint

pub struct DataPoint {
    pub id: Uuid,
    pub created_at: i64,
    pub updated_at: i64,
    pub ontology_valid: bool,
    pub version: i32,
    pub topological_rank: Option<i32>,
    pub metadata: HashMap<String, Value>,
    pub data_type: String,
    pub belongs_to_set: Option<Vec<Value>>,
    pub source_pipeline: Option<String>,
    pub source_task: Option<String>,
    pub source_node_set: Option<String>,
    pub source_user: Option<String>,
    pub source_content_hash: Option<String>,
    pub feedback_weight: f64,
}

Base model for all storage-layer entities.

Provides:

Unique identifier (UUID)
Timestamps (created_at, updated_at) as milliseconds since epoch
Ontology validation flag
Version tracking (integer)
Topological rank for graph traversal
Flexible metadata storage
Type discriminator
Dataset membership
Pipeline provenance fields
Feedback weight

Fields§

§id: Uuid

Unique identifier

§created_at: i64

Creation timestamp (milliseconds since epoch, matching Python)

§updated_at: i64

Last update timestamp (milliseconds since epoch, matching Python)
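
As a minimal sketch of the timestamp convention described above, the helper below produces an `i64` count of milliseconds since the Unix epoch using only the standard library. The name `now_millis` is hypothetical and is not part of this crate; how `DataPoint` actually obtains its timestamps is not shown in this page.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical helper (not part of this crate): current wall-clock time
/// as milliseconds since the Unix epoch, in the same `i64` representation
/// that `created_at` and `updated_at` use.
fn now_millis() -> i64 {
    let elapsed = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock is set before the Unix epoch");
    // `as_millis` returns u128; current epoch values fit comfortably in i64.
    elapsed.as_millis() as i64
}
```

A value produced this way can be divided by 1000 to recover whole seconds when interoperating with second-resolution APIs.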

§ontology_valid: bool

Whether this entity has been validated against an ontology

§version: i32

Version number (default 1, matching Python)

§topological_rank: Option<i32>

Topological rank for graph traversal optimization

§metadata: HashMap<String, Value>

Flexible metadata storage (e.g., index_fields, custom attributes)

§data_type: String

Type discriminator (e.g., “Entity”, “EntityType”, “EdgeType”)

§belongs_to_set: Option<Vec<Value>>

Dataset(s) this data point belongs to (a list of JSON values, matching Python)

§source_pipeline: Option<String>

Pipeline that created this data point

§source_task: Option<String>

Task that created this data point

§source_node_set: Option<String>

Source node set for this data point

§source_user: Option<String>

User that triggered creation

§source_content_hash: Option<String>

Content hash of the raw Data artefact that produced this DataPoint. Propagates from upstream Data.content_hash through every task in the cognify pipeline, enabling content-addressed lineage queries.
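
One way a content-addressed lineage query could look is sketched below: data points are grouped by their `source_content_hash`, so everything derived from the same raw artefact ends up in one bucket. This is an illustration under stated assumptions, not the crate's query API. The function name `group_by_content_hash` is hypothetical, and `u32` ids stand in for `Uuid` to keep the example std-only.

```rust
use std::collections::HashMap;

/// Hypothetical helper (not part of this crate): group data point ids by
/// their optional content hash. Points without a hash are skipped.
/// Each tuple is (id, source_content_hash); `u32` stands in for `Uuid`.
fn group_by_content_hash<'a>(
    points: &'a [(u32, Option<String>)],
) -> HashMap<&'a str, Vec<u32>> {
    let mut groups: HashMap<&str, Vec<u32>> = HashMap::new();
    for (id, hash) in points {
        if let Some(h) = hash {
            // All ids sharing a hash trace back to the same raw artefact.
            groups.entry(h.as_str()).or_default().push(*id);
        }
    }
    groups
}
```

Looking up a hash in the returned map then yields every data point id that carries it, in input order.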

§feedback_weight: f64

Feedback weight (default 0.5, matching Python)

Implementations§

impl DataPoint

pub fn new(data_type: impl Into<String>, dataset_id: Option<Uuid>) -> Self

Create a new DataPoint with default values.

§Arguments

data_type - Type discriminator (e.g., “Entity”, “EntityType”)
dataset_id - Optional dataset UUID
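
The documented defaults (version 1, feedback weight 0.5) can be illustrated with a simplified stand-in type. The sketch below is not the crate's `DataPoint`: `DataPointSketch` and `new_at` are hypothetical, `String` replaces `Uuid` and `Value` so the example needs only the standard library, and the timestamp is passed in explicitly. Initialising `updated_at` to the creation time, starting with empty metadata, and wrapping `dataset_id` into `belongs_to_set` are all assumptions, since this page does not state them.

```rust
use std::collections::HashMap;

/// Simplified, hypothetical stand-in for `DataPoint` (not the crate's type).
#[derive(Debug, Clone)]
struct DataPointSketch {
    created_at: i64,
    updated_at: i64,
    version: i32,
    metadata: HashMap<String, String>,
    data_type: String,
    belongs_to_set: Option<Vec<String>>,
    feedback_weight: f64,
}

impl DataPointSketch {
    /// Build a value with the defaults documented for `DataPoint`.
    /// `now_ms` is milliseconds since the Unix epoch.
    fn new_at(data_type: impl Into<String>, dataset_id: Option<String>, now_ms: i64) -> Self {
        Self {
            created_at: now_ms,
            updated_at: now_ms, // assumption: equals creation time at first
            version: 1,         // documented default
            metadata: HashMap::new(), // assumption: starts empty
            data_type: data_type.into(),
            // Assumption: the dataset id becomes the single set member.
            belongs_to_set: dataset_id.map(|id| vec![id]),
            feedback_weight: 0.5, // documented default
        }
    }
}
```

Accepting `impl Into<String>` lets callers pass either a string literal or an owned `String` for `data_type`, which matches the signature shown for `new`.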

pub fn with_metadata(data_type: impl Into<String>, dataset_id: Option<Uuid>, metadata: HashMap<String, Value>) -> Self

Create a DataPoint with specific metadata.

pub fn get_embeddable_data(&self) -> String

Get embeddable data as a JSON string for vector indexing.

Returns a JSON representation of this DataPoint.

pub fn to_json(&self) -> Value

Convert to a JSON value.

pub fn vector_metadata(&self) -> HashMap<String, Value>

Canonical vector-store payload keys for this DataPoint.

Mirrors Python’s DataPoint.model_dump() payload shape: every pydantic-equivalent field flows into the metadata map. Keys with None values are omitted (consistent with the skip_serializing_if = "Option::is_none" annotations on the struct).

Used by the cognify and memify pipelines when constructing VectorPoint payloads to keep the Rust shape byte-comparable to Python’s for the cross-SDK parity tests. Note: the data_type field carries #[serde(rename = "type")], so the resulting map uses the JSON key "type" (matching Python).
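
The two payload rules described above (omit keys whose value is `None`, and emit `data_type` under the key "type") can be sketched with a reduced stand-in. This is an illustration only: `ProvenanceSketch` and `vector_metadata_sketch` are hypothetical names, only a few of the fields are modelled, and `String` values replace `Value` so the example stays std-only. The real method may differ in detail.

```rust
use std::collections::HashMap;

/// Reduced, hypothetical stand-in carrying a few `DataPoint` fields.
struct ProvenanceSketch {
    data_type: String,
    source_pipeline: Option<String>,
    source_task: Option<String>,
    source_user: Option<String>,
}

/// Build a payload map: `data_type` is stored under "type",
/// and optional fields are inserted only when they are `Some`.
fn vector_metadata_sketch(dp: &ProvenanceSketch) -> HashMap<String, String> {
    let mut payload = HashMap::new();
    payload.insert("type".to_string(), dp.data_type.clone());

    let optional = [
        ("source_pipeline", &dp.source_pipeline),
        ("source_task", &dp.source_task),
        ("source_user", &dp.source_user),
    ];
    for (key, value) in optional {
        if let Some(v) = value {
            payload.insert(key.to_string(), v.clone());
        }
    }
    payload
}
```

With only `source_pipeline` set, the resulting map holds exactly two keys, "type" and "source_pipeline"; neither "data_type" nor the unset provenance fields appear.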
