pub struct Data {
pub id: Uuid,
pub name: String,
pub raw_data_location: String,
pub original_data_location: String,
pub extension: String,
pub mime_type: String,
pub content_hash: String,
pub owner_id: Uuid,
pub created_at: DateTime<Utc>,
pub updated_at: Option<DateTime<Utc>>,
pub label: Option<String>,
pub original_extension: Option<String>,
pub original_mime_type: Option<String>,
pub loader_engine: Option<String>,
pub raw_content_hash: Option<String>,
pub tenant_id: Option<Uuid>,
pub external_metadata: Option<String>,
pub node_set: Option<String>,
pub pipeline_status: Option<String>,
pub token_count: i64,
pub data_size: i64,
pub last_accessed: Option<DateTime<Utc>>,
pub importance_weight: Option<f64>,
}
Represents a piece of data in the system, such as a file or inline text.
Fields match the Python cognee data table schema for cross-SDK compatibility.
Fields
id: Uuid
    Unique identifier for this data record (UUID v5, deterministic from content hash)
name: String
    Display name derived from the source (filename, URL, or text_<md5>.txt for inline text)
raw_data_location: String
    file:// URI pointing to the stored raw content in the file storage backend
original_data_location: String
    Original source location before any processing (file path, URL, or same as raw_data_location for inline text)
extension: String
    File extension of the stored content (e.g. “txt”, “pdf”, “html”)
mime_type: String
    MIME type of the stored content (e.g. “text/plain”, “application/pdf”)
content_hash: String
    MD5 hex digest of the raw content bytes (content-only, no owner mixing)
owner_id: Uuid
    ID of the user or agent that owns this data record
created_at: DateTime<Utc>
    Timestamp when this record was first created
updated_at: Option<DateTime<Utc>>
    Timestamp of the last update to this record, if any
label: Option<String>
    Human-readable label for the data item (from DataItem wrapper or user-provided)
original_extension: Option<String>
    Original file extension before any conversion
original_mime_type: Option<String>
    Original MIME type before any conversion
loader_engine: Option<String>
    Python loader engine name (e.g. “text_loader”, “pypdf_loader”)
raw_content_hash: Option<String>
    MD5 hash of the extracted-text file stored by the loader at ADD time (Python parity, ingest_data.py:195). Equals content_hash only when the extracted text is byte-identical to the raw input (plain text); for inputs the loader transforms (PDF, CSV, HTML, image, audio) the two hashes differ.
tenant_id: Option<Uuid>
    Tenant/organisation ID for multi-tenant isolation
external_metadata: Option<String>
    Arbitrary JSON metadata blob
node_set: Option<String>
    JSON list of node IDs associated with this data item
pipeline_status: Option<String>
    Pipeline processing status
token_count: i64
    Token count of the data (-1 = not yet computed)
data_size: i64
    Size of the data in bytes (-1 = not yet computed)
last_accessed: Option<DateTime<Utc>>
    Last access timestamp
importance_weight: Option<f64>
    Importance weight for ranking (0.0 to 1.0). Influences relevance scoring.
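The conventions documented for token_count, data_size, importance_weight, and raw_content_hash can be wrapped in small helpers. The following is a minimal sketch, not part of the documented API: known_count, is_valid_weight, and loader_transformed are hypothetical names, and they take plain values instead of a Data so the snippet compiles without the crate.

```rust
/// Hypothetical helper: map the "-1 = not yet computed" sentinel used by
/// `token_count` and `data_size` to an `Option`.
/// Assumption: any negative value is treated as "unknown"; the field docs
/// only define -1.
fn known_count(raw: i64) -> Option<u64> {
    if raw >= 0 {
        Some(raw as u64)
    } else {
        None
    }
}

/// Hypothetical helper: check that a weight lies in the documented
/// 0.0 to 1.0 range. NaN is rejected because it is outside every range.
fn is_valid_weight(weight: f64) -> bool {
    (0.0..=1.0).contains(&weight)
}

/// Hypothetical helper: report whether the loader changed the bytes.
/// Returns `None` when no `raw_content_hash` was recorded, `Some(false)`
/// when both hashes match (plain text), and `Some(true)` when they differ
/// (for example PDF or HTML input).
fn loader_transformed(content_hash: &str, raw_content_hash: Option<&str>) -> Option<bool> {
    raw_content_hash.map(|raw| raw != content_hash)
}
```

With a real record, a caller would pass something like data.data_size, or data.raw_content_hash.as_deref() together with &data.content_hash. Returning an Option keeps the "not yet computed" case explicit, so -1 never ends up in arithmetic.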
Implementations
impl Data
pub fn builder(
id: Uuid,
name: impl Into<String>,
raw_data_location: impl Into<String>,
original_data_location: impl Into<String>,
extension: impl Into<String>,
mime_type: impl Into<String>,
content_hash: impl Into<String>,
owner_id: Uuid,
) -> DataBuilder
Start building a new Data record with the required fields.
All optional fields default to None; data_size defaults to -1.
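The builder contract described here (required fields passed up front, optional fields starting as None, data_size starting at -1) can be illustrated with a reduced stand-in. This is a sketch under stated assumptions: Record and RecordBuilder are hypothetical types that mirror only a few fields, u128 stands in for Uuid, and the setter and build method names are illustrative, since this section does not list the methods of the real DataBuilder.

```rust
/// Hypothetical stand-in for `Data`, reduced to a handful of fields.
#[derive(Debug, PartialEq)]
struct Record {
    id: u128, // stand-in for `Uuid`
    name: String,
    content_hash: String,
    label: Option<String>,
    data_size: i64,
}

/// Hypothetical stand-in for `DataBuilder`.
struct RecordBuilder {
    record: Record,
}

impl Record {
    /// Required fields are taken up front; everything else gets a default.
    fn builder(
        id: u128,
        name: impl Into<String>,
        content_hash: impl Into<String>,
    ) -> RecordBuilder {
        RecordBuilder {
            record: Record {
                id,
                name: name.into(),
                content_hash: content_hash.into(),
                label: None,   // optional fields default to None
                data_size: -1, // -1 = not yet computed
            },
        }
    }
}

impl RecordBuilder {
    /// Illustrative setter name; consumes and returns the builder for chaining.
    fn label(mut self, label: impl Into<String>) -> Self {
        self.record.label = Some(label.into());
        self
    }

    /// Illustrative setter name for the size in bytes.
    fn data_size(mut self, bytes: i64) -> Self {
        self.record.data_size = bytes;
        self
    }

    /// Illustrative finisher: hand back the assembled record.
    fn build(self) -> Record {
        self.record
    }
}
```

Taking impl Into<String> for the string parameters, as the real signature does, lets callers pass either a &str literal or an owned String without an explicit conversion.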