pub struct NewDataset {
pub original_id: String,
pub source_portal: String,
pub url: String,
pub title: String,
pub description: Option<String>,
pub embedding: Option<Vec<f32>>,
pub metadata: Value,
pub content_hash: String,
}
Data Transfer Object for inserting or updating datasets.
This structure is used when creating new datasets or updating existing ones.
Unlike Dataset, it doesn’t include database-generated fields like id or
timestamps. The embedding field stores a vector of floats for semantic search.
§Examples
use ceres_core::NewDataset;
use serde_json::json;
let title = "My Dataset";
let description = Some("Description here".to_string());
let content_hash = NewDataset::compute_content_hash(title, description.as_deref());
let dataset = NewDataset {
original_id: "dataset-123".to_string(),
source_portal: "https://dati.gov.it".to_string(),
url: "https://dati.gov.it/dataset/my-data".to_string(),
title: title.to_string(),
description,
embedding: None,
metadata: json!({"tags": ["open-data", "italy"]}),
content_hash,
};
assert_eq!(dataset.title, "My Dataset");
assert!(dataset.embedding.is_none());
assert_eq!(dataset.content_hash.len(), 64); // SHA-256 = 64 hex chars
§Fields
original_id - Original identifier from the source portal
source_portal - Base URL of the originating CKAN portal
url - Public landing page URL for the dataset
title - Human-readable dataset title
description - Optional detailed description
embedding - Optional vector of floats for semantic search
metadata - Additional metadata as JSON
content_hash - SHA-256 hash of title + description for delta detection
Fields§
original_id: String - Original identifier from the source portal
source_portal: String - Base URL of the originating CKAN portal
url: String - Public landing page URL for the dataset
title: String - Human-readable dataset title
description: Option<String> - Optional detailed description
embedding: Option<Vec<f32>> - Optional embedding vector for semantic search
metadata: Value - Additional metadata as JSON
content_hash: String - SHA-256 hash of title + description for delta detection
Implementations§
impl NewDataset
pub fn compute_content_hash(title: &str, description: Option<&str>) -> String
Computes a SHA-256 hash of the content (title + description) for delta detection.
This hash is used to determine if the dataset content has changed since the last harvest, avoiding unnecessary embedding regeneration.
§Arguments
title - The dataset title
description - Optional dataset description
§Returns
A 64-character lowercase hexadecimal string representing the SHA-256 hash.
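The delta-detection idea described above can be sketched as follows. This is a minimal illustration, not the crate's implementation: `stand_in_hash` and `needs_reembedding` are hypothetical names, and the sketch swaps SHA-256 for the standard library's `DefaultHasher` so that it compiles without external crates. The real method returns a 64-character SHA-256 hex string; the stand-in returns 16 hex characters.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in for `NewDataset::compute_content_hash`.
/// ASSUMPTION: uses std's `DefaultHasher` instead of SHA-256, and the way
/// title and description are combined here is illustrative only.
fn stand_in_hash(title: &str, description: Option<&str>) -> String {
    let mut hasher = DefaultHasher::new();
    title.hash(&mut hasher);
    description.hash(&mut hasher);
    // A u64 formatted as zero-padded lowercase hex is 16 characters long.
    format!("{:016x}", hasher.finish())
}

/// Hypothetical helper: decides whether the embedding must be regenerated
/// by comparing the hash stored at the last harvest with a fresh one.
fn needs_reembedding(
    stored_hash: Option<&str>,
    title: &str,
    description: Option<&str>,
) -> bool {
    match stored_hash {
        // No stored hash: the dataset has not been harvested before.
        None => true,
        // Re-embed only when the content hash differs.
        Some(old) => old != stand_in_hash(title, description).as_str(),
    }
}
```

With a helper of this shape, a harvester would skip the embedding step whenever the stored and freshly computed hashes match.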
pub fn compute_content_hash_with_language(
    title: &str,
    description: Option<&str>,
    language: &str,
) -> String
Computes a content hash that includes the language preference.
The language is included so that changing the preferred language for a portal triggers re-embedding (since the resolved text changes).
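To illustrate why feeding the language into the hash forces re-embedding, here is a small sketch under stated assumptions: `stand_in_hash_with_language` is a hypothetical stand-in built on the standard library's `DefaultHasher` rather than SHA-256, and the order in which the inputs are hashed is chosen for illustration only. The actual input layout used by `compute_content_hash_with_language` is not documented on this page.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in for `NewDataset::compute_content_hash_with_language`.
/// ASSUMPTION: `DefaultHasher` replaces SHA-256, and the input order
/// (language, title, description) is illustrative.
fn stand_in_hash_with_language(
    title: &str,
    description: Option<&str>,
    language: &str,
) -> String {
    let mut hasher = DefaultHasher::new();
    // The language is part of the hashed input, so a change of preferred
    // language yields a different hash even if title and description
    // are unchanged.
    language.hash(&mut hasher);
    title.hash(&mut hasher);
    description.hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}
```

Calling this with identical title and description but `"it"` versus `"en"` produces two different hashes, which a delta check would treat as changed content.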
Trait Implementations§
impl Clone for NewDataset
fn clone(&self) -> NewDataset
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.