Crate ceres_core

Ceres Core - Domain types, business logic, and services.

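This crate provides the core functionality for Ceres, including portal harvesting, embedding generation, semantic search, dataset export, and a persistent job queue with workers.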

Architecture

Harvesting and embedding are decoupled: HarvestService handles metadata fetching (no embedding provider needed), EmbeddingService handles vector generation (no portal access needed), and HarvestPipeline composes both for the common workflow.
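
The decoupling described above can be illustrated with a minimal, synchronous sketch. Every name in it (MetadataSource, Embedder, Harvester, EmbeddingStep, Pipeline, StaticPortal, LengthEmbedder) is a hypothetical stand-in invented for illustration; none of them are ceres_core types or signatures, and the real services are async. The point is only the shape: each step depends on one capability, and the pipeline composes both.

```rust
// Hypothetical stand-ins for illustration; NOT the ceres_core API.

/// Anything that can list dataset titles from a portal.
trait MetadataSource {
    fn fetch_titles(&self) -> Vec<String>;
}

/// Anything that can turn text into a vector.
trait Embedder {
    fn embed(&self, text: &str) -> Vec<f32>;
}

/// Harvest step: needs only a metadata source, no embedder.
struct Harvester<S: MetadataSource> {
    source: S,
}

impl<S: MetadataSource> Harvester<S> {
    fn harvest(&self) -> Vec<String> {
        self.source.fetch_titles()
    }
}

/// Embedding step: needs only an embedder, no portal access.
struct EmbeddingStep<E: Embedder> {
    embedder: E,
}

impl<E: Embedder> EmbeddingStep<E> {
    fn embed_all(&self, titles: &[String]) -> Vec<Vec<f32>> {
        titles.iter().map(|t| self.embedder.embed(t)).collect()
    }
}

/// Pipeline: composes the two independent steps for the common workflow.
struct Pipeline<S: MetadataSource, E: Embedder> {
    harvester: Harvester<S>,
    embedding: EmbeddingStep<E>,
}

impl<S: MetadataSource, E: Embedder> Pipeline<S, E> {
    fn run(&self) -> Vec<(String, Vec<f32>)> {
        let titles = self.harvester.harvest();
        let vectors = self.embedding.embed_all(&titles);
        titles.into_iter().zip(vectors).collect()
    }
}

/// Toy portal that returns a fixed list of titles.
struct StaticPortal;

impl MetadataSource for StaticPortal {
    fn fetch_titles(&self) -> Vec<String> {
        vec!["air quality".to_string(), "bus routes".to_string()]
    }
}

/// Toy embedder that maps text to a one-element vector holding its byte length.
struct LengthEmbedder;

impl Embedder for LengthEmbedder {
    fn embed(&self, text: &str) -> Vec<f32> {
        vec![text.len() as f32]
    }
}
```

In this sketch, `Harvester { source: StaticPortal }` can be built and used without any embedder, which mirrors the "metadata-only" case, while `Pipeline` is just the two steps run in sequence.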

Example

use ceres_core::{HarvestService, HarvestPipeline, PortalType, SearchService};
use ceres_core::progress::TracingReporter;

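// This snippet assumes an async context whose return type supports `?`, and that
// `store`, `embedding`, and `portal_factory` were constructed elsewhere.

// Metadata-only harvesting (no API key needed)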
let harvest = HarvestService::new(store.clone(), portal_factory.clone());
let stats = harvest.sync_portal("https://data.gov/api/3").await?;

// Combined harvest + embed
let pipeline = HarvestPipeline::new(store.clone(), embedding.clone(), portal_factory);
let (sync_result, embed_stats) = pipeline
    .sync_portal_with_progress("https://data.gov/api/3", None, "en", &TracingReporter, PortalType::Ckan)
    .await?;

// Semantic search
let search = SearchService::new(store, embedding);
let results = search.search("climate data", 10).await?;

Re-exports

pub use circuit_breaker::CircuitBreaker;
pub use circuit_breaker::CircuitBreakerConfig;
pub use circuit_breaker::CircuitBreakerError;
pub use circuit_breaker::CircuitBreakerStats;
pub use circuit_breaker::CircuitState;
pub use config::DbConfig;
pub use config::EmbeddingServiceConfig;
pub use config::HarvestConfig;
pub use config::HttpConfig;
pub use config::PortalEntry;
pub use config::PortalType;
pub use config::PortalsConfig;
pub use config::SyncConfig;
pub use config::default_config_path;
pub use config::load_portals_config;
pub use error::AppError;
pub use i18n::LocalizedField;
pub use models::DatabaseStats;
pub use models::Dataset;
pub use models::NewDataset;
pub use models::SearchResult;
pub use sync::AlwaysReprocessDetector;
pub use sync::AtomicSyncStats;
pub use sync::BatchHarvestSummary;
pub use sync::ContentHashDetector;
pub use sync::DeltaDetector;
pub use sync::PortalHarvestResult;
pub use sync::ReprocessingDecision;
pub use sync::SyncOutcome;
pub use sync::SyncResult;
pub use sync::SyncStats;
pub use sync::SyncStatus;
pub use sync::needs_reprocessing;
pub use progress::HarvestEvent;
pub use progress::ProgressReporter;
pub use progress::SilentReporter;
pub use progress::TracingReporter;
pub use traits::DatasetStore;
pub use traits::EmbeddingProvider;
pub use traits::PortalClient;
pub use traits::PortalClientFactory;
pub use embedding::EmbeddingService;
pub use embedding::EmbeddingStats;
pub use export::ExportFormat;
pub use export::ExportService;
pub use harvest::HarvestService;
pub use parquet_export::ParquetExportConfig;
pub use parquet_export::ParquetExportResult;
pub use parquet_export::ParquetExportService;
pub use pipeline::HarvestPipeline;
pub use search::SearchService;
pub use job::CreateJobRequest;
pub use job::HarvestJob;
pub use job::JobStatus;
pub use job::RetryConfig;
pub use job::WorkerConfig;
pub use job_queue::JobQueue;
pub use worker::SilentWorkerReporter;
pub use worker::TracingWorkerReporter;
pub use worker::WorkerEvent;
pub use worker::WorkerReporter;
pub use worker::WorkerService;

Modules

circuit_breaker
Circuit breaker pattern for API resilience.
config
Configuration types for Ceres components.
embedding
Standalone embedding service for generating dataset embeddings.
error
export
Export service for streaming dataset exports.
harvest
Harvest service for portal synchronization.
i18n
Multilingual field support for open data portals.
job
Job queue types for persistent harvest job management.
job_queue
Job queue trait for abstracting job persistence.
models
parquet_export
Parquet export service for publishing a curated open data index.
pipeline
Combined harvest + embed pipeline.
progress
Progress reporting for harvest operations.
search
Search service for semantic dataset queries.
sync
Sync service layer for portal synchronization logic.
traits
Trait definitions for external dependencies.
worker
Worker service for processing harvest jobs from the queue.
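
The circuit_breaker module is described above only as the "circuit breaker pattern for API resilience". As a generic illustration of that pattern, the sketch below tracks consecutive failures, rejects calls while open, and allows a trial call after a cooldown. The names (Breaker, State, failure_threshold, cooldown) and the state machine details are assumptions made for this example; they are not taken from the crate's CircuitBreaker, CircuitBreakerConfig, or CircuitState types.

```rust
use std::time::{Duration, Instant};

// Generic circuit breaker sketch; hypothetical names, NOT the ceres_core API.

#[derive(Debug, Clone, Copy, PartialEq)]
enum State {
    Closed,   // calls pass through
    Open,     // calls are rejected without running
    HalfOpen, // cooldown elapsed; the next call is a trial
}

struct Breaker {
    failure_threshold: u32,
    cooldown: Duration,
    consecutive_failures: u32,
    opened_at: Option<Instant>,
}

impl Breaker {
    fn new(failure_threshold: u32, cooldown: Duration) -> Self {
        Breaker {
            failure_threshold,
            cooldown,
            consecutive_failures: 0,
            opened_at: None,
        }
    }

    /// Derive the state from when the breaker last opened.
    fn state(&self, now: Instant) -> State {
        match self.opened_at {
            None => State::Closed,
            Some(t) if now.duration_since(t) >= self.cooldown => State::HalfOpen,
            Some(_) => State::Open,
        }
    }

    /// Run `op` unless the breaker is open. Returns `None` when the call
    /// was rejected, otherwise `Some` with the operation's own result.
    fn call<T, E>(
        &mut self,
        now: Instant,
        op: impl FnOnce() -> Result<T, E>,
    ) -> Option<Result<T, E>> {
        if self.state(now) == State::Open {
            return None;
        }
        let result = op();
        match &result {
            Ok(_) => {
                // Success closes the breaker and resets the failure count.
                self.consecutive_failures = 0;
                self.opened_at = None;
            }
            Err(_) => {
                self.consecutive_failures += 1;
                if self.consecutive_failures >= self.failure_threshold {
                    // Open, or re-open after a failed trial call.
                    self.opened_at = Some(now);
                }
            }
        }
        Some(result)
    }
}
```

Passing `now` explicitly, rather than reading the clock inside the breaker, is a design choice made here so the behavior can be exercised deterministically; a production implementation would likely also need interior synchronization for concurrent callers.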