Ceres Core - Domain types, business logic, and services.
This crate provides the core functionality for Ceres, including:
- Domain models: Dataset, SearchResult, etc.
- Business logic: Delta detection, statistics tracking
- Services: HarvestService for metadata harvesting, EmbeddingService for standalone embedding, HarvestPipeline for combined harvest+embed, SearchService for semantic search, ExportService for streaming exports
- Traits: EmbeddingProvider, DatasetStore, PortalClient for dependency injection
- Progress reporting: ProgressReporter trait for decoupled logging/UI
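The "delta detection" item above can be pictured as a content-hash comparison: a dataset is reprocessed only when the fields that matter have changed since the last sync. The following is a minimal sketch of that general idea, not the crate's actual implementation. `Decision`, `content_hash`, and `decide` are hypothetical names, and the choice of title plus description as the hashed fields is an assumption.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical outcome of comparing a fetched record with the stored one.
#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    /// Content is unchanged; the existing embedding can be kept.
    Skip,
    /// Content is new or changed; the record should be processed again.
    Reprocess,
}

/// Hash the fields assumed to feed the embedding (title and description).
/// `DefaultHasher` keeps the sketch std-only; its output is not guaranteed
/// to be stable across Rust releases, so a persistent store would likely
/// want a stable digest such as SHA-256 instead.
pub fn content_hash(title: &str, description: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    title.hash(&mut hasher);
    description.hash(&mut hasher);
    hasher.finish()
}

/// Compare the stored hash (if any) with the hash of the fetched content.
pub fn decide(stored_hash: Option<u64>, title: &str, description: &str) -> Decision {
    match stored_hash {
        Some(old) if old == content_hash(title, description) => Decision::Skip,
        _ => Decision::Reprocess,
    }
}
```

A record with no stored hash is always reprocessed, which covers first-time harvests without a separate code path.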
§Architecture
Harvesting and embedding are decoupled: HarvestService handles metadata fetching
(no embedding provider needed), EmbeddingService handles vector generation
(no portal access needed), and HarvestPipeline composes both for the common workflow.
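To illustrate the decoupling described above, here is a simplified, synchronous sketch of two independent stages and a pipeline that composes them. Every name in it (`MetadataSource`, `Embedder`, `Harvester`, `EmbedStage`, `Pipeline`, `FixedSource`, `LengthEmbedder`) is hypothetical and chosen for illustration. The real services are async and their signatures are not reproduced here.

```rust
/// Hypothetical stand-in for a portal client: yields (id, title) pairs.
pub trait MetadataSource {
    fn fetch(&self) -> Vec<(String, String)>;
}

/// Hypothetical stand-in for an embedding provider.
pub trait Embedder {
    fn embed(&self, text: &str) -> Vec<f32>;
}

/// Metadata-only stage: depends on a source, never on an embedder.
pub struct Harvester<S: MetadataSource> {
    pub source: S,
}

impl<S: MetadataSource> Harvester<S> {
    pub fn harvest(&self) -> Vec<(String, String)> {
        self.source.fetch()
    }
}

/// Embedding-only stage: depends on an embedder, never on a source.
pub struct EmbedStage<E: Embedder> {
    pub embedder: E,
}

impl<E: Embedder> EmbedStage<E> {
    pub fn embed_all(&self, records: &[(String, String)]) -> Vec<(String, Vec<f32>)> {
        records
            .iter()
            .map(|(id, title)| (id.clone(), self.embedder.embed(title)))
            .collect()
    }
}

/// Composition of the two stages for the common harvest-then-embed workflow.
pub struct Pipeline<S: MetadataSource, E: Embedder> {
    pub harvester: Harvester<S>,
    pub embed_stage: EmbedStage<E>,
}

impl<S: MetadataSource, E: Embedder> Pipeline<S, E> {
    pub fn run(&self) -> Vec<(String, Vec<f32>)> {
        let records = self.harvester.harvest();
        self.embed_stage.embed_all(&records)
    }
}

/// Toy source returning one fixed record.
pub struct FixedSource;

impl MetadataSource for FixedSource {
    fn fetch(&self) -> Vec<(String, String)> {
        vec![("ds-1".to_string(), "Climate data".to_string())]
    }
}

/// Toy embedder: a one-dimensional "vector" holding the text length in bytes.
pub struct LengthEmbedder;

impl Embedder for LengthEmbedder {
    fn embed(&self, text: &str) -> Vec<f32> {
        vec![text.len() as f32]
    }
}
```

Because `Harvester` carries no `Embedder` bound, it can be built and run without any embedding credentials, which mirrors the "no embedding provider needed" property stated above.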
§Example
use ceres_core::{HarvestService, HarvestPipeline, PortalType, SearchService};
use ceres_core::progress::TracingReporter;
// `store`, `embedding`, and `portal_factory` are assumed to be constructed elsewhere.
// Metadata-only harvesting (no API key needed)
let harvest = HarvestService::new(store.clone(), portal_factory.clone());
let stats = harvest.sync_portal("https://data.gov/api/3").await?;
// Combined harvest + embed
let pipeline = HarvestPipeline::new(store.clone(), embedding.clone(), portal_factory);
let (sync_result, embed_stats) = pipeline
    .sync_portal_with_progress("https://data.gov/api/3", None, "en", &TracingReporter, PortalType::Ckan)
    .await?;
// Semantic search
let search = SearchService::new(store, embedding);
let results = search.search("climate data", 10).await?;
Re-exports§
pub use circuit_breaker::CircuitBreaker;
pub use circuit_breaker::CircuitBreakerConfig;
pub use circuit_breaker::CircuitBreakerError;
pub use circuit_breaker::CircuitBreakerStats;
pub use circuit_breaker::CircuitState;
pub use config::DbConfig;
pub use config::EmbeddingServiceConfig;
pub use config::HarvestConfig;
pub use config::HttpConfig;
pub use config::PortalEntry;
pub use config::PortalType;
pub use config::PortalsConfig;
pub use config::SyncConfig;
pub use config::default_config_path;
pub use config::load_portals_config;
pub use error::AppError;
pub use i18n::LocalizedField;
pub use models::DatabaseStats;
pub use models::Dataset;
pub use models::NewDataset;
pub use models::SearchResult;
pub use sync::AlwaysReprocessDetector;
pub use sync::AtomicSyncStats;
pub use sync::BatchHarvestSummary;
pub use sync::ContentHashDetector;
pub use sync::DeltaDetector;
pub use sync::PortalHarvestResult;
pub use sync::ReprocessingDecision;
pub use sync::SyncOutcome;
pub use sync::SyncResult;
pub use sync::SyncStats;
pub use sync::SyncStatus;
pub use sync::needs_reprocessing;
pub use progress::HarvestEvent;
pub use progress::ProgressReporter;
pub use progress::SilentReporter;
pub use progress::TracingReporter;
pub use traits::DatasetStore;
pub use traits::EmbeddingProvider;
pub use traits::PortalClient;
pub use traits::PortalClientFactory;
pub use embedding::EmbeddingService;
pub use embedding::EmbeddingStats;
pub use export::ExportFormat;
pub use export::ExportService;
pub use harvest::HarvestService;
pub use parquet_export::ParquetExportConfig;
pub use parquet_export::ParquetExportResult;
pub use parquet_export::ParquetExportService;
pub use pipeline::HarvestPipeline;
pub use search::SearchService;
pub use job::CreateJobRequest;
pub use job::HarvestJob;
pub use job::JobStatus;
pub use job::RetryConfig;
pub use job::WorkerConfig;
pub use job_queue::JobQueue;
pub use worker::SilentWorkerReporter;
pub use worker::TracingWorkerReporter;
pub use worker::WorkerEvent;
pub use worker::WorkerReporter;
pub use worker::WorkerService;
Modules§
- circuit_breaker: Circuit breaker pattern for API resilience.
- config: Configuration types for Ceres components.
- embedding: Standalone embedding service for generating dataset embeddings.
- error
- export: Export service for streaming dataset exports.
- harvest: Harvest service for portal synchronization.
- i18n: Multilingual field support for open data portals.
- job: Job queue types for persistent harvest job management.
- job_queue: Job queue trait for abstracting job persistence.
- models
- parquet_export: Parquet export service for publishing a curated open data index.
- pipeline: Combined harvest + embed pipeline.
- progress: Progress reporting for harvest operations.
- search: Search service for semantic dataset queries.
- sync: Sync service layer for portal synchronization logic.
- traits: Trait definitions for external dependencies.
- worker: Worker service for processing harvest jobs from the queue.
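The circuit breaker module listed above is described only as a pattern for API resilience. For readers unfamiliar with it, the following is a minimal sketch of the general pattern: closed, then open after repeated failures, then half-open after a cooldown. It does not reflect the crate's `CircuitBreaker` API. `Breaker`, `State`, and the parameter names are hypothetical, and the current time is passed in explicitly so the behavior is easy to test.

```rust
use std::time::{Duration, Instant};

/// Hypothetical breaker states.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    /// Requests flow normally.
    Closed,
    /// Requests are rejected until the cooldown elapses.
    Open,
    /// Cooldown elapsed; trial requests are allowed through.
    HalfOpen,
}

pub struct Breaker {
    failure_threshold: u32,
    cooldown: Duration,
    failures: u32,
    opened_at: Option<Instant>,
}

impl Breaker {
    pub fn new(failure_threshold: u32, cooldown: Duration) -> Self {
        Breaker {
            failure_threshold,
            cooldown,
            failures: 0,
            opened_at: None,
        }
    }

    /// Derive the state from the time the breaker last opened.
    pub fn state(&self, now: Instant) -> State {
        match self.opened_at {
            None => State::Closed,
            Some(t) if now.duration_since(t) >= self.cooldown => State::HalfOpen,
            Some(_) => State::Open,
        }
    }

    /// Whether a request may be attempted at `now`.
    pub fn allow(&self, now: Instant) -> bool {
        self.state(now) != State::Open
    }

    /// A success resets the failure count and closes the breaker.
    pub fn record_success(&mut self) {
        self.failures = 0;
        self.opened_at = None;
    }

    /// A failure at or past the threshold opens (or re-opens) the breaker.
    pub fn record_failure(&mut self, now: Instant) {
        self.failures += 1;
        if self.failures >= self.failure_threshold {
            self.opened_at = Some(now);
        }
    }
}
```

A failed trial request in the half-open state re-opens the breaker with a fresh timestamp, so the cooldown restarts rather than letting traffic resume immediately.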