pub struct DcatClient { /* private fields */ }
HTTP client for harvesting DCAT-AP portals using the udata REST catalog endpoint.
Implementations§
impl DcatClient
pub fn new(base_url_str: &str, language: &str) -> Result<Self, AppError>
Creates a new DCAT client for the specified portal.
§Arguments
base_url_str - The base URL of the portal (e.g., https://data.public.lu)
language - Preferred language for resolving multilingual fields (e.g., "fr", "en")
§Errors
Returns AppError::InvalidPortalUrl if the URL is malformed.
Returns AppError::ClientError if the HTTP client cannot be built.
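The URL check behind AppError::InvalidPortalUrl can be illustrated with a minimal, std-only sketch. Everything here is an assumption made for illustration: the stand-in AppError enum (including its String payload), the validate_base_url helper, and the rule that only http(s) URLs with a non-empty host are accepted. The real constructor may rely on a dedicated URL parser and apply different rules.

```rust
/// Stand-in for the crate's error type; only the variant discussed above is
/// modeled, and its String payload is an assumption.
#[derive(Debug, PartialEq)]
enum AppError {
    InvalidPortalUrl(String),
}

/// Hypothetical pre-check: accept only http(s) URLs with a non-empty host and
/// return the URL without trailing slashes.
fn validate_base_url(base_url_str: &str) -> Result<String, AppError> {
    let trimmed = base_url_str.trim();
    let rest = trimmed
        .strip_prefix("https://")
        .or_else(|| trimmed.strip_prefix("http://"))
        .ok_or_else(|| AppError::InvalidPortalUrl(trimmed.to_string()))?;

    // The host is everything up to the first '/' after the scheme.
    let host = rest.split('/').next().unwrap_or("");
    if host.is_empty() {
        return Err(AppError::InvalidPortalUrl(trimmed.to_string()));
    }

    // Dropping trailing slashes keeps later path joins predictable.
    Ok(trimmed.trim_end_matches('/').to_string())
}
```

Under these assumptions, validate_base_url("https://data.public.lu/") yields the URL without the trailing slash, while an input such as "data.public.lu" (no scheme) is rejected.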
pub fn portal_type(&self) -> &'static str
Returns the portal type identifier.
pub async fn list_dataset_ids(&self) -> Result<Vec<String>, AppError>
Returns all dataset identifiers by fetching every dataset and extracting its ID.
This is inefficient because udata has no lightweight ID-list endpoint, but it is
acceptable since the harvest pipeline typically uses search_all_datasets.
pub async fn get_dataset(&self, _id: &str) -> Result<DcatDataset, AppError>
Not implemented: the udata single-dataset endpoint returns plain JSON,
not JSON-LD, requiring a separate parser. Use search_all_datasets instead.
pub async fn search_modified_since(&self, since: DateTime<Utc>) -> Result<Vec<DcatDataset>, AppError>
Searches for datasets modified since the given timestamp.
Uses the modified_since query parameter supported by udata portals.
The response is still paginated and follows the same hydra:next pattern.
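As a rough illustration of how the modified_since parameter might be attached to the first catalog request, consider the sketch below. The CATALOG_PATH value and the catalog_url helper are hypothetical; the endpoint path the client actually uses is not shown in these docs. The timestamp is assumed to be an already URL-safe string such as an ISO 8601 value ending in Z.

```rust
/// Placeholder catalog path (hypothetical; not taken from these docs).
const CATALOG_PATH: &str = "/catalog.jsonld";

/// Builds the URL of the first catalog page, optionally filtered by
/// `modified_since`. The timestamp is appended verbatim, so it is assumed to
/// contain no characters that need percent-encoding.
fn catalog_url(base_url: &str, modified_since: Option<&str>) -> String {
    let mut url = format!("{}{}", base_url.trim_end_matches('/'), CATALOG_PATH);
    if let Some(ts) = modified_since {
        url.push_str("?modified_since=");
        url.push_str(ts);
    }
    url
}
```

Later pages would not be built this way: they would be taken from the hydra:next link of each response, as described above.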
pub async fn search_all_datasets(&self) -> Result<Vec<DcatDataset>, AppError>
Fetches all datasets from the portal using paginated catalog requests.
pub fn into_new_dataset(data: DcatDataset, portal_url: &str, _url_template: Option<&str>, language: &str) -> NewDataset
Converts a DcatDataset into the normalized NewDataset model.
The _url_template parameter is ignored for DCAT portals because the JSON-LD
@id field already provides the canonical landing page URL.
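The sketch below shows one way such a conversion could resolve a multilingual title and reuse the @id value as the landing page URL. All names here (resolve_localized, DatasetSketch, convert) are hypothetical stand-ins rather than the crate's types, and the fallback rule (preferred language first, otherwise the first available value) is an assumption; the real conversion may fall back differently.

```rust
/// Picks the text tagged with `language`; if none matches, falls back to the
/// first available value (assumed fallback rule).
fn resolve_localized<'a>(values: &'a [(&'a str, &'a str)], language: &str) -> Option<&'a str> {
    values
        .iter()
        .find(|(lang, _)| *lang == language)
        .or_else(|| values.first())
        .map(|(_, text)| *text)
}

/// Hypothetical, much reduced stand-in for the normalized dataset model.
#[derive(Debug, PartialEq)]
struct DatasetSketch {
    url: String,
    title: String,
}

/// Uses the JSON-LD `@id` directly as the landing page URL, which is why no
/// URL template is needed in this sketch.
fn convert(id: &str, titles: &[(&str, &str)], language: &str) -> DatasetSketch {
    DatasetSketch {
        url: id.to_string(),
        title: resolve_localized(titles, language).unwrap_or("").to_string(),
    }
}
```

With titles available in "fr" and "en", requesting "en" returns the English title, while requesting a language that is absent returns the first listed value.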
pub fn paginate_catalog_stream(&self, modified_since: Option<String>) -> BoxStream<'_, Result<Vec<DcatDataset>, AppError>>
Streams catalog datasets page-by-page instead of accumulating into a single Vec.
Each yielded item contains the datasets extracted from one catalog page (~100 datasets). Memory is bounded to a single page at a time.
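The page-at-a-time behavior can be sketched with a synchronous Iterator standing in for the async stream. This is a simplified analogue, not the client's implementation: pages live in an in-memory map keyed by URL, datasets are plain strings, errors are strings, and the Pages, PageIter, and paginate names are hypothetical. The next field plays the role of the hydra:next link.

```rust
use std::collections::HashMap;

/// In-memory stand-in for the portal: URL -> (datasets on that page, next link).
type Pages = HashMap<String, (Vec<String>, Option<String>)>;

/// Yields one page of datasets per call and keeps only the next URL as state.
struct PageIter<'a> {
    pages: &'a Pages,
    next_url: Option<String>,
}

impl<'a> Iterator for PageIter<'a> {
    type Item = Result<Vec<String>, String>;

    fn next(&mut self) -> Option<Self::Item> {
        // `take` clears the cursor, so iteration ends after an error or
        // once a page has no next link.
        let url = self.next_url.take()?;
        match self.pages.get(&url) {
            Some((datasets, next)) => {
                self.next_url = next.clone();
                Some(Ok(datasets.clone()))
            }
            None => Some(Err(format!("missing page: {url}"))),
        }
    }
}

/// Starts iterating at `first` and follows next links until none remain.
fn paginate<'a>(pages: &'a Pages, first: &str) -> PageIter<'a> {
    PageIter { pages, next_url: Some(first.to_string()) }
}
```

A consumer can process and drop each page before requesting the next one, which is what keeps memory bounded; collecting every page into one Vec would give up that property.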
Trait Implementations§
impl Clone for DcatClient
fn clone(&self) -> DcatClient
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.