Crate papers_datalab

Async Rust client for the DataLab Marker REST API.

DataLab Marker converts PDFs and other documents to Markdown, HTML, JSON, or structured chunks using a cloud-based ML pipeline. Conversion is asynchronous: submit a job with DatalabClient::submit_marker, then poll for the result with DatalabClient::get_marker_result. Alternatively, call the convenience method DatalabClient::convert_document, which handles the polling automatically.
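The submit-then-poll flow described above can be illustrated without the crate itself. The following is a minimal sketch that uses only the standard library and a synchronous loop. `JobStatus` and `poll_until_done` are hypothetical names invented for this illustration; they are not part of papers_datalab, and the crate's real MarkerStatus and polling logic may look different.

```rust
use std::thread;
use std::time::Duration;

/// Hypothetical job state for this sketch (not the crate's `MarkerStatus`).
#[derive(Debug, Clone, PartialEq)]
enum JobStatus {
    /// The job is still running; check again later.
    Processing,
    /// The job finished and produced this output.
    Complete(String),
    /// The job failed with this reason.
    Failed(String),
}

/// Calls `check` until it reports a finished job or `max_polls` is reached.
/// Sleeps for `interval` after every `Processing` response.
fn poll_until_done<F>(
    mut check: F,
    interval: Duration,
    max_polls: u32,
) -> Result<String, String>
where
    F: FnMut() -> JobStatus,
{
    for _ in 0..max_polls {
        match check() {
            JobStatus::Complete(output) => return Ok(output),
            JobStatus::Failed(reason) => return Err(reason),
            JobStatus::Processing => thread::sleep(interval),
        }
    }
    Err(format!("job did not finish after {} polls", max_polls))
}
```

Bounding the number of polls keeps a stuck job from blocking the caller forever. An async client would presumably await a timer rather than block the thread, but the control flow is the same.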

§Quick start

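use papers_datalab::{DatalabClient, MarkerRequest, OutputFormat, ProcessingMode};

// This snippet uses `?` and `.await`, so it must run inside an async
// function that returns a `Result`.

// Build a client from the DATALAB_API_KEY environment variable.
let client = DatalabClient::from_env()?;
// Read the whole document into memory; panics if the file is missing.
let pdf_bytes = std::fs::read("paper.pdf").expect("failed to read paper.pdf");

// Submit the document and wait until the conversion has finished.
let result = client.convert_document(MarkerRequest {
    file: Some(pdf_bytes),
    filename: Some("paper.pdf".into()),
    output_format: vec![OutputFormat::Markdown],
    mode: ProcessingMode::Accurate,
    ..Default::default()
}).await?;

// Print the markdown output, or an empty string if none was returned.
println!("{}", result.markdown.unwrap_or_default());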

§Authentication

Set the DATALAB_API_KEY environment variable, or pass the key directly to DatalabClient::new.
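An application that accepts a key from several places (a CLI flag, a config file, the environment) may want to settle on one key before constructing the client. The sketch below shows one possible precedence rule using only the standard library. `resolve_api_key` and `resolve_api_key_from` are hypothetical helpers written for this illustration; they are not part of papers_datalab, and the crate's own lookup in DatalabClient::from_env may behave differently.

```rust
use std::env;

/// Hypothetical helper: picks an explicit key if one is given and non-empty,
/// otherwise falls back to the supplied environment value.
fn resolve_api_key_from(
    explicit: Option<&str>,
    env_value: Option<String>,
) -> Result<String, String> {
    // An explicitly passed key wins over the environment.
    if let Some(key) = explicit {
        if !key.trim().is_empty() {
            return Ok(key.to_string());
        }
    }
    // Otherwise use the environment value, ignoring blank strings.
    match env_value {
        Some(key) if !key.trim().is_empty() => Ok(key),
        _ => Err("no API key: set DATALAB_API_KEY or pass a key explicitly".to_string()),
    }
}

/// Thin wrapper that reads DATALAB_API_KEY from the process environment.
fn resolve_api_key(explicit: Option<&str>) -> Result<String, String> {
    resolve_api_key_from(explicit, env::var("DATALAB_API_KEY").ok())
}
```

Keeping the environment lookup in a separate wrapper makes the precedence rule testable without mutating process-wide state.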

Re-exports§

pub use client::DatalabClient;
pub use error::DatalabError;
pub use error::Result;
pub use types::MarkerPollResponse;
pub use types::MarkerRequest;
pub use types::MarkerStatus;
pub use types::MarkerSubmitResponse;
pub use types::OutputFormat;
pub use types::ProcessingMode;
pub use types::StepType;
pub use types::StepTypesResponse;

Modules§

client
error
types