Crate hdt

§HDT

A Rust library for the Header Dictionary Triples compressed RDF format, including:

  • loading the HDT default format as created by hdt-cpp
  • converting N-Triples to HDT
  • efficient querying by triple patterns
  • serializing into other formats like RDF Turtle and N-Triples using the Sophia adapter
  • running SPARQL queries (via the experimental “sparql” feature, though HDT is not optimized for this)

However, it cannot:

  • load other HDT variants
  • swap data to disk
  • modify the RDF graph in memory

If you need any of those features, consider using a SPARQL endpoint instead. For acknowledgement of all the original authors, see the reference implementations in C++ and Java by the https://github.com/rdfhdt organisation.

§Examples

use hdt::Hdt;

let file = std::fs::File::open("example.hdt").expect("error opening file");
let hdt = Hdt::read(std::io::BufReader::new(file)).expect("error loading HDT");
// query
let majors = hdt.triples_with_pattern(Some("http://dbpedia.org/resource/Leipzig"), Some("http://dbpedia.org/ontology/major"), None);
println!("{:?}", majors.collect::<Vec<_>>());
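
The other pattern combinations work the same way. For example, a minimal sketch (reusing the hdt value from above) that binds only the object position:

let to_leipzig = hdt.triples_with_pattern(None, None, Some("http://dbpedia.org/resource/Leipzig"));
// count() consumes the iterator without materializing the triples
println!("{} triples have Leipzig as their object", to_leipzig.count());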

Sophia is re-exported as hdt::sophia, so you can also use its Graph trait implementation to load HDT files and reduce the memory consumption of an existing Sophia-based application:

use hdt::Hdt;
use hdt::sophia::api::graph::Graph;
use hdt::sophia::api::term::{IriRef, SimpleTerm, matcher::Any};

let file = std::fs::File::open("dbpedia.hdt").expect("error opening file");
let hdt = Hdt::read(std::io::BufReader::new(file)).expect("error loading HDT");
let s = SimpleTerm::Iri(IriRef::new_unchecked("http://dbpedia.org/resource/Leipzig".into()));
let p = SimpleTerm::Iri(IriRef::new_unchecked("http://dbpedia.org/ontology/major".into()));
let majors = hdt.triples_matching(Some(s), Some(p), Any);
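
As required by the Graph trait, the iterator yields Result items. A minimal sketch of consuming it, unwrapping each result before counting:

// each item is a Result, so unwrap the successes before counting
let n = majors.filter_map(Result::ok).count();
println!("{n} matching triples");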

If you don’t want to pull in the Sophia dependency, you can exclude it:

[dependencies]
hdt = { version = "...", default-features = false }
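
Conversely, the experimental features are opt-in. A sketch that enables the cache feature described below (pin the version you actually use):

[dependencies]
hdt = { version = "...", features = ["cache"] }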

There is also a folder with runnable examples, which you can run with cargo run --example examplename (e.g. --example query).

§Experimental Features

All features other than “sophia” are experimental and are neither guaranteed to work in all combinations nor to adhere to semver: they may change or be removed in future versions, including minor or patch releases.

§Cache

If the experimental cache feature is enabled, the library speeds up repeated loading of the same file by using a custom cached index file if one exists, and creating it if it does not. These index files are incompatible with those generated by the C++ and Java implementations.

let hdt = hdt::Hdt::read_from_path(std::path::Path::new("tests/resources/snikmeta.hdt")).expect("snikmeta.hdt not found");
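
One rough way to observe the effect is to time two consecutive loads of the same file; with the cache feature enabled, the second load can reuse the index file created by the first. A minimal sketch, assuming the test file from above:

use std::time::Instant;

let path = std::path::Path::new("tests/resources/snikmeta.hdt");
// first load generates the index file
let start = Instant::now();
let _first = hdt::Hdt::read_from_path(path).expect("snikmeta.hdt not found");
let cold = start.elapsed();
// second load can reuse it
let start = Instant::now();
let _second = hdt::Hdt::read_from_path(path).expect("snikmeta.hdt not found");
println!("cold load {cold:?}, warm load {:?}", start.elapsed());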

§SPARQL

The sparql feature adds SPARQL query support via spareval.

§API Documentation

See docs.rs/hdt/latest/hdt or generate it yourself with cargo doc --no-deps without disabling default features.

§Performance

The performance of a query depends on the size of the graph, the type of triple pattern, and the size of the result set. When using large HDT files, make sure to build with the release profile, e.g. via cargo build --release, as this can be much faster than the dev profile.
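
For example, to build and run the bundled query example with optimizations:

$ cargo build --release
$ cargo run --release --example query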

§Profiling

If you want to optimize the code, you can use a profiler. The provided test data is very small in order to keep the size of the crate down; locally modifying the tests to use a large HDT file returns more meaningful results.

§Example with perf and Firefox Profiler
$ cargo test --release
[...]
Running unittests src/lib.rs (target/release/deps/hdt-2b2f139dafe69681)
[...]
$ perf record --call-graph=dwarf target/release/deps/hdt-2b2f139dafe69681 hdt::tests::triples
$ perf script > /tmp/test.perf

Then go to https://profiler.firefox.com/ and open /tmp/test.perf.

§Criterion benchmark

$ cargo bench --bench criterion

§iai benchmark

$ cargo bench --bench iai
  • requires persondata_en_10k.hdt placed in tests/resources
  • requires Valgrind to be installed
  • may require a conservative target CPU like RUSTFLAGS="-C target-cpu=x86-64" cargo bench --bench iai

§Comparative benchmark suite

The separate benchmark suite compares the performance of this and some other RDF libraries.

§Community Guidelines

§Issues and Support

If you have a problem with the software, want to report a bug or have a feature request, please use the issue tracker. If you have a different type of request, feel free to send an email to Konrad.

§Citation

If you use this library in your research, please cite our paper in the Journal of Open Source Software. We also provide a CITATION.cff file.

§BibTeX entry
@article{hdtrs,
  doi = {10.21105/joss.05114},
  year = {2023},
  publisher = {The Open Journal},
  volume = {8},
  number = {84},
  pages = {5114},
  author = {Konrad Höffner and Tim Baccaert},
  title = {hdt-rs: {A} {R}ust library for the {H}eader {D}ictionary {T}riples binary {RDF} compression format},
  journal = {Journal of Open Source Software}
}
§Citation string

Höffner et al., (2023). hdt-rs: A Rust library for the Header Dictionary Triples binary RDF compression format. Journal of Open Source Software, 8(84), 5114, https://doi.org/10.21105/joss.05114

§Contribute

We are happy to receive pull requests. Please use cargo fmt before committing, make sure that cargo test succeeds, and that the code compiles on the stable and nightly toolchains both with and without the “sophia” feature active. cargo clippy should not report any warnings.
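
A command sequence along these lines covers the checks named above (a sketch; --no-default-features disables the default “sophia” feature):

$ cargo fmt
$ cargo test
$ cargo test --no-default-features
$ cargo +nightly test
$ cargo clippy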


Re-exports§

pub use crate::hdt::Hdt;
pub use four_sect_dict::IdKind;
pub use sophia;

Modules§

containers
Types for storing and reading data.
dict_sect_pfc
Types for representing dictionaries.
four_sect_dict
Types for representing a four section dictionary.
hdt
Types for representing triple sections.
hdt_graph
Adapter for the Sophia library.
header
Types for representing the header.
sparql
Experimental SPARQL support.
triples
Types for representing and querying triples.
vocab
Constants for triple terms.