§Rust bindings for tabulapdf/tabula-java
§Prerequisites
To use tabula-rs, you need a tabula-java bytecode archive (jar). You can build one yourself by cloning ssh://git@github.com/tabulapdf/tabula-java.git, applying the patch shipped with tabula-rs, and then invoking Maven:
git clone git@github.com:tabulapdf/tabula-java.git && cd tabula-java
git apply path/to/tabula-rs/0001-add-ffi-constructor-to-CommandLineApp.patch
mvn compile assembly:single
The built archive should then be at target/tabula-$TABULA_VER-jar-with-dependencies.jar.
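As a small sketch of how a build script or test helper might locate that archive, the function below assembles the expected path from a checkout directory and a version string. The helper name `tabula_jar_path` is hypothetical and not part of tabula-rs; only the file name pattern is taken from the text above.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper (not part of tabula-rs): builds the path at which
/// `mvn compile assembly:single` is expected to leave the jar, following
/// the pattern target/tabula-$TABULA_VER-jar-with-dependencies.jar.
pub fn tabula_jar_path(checkout: &Path, version: &str) -> PathBuf {
    checkout
        .join("target")
        .join(format!("tabula-{}-jar-with-dependencies.jar", version))
}
```

Calling it with `Path::new("../tabula-java")` and `"1.0.6-SNAPSHOT"` yields the same relative path that is passed to TabulaVM::new() further down.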
Additionally, make sure $JAVA_HOME/lib/server/libjvm.so is reachable through LD_LIBRARY_PATH, or explicitly set it as LD_PRELOAD.
For example:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$JAVA_HOME/lib/server/
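If you would rather verify this from Rust before starting the JVM, a minimal sketch could look like the following. Both helpers are hypothetical and use only the standard library; they assume the Linux layout described above (`lib/server/libjvm.so`) and a colon-separated search path.

```rust
use std::env;
use std::path::{Path, PathBuf};

/// Hypothetical helper: where libjvm.so is expected for a given JAVA_HOME.
pub fn libjvm_path(java_home: &Path) -> PathBuf {
    java_home.join("lib").join("server").join("libjvm.so")
}

/// Returns true if `dir` is one of the entries of an
/// LD_LIBRARY_PATH-style value. Paths are compared by components,
/// so a trailing slash on an entry does not matter.
pub fn search_path_contains(search_path: &str, dir: &Path) -> bool {
    env::split_paths(search_path).any(|entry| entry.as_path() == dir)
}
```

A caller could read `JAVA_HOME` and `LD_LIBRARY_PATH` with `std::env::var`, check that `libjvm_path(..)` exists, and print a hint when the directory is missing from the search path.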
§Using tabula-rs
§Initializing JVM & accessing JNI
To make use of tabula-java, you need to start a jni::JavaVM with the built archive added to its classpath. You can either do this manually, or call TabulaVM::new() with the (space-escaped) path to the archive as its parameter.
Using TabulaVM you can now access the Java native interface by calling TabulaVM::attach().
let vm = TabulaVM::new("../tabula-java/target/tabula-1.0.6-SNAPSHOT-jar-with-dependencies.jar", false).unwrap();
let env = vm.attach().unwrap();
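For the manual route, the JVM needs the archive on its classpath, which is conventionally passed as the `-Djava.class.path=...` option. The sketch below only builds that option string with the standard library. The function name `classpath_option` is hypothetical, and handing the string to the jni crate's JVM init arguments is an assumption about how you would wire it up, not something tabula-rs prescribes.

```rust
use std::env;
use std::path::PathBuf;

/// Hypothetical helper: joins the given jars with the platform's path
/// separator and wraps them in the JVM's class path option.
pub fn classpath_option(jars: &[PathBuf]) -> Result<String, env::JoinPathsError> {
    // join_paths fails if an entry itself contains the separator.
    let joined = env::join_paths(jars)?;
    Ok(format!("-Djava.class.path={}", joined.to_string_lossy()))
}
```

Using `env::join_paths` instead of manual string concatenation keeps the separator correct per platform and rejects entries that would silently split into two classpath elements.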
§Instantiating Tabula class
With access to the JNI, you can instantiate the Tabula class by calling TabulaEnv::configure_tabula().
let t = env.configure_tabula(None, None, OutputFormat::Csv, true, ExtractionMethod::Basic, false, None).unwrap();
§Parsing the document
Tabula provides Tabula::parse_document(), which parses the document located at the given path and returns a std::fs::File located in memory.
let file = t.parse_document(&std::path::Path::new("./test_data/spanning_cells.pdf"), "test_spanning_cells").unwrap();
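Since the result is an ordinary std::fs::File, its contents can be read with the standard I/O traits. The following is a minimal sketch; `read_output` is a hypothetical helper, and the rewind to the start is a defensive assumption in case the file cursor is left at the end after writing.

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};

/// Hypothetical helper: reads the whole file into a String.
/// Rewinds first, on the assumption that the cursor may not be at
/// the start of the file when it is handed over.
pub fn read_output(mut file: File) -> io::Result<String> {
    file.seek(SeekFrom::Start(0))?;
    let mut contents = String::new();
    file.read_to_string(&mut contents)?;
    Ok(contents)
}
```

With OutputFormat::Csv, as in the configuration above, the returned string would be the CSV text, which can then be passed to whatever CSV parser you prefer.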
§Relevant links
- tabula-rs forge: https://github.com/sp1ritCS/tabula-rs
- tabula-java project: https://github.com/tabulapdf/tabula-java/
Re-exports§
pub use jni;
Structs§
- Oxidized technology.tabula.Rectangle
- Tabula class
- Java native interface capable of instantiating Tabula class
- Java VM capable of using Tabula
Enums§
- Oxidized technology.tabula.CommandLineApp$ExtractionMethod
- Oxidized technology.tabula.CommandLineApp$OutputFormat
Constants§
Type Aliases§
- Result returned from JNI