Crate tabula


§Rust bindings for tabulapdf/tabula-java

§Prerequisites

To use tabula-rs, you need a tabula-java bytecode archive (jar). You can build it yourself by cloning ssh://git@github.com/tabulapdf/tabula-java.git, applying the patch shipped with tabula-rs, and invoking Maven:

git clone git@github.com:tabulapdf/tabula-java.git && cd tabula-java
git apply path/to/tabula-rs/0001-add-ffi-constructor-to-CommandLineApp.patch
mvn compile assembly:single

The built archive should then be located at target/tabula-$TABULA_VER-jar-with-dependencies.jar.
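The naming scheme of the archive can be expressed as a small helper. This is a minimal sketch: `expected_jar_path` is a hypothetical function and not part of tabula-rs, and it only assembles the path described above without checking that the file exists.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper (not part of tabula-rs): builds the path at which the
/// Maven build is expected to place the jar, given the tabula-java checkout
/// directory and the version string ($TABULA_VER in the text above).
fn expected_jar_path(checkout: &Path, version: &str) -> PathBuf {
    checkout
        .join("target")
        .join(format!("tabula-{}-jar-with-dependencies.jar", version))
}
```

Calling it with `Path::new("../tabula-java")` and `"1.0.6-SNAPSHOT"` gives the path used in the TabulaVM example further down this page.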

Additionally, make sure $JAVA_HOME/lib/server/libjvm.so is reachable through LD_LIBRARY_PATH or explicitly set it as LD_PRELOAD.

For example:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$JAVA_HOME/lib/server/
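If loading the JVM fails, it may help to check this setup from Rust before starting it. The following is a sketch using only the standard library; `libjvm_dir` and `on_library_path` are hypothetical helper names, and the `lib/server` layout is the one assumed in the text above.

```rust
use std::env;
use std::ffi::OsStr;
use std::path::{Path, PathBuf};

/// Hypothetical helper: directory expected to contain libjvm.so for a given
/// JAVA_HOME, following the $JAVA_HOME/lib/server layout described above.
fn libjvm_dir(java_home: &Path) -> PathBuf {
    java_home.join("lib").join("server")
}

/// Hypothetical helper: returns true if `dir` is one of the entries of an
/// LD_LIBRARY_PATH-style value. Paths are compared component-wise, so a
/// trailing slash on an entry does not matter.
fn on_library_path(dir: &Path, ld_library_path: &OsStr) -> bool {
    env::split_paths(ld_library_path).any(|entry| entry.as_path() == dir)
}
```

In a real program the two arguments would typically come from `std::env::var_os("JAVA_HOME")` and `std::env::var_os("LD_LIBRARY_PATH")`. The check only inspects the variable's value; it does not verify that libjvm.so actually exists in that directory.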

§Using tabula-rs

§Initializing JVM & accessing JNI

To make use of tabula-java, you need to start a jni::JavaVM with the built archive added to its classpath. You can either do this manually or call TabulaVM::new() with the (space-escaped) path to the archive as parameter.

Using TabulaVM you can now access the Java native interface by calling TabulaVM::attach().

let vm = TabulaVM::new("../tabula-java/target/tabula-1.0.6-SNAPSHOT-jar-with-dependencies.jar", false).unwrap();
let env = vm.attach().unwrap();

§Instantiating Tabula class

With access to the JNI, you can instantiate the Tabula class by calling TabulaEnv::configure_tabula().

let t = env.configure_tabula(None, None, OutputFormat::Csv, true, ExtractionMethod::Basic, false, None).unwrap();

§Parsing the document

Tabula provides Tabula::parse_document(), which parses the document located at the given path and returns the result as a std::fs::File held in memory.

let file = t.parse_document(&std::path::Path::new("./test_data/spanning_cells.pdf"), "test_spanning_cells").unwrap();
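Because the result is an ordinary std::fs::File, it can be consumed with the standard I/O traits. The sketch below reads such a file into a String; `read_from_start` is a hypothetical helper, and rewinding first is a precaution based on the assumption that the file cursor may not be at the beginning when the file is handed back.

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};

/// Hypothetical helper (not part of tabula-rs): reads the whole content of an
/// already opened file as UTF-8 text, starting from its first byte.
fn read_from_start(file: &mut File) -> io::Result<String> {
    // Assumption: the cursor position is unknown, so move it to the start.
    file.seek(SeekFrom::Start(0))?;
    let mut content = String::new();
    file.read_to_string(&mut content)?;
    Ok(content)
}
```

With the `file` from the example above declared as `let mut file`, `read_from_start(&mut file)` would return the extracted table in the configured output format, provided that output is valid UTF-8.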

Re-exports§

Structs§

  • Rectangle: Oxidized technology.tabula.Rectangle
  • Tabula: Tabula class
  • TabulaEnv: Java native interface capable of instantiating Tabula class
  • TabulaVM: Java VM capable of using Tabula

Enums§

  • ExtractionMethod: Oxidized technology.tabula.CommandLineApp$ExtractionMethod
  • OutputFormat: Oxidized technology.tabula.CommandLineApp$OutputFormat

Constants§

Type Aliases§