textract 0.1.0

Rust library to extract text from various types of files.
Owner: kcubeterm

Textract

Rust library for extracting text from various file types. Supported file extensions:

txt odf ods odt pptx xlsx pdf

Installation and usage

Use cargo to install textract.

// There is a PDF file at ./tmp.pdf
let content = textract::extract("tmp.pdf", "pdf").unwrap();
// content contains the raw text of the PDF; do whatever you want with it.

main.rs contains an example of how to use the textract library.
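
Because extract takes the file type as its second argument, a caller may want to derive that type from the file name. The following is only a minimal sketch: file_type_for and SUPPORTED are hypothetical names that are not part of textract, and it assumes the type string is simply the lowercased extension from the list above.

```rust
use std::path::Path;

// File types listed as supported in this README.
const SUPPORTED: [&str; 7] = ["txt", "odf", "ods", "odt", "pptx", "xlsx", "pdf"];

/// Hypothetical helper (not part of textract): returns the lowercased
/// extension of `path` if it is one of the supported types, else None.
fn file_type_for(path: &str) -> Option<String> {
    // `extension()` yields only the last component, e.g. "gz" for "a.tar.gz".
    let ext = Path::new(path).extension()?.to_str()?.to_ascii_lowercase();
    if SUPPORTED.contains(&ext.as_str()) {
        Some(ext)
    } else {
        None
    }
}
```

The returned string could then be passed as the second argument to textract::extract, with the None case handled before the call.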

Command line

The command line is simple:

textract tmp.pdf pdf

Roadmap

This library is in beta and supports only a few file types, but textract will keep adding support for more file types, since this project is part of achoz.

  • Support for compressed files and tar archives.
  • Use libmagic to guess file types.
  • Support for all types of document files.
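
To illustrate what guessing a file type from content could look like, here is a rough sketch that checks leading magic bytes. It is not libmagic and not textract code: sniff is a hypothetical helper, and it relies only on two well-known signatures (PDF files begin with "%PDF-", ZIP containers begin with "PK\x03\x04").

```rust
/// Hypothetical helper: guess a coarse file kind from the first bytes.
/// Returns None when no known signature matches.
fn sniff(bytes: &[u8]) -> Option<&'static str> {
    if bytes.starts_with(b"%PDF-") {
        Some("pdf")
    } else if bytes.starts_with(b"PK\x03\x04") {
        // ZIP container: OpenDocument (odt, ods, odf) and OOXML (pptx, xlsx)
        // files are ZIP archives, so telling them apart needs a look inside.
        Some("zip")
    } else {
        None
    }
}
```

A signature check like this only narrows the candidates; plain text files have no signature at all, which is one reason a full database such as libmagic is attractive.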