Struct TextRecognizer 

pub struct TextRecognizer { /* private fields */ }

OCR engine interface.

use tesseract_ocr_static::{Image, TextRecognizer};
use image::ImageReader;

let rgb = ImageReader::open("hello.png").unwrap().decode().unwrap().into_rgb8();
let image = Image::from_rgb(rgb.width(), rgb.height(), rgb.as_raw()).unwrap();
let mut recognizer = TextRecognizer::new().unwrap();
let results = recognizer.recognize_text(&image).unwrap();
assert_eq!("Hello world", results.get_utf8_text().as_str());

Implementations§

impl TextRecognizer

pub fn new() -> Result<Self, InitFailed>

Creates a new text recognizer with the default data directory, the default language (English), and the default OCR engine mode (LSTM).

pub fn with_languages(languages: &CStr) -> Result<Self, InitFailed>

Creates a new text recognizer with the specified languages.

Languages are specified by their three-letter ISO codes separated by the ‘+’ symbol, for example eng+deu.
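
Because with_languages takes a &CStr, the language list has to be passed as a NUL-terminated C string. The following is a minimal sketch that uses only the standard library; languages_arg is a hypothetical helper and not part of this crate, and the call to with_languages is shown only as a comment because it needs the crate and its trained data.

```rust
use std::ffi::{CString, NulError};

/// Hypothetical helper (not part of this crate): joins language codes
/// with '+' and converts the result into an owned C string.
fn languages_arg(codes: &[&str]) -> Result<CString, NulError> {
    // `CString::new` fails if the joined string contains an interior NUL byte.
    CString::new(codes.join("+"))
}

fn main() {
    let languages = languages_arg(&["eng", "deu"]).expect("codes must not contain NUL bytes");
    assert_eq!(languages.to_str().unwrap(), "eng+deu");

    // `languages.as_c_str()` has the `&CStr` type that `with_languages` takes:
    // let recognizer = TextRecognizer::with_languages(languages.as_c_str())?;
}
```

On Rust 1.77 or newer, a C string literal such as c"eng+deu" yields a &'static CStr directly, which avoids the allocation when the language list is known at compile time.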

pub fn with_config(config: Config<'_, '_>) -> Result<Self, InitFailed>

Creates a new text recognizer with the provided configuration.

pub fn recognize_text<'a>(&'a mut self, image: &Image) -> Result<RecognitionResults<'a>, RecognitionFailed>

Recognizes text in the provided image and returns an iterator over the results.

pub fn recognize_text_with_timeout<'a>(&'a mut self, image: &Image, timeout: Duration) -> Result<RecognitionResults<'a>, RecognitionFailed>

Recognizes text in the provided image and returns an iterator over the results.

The timeout is the maximum time to spend on text recognition.

pub fn recognize_text_with_monitor<'a, C>(&'a mut self, image: &Image, monitor: Monitor<C>) -> Result<RecognitionResults<'a>, RecognitionFailed>

Recognizes text in the provided image and returns an iterator over the results.

The monitor is used to track the progress and set the timeout.

pub fn recognize_text_in_rect<'a>(&'a mut self, image: &Image, rect: &Rectangle) -> Result<RecognitionResults<'a>, RecognitionFailed>

Recognizes text in the specified rectangle of the provided image and returns an iterator over the results.

pub fn recognize_text_in_rect_with_timeout<'a>(&'a mut self, image: &Image, rect: &Rectangle, timeout: Duration) -> Result<RecognitionResults<'a>, RecognitionFailed>

Recognizes text in the specified rectangle of the provided image and returns an iterator over the results.

The timeout is the maximum time to spend on text recognition.

pub fn recognize_text_in_rect_with_monitor<'a, C>(&'a mut self, image: &Image, rect: &Rectangle, monitor: Monitor<C>) -> Result<RecognitionResults<'a>, RecognitionFailed>

Recognizes text in the specified rectangle of the provided image and returns an iterator over the results.

The monitor is used to track the progress and set the timeout.

pub fn analyze_layout<'a>(&'a self, image: &Image) -> LayoutIter<'a>

Analyzes the text layout in the provided image and returns layout analysis results as an iterator.

If you only need the layout, consider using LayoutAnalyzer, which uses less memory.

pub fn num_dawgs(&self) -> u32

Returns the number of Directed Acyclic Word Graphs (DAWGs) in the dictionary.

Methods from Deref<Target = Tesseract>§

pub fn data_dir(&self) -> &Path

Returns the data directory.

pub fn ocr_engine_mode(&self) -> OcrEngineMode

Returns the OCR engine mode.

pub fn page_segmentation_mode(&self) -> PageSegmentationMode

pub fn set_page_segmentation_mode(&mut self, mode: PageSegmentationMode)

pub fn set_source_resolution(&mut self, pixels_per_inch: u32)

pub fn set_min_orientation_margin(&mut self, margin: f64)

pub fn clear(&mut self)

pub fn clear_cache(&mut self)

pub fn clear_adaptive_classifier(&mut self)

pub fn set_variable(&mut self, name: &CStr, value: &CStr) -> Result<(), InvalidVariable>

Sets a Tesseract variable.
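
Both arguments are &CStr, so names and values that start out as Rust strings need converting first. The sketch below uses only the standard library; variable_args is a hypothetical helper and not part of this crate, and the set_variable call itself appears only as a comment because it requires an initialized recognizer.

```rust
use std::ffi::{CString, NulError};

/// Hypothetical helper (not part of this crate): converts a variable name
/// and its value into owned C strings.
fn variable_args(name: &str, value: &str) -> Result<(CString, CString), NulError> {
    // Either conversion fails if the input contains an interior NUL byte.
    Ok((CString::new(name)?, CString::new(value)?))
}

fn main() {
    // `tessedit_char_whitelist` is one of the variables listed in the table below.
    let (name, value) = variable_args("tessedit_char_whitelist", "0123456789")
        .expect("inputs must not contain NUL bytes");
    assert_eq!(name.to_str().unwrap(), "tessedit_char_whitelist");
    assert_eq!(value.to_str().unwrap(), "0123456789");

    // `&CString` dereferences to `&CStr`, so the pair can be passed directly:
    // recognizer.set_variable(&name, &value)?;
}
```

For names and values known at compile time, C string literals (Rust 1.77 or newer), such as c"tessedit_char_whitelist", avoid the conversion entirely.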

§How to improve text recognition?

There is a guide on how to improve text recognition: https://tesseract-ocr.github.io/tessdoc/ImproveQuality.html

§Variables
Variable | Default value | Description
allow_blob_division | 1 | Use divisible blobs chopping
ambigs_debug_level | 0 | Debug level for unichar ambiguities
applybox_debug | 1 | Debug level
applybox_exposure_pattern | ".exp" | Exposure value follows this pattern in the image filename. The names of the image files are expected to be in the form [lang].[fontname].exp[num].tif
applybox_learn_chars_and_char_frags_mode | 0 | Learn both character fragments (as is done in the special low exposure mode) as well as unfragmented characters.
applybox_learn_ngrams_mode | 0 | Each bounding box is assumed to contain ngrams. Only learn the ngrams whose outlines overlap horizontally.
applybox_page | 0 | Page number to apply boxes from
assume_fixed_pitch_char_segment | 0 | include fixed-pitch heuristics in char segmentation
bidi_debug | 0 | Debug level for BiDi
bland_unrej | 0 | unrej potential with no checks
certainty_scale | 20 | Certainty scaling factor
chop_center_knob | 0.15 | Split center adjustment
chop_centered_maxwidth | 90 | Width of (smaller) chopped blobs above which we don’t care that a chop is not near the center.
chop_debug | 0 | Chop debug
chop_enable | 1 | Chop enable
chop_good_split | 50 | Good split limit
chop_inside_angle | -50 | Min Inside Angle Bend
chop_min_outline_area | 2000 | Min Outline Area
chop_min_outline_points | 6 | Min Number of Points on Outline
chop_new_seam_pile | 1 | Use new seam_pile
chop_ok_split | 100 | OK split limit
chop_overlap_knob | 0.9 | Split overlap adjustment
chop_same_distance | 2 | Same distance
chop_seam_pile_size | 150 | Max number of seams in seam_pile
chop_sharpness_knob | 0.06 | Split sharpness adjustment
chop_split_dist_knob | 0.5 | Split length adjustment
chop_split_length | 10000 | Split Length
chop_vertical_creep | 0 | Vertical creep
chop_width_change_knob | 5 | Width change adjustment
chop_x_y_weight | 3 | X / Y length weight
chs_leading_punct | "('`\"" | Leading punctuation
chs_trailing_punct1 | ").,;:?!" | 1st Trailing punctuation
chs_trailing_punct2 | ")'`\"" | 2nd Trailing punctuation
classify_adapt_feature_threshold | 230 | Threshold for good features during adaptive 0-255
classify_adapt_proto_threshold | 230 | Threshold for good protos during adaptive 0-255
classify_adapted_pruning_factor | 2.5 | Prune poor adapted results this much worse than best result
classify_adapted_pruning_threshold | -1 | Threshold at which classify_adapted_pruning_factor starts
classify_bln_numeric_mode | 0 | Assume the input is numbers [0-9].
classify_char_norm_range | 0.2 | Character Normalization Range …
classify_character_fragments_garbage_certainty_threshold | -3 | Exclude fragments that do not look like whole characters from training and adaption
classify_class_pruner_multiplier | 15 | Class Pruner Multiplier 0-255:
classify_class_pruner_threshold | 229 | Class Pruner Threshold 0-255
classify_cp_angle_pad_loose | 45 | Class Pruner Angle Pad Loose
classify_cp_angle_pad_medium | 20 | Class Pruner Angle Pad Medium
classify_cp_angle_pad_tight | 10 | Class Pruner Angle Pad Tight
classify_cp_cutoff_strength | 7 | Class Pruner CutoffStrength:
classify_cp_end_pad_loose | 0.5 | Class Pruner End Pad Loose
classify_cp_end_pad_medium | 0.5 | Class Pruner End Pad Medium
classify_cp_end_pad_tight | 0.5 | Class Pruner End Pad Tight
classify_cp_side_pad_loose | 2.5 | Class Pruner Side Pad Loose
classify_cp_side_pad_medium | 1.2 | Class Pruner Side Pad Medium
classify_cp_side_pad_tight | 0.6 | Class Pruner Side Pad Tight
classify_debug_character_fragments | 0 | Bring up graphical debugging windows for fragments training
classify_debug_level | 0 | Classify debug level
classify_enable_adaptive_debugger | 0 | Enable match debugger
classify_enable_adaptive_matcher | 1 | Enable adaptive classifier
classify_enable_learning | 1 | Enable adaptive classifier
classify_font_name | "UnknownFont" | Default font name to be used in training
classify_integer_matcher_multiplier | 10 | Integer Matcher Multiplier 0-255:
classify_learn_debug_str | "" | Class str to debug learning
classify_learning_debug_level | 0 | Learning Debug Level:
classify_max_certainty_margin | 5.5 | Veto difference between classifier certainties
classify_max_rating_ratio | 1.5 | Veto ratio between classifier ratings
classify_max_slope | 2.41421 | Slope above which lines are called vertical
classify_min_slope | 0.414214 | Slope below which lines are called horizontal
classify_misfit_junk_penalty | 0 | Penalty to apply when a non-alnum is vertically out of its expected textline position
classify_nonlinear_norm | 0 | Non-linear stroke-density normalization
classify_norm_adj_curl | 2 | Norm adjust curl …
classify_norm_adj_midpoint | 32 | Norm adjust midpoint …
classify_norm_method | 1 | Normalization Method …
classify_num_cp_levels | 3 | Number of Class Pruner Levels
classify_pico_feature_length | 0.05 | Pico Feature Length
classify_pp_angle_pad | 45 | Proto Pruner Angle Pad
classify_pp_end_pad | 0.5 | Proto Prune End Pad
classify_pp_side_pad | 2.5 | Proto Pruner Side Pad
classify_save_adapted_templates | 0 | Save adapted templates to a file
classify_use_pre_adapted_templates | 0 | Use pre-adapted classifier templates
conflict_set_I_l_1 | "Il1[]" | Il1 conflict set
crunch_accept_ok | 1 | Use acceptability in okstring
crunch_debug | 0 | As it says
crunch_del_cert | -10 | POTENTIAL crunch cert lt this
crunch_del_high_word | 1.5 | Del if word gt xht x this above bl
crunch_del_low_word | 0.5 | Del if word gt xht x this below bl
crunch_del_max_ht | 3 | Del if word ht gt xht x this
crunch_del_min_ht | 0.7 | Del if word ht lt xht x this
crunch_del_min_width | 3 | Del if word width lt xht x this
crunch_del_rating | 60 | POTENTIAL crunch rating lt this
crunch_early_convert_bad_unlv_chs | 0 | Take out ~^ early?
crunch_early_merge_tess_fails | 1 | Before word crunch?
crunch_include_numerals | 0 | Fiddle alpha figures
crunch_leave_accept_strings | 0 | Don’t pot crunch sensible strings
crunch_leave_lc_strings | 4 | Don’t crunch words with long lower case strings
crunch_leave_ok_strings | 1 | Don’t touch sensible strings
crunch_leave_uc_strings | 4 | Don’t crunch words with long upper case strings
crunch_long_repetitions | 3 | Crunch words with long repetitions
crunch_poor_garbage_cert | -9 | crunch garbage cert lt this
crunch_poor_garbage_rate | 60 | crunch garbage rating lt this
crunch_pot_indicators | 1 | How many potential indicators needed
crunch_pot_poor_cert | -8 | POTENTIAL crunch cert lt this
crunch_pot_poor_rate | 40 | POTENTIAL crunch rating lt this
crunch_rating_max | 10 | For adj length in rating per ch
crunch_small_outlines_size | 0.6 | Small if lt xht x this
crunch_terrible_garbage | 1 | As it says
crunch_terrible_rating | 80 | crunch rating lt this
dawg_debug_level | 0 | Set to 1 for general debug info, to 2 for more details, to 3 to see all the debug messages
debug_file | "" | File to send tprintf output to
debug_fix_space_level | 0 | Contextual fixspace debug
debug_noise_removal | 0 | Debug reassignment of small outlines
debug_x_ht_level | 0 | Reestimate debug
devanagari_split_debugimage | 0 | Whether to create a debug image for split shiro-rekha process.
devanagari_split_debuglevel | 0 | Debug level for split shiro-rekha process.
disable_character_fragments | 1 | Do not include character fragments in the results of the classifier
doc_dict_certainty_threshold | -2.25 | Worst certainty for words that can be inserted into the document dictionary
doc_dict_pending_threshold | 0 | Worst certainty for using pending dictionary
document_title | "" | Title of output document (used for hOCR and PDF output)
dotproduct | "auto" | Function used for calculation of dot product
edges_boxarea | 0.875 | Min area fraction of grandchild for box
edges_childarea | 0.5 | Min area fraction of child outline
edges_children_count_limit | 45 | Max holes allowed in blob
edges_children_fix | 0 | Remove boxy parents of char-like children
edges_children_per_grandchild | 10 | Importance ratio for chucking outlines
edges_debug | 0 | turn on debugging for this module
edges_max_children_layers | 5 | Max layers of nested children inside a character outline
edges_max_children_per_outline | 10 | Max number of children inside a character outline
edges_min_nonhole | 12 | Min pixels for potential char in box
edges_patharea_ratio | 40 | Max lensq/area for acceptable child outline
edges_use_new_outline_complexity | 0 | Use the new outline complexity module
enable_noise_removal | 1 | Remove and conditionally reassign small outlines when they confuse layout analysis, determining diacritics vs noise
equationdetect_save_bi_image | 0 | Save input bi image
equationdetect_save_merged_image | 0 | Save the merged image
equationdetect_save_seed_image | 0 | Save the seed image
equationdetect_save_spt_image | 0 | Save special character image
file_type | ".tif" | Filename extension
fixsp_done_mode | 1 | What constitutes done for spacing
fixsp_non_noise_limit | 1 | How many non-noise blbs either side?
fixsp_small_outlines_size | 0.28 | Small if lt xht x this
force_word_assoc | 0 | force associator to run regardless of what enable_assoc is. This is used for CJK where component grouping is necessary.
gapmap_big_gaps | 1.75 | xht multiplier
gapmap_debug | 0 | Say which blocks have tables
gapmap_no_isolated_quanta | 0 | Ensure gaps not less than 2quanta wide
gapmap_use_ends | 0 | Use large space at start and end of rows
hocr_char_boxes | 0 | Add coordinates for each character to hocr output
hocr_font_info | 0 | Add font info to hocr output
hyphen_debug_level | 0 | Debug level for hyphenated words.
interactive_display_mode | 0 | Run interactively?
invert_threshold | 0.7 | For lines with a mean confidence below this value, OCR is also tried with an inverted image
jpg_quality | 85 | Set JPEG quality level
language_model_debug_level | 0 | Language model debug level
language_model_min_compound_length | 3 | Minimum length of compound words
language_model_ngram_nonmatch_score | -40 | Average classifier score of a non-matching unichar.
language_model_ngram_on | 0 | Turn on/off the use of character ngram model
language_model_ngram_order | 8 | Maximum order of the character ngram model
language_model_ngram_rating_factor | 16 | Factor to bring log-probs into the same range as ratings when multiplied by outline length
language_model_ngram_scale_factor | 0.03 | Strength of the character ngram model relative to the character classifier
language_model_ngram_small_prob | 1e-06 | To avoid overly small denominators use this as the floor of the probability returned by the ngram model.
language_model_ngram_space_delimited_language | 1 | Words are delimited by space
language_model_ngram_use_only_first_uft8_step | 0 | Use only the first UTF8 step of the given string when computing log probabilities.
language_model_penalty_case | 0.1 | Penalty for inconsistent case
language_model_penalty_chartype | 0.3 | Penalty for inconsistent character type
language_model_penalty_font | 0 | Penalty for inconsistent font
language_model_penalty_increment | 0.01 | Penalty increment
language_model_penalty_non_dict_word | 0.15 | Penalty for non-dictionary words
language_model_penalty_non_freq_dict_word | 0.1 | Penalty for words not in the frequent word dictionary
language_model_penalty_punc | 0.2 | Penalty for inconsistent punctuation
language_model_penalty_script | 0.5 | Penalty for inconsistent script
language_model_penalty_spacing | 0.05 | Penalty for inconsistent spacing
language_model_use_sigmoidal_certainty | 0 | Use sigmoidal score for certainty
language_model_viterbi_list_max_num_prunable | 10 | Maximum number of prunable (those for which PrunablePath() is true) entries in each viterbi list recorded in BLOB_CHOICEs
language_model_viterbi_list_max_size | 500 | Maximum size of viterbi lists recorded in BLOB_CHOICEs
load_bigram_dawg | 1 | Load dawg with special word bigrams.
load_freq_dawg | 1 | Load frequent word dawg.
load_number_dawg | 1 | Load dawg with number patterns.
load_punc_dawg | 1 | Load dawg with punctuation patterns.
load_system_dawg | 1 | Load system word dawg.
load_unambig_dawg | 1 | Load unambiguous word dawg.
log_level | 2147483647 | Logging level
lstm_choice_iterations | 5 | Sets the number of cascading iterations for the Beamsearch in lstm_choice_mode. Note that lstm_choice_mode must be set to a value greater than 0 to produce results.
lstm_choice_mode | 0 | Allows to include alternative symbols choices in the hOCR output. Valid input values are 0, 1 and 2. 0 is the default value. With 1 the alternative symbol choices per timestep are included. With 2 alternative symbol choices are extracted from the CTC process instead of the lattice. The choices are mapped per character.
lstm_rating_coefficient | 5 | Sets the rating coefficient for the lstm choices. The smaller the coefficient, the better are the ratings for each choice and less information is lost due to the cut off at 0. The standard value is 5
lstm_use_matrix | 1 | Use ratings matrix/beam search with lstm
matcher_avg_noise_size | 12 | Avg. noise blob length
matcher_bad_match_pad | 0.15 | Bad Match Pad (0-1)
matcher_clustering_max_angle_delta | 0.015 | Maximum angle delta for prototype clustering
matcher_debug_flags | 0 | Matcher Debug Flags
matcher_debug_level | 0 | Matcher Debug Level
matcher_debug_separate_windows | 0 | Use two different windows for debugging the matching: One for the protos and one for the features.
matcher_good_threshold | 0.125 | Good Match (0-1)
matcher_min_examples_for_prototyping | 3 | Reliable Config Threshold
matcher_perfect_threshold | 0.02 | Perfect Match (0-1)
matcher_permanent_classes_min | 1 | Min # of permanent classes
matcher_rating_margin | 0.1 | New template margin (0-1)
matcher_reliable_adaptive_result | 0 | Great Match (0-1)
matcher_sufficient_examples_for_prototyping | 5 | Enable adaption even if the ambiguities have not been seen
max_permuter_attempts | 10000 | Maximum number of different character choices to consider during permutation. This limit is especially useful when user patterns are specified, since overly generic patterns can result in dawg search exploring an overly large number of options.
merge_fragments_in_matrix | 1 | Merge the fragments in the ratings matrix and delete them after merging
min_characters_to_try | 50 | Specify minimum characters to try during OSD
min_orientation_margin | 7 | Min acceptable orientation margin
min_sane_x_ht_pixels | 8 | Reject any x-ht lt or eq than this
multilang_debug_level | 0 | Print multilang debug info.
noise_cert_basechar | -8 | Hingepoint for base char certainty
noise_cert_disjoint | -1 | Hingepoint for disjoint certainty
noise_cert_factor | 0.375 | Scaling on certainty diff from Hingepoint
noise_cert_punc | -3 | Threshold for new punc char certainty
noise_maxperblob | 8 | Max diacritics to apply to a blob
noise_maxperword | 16 | Max diacritics to apply to a word
numeric_punctuation | ".," | Punct. chs expected WITHIN numbers
ocr_devanagari_split_strategy | 0 | Whether to use the top-line splitting process for Devanagari documents while performing ocr.
ok_repeated_ch_non_alphanum_wds | "-?*=" | Allow NN to unrej
oldbl_corrfix | 1 | Improve correlation of heights
oldbl_dot_error_size | 1.26 | Max aspect ratio of a dot
oldbl_holed_losscount | 10 | Max lost before fallback line used
oldbl_xhfix | 0 | Fix bug in modes threshold for xheights
oldbl_xhfract | 0.4 | Fraction of est allowed in calc
outlines_2 | "ij!?%\":;" | Non standard number of outlines
outlines_odd | "%| " | Non standard number of outlines
output_ambig_words_file | "" | Output file for ambiguities found in the dictionary
page_separator | "\u{c}" | Page separator (default is form feed control character)
page_xml_level | 0 | Create the PAGE file on 0=line or 1=word level.
page_xml_polygon | 1 | Create the PAGE file with polygons instead of box values
pageseg_apply_music_mask | 0 | Detect music staff and remove intersecting components
pageseg_devanagari_split_strategy | 0 | Whether to use the top-line splitting process for Devanagari documents while performing page-segmentation.
paragraph_debug_level | 0 | Print paragraph debug info.
paragraph_text_based | 1 | Run paragraph detection on the post-text-recognition (more accurate)
pitsync_joined_edge | 0.75 | Dist inside big blob for chopping
pitsync_linear_version | 6 | Use new fast algorithm
pitsync_offset_freecut_fraction | 0.25 | Fraction of cut for free cuts
poly_allow_detailed_fx | 0 | Allow feature extractors to see the original outline
poly_debug | 0 | Debug old poly
poly_wide_objects_better | 1 | More accurate approx on wide things
preserve_interword_spaces | 0 | Preserve multiple interword spaces
prioritize_division | 0 | Prioritize blob division over chopping
quality_blob_pc | 0 | good_quality_doc gte good blobs limit
quality_char_pc | 0.95 | good_quality_doc gte good char limit
quality_min_initial_alphas_reqd | 2 | alphas in a good word
quality_outline_pc | 1 | good_quality_doc lte outline error limit
quality_rej_pc | 0.08 | good_quality_doc lte rejection limit
quality_rowrej_pc | 1.1 | good_quality_doc gte good char limit
rating_scale | 1.5 | Rating scaling factor
rej_1Il_trust_permuter_type | 1 | Don’t double check
rej_1Il_use_dict_word | 0 | Use dictword test
rej_alphas_in_number_perm | 0 | Extend permuter check
rej_trust_doc_dawg | 0 | Use DOC dawg in 11l conf. detector
rej_use_good_perm | 1 | Individual rejection control
rej_use_sensible_wd | 0 | Extend permuter check
rej_use_tess_accepted | 1 | Individual rejection control
rej_use_tess_blanks | 1 | Individual rejection control
rej_whole_of_mostly_reject_word_fract | 0.85 | if >this fract
repair_unchopped_blobs | 1 | Fix blobs that aren’t chopped
save_alt_choices | 1 | Save alternative paths found during chopping and segmentation search
save_doc_words | 0 | Save Document Words
segment_nonalphabetic_script | 0 | Don’t use any alphabetic-specific tricks. Set to true in the traineddata config file for scripts that are cursive or inherently fixed-pitch
segment_penalty_dict_case_bad | 1.3125 | Default score multiplier for word matches, which may have case issues (lower is better).
segment_penalty_dict_case_ok | 1.1 | Score multiplier for word matches that have good case (lower is better).
segment_penalty_dict_frequent_word | 1 | Score multiplier for word matches which have good case and are frequent in the given language (lower is better).
segment_penalty_dict_nonword | 1.25 | Score multiplier for glyph fragment segmentations which do not match a dictionary word (lower is better).
segment_penalty_garbage | 1.5 | Score multiplier for poorly cased strings that are not in the dictionary and generally look like garbage (lower is better).
segsearch_debug_level | 0 | SegSearch debug level
segsearch_max_char_wh_ratio | 2 | Maximum character width-to-height ratio
segsearch_max_futile_classifications | 20 | Maximum number of pain point classifications per chunk that did not result in finding a better word choice.
segsearch_max_pain_points | 2000 | Maximum number of pain points stored in the queue
speckle_large_max_size | 0.3 | Max large speckle size
speckle_rating_penalty | 10 | Penalty to add to worst rating for noise
stopper_allowable_character_badness | 3 | Max certainty variation allowed in a word (in sigma)
stopper_certainty_per_char | -0.5 | Certainty to add for each dict char above small word size.
stopper_debug_level | 0 | Stopper debug level
stopper_no_acceptable_choices | 0 | Make AcceptableChoice() always return false. Useful when there is a need to explore all segmentations
stopper_nondict_certainty_base | -2.5 | Certainty threshold for non-dict words
stopper_phase2_certainty_rejection_offset | 1 | Reject certainty offset
stopper_smallword_size | 2 | Size of dict word to be treated as non-dict word
stream_filelist | 0 | Stream a filelist from stdin
subscript_max_y_top | 0.5 | Maximum top of a character measured as a multiple of x-height above the baseline for us to reconsider whether it’s a subscript.
superscript_bettered_certainty | 0.97 | What reduction in badness do we think sufficient to choose a superscript over what we’d thought. For example, a value of 0.6 means we want to reduce badness of certainty by at least 40%
superscript_debug | 0 | Debug level for sub & superscript fixer
superscript_min_y_bottom | 0.3 | Minimum bottom of a character measured as a multiple of x-height above the baseline for us to reconsider whether it’s a superscript.
superscript_scaledown_ratio | 0.4 | A superscript scaled down more than this is unbelievably small. For example, 0.3 means we expect the font size to be no smaller than 30% of the text line font size.
superscript_worse_certainty | 2 | How many times worse certainty does a superscript position glyph need to be for us to try classifying it as a char with a different baseline?
suspect_accept_rating | -999.9 | Accept good rating limit
suspect_constrain_1Il | 0 | UNLV keep 1Il chars rejected
suspect_level | 99 | Suspect marker level
suspect_rating_per_ch | 999.9 | Don’t touch bad rating limit
suspect_short_words | 2 | Don’t suspect dict wds longer than this
tess_bn_matching | 0 | Baseline Normalized Matching
tess_cn_matching | 0 | Character Normalized Matching
tessedit_adaption_debug | 0 | Generate and print debug information for adaption
tessedit_ambigs_training | 0 | Perform training for ambiguities
tessedit_bigram_debug | 0 | Amount of debug output for bigram correction.
tessedit_certainty_threshold | -2.25 | Good blob limit
tessedit_char_blacklist | "" | Blacklist of chars not to recognize
tessedit_char_unblacklist | "" | List of chars to override tessedit_char_blacklist
tessedit_char_whitelist | "" | Whitelist of chars to recognize
tessedit_class_miss_scale | 0.00390625 | Scale factor for features not used
tessedit_create_alto | 0 | Write .xml ALTO file
tessedit_create_boxfile | 0 | Output text with boxes
tessedit_create_hocr | 0 | Write .html hOCR output file
tessedit_create_lstmbox | 0 | Write .box file for LSTM training
tessedit_create_page_xml | 0 | Write .page.xml PAGE file
tessedit_create_pdf | 0 | Write .pdf output file
tessedit_create_tsv | 0 | Write .tsv output file
tessedit_create_txt | 0 | Write .txt output file
tessedit_create_wordstrbox | 0 | Write WordStr format .box output file
tessedit_debug_block_rejection | 0 | Block and Row stats
tessedit_debug_doc_rejection | 0 | Page stats
tessedit_debug_fonts | 0 | Output font info per char
tessedit_debug_quality_metrics | 0 | Output data to debug file
tessedit_display_outwords | 0 | Draw output words
tessedit_do_invert | 1 | Try inverted line image if necessary (deprecated, will be removed in release 6, use the ‘invert_threshold’ parameter instead)
tessedit_dont_blkrej_good_wds | 0 | Use word segmentation quality metric
tessedit_dont_rowrej_good_wds | 0 | Use word segmentation quality metric
tessedit_dump_choices | 0 | Dump char choices
tessedit_dump_pageseg_images | 0 | Dump intermediate images made during page segmentation
tessedit_enable_bigram_correction | 1 | Enable correction based on the word bigram dictionary.
tessedit_enable_dict_correction | 0 | Enable single word correction based on the dictionary.
tessedit_enable_doc_dict | 1 | Add words to the document dictionary
tessedit_fix_fuzzy_spaces | 1 | Try to improve fuzzy spaces
tessedit_fix_hyphens | 1 | Crunch double hyphens?
tessedit_flip_0O | 1 | Contextual 0O O0 flips
tessedit_font_id | 0 | Font ID to use or zero
tessedit_good_doc_still_rowrej_wd | 1.1 | rej good doc wd if more than this fraction rejected
tessedit_good_quality_unrej | 1 | Reduce rejection on good docs
tessedit_image_border | 2 | Rej blbs near image edge limit
tessedit_init_config_only | 0 | Only initialize with the config file. Useful if the instance is not going to be used for OCR but say only for layout analysis.
tessedit_load_sublangs | "" | List of languages to load with this one
tessedit_lower_flip_hyphen | 1.5 | Aspect ratio dot/hyphen test
tessedit_make_boxes_from_boxes | 0 | Generate more boxes from boxed chars
tessedit_minimal_rej_pass1 | 0 | Do minimal rejection on pass 1 output
tessedit_minimal_rejection | 0 | Only reject tess failures
tessedit_ocr_engine_mode | 1 | Which OCR engine(s) to run (Tesseract, LSTM, both). Defaults to loading and running the most accurate available.
tessedit_override_permuter | 1 | According to dict_word
tessedit_page_number | -1 | -1 -> All pages, else specific page to process
tessedit_pageseg_mode | 6 | Page seg mode: 0=osd only, 1=auto+osd, 2=auto_only, 3=auto, 4=column, 5=block_vert, 6=block, 7=line, 8=word, 9=word_circle, 10=char, 11=sparse_text, 12=sparse_text+osd, 13=raw_line (Values from PageSegMode enum in tesseract/publictypes.h)
tessedit_parallelize | 0 | Run in parallel where possible
tessedit_prefer_joined_punct | 0 | Reward punctuation joins
tessedit_preserve_blk_rej_perfect_wds | 1 | Only rej partially rejected words in block rejection
tessedit_preserve_min_wd_len | 2 | Only preserve wds longer than this
tessedit_preserve_row_rej_perfect_wds | 1 | Only rej partially rejected words in row rejection
tessedit_reject_bad_qual_wds | 1 | Reject all bad quality wds
tessedit_reject_block_percent | 45 | %rej allowed before rej whole block
tessedit_reject_doc_percent | 65 | %rej allowed before rej whole doc
tessedit_reject_mode | 0 | Rejection algorithm
tessedit_reject_row_percent | 40 | %rej allowed before rej whole row
tessedit_rejection_debug | 0 | Adaption debug
tessedit_resegment_from_boxes | 0 | Take segmentation and labeling from box file
tessedit_resegment_from_line_boxes | 0 | Conversion of word/line box file to char box file
tessedit_row_rej_good_docs | 1 | Apply row rejection to good docs
tessedit_tess_adaption_mode | 39 | Adaptation decision algorithm for tess
tessedit_test_adaption | 0 | Test adaption criteria
tessedit_timing_debug | 0 | Print timing stats
tessedit_train_from_boxes | 0 | Generate training data from boxed chars
tessedit_train_line_recognizer | 0 | Break input into lines and remap boxes if present
tessedit_truncate_wordchoice_log | 10 | Max words to keep in list
tessedit_unrej_any_wd | 0 | Don’t bother with word plausibility
tessedit_upper_flip_hyphen | 1.8 | Aspect ratio dot/hyphen test
tessedit_use_primary_params_model | 0 | In multilingual mode use params model of the primary language
tessedit_use_reject_spaces | 1 | Reject spaces?
tessedit_whole_wd_rej_row_percent | 70 | Number of row rejects in whole word rejects which prevents whole row rejection
tessedit_word_for_word | 0 | Make output have exactly one word per WERD
tessedit_write_block_separators | 0 | Write block separators in output
tessedit_write_images | 0 | Capture the image from the IPE
tessedit_write_params_to_file | "" | Write all parameters to the given file.
tessedit_write_rep_codes | 0 | Write repetition char code
tessedit_write_unlv | 0 | Write .unlv output file
tessedit_zero_kelvin_rejection | 0 | Don’t reject ANYTHING AT ALL
tessedit_zero_rejection | 0 | Don’t reject ANYTHING
test_pt | 0 | Test for point
test_pt_x | 100000 | xcoord
test_pt_y | 100000 | ycoord
textonly_pdf | 0 | Create PDF with only one invisible text layer
textord_all_prop | 0 | All doc is proportional text
textord_ascheight_mode_fraction | 0.08 | Min pile height to make ascheight
textord_ascx_ratio_max | 1.8 | Max cap/xheight
textord_ascx_ratio_min | 1.25 | Min cap/xheight
textord_balance_factor | 1 | Ding rate for unbalanced char cells
textord_baseline_debug | 0 | Baseline debug level
textord_biased_skewcalc | 1 | Bias skew estimates with line length
textord_blockndoc_fixed | 0 | Attempt whole doc/block fixed pitch
textord_blocksall_fixed | 0 | Moan about prop blocks
textord_blocksall_prop | 0 | Moan about fixed pitch blocks
textord_blshift_maxshift | 0 | Max baseline shift
textord_blshift_xfraction | 9.99 | Min size of baseline shift
textord_chop_width | 1.5 | Max width before chopping
textord_chopper_test | 0 | Chopper is being tested.
textord_debug_baselines | 0 | Debug baseline generation
textord_debug_blob | 0 | Print test blob information
textord_debug_block | 0 | Block to do debug on
textord_debug_bugs | 0 | Turn on output related to bugs in tab finding
textord_debug_pitch_metric | 0 | Write full metric stuff
textord_debug_pitch_test | 0 | Debug on fixed pitch test
textord_debug_printable | 0 | Make debug windows printable
textord_debug_tabfind | 0 | Debug tab finding
textord_debug_xheights | 0 | Test xheight algorithms
textord_descheight_mode_fraction | 0.08 | Min pile height to make descheight
textord_descx_ratio_max | 0.6 | Max desc/xheight
textord_descx_ratio_min | 0.25 | Min desc/xheight
textord_disable_pitch_test | 0 | Turn off dp fixed pitch algorithm
textord_dotmatrix_gap | 3 | Max pixel gap for broken fixed pitch
textord_equation_detect | 0 | Turn on equation detector
textord_excess_blobsize | 1.3 | New row made if blob makes row this big
textord_expansion_factor | 1 | Factor to expand rows by in expand_rows
textord_fast_pitch_test | 0 | Do even faster pitch algorithm
textord_fix_makerow_bug | 1 | Prevent multiple baselines
textord_fix_xheight_bug | 1 | Use spline baseline
textord_force_make_prop_words | 0 | Force proportional word segmentation on all rows
textord_fp_chop_error | 2 | Max allowed bending of chop cells
textord_fpiqr_ratio | 1.5 | Pitch IQR/Gap IQR threshold
textord_heavy_nr | 0 | Vigorously remove noise
textord_initialasc_ile | 0.9 | Ile of sizes for xheight guess
textord_initialx_ile | 0.75 | Ile of sizes for xheight guess
textord_interpolating_skew | 1 | Interpolate across gaps
textord_linespace_iqrlimit | 0.2 | Max iqr/median for linespace
textord_lms_line_trials | 12 | Number of line fits to do
textord_max_blob_overlaps | 4 | Max number of blobs a big blob can overlap
textord_max_noise_size | 7 | Pixel size of noise
textord_max_pitch_iqr | 0.2 | Xh fraction noise in pitch
textord_min_blob_height_fraction | 0.75 | Min blob height/top to include blob top into xheight stats
textord_min_blobs_in_row | 4 | Min blobs before gradient counted
textord_min_linesize | 1.25 | * blob height for initial linesize
textord_min_xheight | 10 | Min credible pixel xheight
textord_minxh | 0.25 | fraction of linesize for min xheight
textord_new_initial_xheight | 1 | Use test xheight mechanism
textord_no_rejects | 0 | Don’t remove noise blobs
textord_noise_area_ratio | 0.7 | Fraction of bounding box for noise
textord_noise_debug | 0 | Debug row garbage detector
textord_noise_hfract | 0.015625 | Height fraction to discard outlines as speckle noise
textord_noise_normratio | 2 | Dot to norm ratio for deletion
textord_noise_rejrows | 1 | Reject noise-like rows
textord_noise_rejwords | 1 | Reject noise-like words
textord_noise_rowratio | 6 | Dot to norm ratio for deletion
textord_noise_sizefraction | 10 | Fraction of size for maxima
textord_noise_sizelimit | 0.5 | Fraction of x for big t count
textord_noise_sncount | 1 | super norm blobs to save row
textord_noise_sxfract | 0.4 | xh fract width error for norm blobs
textord_noise_syfract | 0.2 | xh fract height error for norm blobs
textord_noise_translimit | 16 | Transitions for normal blob
textord_occupancy_threshold | 0.4 | Fraction of neighbourhood
textord_ocropus_mode | 0 | Make baselines for ocropus
textord_old_baselines | 1 | Use old baseline algorithm
textord_old_xheight | 0 | Use old xheight algorithm
textord_oldbl_debug | 0 | Debug old baseline generation
textord_oldbl_jumplimit | 0.15 | X fraction for new partition
textord_oldbl_merge_parts | 1 | Merge suspect partitions
textord_oldbl_paradef | 1 | Use para default mechanism
textord_oldbl_split_splines | 1 | Split stepped splines
textord_overlap_x | 0.375 | Fraction of linespace for good overlap
textord_parallel_baselines | 1 | Force parallel baselines
textord_pitch_range | 2 | Max range test on pitch
textord_pitch_rowsimilarity | 0.08 | Fraction of xheight for sameness
textord_pitch_scalebigwords | 0 | Scale scores on big words
textord_projection_scale | 0.2 | Ding rate for mid-cuts
textord_really_old_xheight | 0 | Use original wiseowl xheight
textord_restore_underlines | 1 | Chop underlines & put back
textord_show_blobs | 0 | Display unsorted blobs
textord_show_boxes | 0 | Display unsorted blobs
textord_show_expanded_rows | 0 | Display rows after expanding
textord_show_final_blobs | 0 | Display blob bounds after pre-ass
textord_show_final_rows | 0 | Display rows after final fitting
textord_show_initial_rows | 0 | Display row accumulation
textord_show_initial_words | 0 | Display separate words
textord_show_page_cuts | 0 | Draw page-level cuts
textord_show_parallel_rows | 0 | Display page correlated rows
textord_show_row_cuts | 0 | Draw row-level cuts
textord_single_height_mode0Script has no xheight, so use a single mode
textord_skew_ile0.5Ile of gradients for page skew
textord_skew_lag0.02Lag for skew on row accumulation
textord_skewsmooth_offset4For smooth factor
textord_skewsmooth_offset21For smooth factor
textord_space_size_is_variable0If true, word delimiter spaces are assumed to have variable width, even though characters have fixed pitch.
textord_spacesize_ratioprop2Min ratio space/nonspace
textord_spline_medianwin6Size of window for spline segmentation
textord_spline_minblobs8Min blobs in each spline segment
textord_spline_shift_fraction0.02Fraction of line spacing for quad
textord_straight_baselines0Force straight baselines
textord_tabfind_aligned_gap_fraction0.75Fraction of height used as a minimum gap for aligned blobs.
textord_tabfind_find_tables1run table detection
textord_tabfind_force_vertical_text0Force using vertical text page mode
textord_tabfind_only_strokewidths0Only run stroke widths
textord_tabfind_show_finaltabs0Show tab vectors
textord_tabfind_show_images0Show image blobs
textord_tabfind_show_initialtabs0Show tab candidates
textord_tabfind_show_strokewidths0Show stroke widths
textord_tabfind_show_vlines0Debug line finding
textord_tabfind_vertical_text1Enable vertical detection
textord_tabfind_vertical_text_ratio0.5Fraction of textlines deemed vertical to use vertical page mode
textord_tablefind_recognize_tables0Enables the table recognizer for table layout and filtering.
textord_tabvector_vertical_box_ratio0.5Fraction of box matches required to declare a line vertical
textord_tabvector_vertical_gap_fraction0.5max fraction of mean blob width allowed for vertical gaps in vertical text
textord_test_landscape0Tests refer to land/port
textord_test_x-2147483647coord of test pt
textord_test_y-2147483647coord of test pt
textord_testregion_bottom-1Bottom edge of debug rectangle in Leptonica coords (bottom=0/top=height), with horizontal lines x/y-flipped
textord_testregion_left-1Left edge of debug reporting rectangle in Leptonica coords (bottom=0/top=height), with horizontal lines x/y-flipped
textord_testregion_right2147483647Right edge of debug rectangle in Leptonica coords (bottom=0/top=height), with horizontal lines x/y-flipped
textord_testregion_top2147483647Top edge of debug reporting rectangle in Leptonica coords (bottom=0/top=height), with horizontal lines x/y-flipped
textord_underline_offset0.1Fraction of x to ignore
textord_underline_threshold0.5Fraction of width occupied
textord_underline_width2Multiple of line_size for underline
textord_use_cjk_fp_model0Use CJK fixed pitch model
textord_width_limit8Max width of blobs to make rows
textord_words_def_fixed0.016Threshold for definite fixed
textord_words_def_prop0.09Threshold for definite prop
textord_words_default_maxspace3.5Max believable third space
textord_words_default_minspace0.6Fraction of xheight
textord_words_default_nonspace0.2Fraction of xheight
textord_words_definite_spread0.3Non-fuzzy spacing region
textord_words_initial_lower0.25Max initial cluster size
textord_words_initial_upper0.15Min initial cluster spacing
textord_words_maxspace4Multiple of xheight
textord_words_min_minspace0.3Fraction of xheight
textord_words_minlarge0.75Fraction of valid gaps needed
textord_words_pitchsd_threshold0.04Pitch sync threshold
textord_words_veto_power5Rows required to outvote a veto
textord_wordstats_smooth_factor0.05Smoothing gap stats
textord_xheight_error_margin0.1Accepted variation
textord_xheight_mode_fraction0.4Min pile height to make xheight
thresholding_debug0Debug the thresholding process
thresholding_kfactor0.34Factor for reducing threshold due to variance. This parameter is used by the Sauvola thresholding method. Normal range: 0.2-0.5
thresholding_method0Thresholding method: 0 = Otsu, 1 = LeptonicaOtsu, 2 = Sauvola
thresholding_score_fraction0.1Fraction of the max Otsu score. This parameter is used by the LeptonicaOtsu thresholding method. For standard Otsu use 0.0, otherwise 0.1 is recommended
thresholding_smooth_kernel_size0Size of convolution kernel applied to threshold array (to be multiplied by image DPI). Use 0 for no smoothing. This parameter is used by the LeptonicaOtsu thresholding method
thresholding_tile_size0.33Desired tile size (to be multiplied by image DPI). This parameter is used by the LeptonicaOtsu thresholding method
thresholding_window_size0.33Window size for measuring local statistics (to be multiplied by image DPI). This parameter is used by the Sauvola thresholding method
tosp_all_flips_fuzzy0Pass ANY flip to context?
tosp_block_use_cert_spaces1Only stat OBVIOUS spaces
tosp_debug_level0Debug data
tosp_dont_fool_with_small_kerns-1Limit use of xht gap with odd small kns
tosp_enough_small_gaps0.65Fract of kerns reqd for isolated row stats
tosp_enough_space_samples_for_median3or should we use mean
tosp_few_samples40No.gaps reqd with 1 large gap to treat as a table
tosp_flip_caution0Don’t autoflip kn to sp when large separation
tosp_flip_fuzz_kn_to_sp1Default flip
tosp_flip_fuzz_sp_to_kn1Default flip
tosp_force_wordbreak_on_punct0Force word breaks on punct to break long lines in non-space delimited langs
tosp_fuzzy_kn_fraction0.5New fuzzy kn alg
tosp_fuzzy_limit_all1Don’t restrict kn->sp fuzzy limit to tables
tosp_fuzzy_sp_fraction0.5New fuzzy sp alg
tosp_fuzzy_space_factor0.6Fract of xheight for fuzz sp
tosp_fuzzy_space_factor10.5Fract of xheight for fuzz sp
tosp_fuzzy_space_factor20.72Fract of xheight for fuzz sp
tosp_gap_factor0.83gap ratio to flip sp->kern
tosp_ignore_big_gaps-1xht multiplier
tosp_ignore_very_big_gaps3.5xht multiplier
tosp_improve_thresh0Enable improvement heuristic
tosp_init_guess_kn_mult2.2Thresh guess - mult kn by this
tosp_init_guess_xht_mult0.28Thresh guess - mult xht by this
tosp_kern_gap_factor12gap ratio to flip kern->sp
tosp_kern_gap_factor21.3gap ratio to flip kern->sp
tosp_kern_gap_factor32.5gap ratio to flip kern->sp
tosp_large_kerning0.19Limit use of xht gap with large kns
tosp_max_sane_kn_thresh5Multiplier on kn to limit thresh
tosp_min_sane_kn_sp1.5Don’t trust spaces less than this time kn
tosp_narrow_aspect_ratio0.48narrow if w/h less than this
tosp_narrow_blobs_not_cert1Only stat OBVIOUS spaces
tosp_narrow_fraction0.3Fract of xheight for narrow
tosp_near_lh_edge0Don’t reduce box if the top left is non blank
tosp_old_sp_kn_th_factor2Factor for defining space threshold in terms of space and kern sizes
tosp_old_to_bug_fix0Fix suspected bug in old code
tosp_old_to_constrain_sp_kn0Constrain relative values of inter and intra-word gaps for old_to_method.
tosp_old_to_method0Space stats use prechopping?
tosp_only_small_gaps_for_kern0Better guess
tosp_only_use_prop_rows1Block stats to use fixed pitch rows?
tosp_only_use_xht_gaps0Only use within xht gap for wd breaks
tosp_pass_wide_fuzz_sp_to_context0.75How wide fuzzies need context
tosp_recovery_isolated_row_stats1Use row alone when inadequate cert spaces
tosp_redo_kern_limit10No.samples reqd to reestimate for row
tosp_rep_space1.6rep gap multiplier for space
tosp_row_use_cert_spaces1Only stat OBVIOUS spaces
tosp_row_use_cert_spaces11Only stat OBVIOUS spaces
tosp_rule_9_test_punct0Don’t chng kn to space next to punct
tosp_sanity_method1How to avoid being silly
tosp_short_row20No.gaps reqd with few cert spaces to use certs
tosp_silly_kn_sp_gap0.2Don’t let sp minus kn get too small
tosp_stats_use_xht_gaps1Use within xht gap for wd breaks
tosp_table_fuzzy_kn_sp_ratio3Fuzzy if less than this
tosp_table_kn_sp_ratio2.25Min difference of kn & sp in table
tosp_table_xht_sp_ratio0.33Expect spaces bigger than this
tosp_threshold_bias10how far between kern and space?
tosp_threshold_bias20how far between kern and space?
tosp_use_pre_chopping0Space stats use prechopping?
tosp_use_xht_gaps1Use within xht gap for wd breaks
tosp_wide_aspect_ratio0wide if w/h less than this
tosp_wide_fraction0.52Fract of xheight for wide
unlv_tilde_crunching0Mark v.bad words for tilde crunch
unrecognised_char"|"Output char for unidentified blobs
use_ambigs_for_adaption0Use ambigs for deciding whether to adapt to a character
use_only_first_uft8_step0Use only the first UTF8 step of the given string when computing log probabilities.
user_defined_dpi0Specify DPI for input image
user_patterns_file""A filename of user-provided patterns.
user_patterns_suffix""A suffix of user-provided patterns located in tessdata.
user_words_file""A filename of user-provided words.
user_words_suffix""A suffix of user-provided words located in tessdata.
word_to_debug""Word for which stopper debug information should be printed to stdout
wordrec_debug_blamer0Print blamer debug messages
wordrec_debug_level0Debug level for wordrec
wordrec_display_segmentations0Display Segmentations (ScrollView)
wordrec_display_splits0Display splits
wordrec_enable_assoc1Associator Enable
wordrec_max_join_chunks4Max number of broken pieces to associate
wordrec_run_blamer0Try to set the blame for errors
wordrec_skip_no_truth_words0Only run OCR for words that had truth recorded in BlamerBundle
words_default_fixed_limit0.6Allowed size variance
words_default_fixed_space0.75Fraction of xheight
words_default_prop_nonspace0.25Fraction of xheight
words_initial_lower0.5Max initial cluster size
words_initial_upper0.15Min initial cluster spacing
x_ht_acceptance_tolerance8Max allowed deviation of blob top outside of font data
x_ht_min_change8Min change in xht before actually trying it
xheight_penalty_inconsistent0.25Score penalty (0.1 = 10%) added if an xheight is inconsistent.
xheight_penalty_subscripts0.125Score penalty (0.1 = 10%) added if there are subscripts or superscripts in a word, but it is otherwise OK.
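
The thresholding_kfactor entry in the table refers to the Sauvola method. To give a rough feel for what the k factor controls, the sketch below evaluates the textbook Sauvola rule t = m * (1 + k * (s / R - 1)), where m and s are the local mean and standard deviation inside a window. The function name and the choice R = 128 (a common value for 8-bit grayscale) are assumptions made for illustration; this is not taken from the crate or from the Tesseract sources.

```rust
/// Textbook Sauvola threshold for one pixel neighbourhood (illustrative only).
///
/// * `mean`    - local mean gray value in the window
/// * `std_dev` - local standard deviation in the window
/// * `k`       - the k factor (the table lists 0.34 as the default)
/// * `r`       - dynamic range of the standard deviation (assumed 128 here)
fn sauvola_threshold(mean: f64, std_dev: f64, k: f64, r: f64) -> f64 {
    mean * (1.0 + k * (std_dev / r - 1.0))
}

fn main() {
    // In a flat, low-variance region the threshold drops well below the mean,
    // so faint background noise is less likely to be classified as ink.
    let flat = sauvola_threshold(100.0, 0.0, 0.34, 128.0);
    // In a high-contrast region the threshold approaches the mean.
    let busy = sauvola_threshold(100.0, 128.0, 0.34, 128.0);
    println!("flat region: {:.2}, high-contrast region: {:.2}", flat, busy);
}
```

A larger k lowers the threshold further in low-variance regions, which is consistent with the "factor for reducing threshold due to variance" wording in the table.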

pub fn set_debug_variable( &mut self, name: &CStr, value: &CStr, ) -> Result<(), InvalidVariable>

Sets a Tesseract variable.

Debug variables are accepted as well.

See set_variable for more information.
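
Both arguments are `&CStr`, so the name and the value have to be NUL-terminated strings. The following is a minimal sketch of two ways to build them using only the standard library. The helper `to_c_string` is hypothetical, and the actual call to `set_debug_variable` appears only in a comment because it requires a live recognizer.

```rust
use std::ffi::{CStr, CString};

/// Hypothetical helper: converts a runtime string into an owned C string.
/// Returns `None` if the input contains an interior NUL byte.
fn to_c_string(s: &str) -> Option<CString> {
    CString::new(s).ok()
}

fn main() {
    // Name known at compile time: a byte literal with an explicit trailing NUL.
    let name: &CStr = CStr::from_bytes_with_nul(b"textord_debug_tabfind\0")
        .expect("literal must end with exactly one NUL byte");

    // Value known only at run time: build an owned CString, then borrow it.
    let value = to_c_string("1").expect("value must not contain NUL bytes");

    // With a recognizer in scope, the call would look roughly like this:
    // recognizer.set_debug_variable(name, value.as_c_str())?;
    println!("{} = {}", name.to_string_lossy(), value.to_string_lossy());
}
```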

pub fn get_variable_i32(&self, name: &CStr) -> Option<i32>

Gets the value of an integer variable.

pub fn get_variable_bool(&self, name: &CStr) -> Option<bool>

Gets the value of a boolean variable.

pub fn get_variable_f64(&self, name: &CStr) -> Option<f64>

Gets the value of a floating-point variable.

pub fn get_variable_c_str<'a>(&'a self, name: &CStr) -> Option<&'a CStr>

Gets the value of a string variable.
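
Each getter is tied to one value type and returns an `Option`. The sketch below models that pattern with a hypothetical in-memory store, on the assumption that `None` means the name is unknown or the variable has a different type (the documentation above does not spell this out). `Value`, `VarStore`, and `sample_store` are made up for illustration, and the variable types are inferred from the defaults in the table.

```rust
use std::collections::HashMap;

/// Hypothetical typed variable value; not part of the crate.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i32),
    Bool(bool),
    Double(f64),
    Str(String),
}

/// Hypothetical in-memory variable store; not part of the crate.
struct VarStore {
    vars: HashMap<String, Value>,
}

impl VarStore {
    /// Returns the value only if the variable exists and is an integer.
    fn get_i32(&self, name: &str) -> Option<i32> {
        match self.vars.get(name)? {
            Value::Int(v) => Some(*v),
            _ => None,
        }
    }

    /// Returns the value only if the variable exists and is a boolean.
    fn get_bool(&self, name: &str) -> Option<bool> {
        match self.vars.get(name)? {
            Value::Bool(v) => Some(*v),
            _ => None,
        }
    }

    /// Returns the value only if the variable exists and is a float.
    fn get_f64(&self, name: &str) -> Option<f64> {
        match self.vars.get(name)? {
            Value::Double(v) => Some(*v),
            _ => None,
        }
    }

    /// Returns the value only if the variable exists and is a string.
    fn get_str(&self, name: &str) -> Option<&str> {
        match self.vars.get(name)? {
            Value::Str(v) => Some(v.as_str()),
            _ => None,
        }
    }
}

/// Builds a store with a few defaults copied from the table (types assumed).
fn sample_store() -> VarStore {
    let mut vars = HashMap::new();
    vars.insert("textord_min_xheight".to_string(), Value::Int(10));
    vars.insert("textord_heavy_nr".to_string(), Value::Bool(false));
    vars.insert("thresholding_kfactor".to_string(), Value::Double(0.34));
    vars.insert("unrecognised_char".to_string(), Value::Str("|".to_string()));
    VarStore { vars }
}

fn main() {
    let store = sample_store();
    // Asking for the right type yields Some(..); the wrong type yields None.
    println!("{:?}", store.get_i32("textord_min_xheight"));
    println!("{:?}", store.get_i32("thresholding_kfactor"));
    println!("{:?}", store.get_f64("thresholding_kfactor"));
    println!("{:?}", store.get_bool("textord_heavy_nr"));
    println!("{:?}", store.get_str("unrecognised_char"));
}
```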

pub fn read_config_file(&mut self, filename: &Path) -> Result<()>

Reads variables from a configuration file.

Doesn’t include debug variables.

This is a re-implementation of the original library call that handles I/O errors and invalid variables.
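
For orientation, here is a minimal sketch of parsing such a file, assuming the usual Tesseract config layout: one `name value` pair per line, separated by spaces or tabs, with blank lines and lines starting with `#` ignored. The function `parse_config` is hypothetical and is not the crate's implementation; it only splits lines and does not validate names or values.

```rust
/// Parses `name value` lines into pairs (illustrative, not the crate's code).
/// Blank lines and lines starting with `#` are skipped. A name without a
/// value is kept with an empty string as its value.
fn parse_config(text: &str) -> Vec<(String, String)> {
    let mut pairs = Vec::new();
    for raw in text.lines() {
        let line = raw.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        // Split at the first space or tab; the remainder is the value.
        let (name, value) = match line.split_once(|c: char| c == ' ' || c == '\t') {
            Some((name, rest)) => (name, rest.trim_start()),
            None => (line, ""),
        };
        pairs.push((name.to_string(), value.to_string()));
    }
    pairs
}

fn main() {
    let text = "# enable heavy noise removal\ntextord_heavy_nr 1\nthresholding_method\t2\n";
    for (name, value) in parse_config(text) {
        println!("{} -> {}", name, value);
    }
}
```

Each resulting pair could then be converted to C strings and passed to a setter one at a time, which is where invalid variables would surface as errors.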

pub fn read_debug_config_file(&mut self, filename: &Path) -> Result<()>

Reads variables from a configuration file.

Includes debug variables.

This is a re-implementation of the original library call that handles I/O errors and invalid variables.

pub fn print_variables_to_file( &self, filename: &CStr, ) -> Result<(), WriteFailed>

Prints all variables to a file.
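
The written file can be read back for inspection. The sketch below assumes each line holds a name, a value, and a description separated by tab characters, which is how upstream Tesseract prints its parameters; whether this crate produces exactly that layout is an assumption. `parse_variable_line` is a hypothetical helper.

```rust
/// One parsed row: (name, value, description).
type Row<'a> = (&'a str, &'a str, &'a str);

/// Splits a line into name, value, and description at the first two tabs.
/// Returns `None` when the line has fewer than three fields.
fn parse_variable_line(line: &str) -> Option<Row<'_>> {
    let mut fields = line.splitn(3, '\t');
    let name = fields.next()?;
    let value = fields.next()?;
    let description = fields.next()?;
    Some((name, value, description))
}

fn main() {
    let line = "textord_min_xheight\t10\tMin credible pixel xheight";
    if let Some((name, value, description)) = parse_variable_line(line) {
        println!("{} = {} ({})", name, value, description);
    }
}
```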

Trait Implementations

impl Deref for TextRecognizer

type Target = Tesseract

The resulting type after dereferencing.

fn deref(&self) -> &Self::Target

Dereferences the value.

impl DerefMut for TextRecognizer

fn deref_mut(&mut self) -> &mut Self::Target

Mutably dereferences the value.
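
Because of these two implementations, methods of the target type can be called directly on the wrapper. The sketch below shows the mechanism with hypothetical stand-in types: `Engine` plays the role of `Tesseract` and `Recognizer` the role of `TextRecognizer`. Neither type, nor the `dpi` methods, exists in the crate.

```rust
use std::ops::{Deref, DerefMut};

/// Hypothetical inner type standing in for `Tesseract`.
struct Engine {
    dpi: i32,
}

impl Engine {
    fn dpi(&self) -> i32 {
        self.dpi
    }

    fn set_dpi(&mut self, dpi: i32) {
        self.dpi = dpi;
    }
}

/// Hypothetical wrapper standing in for `TextRecognizer`.
struct Recognizer {
    engine: Engine,
}

impl Deref for Recognizer {
    type Target = Engine;

    fn deref(&self) -> &Self::Target {
        &self.engine
    }
}

impl DerefMut for Recognizer {
    fn deref_mut(&mut self) -> &mut Self::Target {
        &mut self.engine
    }
}

fn main() {
    let mut recognizer = Recognizer { engine: Engine { dpi: 70 } };
    // `set_dpi` takes `&mut self` on Engine, so it resolves through DerefMut.
    recognizer.set_dpi(300);
    // `dpi` takes `&self` on Engine, so it resolves through Deref.
    println!("dpi = {}", recognizer.dpi());
}
```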

Auto Trait Implementations

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<P, T> Receiver for P
where P: Deref<Target = T> + ?Sized, T: ?Sized,

type Target = T

🔬This is a nightly-only experimental API. (arbitrary_self_types)

The target type on which the method may be called.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.