Function parse_cell_grid_info 

pub fn parse_cell_grid_info(tokens: &[String]) -> Vec<CellGridInfo>
Parses structure tokens to extract grid position and span info for each cell.

This function walks the HTML structure tokens in order and tracks the current row and column position, accounting for colspan and rowspan attributes. The returned vector has one entry per <td> cell, in the same order as the cell bounding boxes (bboxes).

§Arguments

  • tokens - Structure tokens from table structure recognition

§Returns

A vector containing one CellGridInfo per cell, in order of appearance.

§Example

let tokens = vec![
    "<tr>".to_string(),
    "<td></td>".to_string(),
    "<td colspan=\"2\"></td>".to_string(),
    "</tr>".to_string(),
    "<tr>".to_string(),
    "<td></td>".to_string(),
    "<td></td>".to_string(),
    "<td></td>".to_string(),
    "</tr>".to_string(),
];
let grid_info = parse_cell_grid_info(&tokens);
// First row: cell at (0,0), cell at (0,1) spanning 2 cols
// Second row: cells at (1,0), (1,1), (1,2)
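
The row/column tracking described above can be illustrated with a minimal sketch. This is not the crate's implementation: `GridCell`, `span_attr`, and `sketch_grid_positions` are hypothetical names, the field layout of the real CellGridInfo may differ, and the sketch assumes each cell arrives as a single token beginning with `<td`, as in the example. It only shows one plausible way to skip grid slots already claimed by a rowspan from an earlier row.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for `CellGridInfo`; the real field names may differ.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct GridCell {
    pub row: usize,
    pub col: usize,
    pub row_span: usize,
    pub col_span: usize,
}

/// Reads an integer attribute such as `colspan="2"` from a token.
/// Falls back to 1 when the attribute is missing, malformed, or zero.
fn span_attr(token: &str, name: &str) -> usize {
    let needle = format!("{name}=\"");
    token
        .find(&needle)
        .and_then(|start| {
            let rest = &token[start + needle.len()..];
            rest.find('"').and_then(|end| rest[..end].parse::<usize>().ok())
        })
        .filter(|&n| n > 0)
        .unwrap_or(1)
}

/// Illustrative walk over structure tokens (an assumption, not the actual code).
pub fn sketch_grid_positions(tokens: &[String]) -> Vec<GridCell> {
    let mut cells = Vec::new();
    // Grid slots already claimed by a previously seen cell, including its spans.
    let mut occupied: HashSet<(usize, usize)> = HashSet::new();
    let mut row = 0usize;
    let mut col = 0usize;
    let mut in_row = false;

    for token in tokens {
        let t = token.trim();
        if t.starts_with("<tr") {
            // A new row starts at column 0.
            in_row = true;
            col = 0;
        } else if t.starts_with("</tr") {
            if in_row {
                row += 1;
            }
            in_row = false;
        } else if t.starts_with("<td") {
            // Skip slots covered by a rowspan from an earlier row.
            while occupied.contains(&(row, col)) {
                col += 1;
            }
            let col_span = span_attr(t, "colspan");
            let row_span = span_attr(t, "rowspan");
            // Mark every slot this cell covers.
            for r in row..row + row_span {
                for c in col..col + col_span {
                    occupied.insert((r, c));
                }
            }
            cells.push(GridCell { row, col, row_span, col_span });
            col += col_span;
        }
    }
    cells
}
```

Under these assumptions, passing the tokens from the example to `sketch_grid_positions` yields five cells, with the second one at row 0, column 1 and a column span of 2. A cell with `rowspan="2"` in the first column would push the first cell of the following row to column 1.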