Crate polars_view

§PolarsView

A fast and interactive viewer for CSV, JSON (including Newline-Delimited JSON - NDJSON), and Apache Parquet files, built with Polars and egui.

This project is inspired by and initially forked from the parqbench project.

§Features

  • Fast Data Handling: Uses the Polars DataFrame library for efficient data loading, processing, and querying.
  • Multiple File Format Support:
    • Load data from: CSV, JSON, NDJSON (Newline-Delimited JSON), Parquet.
    • Save data as: CSV, JSON, NDJSON, Parquet (via “Save As…” [Ctrl+A]).
  • Interactive Table View:
    • Supports sorting by multiple columns simultaneously: Click column header icons to sort the entire DataFrame asynchronously. The order of clicks determines sort precedence. Each click advances a column through a 5-state cycle that controls direction and null placement:

      • Not Sorted
      • Descending, Nulls First
      • Ascending, Nulls First
      • Descending, Nulls Last
      • Ascending, Nulls Last
      • (next click) Back to Not Sorted

      (Numbers shown next to the header icons indicate sort precedence when multiple columns are sorted.)

    • Customizable Header: Toggle visual style (“Enhanced Header”), adjust vertical padding (“Header Padding”).

    • Column Sizing: Choose automatic content-based sizing (“Auto Col Width”: true) or faster fixed initial widths (“Auto Col Width”: false). Manually resize columns by dragging separators.

  • SQL Querying: Filter and transform data using Polars’ SQL interface. Execute queries asynchronously via the “Query” panel.
  • String Column Number Normalization (CLI): Use the --regex (-r) argument to select string columns (via wildcard * or a ^...$ regex pattern matching column names) containing European-style numbers (e.g., ‘1.234,56’) and convert them to standard Float64 format (e.g., 1234.56) on load.
  • Configuration Panels (Side Bar):
    • Info: Displays file dimensions (rows, columns).
    • Format: Set text alignment, float decimal places, column width strategy, header style, and header padding.
    • Query: Configure the SQL query, add an optional row index column (with custom name/offset), normalize columns, remove all-null columns, remove columns by regex, set the number of schema inference rows (CSV/JSON/NDJSON), the CSV delimiter, and custom CSV null values, and view SQL examples.
    • Columns: Shows column names and Polars data types. Right-click a column name to copy it.
  • Asynchronous Operations: Utilizes Tokio for non-blocking file I/O, sorting, and SQL execution, keeping the UI responsive. Shows a spinner during processing.
  • Drag and Drop: Load files by dropping them onto the application window.
  • Robust Error Handling: Displays errors (file loading, parsing, SQL, etc.) in a non-blocking notification window.
  • Theming: Switch between Light and Dark themes via the menu bar.
  • Persistence: Remembers window size and position between sessions.
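
The five-state sort cycle described under "Interactive Table View" can be sketched as a small state machine. This is a minimal illustration, not the crate's code: the enum `SortState` and its variant names are hypothetical, and only the order of states is taken from the list above.

```rust
/// Hypothetical stand-in for a per-column sort state (names are assumptions).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SortState {
    NotSorted,
    DescendingNullsFirst,
    AscendingNullsFirst,
    DescendingNullsLast,
    AscendingNullsLast,
}

impl SortState {
    /// Advance to the next state, as one click on the header icon would.
    fn next(self) -> Self {
        match self {
            SortState::NotSorted => SortState::DescendingNullsFirst,
            SortState::DescendingNullsFirst => SortState::AscendingNullsFirst,
            SortState::AscendingNullsFirst => SortState::DescendingNullsLast,
            SortState::DescendingNullsLast => SortState::AscendingNullsLast,
            SortState::AscendingNullsLast => SortState::NotSorted,
        }
    }
}

fn main() {
    let mut state = SortState::NotSorted;
    // Five clicks walk through every state and return to the start.
    for click in 1..=5 {
        state = state.next();
        println!("click {click}: {state:?}");
    }
}
```

Modelling the cycle as a total `match` means the compiler flags any state added later that lacks a successor.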

§Building and Running

  1. Prerequisites:

    • Rust and Cargo (latest stable version recommended; minimum Rust 1.86, edition 2024).
  2. Clone the Repository:

    git clone https://github.com/claudiofsr/polars-view.git
    cd polars-view
  3. Build and Install (Release Mode):

    # Build with default features (uses 'format-simple')
    cargo b -r && cargo install --path=.
    
    # --- OR Build with Specific Features ---
    # Example: Build with 'format-special' (formats 'Alíq'/'Aliq' columns differently)
    cargo b -r && cargo install --path=. --features format-special

    This compiles optimized code and installs the polars-view binary to ~/.cargo/bin/.

  4. Run:

    polars-view [path_to_file] [options]
    • If [path_to_file] is provided (CSV, JSON, NDJSON, Parquet), it’s loaded on startup.

    • Run polars-view --help for command-line options (--delimiter, --exclude-null-cols, --null-values, --query, --regex, --table-name).

    • Logging/Tracing: Control log detail using the RUST_LOG environment variable (values: error, warn, info, debug, trace). Remember to export it before running:

      # Example: Run with debug level logging
      export RUST_LOG=debug
      polars-view data.parquet
    • Examples:

      polars-view sales_data.parquet
      polars-view --delimiter="|" transactions.csv --null-values="N/A,-"
      polars-view data.csv -q "SELECT category, SUM(value) AS total FROM AllData GROUP BY category"
      # Normalize Euro numbers in columns matching "^Value.*$"
      polars-view data.csv --regex "^Value.*$"
      # Normalize Euro numbers in ALL string columns (Use with caution!)
      polars-view data.csv -r "*"
      # Use backticks/quotes for names with spaces/special chars
      polars-view items.csv -q "SELECT \`Item Name\`, Price FROM AllData WHERE Price > 100.0"
      polars-view logs.ndjson -q "SELECT timestamp, message FROM AllData WHERE level = 'ERROR'"
      # Exclude all null columns on load
      polars-view big_dataset.parquet --exclude-null-cols

§Usage Guide

  • Opening Files: Use the command line, “File” > “Open File…” (Ctrl+O), or drag & drop.
  • Viewing Data: Scroll the table. Click header icons to apply/cycle sorting (supports multiple columns; order matters). Drag header separators to resize columns.
  • Configuring View & Data: Use left-side panels (“Info”, “Format”, “Query”, “Columns”). Format changes update the view efficiently; Query/Filter changes trigger an asynchronous data reload/requery.
  • Applying SQL: Enter query in “Query” panel (default table: AllData). Click “Apply SQL Commands”. See examples or Polars SQL docs.
  • Saving Data:
    • Save (Ctrl+S): Overwrites the file at its original path.
    • Save As… (Ctrl+A): Saves current view to a new file. Choose format (CSV, JSON, NDJSON, Parquet) via dialog.
  • Exiting: Use “File” > “Exit” or close the window.

§Core Dependencies

  • GUI Framework: eframe, egui, egui_extras
  • Data Handling: polars (with features like lazy, csv, json, parquet, sql)
  • Asynchronous Runtime: tokio (with features like rt, sync, rt-multi-thread)
  • Command Line: clap, anstyle
  • File Dialogs: rfd
  • Logging/Diagnostics: tracing, tracing-subscriber
  • Utilities: regex, thiserror, cfg-if, env_logger (non-wasm)

§License

This project is licensed under the MIT License.

Structs§

Arguments
Command-line arguments for the PolarsView application.
DataContainer
Container for the Polars DataFrame and its associated display and filter state.
DataFilter
Holds configuration parameters related to loading and querying data.
DataFormat
Holds user-configurable settings for data presentation in the table.
Error
Notification struct for displaying error messages. Implements Notification.
FileInfo
Represents file information.
PolarsViewApp
The main application struct for PolarsView, holding the entire UI and async state.
Settings
Placeholder Notification struct for future Settings dialog. Implements Notification.
SortBy
Represents a single criterion for sorting. Used within DataContainer to store the cumulative sort order as Vec<SortBy>. The order of criteria in the Vec determines sort precedence.
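
To illustrate how an ordered list of sort criteria yields a multi-column sort, here is a minimal sketch using only the standard library. It does not use Polars or the crate's real `SortBy`; the struct `SortCriterion`, its fields, and the `Row` type are assumptions made for the example.

```rust
use std::cmp::Ordering;

/// Hypothetical sort criterion; the crate's real `SortBy` fields may differ.
#[derive(Debug, Clone)]
struct SortCriterion {
    column: usize,    // index of the column within each row
    descending: bool, // sort direction
    nulls_last: bool, // where missing values go
}

/// A row of nullable integers, standing in for a DataFrame row.
type Row = Vec<Option<i64>>;

/// Compare two cells under one criterion, placing nulls as requested.
fn compare_cells(a: Option<i64>, b: Option<i64>, c: &SortCriterion) -> Ordering {
    match (a, b) {
        (None, None) => Ordering::Equal,
        (None, Some(_)) => if c.nulls_last { Ordering::Greater } else { Ordering::Less },
        (Some(_), None) => if c.nulls_last { Ordering::Less } else { Ordering::Greater },
        (Some(x), Some(y)) => if c.descending { y.cmp(&x) } else { x.cmp(&y) },
    }
}

/// Sort rows by each criterion in turn; earlier criteria take precedence.
fn sort_rows(rows: &mut [Row], criteria: &[SortCriterion]) {
    rows.sort_by(|a, b| {
        criteria
            .iter()
            .map(|c| compare_cells(a[c.column], b[c.column], c))
            .find(|ord| *ord != Ordering::Equal)
            .unwrap_or(Ordering::Equal)
    });
}

fn main() {
    let mut rows: Vec<Row> = vec![
        vec![Some(1), Some(10)],
        vec![Some(2), None],
        vec![Some(1), Some(30)],
        vec![None, Some(5)],
    ];
    // Column 0 ascending with nulls last, then column 1 descending.
    let criteria = vec![
        SortCriterion { column: 0, descending: false, nulls_last: true },
        SortCriterion { column: 1, descending: true, nulls_last: false },
    ];
    sort_rows(&mut rows, &criteria);
    for row in &rows {
        println!("{row:?}");
    }
}
```

The second criterion is consulted only when the first compares equal, which is what "order determines precedence" means in practice.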

Enums§

FileExtension
Represents the extension of a file.
HeaderSortState
Represents the interaction state for sorting a specific column header in the UI.
PolarsViewError
Custom error type for Polars View.

Constants§

CUSTOM_TEXT_STYLE
Defines custom text styles for the egui context. Overrides default egui font sizes for different logical text styles (Heading, Body, etc.). Used by MyStyle::set_style_init.
DEFAULT_INDEX_COLUMN_NAME
Default name for the row number column if added.
DEFAULT_OVERRIDE_REGEX
DEFAULT_QUERY
The default SQL query, selected when the application starts or when examples are unavailable.
MAX_ATTEMPTS

Statics§

DEFAULT_ALIGNMENTS
A static, lazily initialized map defining the default text alignments for various Polars DataTypes used in the egui table.
DEFAULT_CSV_DELIMITER
Default delimiter used for CSV parsing if not specified or detected. Using &'static str for common, immutable delimiters avoids a heap allocation.
NULL_VALUES
Static string listing common values treated as null/missing during CSV parsing. The r#""# syntax denotes a raw string literal, avoiding the need to escape quotes.

Traits§

MyStyle
A trait for applying custom styling to the egui context (Context). Used once at startup by layout.rs::PolarsViewApp::new.
Notification
Trait for modal Notification windows (like errors or settings dialogs). Allows layout.rs to manage different notification types polymorphically via Box<dyn Notification>.
PathExtension
Trait to extend Path with a convenient method for getting the lowercase file extension. Used by extension.rs, file_dialog.rs, filters.rs.
SeriesExtension
A trait that adds building capabilities directly to the Series type.
SortableHeaderRenderer
Trait defining a widget for rendering a sortable table header cell. Provides a consistent interface for container.rs::render_table_header.
UniqueElements
A trait for deduplicating vectors while preserving the original order of elements. Added to Vec<T>. Used by filters.rs for delimiter guessing.
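
Order-preserving deduplication of a Vec<T>, as described for UniqueElements, can be sketched with a HashSet that records what has been seen. The method name `unique` and the trait bounds below are assumptions; the crate's actual signature is not shown on this page.

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Hypothetical re-creation of an order-preserving dedup trait.
/// The method name `unique` is an assumption for this sketch.
trait UniqueElements {
    fn unique(&mut self);
}

impl<T: Eq + Hash + Clone> UniqueElements for Vec<T> {
    fn unique(&mut self) {
        let mut seen = HashSet::new();
        // `insert` returns false for values already seen, so `retain`
        // keeps only the first occurrence of each element, in order.
        self.retain(|item| seen.insert(item.clone()));
    }
}

fn main() {
    // Candidate CSV delimiters with repeats, e.g. collected while guessing.
    let mut delimiters = vec![',', ';', ',', '|', ';', '\t'];
    delimiters.unique();
    println!("{delimiters:?}");
}
```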

Functions§

add_row_index_column
Conditionally adds a row index column to a DataFrame based on an explicit Option<RowIndex>.
build_null_expression
Builds a Polars Expression to replace specified string values (after trimming) with NULL within selected columns of a DataFrame.
drop_columns_by_regex
Drops columns from a DataFrame whose names match the provided regex pattern.
get_decimal_and_layout
Determines the layout for a given column based on its data type and, crucially, the alignment settings from DataFilter.
normalize_float_strings_by_regex
Normalizes string columns containing numeric values formatted with non-standard separators (e.g., ‘.’ for thousands, ‘,’ for decimals) to standard numeric format (‘.’ for decimals, no thousands separators) and then casts them to Float64.
open_file
Opens a file dialog asynchronously, allowing the user to choose a file.
read_csv_partial_from_path
Reads a CSV file from the specified path using Polars, applying given options and limiting the number of data rows read.
remove_null_columns
Removes columns from the DataFrame that consist entirely of null values.
replace_values_with_null
Replaces values with null based on a list of matching strings, with options to apply to all columns or only string columns.
save
Saves the DataFrame contained in DataContainer to a file.
save_as
Saves the DataFrame to a file asynchronously, handling CSV, JSON, NDJSON and Parquet formats. The user is presented with a file dialog to choose the save location and format.
sql_commands
Generates a list of example SQL commands based on the provided DataFrame schema. Uses helper functions to find suitable columns and generate diverse examples.
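
The string transformation behind normalize_float_strings_by_regex (for example '1.234,56' becoming 1234.56) can be illustrated on a single value with plain Rust. This is a sketch of the idea only: the real function operates on Polars columns selected by regex, and the helper name `normalize_european_number` is hypothetical.

```rust
/// Convert a European-style number such as "1.234,56" into an f64.
/// Returns None when the cleaned-up text is not a valid number.
/// Hypothetical helper; it assumes '.' is only ever a thousands separator.
fn normalize_european_number(text: &str) -> Option<f64> {
    let cleaned: String = text
        .trim()
        .chars()
        .filter(|c| *c != '.') // drop thousands separators
        .map(|c| if c == ',' { '.' } else { c }) // decimal comma becomes a point
        .collect();
    cleaned.parse::<f64>().ok()
}

fn main() {
    for input in ["1.234,56", " 12,5 ", "-7.000", "abc"] {
        println!("{input:?} -> {:?}", normalize_european_number(input));
    }
}
```

Because every '.' is treated as a thousands separator, a value already in standard form such as "1234.56" would be misread as 123456, which is one reason to apply the normalization only to columns known to hold European-style numbers.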

Type Aliases§

ContainerResult
Type alias for a Result specifically wrapping a DataContainer on success. Simplifies function signatures involving potential data loading/processing errors.
DataFuture
Type alias for a boxed, dynamically dispatched Future that yields a ContainerResult. This allows storing and managing different asynchronous operations (load, sort, format) that all eventually produce a DataContainer or an error.
PolarsViewResult
Result type to simplify function signatures.
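
The shape of a boxed, dynamically dispatched future alias like DataFuture can be sketched as follows. The definitions here are assumptions, not the crate's code: `DataContainer` is reduced to a placeholder struct, the error type is a plain String, and the futures are polled by hand with a no-op waker so the example needs no async runtime (the crate itself uses Tokio).

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Hypothetical stand-ins for the crate's types.
#[derive(Debug, PartialEq)]
struct DataContainer {
    rows: usize,
}
type ContainerResult = Result<DataContainer, String>;
// One alias covers every async operation that ends in a ContainerResult.
type DataFuture = Pin<Box<dyn Future<Output = ContainerResult> + Send + 'static>>;

/// Two different operations, both erased to the same `DataFuture` type.
fn load(rows: usize) -> DataFuture {
    Box::pin(async move { Ok(DataContainer { rows }) })
}

fn sort(container: DataContainer) -> DataFuture {
    Box::pin(async move { Ok(container) })
}

/// A waker that does nothing; enough for futures that are ready at once.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Poll a future a single time and return its result if it is ready.
fn poll_once(mut fut: DataFuture) -> Option<ContainerResult> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    match fut.as_mut().poll(&mut cx) {
        Poll::Ready(result) => Some(result),
        Poll::Pending => None,
    }
}

fn main() {
    // Different operations can be stored side by side in one collection.
    let pending: Vec<DataFuture> = vec![load(3), sort(DataContainer { rows: 7 })];
    for fut in pending {
        println!("{:?}", poll_once(fut));
    }
}
```

Erasing each operation to the same alias is what lets a single field or queue hold whichever load, sort, or format task is currently in flight.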