polars-view 0.29.0

A fast and interactive viewer for CSV, JSON, and Parquet data.
PolarsView

Polars View

A fast and interactive viewer for CSV, JSON (including Newline-Delimited JSON - NDJSON), and Apache Parquet files, built with Polars and egui.

This project is inspired by and initially forked from the parqbench project.

Features

  • Fast Data Handling: Leverages the high-performance Polars DataFrame library for efficient loading, processing, and querying.
  • Multiple File Format Support:
    • Load data from: CSV, JSON, NDJSON, Parquet.
    • Save data as: CSV, JSON, NDJSON, Parquet (via "Save As...").
  • Interactive Table View:
    • Displays data in a scrollable and resizable table using egui_extras::TableBuilder.
    • Sorting: Click column headers to sort the entire DataFrame (cycles through Not Sorted → Descending ↔ Ascending).
    • Column Sizing: Choose between automatically sizing columns to their content (Format > Expand Cols = true) and using faster, initially calculated widths (Expand Cols = false).
  • SQL Querying: Filter and transform data using Polars' SQL interface. Specify the query in the "Query" panel and click "Apply SQL Commands".
  • Configuration Panels:
    • Metadata: Displays file information (row count, column count).
    • Schema: Shows column names and their Polars data types (right-click column name to copy).
    • Format:
      • Alignment: Customize text alignment (Left, Center, Right) for different data types.
      • Decimals: Control the number of decimal places displayed for float columns.
      • Expand Cols: Toggle column auto-sizing behavior.
    • Query:
      • SQL Query: Enter SQL commands (default table name: AllData).
      • Remove Null Cols: Option to automatically drop columns containing only null values upon loading or applying SQL.
      • Schema Inference Length: (CSV/JSON/NDJSON) Control rows used for schema detection.
      • CSV Delimiter: Specify the delimiter (auto-detection attempted).
      • Null Values (CSV): Define custom strings (comma-separated) to be interpreted as nulls (e.g., "", "NA", <N/A>).
      • SQL Examples: Provides context-aware SQL command suggestions based on the loaded data schema.
  • Drag and Drop: Load files by dragging and dropping them onto the application window.
  • Asynchronous Operations: Uses Tokio for non-blocking file loading, saving, sorting, and SQL execution, keeping the UI responsive. Data state updates (load, sort, format) run asynchronously, and their results are handed back to the UI when they complete.
  • Robust Error Handling: Utilizes a custom PolarsViewError enum and displays errors clearly in a non-blocking notification window.
  • Theming: Switch between Light and Dark themes.
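
The sort cycle listed under Interactive Table View (Not Sorted → Descending ↔ Ascending) can be sketched as a small state machine. This is a minimal illustration only: SortState, on_click, and sorted_indices are hypothetical names, not the crate's actual types, and a plain slice of integers stands in for a DataFrame column.

```rust
/// Sort state of one column header (hypothetical names, not the crate's types).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SortState {
    NotSorted,
    Descending,
    Ascending,
}

impl SortState {
    /// State after one click on the header:
    /// NotSorted -> Descending, then Descending <-> Ascending.
    fn on_click(self) -> Self {
        match self {
            SortState::NotSorted => SortState::Descending,
            SortState::Descending => SortState::Ascending,
            SortState::Ascending => SortState::Descending,
        }
    }
}

/// Return row indices ordered by `values` according to `state`.
/// NotSorted keeps the original row order.
fn sorted_indices(values: &[i64], state: SortState) -> Vec<usize> {
    let mut idx: Vec<usize> = (0..values.len()).collect();
    match state {
        SortState::NotSorted => {}
        SortState::Ascending => idx.sort_by_key(|&i| values[i]),
        SortState::Descending => idx.sort_by_key(|&i| std::cmp::Reverse(values[i])),
    }
    idx
}
```

In this sketch a column never returns to NotSorted once it has been clicked, which matches the cycle as written above.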

Architecture Overview

PolarsView uses eframe for the application framework and egui for the immediate-mode GUI. Core application state (PolarsViewApp) manages UI layout and event handling, and orchestrates background tasks via a shared Tokio runtime.

Data and its associated configuration (filters, format) are held within DataFrameContainer, primarily using Arc to allow cheap cloning and sharing between the main UI thread and asynchronous tasks spawned by tokio. State updates (loading new data, applying sorts, changing format) typically result in creating a new Arc<DataFrameContainer> instance. Communication between the UI thread and completed async tasks uses tokio::sync::oneshot channels.
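
The pattern described above (share the current state cheaply through Arc, compute a new state in the background, hand it back over a one-shot channel) might look roughly like the sketch below. To keep it dependency-free, it swaps in std::thread and std::sync::mpsc for the Tokio runtime and tokio::sync::oneshot that the application uses. Container, spawn_sort, and poll_update are hypothetical stand-ins, not the crate's API.

```rust
use std::sync::{mpsc, Arc};
use std::thread;

/// Hypothetical, simplified stand-in for the crate's DataFrameContainer.
#[derive(Debug)]
struct Container {
    rows: Vec<i64>,
    descending: bool,
}

/// Start a background sort. The shared input is only read; the result is a
/// brand-new Arc<Container>, sent back exactly once over the channel.
fn spawn_sort(current: Arc<Container>, descending: bool) -> mpsc::Receiver<Arc<Container>> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let mut rows = current.rows.clone();
        rows.sort();
        if descending {
            rows.reverse();
        }
        // Ignore the error if the receiver was dropped in the meantime.
        let _ = tx.send(Arc::new(Container { rows, descending }));
    });
    rx
}

/// What a UI frame might do: poll without blocking and, if the task has
/// finished, replace the current state. Returns true when the state changed.
fn poll_update(state: &mut Arc<Container>, rx: &mpsc::Receiver<Arc<Container>>) -> bool {
    match rx.try_recv() {
        Ok(new_state) => {
            *state = new_state;
            true
        }
        Err(_) => false,
    }
}
```

Because the old Arc is never mutated, any part of the UI still holding it keeps a consistent view of the data until it picks up the replacement.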

Change detection for UI settings (in Format and Query panels) relies on comparing the state of DataFormat or DataFilters before and after rendering the UI controls within a single frame. If a difference is detected, the corresponding async update function is triggered.
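
A minimal sketch of this before/after comparison, assuming the settings type implements Clone and PartialEq. FormatSettings and render_and_detect are hypothetical names, and the closure stands in for the egui widgets that may edit the settings during a frame.

```rust
/// Hypothetical, simplified stand-in for the crate's DataFormat settings.
#[derive(Debug, Clone, PartialEq)]
struct FormatSettings {
    decimals: usize,
    expand_cols: bool,
}

/// Run the "UI" for one frame and report whether it changed the settings.
/// `draw` plays the role of the widgets that may mutate the settings in place.
/// Returns Some(new_settings) only when something actually changed.
fn render_and_detect<F>(settings: &mut FormatSettings, draw: F) -> Option<FormatSettings>
where
    F: FnOnce(&mut FormatSettings),
{
    // Snapshot taken before the widgets run.
    let before = settings.clone();
    // Widgets may edit the settings in response to user input.
    draw(settings);
    if *settings != before {
        // The caller would trigger the asynchronous update here.
        Some(settings.clone())
    } else {
        None
    }
}
```

Comparing whole values keeps the UI code free of per-widget "changed" flags, at the cost of one clone of the settings per frame.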

The main table rendering relies heavily on egui_extras::TableBuilder for performance, and formatting/alignment logic is delegated based on DataFormat settings.
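
As a rough illustration of how a decimals setting and an alignment setting could drive cell formatting, the sketch below pads a float into a fixed-width string using std::fmt. This is an assumption-laden simplification: Align and format_float are hypothetical names, and in the GUI itself alignment would normally be a layout property rather than string padding.

```rust
/// Hypothetical alignment options mirroring the Left / Center / Right choices.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Align {
    Left,
    Center,
    Right,
}

/// Render `value` with a fixed number of decimals, padded with spaces to
/// at least `width` characters according to `align`.
fn format_float(value: f64, decimals: usize, align: Align, width: usize) -> String {
    match align {
        Align::Left => format!("{:<w$.p$}", value, w = width, p = decimals),
        Align::Center => format!("{:^w$.p$}", value, w = width, p = decimals),
        Align::Right => format!("{:>w$.p$}", value, w = width, p = decimals),
    }
}
```

For example, format_float(3.14159, 2, Align::Right, 8) yields the text 3.14 preceded by four spaces.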

For a visual representation of module relationships, see chart.txt.

Building and Running

  1. Prerequisites:

    • Rust and Cargo (latest stable version recommended, minimum version 1.85 as defined in Cargo.toml).
  2. Clone the Repository:

    git clone https://github.com/claudiofsr/polars-view.git
    cd polars-view
    
  3. Build and Install (Release Mode):

    cargo build --release && cargo install --path=.
    # Or to install with the 'special' formatting feature (see decimal_and_layout_v2.rs):
    # cargo build --release --features special && cargo install --path=. --features special
    

    This compiles the application in release mode (optimized) and installs the binary (polars-view) into your Cargo bin directory (~/.cargo/bin/ by default), making it available in your PATH.

  4. Run:

    polars-view [path_to_file] [options]
    
    • If [path_to_file] is provided, the application will attempt to load it on startup. Supported formats: .csv, .json, .ndjson, .parquet.

    • Use polars-view --help for a detailed list of available command-line options (e.g., --delimiter, --query, --table-name, --null-values).

    • Logging/Tracing: Control log output using the RUST_LOG environment variable:

      • export RUST_LOG=info (General information)
      • export RUST_LOG=debug (Detailed information for debugging)
      • export RUST_LOG=trace (Very detailed, for granular debugging)
      • Combine levels: export RUST_LOG=polars_view=debug,polars=info
      • Run directly: RUST_LOG=debug polars-view data.parquet
    • Examples:

      polars-view sales_data.parquet
      polars-view --delimiter="|" transactions.csv --null-values="N/A,-"
      polars-view data.csv -q "SELECT category, SUM(value) AS total FROM AllData WHERE date > '2023-01-01' GROUP BY category"
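      # Column names with spaces can be quoted with backticks or double quotes.
      # Wrap the query in single quotes so the shell does not treat the backticks as command substitution:
      polars-view items.csv -q 'SELECT `Item Name`, Price FROM AllData WHERE Price > 100.0'
      # SQL string literals take single quotes, so wrap this query in double quotes:
      polars-view logs.ndjson -q "SELECT timestamp, level, message FROM AllData WHERE level = 'ERROR'"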
      RUST_LOG=info polars-view big_dataset.parquet
      

Usage Guide

  • Opening Files:
    • Provide the file path as a command-line argument.
    • Use the "File" > "Open File..." menu (Ctrl+O).
    • Drag and drop a supported file onto the application window.
  • Viewing Data:
    • The main panel displays the data in a table. Use horizontal and vertical scrollbars if needed.
    • Column headers show the column names. Click to sort.
    • Adjust column widths by dragging the separators between headers.
  • Configuring View & Data:
    • Expand the panels on the left ("Metadata", "Schema", "Format", "Query") to view information and change settings.
    • Format Panel: Adjust alignment per data type, set float precision, and toggle column expansion (Expand Cols). Changes trigger an efficient asynchronous update.
    • Query Panel: Set CSV options (delimiter, nulls, schema inference), toggle Remove Null Cols, define and apply SQL queries. Applying SQL or changing most query settings triggers an asynchronous data reload/requery.
  • Applying SQL:
    • Enter your query in the "SQL Query" text area (using AllData as the default table name unless changed via CLI or config).
    • Click "Apply SQL Commands". The table updates once the query has executed asynchronously. Refer to the Polars SQL documentation for supported syntax.
    • Check the dynamically generated "SQL Command Examples" for syntax relevant to your data.
  • Saving Data:
    • Save (Ctrl+S): Overwrites the original file (if applicable) with the currently displayed data (after filtering/sorting). Use with caution.
    • Save As... (Ctrl+A): Opens a dialog to save the currently displayed data to a new file. You can choose the output format (CSV, JSON, NDJSON, Parquet) and location.
  • Exiting: Use "File" > "Exit" or close the window.

Core Dependencies

  • Polars: High-performance DataFrame library (CSV, JSON, Parquet, SQL features enabled).
  • eframe / egui: Immediate-mode GUI framework.
  • egui_extras: Additional widgets for egui (TableBuilder).
  • tokio: Asynchronous runtime for background tasks.
  • clap: Command-line argument parsing.
  • tracing / tracing-subscriber: Application logging.
  • thiserror: Error handling boilerplate.
  • rfd: Native file dialogs.
  • cfg-if: Conditional compilation helpers.

License

This project is licensed under the MIT License.