§PolarsView
A fast and interactive viewer for CSV, JSON (including Newline-Delimited JSON - NDJSON), and Apache Parquet files, built with Polars and egui.
This project is inspired by and initially forked from the parqbench project.
§Features
- Fast Data Handling: Uses the Polars DataFrame library for efficient data loading, processing, and querying.
- Multiple File Format Support:
  - Load data from: CSV, JSON, NDJSON (Newline-Delimited JSON), Parquet.
  - Save data as: CSV, JSON, NDJSON, Parquet (via “Save As…” [Ctrl+A]).
- Interactive Table View:
  - Supports sorting by multiple columns simultaneously: Click column header icons to sort the entire DataFrame asynchronously. The order of clicks determines sort precedence. The 5-state cycle for each column controls direction and null placement:
    - ↕ : Not Sorted
    - ⏷ : Descending, Nulls First
    - ⏶ : Ascending, Nulls First
    - ⬇ : Descending, Nulls Last
    - ⬆ : Ascending, Nulls Last
    - ↕ : Back to Not Sorted
    (Numbers indicate sort precedence if multiple columns are sorted.)
  - Customizable Header: Toggle visual style (“Enhanced Header”), adjust vertical padding (“Header Padding”).
  - Column Sizing: Choose automatic content-based sizing (“Auto Col Width”: true) or faster fixed initial widths (“Auto Col Width”: false). Manually resize columns by dragging separators.
- SQL Querying: Filter and transform data using Polars’ SQL interface. Execute queries asynchronously via the “Query” panel.
- String Column Number Normalization (CLI): Use the --regex (-r) argument to select string columns (via the wildcard * or a ^...$ regex pattern matching column names) containing European-style numbers (e.g., ‘1.234,56’) and convert them to standard Float64 format (e.g., 1234.56) on load.
- Configuration Panels (Side Bar):
  - Info: Displays file dimensions (rows, columns).
  - Format: Set text alignment, float decimal places, column width strategy, header style, and header padding.
  - Query: Configure SQL query, add optional row index column (with custom name/offset), normalize columns, null column removal, remove columns by regex, schema inference rows (CSV/JSON/NDJSON), CSV delimiter, custom CSV null values, and view SQL examples.
  - Columns: Shows column names and Polars data types. Right-click a column name to copy it.
- Asynchronous Operations: Utilizes Tokio for non-blocking file I/O, sorting, and SQL execution, keeping the UI responsive. Shows a spinner during processing.
- Drag and Drop: Load files by dropping them onto the application window.
- Robust Error Handling: Displays errors (file loading, parsing, SQL, etc.) in a non-blocking notification window.
- Theming: Switch between Light and Dark themes via the menu bar.
- Persistence: Remembers window size and position between sessions.
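
The five-state sort cycle listed under “Interactive Table View” behaves like a small state machine. The following is a minimal sketch of that idea using only the standard library; the enum name SortCycle, its variants, and its methods are hypothetical and are not the crate's actual HeaderSortState definition.

```rust
/// Hypothetical model of the five-state header cycle. Names are illustrative
/// and are NOT taken from the crate's `HeaderSortState` enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SortCycle {
    NotSorted,
    DescendingNullsFirst,
    AscendingNullsFirst,
    DescendingNullsLast,
    AscendingNullsLast,
}

impl SortCycle {
    /// State reached by one more click on the header icon.
    pub fn next(self) -> Self {
        match self {
            Self::NotSorted => Self::DescendingNullsFirst,
            Self::DescendingNullsFirst => Self::AscendingNullsFirst,
            Self::AscendingNullsFirst => Self::DescendingNullsLast,
            Self::DescendingNullsLast => Self::AscendingNullsLast,
            Self::AscendingNullsLast => Self::NotSorted,
        }
    }

    /// Icon displayed for the state, following the list in the feature description.
    pub fn icon(self) -> &'static str {
        match self {
            Self::NotSorted => "↕",
            Self::DescendingNullsFirst => "⏷",
            Self::AscendingNullsFirst => "⏶",
            Self::DescendingNullsLast => "⬇",
            Self::AscendingNullsLast => "⬆",
        }
    }
}
```

Calling next() five times on SortCycle::NotSorted returns to SortCycle::NotSorted, which matches the “Back to Not Sorted” step of the cycle.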
§Building and Running
- Prerequisites:
  - Rust and Cargo (latest stable version recommended, minimum version 1.86 and edition 2024).
- Clone the Repository:

      git clone https://github.com/claudiofsr/polars-view.git
      cd polars-view

- Build and Install (Release Mode):

      # Build with default features (uses 'format-simple')
      cargo b -r && cargo install --path=.

      # --- OR Build with Specific Features ---
      # Example: Build with 'format-special' (formats 'Alíq'/'Aliq' columns differently)
      cargo b -r && cargo install --path=. --features format-special

  This compiles optimized code and installs the polars-view binary to ~/.cargo/bin/.
- Run:

      polars-view [path_to_file] [options]

  - If [path_to_file] is provided (CSV, JSON, NDJSON, Parquet), it’s loaded on startup.
  - Run polars-view --help for command-line options (--delimiter, --exclude-null-cols, --null-values, --query, --regex, --table-name).
  - Logging/Tracing: Control log detail using the RUST_LOG environment variable (values: error, warn, info, debug, trace). Remember to export it before running:

        # Example: Run with debug level logging
        export RUST_LOG=debug
        polars-view data.parquet

- Examples:

      polars-view sales_data.parquet
      polars-view --delimiter="|" transactions.csv --null-values="N/A,-"
      polars-view data.csv -q "SELECT category, SUM(value) AS total FROM AllData GROUP BY category"

      # Normalize Euro numbers in columns matching "^Value.*$"
      polars-view data.csv --regex "^Value.*$"

      # Normalize Euro numbers in ALL string columns (Use with caution!)
      polars-view data.csv -r "*"

      # Use backticks/quotes for names with spaces/special chars
      polars-view items.csv -q "SELECT \`Item Name\`, Price FROM AllData WHERE Price > 100.0"
      polars-view logs.ndjson -q 'SELECT timestamp, message FROM AllData WHERE level = "ERROR"'

      # Exclude all null columns on load
      polars-view big_dataset.parquet --exclude-null-cols

§Usage Guide
- Opening Files: Use the command line, “File” > “Open File…” (Ctrl+O), or drag & drop.
- Viewing Data: Scroll the table. Click header icons to apply/cycle sorting (supports multiple columns; order matters). Drag header separators to resize columns.
- Configuring View & Data: Use left-side panels (“Info”, “Format”, “Query”, “Columns”). Format changes update the view efficiently; Query/Filter changes trigger an asynchronous data reload/requery.
- Applying SQL: Enter query in “Query” panel (default table: AllData). Click “Apply SQL Commands”. See examples or Polars SQL docs.
- Saving Data:
  - Save (Ctrl+S): Overwrites the original file path.
  - Save As… (Ctrl+A): Saves current view to a new file. Choose format (CSV, JSON, NDJSON, Parquet) via dialog.
- Exiting: Use “File” > “Exit” or close the window.
§Core Dependencies
- GUI Framework: eframe, egui, egui_extras
- Data Handling: polars (with features like lazy, csv, json, parquet, sql)
- Asynchronous Runtime: tokio (with features like rt, sync, rt-multi-thread)
- Command Line: clap, anstyle
- File Dialogs: rfd
- Logging/Diagnostics: tracing, tracing-subscriber
- Utilities: regex, thiserror, cfg-if, env_logger (non-wasm)
§License
This project is licensed under the MIT License.
Structs§
- Arguments: Command-line arguments for the PolarsView application.
- DataContainer: Container for the Polars DataFrame and its associated display and filter state.
- DataFilter: Holds configuration parameters related to loading and querying data.
- DataFormat: Holds user-configurable settings for data presentation in the table.
- Error: Notification struct for displaying error messages. Implements Notification.
- FileInfo: Represents file information.
- PolarsViewApp: The main application struct for PolarsView, holding the entire UI and async state.
- Settings: Placeholder Notification struct for future Settings dialog. Implements Notification.
- SortBy: Represents a single criterion for sorting. Used within DataFrameContainer to store the cumulative sort order as Vec<SortBy>. The order of criteria in the Vec determines sort precedence.
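
To illustrate how the order of criteria in a Vec can determine sort precedence, here is a minimal standard-library sketch. It does not use Polars; the SortKey struct, its fields, and the row representation (Vec<Option<i64>>) are assumptions made for illustration and do not reflect the actual fields of SortBy.

```rust
use std::cmp::Ordering;

/// Hypothetical sort criterion. Field names are assumptions for illustration,
/// not the real fields of the crate's `SortBy` struct.
#[derive(Debug, Clone)]
pub struct SortKey {
    pub column: usize,
    pub ascending: bool,
    pub nulls_last: bool,
}

/// Compare two optional cells under a single criterion.
fn compare_cells(a: &Option<i64>, b: &Option<i64>, key: &SortKey) -> Ordering {
    match (a, b) {
        (None, None) => Ordering::Equal,
        // Null placement is independent of the sort direction.
        (None, Some(_)) => {
            if key.nulls_last { Ordering::Greater } else { Ordering::Less }
        }
        (Some(_), None) => {
            if key.nulls_last { Ordering::Less } else { Ordering::Greater }
        }
        (Some(x), Some(y)) => {
            if key.ascending { x.cmp(y) } else { y.cmp(x) }
        }
    }
}

/// Sort rows by every criterion in order: earlier keys take precedence,
/// later keys only break ties. Panics if a key names a missing column.
pub fn sort_rows(rows: &mut [Vec<Option<i64>>], keys: &[SortKey]) {
    rows.sort_by(|left, right| {
        keys.iter()
            .map(|key| compare_cells(&left[key.column], &right[key.column], key))
            .find(|ordering| *ordering != Ordering::Equal)
            .unwrap_or(Ordering::Equal)
    });
}
```

With keys [column 0 ascending, nulls last] followed by [column 1 descending], rows are ordered by column 0 first, and column 1 is consulted only for rows whose column 0 values are equal.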
Enums§
- FileExtension: Represents the extension of a file.
- HeaderSortState: Represents the interaction state for sorting a specific column header in the UI.
- PolarsViewError: Custom error type for Polars View.
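
As a rough illustration of mapping a file path to one of the supported formats via its lowercase extension, consider the sketch below. The FileKind enum, its variants, and the file_kind function are hypothetical; the crate's real FileExtension enum and PathExtension trait may be shaped differently.

```rust
use std::path::Path;

/// Hypothetical classification of a file by extension. The variants are
/// assumptions and may not match the crate's `FileExtension` enum.
#[derive(Debug, PartialEq, Eq)]
pub enum FileKind {
    Csv,
    Json,
    NdJson,
    Parquet,
    /// An extension that is present but not one of the supported formats.
    Unknown(String),
    /// The path has no extension (or it is not valid UTF-8).
    Missing,
}

/// Classify a path by its extension, ignoring ASCII/Unicode case.
pub fn file_kind(path: &Path) -> FileKind {
    let extension = path
        .extension()
        .and_then(|ext| ext.to_str())
        .map(str::to_lowercase);

    match extension.as_deref() {
        Some("csv") => FileKind::Csv,
        Some("json") => FileKind::Json,
        Some("ndjson") => FileKind::NdJson,
        Some("parquet") => FileKind::Parquet,
        Some(other) => FileKind::Unknown(other.to_owned()),
        None => FileKind::Missing,
    }
}
```

Lowercasing before matching means that a file named Sales.PARQUET is treated the same as sales.parquet.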
Constants§
- CUSTOM_TEXT_STYLE: Defines custom text styles for the egui context. Overrides default egui font sizes for different logical text styles (Heading, Body, etc.). Used by MyStyle::set_style_init.
- DEFAULT_INDEX_COLUMN_NAME: Default name for the row number column if added.
- DEFAULT_OVERRIDE_REGEX
- DEFAULT_QUERY: The default SQL query, selected when the application starts or when examples are unavailable.
- MAX_ATTEMPTS
Statics§
- DEFAULT_ALIGNMENTS: A static, lazily initialized map defining the default text alignments for various Polars DataTypes used in the egui table.
- DEFAULT_CSV_DELIMITER: Default delimiter used for CSV parsing if not specified or detected. Using &'static str for common, immutable delimiters saves memory allocation.
- NULL_VALUES: Static string listing common values treated as null/missing during CSV parsing. The r#""# syntax denotes a raw string literal, avoiding the need to escape quotes.
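
The raw string literal syntax mentioned for NULL_VALUES can be seen in the short sketch below. The list contents, the comma separator, and both helper functions are assumptions made for illustration; the actual value of NULL_VALUES and the way the crate parses it are not shown on this page.

```rust
/// Hypothetical list in the spirit of `NULL_VALUES`. The crate's actual
/// entries and separator are assumptions here.
/// The raw string `r#"..."#` may contain double quotes without escaping.
pub static EXAMPLE_NULL_VALUES: &str = r#""", "<N/D>", NA, N/A"#;

/// Split a comma-separated list, trimming whitespace and surrounding quotes.
pub fn parse_null_values(list: &str) -> Vec<String> {
    list.split(',')
        .map(|item| item.trim().trim_matches('"').to_owned())
        .collect()
}

/// A cell counts as null when its trimmed text equals one of the markers.
pub fn is_null_marker(cell: &str, markers: &[String]) -> bool {
    let trimmed = cell.trim();
    markers.iter().any(|marker| marker.as_str() == trimmed)
}
```

In this sketch the first entry, written as two double quotes, becomes the empty string, so empty cells are also treated as null.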
Traits§
- MyStyle: A trait for applying custom styling to the egui context (Context). Used once at startup by layout.rs::PolarsViewApp::new.
- Notification: Trait for modal Notification windows (like errors or settings dialogs). Allows layout.rs to manage different notification types polymorphically via Box<dyn Notification>.
- PathExtension: Trait to extend Path with a convenient method for getting the lowercase file extension. Used by extension.rs, file_dialog.rs, filters.rs.
- SeriesExtension: Define a trait to add building capabilities directly to the Series type.
- SortableHeaderRenderer: Trait defining a widget for rendering a sortable table header cell. Provides a consistent interface for container.rs::render_table_header.
- UniqueElements: A trait for deduplicating vectors while preserving the original order of elements. Added to Vec<T>. Used by filters.rs for delimiter guessing.
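
Order-preserving deduplication of a Vec<T>, as described for UniqueElements, can be sketched with a HashSet of already-seen values. The trait name UniqueInOrder, the method name unique, and the trait bounds below are assumptions; the crate's own trait may use a different signature.

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Sketch of an order-preserving dedup trait. The trait and method names are
/// hypothetical and not taken from the crate's `UniqueElements` trait.
pub trait UniqueInOrder<T> {
    /// Remove repeated values in place, keeping the first occurrence of each.
    fn unique(&mut self);
}

impl<T: Eq + Hash + Clone> UniqueInOrder<T> for Vec<T> {
    fn unique(&mut self) {
        let mut seen = HashSet::new();
        // `insert` returns false for a value that was already seen,
        // so `retain` keeps only first occurrences, in original order.
        self.retain(|item| seen.insert(item.clone()));
    }
}
```

For a list of candidate delimiters such as [';', ',', ';', '|', ','], calling unique() leaves [';', ',', '|'].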
Functions§
- add_row_index_column: Conditionally adds a row index column to a DataFrame based on an explicit Option<RowIndex>.
- build_null_expression: Builds a Polars Expression to replace specified string values (after trimming) with NULL within selected columns of a DataFrame.
- drop_columns_by_regex: Drops columns from a DataFrame whose names match the provided regex pattern.
- get_decimal_and_layout: Determines the layout for a given column based on its data type and, crucially, the alignment settings from DataFilter.
- normalize_float_strings_by_regex: Normalizes string columns containing numeric values formatted with non-standard separators (e.g., ‘.’ for thousands, ‘,’ for decimals) to standard numeric format (‘.’ for decimals, no thousands separators) and then casts them to Float64.
- open_file: Opens a file dialog asynchronously, allowing the user to choose a file.
- read_csv_partial_from_path: Reads a CSV file from the specified path using Polars, applying given options and limiting the number of data rows read.
- remove_null_columns: Removes columns from the DataFrame that consist entirely of null values.
- replace_values_with_null: Replaces values with null based on a list of matching strings, with options to apply to all columns or only string columns.
- save: Saves the DataFrame contained in DataContainer to a file.
- save_as: Saves the DataFrame to a file asynchronously, handling CSV, Json, NDJson and Parquet formats. The user is presented with a file dialog to choose the save location and format.
- sql_commands: Generates a list of example SQL commands based on the provided DataFrame schema. Uses helper functions to find suitable columns and generate diverse examples.
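
The per-value transformation behind European-style number normalization (for example ‘1.234,56’ to 1234.56) can be sketched on a single string without Polars. The function name normalize_euro_number is hypothetical, and this is only an approximation of the idea, not the implementation of normalize_float_strings_by_regex.

```rust
/// Convert a European-style number such as "1.234,56" into an f64.
/// Hypothetical helper for illustration: it drops '.' thousands separators,
/// turns the decimal comma into a decimal point, and then parses the result.
/// Returns None when the cleaned text is not a valid number.
pub fn normalize_euro_number(text: &str) -> Option<f64> {
    let cleaned: String = text
        .trim()
        .chars()
        .filter(|c| *c != '.') // drop thousands separators
        .map(|c| if c == ',' { '.' } else { c }) // decimal comma -> decimal point
        .collect();
    cleaned.parse::<f64>().ok()
}
```

Note that a value already in standard format, such as "1.5", would lose its decimal point under this rule and parse as 15. This is one reason to apply the normalization only to columns known to hold European-style numbers.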
Type Aliases§
- ContainerResult: Type alias for a Result specifically wrapping a DataContainer on success. Simplifies function signatures involving potential data loading/processing errors.
- DataFuture: Type alias for a boxed, dynamically dispatched Future that yields a ContainerResult. This allows storing and managing different asynchronous operations (load, sort, format) that all eventually produce a DataContainer or an error.
- PolarsViewResult: Result type to simplify function signatures.
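
The pattern of a Result alias combined with a boxed, dynamically dispatched Future can be sketched with the standard library alone. Every name below (Container, LoadError, ContainerOutcome, BoxedDataFuture, load, fail, poll_once) is a stand-in chosen for illustration, and the exact bounds on the crate's DataFuture alias are an assumption. The sketch polls the futures manually with a no-op waker instead of using Tokio.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Stand-ins for the crate's data container and error type (assumptions).
#[derive(Debug, PartialEq)]
pub struct Container {
    pub rows: usize,
}

#[derive(Debug, PartialEq)]
pub struct LoadError(pub String);

/// Result alias wrapping the container on success.
pub type ContainerOutcome = Result<Container, LoadError>;

/// Boxed future alias: different operations share one storable type.
pub type BoxedDataFuture = Pin<Box<dyn Future<Output = ContainerOutcome> + Send + 'static>>;

/// A "load" operation that is immediately ready with a container.
pub fn load(rows: usize) -> BoxedDataFuture {
    Box::pin(std::future::ready(Ok(Container { rows })))
}

/// An operation that is immediately ready with an error.
pub fn fail(message: &str) -> BoxedDataFuture {
    Box::pin(std::future::ready(Err(LoadError(message.to_owned()))))
}

/// Waker that does nothing; sufficient for futures that are already ready.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Poll a boxed future once without an async runtime.
pub fn poll_once(future: &mut BoxedDataFuture) -> Poll<ContainerOutcome> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut context = Context::from_waker(&waker);
    future.as_mut().poll(&mut context)
}
```

Because load and fail return the same alias, their results can be kept in a single field or Vec<BoxedDataFuture> and polled uniformly, which is the benefit the DataFuture description points to.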