nu_plugin_bigquery 0.1.0

A Nushell plugin for querying Google BigQuery directly from your shell. Returns native Nushell tables or Arrow IPC files for seamless integration with Nushell's Polars dataframe ecosystem.

Features

  • Native Nushell tables — query results are real Nushell tables you can pipe to where, select, sort-by, etc.
  • Arrow IPC output — --arrow flag writes results as Arrow IPC files for zero-copy polars open ingestion
  • Zero-config auth — works out of the box with gcloud auth application-default login, or use explicit service account keys
  • Full BigQuery type support — INTEGER, FLOAT, BOOLEAN, STRING, TIMESTAMP, DATE, DATETIME, BYTES, NUMERIC, RECORD, REPEATED, and more
  • Server-side filtering — bq read --filter pushes WHERE clauses to BigQuery
  • Cost estimation — bq query --dry-run shows bytes processed and estimated cost before running

Requirements

  • Nushell 0.110.0+
  • Rust toolchain (to build from source)
  • Google Cloud credentials (one of):
    • gcloud auth application-default login (recommended for development)
    • Service account key JSON file
    • GCE metadata server (on Google Cloud VMs)

Installation

# Build
cargo build --release

# Register with Nushell
plugin add ./target/release/nu_plugin_bigquery
plugin use bigquery

Commands

All commands are available under both bq and bigquery prefixes.
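For example, these two invocations are interchangeable:

```nushell
# bq and bigquery are aliases for the same commands
bq query "SELECT 1 AS one"
bigquery query "SELECT 1 AS one"
```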

bq query — Run SQL queries

# Returns a Nushell table
bq query "SELECT name, age FROM `project.dataset.users` WHERE age > 30"

# Pipe to standard Nushell commands
bq query "SELECT * FROM dataset.sales" | where region == "US" | sort-by revenue --reverse

# Arrow IPC output for polars (returns LazyFrame, must collect)
bq query --arrow "SELECT * FROM dataset.large_table" | polars open $in | polars collect

# Check bytes processed before running
bq query --dry-run "SELECT * FROM dataset.huge_table"
# => {bytes_processed: 4500000000, bytes_processed_human: "4.19 GB"}

Flag            Short  Description
--project       -p     GCP project ID
--credentials   -c     Path to service account key JSON
--location      -l     BigQuery location (e.g., US, EU)
--max-results   -n     Maximum rows to return
--timeout              Query timeout in milliseconds
--arrow         -a     Write Arrow IPC file, return path
--dry-run              Show bytes processed without running
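Flags combine as usual. For example, to target a specific project and cap the result size (project and table names here are illustrative):

```nushell
# Run against an explicit project, return at most 1000 rows
bq query --project my-project --max-results 1000 "SELECT * FROM dataset.events"
```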

bq read — Read table data directly (no SQL)

# Read with server-side column selection
bq read my_dataset.my_table --columns [user_id, event_type, ts]

# Server-side row filtering (SQL WHERE syntax)
bq read my_dataset.events --filter "ts > '2024-06-01'" --columns [user_id, event_type]

# Arrow output for large tables (LazyFrame, must collect)
bq read my_dataset.clickstream --arrow | polars open $in | polars collect

Flag            Short  Description
--columns              Columns to read (server-side projection)
--filter               Row filter (SQL WHERE syntax)
--max-results   -n     Maximum rows to return
--project       -p     GCP project ID
--credentials   -c     Path to service account key JSON
--arrow         -a     Write Arrow IPC file, return path

Table reference formats: dataset.table or project.dataset.table
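Both forms in practice (names are illustrative):

```nushell
# Project resolved from flags, environment, or credentials
bq read my_dataset.my_table

# Fully qualified with an explicit project
bq read my-project.my_dataset.my_table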

bq datasets — List datasets

bq datasets
bq datasets | where name =~ "prod"

bq tables — List tables in a dataset

bq tables my_dataset
# => table_name, table_type, num_rows, num_bytes, creation_time

bq schema — Inspect table schema

bq schema my_dataset.my_table
# => name, type, mode, description

Nested RECORD fields are flattened with dot notation (e.g., address.city).
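For example, a table with a nested address RECORD might list its subfields like this (illustrative output):

```nushell
bq schema my_dataset.customers
# => rows such as:
#    name          type    mode
#    address.city  STRING  NULLABLE
#    address.zip   STRING  NULLABLE
```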

Authentication

Credentials are resolved in this order:

  1. --credentials flag (path to service account key JSON)
  2. $env.GOOGLE_APPLICATION_CREDENTIALS environment variable
  3. Application Default Credentials (gcloud auth application-default login)

Project ID is resolved in this order:

  1. --project flag
  2. $env.BQ_PROJECT, $env.GOOGLE_CLOUD_PROJECT, or $env.GCLOUD_PROJECT
  3. Project from the credential/ADC configuration

# Set defaults via environment
$env.GOOGLE_APPLICATION_CREDENTIALS = "/path/to/key.json"
$env.BQ_PROJECT = "my-project"

Polars Integration

There are two ways to get BigQuery data into polars:

Approach                                    Best for                          How it works
| polars into-df                            Up to ~100K rows                  BQ → Nushell table → eager DataFrame
--arrow | polars open $in | polars collect  100K+ rows or memory-constrained  BQ → Arrow IPC file → zero-copy LazyFrame

Up to ~100K rows: pipe to polars into-df

The simpler path. Returns an eager DataFrame you can work with right away — no extra collect step needed:

bq query "SELECT region, product, revenue FROM analytics.sales"
  | polars into-df
  | polars group-by region
  | polars agg (polars col revenue | polars sum)
  | polars into-nu

100K+ rows: --arrow flag (LazyFrame via Arrow IPC)

For larger datasets, the default path converts every row through Nushell's Value types, which is slow and doubles memory usage. The --arrow flag skips that conversion — it writes an Arrow IPC file directly and returns the path. Pipe to polars open for a zero-copy LazyFrame.

You must end the pipeline with polars collect to materialize results — this is by design, so polars can push down filters and projections before loading data into memory:

# polars open returns a LazyFrame — chain operations, then collect
bq query --arrow "SELECT * FROM analytics.large_table"
  | polars open $in
  | polars filter ((polars col status) == "active")
  | polars group-by category
  | polars agg (polars col amount | polars mean)
  | polars collect

# If you just want all data as a DataFrame, still need collect at the end
bq query --arrow "SELECT * FROM analytics.large_table"
  | polars open $in
  | polars collect

Type Mapping

BigQuery Type            Nushell Value  Arrow Type
INTEGER / INT64          Int            Int64
FLOAT / FLOAT64          Float          Float64
BOOLEAN / BOOL           Bool           Boolean
STRING                   String         Utf8
BYTES                    Binary         Binary
TIMESTAMP                Date           Timestamp(us, UTC)
DATE                     Date           Date32
DATETIME                 Date           Timestamp(us, None)
NUMERIC / BIGNUMERIC     String         Utf8
GEOGRAPHY / JSON / TIME  String         Utf8
RECORD / STRUCT          Record         Struct
REPEATED                 List           List
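Since NUMERIC and BIGNUMERIC arrive as strings (to preserve precision), convert them in Nushell when you need arithmetic. A minimal sketch, assuming a price column of type NUMERIC:

```nushell
# NUMERIC columns come back as String; convert before doing math
bq query "SELECT CAST(1.25 AS NUMERIC) AS price"
  | update price { into float }
```

Note that converting to float trades the exact decimal representation for convenience; keep the string form if exactness matters downstream.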

License

MIT