rskim 0.6.0

# Skim

Smart code reader - streaming code transformation for AI agents.

[![Crates.io](https://img.shields.io/crates/v/rskim.svg)](https://crates.io/crates/rskim)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

## Overview

**Skim** transforms source code by removing implementation details while preserving structure, signatures, and types. The result is a compact view of the code that fits more easily into an LLM context window.

Think of it like `cat`, but smart about what code to show.

## Installation

### Try it (no install required)

```bash
npx rskim file.ts
```

### Install globally (recommended for regular use)

```bash
# Via npm
npm install -g rskim

# Via Cargo
cargo install rskim
```

> **Note**: Use `npx` for trying it out. For regular use, install globally to avoid npx overhead (~100-500ms per invocation).

## Quick Start

```bash
# Try it with npx (no install)
npx rskim file.ts

# Or install globally for better performance
npm install -g rskim

# Read TypeScript with structure mode
skim file.ts

# Process multiple files with glob patterns
skim 'src/**/*.ts'

# Show token reduction statistics
skim file.ts --show-stats

# Extract Python function signatures
skim file.py --mode signatures

# Parallel processing with custom job count
skim '*.{js,ts}' --jobs 4

# Pipe to syntax highlighter
skim file.rs | bat -l rust

# Read from stdin
cat code.ts | skim - --language=typescript

# Clear cache
skim --clear-cache
```

## Features

- **6 Languages**: TypeScript, JavaScript, Python, Rust, Go, Java
- **4 Transformation Modes**: Structure, Signatures, Types, Full
- **Fast**: 14.6ms for a 3000-line file (3x faster than the project's performance target)
- **Cached**: 40-50x speedup on repeated processing (enabled by default)
- **Multi-file**: Glob patterns with parallel processing (`skim 'src/**/*.ts'`)
- **Token Stats**: Show reduction statistics with `--show-stats`
- **Streaming**: Outputs to stdout for pipe workflows
- **Safe**: Built-in DoS protections

## Usage

### Basic Usage

```bash
skim <FILE>
```

### Options

```text
Options:
  -m, --mode <MODE>         Transformation mode [default: structure]
                            [possible values: structure, signatures, types, full]
  -l, --language <LANGUAGE> Override language detection
                            [possible values: typescript, javascript, python, rust, go, java]
  -j, --jobs <JOBS>         Number of parallel jobs [default: number of CPUs]
      --no-header           Don't print file path headers for multi-file output
      --no-cache            Disable caching (caching is enabled by default)
      --clear-cache         Clear all cached files and exit
      --show-stats          Show token reduction statistics
  -h, --help                Print help
  -V, --version             Print version
```
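If you call skim from a script, a thin wrapper that maps keyword arguments onto the flags listed above keeps invocations consistent. The sketch below is a hypothetical Python helper (`build_skim_command` and `run_skim` are not part of rskim). It only uses flags from the option list, assumes the `skim` binary is on your `PATH`, and runs without a shell, so glob patterns reach skim unexpanded, just as the quoted patterns in Quick Start do.

```python
import subprocess
from typing import Optional


def build_skim_command(
    path: str,
    mode: str = "structure",
    language: Optional[str] = None,
    jobs: Optional[int] = None,
    no_header: bool = False,
    show_stats: bool = False,
) -> list[str]:
    """Translate keyword arguments into a skim argument list (hypothetical helper)."""
    cmd = ["skim", path, "--mode", mode]
    if language is not None:
        cmd.append(f"--language={language}")
    if jobs is not None:
        cmd += ["--jobs", str(jobs)]
    if no_header:
        cmd.append("--no-header")
    if show_stats:
        cmd.append("--show-stats")
    return cmd


def run_skim(path: str, stdin_text: Optional[str] = None, **options) -> str:
    """Run skim and return its stdout; requires the `skim` binary on PATH.

    To read from stdin, pass path="-" together with stdin_text and language=...
    No shell is involved, so glob patterns are handed to skim unexpanded.
    """
    result = subprocess.run(
        build_skim_command(path, **options),
        input=stdin_text,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Passing `--mode structure` explicitly duplicates the default, but it makes the generated command self-describing in logs.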

## Transformation Modes

### Structure Mode (Default)

Removes function bodies while preserving signatures (70-80% reduction).

```bash
skim file.ts
```

**Input:**
```typescript
function add(a: number, b: number): number {
    const result = a + b;
    console.log(`Adding ${a} + ${b} = ${result}`);
    return result;
}
```

**Output:**
```typescript
function add(a: number, b: number): number { /* ... */ }
```
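To make the idea concrete, here is a rough Python analogue of structure mode built on the standard library's `ast` module. It is not how skim works internally: skim is a Rust tool, and this sketch only handles Python source. It replaces each function or method body with `...` while keeping the signature, which mirrors the `{ /* ... */ }` output above.

```python
import ast


class StripBodies(ast.NodeTransformer):
    """Replace every function and method body with a single `...`."""

    def _strip(self, node):
        # The signature (name, arguments, return annotation, decorators) is kept;
        # the whole body is dropped, so nested functions disappear along with it.
        node.body = [ast.Expr(value=ast.Constant(value=...))]
        return node

    visit_FunctionDef = _strip
    visit_AsyncFunctionDef = _strip


def skim_structure(source: str) -> str:
    """Return `source` with all function bodies elided."""
    tree = StripBodies().visit(ast.parse(source))
    # ast.unparse regenerates source text, so comments and original layout are lost.
    return ast.unparse(ast.fix_missing_locations(tree))
```

Classes are traversed normally, so their methods are stripped while the class headers remain.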

### Signatures Mode

Extracts only function and method signatures (85-92% reduction).

```bash
skim file.py --mode signatures
```

**Input:**
```python
def calculate_total(items: list[Item], tax_rate: float) -> Decimal:
    subtotal = sum(item.price for item in items)
    tax = subtotal * tax_rate
    return subtotal + tax
```

**Output:**
```python
def calculate_total(items: list[Item], tax_rate: float) -> Decimal:
```
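A similar Python sketch for signatures mode, again using `ast` as a stand-in for skim's own parser (an illustrative assumption, not skim's implementation). It emits one `def ...:` line per function or method, in source order, matching the output shown above. Decorators are omitted in this sketch.

```python
import ast
from typing import Union

FunctionNode = Union[ast.FunctionDef, ast.AsyncFunctionDef]


def format_signature(node: FunctionNode) -> str:
    """Render `def name(args) -> ret:` without the body."""
    keyword = "async def" if isinstance(node, ast.AsyncFunctionDef) else "def"
    signature = f"{keyword} {node.name}({ast.unparse(node.args)})"
    if node.returns is not None:
        signature += f" -> {ast.unparse(node.returns)}"
    return signature + ":"


def extract_signatures(source: str) -> list[str]:
    """Return every function and method signature, ordered by position in the file."""
    functions = [
        node
        for node in ast.walk(ast.parse(source))
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]
    # ast.walk is breadth-first, so sort to restore source order.
    functions.sort(key=lambda node: (node.lineno, node.col_offset))
    return [format_signature(node) for node in functions]
```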

### Types Mode

Extracts only type definitions (90-95% reduction).

```bash
skim file.ts --mode types
```

**Input:**
```typescript
interface User {
    id: number;
    name: string;
}

function getUser(id: number): User {
    return db.users.find(id);
}
```

**Output:**
```typescript
interface User {
    id: number;
    name: string;
}
```
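Python has no `interface` keyword, so a Python analogue of types mode has to choose its own definition of a type. The sketch below assumes that "type definitions" means classes reduced to their field declarations (annotated or plain assignments, plus nested classes), which fits dataclasses and `TypedDict`s. Methods, free functions, and imports are dropped. This rule is an illustrative assumption, not skim's documented behavior for Python.

```python
import ast


def _types_only(cls: ast.ClassDef) -> ast.ClassDef:
    """Keep field declarations and nested classes; drop methods and other statements."""
    kept = []
    for stmt in cls.body:
        if isinstance(stmt, (ast.AnnAssign, ast.Assign)):
            kept.append(stmt)
        elif isinstance(stmt, ast.ClassDef):
            kept.append(_types_only(stmt))
    # A class body cannot be empty, so fall back to `...`.
    cls.body = kept or [ast.Expr(value=ast.Constant(value=...))]
    return cls


def extract_types(source: str) -> str:
    """Return only the top-level classes of `source`, reduced to their fields."""
    tree = ast.parse(source)
    classes = [_types_only(node) for node in tree.body if isinstance(node, ast.ClassDef)]
    return ast.unparse(ast.Module(body=classes, type_ignores=[]))
```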

### Full Mode

Returns original code unchanged (0% reduction).

```bash
skim file.rs --mode full
```

## Examples

### Explore a codebase

```bash
# Get an overview of all TypeScript files with a glob pattern
skim 'src/**/*.ts' --no-header

# Extract all Python function signatures with stats
skim 'lib/**/*.py' --mode signatures --show-stats > api.txt

# Review Rust types
skim lib.rs --mode types | less

# Parallel processing for faster multi-file operations
skim 'src/**/*.ts' --jobs 8
```
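The glob and `--jobs` workflow above can be pictured as: expand the pattern (with `**` matching nested directories), transform files concurrently on a bounded worker pool, and emit each result with a file header. The Python sketch below models that pipeline under stated assumptions and is not skim's code. `skim_files` and the `// path` header format are hypothetical, output is kept in sorted path order by choice, and threads are used only to keep the example short (CPU-bound transforms in Python would benefit from `ProcessPoolExecutor` instead).

```python
import glob
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional


def skim_files(
    pattern: str,
    transform: Callable[[str], str],
    jobs: Optional[int] = None,
    header: bool = True,
) -> str:
    """Expand `pattern`, transform each file in parallel, and join the results."""
    paths = sorted(glob.glob(pattern, recursive=True))

    def process(path: str) -> str:
        with open(path, encoding="utf-8") as f:
            body = transform(f.read())
        # Hypothetical header format; skim's real header may look different.
        return f"// {path}\n{body}" if header else body

    # Default to one worker per CPU, like `--jobs`. pool.map yields results in
    # input order, so output stays deterministic even though work runs concurrently.
    with ThreadPoolExecutor(max_workers=jobs or os.cpu_count()) as pool:
        return "\n".join(pool.map(process, paths))
```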

### Prepare code for LLMs

```bash
# Compare word counts (a rough proxy for tokens) before sending code to an LLM
skim large_file.ts | wc -w
# Example output: 150 (vs. 600 for the original file)

# Get just the API surface
skim server.py --mode signatures | pbcopy
```
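The `wc -w` comparison above measures words rather than model tokens, but it is easy to turn into a percentage. This sketch (hypothetical helper names) computes that kind of reduction figure. skim's `--show-stats` may count tokens with a real tokenizer, so its numbers can differ.

```python
def word_count(text: str) -> int:
    """Rough token proxy, like `wc -w`: count whitespace-separated words."""
    return len(text.split())


def reduction_percent(original: str, skimmed: str) -> float:
    """Percentage of words removed, rounded to one decimal; 0.0 for empty input."""
    before = word_count(original)
    if before == 0:
        return 0.0
    return round(100 * (before - word_count(skimmed)) / before, 1)


# 600 words reduced to 150 words is a 75.0% reduction.
print(reduction_percent("w " * 600, "w " * 150))
```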

### Pipe workflows

```bash
# Skim and highlight
skim file.rs | bat -l rust

# Skim and search
skim file.ts | grep "interface"

# Skim several files concatenated on stdin
cat *.py | skim - --language=python
```

## Supported Languages

| Language   | Extensions            | Auto-detected |
|------------|-----------------------|---------------|
| TypeScript | `.ts`, `.tsx`         | Yes           |
| JavaScript | `.js`, `.jsx`, `.mjs` | Yes           |
| Python     | `.py`                 | Yes           |
| Rust       | `.rs`                 | Yes           |
| Go         | `.go`                 | Yes           |
| Java       | `.java`               | Yes           |
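Auto-detection is based on the file extension, and `--language` overrides it. The stdin examples always pass `--language`, since `-` has no extension to inspect. Here is a minimal sketch of that lookup, using a hypothetical `detect_language` helper and assuming a case-insensitive suffix match (skim may be case-sensitive).

```python
from pathlib import Path
from typing import Optional

# Mapping taken from the Supported Languages table.
EXTENSIONS = {
    ".ts": "typescript", ".tsx": "typescript",
    ".js": "javascript", ".jsx": "javascript", ".mjs": "javascript",
    ".py": "python",
    ".rs": "rust",
    ".go": "go",
    ".java": "java",
}


def detect_language(path: str, override: Optional[str] = None) -> str:
    """Return the override if given, otherwise look up the file extension."""
    if override is not None:
        return override
    # Case-insensitive matching is an assumption of this sketch.
    language = EXTENSIONS.get(Path(path).suffix.lower())
    if language is None:
        raise ValueError(f"cannot detect language for {path!r}; pass --language")
    return language
```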

## Performance

- **Parse + Transform**: 14.6ms for 3000-line files (verified)
- **Cached**: 5ms on repeated processing (40-50x speedup)
- **Token Reduction**: 60-95% depending on mode
- **Streaming**: Zero intermediate files
- **Parallel**: Scales with CPU cores for multi-file processing
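This README does not describe how the cache works. One common design, sketched here purely as an assumption, keys each result on a hash of the file contents plus the options that affect output. Editing the file or changing `--mode` then misses the cache automatically. In this model `--no-cache` corresponds to calling the transform directly, and `--clear-cache` to deleting the cache directory. `cache_key`, `cached_transform`, and the on-disk layout are hypothetical.

```python
import hashlib
import os
from typing import Callable


def cache_key(source: str, mode: str, language: str) -> str:
    """Hash everything that affects the output (hypothetical key layout)."""
    digest = hashlib.sha256()
    for part in (mode, language, source):
        digest.update(part.encode("utf-8"))
        digest.update(b"\0")  # separator so ("ab", "c") and ("a", "bc") differ
    return digest.hexdigest()


def cached_transform(
    source: str,
    mode: str,
    language: str,
    transform: Callable[[str], str],
    cache_dir: str,
) -> str:
    """Return a cached result when one exists; otherwise compute and store it."""
    path = os.path.join(cache_dir, cache_key(source, mode, language))
    if os.path.exists(path):
        with open(path, encoding="utf-8", newline="") as f:
            return f.read()
    result = transform(source)
    os.makedirs(cache_dir, exist_ok=True)
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(result)
    return result
```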

## Security

Built-in protections against:
- Stack overflow attacks (max depth: 500)
- Memory exhaustion (max input: 50MB)
- UTF-8 boundary violations
- Path traversal attacks
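The limits above are stated without implementation details. The sketch below shows one way to apply the first three checks in Python (path traversal is not covered): reject oversized input before parsing, decode strictly so invalid UTF-8 fails instead of being split mid-character, and measure tree depth with an explicit stack so the check itself cannot overflow. It measures Python `ast` trees, whose depths will not match skim's parser, and reads "50MB" as 50 MiB. Both are assumptions.

```python
import ast

MAX_INPUT_BYTES = 50 * 1024 * 1024  # "50MB", read as MiB (assumption)
MAX_DEPTH = 500


def decode_input(raw: bytes, limit: int = MAX_INPUT_BYTES) -> str:
    """Enforce the size limit, then decode strictly (invalid UTF-8 raises)."""
    if len(raw) > limit:
        raise ValueError(f"input is {len(raw)} bytes; limit is {limit}")
    return raw.decode("utf-8")  # raises UnicodeDecodeError, a ValueError subclass


def check_depth(tree: ast.AST, limit: int = MAX_DEPTH) -> int:
    """Return the tree's depth, raising ValueError if it exceeds `limit`.

    Uses an explicit stack instead of recursion, so hostile input cannot
    overflow the call stack while it is being measured.
    """
    deepest = 0
    stack = [(tree, 1)]
    while stack:
        node, depth = stack.pop()
        if depth > limit:
            raise ValueError(f"nesting depth exceeds {limit}")
        deepest = max(deepest, depth)
        stack.extend((child, depth + 1) for child in ast.iter_child_nodes(node))
    return deepest
```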

## Library

For programmatic usage, see the [`rskim-core`](https://crates.io/crates/rskim-core) library crate.

## Links

- [Repository](https://github.com/dean0x/skim)
- [Library Documentation](https://docs.rs/rskim-core)
- [npm Package](https://www.npmjs.com/package/rskim)

## License

MIT