inferno-ai 0.10.3

# 🖥️ CLI Reference

Complete documentation for all Inferno command-line interface commands. Inferno provides 45+ commands organized into logical groups for model management, inference, monitoring, and enterprise features.

## Quick Reference

```bash
# Package Management (Recommended)
inferno install <model>              # Install a model package
inferno remove <model>               # Remove a model package
inferno search <query>               # Search for models
inferno list                         # List installed models
inferno repo add <name> <url>        # Add model repository

# Inference & Execution
inferno run                          # Run inference
inferno batch                        # Batch processing
inferno serve                        # Start API server
inferno tui                          # Terminal interface

# Model Management
inferno models list                  # List models
inferno models info <model>          # Model details
inferno convert <input> <output>     # Convert formats
inferno validate <model>             # Validate model

# Performance & Monitoring
inferno bench                        # Benchmark performance
inferno metrics                      # Collect metrics
inferno monitor                      # Real-time monitoring
inferno cache                        # Cache management

# Enterprise Features
inferno security                     # Security management
inferno audit                        # Audit logging
inferno distributed                  # Distributed inference
inferno observability               # Monitoring stack
```

## Package Management Commands

### `inferno install`
Install AI models like software packages from configured repositories.

```bash
# Basic installation
inferno install microsoft/DialoGPT-medium
inferno install huggingface/gpt2
inferno install ollama/llama2:7b

# From specific repository
inferno install microsoft/DialoGPT-medium --repo huggingface
inferno install custom-model --repo enterprise

# With options
inferno install llama-2-7b --auto-update         # Enable auto-updates
inferno install gpt-3.5 --quantization q4_0      # Install with quantization
inferno install bert-base --no-cache             # Skip caching
inferno install mistral-7b --verify              # Verify checksums
```

**Options:**
- `--repo <name>`: Install from specific repository
- `--auto-update`: Enable automatic updates
- `--quantization <type>`: Quantization level (q4_0, q4_1, q5_0, q5_1, q8_0, f16, f32)
- `--no-cache`: Skip caching during installation
- `--verify`: Verify model checksums and signatures
- `--force`: Force reinstallation if already exists

### `inferno remove`
Remove installed model packages.

```bash
# Remove single model
inferno remove microsoft/DialoGPT-medium

# Remove multiple models
inferno remove gpt2 bert-base llama-2-7b

# Remove with options
inferno remove old-model --purge                 # Remove all traces
inferno remove --unused                          # Remove unused models
inferno remove --cache-only                      # Keep model, remove cache
```

**Options:**
- `--purge`: Remove all associated files and cache
- `--unused`: Remove models not used in last 30 days
- `--cache-only`: Remove only cache files, keep model
- `--force`: Skip confirmation prompts

### `inferno search`
Search for AI models across configured repositories.

```bash
# Basic search
inferno search "language model"
inferno search "code generation"
inferno search gpt

# Repository-specific search
inferno search "vision model" --repo huggingface
inferno search "embedding" --repo onnx

# Advanced search
inferno search "llama" --category nlp            # Filter by category
inferno search "bert" --size small               # Filter by model size
inferno search "gpt" --license mit               # Filter by license
inferno search "mistral" --sort downloads        # Sort by downloads
inferno search "code" --limit 20                 # Limit results
```

**Options:**
- `--repo <name>`: Search specific repository
- `--category <cat>`: Filter by category (nlp, vision, audio, multimodal)
- `--size <size>`: Filter by size (small, medium, large, xlarge)
- `--license <license>`: Filter by license (mit, apache, custom)
- `--sort <field>`: Sort by field (downloads, stars, recent, name)
- `--limit <num>`: Limit number of results
- `--format <fmt>`: Output format (table, json, compact)

### `inferno list`
List installed model packages with details.

```bash
# List all installed models
inferno list

# Detailed listing
inferno list --detailed                          # Show full details
inferno list --format json                       # JSON output
inferno list --size                              # Include file sizes
inferno list --usage                             # Show usage statistics

# Filter listings
inferno list --category nlp                      # Filter by category
inferno list --unused                            # Show unused models
inferno list --outdated                          # Show outdated models
```

**Options:**
- `--detailed`: Show comprehensive model information
- `--format <fmt>`: Output format (table, json, yaml, compact)
- `--size`: Include file sizes and disk usage
- `--usage`: Show usage statistics and last access time
- `--category <cat>`: Filter by model category
- `--unused`: Show models not used recently
- `--outdated`: Show models with available updates

### `inferno repo`
Manage model repositories and sources.

```bash
# List repositories
inferno repo list
inferno repo list --detailed                     # Show full details

# Add repository
inferno repo add enterprise https://models.company.com
inferno repo add custom https://my-models.com --priority 1

# Manage repositories
inferno repo update                               # Update all repositories
inferno repo update huggingface                  # Update specific repo
inferno repo remove custom                       # Remove repository
inferno repo enable huggingface                  # Enable repository
inferno repo disable custom                      # Disable repository

# Authentication
inferno repo auth huggingface --token <token>    # Set auth token
inferno repo auth enterprise --username user --password pass
```

**Subcommands:**
- `list`: List configured repositories
- `add <name> <url>`: Add new repository
- `remove <name>`: Remove repository
- `update [name]`: Update repository metadata
- `enable <name>`: Enable repository
- `disable <name>`: Disable repository
- `auth <name>`: Configure repository authentication

## Inference & Execution Commands

### `inferno run`
Run AI inference on text, images, or audio.

```bash
# Text inference
inferno run --model gpt2 --prompt "Hello world"
inferno run --model llama-2-7b --prompt "Explain quantum computing"

# File input/output
inferno run --model bert --input document.txt --output summary.txt
inferno run --model gpt-3.5 --input questions.jsonl --output answers.jsonl

# Interactive mode
inferno run --model DialoGPT-medium --interactive
inferno run --model llama-2-7b --chat                # Chat interface

# Streaming output
inferno run --model gpt-3.5 --prompt "Write a story" --stream
inferno run --model code-llama --input code.py --stream

# Advanced options
inferno run --model llama-2-7b --temperature 0.7     # Creativity control
inferno run --model gpt2 --max-tokens 500            # Token limit
inferno run --model mistral --context-size 4096      # Context window
inferno run --model bert --batch-size 32             # Batch processing
```

**Options:**
- `--model <name>`: Model to use for inference
- `--prompt <text>`: Input text prompt
- `--input <file>`: Input file path
- `--output <file>`: Output file path
- `--interactive`: Interactive chat mode
- `--chat`: Enhanced chat interface
- `--stream`: Stream output in real-time
- `--temperature <val>`: Sampling temperature (0.0-2.0)
- `--max-tokens <num>`: Maximum tokens to generate
- `--context-size <size>`: Context window size
- `--batch-size <size>`: Batch processing size

### `inferno batch`
Process multiple inputs efficiently in batch mode.

```bash
# Basic batch processing
inferno batch --model gpt2 --input batch.jsonl --output results.jsonl
inferno batch --model llama-2-7b --input questions/ --output answers/

# Advanced batch options
inferno batch --model bert --input data.jsonl --workers 4
inferno batch --model gpt-3.5 --input large.jsonl --chunk-size 100
inferno batch --model mistral --input prompts.txt --parallel 8

# Monitoring and resume
inferno batch --model llama --input data.jsonl --progress
inferno batch --model gpt2 --input data.jsonl --resume checkpoint.json
```

**Options:**
- `--model <name>`: Model for batch processing
- `--input <path>`: Input file or directory
- `--output <path>`: Output file or directory
- `--workers <num>`: Number of worker processes
- `--chunk-size <size>`: Items per chunk
- `--parallel <num>`: Parallel processing level
- `--progress`: Show progress bar
- `--resume <file>`: Resume from checkpoint

### `inferno serve`
Start the HTTP API server for web and API access.

```bash
# Basic server
inferno serve
inferno serve --port 8081                        # Custom port
inferno serve --bind 0.0.0.0:8080               # Custom address

# Production server
inferno serve --auth                              # Enable authentication
inferno serve --metrics                          # Enable metrics endpoint
inferno serve --cors                             # Enable CORS
inferno serve --rate-limit 1000                  # Rate limiting

# SSL/TLS
inferno serve --ssl --cert server.crt --key server.key
inferno serve --ssl-auto                         # Auto SSL certificates

# Multi-model serving
inferno serve --models gpt2,llama-2-7b,bert     # Serve multiple models
inferno serve --model-config models.toml         # Model configuration
```

**Options:**
- `--port <port>`: Server port (default: 8080)
- `--bind <addr>`: Bind address (default: 127.0.0.1)
- `--auth`: Enable authentication
- `--metrics`: Enable Prometheus metrics
- `--cors`: Enable CORS headers
- `--rate-limit <num>`: Requests per minute limit
- `--ssl`: Enable SSL/TLS
- `--cert <file>`: SSL certificate file
- `--key <file>`: SSL private key file
- `--models <list>`: Comma-separated model list
- `--workers <num>`: Number of worker threads

### `inferno tui`
Launch the terminal user interface for interactive management.

```bash
# Launch TUI
inferno tui

# TUI with options
inferno tui --theme dark                         # Color theme
inferno tui --refresh 1000                       # Refresh rate (ms)
inferno tui --model gpt2                         # Start with specific model
```

**Features:**
- Real-time metrics dashboard
- Interactive model management
- Chat interface
- Performance monitoring
- Configuration management

## Model Management Commands

### `inferno models`
Direct model management for advanced users.

```bash
# List models
inferno models list                               # List available models
inferno models list --detailed                   # Detailed information
inferno models list --format json                # JSON output

# Model information
inferno models info llama-2-7b                   # Show model details
inferno models info gpt2 --format yaml           # YAML format
inferno models validate bert-base                # Validate model

# Model operations
inferno models download microsoft/DialoGPT-medium # Download model
inferno models load gpt2                         # Load into memory
inferno models unload llama-2-7b                 # Unload from memory
inferno models optimize bert --quantization q4_0  # Optimize model
```

**Subcommands:**
- `list`: List available models
- `info <model>`: Show model information
- `validate <model>`: Validate model integrity
- `download <model>`: Download model from repository
- `load <model>`: Load model into memory
- `unload <model>`: Unload model from memory
- `optimize <model>`: Optimize model performance

### `inferno convert`
Convert models between different formats.

```bash
# Basic conversion
inferno convert model.gguf model.onnx --format onnx
inferno convert model.pt model.gguf --format gguf
inferno convert model.safetensors model.onnx --format onnx

# Advanced conversion
inferno convert model.gguf model.onnx --optimization balanced
inferno convert model.pt model.gguf --quantization q4_0
inferno convert model.onnx model.gguf --precision fp16
inferno convert large-model.pt small-model.gguf --compression
```

**Supported Formats:**
- GGUF (llama.cpp format)
- ONNX (Open Neural Network Exchange)
- PyTorch (.pt, .pth)
- SafeTensors (.safetensors)
- TensorFlow SavedModel

**Options:**
- `--format <fmt>`: Target format (gguf, onnx, pytorch, safetensors)
- `--optimization <level>`: Optimization level (fast, balanced, aggressive)
- `--quantization <type>`: Quantization type (q4_0, q4_1, q5_0, q5_1, q8_0, f16, f32)
- `--precision <prec>`: Precision (fp16, fp32, int8)
- `--compression`: Enable compression for smaller file size

### `inferno validate`
Validate model files and configurations.

```bash
# Validate models
inferno validate llama-2-7b                      # Validate model
inferno validate models/                          # Validate directory
inferno validate --all                           # Validate all models

# Validation options
inferno validate gpt2 --checksum                 # Verify checksums
inferno validate bert --format                   # Validate format
inferno validate mistral --load-test             # Test loading
inferno validate llama --benchmark               # Run benchmark
```

**Options:**
- `--checksum`: Verify file checksums
- `--format`: Validate file format structure
- `--load-test`: Test model loading
- `--benchmark`: Run performance benchmark
- `--fix`: Attempt to fix minor issues
- `--report <file>`: Generate validation report

## Performance & Monitoring Commands

### `inferno bench`
Benchmark model performance and establish baselines.

```bash
# Basic benchmarking
inferno bench --model gpt2                       # Benchmark model
inferno bench --model llama-2-7b --iterations 100
inferno bench --model bert --duration 60s        # Time-based benchmark

# Comprehensive benchmarking
inferno bench --all                              # Benchmark all models
inferno bench --model gpt2 --detailed           # Detailed metrics
inferno bench --model llama --memory            # Memory usage analysis
inferno bench --model mistral --gpu             # GPU utilization

# Output formats
inferno bench --model gpt2 --format json        # JSON output
inferno bench --model bert --output report.txt  # Save to file
inferno bench --model llama --compare baseline.json # Compare to baseline
```

**Options:**
- `--model <name>`: Model to benchmark
- `--iterations <num>`: Number of iterations
- `--duration <time>`: Benchmark duration
- `--detailed`: Include detailed metrics
- `--memory`: Monitor memory usage
- `--gpu`: Monitor GPU utilization
- `--format <fmt>`: Output format (table, json, csv)
- `--output <file>`: Save results to file
- `--compare <file>`: Compare to baseline

### `inferno metrics`
Collect and export performance metrics.

```bash
# View metrics
inferno metrics show                              # Show current metrics
inferno metrics show --model gpt2               # Model-specific metrics
inferno metrics show --format json              # JSON output

# Export metrics
inferno metrics export metrics.json             # Export to file
inferno metrics export --format prometheus      # Prometheus format
inferno metrics export --filter inference       # Filter metric types

# Metrics management
inferno metrics reset                            # Reset all metrics
inferno metrics reset --model bert              # Reset model metrics
inferno metrics enable                           # Enable collection
inferno metrics disable                          # Disable collection
```

**Subcommands:**
- `show`: Display current metrics
- `export <file>`: Export metrics to file
- `reset`: Reset metric counters
- `enable`: Enable metrics collection
- `disable`: Disable metrics collection

### `inferno monitor`
Real-time performance monitoring and alerting.

```bash
# Start monitoring
inferno monitor start                            # Start monitoring
inferno monitor start --interval 5s             # Custom interval
inferno monitor start --dashboard               # Web dashboard

# Configure monitoring
inferno monitor config --cpu-threshold 80       # CPU alert threshold
inferno monitor config --memory-threshold 4GB   # Memory threshold
inferno monitor config --alert-email admin@company.com

# View monitoring status
inferno monitor status                           # Show status
inferno monitor logs                             # View monitoring logs
inferno monitor stop                             # Stop monitoring
```

**Subcommands:**
- `start`: Start monitoring system
- `stop`: Stop monitoring
- `status`: Show monitoring status
- `config`: Configure monitoring settings
- `logs`: View monitoring logs

### `inferno cache`
Manage model and response caching.

```bash
# Cache management
inferno cache status                             # Show cache status
inferno cache clear                              # Clear all cache
inferno cache clear --model gpt2                # Clear model cache
inferno cache clear --responses                 # Clear response cache

# Cache warming
inferno cache warm --model llama-2-7b           # Pre-load model
inferno cache warm --all                        # Pre-load all models
inferno cache warm --popular                    # Pre-load popular models

# Cache configuration
inferno cache config --size 10GB                # Set cache size limit
inferno cache config --compression zstd         # Enable compression
inferno cache config --persist                  # Enable persistence
```

**Subcommands:**
- `status`: Show cache status and statistics
- `clear`: Clear cache contents
- `warm`: Pre-load models into cache
- `config`: Configure cache settings

## Enterprise Features

### `inferno security`
Security and access control management.

```bash
# Initialize security
inferno security init                            # Setup authentication
inferno security init --jwt                     # JWT authentication
inferno security init --ldap                    # LDAP integration

# User management
inferno security user add alice --role admin    # Add user
inferno security user remove bob                # Remove user
inferno security user list                      # List users
inferno security user update alice --role user  # Update user role

# API key management
inferno security key create --name app1         # Create API key
inferno security key revoke key123              # Revoke API key
inferno security key list                       # List API keys

# Security configuration
inferno security config --rate-limit 1000       # Set rate limits
inferno security config --ip-whitelist 10.0.0.0/8 # IP restrictions
inferno security config --audit-all             # Enable full auditing
```

**Subcommands:**
- `init`: Initialize security system
- `user`: User management commands
- `key`: API key management
- `config`: Security configuration

### `inferno audit`
Comprehensive audit logging and compliance.

```bash
# View audit logs
inferno audit logs                               # Show recent logs
inferno audit logs --user alice                 # Filter by user
inferno audit logs --action inference           # Filter by action
inferno audit logs --date 2024-01-01            # Filter by date

# Audit configuration
inferno audit enable                             # Enable audit logging
inferno audit disable                           # Disable audit logging
inferno audit encrypt --key mykey               # Enable encryption
inferno audit compress --algorithm gzip         # Enable compression

# Compliance reports
inferno audit report --type security            # Security report
inferno audit report --type usage               # Usage report
inferno audit export compliance.json            # Export audit data
```

**Subcommands:**
- `logs`: View and filter audit logs
- `enable/disable`: Control audit logging
- `encrypt`: Configure log encryption
- `compress`: Configure log compression
- `report`: Generate compliance reports
- `export`: Export audit data

### `inferno distributed`
Distributed inference with worker pools.

```bash
# Worker management
inferno distributed worker start                # Start worker
inferno distributed worker stop                 # Stop worker
inferno distributed worker list                 # List workers
inferno distributed worker status               # Worker status

# Cluster management
inferno distributed cluster init                # Initialize cluster
inferno distributed cluster join <master-url>   # Join cluster
inferno distributed cluster status              # Cluster status
inferno distributed cluster scale 5             # Scale to 5 workers

# Load balancing
inferno distributed balance round-robin         # Round-robin balancing
inferno distributed balance weighted            # Weighted balancing
inferno distributed balance adaptive            # Adaptive balancing
```

**Subcommands:**
- `worker`: Worker node management
- `cluster`: Cluster management
- `balance`: Load balancing configuration

### `inferno observability`
Comprehensive observability stack.

```bash
# Start observability stack
inferno observability start                     # Start full stack
inferno observability start --prometheus        # Prometheus only
inferno observability start --grafana           # Grafana only
inferno observability start --jaeger            # Jaeger tracing

# Configuration
inferno observability config --retention 30d    # Data retention
inferno observability config --dashboard custom.json # Custom dashboard

# Status and management
inferno observability status                    # Show status
inferno observability stop                      # Stop services
inferno observability restart                   # Restart services
```

**Subcommands:**
- `start`: Start observability services
- `stop`: Stop observability services
- `status`: Show service status
- `config`: Configure observability settings

## Advanced Commands

### `inferno optimization`
Model optimization with quantization and pruning.

```bash
# Quantization
inferno optimization quantize llama-2-7b --type q4_0
inferno optimization quantize bert --type int8  # Integer quantization
inferno optimization quantize gpt2 --auto       # Auto-select best type

# Model pruning
inferno optimization prune large-model --ratio 0.2  # Remove 20%
inferno optimization prune bert --structured     # Structured pruning

# Distillation
inferno optimization distill teacher student    # Knowledge distillation
inferno optimization compress model --algorithm zstd # Model compression
```

### `inferno multimodal`
Multi-modal inference with vision, audio, and text.

```bash
# Image processing
inferno multimodal image --model clip --input image.jpg
inferno multimodal vision --model yolo --input video.mp4

# Audio processing
inferno multimodal audio --model whisper --input audio.wav
inferno multimodal speech --model tts --text "Hello world"

# Mixed modal
inferno multimodal mixed --vision clip --text bert --input image.jpg --prompt "Describe this"
```

### `inferno deployment`
Advanced deployment automation.

```bash
# Kubernetes deployment
inferno deployment k8s deploy --namespace inferno
inferno deployment k8s scale --replicas 5
inferno deployment k8s update --image inferno:v2.0

# Docker deployment
inferno deployment docker build                 # Build image
inferno deployment docker push                  # Push to registry
inferno deployment docker compose up            # Docker Compose

# Cloud deployment
inferno deployment cloud aws --region us-west-2
inferno deployment cloud gcp --project my-project
inferno deployment cloud azure --resource-group rg1
```

### `inferno marketplace`
Model marketplace and registry integration.

```bash
# Browse marketplace
inferno marketplace browse                       # Browse all models
inferno marketplace browse --category nlp       # Browse by category
inferno marketplace search "code generation"    # Search marketplace

# Model publishing
inferno marketplace publish my-model             # Publish model
inferno marketplace unpublish my-model          # Remove from marketplace
inferno marketplace update my-model             # Update published model

# Marketplace management
inferno marketplace config --registry custom.com # Set registry
inferno marketplace auth --token <token>        # Authenticate
```

## Global Options

All commands support these global options:

- `--config <file>`: Use custom configuration file
- `--log-level <level>`: Set logging level (trace, debug, info, warn, error)
- `--log-format <format>`: Set log format (pretty, json, compact)
- `--models-dir <path>`: Override models directory
- `--help`: Show command help
- `--version`: Show version information
- `--verbose`: Enable verbose output
- `--quiet`: Suppress non-error output
- `--dry-run`: Show what would be done without executing

## Environment Variables

Configure Inferno using environment variables:

```bash
export INFERNO_MODELS_DIR="/data/models"        # Models directory
export INFERNO_LOG_LEVEL="info"                 # Logging level
export INFERNO_LOG_FORMAT="json"                # Log format
export INFERNO_CONFIG="/etc/inferno.toml"       # Configuration file
export INFERNO_API_KEY="your-api-key"           # API authentication
export INFERNO_GPU_ENABLED="true"               # Enable GPU acceleration
export INFERNO_CACHE_DIR="/tmp/inferno"         # Cache directory
export INFERNO_PROMETHEUS_PORT="9090"           # Metrics port
```

## Configuration File

Create `inferno.toml` for persistent configuration:

```toml
# Basic settings
models_dir = "/data/models"
log_level = "info"
cache_dir = "/tmp/inferno"

[server]
bind_address = "0.0.0.0"
port = 8080
workers = 4

[backend_config]
gpu_enabled = true
context_size = 4096
batch_size = 64

[cache]
enabled = true
max_size_gb = 10
compression = "zstd"

[security]
auth_enabled = true
rate_limit = 1000
audit_enabled = true

[observability]
prometheus_enabled = true
metrics_port = 9090
tracing_enabled = true
```

## Exit Codes

Inferno uses standard exit codes:

- `0`: Success
- `1`: General error
- `2`: Invalid command or arguments
- `3`: Configuration error
- `4`: Model not found or invalid
- `5`: Network or download error
- `6`: Permission or authentication error
- `7`: Resource exhaustion (memory, disk)
- `8`: GPU or hardware error

## See Also

- [API Reference](api-reference.md) - REST API documentation
- [Configuration Reference](configuration.md) - Complete configuration options
- [Troubleshooting Guide](../guides/troubleshooting.md) - Common issues and solutions
- [Performance Tuning](../guides/performance-tuning.md) - Optimization strategies