bssh 2.1.2

Parallel SSH command execution tool for cluster management
# bssh Architecture Documentation

## Overview

bssh (Backend.AI SSH / Broadcast SSH) is a high-performance parallel SSH command execution tool with an SSH-compatible interface. This document provides a high-level architecture overview. For detailed component documentation, see [docs/architecture/](./docs/architecture/).

### Core Capabilities

- Parallel command execution across multiple nodes
- SSH-compatible command-line interface (drop-in replacement)
- SSH port forwarding (-L, -R, -D/SOCKS proxy)
- SSH jump host support (-J)
- SSH configuration file parsing (-F)
- Interactive PTY sessions with single/multiplex modes
- SFTP file transfers (upload/download)
- Backend.AI cluster auto-detection
- pdsh compatibility mode

## System Architecture

```
        ┌─────────────────────────────────────────────────────────┐
        │                     CLI Interface                       │
        │                       (main.rs)                         │
        │        (-L, -R, -D, -J, -F, -t/T, SSH-compatible)       │
        └────────────────────────────┬────────────────────────────┘
        ┌─────────────┬──────────────┼──────────────┬─────────────┐
        ▼             ▼              ▼              ▼             ▼
┌──────────────┐ ┌───────────┐ ┌───────────┐ ┌───────────┐ ┌──────────┐
│   Commands   │ │  Config   │ │  Utils    │ │Forwarding │ │   Jump   │
│   Module     │ │  Manager  │ │  Module   │ │  Manager  │ │   Host   │
│ (commands/*) │ │(config.rs)│ │ (utils/*) │ │(forward/*)│ │ (jump/*) │
└──────┬───────┘ └─────┬─────┘ └───────────┘ └───┬───────┘ └───┬──────┘
       │               │                         │             │
       │               ▼                         │             │
       │       ┌──────────────┐                  │             │
       │       │ SSH Config   │                  │             │
       │       │    Parser    │                  │             │
       │       │(ssh_config/*)│                  │             │
       │       └──────────────┘                  │             │
       ▼                                         ▼             ▼
┌──────────────┐                         ┌──────────────┐ ┌──────────────────┐
│   Executor   │◄────────────────────────┤     Node     │ │ Port Forwarders  │
│  (Parallel)  │                         │    Parser    │ │  (L/R/D modes)   │
│(executor.rs) │                         │  (node.rs)   │ │    + Tunnels     │
└──────┬───────┘                         └──────────────┘ └────────┬─────────┘
       │                                                           │
       ├──────────┬────────────┬───────────────────────────────────┘
       ▼          ▼            ▼
┌──────────┐ ┌──────────┐ ┌──────────┐
│   SSH    │ │   SSH    │ │   SSH    │
│  Client  │ │  Client  │ │  Client  │
│ (russh)  │ │ (russh)  │ │ (russh)  │
└──────────┘ └──────────┘ └──────────┘
```

## Component Summary

### CLI Interface
**Documentation**: [docs/architecture/cli-interface.md](./docs/architecture/cli-interface.md)

The CLI system provides an SSH-compatible command-line interface with multiple operation modes:

- **Native bssh mode**: Cluster-based parallel execution
- **SSH compatibility mode**: Drop-in SSH replacement for single-host operations
- **pdsh compatibility mode**: Compatible with pdsh command-line syntax

Key features:
- clap v4 with derive macros for type-safe argument parsing
- Backend.AI cluster auto-detection
- Hostlist expression support (pdsh-compatible)
- Mode detection based on binary name, environment, or flags
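
As an illustration of hostlist expressions, the sketch below expands a single bracketed numeric range such as `node[1-3]`. It covers only a small subset of pdsh syntax, and the function name `expand_hostlist` is hypothetical rather than part of bssh's API.

```rust
/// Expands one bracketed group, e.g. `node[1-3]` or `web[1,3-4]`.
/// Expressions without a bracket group are returned unchanged.
fn expand_hostlist(expr: &str) -> Vec<String> {
    let (open, close) = match (expr.find('['), expr.find(']')) {
        (Some(open), Some(close)) if open < close => (open, close),
        _ => return vec![expr.to_string()],
    };
    let (prefix, suffix) = (&expr[..open], &expr[close + 1..]);
    let mut hosts = Vec::new();
    for part in expr[open + 1..close].split(',') {
        // A part is either a single number ("7") or a range ("08-10").
        let (lo, hi) = part.split_once('-').unwrap_or((part, part));
        let width = lo.len(); // preserve zero padding such as "08"
        match (lo.parse::<u32>(), hi.parse::<u32>()) {
            (Ok(lo), Ok(hi)) if lo <= hi => {
                for n in lo..=hi {
                    hosts.push(format!("{}{:0width$}{}", prefix, n, suffix, width = width));
                }
            }
            // Anything that is not numeric is passed through literally.
            _ => hosts.push(format!("{}{}{}", prefix, part, suffix)),
        }
    }
    hosts
}
```

For example, `expand_hostlist("gpu[08-10].local")` yields `gpu08.local`, `gpu09.local`, and `gpu10.local`.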

### Configuration Management
**Documentation**: [docs/architecture/configuration.md](./docs/architecture/configuration.md)

Hierarchical configuration system with multiple sources:

1. Backend.AI environment variables (auto-detection)
2. Current directory (`./config.yaml`)
3. XDG config directory (`~/.config/bssh/config.yaml`)
4. CLI specified path (via `--config` flag)

Features:
- YAML format for human readability
- Environment variable expansion (`${VAR}` syntax)
- SSH configuration file integration
- Platform-specific paths via XDG Base Directory specification
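
The `${VAR}` expansion listed above can be sketched as follows. This is a minimal illustration, not bssh's implementation: the function name `expand_vars` is hypothetical, variables are supplied through a map so the logic is testable, and leaving unknown or unterminated placeholders untouched is an assumption.

```rust
use std::collections::HashMap;

/// Replaces `${NAME}` placeholders with values from `vars`.
/// Unknown names and placeholders without a closing brace are kept verbatim
/// (an assumption made for this sketch).
fn expand_vars(input: &str, vars: &HashMap<String, String>) -> String {
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while let Some(start) = rest.find("${") {
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        match after.find('}') {
            Some(end) => {
                let name = &after[..end];
                match vars.get(name) {
                    Some(value) => out.push_str(value),
                    None => {
                        out.push_str("${");
                        out.push_str(name);
                        out.push('}');
                    }
                }
                rest = &after[end + 1..];
            }
            None => {
                // No closing brace: copy the remainder as-is and stop.
                out.push_str(&rest[start..]);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    out
}
```

With `USER` mapped to `alice`, the input `/home/${USER}/.ssh/id_ed25519` expands to `/home/alice/.ssh/id_ed25519`.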

### Parallel Executor
**Documentation**: [docs/architecture/executor.md](./docs/architecture/executor.md)

Tokio-based async executor for concurrent command execution:

- Semaphore-based concurrency limiting
- Two-stage signal handling (default) or batch mode
- Fail-fast mode for early termination on errors
- Real-time progress visualization
- Stream mode for live output
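
To illustrate semaphore-based concurrency limiting, the sketch below caps the number of simultaneously running jobs. The executor itself is Tokio-based; this example deliberately uses only standard-library threads so it is self-contained, and the names `Semaphore` and `run_limited` are hypothetical.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Minimal counting semaphore built on Mutex + Condvar.
struct Semaphore {
    permits: Mutex<usize>,
    available: Condvar,
}

impl Semaphore {
    fn new(permits: usize) -> Self {
        Self { permits: Mutex::new(permits), available: Condvar::new() }
    }

    /// Blocks until a permit is free, then takes it.
    fn acquire(&self) {
        let mut permits = self.permits.lock().unwrap();
        while *permits == 0 {
            permits = self.available.wait(permits).unwrap();
        }
        *permits -= 1;
    }

    /// Returns a permit and wakes one waiter.
    fn release(&self) {
        *self.permits.lock().unwrap() += 1;
        self.available.notify_one();
    }
}

/// Runs one simulated job per node with at most `limit` (> 0) in flight,
/// and returns the highest concurrency that was observed.
fn run_limited(nodes: usize, limit: usize) -> usize {
    let semaphore = Arc::new(Semaphore::new(limit));
    let running = Arc::new(AtomicUsize::new(0));
    let peak = Arc::new(AtomicUsize::new(0));

    let handles: Vec<_> = (0..nodes)
        .map(|_| {
            let (semaphore, running, peak) = (semaphore.clone(), running.clone(), peak.clone());
            thread::spawn(move || {
                semaphore.acquire();
                let now = running.fetch_add(1, Ordering::SeqCst) + 1;
                peak.fetch_max(now, Ordering::SeqCst);
                thread::sleep(Duration::from_millis(5)); // stand-in for the remote command
                running.fetch_sub(1, Ordering::SeqCst);
                semaphore.release();
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
    peak.load(Ordering::SeqCst)
}
```

Whatever the node count, the value returned by `run_limited(nodes, limit)` never exceeds `limit`, which is the property the semaphore is there to guarantee.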

### SSH Client
**Documentation**: [docs/architecture/ssh-client.md](./docs/architecture/ssh-client.md)

Built on russh and russh-sftp with a custom `tokio_client` wrapper:

- Connection management with russh
- Multiple authentication methods (agent, key file, password)
- Host key verification (known_hosts support)
- Command execution with streaming output
- SFTP file transfers (upload/download)
- Connection timeout handling
- Configurable SSH keepalive (ServerAliveInterval, ServerAliveCountMax)

### Terminal User Interface (TUI)
**Documentation**: [docs/architecture/tui.md](./docs/architecture/tui.md)

Interactive terminal interface for real-time command monitoring:

- Multiple views (JobList, JobDetail, Logs, System)
- Keyboard navigation and command palette
- Progress parsing from command output
- Real-time log streaming
- Clean shutdown handling

### Interactive Mode
**Documentation**: [docs/architecture/interactive-mode.md](./docs/architecture/interactive-mode.md)

PTY-based interactive SSH sessions:

- Single-host mode: Direct PTY connection to one host
- Multiplex mode: Broadcast input to multiple hosts
- Terminal escape sequence handling
- Raw mode terminal management
- Signal propagation (Ctrl+C, window resize)

### SSH Configuration Parser
**Documentation**: [docs/architecture/ssh-config-parser.md](./docs/architecture/ssh-config-parser.md)

OpenSSH-compatible configuration file parser:

- Include directive support with recursion limits
- Match directive (Host, LocalUser)
- All standard SSH options
- Configuration caching for performance
- Override chain resolution

### SSH Jump Host Support
**Documentation**: [docs/architecture/ssh-jump-hosts.md](./docs/architecture/ssh-jump-hosts.md)

ProxyJump (-J) support for bastion hosts:

- Multiple jump host chains
- IPv6 and custom port support
- Authentication through jump hosts
- Integration with all bssh operations
- Automatic tunnel management

### SSH Port Forwarding
**Documentation**: [docs/architecture/ssh-port-forwarding.md](./docs/architecture/ssh-port-forwarding.md)

Full port forwarding support:

- Local forwarding (-L): Forward local port to remote
- Remote forwarding (-R): Forward remote port to local
- Dynamic forwarding (-D): SOCKS proxy mode
- Multiple forwarding rules
- Automatic port allocation

### Exit Code Strategy
**Documentation**: [docs/architecture/exit-code-strategy.md](./docs/architecture/exit-code-strategy.md)

MPI-compatible exit code handling:

- **MainRank** (default): Returns main rank's exit code
- **RequireAllSuccess**: Returns 0 only if all nodes succeed
- **MainRankWithFailureCheck**: Hybrid mode for detailed diagnostics
- Automatic main rank detection (Backend.AI integration)
- Preserves actual exit codes (SIGSEGV=139, OOM=137, etc.)
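
The three strategies can be sketched as a small resolution function. The enum variant names come from the list above; everything else is an assumption for illustration: the function name `resolve_exit_code`, the use of `1` as the generic failure code, and the reading of the hybrid mode as "main rank's code if it failed, otherwise fail if any other node failed".

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ExitCodeStrategy {
    MainRank,
    RequireAllSuccess,
    MainRankWithFailureCheck,
}

/// `codes` holds one exit code per node; `main_rank` indexes the main node.
fn resolve_exit_code(strategy: ExitCodeStrategy, codes: &[i32], main_rank: usize) -> i32 {
    // A missing main rank is treated as a failure (assumption).
    let main_code = codes.get(main_rank).copied().unwrap_or(1);
    let any_failed = codes.iter().any(|&code| code != 0);
    match strategy {
        // Pass the main rank's code through unchanged (e.g. 139 for SIGSEGV).
        ExitCodeStrategy::MainRank => main_code,
        ExitCodeStrategy::RequireAllSuccess => {
            if any_failed { 1 } else { 0 }
        }
        ExitCodeStrategy::MainRankWithFailureCheck => {
            if main_code != 0 {
                main_code
            } else if any_failed {
                1
            } else {
                0
            }
        }
    }
}
```

Under this sketch, a run where the main rank succeeds but another node is OOM-killed (137) returns `0` with `MainRank` and `1` with the other two strategies.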

### Shared Module

Common utilities for code reuse between bssh client and server implementations:

- **Validation**: Input validation for usernames, hostnames, paths with security checks
- **Rate Limiting**: Generic token bucket rate limiter for connection/auth throttling
- **Authentication Types**: Common auth result types and user info structures
- **Error Types**: Shared error types for validation, auth, connection, and rate limiting

The `security` and `jump::rate_limiter` modules re-export from shared for backward compatibility.
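
A token bucket, the technique named for the shared rate limiter, can be sketched in a few lines. This is a generic illustration and not the shared module's code: the `TokenBucket` type and its parameters are hypothetical, and the current time is passed in explicitly so the behavior is deterministic.

```rust
use std::time::Instant;

/// Token bucket: holds up to `capacity` tokens, refilled continuously.
struct TokenBucket {
    capacity: f64,
    tokens: f64,
    refill_per_sec: f64,
    last_refill: Instant,
}

impl TokenBucket {
    /// Starts with a full bucket.
    fn new(capacity: u32, refill_per_sec: f64, now: Instant) -> Self {
        Self {
            capacity: capacity as f64,
            tokens: capacity as f64,
            refill_per_sec,
            last_refill: now,
        }
    }

    /// Refills according to elapsed time, then takes one token if available.
    fn try_acquire(&mut self, now: Instant) -> bool {
        let elapsed = now.saturating_duration_since(self.last_refill).as_secs_f64();
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        self.last_refill = now;
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```

A bucket with capacity 2 and a refill rate of one token per second admits two immediate attempts, rejects a third, and admits one more after a second has passed.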

### Server Security Module

Security features for the SSH server (`src/server/security/`):

- **AuthRateLimiter**: Fail2ban-like authentication rate limiting
  - Tracks failed authentication attempts per IP address
  - Automatic banning after exceeding configurable threshold
  - Time-windowed failure counting (failures outside window not counted)
  - Configurable ban duration with automatic expiration
  - IP whitelist for exempting trusted addresses from banning
  - Memory-safe with configurable maximum tracked IPs
  - Automatic cleanup of expired records via background task
  - Thread-safe async implementation with `Arc<RwLock<>>`

- **IpAccessControl**: IP-based connection filtering
  - Whitelist mode: Only allow connections from specified CIDR ranges
  - Blacklist mode: Block connections from specified CIDR ranges
  - Blacklist takes priority over whitelist (blocked IPs are always denied)
  - Support for both IPv4 and IPv6 addresses and CIDR notation
  - Dynamic updates: Add/remove rules at runtime via `SharedIpAccessControl`
  - Early rejection at connection level before handler creation
  - Thread-safe with fail-closed behavior on lock contention
  - Configuration via `allowed_ips` and `blocked_ips` in server config
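
The precedence rule for IP filtering (blocked ranges always win, then the allow list is consulted) can be sketched with standard-library types. The `Cidr` type and `is_allowed` function are hypothetical names, and treating an empty allow list as "allow everything not blocked" is an assumption.

```rust
use std::net::IpAddr;

/// A parsed CIDR range such as `10.0.0.0/8` or `2001:db8::/32`.
#[derive(Debug, Clone, Copy)]
struct Cidr {
    network: IpAddr,
    prefix_len: u8,
}

impl Cidr {
    fn parse(text: &str) -> Option<Cidr> {
        let (addr, len) = text.split_once('/')?;
        let network: IpAddr = addr.parse().ok()?;
        let prefix_len: u8 = len.parse().ok()?;
        let max = if network.is_ipv4() { 32 } else { 128 };
        (prefix_len <= max).then_some(Cidr { network, prefix_len })
    }

    fn contains(&self, ip: IpAddr) -> bool {
        match (self.network, ip) {
            (IpAddr::V4(net), IpAddr::V4(ip)) => {
                // A /0 prefix needs special handling: shifting by 32 would overflow.
                let mask = if self.prefix_len == 0 { 0 } else { u32::MAX << (32 - self.prefix_len as u32) };
                (u32::from(net) & mask) == (u32::from(ip) & mask)
            }
            (IpAddr::V6(net), IpAddr::V6(ip)) => {
                let mask = if self.prefix_len == 0 { 0 } else { u128::MAX << (128 - self.prefix_len as u32) };
                (u128::from(net) & mask) == (u128::from(ip) & mask)
            }
            _ => false, // address families differ
        }
    }
}

/// Blocked ranges take priority; an empty allow list permits everything else.
fn is_allowed(ip: IpAddr, allowed: &[Cidr], blocked: &[Cidr]) -> bool {
    if blocked.iter().any(|range| range.contains(ip)) {
        return false;
    }
    allowed.is_empty() || allowed.iter().any(|range| range.contains(ip))
}
```

With `10.0.0.0/8` allowed and `10.1.0.0/16` blocked, `10.2.3.4` is accepted while `10.1.3.4` is rejected even though it also falls inside the allowed range.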

### File Transfer Filter Module

Policy-based filtering infrastructure for SFTP and SCP file transfer operations (`src/server/filter/`):

**Structure**:
- `mod.rs` - `TransferFilter` trait, `Operation` enum, `FilterResult` enum, `NoOpFilter`
- `policy.rs` - `FilterPolicy` engine, `FilterRule`, `Matcher` trait, `SharedFilterPolicy`
- `path.rs` - Path-based matchers: `PrefixMatcher`, `ExactMatcher`, `ComponentMatcher`, `ExtensionMatcher`
- `pattern.rs` - Pattern-based matchers: `GlobMatcher`, `RegexMatcher`, `CombinedMatcher`, `NotMatcher`

**Key Components**:

- **Operation**: Enum representing file operations
  - `Upload`, `Download`, `Delete`, `Rename`
  - `CreateDir`, `ListDir`, `Stat`, `SetStat`
  - `Symlink`, `ReadLink`

- **FilterResult**: Actions to take on matched operations
  - `Allow` - Permit the operation (default)
  - `Deny` - Block the operation
  - `Log` - Allow but log for auditing

- **TransferFilter Trait**: Interface for custom filter implementations
  - `check(path, operation, user)` - Check single path operations
  - `check_with_dest(src, dest, operation, user)` - Check two-path operations (rename, symlink)
  - `is_enabled()` - Check if filtering is active

- **FilterPolicy**: First-match-wins rule evaluation engine
  - Ordered rule evaluation
  - Configurable default action
  - Enable/disable filtering
  - Create from YAML configuration via `from_config()`

- **FilterRule**: Combines matcher, action, and optional constraints
  - Path pattern matcher
  - Per-operation restrictions
  - Per-user restrictions
  - Named rules for debugging

**Built-in Matchers**:

| Matcher | Purpose | Example |
|---------|---------|---------|
| `GlobMatcher` | Wildcard patterns | `*.key`, `*.pem` |
| `RegexMatcher` | Full regex support | `(?i)\.exe$` |
| `PrefixMatcher` | Directory tree matching | `/etc/` |
| `ExactMatcher` | Specific file matching | `/etc/shadow` |
| `ComponentMatcher` | Path component matching | `.git`, `.ssh` |
| `ExtensionMatcher` | File extension matching | `exe`, `key` |
| `CombinedMatcher` | OR-combine matchers | Multiple patterns |
| `NotMatcher` | Invert matcher results | Exclude patterns |

**Security Features**:
- `normalize_path()` function for path traversal prevention
- ReDoS protection via regex size limits
- Case-insensitive extension matching
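
As an illustration of path normalization for traversal prevention, the sketch below resolves `.` and `..` lexically before any matcher sees the path. It is not the crate's `normalize_path()`; the name `normalize_lexically` and the choice to drop `..` components that would climb above the root are assumptions.

```rust
use std::path::{Component, Path, PathBuf};

/// Resolves `.` and `..` without touching the filesystem.
/// A `..` that would climb above the root (or above the start of a
/// relative path) is dropped.
fn normalize_lexically(path: &Path) -> PathBuf {
    let mut out = PathBuf::new();
    for component in path.components() {
        match component {
            Component::CurDir => {}
            Component::ParentDir => {
                // Only pop ordinary components; never pop the root itself.
                if matches!(out.components().next_back(), Some(Component::Normal(_))) {
                    out.pop();
                }
            }
            other => out.push(other.as_os_str()),
        }
    }
    out
}
```

Normalizing first means a rule such as a prefix match on `/etc/` sees `/etc/shadow` rather than `/home/alice/../../etc/shadow`.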

**Usage Example**:
```rust
use bssh::server::filter::{FilterPolicy, FilterResult, Operation};
use bssh::server::filter::pattern::GlobMatcher;
use bssh::server::filter::policy::FilterRule;
use std::path::Path;

// Create policy that blocks *.key files
let policy = FilterPolicy::new()
    .with_default(FilterResult::Allow)
    .add_rule(FilterRule::new(
        Box::new(GlobMatcher::new("*.key").unwrap()),
        FilterResult::Deny,
    ));

// Check if operation is allowed
let result = policy.check(
    Path::new("/etc/secret.key"),
    Operation::Download,
    "alice"
);
assert_eq!(result, FilterResult::Deny);
```

**Configuration** (YAML):
```yaml
filter:
  enabled: true
  default_action: allow
  rules:
    - name: block-sensitive-keys
      pattern: "*.{key,pem}"
      action: deny
      operations:
        - download
        - upload
    - name: block-hidden-dirs
      path_prefix: "/home"
      pattern: ".*"
      action: deny
      users:
        - guest
```

### Audit Logging Module

Comprehensive audit logging infrastructure for the SSH server (`src/server/audit/`):

**Structure**:
- `mod.rs` - `AuditManager` for collecting and distributing audit events
- `event.rs` - `AuditEvent` type definitions and builder pattern
- `exporter.rs` - `AuditExporter` trait and `NullExporter` implementation
- `file.rs` - `FileExporter` for JSON Lines output with rotation support

**Key Components**:

- **AuditEvent**: Represents discrete auditable actions with fields for:
  - Unique event ID (UUID v4)
  - Timestamp (UTC)
  - Event type, session ID, username, client IP
  - File paths, bytes transferred, operation result
  - Protocol and additional details

- **EventType**: Categorizes security and operational events:
  - Authentication: `AuthSuccess`, `AuthFailure`, `AuthRateLimited`
  - Sessions: `SessionStart`, `SessionEnd`
  - Commands: `CommandExecuted`, `CommandBlocked`
  - File operations: `FileOpenRead`, `FileOpenWrite`, `FileRead`, `FileWrite`, `FileClose`, `FileUploaded`, `FileDownloaded`, `FileDeleted`, `FileRenamed`
  - Directory operations: `DirectoryCreated`, `DirectoryDeleted`, `DirectoryListed`
  - Filters: `TransferDenied`, `TransferAllowed`
  - Security: `IpBlocked`, `IpUnblocked`, `SuspiciousActivity`

- **EventResult**: Operation outcomes (`Success`, `Failure`, `Denied`, `Error`)

- **AuditExporter Trait**: Interface for audit event destinations
  - `export()` - Export single event
  - `export_batch()` - Export multiple events (optimizable)
  - `flush()` - Ensure pending events are written
  - `close()` - Clean up resources

- **NullExporter**: No-op exporter for testing and disabled audit logging

- **FileExporter**: File-based exporter writing events in JSON Lines format
  - Append mode to preserve existing data
  - Optional log rotation based on file size (`RotateConfig`)
  - Optional gzip compression for rotated files
  - Thread-safe using async Mutex
  - Async I/O using tokio
  - Automatic parent directory creation
  - Restrictive file permissions (0o600 on Unix)

- **AuditManager**: Central manager with async processing
  - Background worker for non-blocking event processing
  - Configurable buffering (buffer size, batch size)
  - Periodic flush intervals
  - Multiple exporter support
  - Graceful shutdown with event flush

**Configuration**:
```rust
let config = AuditConfig::new()
    .with_enabled(true)
    .with_buffer_size(1000)
    .with_batch_size(100)
    .with_flush_interval(5);
```

**File Exporter Usage**:
```rust
use bssh::server::audit::file::{FileExporter, RotateConfig};
use std::path::Path;

// Simple file exporter
let exporter = FileExporter::new(Path::new("/var/log/audit.log"))?;

// With rotation (50 MB, 10 backups, gzip compression)
let rotate_config = RotateConfig::new()
    .with_max_size(50 * 1024 * 1024)
    .with_max_backups(10)
    .with_compress(true);

let exporter = FileExporter::new(Path::new("/var/log/audit.log"))?
    .with_rotation(rotate_config);
```

**Output Format** (JSON Lines - one JSON object per line):
```json
{"id":"uuid","timestamp":"2024-01-15T10:30:00Z","event_type":"file_uploaded","session_id":"sess-001","user":"admin","client_ip":"192.168.1.100","path":"/data/report.pdf","bytes":1048576,"result":"success","protocol":"sftp"}
```

- **OtelExporter**: OpenTelemetry exporter for distributed tracing and observability
  - OTLP/gRPC protocol support using tonic
  - Event to LogRecord mapping with proper attribute conversion
  - Severity level mapping based on event types and results
  - Resource attributes including service.name and service.version
  - Graceful shutdown and flush methods
  - TLS support for secure audit data transmission

- **LogstashExporter**: Logstash exporter for ELK stack integration
  - TCP connection with JSON Lines protocol (newline-delimited JSON)
  - Optional TLS encryption for secure transmission
  - Automatic reconnection on connection failure
  - Batch support for efficient event transmission
  - Connection timeout handling (default: 10 seconds)
  - Configurable host and port

**OtelExporter Usage**:
```rust
use bssh::server::audit::otel::OtelExporter;
use bssh::server::audit::exporter::AuditExporter;
use bssh::server::audit::event::{AuditEvent, EventType};

// Create exporter with OTLP endpoint
let exporter = OtelExporter::new("http://localhost:4317")?;

// Export an audit event
let event = AuditEvent::new(
    EventType::AuthSuccess,
    "alice".to_string(),
    "session-123".to_string(),
);
exporter.export(event).await?;

// Graceful shutdown
exporter.close().await?;
```

**LogstashExporter Usage**:
```rust
use bssh::server::audit::logstash::LogstashExporter;
use bssh::server::audit::exporter::AuditExporter;
use bssh::server::audit::event::{AuditEvent, EventType};

// Create exporter; the connection is unencrypted unless TLS is enabled
let exporter = LogstashExporter::new("logstash.example.com", 5044)?
    .with_tls(true);  // Enable TLS for production

// Export an audit event
let event = AuditEvent::new(
    EventType::AuthSuccess,
    "alice".to_string(),
    "session-123".to_string(),
);
exporter.export(event).await?;

// Graceful shutdown
exporter.close().await?;
```

### Server CLI Binary
**Binary**: `bssh-server`

The `bssh-server` binary provides a command-line interface for managing and operating the SSH server:

**Subcommands**:
- **run** - Start the SSH server (default when no subcommand specified)
- **gen-config** - Generate a configuration file template with secure defaults
- **hash-password** - Hash passwords for configuration using Argon2id (recommended)
- **check-config** - Validate configuration files and display settings
- **gen-host-key** - Generate SSH host keys (Ed25519 or RSA)
- **version** - Show version and build information

**Global Options**:
- `-c, --config <FILE>` - Configuration file path
- `-b, --bind-address <ADDR>` - Override bind address
- `-p, --port <PORT>` - Override listen port
- `-k, --host-key <FILE>` - Host key file(s) (can be repeated)
- `-v, --verbose` - Verbosity level (repeatable: -v, -vv, -vvv)
- `-D, --foreground` - Run in foreground (don't daemonize)
- `--pid-file <FILE>` - PID file path

**Usage Examples**:
```bash
# Generate configuration template
bssh-server gen-config -o /etc/bssh/server.yaml

# Generate Ed25519 host key (recommended)
bssh-server gen-host-key -t ed25519 -o /etc/bssh/ssh_host_ed25519_key

# Generate RSA host key (for compatibility)
bssh-server gen-host-key -t rsa -o /etc/bssh/ssh_host_rsa_key --bits 4096

# Hash a password for configuration
bssh-server hash-password

# Validate configuration
bssh-server check-config -c /etc/bssh/server.yaml

# Start server with configuration file
bssh-server -c /etc/bssh/server.yaml

# Start server with CLI overrides
bssh-server -c /etc/bssh/server.yaml -p 2222 -b 0.0.0.0 -k /path/to/key
```

### SSH Server Module
**Documentation**: [docs/architecture/server-configuration.md](./docs/architecture/server-configuration.md)

SSH server implementation using the russh library for accepting incoming connections:

**Structure** (`src/server/`):
- `mod.rs` - `BsshServer` struct and `russh::server::Server` trait implementation
- `config/mod.rs` - Module exports and backward compatibility layer
- `config/types.rs` - Comprehensive configuration types with serde
- `config/loader.rs` - Config loader with validation and environment overrides
- `handler.rs` - `SshHandler` implementing `russh::server::Handler` trait
- `session.rs` - Session state management (`SessionManager`, `SessionInfo`, `ChannelState`)
- `exec.rs` - Command execution for SSH exec requests
- `sftp.rs` - SFTP subsystem handler with path traversal prevention
- `scp.rs` - SCP protocol handler with sink/source modes
- `auth/` - Authentication provider infrastructure
- `audit/` - Audit logging infrastructure (event types, exporters, manager)

**Key Components**:

- **BsshServer**: Main server struct managing the SSH server lifecycle
  - Accepts connections on configured address
  - Loads host keys from OpenSSH format files
  - Configures russh with authentication settings
  - Creates shared rate limiter for authentication attempts

- **Server Configuration System**: Dual configuration system for flexibility
  - **Builder API** (`ServerConfig`): Programmatic configuration for embedded use
  - **File-Based** (`ServerFileConfig`): YAML configuration with environment overrides
  - Configuration precedence: CLI > Environment > File > Defaults
  - Configuration validation at startup (host keys, CIDR ranges, paths)
  - Support for BSSH_* environment variable overrides

- **ServerConfig**: Configuration options with builder pattern
  - Host key paths and listen address
  - Connection limits and timeouts
  - Authentication method toggles (password, publickey, keyboard-interactive)
  - Public key authentication configuration (authorized_keys location)
  - Command execution configuration (shell, timeout, allowed/blocked commands)

- **ServerFileConfig**: Comprehensive YAML file configuration
  - Server settings (bind address, port, host keys, keepalive)
  - Authentication (public key, password with inline or file-based users)
  - Shell configuration (default shell, environment, command timeout)
  - SFTP/SCP enablement with optional chroot
  - File transfer filtering rules
  - Audit logging (file, OpenTelemetry, Logstash exporters)
  - Security settings (auth attempts, bans, session limits, IP allowlist/blocklist)

- **SshHandler**: Per-connection handler for SSH protocol events
  - Public key authentication via AuthProvider trait
  - Rate limiting for authentication attempts (token bucket)
  - Auth rate limiting with ban support (fail2ban-like)
  - Channel operations (open, close, EOF, data)
  - PTY, exec, shell, and subsystem request handling
  - Command execution with stdout/stderr streaming

- **PTY Module** (`src/server/pty.rs`): Pseudo-terminal management for interactive sessions
  - PTY master/slave pair creation using POSIX APIs via nix crate
  - Window size management with TIOCSWINSZ ioctl
  - Async I/O for PTY master file descriptor using tokio's AsyncFd
  - Configuration management (terminal type, dimensions, pixel sizes)
  - Implements `AsyncRead` and `AsyncWrite` for PTY I/O

- **Shell Session Module** (`src/server/shell.rs`): Interactive shell session handler
  - Shell process spawning with login shell configuration (-l flag)
  - Terminal environment setup (TERM, HOME, USER, SHELL, PATH)
  - Bidirectional I/O forwarding between SSH channel and PTY master
  - Window resize event handling forwarded to PTY
  - Proper session cleanup on disconnect (SIGHUP to shell, process termination)
  - Controlling terminal setup via TIOCSCTTY ioctl

- **CommandExecutor**: Executes commands requested by SSH clients
  - Shell-based command execution with `-c` flag
  - Environment variable configuration (HOME, USER, SHELL, PATH)
  - Stdout/stderr streaming to SSH channel
  - Command timeout with graceful process termination
  - Command allow/block list validation for security
  - Exit code propagation to client

- **SessionManager**: Tracks active sessions with configurable capacity
  - Session creation and cleanup
  - Idle session management
  - Authentication state tracking

- **SftpHandler**: SFTP subsystem handler (`src/server/sftp.rs`)
  - Implements `russh_sftp::server::Handler` trait for file transfer operations
  - Path traversal prevention with chroot-like isolation
  - File operations: open, read, write, close
  - Directory operations: opendir, readdir, mkdir, rmdir
  - Attribute operations: stat, lstat, fstat, setstat, fsetstat
  - Path operations: realpath, rename, remove, readlink, symlink
  - Symlink validation ensures targets remain within root directory
  - Handle limit enforcement to prevent resource exhaustion
  - Read size capping to prevent memory exhaustion

- **ScpHandler**: SCP protocol handler (`src/server/scp.rs`)
  - Implements SCP server protocol for file transfers via the `scp` command
  - Sink mode (`-t` flag): receives files from client (upload)
  - Source mode (`-f` flag): sends files to client (download)
  - Recursive transfer support (`-r` flag) for directories
  - Time preservation (`-p` flag) for file modification times
  - Security features:
    - Path traversal prevention with normalized path resolution
    - Symlink escape prevention via canonicalization
    - Filename validation (rejects `/`, `..`, `.`)
    - File size limit (10 GB maximum)
    - Mode permission masking (strips setuid/setgid/sticky bits)
    - Line length limits to prevent DoS via buffer exhaustion
  - Automatic SCP command detection in exec_request handler
  - Configurable via `scp_enabled` setting
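
Several of the SCP safeguards listed above apply when a sink-mode file header (`C<mode> <size> <name>`) is parsed. The sketch below shows one way those checks could fit together. The function names are hypothetical, the 10 GB limit is interpreted here as 10 GiB, and rejecting empty names is an extra assumption.

```rust
/// Upper bound on a single file (10 GiB; the exact unit is an assumption).
const MAX_FILE_SIZE: u64 = 10 * 1024 * 1024 * 1024;

/// Rejects names that could escape the target directory.
fn is_safe_scp_filename(name: &str) -> bool {
    !name.is_empty() && name != "." && name != ".." && !name.contains('/')
}

/// Keeps only the rwx permission bits, dropping setuid, setgid, and sticky.
fn mask_mode(mode: u32) -> u32 {
    mode & 0o777
}

/// Parses a header such as `C0644 1024 report.txt` into (mode, size, name).
/// Returns `None` for malformed headers, oversized files, or unsafe names.
fn parse_file_header(line: &str) -> Option<(u32, u64, &str)> {
    let rest = line.strip_prefix('C')?;
    let mut parts = rest.splitn(3, ' ');
    let mode = u32::from_str_radix(parts.next()?, 8).ok()?;
    let size: u64 = parts.next()?.parse().ok()?;
    let name = parts.next()?; // the name may itself contain spaces
    if size > MAX_FILE_SIZE || !is_safe_scp_filename(name) {
        return None;
    }
    Some((mask_mode(mode), size, name))
}
```

A header of `C4755 10 tool` is accepted with its mode reduced to `0o755`, while `C0644 10 ../evil` is rejected outright.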

### Server Authentication Module

The authentication subsystem (`src/server/auth/`) provides extensible authentication for the SSH server:

**Structure**:
- `mod.rs` - Module exports and re-exports
- `provider.rs` - `AuthProvider` trait definition
- `publickey.rs` - `PublicKeyVerifier` implementation
- `password.rs` - `PasswordVerifier` implementation with Argon2id hashing
- `composite.rs` - `CompositeAuthProvider` combining multiple auth methods

**AuthProvider Trait**:

The `AuthProvider` trait defines the interface for all authentication backends:

```rust
#[async_trait]
pub trait AuthProvider: Send + Sync {
    async fn verify_publickey(&self, username: &str, key: &PublicKey) -> Result<AuthResult>;
    async fn verify_password(&self, username: &str, password: &str) -> Result<AuthResult>;
    async fn get_user_info(&self, username: &str) -> Result<Option<UserInfo>>;
    async fn user_exists(&self, username: &str) -> Result<bool>;
}
```

**PublicKeyVerifier**:

Implements public key authentication by parsing OpenSSH authorized_keys files:

- **Key file location modes**:
  - Directory mode: `{dir}/{username}/authorized_keys`
  - Pattern mode: `/home/{user}/.ssh/authorized_keys`

- **Supported key types**:
  - ssh-ed25519, ssh-ed448
  - ssh-rsa, ssh-dss
  - ecdsa-sha2-nistp256/384/521
  - Security keys (sk-ssh-ed25519, sk-ecdsa-sha2-nistp256)

- **Key options parsing**:
  - `command="..."` - Force specific command
  - `from="..."` - Restrict source addresses
  - `no-pty`, `no-port-forwarding`, `no-agent-forwarding`, `no-X11-forwarding`
  - `environment="..."` - Set environment variables
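
Options in an authorized_keys entry are comma-separated, but commas inside quoted values must not split. The sketch below shows that quoting rule in isolation. It is a simplified illustration, not the `PublicKeyVerifier` parser; the function names are hypothetical.

```rust
/// Splits an option string on commas outside quotes and returns
/// (name, optional value) pairs with the surrounding quotes removed.
fn parse_key_options(options: &str) -> Vec<(String, Option<String>)> {
    let mut parsed = Vec::new();
    let mut current = String::new();
    let mut in_quotes = false;
    let mut chars = options.chars().peekable();

    while let Some(c) = chars.next() {
        match c {
            // `\"` inside a quoted value is a literal quote.
            '\\' if in_quotes && chars.peek() == Some(&'"') => {
                current.push('"');
                chars.next();
            }
            // An unescaped quote toggles quoting and is not kept.
            '"' => in_quotes = !in_quotes,
            ',' if !in_quotes => {
                push_option(&mut parsed, &current);
                current.clear();
            }
            _ => current.push(c),
        }
    }
    push_option(&mut parsed, &current);
    parsed
}

/// Records `name` or `name=value`; empty segments are ignored.
fn push_option(parsed: &mut Vec<(String, Option<String>)>, raw: &str) {
    let raw = raw.trim();
    if raw.is_empty() {
        return;
    }
    match raw.split_once('=') {
        Some((name, value)) => parsed.push((name.to_string(), Some(value.to_string()))),
        None => parsed.push((raw.to_string(), None)),
    }
}
```

Given `command="echo a,b",no-pty`, the parser returns two options: `command` with the value `echo a,b`, and the flag `no-pty` with no value.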

**PasswordVerifier**:

Implements password authentication with secure password hashing:

- **Argon2id hashing**: Uses the OWASP-recommended password hashing algorithm
  - Memory cost: 19 MiB
  - Time cost: 2 iterations
  - Parallelism: 1

- **User configuration**:
  - External YAML file with user definitions
  - Inline users in server configuration
  - User attributes: name, password_hash, shell, home, env

- **Security features**:
  - Timing attack mitigation with constant-time verification
  - Minimum verification time (100ms) regardless of user existence
  - Dummy hash verification for non-existent users
  - Secure memory cleanup using `zeroize` crate
  - User enumeration protection

- **Hash compatibility**:
  - Argon2id (recommended, generated by `hash-password` command)
  - bcrypt (supported for backward compatibility)
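
Two of the timing mitigations listed above, comparison without early exit and a minimum verification time, can be sketched as follows. This is illustrative only: the function names are hypothetical, and production code would normally rely on a vetted constant-time primitive rather than a hand-written loop, which a compiler is free to optimize.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Compares two byte slices without stopping at the first mismatch.
/// Lengths are compared up front, which is acceptable for fixed-size digests.
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b) {
        diff |= x ^ y; // accumulate differences instead of branching
    }
    diff == 0
}

/// Runs `verify`, then sleeps so the whole call takes at least `minimum`,
/// regardless of whether verification succeeded or how fast it failed.
fn verify_with_floor<F: FnOnce() -> bool>(minimum: Duration, verify: F) -> bool {
    let started = Instant::now();
    let ok = verify();
    if let Some(remaining) = minimum.checked_sub(started.elapsed()) {
        thread::sleep(remaining);
    }
    ok
}
```

Wrapping both the real check and the dummy-hash check for unknown users in `verify_with_floor` gives them the same lower bound on response time.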

**CompositeAuthProvider**:

Combines multiple authentication methods into a single provider:

- Delegates to `PublicKeyVerifier` for public key auth
- Delegates to `PasswordVerifier` for password auth
- Prioritizes password verifier for user info (more detailed)
- Supports hot-reloading of password users via `reload_password_users()`

**Security Features**:

- **Username validation**: Prevents path traversal attacks (e.g., `../etc/passwd`)
- **File permission checks** (Unix): Rejects world/group-writable files and symlinks
- **Symlink protection**: Uses `symlink_metadata()` to detect and reject symlinks
- **Parent directory validation**: Checks parent directory permissions
- **Rate limiting**: Token bucket rate limiter for authentication attempts
- **Timing attack mitigation**: Constant-time behavior in password verification and `user_exists()` check
- **Secure memory handling**: Password strings cleared from memory after use via `zeroize`
- **Comprehensive logging**: All authentication attempts are logged

## Data Flow

### Command Execution Flow

```
User Input → CLI Parser → Mode Detection → Node Resolution
                                                 ↓
                                       Configuration Loading
                                                 ↓
                                        SSH Config Parsing
                                                 ↓
                                     Jump Host Chain Creation
                                                 ↓
                                      Parallel Executor Setup
                            ┌────────────────────┴─────────────────┐
                            ▼                                      ▼
                    Connection Pool                       Task Spawning
                            ↓                                      ↓
                    Per-Node Execution                   Semaphore Control
                            ↓                                      ↓
                    Command/Transfer                      Result Collection
                            ↓                                      ↓
                    Output Streaming                     Exit Code Strategy
                            └────────────────────┬─────────────────┘
                                          User Output
```

### Error Handling Strategy

- **Connection errors**: Retry with exponential backoff
- **Authentication failures**: Immediate failure with clear diagnostics
- **Command execution errors**: Captured with exit codes
- **Timeout handling**: Configurable per-connection and per-command
- **Signal handling**: Clean shutdown on Ctrl+C with two-stage confirmation
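
The retry behaviour for connection errors can be sketched as follows. This is a generic exponential backoff helper written against the standard library only. The function names, the base delay, and the cap are assumptions for illustration and do not reflect the actual bssh retry parameters.

```rust
use std::thread;
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): `base * 2^attempt`, capped at `max`.
fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    let factor = 2u32.saturating_pow(attempt);
    base.saturating_mul(factor).min(max)
}

/// Calls `op` until it succeeds or `max_attempts` calls have failed,
/// sleeping for an exponentially growing delay between attempts.
fn retry_with_backoff<T, E>(
    max_attempts: u32,
    base: Duration,
    max: Duration,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the last error to the caller.
            Err(err) if attempt + 1 >= max_attempts => return Err(err),
            Err(_) => {
                thread::sleep(backoff_delay(attempt, base, max));
                attempt += 1;
            }
        }
    }
}
```

With a 100 ms base and a 5 s cap, the delays are 100 ms, 200 ms, 400 ms, 800 ms, and so on until the cap is reached. Authentication failures would bypass a helper like this entirely, since the list above calls for immediate failure in that case.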

### Test Environment-Variable Mutation Pattern (`EnvGuard`)

Several test suites must temporarily set or remove process-wide environment
variables (e.g. `BACKENDAI_CLUSTER_HOSTS`, `HOME`, `SSH_AUTH_SOCK`). Under
Rust 2024 edition, `std::env::set_var` and `std::env::remove_var` are marked
`unsafe` because concurrent mutation of the environment is undefined behaviour
at the libc level on glibc, musl, and macOS. `EnvGuard` centralises all such
mutations in `src/test_helpers/env_guard.rs`.

**Soundness contract**: every test that constructs an `EnvGuard` MUST be
annotated with `#[serial_test::serial]`. Every other test in the same crate
binary that reads or mutates the same variable MUST also carry `#[serial]` (or
a matching `#[serial(key)]` group). Note that `#[serial]` only serializes
against other `#[serial]` / `#[parallel]` tests — unannotated tests may still
run concurrently with serial ones and would race on environment reads. This is
not an `EnvGuard` limitation; it is an inherent constraint of the libc
environment-variable API.

```rust
use serial_test::serial;
use crate::test_helpers::EnvGuard;

#[test]
#[serial]
fn my_test() {
    let _host = EnvGuard::set("BACKENDAI_CLUSTER_HOSTS", "node1,node2");
    // Variable is automatically restored when `_host` drops at end of scope.
}
```

**Integration tests** access the same struct via a `#[path]`-based re-export
in `tests/common/mod.rs`, which avoids making `EnvGuard` part of the public
`bssh` crate API while keeping a single source of truth. When adding a new
integration-test binary that needs `EnvGuard`, add `mod common;` at the top of
that file and use `common::EnvGuard`.

Use `#[serial(key)]` (a named group) when two sets of tests touch different,
non-overlapping variables: tests in different groups may then run concurrently
with each other, while tests within the same group still run one at a time.
Omit the key (plain `#[serial]`) when in doubt.
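
For readers unfamiliar with the pattern, a guard of this kind can be sketched as a small RAII type. The following is an illustrative reconstruction, not the contents of `src/test_helpers/env_guard.rs`: the field names and the `remove` constructor are assumptions, and only the `set` call shape is taken from the example above.

```rust
use std::env;
use std::ffi::OsString;

/// Remembers the previous value of one environment variable and restores
/// it when dropped. Hypothetical sketch of the pattern described above.
struct EnvGuard {
    key: String,
    previous: Option<OsString>,
}

// `unused_unsafe` only fires on editions before 2024, where these calls are safe.
#[allow(unused_unsafe)]
impl EnvGuard {
    /// Sets `key` to `value`, saving whatever was there before.
    fn set(key: &str, value: &str) -> Self {
        let previous = env::var_os(key);
        // SAFETY: the caller holds the `#[serial]` lock, so no other thread
        // reads or writes the environment while this runs.
        unsafe { env::set_var(key, value) };
        Self { key: key.to_string(), previous }
    }

    /// Removes `key`, saving whatever was there before.
    fn remove(key: &str) -> Self {
        let previous = env::var_os(key);
        // SAFETY: same contract as `set`.
        unsafe { env::remove_var(key) };
        Self { key: key.to_string(), previous }
    }
}

#[allow(unused_unsafe)]
impl Drop for EnvGuard {
    fn drop(&mut self) {
        // SAFETY: same contract as `set`; the test is still running serially.
        unsafe {
            match self.previous.take() {
                Some(value) => env::set_var(&self.key, value),
                None => env::remove_var(&self.key),
            }
        }
    }
}
```

Keeping every `unsafe` call inside one type means the soundness argument is written once, next to the code it justifies, instead of being repeated in each test.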

## Security Model

### Authentication

- SSH agent authentication (auto-detection)
- Private key files with passphrase support
- Password authentication (discouraged in production)
- Public key authentication preferred

### Host Verification

- known_hosts file verification
- Three modes: Yes (strict), No (insecure), AcceptNew (recommended)
- Per-host configuration support
- Host key fingerprint display
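
The three modes can be read as a small decision table. The sketch below assumes the modes behave like the corresponding OpenSSH `StrictHostKeyChecking` values, which this section does not state explicitly. The type and function names are hypothetical.

```rust
/// The three verification modes listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum StrictHostKeyChecking {
    Yes,
    No,
    AcceptNew,
}

/// Result of looking the host up in known_hosts.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum KnownHostStatus {
    Match,
    Mismatch,
    Unknown,
}

/// What the client does with the presented host key.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Decision {
    Accept,
    AcceptAndRecord,
    Reject,
}

/// Assumed semantics, modelled on OpenSSH rather than taken from bssh.
fn decide(mode: StrictHostKeyChecking, status: KnownHostStatus) -> Decision {
    use Decision::*;
    use KnownHostStatus::*;
    use StrictHostKeyChecking::*;
    match (mode, status) {
        (No, _) => Accept,                       // insecure: nothing is checked
        (_, Match) => Accept,                    // key already trusted
        (_, Mismatch) => Reject,                 // possible man-in-the-middle
        (Yes, Unknown) => Reject,                // strict: unknown hosts refused
        (AcceptNew, Unknown) => AcceptAndRecord, // trust on first use
    }
}
```

Under these assumptions, `AcceptNew` differs from `Yes` only for hosts that have never been seen, which is why it is a practical default for clusters whose nodes are added frequently.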

### Data Protection

- No credential logging
- Secure memory handling for passphrases
- Encrypted SSH transport (via russh)
- Connection timeout enforcement

### Network Security

- Jump host support for bastion architectures
- Port forwarding for secure tunneling
- SSH config directive support for security policies

## Dependencies and Licensing

### Core Dependencies

- **tokio** - Async runtime
- **russh / russh-sftp** - SSH protocol implementation
- **clap** - CLI argument parsing
- **serde / serde_yaml** - Configuration serialization
- **tracing / tracing-subscriber** - Structured logging
- **anyhow / thiserror** - Error handling

### License

See [LICENSE](./LICENSE) file for licensing information.

## Appendix

### Performance Tuning

- **Parallelism**: Adjust `--parallel` flag (default: 10)
- **Connection timeout**: Use `--connect-timeout` (default: 30s)
- **Command timeout**: Use `--timeout` (default: 5min)
- **Keepalive**: Configurable via `--server-alive-interval` (default: 60s) and `--server-alive-count-max` (default: 3)
  - Interval of 0 disables keepalive
  - Connection is considered dead after `interval * (count_max + 1)` seconds without response
  - Equivalent to OpenSSH `ServerAliveInterval` and `ServerAliveCountMax` options
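
As a worked example of the keepalive formula above, the defaults (interval 60 s, count max 3) give `60 * (3 + 1) = 240` seconds before a silent connection is considered dead. The helper below is a hypothetical illustration of that arithmetic and is not part of the bssh API.

```rust
use std::time::Duration;

/// Time a connection may stay silent before it is considered dead,
/// or `None` when keepalive is disabled (interval of 0).
fn keepalive_deadline(interval_secs: u64, count_max: u32) -> Option<Duration> {
    if interval_secs == 0 {
        return None; // keepalive disabled
    }
    // interval * (count_max + 1), as stated in the list above
    let total = interval_secs.saturating_mul(u64::from(count_max) + 1);
    Some(Duration::from_secs(total))
}
```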

### Configuration Schema

See [docs/architecture/configuration.md](./docs/architecture/configuration.md) for complete YAML schema and examples.

### Exit Codes

- **0**: Success (all nodes, or main rank succeeded)
- **1**: General failure
- **130**: Terminated by SIGINT (Ctrl+C)
- **Other**: Preserved from main rank (SIGSEGV=139, OOM=137, etc.)

See [docs/architecture/exit-code-strategy.md](./docs/architecture/exit-code-strategy.md) for detailed strategy documentation.
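
The signal-derived values in the list follow the common shell convention of `128 + signal number`: SIGINT is 2 (130), SIGKILL is 9 (137, which is what an OOM kill produces), and SIGSEGV is 11 (139). The sketch below shows that arithmetic together with one possible way to pick the final code. The function names, the `interrupted` flag, and the treatment of a missing main-rank status as a general failure are assumptions for illustration only.

```rust
const EXIT_FAILURE: i32 = 1;
const EXIT_SIGINT: i32 = 130;

/// Conventional exit code for a process terminated by `signal`.
fn signal_exit_code(signal: i32) -> i32 {
    128 + signal
}

/// Picks the process exit code from the main rank's result.
/// `main_rank` is `None` when no exit status was obtained (assumed to map to 1).
fn final_exit_code(interrupted: bool, main_rank: Option<i32>) -> i32 {
    if interrupted {
        return EXIT_SIGINT; // user pressed Ctrl+C
    }
    match main_rank {
        Some(code) => code, // 0 on success, otherwise preserved as-is
        None => EXIT_FAILURE,
    }
}
```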

## Further Reading

For detailed component documentation, see:
- [Architecture Documentation Index](./docs/architecture/README.md)
- [CLI Interface Documentation](./docs/architecture/cli-interface.md)
- [Configuration Management](./docs/architecture/configuration.md)
- [Parallel Executor](./docs/architecture/executor.md)
- [SSH Client](./docs/architecture/ssh-client.md)
- [Terminal User Interface](./docs/architecture/tui.md)
- [Interactive Mode](./docs/architecture/interactive-mode.md)
- [SSH Configuration Parser](./docs/architecture/ssh-config-parser.md)
- [SSH Jump Host Support](./docs/architecture/ssh-jump-hosts.md)
- [SSH Port Forwarding](./docs/architecture/ssh-port-forwarding.md)
- [Exit Code Strategy](./docs/architecture/exit-code-strategy.md)