auth-framework 0.5.0-rc19

# AuthFramework REST API Design Rationale


> Historical design note: this document captures rationale, gaps, and proposed API evolution ideas at the time it was written. It is not a route-by-route description of the currently mounted REST router. For the current implementation, use `docs/api/README.md`, `docs/api/complete-reference.md`, and the live `/api/openapi.json` output.

## Document Purpose


This document explains the **why** behind every design decision in the AuthFramework REST API. It serves as:

1. **Architectural Decision Record (ADR)** - documenting why things are structured as they are
2. **SDK Development Guide** - helping implementers understand intent, not just mechanics
3. **API Evolution Guide** - providing context for future changes
4. **Audit Tool** - helping identify gaps, inconsistencies, or areas for improvement

## Core Design Principles


### 1. **REST + JSON First**


**Why**: Universal compatibility across all languages and platforms.

- JSON is the lingua franca of web APIs
- REST principles provide predictable, intuitive resource manipulation
- HTTP verbs map naturally to CRUD operations
- Stateless design enables horizontal scaling

**Tradeoffs Considered**:

- ❌ GraphQL: Adds complexity, harder to cache, not needed for our use case
- ❌ gRPC: Excellent for service-to-service, poor for web/browser clients
- ✅ REST + JSON: Maximum compatibility, established patterns, excellent tooling

### 2. **Standard HTTP Status Codes**


**Why**: Developers already know what they mean.

We use standard HTTP semantics:

- `200 OK` - Successful request
- `201 Created` - Resource created successfully
- `204 No Content` - Success with no response body
- `400 Bad Request` - Client error (validation, malformed request)
- `401 Unauthorized` - Authentication required or failed
- `403 Forbidden` - Authenticated but not authorized
- `404 Not Found` - Resource doesn't exist
- `409 Conflict` - Resource conflict (duplicate, constraint violation)
- `429 Too Many Requests` - Rate limit exceeded
- `500 Internal Server Error` - Server error
- `503 Service Unavailable` - Service temporarily unavailable

**Anti-pattern Avoided**: Never return `200 OK` with an error in the body. HTTP status codes exist for a reason.

### 3. **Consistent Response Envelope**


**Why**: Predictable parsing across all endpoints.

Every response (except raw metrics/health) uses:

```json
{
  "success": true,
  "data": { /* actual payload */ },
  "timestamp": "2025-09-30T12:00:00Z"
}
```

Or for errors:

```json
{
  "success": false,
  "error": {
    "code": "INVALID_CREDENTIALS",
    "message": "The provided credentials are invalid",
    "details": null
  },
  "timestamp": "2025-09-30T12:00:00Z"
}
```

**Why this structure**:

- `success` boolean: Allows explicit success checking in loosely-typed languages
- `data` object: Actual payload, consistent location
- `error` object: Structured error information
- `timestamp`: Helps with debugging, logging, audit trails

**Tradeoffs**:

- ❌ "Naked" responses: Simpler but inconsistent, harder to parse generically
- ❌ Error-specific structures: More flexible but unpredictable
- ✅ Envelope pattern: Slight verbosity for massive consistency win
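As an illustration of how a client might consume this envelope, the sketch below models it as a Rust type and collapses it into a `Result`. The names `Envelope`, `ApiError`, `into_result`, and the `MALFORMED_ENVELOPE` fallback code are assumptions made for this example, not part of AuthFramework. JSON decoding is omitted and `details` is simplified to an optional string.

```rust
/// Structured error carried in the `error` field of the envelope.
#[derive(Debug, Clone, PartialEq)]
pub struct ApiError {
    pub code: String,
    pub message: String,
    /// Simplified to a string here; on the wire this is an object or null.
    pub details: Option<String>,
}

/// In-memory form of the response envelope (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub struct Envelope<T> {
    pub success: bool,
    pub data: Option<T>,
    pub error: Option<ApiError>,
    pub timestamp: String,
}

impl<T> Envelope<T> {
    /// Builds a success envelope.
    pub fn ok(data: T, timestamp: &str) -> Self {
        Envelope {
            success: true,
            data: Some(data),
            error: None,
            timestamp: timestamp.to_string(),
        }
    }

    /// Builds an error envelope without details.
    pub fn fail(code: &str, message: &str, timestamp: &str) -> Self {
        Envelope {
            success: false,
            data: None,
            error: Some(ApiError {
                code: code.to_string(),
                message: message.to_string(),
                details: None,
            }),
            timestamp: timestamp.to_string(),
        }
    }

    /// Collapses the envelope into a `Result`, so callers branch only once.
    pub fn into_result(self) -> Result<T, ApiError> {
        match (self.success, self.data, self.error) {
            (true, Some(data), _) => Ok(data),
            (false, _, Some(error)) => Err(error),
            // Neither shape matched; `MALFORMED_ENVELOPE` is an assumed code.
            _ => Err(ApiError {
                code: "MALFORMED_ENVELOPE".to_string(),
                message: "response did not match the envelope contract".to_string(),
                details: None,
            }),
        }
    }
}
```

Because every endpoint shares this shape, a single `into_result` call can serve all of them, which is the consistency win the envelope is meant to buy.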

### 4. **Bearer Token Authentication**


**Why**: Industry standard, stateless, secure.

```text
Authorization: Bearer <jwt-token>
```

**Why Bearer tokens**:

- RFC 6750 standard
- Stateless - no server-side session storage
- JWT carries claims (user_id, roles, permissions)
- Works across microservices without shared state
- Can be validated independently

**Alternatives Rejected**:

- ❌ API Keys in URL: Security risk (logs, browser history)
- ❌ Basic Auth: Credentials in every request, no expiry
- ❌ Custom header for user tokens: Non-standard, reinventing the wheel
- ❌ Cookies only: Poor for mobile/native apps, CSRF concerns

**Note**: We also support API keys via `X-API-Key` header for service accounts where appropriate.
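To make the header format concrete, here is a minimal sketch of extracting the token from an `Authorization` header value. The function name `bearer_token` is hypothetical, and the sketch only parses the header; it does not validate the JWT. HTTP authentication scheme names are case-insensitive, so `bearer` is accepted as well.

```rust
/// Extracts the token from an `Authorization` header value such as
/// `Bearer <jwt-token>`. Returns `None` for other schemes or empty tokens.
/// This only parses the header; signature and expiry checks are out of scope.
pub fn bearer_token(header_value: &str) -> Option<&str> {
    // Split the scheme from the credentials at the first space.
    let (scheme, rest) = header_value.trim().split_once(' ')?;
    if !scheme.eq_ignore_ascii_case("Bearer") {
        return None;
    }
    let token = rest.trim();
    // A bearer token is a single run of characters with no whitespace.
    if token.is_empty() || token.contains(char::is_whitespace) {
        None
    } else {
        Some(token)
    }
}
```

Rejecting anything that is not exactly one scheme and one token keeps malformed headers out of the later validation step.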

## API Structure & Organization


### Path Hierarchy Design


```text
/
├── health/              # Monitoring (always public, unauthenticated)
├── metrics/             # Observability (public, Prometheus format)
├── auth/                # Authentication operations
├── oauth/               # OAuth 2.0 & OIDC operations
├── users/               # User account management (authenticated)
├── mfa/                 # Multi-factor authentication (authenticated)
├── admin/               # Administrative operations (admin role required)
└── api/v1/rbac/         # RBAC operations (versioned, authenticated)
```

### Why This Hierarchy?


#### 1. **Top-level Health & Metrics**


**Path**: `/health`, `/metrics`

**Why**:

- Need to be accessible without authentication (load balancers, monitoring)
- Should never change location (stability for infrastructure)
- Universally expected at root level

**Security Note**: `/health/detailed` requires authentication to prevent information disclosure.

#### 2. **Auth at Root**


**Path**: `/auth/*`

**Why**:

- Authentication is fundamental, not a "feature"
- Login is the entry point - should be obvious
- Commonly expected location (convention)

**Endpoints**:

- `POST /auth/login` - Get tokens
- `POST /auth/refresh` - Refresh access token
- `POST /auth/logout` - Invalidate tokens
- `GET /auth/validate` - Check token validity
- `GET /auth/providers` - List available auth providers

**Design Note**: We use `/auth/login` not `/login` to keep auth operations grouped.

#### 3. **OAuth Separate from Auth**


**Path**: `/oauth/*`

**Why**:

- OAuth is a protocol, not just "authentication"
- Distinct use case (delegated authorization)
- Different clients (third-party apps vs first-party users)
- Follows OAuth 2.0 spec conventions

**Endpoints**:

- `GET /oauth/authorize` - Authorization endpoint (OAuth 2.0 spec)
- `POST /oauth/token` - Token endpoint (OAuth 2.0 spec)
- `POST /oauth/revoke` - Token revocation (RFC 7009)
- `POST /oauth/introspect` - Token introspection (RFC 7662)
- `GET /oauth/clients/{client_id}` - Client information

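**Standard Compliance**: The authorize, token, revocation, and introspection endpoints follow the cited specifications. Those specifications define endpoint behavior rather than URL paths, so the `/oauth/*` names follow common convention.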

#### 4. **User Operations**


**Path**: `/users/*`

**Why**:

- Resource-oriented (users are resources)
- Operations on the authenticated user's own account
- Distinct from admin user management

**Endpoints**:

- `GET /users/profile` - Get own profile
- `PUT /users/profile` - Update own profile
- `POST /users/change-password` - Change own password
- `GET /users/sessions` - List own sessions
- `DELETE /users/sessions/{session_id}` - Revoke own session
- `GET /users/{user_id}/profile` - Get another user's public profile

**Design Choice**: Why not `/me`?

- `/me` is a nice shorthand but `/users/profile` is more explicit
- Consistency: all user operations under `/users`
- RESTful: `/users` represents the users collection

#### 5. **MFA Operations**


**Path**: `/mfa/*`

**Why**:

- MFA is a distinct feature, not just "auth"
- Multiple operations (setup, verify, disable, backup codes)
- Should be grouped together for discoverability

**Endpoints**:

- `POST /mfa/setup` - Initialize MFA
- `POST /mfa/verify` - Verify MFA code
- `POST /mfa/disable` - Disable MFA
- `GET /mfa/status` - Check MFA status
- `POST /mfa/regenerate-backup-codes` - New backup codes
- `POST /mfa/verify-backup-code` - Use backup code

**Design Note**: Could have been `/users/mfa/*` but MFA deserves top-level visibility.

#### 6. **Admin Operations**


**Path**: `/admin/*`

**Why**:

- Clear separation of admin vs user operations
- Easier to apply admin-only middleware
- Security: explicit admin intent
- Discoverability: all admin operations in one place

**Endpoints**:

- `GET /admin/users` - List all users
- `POST /admin/users` - Create user (admin)
- `PUT /admin/users/{user_id}/roles` - Assign roles
- `DELETE /admin/users/{user_id}` - Delete user
- `PUT /admin/users/{user_id}/activate` - Activate/deactivate
- `GET /admin/stats` - System statistics
- `GET /admin/audit-logs` - Audit log access

**Security Model**: All `/admin/*` endpoints require `admin` role or specific permissions.

#### 7. **RBAC Operations (Versioned API)**


**Path**: `/api/v1/rbac/*`

**Why versioned**:

- RBAC is complex and likely to evolve
- Version prefix allows breaking changes without disrupting old clients
- Clear contract: `/api/v1/` = versioned, stable API

**Why `/api/v1/` prefix**:

- Distinguishes versioned from unversioned endpoints
- Common pattern (GitHub, Stripe, many others)
- Future-proof: `/api/v2/` when needed

**Endpoints**:

- `POST /api/v1/rbac/roles` - Create role
- `GET /api/v1/rbac/roles` - List roles
- `GET /api/v1/rbac/roles/{role_id}` - Get role
- `PUT /api/v1/rbac/roles/{role_id}` - Update role
- `DELETE /api/v1/rbac/roles/{role_id}` - Delete role
- `POST /api/v1/rbac/users/{user_id}/roles` - Assign role
- `DELETE /api/v1/rbac/users/{user_id}/roles/{role_id}` - Revoke role
- `GET /api/v1/rbac/users/{user_id}/roles` - Get user roles
- `POST /api/v1/rbac/bulk/assign` - Bulk role assignment
- `POST /api/v1/rbac/check-permission` - Check permission
- `POST /api/v1/rbac/elevate` - Elevate privileges
- `GET /api/v1/rbac/audit` - RBAC audit log

**Design Question**: Why not `/admin/rbac/*`?

- RBAC is not admin-only - users check their own permissions
- RBAC might need different versioning than admin operations
- Separation of concerns: RBAC is authorization, admin is management

## HTTP Verb Usage


### POST vs PUT vs PATCH


**POST** - Create new resources or perform actions

- `POST /auth/login` - Action: authenticate
- `POST /api/v1/rbac/roles` - Create new role
- `POST /mfa/verify` - Action: verify code

**PUT** - Replace entire resource

- `PUT /users/profile` - Replace entire profile
- `PUT /api/v1/rbac/roles/{role_id}` - Replace role

**PATCH** - Partial update (future consideration)

- Not currently used, but reserved for partial updates
- Example: `PATCH /users/profile` could update only changed fields

**DELETE** - Remove resource

- `DELETE /users/sessions/{session_id}` - Remove session
- `DELETE /api/v1/rbac/roles/{role_id}` - Remove role

**GET** - Retrieve resource(s)

- `GET /users/profile` - Retrieve profile
- `GET /api/v1/rbac/roles` - List roles

### Why Actions Use POST


Some endpoints are actions, not resource manipulations:

- `POST /auth/login` - Not creating a "login", performing authentication
- `POST /auth/logout` - Action: invalidate token
- `POST /mfa/verify` - Action: verify code
- `POST /api/v1/rbac/check-permission` - Action: check permission

**Alternative considered**: Custom HTTP methods (WebDAV-style)

- ❌ Non-standard, poor tooling support
- ✅ POST for actions is widely accepted

## Authentication & Authorization Model


### Three-Layer Security


1. **Public Endpoints** (no authentication)
   - `/health`
   - `/metrics`
   - `/auth/login`
   - `/oauth/authorize`
   - `/oauth/token`

2. **Authenticated Endpoints** (valid token required)
   - `/users/*`
   - `/mfa/*`
   - `/api/v1/rbac/check-permission`

3. **Authorized Endpoints** (specific roles/permissions required)
   - `/admin/*` - requires `admin` role
   - `/api/v1/rbac/roles` (POST) - requires `rbac:roles:create`
   - `/admin/audit-logs` - requires `audit:read`
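The three layers can be expressed as a lookup from method and path to a required access level. The sketch below is illustrative only: the `Access` enum and `required_access` function are hypothetical names, only the routes listed above are covered, and treating every unlisted route as authenticated (fail closed) is an assumption rather than documented behavior.

```rust
/// What a request must present before the handler runs (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum Access {
    Public,
    Authenticated,
    Role(&'static str),
    Permission(&'static str),
}

/// Maps a method and path to the access level from the three-layer model.
/// Specific rules come first; the final arm fails closed.
pub fn required_access(method: &str, path: &str) -> Access {
    match (method, path) {
        // Layer 1: public endpoints.
        (_, "/health")
        | (_, "/metrics")
        | (_, "/auth/login")
        | (_, "/oauth/authorize")
        | (_, "/oauth/token") => Access::Public,
        // Layer 3: endpoints with a named permission.
        (_, "/admin/audit-logs") => Access::Permission("audit:read"),
        ("POST", "/api/v1/rbac/roles") => Access::Permission("rbac:roles:create"),
        // Layer 3: everything else under /admin/ needs the admin role.
        (_, p) if p.starts_with("/admin/") => Access::Role("admin"),
        // Layer 2 (assumed default): a valid token is required.
        _ => Access::Authenticated,
    }
}
```

With this ordering, `/health/detailed` falls through to `Access::Authenticated`, which matches the earlier security note about that endpoint.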

### Permission Checking Strategy


**Where to check**:

1. **Middleware**: Token validation (all authenticated endpoints)
2. **Endpoint**: Role/permission checking (specific endpoints)

**Why not all in middleware**:

- Different endpoints need different permissions
- Some endpoints have complex authorization (context-dependent)
- Better error messages at endpoint level
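The split between middleware and endpoint checks can be sketched as two small functions chained inside a handler. All names here (`Claims`, `AuthError`, `authenticate`, `require_permission`, `create_role`) are hypothetical, and the `Option<Claims>` argument stands in for the result of real token validation.

```rust
/// Claims carried by a validated access token (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
pub struct Claims {
    pub user_id: String,
    pub roles: Vec<String>,
    pub permissions: Vec<String>,
}

#[derive(Debug, PartialEq)]
pub enum AuthError {
    AuthRequired,
    InsufficientPermissions { missing: String },
}

/// Stage 1 (middleware): reject requests that carry no valid token.
/// `validated` stands in for the outcome of signature and expiry checks.
pub fn authenticate(validated: Option<Claims>) -> Result<Claims, AuthError> {
    validated.ok_or(AuthError::AuthRequired)
}

/// Stage 2 (endpoint): check the permission this operation needs.
pub fn require_permission(claims: &Claims, permission: &str) -> Result<(), AuthError> {
    if claims.permissions.iter().any(|p| p.as_str() == permission) {
        Ok(())
    } else {
        Err(AuthError::InsufficientPermissions {
            missing: permission.to_string(),
        })
    }
}

/// Example handler body: the endpoint names its own permission, so the
/// error can say exactly what was missing.
pub fn create_role(validated: Option<Claims>, name: &str) -> Result<String, AuthError> {
    let claims = authenticate(validated)?;
    require_permission(&claims, "rbac:roles:create")?;
    Ok(format!("role '{}' created by {}", name, claims.user_id))
}
```

Keeping stage 2 in the handler is what lets the error name the missing permission, which a generic middleware could not do.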

### Token Types


1. **Access Token** (short-lived, 15-60 minutes)
   - Used for API requests
   - Contains user_id, roles, permissions
   - Cannot be revoked individually without server-side state (the short lifetime mitigates the risk)

2. **Refresh Token** (long-lived, days/weeks)
   - Used to obtain new access tokens
   - Can be revoked
   - Stored securely, not sent on every request

**Why two tokens**:

- Security: Minimize access token exposure
- Performance: Stateless access token validation
- Flexibility: A single refresh token can be revoked without affecting the user's other active sessions
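A client holding both tokens has to decide, before each request, whether the access token is still usable. The sketch below shows one way to make that decision. `TokenPair`, `NextStep`, and the clock-skew parameter are assumptions for illustration; times are plain Unix seconds supplied by the caller.

```rust
/// The two tokens returned at login, plus the access token's expiry
/// as Unix seconds (hypothetical client-side record).
#[derive(Debug, Clone)]
pub struct TokenPair {
    pub access_token: String,
    pub refresh_token: String,
    pub access_expires_at: u64,
}

#[derive(Debug, PartialEq)]
pub enum NextStep {
    /// The access token is still valid; send the request as is.
    UseAccessToken,
    /// The access token is expired or about to expire; refresh first.
    RefreshFirst,
}

impl TokenPair {
    /// Decides what to do at time `now`. `skew_secs` refreshes slightly
    /// early so a token does not expire while the request is in flight.
    pub fn next_step(&self, now: u64, skew_secs: u64) -> NextStep {
        if now.saturating_add(skew_secs) >= self.access_expires_at {
            NextStep::RefreshFirst
        } else {
            NextStep::UseAccessToken
        }
    }
}
```

The refresh token is only sent when `RefreshFirst` is returned, which keeps it off the wire for ordinary API requests.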

## Request/Response Patterns


### Pagination


For list endpoints:

```text
GET /api/v1/rbac/roles?page=2&per_page=50
```

**Response includes**:

```json
{
  "success": true,
  "data": {
    "items": [...],
    "total_count": 150,
    "page": 2,
    "per_page": 50,
    "total_pages": 3
  },
  "timestamp": "2025-09-30T12:00:00Z"
}
```

**Why page-based not cursor-based**:

- Simpler for most use cases
- Allows jumping to specific pages
- Total count is useful for UI

**Future consideration**: Cursor-based pagination for very large datasets.
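The arithmetic behind the page fields is small enough to show directly. This sketch assumes pages are numbered from 1, as the example response suggests; `PageInfo` and `page_info` are hypothetical names.

```rust
/// Derived values for one page of a page-based listing.
#[derive(Debug, PartialEq)]
pub struct PageInfo {
    pub total_pages: u64,
    /// Zero-based index of the first item on the page.
    pub first_index: u64,
    /// Zero-based index one past the last item on the page.
    pub end_index: u64,
}

/// Computes page bounds for a 1-based `page`. Returns `None` when
/// `page` or `per_page` is zero, since neither is meaningful.
pub fn page_info(total_count: u64, page: u64, per_page: u64) -> Option<PageInfo> {
    if page == 0 || per_page == 0 {
        return None;
    }
    // Ceiling division without risking overflow on large counts.
    let total_pages = total_count / per_page + u64::from(total_count % per_page != 0);
    // Clamp to the item count so out-of-range pages come back empty.
    let first_index = (page - 1).saturating_mul(per_page).min(total_count);
    let end_index = first_index.saturating_add(per_page).min(total_count);
    Some(PageInfo {
        total_pages,
        first_index,
        end_index,
    })
}
```

For the example above (150 items, page 2, 50 per page) this yields 3 total pages and items 50 through 99.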

### Filtering & Sorting


```text
GET /admin/users?role=admin&sort=created_at&order=desc
```

**Why query parameters**:

- RESTful convention
- Easy to construct in any HTTP client
- Clear separation from resource path
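Constructing such a URL needs nothing beyond string handling, as the following sketch shows. `build_url` and `percent_encode` are hypothetical helpers written for this example; a real client would more likely rely on its HTTP library's query support.

```rust
/// Percent-encodes everything except RFC 3986 unreserved characters.
fn percent_encode(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for byte in input.bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(byte as char)
            }
            other => out.push_str(&format!("%{:02X}", other)),
        }
    }
    out
}

/// Appends filter and sort parameters to a resource path.
/// Parameters are emitted in the order given.
pub fn build_url(path: &str, params: &[(&str, &str)]) -> String {
    if params.is_empty() {
        return path.to_string();
    }
    let query: Vec<String> = params
        .iter()
        .map(|(key, value)| format!("{}={}", percent_encode(key), percent_encode(value)))
        .collect();
    format!("{}?{}", path, query.join("&"))
}
```

Calling `build_url("/admin/users", &[("role", "admin"), ("sort", "created_at"), ("order", "desc")])` reproduces the request line shown above.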

### Timestamps


**Format**: ISO 8601 with timezone (RFC 3339)

```text
"2025-09-30T12:00:00Z"
```

**Why**:

- Unambiguous
- Sortable as strings when every value uses UTC (`Z`) and the same precision
- Parseable by virtually all date libraries
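The string-sorting property depends on every timestamp having the same shape. The sketch below checks for the exact `YYYY-MM-DDTHH:MM:SSZ` form used in this document; the function name is hypothetical, and it validates shape only, not whether the date itself is valid (month 13 would pass).

```rust
/// Returns true if `s` has the exact shape `YYYY-MM-DDTHH:MM:SSZ`.
/// Shape only: field ranges (month, day, hour) are not checked.
pub fn is_utc_second_timestamp(s: &str) -> bool {
    let bytes = s.as_bytes();
    if bytes.len() != 20 {
        return false;
    }
    bytes.iter().enumerate().all(|(i, &c)| match i {
        4 | 7 => c == b'-',
        10 => c == b'T',
        13 | 16 => c == b':',
        19 => c == b'Z',
        _ => c.is_ascii_digit(),
    })
}
```

When every value passes this check, a plain lexicographic sort of the strings is also a chronological sort, so no date parsing is needed for ordering.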

### IDs


**Format**: UUID v4 or sequential strings

```text
"123e4567-e89b-12d3-a456-426614174000"
```

**Why UUIDs**:

- Globally unique without coordination
- Random (v4) IDs resist enumeration attacks; sequential strings do not
- Can be generated client-side

**Tradeoff**: Larger than integers, but security and distribution benefits win.

## Error Handling


### Error Code Structure


```json
{
  "success": false,
  "error": {
    "code": "INVALID_CREDENTIALS",
    "message": "The provided credentials are invalid",
    "details": {
      "field": "password",
      "reason": "incorrect"
    }
  },
  "timestamp": "2025-09-30T12:00:00Z"
}
```

### Error Code Categories


**Authentication Errors** (`AUTH_*`)

- `AUTH_REQUIRED` - Authentication required
- `INVALID_CREDENTIALS` - Bad username/password
- `INVALID_TOKEN` - Token validation failed
- `TOKEN_EXPIRED` - Token has expired
- `MFA_REQUIRED` - MFA verification needed

**Authorization Errors** (`AUTHZ_*`)

- `INSUFFICIENT_PERMISSIONS` - Missing required permission
- `ROLE_REQUIRED` - Missing required role
- `FORBIDDEN` - Operation not allowed

**Validation Errors** (`VALIDATION_*`)

- `INVALID_INPUT` - Request validation failed
- `MISSING_REQUIRED_FIELD` - Required field not provided
- `INVALID_FORMAT` - Field format incorrect

**Resource Errors** (`RESOURCE_*`)

- `NOT_FOUND` - Resource doesn't exist
- `ALREADY_EXISTS` - Duplicate resource
- `CONFLICT` - Resource state conflict

**Rate Limiting** (`RATE_*`)

- `RATE_LIMIT_EXCEEDED` - Too many requests

**System Errors** (`SYSTEM_*`)

- `INTERNAL_ERROR` - Server error
- `SERVICE_UNAVAILABLE` - Service temporarily unavailable
- `DATABASE_ERROR` - Database operation failed

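**Why structured codes**:

- Machine-readable for client logic
- Categorized for easier handling
- Human-readable messages for debugging

**Note**: The prefixes in parentheses name the categories. Most codes listed above (for example `INVALID_CREDENTIALS` and `NOT_FOUND`) do not literally carry their category prefix, so clients should match on the full code rather than on a prefix.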
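An SDK could expose these codes as an enum so client logic matches on variants instead of strings. The sketch below is one possible shape: the `ErrorCode` type and its methods are hypothetical, and the mapping from each code to an HTTP status is an assumption inferred from the status code list earlier in this document, not a documented contract.

```rust
/// Error codes listed in this document, as a closed set.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ErrorCode {
    AuthRequired, InvalidCredentials, InvalidToken, TokenExpired, MfaRequired,
    InsufficientPermissions, RoleRequired, Forbidden,
    InvalidInput, MissingRequiredField, InvalidFormat,
    NotFound, AlreadyExists, Conflict,
    RateLimitExceeded,
    InternalError, ServiceUnavailable, DatabaseError,
}

impl ErrorCode {
    /// Parses the wire value of `error.code`; unknown codes yield `None`.
    pub fn parse(code: &str) -> Option<Self> {
        Some(match code {
            "AUTH_REQUIRED" => ErrorCode::AuthRequired,
            "INVALID_CREDENTIALS" => ErrorCode::InvalidCredentials,
            "INVALID_TOKEN" => ErrorCode::InvalidToken,
            "TOKEN_EXPIRED" => ErrorCode::TokenExpired,
            "MFA_REQUIRED" => ErrorCode::MfaRequired,
            "INSUFFICIENT_PERMISSIONS" => ErrorCode::InsufficientPermissions,
            "ROLE_REQUIRED" => ErrorCode::RoleRequired,
            "FORBIDDEN" => ErrorCode::Forbidden,
            "INVALID_INPUT" => ErrorCode::InvalidInput,
            "MISSING_REQUIRED_FIELD" => ErrorCode::MissingRequiredField,
            "INVALID_FORMAT" => ErrorCode::InvalidFormat,
            "NOT_FOUND" => ErrorCode::NotFound,
            "ALREADY_EXISTS" => ErrorCode::AlreadyExists,
            "CONFLICT" => ErrorCode::Conflict,
            "RATE_LIMIT_EXCEEDED" => ErrorCode::RateLimitExceeded,
            "INTERNAL_ERROR" => ErrorCode::InternalError,
            "SERVICE_UNAVAILABLE" => ErrorCode::ServiceUnavailable,
            "DATABASE_ERROR" => ErrorCode::DatabaseError,
            _ => return None,
        })
    }

    /// Assumed HTTP status for each code (not specified by the source).
    pub fn http_status(self) -> u16 {
        match self {
            ErrorCode::InvalidInput
            | ErrorCode::MissingRequiredField
            | ErrorCode::InvalidFormat => 400,
            ErrorCode::AuthRequired
            | ErrorCode::InvalidCredentials
            | ErrorCode::InvalidToken
            | ErrorCode::TokenExpired
            | ErrorCode::MfaRequired => 401,
            ErrorCode::InsufficientPermissions
            | ErrorCode::RoleRequired
            | ErrorCode::Forbidden => 403,
            ErrorCode::NotFound => 404,
            ErrorCode::AlreadyExists | ErrorCode::Conflict => 409,
            ErrorCode::RateLimitExceeded => 429,
            ErrorCode::InternalError | ErrorCode::DatabaseError => 500,
            ErrorCode::ServiceUnavailable => 503,
        }
    }
}
```

Returning `None` for unknown codes lets older clients degrade gracefully when the server introduces a code they do not know yet.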

### Details Field


Optional `details` object provides additional context:

```json
"details": {
  "field": "email",
  "reason": "invalid_format",
  "expected": "user@example.com"
}
```

**When to include details**:

- Validation errors (which field failed)
- Complex errors (multiple issues)
- Debug information (development mode)

**When to omit details**:

- Security-sensitive errors (don't leak info)
- Simple errors (message is enough)

## Rate Limiting


### Strategy


**Per-endpoint rate limits**:

- `/auth/login`: 5 requests/minute per IP
- Standard endpoints: 100 requests/minute per user
- Admin endpoints: 50 requests/minute per user

### Headers


```text
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 95
X-RateLimit-Reset: 1633024800
```

**Why per-endpoint**:

- Authentication endpoints need stricter limits (brute force)
- Read operations can be more permissive than writes
- Admin operations need moderate protection

**Why these numbers**:

- Login: 5/min prevents brute force while allowing retries
- Standard: 100/min = ~1.7/sec, enough for interactive use
- Admin: 50/min = safety margin for powerful operations
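To show how a limit turns into the three headers, here is a minimal fixed-window counter. The source does not say which algorithm AuthFramework uses, so the fixed-window choice and the names `FixedWindow` and `RateDecision` are assumptions for illustration. Time is passed in as Unix seconds to keep the sketch deterministic.

```rust
/// Outcome of one rate-limit check; the last three fields map onto
/// `X-RateLimit-Limit`, `X-RateLimit-Remaining`, and `X-RateLimit-Reset`.
#[derive(Debug, PartialEq)]
pub struct RateDecision {
    pub allowed: bool,
    pub limit: u32,
    pub remaining: u32,
    pub reset_at: u64,
}

/// Fixed-window counter for a single key (one IP or one user).
pub struct FixedWindow {
    limit: u32,
    window_secs: u64,
    window_start: u64,
    used: u32,
}

impl FixedWindow {
    pub fn new(limit: u32, window_secs: u64) -> Self {
        FixedWindow { limit, window_secs, window_start: 0, used: 0 }
    }

    /// Records one request at time `now` (Unix seconds) and decides it.
    pub fn check(&mut self, now: u64) -> RateDecision {
        // Start a fresh window once the current one has elapsed.
        if now >= self.window_start.saturating_add(self.window_secs) {
            self.window_start = now;
            self.used = 0;
        }
        let allowed = self.used < self.limit;
        if allowed {
            self.used += 1;
        }
        RateDecision {
            allowed,
            limit: self.limit,
            remaining: self.limit - self.used,
            reset_at: self.window_start.saturating_add(self.window_secs),
        }
    }
}
```

With `FixedWindow::new(5, 60)` for `/auth/login`, the sixth attempt inside a minute is denied with `remaining: 0`, which corresponds to a `429 Too Many Requests` response.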

## Security Considerations


### CORS Policy


**Default**: Disabled in production

**Development**: Can be enabled via configuration

**Why disabled by default**:

- Most secure option
- Authentication APIs typically aren't accessed cross-origin
- Enabled only when explicitly needed

### CSRF Protection


**Not required for Bearer token authentication**:

- Tokens travel in the `Authorization` header
- Browsers don't attach that header automatically (unlike cookies)
- A forged cross-site request therefore arrives without credentials

**Would be required for**:

- Cookie-based authentication
- Form submissions

### HTTPS Enforcement


**Should always be enforced in production**:

- A bearer token sent over plain HTTP can be captured and replayed by anyone on the network path
- Many endpoints contain sensitive data

**Configuration**: Server should reject HTTP in production.

### Sensitive Data Handling


**Never return in responses**:

- Password hashes
- Raw secrets
- Internal system details
- Full stack traces (production)

**Audit logging**:

- All authentication attempts
- Permission checks
- Admin operations
- Sensitive data access

## API Versioning Strategy


### Current Approach


**Two-tier system**:

1. **Unversioned endpoints** - Stable within a major library version
   - `/auth/*`, `/oauth/*`, `/users/*`, etc.
   - Follow semantic versioning of library
   - Breaking changes require major version bump

2. **Versioned endpoints** - Complex features
   - `/api/v1/rbac/*`
   - Can evolve independently
   - `/api/v2/rbac/*` can coexist

### Why This Hybrid Approach?


**Unversioned for stable features**:

- Authentication patterns are well-established
- OAuth 2.0 is a stable spec
- Basic user operations rarely need breaking changes

**Versioned for complex features**:

- RBAC is complex and evolving
- Permission model might need significant changes
- Allows innovation without breaking existing integrations

### Version Deprecation Policy


When introducing breaking changes:

1. Release the new version (`/api/v2/*`)
2. Maintain the old version for a minimum of 6 months
3. Add deprecation warnings to the old version
4. Communicate the migration path
5. Remove the old version in the next major library release

## OpenAPI Specification


### Why OpenAPI 3.1?


**Benefits**:

- Machine-readable API definition
- Enables automatic client SDK generation
- Interactive documentation (Swagger UI)
- Validation tooling
- Industry standard

**Our Usage**:

- `openapi.yaml` is source of truth
- Used to generate documentation
- Used to validate requests/responses
- Used for SDK generation

### Structure


Our OpenAPI spec includes:

- Complete endpoint definitions
- Request/response schemas
- Authentication schemes
- Error responses
- Examples

**Maintained**: Updated with every API change (enforced in PR reviews).

## Identified Gaps & Improvements Needed


### 🔴 Critical Issues


1. **RBAC Endpoints Not Registered**
   - **Problem**: RBAC endpoints exist but aren't in the router
   - **Impact**: Unusable in current state
   - **Fix Required**: Add RBAC routes to `server.rs`

2. **Missing OIDC Endpoints**
   - **Problem**: OIDC provider exists but no REST endpoints
   - **Missing**: `/.well-known/openid-configuration`, `/userinfo`, `/jwks`
   - **Impact**: OIDC clients can't discover or use OIDC features

3. **No Token Exchange Endpoint**
   - **Problem**: Token exchange exists but no REST API
   - **Missing**: `POST /oauth/token-exchange` (RFC 8693, which itself defines token exchange as a grant type on the token endpoint)
   - **Impact**: Can't use token exchange from REST API

### 🟡 Medium Priority


1. **Inconsistent Documentation**
   - **Problem**: OpenAPI spec mentions endpoints not in code
   - **Impact**: Confusing for users, SDK generation fails
   - **Fix**: Audit OpenAPI against actual routes

2. **No Batch Operations**
   - **Problem**: Some operations need to happen in bulk
   - **Missing**: Batch user creation, batch permission checks
   - **Impact**: Performance issues for bulk operations

3. **Limited Filtering Options**
   - **Problem**: List endpoints have basic pagination only
   - **Missing**: Filtering by multiple fields, search
   - **Impact**: Clients must filter client-side

### 🟢 Nice to Have


1. **No WebSocket Support**
   - **Opportunity**: Real-time permission updates
   - **Use case**: Live session invalidation, role changes

2. **No GraphQL Alternative**
   - **Opportunity**: Complex queries with single request
   - **Use case**: Dashboard with user + roles + permissions

## Future API Evolution


### Planned Additions (v0.5.0)


1. **WebAuthn/Passkey Endpoints**
   - `POST /auth/webauthn/register`
   - `POST /auth/webauthn/authenticate`

2. **OIDC Discovery**
   - `GET /.well-known/openid-configuration`
   - `GET /.well-known/jwks.json`

3. **Token Exchange**
   - `POST /oauth/token-exchange`

4. **Enhanced Admin APIs**
   - Batch operations
   - Advanced filtering
   - Export/import capabilities

### Under Consideration


1. **Webhooks**
   - Notify external systems of events
   - User creation, role changes, auth failures

2. **API Keys Management**
   - `POST /api-keys` - Create API key
   - `GET /api-keys` - List keys
   - `DELETE /api-keys/{key_id}` - Revoke key

3. **Session Management API**
   - `GET /sessions` - List all sessions
   - `DELETE /sessions/{session_id}` - Kill session
   - `POST /sessions/{session_id}/extend` - Extend session

## SDK Implications


### What SDKs Should Provide


Based on this API design, language SDKs should:

1. **Type-Safe Models**
   - Every request/response as a typed struct
   - Generated from OpenAPI spec

2. **Automatic Token Management**
   - Store access + refresh tokens
   - Auto-refresh when access token expires
   - Handle token errors gracefully

3. **Retry Logic**
   - Retry on 429 (rate limit)
   - Exponential backoff
   - Configurable retry strategies

4. **Error Handling**
   - Translate error codes to exceptions/results
   - Provide error code enums
   - Include details when available

5. **Pagination Helpers**
   - Iterator pattern for list endpoints
   - Automatic page fetching
   - Lazy loading

6. **Builder Patterns**
   - Fluent APIs for constructing requests
   - Sensible defaults
   - Type-safe option passing
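The retry item above can be made concrete with a small policy type. This is a sketch under stated assumptions: `RetryPolicy` and its fields are hypothetical, only `429` is treated as retryable because that is the only status the list names, and jitter is left out for brevity.

```rust
/// Retry settings an SDK might expose (hypothetical type).
pub struct RetryPolicy {
    /// Total attempts allowed, including the first one.
    pub max_attempts: u32,
    pub base_delay_ms: u64,
    pub max_delay_ms: u64,
}

impl RetryPolicy {
    /// Delay before retry number `retry` (1-based): the base delay
    /// doubles on each retry and is capped at `max_delay_ms`.
    pub fn delay_ms(&self, retry: u32) -> u64 {
        // Cap the exponent so the shift cannot overflow a u64.
        let exponent = retry.saturating_sub(1).min(63);
        let factor = 1u64 << exponent;
        self.base_delay_ms
            .saturating_mul(factor)
            .min(self.max_delay_ms)
    }

    /// Given the last response status and how many attempts were made,
    /// returns how long to wait before retrying, or `None` to give up.
    pub fn next_delay(&self, status: u16, attempts_made: u32) -> Option<u64> {
        if status == 429 && attempts_made < self.max_attempts {
            Some(self.delay_ms(attempts_made))
        } else {
            None
        }
    }
}
```

A client that also receives `X-RateLimit-Reset` could wait until that time instead of using the computed delay; the sketch keeps the two concerns separate.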

### Rust SDK Specifics


The Rust SDK should be:

- **Zero-cost**: No runtime overhead vs manual reqwest calls
- **Type-safe**: Leverage Rust's type system fully
- **Async**: Tokio-based, `async fn` everywhere
- **Ergonomic**: Builder patterns, Result types
- **Well-documented**: Docs.rs compatible, examples

Example desired API:

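```rust
// Desired API sketch: `AuthFrameworkClient` and its methods are proposed,
// not an existing implementation.
async fn example() -> Result<(), Box<dyn std::error::Error>> {
    let client = AuthFrameworkClient::builder()
        .base_url("https://auth.example.com")
        .build()?;

    // Automatic token management: the client stores and refreshes tokens
    let _tokens = client.login("user@example.com", "password").await?;

    // Type-safe permission checking
    let _allowed = client
        .check_permission("read", "documents/123")
        .await?;

    // Pagination helper: `all()` fetches every page and collects the results
    let _users = client.admin().list_users()
        .page_size(50)
        .all()
        .await?;

    Ok(())
}
```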

## Conclusion


This API design prioritizes:

1. **Compatibility**: Works everywhere, standard patterns
2. **Security**: Defense in depth, secure by default
3. **Consistency**: Predictable structure, clear conventions
4. **Evolvability**: Versioning strategy for long-term maintenance
5. **Developer Experience**: Intuitive paths, good errors, complete docs

Every decision is deliberate, balancing:

- Simplicity vs Flexibility
- Security vs Usability  
- Standards vs Innovation
- Present needs vs Future growth

This document should be updated whenever API design decisions are made, ensuring we never lose the reasoning behind our choices.

---

**Document Status**: Living document, updated with each API change  
**Last Updated**: 2025-09-30  
**Next Review**: When adding v0.5.0 features