orion-error 0.6.3

# Error Consolidation Mechanism

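> Historical design note:
> This page mainly preserves early design sketches for error consolidation and cross-layer propagation. It is not a direct usage manual for `orion-error 0.6.x / V1 API`.
> Many of the `StructReason`, `UvsReason::from_*`, hand-written chain models, and custom wrapper examples on this page do not correspond to public interfaces in the current source.
> For the current V1 main path, rely on `into_as(...)` / `wrap_as(...)` / `err_conv()` / `with_std_source(...)` / `with_struct_source(...)`; see the [Tutorial](../tutorial.md) and the [V1 Fix and Review Baseline](../v1-fix-and-review-plan.md).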

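## Mapping to the Current V1 API

If your goal is to consolidate errors across layers with the current crate, read the material through the following mapping first:

- A plain `StdError` entering the structured system for the first time: `into_as(...)`
- Already a `StructError<_>`, only the reason needs mapping: `err_conv()`
- Already a `StructError<_>`, and the upper layer establishes a new semantic boundary: `wrap_as(...)`
- A plain lower-level source: `with_std_source(...)`
- A structured lower-level source: `with_struct_source(...)`

Many of the types and code blocks in the body below are historical design sketches only and cannot be copied directly as a current implementation.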

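## Overview

Error consolidation is a core part of the error handling system. It collects, converts, and manages errors from different layers and sources in a uniform way. Effective consolidation keeps key context from being lost as errors travel between layers, while keeping error types consistent and traceable.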

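## Cross-Layer Error Conversion

### Conversion Principles

Cross-layer error conversion follows these core principles:

1. **Preserve semantics**: keep the original meaning of the error through the conversion
2. **Keep information complete**: retain all relevant error context
3. **Map between layers**: map lower-level errors to a suitable business-layer error type
4. **Track the chain**: maintain the complete error conversion chain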
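
The principles above can be sketched with the standard library alone. The block below is a minimal illustration, not the `orion-error` API: `StoreError`, `OrderKind`, and `OrderError` are hypothetical types invented for this example. It shows a `From` conversion that picks a business-layer category (layer mapping) and keeps the lower-level error reachable through `std::error::Error::source` (complete information and chain tracking).

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical storage-layer error (not an orion-error type).
#[derive(Debug)]
pub enum StoreError {
    ConnectionFailed,
    ConstraintViolation { table: String },
}

impl fmt::Display for StoreError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            StoreError::ConnectionFailed => write!(f, "connection failed"),
            StoreError::ConstraintViolation { table } => {
                write!(f, "constraint violation in table {table}")
            }
        }
    }
}

impl Error for StoreError {}

/// Hypothetical business-layer category.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OrderKind {
    DependencyUnavailable,
    DataConflict,
}

/// Business-layer error that keeps the lower-level error as its source.
#[derive(Debug)]
pub struct OrderError {
    pub kind: OrderKind,
    source: StoreError,
}

impl fmt::Display for OrderError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self.kind {
            OrderKind::DependencyUnavailable => write!(f, "order store unavailable"),
            OrderKind::DataConflict => write!(f, "order data conflict"),
        }
    }
}

impl Error for OrderError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.source)
    }
}

impl From<StoreError> for OrderError {
    fn from(source: StoreError) -> Self {
        // Layer mapping: choose the business category from the storage error.
        let kind = match &source {
            StoreError::ConnectionFailed => OrderKind::DependencyUnavailable,
            StoreError::ConstraintViolation { .. } => OrderKind::DataConflict,
        };
        // Complete information: the original error travels along as `source`.
        OrderError { kind, source }
    }
}
```

With this shape, `let err: OrderError = store_error.into();` yields a business-level message from `err.to_string()` while `err.source()` still exposes the storage-level detail.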

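### Conversion Patterns

The following examples are conversion sketches from the historical design draft and do not correspond to the stable interface of the current `orion-error 0.6.x / V1 API`: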

#### Standard Conversion Pattern
```rust
// Convert a low-level storage error into a business-layer error
impl From<StoreReason> for StructReason<OrderReason> {
    fn from(value: StoreReason) -> Self {
        match value {
            // Storage full -> system error
            StoreReason::StorageFull => 
                StructReason::from(UvsReason::from_sys("storage full")),
            
            // Connection failed -> dependency error
            StoreReason::ConnectionFailed => 
                StructReason::from(UvsReason::from_dep("database unavailable")),
            
            // Constraint conflict -> business error
            StoreReason::ConstraintViolation => 
                StructReason::from(UvsReason::from_biz("data conflict")),
            
            // Permission error -> business error
            StoreReason::PermissionDenied => 
                StructReason::from(UvsReason::from_biz("access denied")),
            
            // Unknown error -> system error
            StoreReason::Unknown(error) => 
                StructReason::from(UvsReason::from_sys(&format!("unknown error: {}", error))),
        }
    }
}
```

#### Conversion with Context
```rust
use std::time::Duration;

// Error conversion that carries context information
impl From<DatabaseError> for OrderError {
    fn from(db_error: DatabaseError) -> Self {
        // Match by reference so `db_error` can still be attached as the source below.
        let order_error = match &db_error {
            DatabaseError::ConnectionFailed => {
                OrderError::ServiceUnavailable {
                    service: "database".to_string(),
                    message: "unable to connect to the database".to_string(),
                    retry_after: Some(Duration::from_secs(30)),
                }
            },
            DatabaseError::Timeout => {
                OrderError::Timeout {
                    operation: "database_query".to_string(),
                    timeout_ms: 5000,
                }
            },
            DatabaseError::ConstraintViolation { constraint, table } => {
                OrderError::ValidationFailed {
                    field: table.clone(),
                    message: format!("constraint violated: {} in table {}", constraint, table),
                }
            },
            DatabaseError::RecordNotFound { table, id } => {
                OrderError::NotFound {
                    resource: format!("{}:{}", table, id),
                }
            },
        };
        
        // Attach the original error
        order_error.with_source_error(db_error)
    }
}
```

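### Conversion Chain Management

The `ErrorChain` / `ConversionStep` types in this section illustrate a historical governance model. They are not public types provided by the current crate.

#### Error Conversion Chain Definition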
```rust
use std::collections::HashMap;
use std::time::SystemTime;

#[derive(Debug, Clone)]
pub struct ErrorChain {
    pub original_error: String,
    pub conversion_steps: Vec<ConversionStep>,
    pub final_error: String,
}

#[derive(Debug, Clone)]
pub struct ConversionStep {
    pub from_type: String,
    pub to_type: String,
    pub timestamp: SystemTime,
    pub context: HashMap<String, String>,
}

impl ErrorChain {
    pub fn new(original_error: String) -> Self {
        Self {
            original_error,
            conversion_steps: Vec::new(),
            final_error: String::new(),
        }
    }
    
    pub fn add_conversion(&mut self, from_type: String, to_type: String, context: HashMap<String, String>) {
        self.conversion_steps.push(ConversionStep {
            from_type,
            to_type,
            timestamp: SystemTime::now(),
            context,
        });
    }
    
    pub fn set_final_error(&mut self, final_error: String) {
        self.final_error = final_error;
    }
}
```

#### Conversion Chain Usage Example
```rust
pub fn process_order_with_chain(order: Order) -> Result<OrderResponse, OrderError> {
    let mut error_chain = ErrorChain::new("start processing order".to_string());
    
    match save_order_to_database(&order) {
        Ok(saved_order) => {
            error_chain.add_conversion(
                "DatabaseOperation".to_string(),
                "OrderSaved".to_string(),
                HashMap::new(),
            );
            // Continue processing... (the success path, which must produce
            // the `Ok(OrderResponse)`, is elided in the original sketch)
        },
        Err(db_error) => {
            error_chain.add_conversion(
                "DatabaseError".to_string(),
                "OrderError".to_string(),
                HashMap::from([
                    ("order_id".to_string(), order.id.to_string()),
                    ("operation".to_string(), "save_order".to_string()),
                ]),
            );
            
            let order_error: OrderError = db_error.into();
            error_chain.set_final_error(format!("{:?}", order_error));
            
            return Err(order_error.with_error_chain(error_chain));
        }
    }
}
```
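
As a point of comparison with the hand-maintained chain record above, the standard library already exposes a conversion chain through `std::error::Error::source`. The sketch below assumes a hypothetical `LayerError` type (not part of `orion-error`) and a hypothetical helper `chain_messages` that walks the `source()` links and collects each layer's message, outermost first.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical layered error used only for this illustration.
#[derive(Debug)]
pub struct LayerError {
    message: String,
    source: Option<Box<dyn Error + 'static>>,
}

impl LayerError {
    /// An error with no underlying cause.
    pub fn root(message: &str) -> Self {
        Self { message: message.to_string(), source: None }
    }

    /// An error that wraps a lower-level cause.
    pub fn wrap(message: &str, source: impl Error + 'static) -> Self {
        Self { message: message.to_string(), source: Some(Box::new(source)) }
    }
}

impl fmt::Display for LayerError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.message)
    }
}

impl Error for LayerError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        self.source.as_deref()
    }
}

/// Collects the message of `err` and of every error reachable through
/// `source()`, outermost first.
pub fn chain_messages(err: &(dyn Error + 'static)) -> Vec<String> {
    let mut messages = Vec::new();
    let mut current: Option<&(dyn Error + 'static)> = Some(err);
    while let Some(e) = current {
        messages.push(e.to_string());
        current = e.source();
    }
    messages
}
```

Unlike `ErrorChain`, this approach records no timestamps or per-step key/value context; it only recovers the layering that the error values themselves carry.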

## Preserving Error Context

### Context Types

#### Request Context
```rust
#[derive(Debug, Clone)]
pub struct RequestContext {
    pub request_id: String,
    pub user_id: Option<String>,
    pub client_ip: String,
    pub user_agent: String,
    pub timestamp: SystemTime,
    pub trace_id: String,
}

impl RequestContext {
    pub fn new() -> Self {
        Self {
            request_id: generate_request_id(),
            user_id: None,
            client_ip: String::new(),
            user_agent: String::new(),
            timestamp: SystemTime::now(),
            trace_id: generate_trace_id(),
        }
    }
    
    pub fn with_user_id(mut self, user_id: String) -> Self {
        self.user_id = Some(user_id);
        self
    }
    
    pub fn with_client_info(mut self, client_ip: String, user_agent: String) -> Self {
        self.client_ip = client_ip;
        self.user_agent = user_agent;
        self
    }
}
```

#### Business Context
```rust
#[derive(Debug, Clone)]
pub struct BusinessContext {
    pub operation: String,
    pub entity_type: String,
    pub entity_id: Option<String>,
    pub business_rules: Vec<String>,
    pub custom_fields: HashMap<String, String>,
}

impl BusinessContext {
    pub fn new(operation: String, entity_type: String) -> Self {
        Self {
            operation,
            entity_type,
            entity_id: None,
            business_rules: Vec::new(),
            custom_fields: HashMap::new(),
        }
    }
    
    pub fn with_entity_id(mut self, entity_id: String) -> Self {
        self.entity_id = Some(entity_id);
        self
    }
    
    pub fn add_business_rule(mut self, rule: String) -> Self {
        self.business_rules.push(rule);
        self
    }
    
    pub fn add_custom_field(mut self, key: String, value: String) -> Self {
        self.custom_fields.insert(key, value);
        self
    }
}
```

### Context Manager

#### WithContext Implementation
```rust
pub struct WithContext {
    pub request_context: RequestContext,
    pub business_context: BusinessContext,
    pub error_context: Vec<ContextFrame>,
}

#[derive(Debug, Clone)]
pub struct ContextFrame {
    pub location: String,
    pub timestamp: SystemTime,
    pub data: HashMap<String, String>,
}

impl WithContext {
    pub fn want(operation: &str) -> Self {
        Self {
            request_context: RequestContext::new(),
            business_context: BusinessContext::new(operation.to_string(), "unknown".to_string()),
            error_context: Vec::new(),
        }
    }
    
    pub fn with(mut self, context_data: &str) -> Self {
        let frame = ContextFrame {
            location: "user_provided".to_string(),
            timestamp: SystemTime::now(),
            data: HashMap::from([
                ("context".to_string(), context_data.to_string()),
            ]),
        };
        self.error_context.push(frame);
        self
    }
    
    pub fn with_request_context(mut self, request_context: RequestContext) -> Self {
        self.request_context = request_context;
        self
    }
    
    pub fn with_business_context(mut self, business_context: BusinessContext) -> Self {
        self.business_context = business_context;
        self
    }
    
    pub fn add_frame(mut self, location: String, data: HashMap<String, String>) -> Self {
        let frame = ContextFrame {
            location,
            timestamp: SystemTime::now(),
            data,
        };
        self.error_context.push(frame);
        self
    }
}
```

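#### Context Usage Example

The following example is part of the historical design note and explains the older context-aggregation style.

For `0.6.x / V1 API`:

- `WithContext::want(...)` / chained `.want(...)` is no longer the recommended main path
- `.owe_*()` is no longer the preferred way for a plain error to enter the structured system for the first time
- Prefer the main-path guidance for `into_as(...)` / `wrap_as(...)` / `doing(...)`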

```rust
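fn place_order() -> Result<Order> {
    // Create the error context. `with` consumes and returns the builder,
    // so its result has to be kept rather than discarded.
    let ctx = WithContext::want("place_order").with(order_txt);
    
    // Parse the order and bind the context
    parse_order()
        .want("parse order")
        .with(&ctx)  // bind the context
        .owe_biz()   // already a `Result`, so no trailing `?` on the tail expression
}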

impl<T, E> ResultExt<T, E> for Result<T, E> {
    fn want(self, description: &str) -> Result<T, ContextualError<E>> {
        self.map_err(|e| ContextualError::new(e, description))
    }
    
    fn with(mut self, ctx: &WithContext) -> Result<T, ContextualError<E>> {
        if let Err(ref mut contextual_error) = self {
            contextual_error.request_context = ctx.request_context.clone();
            contextual_error.business_context = ctx.business_context.clone();
            contextual_error.context_frames.extend(ctx.error_context.clone());
        }
        self
    }
    
    fn owe_biz(self) -> Result<T, BusinessError> {
        self.map_err(|e| BusinessError::from_contextual_error(e))
    }
}
```

### Context Serialization

#### JSON Serialization
```rust
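// `serialize_field` and `end` come from the `SerializeStruct` trait.
use serde::ser::SerializeStruct;

impl<E: serde::Serialize> serde::Serialize for ContextualError<E> {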
    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
    where
        S: serde::Serializer,
    {
        let mut state = serializer.serialize_struct("ContextualError", 6)?;
        
        state.serialize_field("error", &self.error)?;
        state.serialize_field("description", &self.description)?;
        state.serialize_field("timestamp", &self.timestamp)?;
        state.serialize_field("request_context", &self.request_context)?;
        state.serialize_field("business_context", &self.business_context)?;
        state.serialize_field("context_frames", &self.context_frames)?;
        
        state.end()
    }
}

// Serialization example
fn log_error_with_context<E: std::fmt::Debug + serde::Serialize>(error: &ContextualError<E>) -> String {
    serde_json::to_string_pretty(error).unwrap_or_else(|_| {
        format!("Error serializing contextual error: {:?}", error)
    })
}
```

#### Log Formatting
```rust
impl<E: std::fmt::Debug> std::fmt::Display for ContextualError<E> {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        // `E` is only bound by `Debug`, so the inner error is printed with `{:?}`.
        writeln!(f, "Error: {} - {:?}", self.description, self.error)?;
        // Assuming timestamps are `SystemTime` (as in `ContextFrame`), which has
        // no `Display` impl, they are printed with `{:?}`.
        writeln!(f, "Timestamp: {:?}", self.timestamp)?;
        writeln!(f, "Request ID: {}", self.request_context.request_id)?;
        writeln!(f, "Trace ID: {}", self.request_context.trace_id)?;
        writeln!(f, "Operation: {}", self.business_context.operation)?;
        
        if let Some(entity_id) = &self.business_context.entity_id {
            writeln!(f, "Entity ID: {}", entity_id)?;
        }
        
        if !self.context_frames.is_empty() {
            writeln!(f, "Context Frames:")?;
            for (index, frame) in self.context_frames.iter().enumerate() {
                writeln!(f, "  [{}]: {} at {:?}", index, frame.location, frame.timestamp)?;
                for (key, value) in &frame.data {
                    writeln!(f, "    {}: {}", key, value)?;
                }
            }
        }
        
        Ok(())
    }
}
```

## Error Aggregation Strategies

### Aggregation Dimensions

#### Aggregation by Time
```rust
pub struct TimeBasedAggregator {
    buckets: HashMap<SystemTime, ErrorBucket>,
    bucket_duration: Duration,
    max_buckets: usize,
}

impl TimeBasedAggregator {
    pub fn new(bucket_duration: Duration, max_buckets: usize) -> Self {
        Self {
            buckets: HashMap::new(),
            bucket_duration,
            max_buckets,
        }
    }
    
    pub fn add_error(&mut self, error: &dyn std::fmt::Debug) {
        let now = SystemTime::now();
        let bucket_time = self.get_bucket_time(now);
        
        let bucket = self.buckets.entry(bucket_time).or_insert_with(|| ErrorBucket::new(bucket_time));
        bucket.add_error(error);
        
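        // Evict expired buckets
        self.cleanup_old_buckets(now);
    }
    
    fn get_bucket_time(&self, time: SystemTime) -> SystemTime {
        // Clamp to at least one second so a sub-second bucket duration cannot divide by zero.
        let bucket_secs = self.bucket_duration.as_secs().max(1);
        // A clock set before the Unix epoch falls back to the epoch bucket instead of panicking.
        let since_epoch = time
            .duration_since(SystemTime::UNIX_EPOCH)
            .unwrap_or(Duration::ZERO);
        let bucket_index = since_epoch.as_secs() / bucket_secs;
        SystemTime::UNIX_EPOCH + Duration::from_secs(bucket_index * bucket_secs)
    }
    
    fn cleanup_old_buckets(&mut self, current_time: SystemTime) {
        let window = Duration::from_secs(
            self.bucket_duration.as_secs().saturating_mul(self.max_buckets as u64),
        );
        // `checked_sub` avoids the panic that `SystemTime - Duration` may raise on underflow.
        if let Some(cutoff_time) = current_time.checked_sub(window) {
            self.buckets.retain(|&time, _| time >= cutoff_time);
        }
    }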
    
    pub fn get_error_summary(&self) -> Vec<ErrorSummary> {
        self.buckets.values().map(|bucket| bucket.summary()).collect()
    }
}
```
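
The bucket arithmetic used by `get_bucket_time` and `cleanup_old_buckets` can be checked in isolation on plain integer seconds. The two helper functions below are hypothetical names introduced for this illustration; they mirror the floor-to-bucket and retention-window calculations without depending on `SystemTime`.

```rust
/// Floors a Unix timestamp (in seconds) to the start of its bucket.
/// Returns `None` when `bucket_secs` is zero, which would otherwise divide by zero.
pub fn bucket_start(timestamp_secs: u64, bucket_secs: u64) -> Option<u64> {
    if bucket_secs == 0 {
        return None;
    }
    Some(timestamp_secs / bucket_secs * bucket_secs)
}

/// Oldest timestamp still retained, given the current time, the bucket
/// width, and the maximum number of buckets. Saturates at zero instead of
/// underflowing.
pub fn retention_cutoff(now_secs: u64, bucket_secs: u64, max_buckets: u64) -> u64 {
    now_secs.saturating_sub(bucket_secs.saturating_mul(max_buckets))
}
```

For example, with 300-second buckets the timestamp `1_000_123` falls into the bucket starting at `999_900`, and with 12 retained buckets the cutoff is `1_000_123 - 3_600 = 996_523`.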

#### Aggregation by Type
```rust
pub struct TypeBasedAggregator {
    error_counts: HashMap<String, usize>,
    error_samples: HashMap<String, Vec<String>>,
    max_samples_per_type: usize,
}

impl TypeBasedAggregator {
    pub fn new(max_samples_per_type: usize) -> Self {
        Self {
            error_counts: HashMap::new(),
            error_samples: HashMap::new(),
            max_samples_per_type,
        }
    }
    
    pub fn add_error(&mut self, error_type: String, error_message: String) {
        // Increment the count
        *self.error_counts.entry(error_type.clone()).or_insert(0) += 1;
        
        // Record the sample
        let samples = self.error_samples.entry(error_type).or_default();
        samples.push(error_message);
        
        // Cap the number of samples, dropping the oldest first
        if samples.len() > self.max_samples_per_type {
            samples.remove(0);
        }
    }
    
    pub fn get_error_statistics(&self) -> Vec<ErrorStatistic> {
        self.error_counts.iter().map(|(error_type, count)| {
            let samples = self.error_samples.get(error_type).cloned().unwrap_or_default();
            ErrorStatistic {
                error_type: error_type.clone(),
                count: *count,
                recent_samples: samples,
            }
        }).collect()
    }
}
```
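
`samples.remove(0)` on a `Vec` shifts every remaining element, which is fine for small caps but grows linearly with `max_samples_per_type`. A possible alternative, sketched here under the assumption that only the most recent samples matter, is a bounded queue built on `std::collections::VecDeque`. `RecentSamples` is a hypothetical type for this illustration.

```rust
use std::collections::VecDeque;

/// Keeps at most `capacity` of the most recently pushed samples.
pub struct RecentSamples {
    capacity: usize,
    items: VecDeque<String>,
}

impl RecentSamples {
    pub fn new(capacity: usize) -> Self {
        Self { capacity, items: VecDeque::with_capacity(capacity) }
    }

    /// Adds a sample, evicting the oldest one when the queue is full.
    pub fn push(&mut self, sample: String) {
        if self.capacity == 0 {
            return; // a zero-capacity queue stores nothing
        }
        if self.items.len() == self.capacity {
            self.items.pop_front();
        }
        self.items.push_back(sample);
    }

    /// Returns the retained samples, oldest first.
    pub fn to_vec(&self) -> Vec<String> {
        self.items.iter().cloned().collect()
    }
}
```

Both `pop_front` and `push_back` are constant-time on `VecDeque`, so the cost per error no longer depends on the sample cap.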

### Aggregation Configuration

#### Aggregation Strategy Configuration
```rust
#[derive(Debug, Clone)]
pub struct AggregationConfig {
    pub time_window: Duration,
    pub max_errors_per_window: usize,
    pub sampling_rate: f64,
    pub aggregation_dimensions: Vec<AggregationDimension>,
    pub alert_thresholds: HashMap<String, usize>,
}

#[derive(Debug, Clone, PartialEq)]
pub enum AggregationDimension {
    ErrorType,
    ErrorLocation,
    ErrorSeverity,
    UserSegment,
    ServiceName,
}

impl Default for AggregationConfig {
    fn default() -> Self {
        Self {
            time_window: Duration::from_secs(300), // 5 minutes
            max_errors_per_window: 1000,
            sampling_rate: 0.1, // 10% sampling rate
            aggregation_dimensions: vec![
                AggregationDimension::ErrorType,
                AggregationDimension::ErrorLocation,
            ],
            alert_thresholds: HashMap::from([
                ("critical".to_string(), 10),
                ("high".to_string(), 50),
                ("medium".to_string(), 100),
            ]),
        }
    }
}
```
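
The configuration above declares a `sampling_rate` but does not show how it would be applied. One possible interpretation, sketched here as an assumption rather than as the original design, is a deterministic sampler that keeps an evenly spaced fraction of errors. `RateSampler` is a hypothetical type; it uses an integer accumulator in units of one-thousandth so that repeated calls do not accumulate floating-point error.

```rust
/// Deterministic sampler: keeps roughly `rate` of the events it sees,
/// evenly spaced, with a resolution of 0.1%.
pub struct RateSampler {
    keep_per_thousand: u32,
    accumulator: u32,
}

impl RateSampler {
    /// `rate` is clamped to `[0.0, 1.0]`; NaN is treated as 0.
    pub fn new(rate: f64) -> Self {
        let clamped = if rate.is_nan() { 0.0 } else { rate.clamp(0.0, 1.0) };
        Self {
            keep_per_thousand: (clamped * 1000.0).round() as u32,
            accumulator: 0,
        }
    }

    /// Returns `true` when the current event should be recorded.
    pub fn should_keep(&mut self) -> bool {
        self.accumulator += self.keep_per_thousand;
        if self.accumulator >= 1000 {
            self.accumulator -= 1000;
            true
        } else {
            false
        }
    }
}
```

With `RateSampler::new(0.1)`, every tenth call to `should_keep` returns `true`. A random sampler would avoid aliasing with periodic error patterns, at the cost of pulling in a random number source.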

## Monitoring and Alerting

### Aggregation Metrics

#### Key Metric Definitions
```rust
#[derive(Debug, Clone)]
pub struct ErrorAggregationMetrics {
    pub total_errors: usize,
    pub unique_error_types: usize,
    pub error_rate_per_minute: f64,
    pub top_error_types: Vec<(String, usize)>,
    pub error_trend: TrendDirection,
    pub aggregation_efficiency: f64,
}

#[derive(Debug, Clone, PartialEq)]
pub enum TrendDirection {
    Increasing,
    Decreasing,
    Stable,
}

impl ErrorAggregationMetrics {
    pub fn calculate_from_buckets(buckets: &[ErrorBucket]) -> Self {
        let total_errors: usize = buckets.iter().map(|b| b.error_count).sum();
        let unique_types: HashSet<String> = buckets.iter()
            .flat_map(|b| b.error_types.keys())
            .cloned()
            .collect();
        
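        // Assumes one bucket per minute; scale by the bucket duration otherwise.
        let duration_minutes = buckets.len() as f64;
        let error_rate_per_minute = total_errors as f64 / duration_minutes.max(1.0);
        
        // Sum the counts per error type across buckets (collecting tuples into a
        // map would keep only the last bucket's count), then sort descending.
        let mut totals: HashMap<String, usize> = HashMap::new();
        for (error_type, count) in buckets.iter().flat_map(|b| b.error_types.iter()) {
            *totals.entry(error_type.clone()).or_insert(0) += *count;
        }
        let mut top_error_types: Vec<(String, usize)> = totals.into_iter().collect();
        top_error_types.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));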
        
        let trend = Self::calculate_trend(buckets);
        let efficiency = Self::calculate_efficiency(buckets);
        
        Self {
            total_errors,
            unique_error_types: unique_types.len(),
            error_rate_per_minute,
            top_error_types,
            error_trend: trend,
            aggregation_efficiency: efficiency,
        }
    }
    
    fn calculate_trend(buckets: &[ErrorBucket]) -> TrendDirection {
        if buckets.len() < 2 {
            return TrendDirection::Stable;
        }
        
        let recent_errors = buckets[buckets.len()-1].error_count;
        let previous_errors = buckets[buckets.len()-2].error_count;
        
        if recent_errors > previous_errors * 12 / 10 { // 20% increase
            TrendDirection::Increasing
        } else if recent_errors < previous_errors * 8 / 10 { // 20% decrease
            TrendDirection::Decreasing
        } else {
            TrendDirection::Stable
        }
    }
    
    fn calculate_efficiency(buckets: &[ErrorBucket]) -> f64 {
        if buckets.is_empty() {
            return 0.0;
        }
        
        let total_raw_errors: usize = buckets.iter().map(|b| b.raw_error_count).sum();
        let total_aggregated_errors: usize = buckets.iter().map(|b| b.error_count).sum();
        
        if total_raw_errors == 0 {
            return 1.0;
        }
        
        total_aggregated_errors as f64 / total_raw_errors as f64
    }
}
```
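
The thresholds in `calculate_trend` use integer arithmetic, so the boundary cases are worth spelling out. The standalone function below is a hypothetical restatement of the same comparison (with its own `Trend` enum so that it compiles on its own), intended only to make the boundaries easy to check.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Trend {
    Increasing,
    Decreasing,
    Stable,
}

/// Classifies the change between two consecutive bucket counts.
/// Strictly more than +20% is increasing, strictly less than -20% is
/// decreasing; the thresholds are truncated by integer division.
pub fn classify_trend(previous: usize, recent: usize) -> Trend {
    if recent > previous * 12 / 10 {
        Trend::Increasing
    } else if recent < previous * 8 / 10 {
        Trend::Decreasing
    } else {
        Trend::Stable
    }
}
```

A change of exactly 20% in either direction counts as stable (`100 -> 120` and `100 -> 80`). When `previous` is zero, any non-zero `recent` is reported as increasing. For small counts the truncation matters: with `previous = 4` the upper threshold is `4 * 12 / 10 = 4`, so `recent = 5` already counts as increasing.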

#### Alert Trigger Logic
```rust
pub struct AlertManager {
    config: AggregationConfig,
    active_alerts: HashMap<String, Alert>,
    notification_channels: Vec<Box<dyn NotificationChannel>>,
}

impl AlertManager {
    pub fn new(config: AggregationConfig) -> Self {
        Self {
            config,
            active_alerts: HashMap::new(),
            notification_channels: Vec::new(),
        }
    }
    
    pub fn evaluate_metrics(&mut self, metrics: &ErrorAggregationMetrics) -> Vec<Alert> {
        let mut new_alerts = Vec::new();
        
        // Check the error-rate alert
        if metrics.error_rate_per_minute > 100.0 {
            let alert = Alert::new(
                "HIGH_ERROR_RATE".to_string(),
                format!("Error rate too high: {:.2} errors/minute", metrics.error_rate_per_minute),
                AlertSeverity::High,
            );
            new_alerts.push(alert.clone());
            self.active_alerts.insert(alert.id.clone(), alert);
        }
        
        // Check the alert for growth in the number of error types
        if metrics.unique_error_types > 50 && metrics.error_trend == TrendDirection::Increasing {
            let alert = Alert::new(
                "ERROR_TYPE_PROLIFERATION".to_string(),
                format!("Number of error types is growing: {} distinct error types", metrics.unique_error_types),
                AlertSeverity::Medium,
            );
            new_alerts.push(alert.clone());
            self.active_alerts.insert(alert.id.clone(), alert);
        }
        
        // Check alerts for specific error types
        for (error_type, count) in &metrics.top_error_types {
            if let Some(&threshold) = self.config.alert_thresholds.get(error_type) {
                if *count > threshold {
                    let alert = Alert::new(
                        format!("{}_EXCEEDED_THRESHOLD", error_type.to_uppercase()),
                        format!("Error type '{}' exceeded its threshold: {} > {}", error_type, count, threshold),
                        AlertSeverity::High,
                    );
                    new_alerts.push(alert.clone());
                    self.active_alerts.insert(alert.id.clone(), alert);
                }
            }
        }
        
        // Send the new alerts
        for alert in &new_alerts {
            self.send_alert(alert);
        }
        
        new_alerts
    }
    
    fn send_alert(&self, alert: &Alert) {
        for channel in &self.notification_channels {
            if let Err(e) = channel.send(alert) {
                log::error!("Failed to send alert via channel: {:?}", e);
            }
        }
    }
}
```
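
As written, `evaluate_metrics` appears to create and send an alert on every evaluation in which a condition holds, so a sustained condition could notify repeatedly. One way to suppress repeats, sketched here as an assumption and not as part of the original design, is to send only when the alert key was not already active. `ActiveAlerts` is a hypothetical helper that tracks keys such as `"HIGH_ERROR_RATE"`.

```rust
use std::collections::HashSet;

/// Remembers which alert keys are currently active.
#[derive(Default)]
pub struct ActiveAlerts {
    keys: HashSet<String>,
}

impl ActiveAlerts {
    /// Returns `true` only when `key` was not already active, meaning a
    /// notification should be sent. Repeats return `false` until resolved.
    pub fn fire(&mut self, key: &str) -> bool {
        self.keys.insert(key.to_string())
    }

    /// Marks the alert as resolved so that a later occurrence notifies again.
    /// Returns `true` if the alert was active.
    pub fn resolve(&mut self, key: &str) -> bool {
        self.keys.remove(key)
    }
}
```

In the flow above, the call to `send_alert` would be guarded by `fire(...)`, and `resolve(...)` would be called when a later evaluation finds the condition no longer holds.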

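## Best Practices

### Error Conversion Best Practices

1. **Keep semantics consistent**: make sure the converted error type is accurate in business terms
2. **Avoid information loss**: retain all important error information during conversion
3. **Establish conversion mappings**: maintain an explicit mapping between error types
4. **Document conversion rules**: write detailed documentation for each conversion

### Context Management Best Practices

1. **Bind early**: attach context as close as possible to where the error occurs
2. **Layer the information**: organize context by importance and frequency of use
3. **Mind performance**: avoid storing large data structures in the context
4. **Protect privacy**: make sure sensitive information is handled appropriately as context propagates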
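
The privacy point can be illustrated with a small redaction step applied before context fields are logged or serialized. The function below is a minimal sketch; `redact_context`, the `[REDACTED]` marker, and the choice of case-insensitive key matching are assumptions made for this example.

```rust
use std::collections::HashMap;

/// Returns a copy of `fields` in which the values of sensitive keys are
/// replaced by a fixed marker. Keys are compared ASCII case-insensitively.
pub fn redact_context(
    fields: &HashMap<String, String>,
    sensitive_keys: &[&str],
) -> HashMap<String, String> {
    fields
        .iter()
        .map(|(key, value)| {
            let is_sensitive = sensitive_keys
                .iter()
                .any(|sensitive| sensitive.eq_ignore_ascii_case(key));
            let shown = if is_sensitive {
                "[REDACTED]".to_string()
            } else {
                value.clone()
            };
            (key.clone(), shown)
        })
        .collect()
}
```

An allow-list (keeping only known-safe keys) is stricter than this deny-list and may be preferable when context keys are not fully controlled.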

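### Aggregation Strategy Best Practices

1. **Size the window sensibly**: choose an aggregation time window that fits the business characteristics
2. **Adjust dynamically**: adapt the aggregation strategy to system load
3. **Optimize sampling**: use an appropriate sampling strategy under high concurrency
4. **Monitor in real time**: set up real-time monitoring and alerting on the aggregation metrics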

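## Related Documents

- [Error Classification](./01-error-classification.md) - how different kinds of errors are classified
- [Error Handling Strategies](./02-handling-strategies.md) - handling strategies for different kinds of errors
- [Error Logging Standards](./05-logging-standards.md) - standardized field formats for error logs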