aip-sci 0.1.0 - Docs.rs

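# AIP System Training Plan Design

This document describes the cold-start scheme, the real-time training scheme, and the training labeling scheme of the AIP (Affective Interaction Programming) system.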

---

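## 1. Cold-Start Scheme

### 1.1 Problem Definition

The cold-start problem arises when a new user first uses the system: with no personalized data for that user, the system cannot yet deliver a precisely tuned affective interaction. AIP faces cold start at two levels:

1. **EDM cold start**: no behavior-to-emotion mapping data exists for the user, so the new user's emotional state cannot be inferred accurately
2. **Director cold start**: no feedback data linking the user to interaction parameters exists, so an optimal interaction strategy cannot be generated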

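### 1.2 Cold-Start Strategy Architecture

```
┌────────────────────────────────────────────────────────────────────────────┐
│                          Cold-start decision flow                          │
├────────────────────────────────────────────────────────────────────────────┤
│                                                                            │
│  New user ──→ Profile collection ──→ Group matching ──→ Initial strategy   │
│                       │                    │                   │           │
│                       ↓                    ↓                   ↓           │
│                 Static traits        Similar users      Pretrained model   │
│                 (8-dim)              cluster centers    parameter transfer │
│                                                                            │
│  Calibration ──→ Behavior data ──→ Online learning ──→ Personalized model  │
│  (2 min)         (10 s windows)    (incremental)       (per user)          │
│                                                                            │
└────────────────────────────────────────────────────────────────────────────┘
```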

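### 1.3 EDM Cold-Start Scheme

#### 1.3.1 Pretrained General Model

**Strategy**: train a general EDM model on large-scale cross-user data and use it as the initial model for new users.

**Implementation**:
- Collect labeled data (behavior features + SAM ratings) from N ≥ 500 users
- Train the general EDM model until it reaches the target performance from the paper (Pearson r ≥ 0.55)
- Use the model weights as the initialization parameters for new users

**Data requirements**:
| Item | Requirement |
|--------|------|
| Number of users | ≥ 500 |
| Labels per user | ≥ 200 windows |
| Labeling frequency | One SAM rating every 3 minutes |
| Session length | ≥ 20 minutes |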

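#### 1.3.2 Trait-Guided Initialization

**Strategy**: use the user's static traits (8 dimensions) for parameter transfer.

**User trait vector**:
```rust
struct UserTraits {
    age_group: f32,           // age group [0-1]
    gender: f32,              // gender [0/1]
    self_reported_style: f32, // self-reported style [0-1]
    reaction_time_percentile: f32, // reaction-time percentile [0-1]
    gaming_experience: f32,   // gaming experience [0-1]
    preferred_pace: f32,      // preferred pace [0-1]
    challenge_preference: f32, // challenge preference [0-1]
    social_orientation: f32,  // social orientation [0-1]
}
```

**Initialization methods**:
1. **Cluster mapping**: map the user's traits onto predefined user-group clusters
2. **Parameter interpolation**: take a weighted interpolation of the model parameters at the cluster centers
3. **Transfer learning**: fine-tune the general model based on the user's traits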
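
The parameter-interpolation step can be sketched as below. This is a minimal illustration, not the crate's implementation: the `Cluster` type, the inverse-distance weighting, and the `eps` constant are all assumptions introduced here, and the document does not specify which weighting is actually used.

```rust
/// Number of static traits per user (the eight fields of `UserTraits`).
const TRAIT_DIM: usize = 8;

/// Hypothetical cluster record: a center in trait space plus the model
/// parameters learned for that user group.
pub struct Cluster {
    pub center: [f32; TRAIT_DIM],
    pub params: Vec<f32>,
}

/// Squared Euclidean distance between two trait vectors.
fn sq_dist(a: &[f32; TRAIT_DIM], b: &[f32; TRAIT_DIM]) -> f32 {
    a.iter().zip(b.iter()).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Blend the cluster parameters using inverse-distance weights (an assumed
/// weighting). Returns `None` when there are no clusters or when the
/// parameter vectors differ in length.
pub fn interpolate_params(
    traits: &[f32; TRAIT_DIM],
    clusters: &[Cluster],
) -> Option<Vec<f32>> {
    let first = clusters.first()?;
    let n = first.params.len();
    if clusters.iter().any(|c| c.params.len() != n) {
        return None;
    }
    // A small epsilon keeps the weight finite when the user sits on a center.
    let eps = 1e-6_f32;
    let weights: Vec<f32> = clusters
        .iter()
        .map(|c| 1.0 / (sq_dist(traits, &c.center) + eps))
        .collect();
    let total: f32 = weights.iter().sum();

    let mut out = vec![0.0_f32; n];
    for (cluster, w) in clusters.iter().zip(&weights) {
        for (o, p) in out.iter_mut().zip(&cluster.params) {
            *o += (w / total) * p;
        }
    }
    Some(out)
}
```

A user who is equally distant from two cluster centers receives the plain average of the two parameter vectors; a user close to one center receives almost exactly that cluster's parameters.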

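#### 1.3.3 Fast Calibration Mechanism

**Strategy**: use the first 2 minutes of interaction data for fast personalized calibration.

**Calibration flow**:
```
Timeline:
├── 0-30 s: behavior baseline capture (neutral interaction parameters)
├── 30-60 s: exploratory interaction (parameters at varying intensity)
├── 60-90 s: emotional response modeling
└── 90-120 s: model parameter fine-tuning
```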

**Calibration algorithm**:
```rust
fn calibrate_edm(
    base_model: &RogueliteEdm,
    calibration_data: &[CalibrationSample],
    _user_traits: &UserTraits, // currently unused in this sketch
) -> RogueliteEdm {
    // Fine-tune a copy so that the shared base model is never mutated.
    let mut model = base_model.clone();

    let learning_rate = 1e-3;
    let epochs = 10;

    // Several passes over the small calibration set (about 2 minutes of data).
    for _ in 0..epochs {
        for sample in calibration_data {
            let predicted = model.infer(&sample.features);
            let loss = compute_calibration_loss(&predicted, &sample.target);
            model.update(&loss, learning_rate);
        }
    }

    model
}
```

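### 1.4 Director Cold-Start Scheme

#### 1.4.1 Group Strategy Transfer

**Strategy**: cluster users into groups and transfer the interaction strategy of similar users.

**Group definitions** (based on pilot-study data):
| Group ID | Characteristics | Typical parameters |
|--------|------|----------|
| Explorer | Prefers low arousal, slow pace | intensity=0.73, pace=0.85 |
| Fighter | Prefers high arousal, fast pace | intensity=1.42, pace=1.35 |
| Social | Medium arousal, high feedback | feedback=1.2, rhythm=1.1 |
| Challenger | High sense of control, steep curve | challenge=0.8, scarcity=0.7 |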

#### 1.4.2 Conservative Exploration Strategy

**Strategy**: during cold start, use conservative interaction parameters to avoid extreme experiences.

**Parameter constraints**:
```rust
use std::ops::Range;

// Each comment gives the cold-start interval, then the normal interval.
struct ColdStartParams {
    intensity_factor: Range<f32>,     // [0.8, 1.2] (normal: [0.5, 2.0])
    feedback_intensity: Range<f32>,   // [0.7, 1.2] (normal: [0.3, 1.5])
    pace_speed: Range<f32>,           // [0.8, 1.2] (normal: [0.6, 1.8])
    reward_scarcity: Range<f32>,      // [0.3, 0.7] (normal: [0.0, 1.0])
    env_arousal: Range<f32>,          // [0.4, 0.7] (normal: [0.3, 1.0])
    rhythm_modulation: Range<f32>,    // [0.9, 1.2] (normal: [0.8, 1.5])
    challenge_curve: Range<f32>,      // [-0.3, 0.3] (normal: [-1.0, 1.0])
}
```
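
Some of the typical group parameters listed in 1.4.1 (for example intensity=1.42) fall outside the cold-start intervals above, so a transferred strategy presumably has to be clamped before use. The sketch below shows one way to do that for two of the seven parameters. `ColdStartBounds` and its method names are hypothetical, and `RangeInclusive` is used here (instead of `Range`) only because the intervals are written as closed.

```rust
use std::ops::RangeInclusive;

/// Hypothetical holder for two of the cold-start intervals listed above.
pub struct ColdStartBounds {
    pub intensity_factor: RangeInclusive<f32>,
    pub pace_speed: RangeInclusive<f32>,
}

impl ColdStartBounds {
    /// Intervals copied from the cold-start column of the constraints.
    pub fn from_doc() -> Self {
        Self {
            intensity_factor: 0.8..=1.2,
            pace_speed: 0.8..=1.2,
        }
    }

    /// Clamp a proposed (intensity, pace) pair into the cold-start intervals.
    pub fn clamp(&self, intensity: f32, pace: f32) -> (f32, f32) {
        (
            intensity.clamp(*self.intensity_factor.start(), *self.intensity_factor.end()),
            pace.clamp(*self.pace_speed.start(), *self.pace_speed.end()),
        )
    }
}
```

With these bounds, the Fighter group's `(1.42, 1.35)` is reduced to `(1.2, 1.2)`, while the Explorer group's `(0.73, 0.85)` becomes `(0.8, 0.85)`.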

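#### 1.4.3 Progressive Personalization

**Strategy**: as data accumulates, gradually widen the parameter space and increase the degree of personalization.

**Phases**:
| Phase | Data accumulated (session time) | Parameter space | Update frequency |
|------|--------|----------|----------|
| Cold start | 0-2 minutes | Conservative constraints | Real time |
| Adaptation | 2-10 minutes | Moderate constraints | Every 30 seconds |
| Personalization | 10-30 minutes | Normal space | Every 2 minutes |
| Stable | >30 minutes | Normal space | Every 5 minutes |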
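
The phase table can be read as a simple lookup from elapsed session time to phase and update interval. The sketch below is an illustration under assumptions: the `Phase` enum and function names are hypothetical, and because the table's ranges share their end points, a boundary value (for example exactly 2 minutes) is assigned to the later phase here.

```rust
use std::time::Duration;

/// Hypothetical enum mirroring the four rows of the phase table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Phase {
    ColdStart,
    Adaptation,
    Personalization,
    Stable,
}

/// Map elapsed session time to a phase (boundaries go to the later phase).
pub fn phase_for(elapsed: Duration) -> Phase {
    let minutes = elapsed.as_secs() / 60;
    match minutes {
        0..=1 => Phase::ColdStart,
        2..=9 => Phase::Adaptation,
        10..=29 => Phase::Personalization,
        _ => Phase::Stable,
    }
}

/// Update interval for a phase; `None` stands for "update in real time".
pub fn update_interval(phase: Phase) -> Option<Duration> {
    match phase {
        Phase::ColdStart => None,
        Phase::Adaptation => Some(Duration::from_secs(30)),
        Phase::Personalization => Some(Duration::from_secs(2 * 60)),
        Phase::Stable => Some(Duration::from_secs(5 * 60)),
    }
}
```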

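### 1.5 Cold-Start Evaluation Metrics

| Metric | Cold-start target | Stable-phase target |
|------|-------------|-----------|
| EDM prediction accuracy | Pearson r ≥ 0.45 | Pearson r ≥ 0.55 |
| User satisfaction | ≥ 3.5/5.0 | ≥ 4.0/5.0 |
| Retention (5 minutes) | ≥ 70% | ≥ 85% |
| Emotional stability | variance ≤ 0.15 | variance ≤ 0.10 |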
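
The EDM target is expressed as a Pearson correlation between predicted and self-reported values. For reference, a minimal sketch of that statistic follows; the function name is hypothetical, and whether the crate computes r per emotion dimension or pooled over all three is not stated in this document.

```rust
/// Pearson correlation coefficient between two equally long series.
/// Returns `None` for mismatched lengths, fewer than two points, or a
/// series with zero variance (where r is undefined).
pub fn pearson_r(x: &[f32], y: &[f32]) -> Option<f32> {
    if x.len() != y.len() || x.len() < 2 {
        return None;
    }
    let n = x.len() as f32;
    let mean_x = x.iter().sum::<f32>() / n;
    let mean_y = y.iter().sum::<f32>() / n;

    let mut cov = 0.0_f32;
    let mut var_x = 0.0_f32;
    let mut var_y = 0.0_f32;
    for (a, b) in x.iter().zip(y.iter()) {
        let dx = a - mean_x;
        let dy = b - mean_y;
        cov += dx * dy;
        var_x += dx * dx;
        var_y += dy * dy;
    }
    if var_x == 0.0 || var_y == 0.0 {
        return None;
    }
    Some(cov / (var_x.sqrt() * var_y.sqrt()))
}
```

A cold-start check would then compare the result against 0.45, and a stable-phase check against 0.55.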

---

## 2. Real-Time Training Scheme

### 2.1 System Architecture

```
┌──────────────────────────────────────────────────────────────────────────┐
│                     Real-time training architecture                      │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  ┌────────────────┐    ┌────────────────┐    ┌────────────────┐          │
│  │     Client     │    │   Edge node    │    │  Cloud server  │          │
│  │  (game side)   │───→│ (infer + cache)│───→│ (training hub) │          │
│  └────────────────┘    └────────────────┘    └────────────────┘          │
│           │                     │                     │                  │
│           ↓                     ↓                     ↓                  │
│  Behavior capture      Incremental inference Batch model training        │
│  Feature computation   Trajectory cache      Global model update         │
│  Local inference       Fast response         Model distribution          │
│                                                                          │
└──────────────────────────────────────────────────────────────────────────┘
```

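### 2.2 Data Flow Design

#### 2.2.1 Real-Time Data Pipeline

```
User input ──→ Feature extraction ──→ EDM inference ──→ Director decision ──→ Interaction params
    │              │                      │                 │                     │
    ↓              ↓                      ↓                 ↓                     ↓
Raw events     15-dim features        Emotion state     7-dim params          Game response
(touch/key)    (10 s window)          (VAD)             (continuous)          (immediate)
    │              │                      │                 │                     │
    └──────────────┴──────────────────────┴─────────────────┴─────────────────────┘
                                          │
                                          ↓
                                  Trajectory store ──→ Reward computation ──→ Training samples
```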

#### 2.2.2 Data Format Definitions

**Behavior event**:
```json
{
  "event_id": "uuid",
  "user_id": "user_123",
  "session_id": "session_456",
  "timestamp": 1709876543210,
  "event_type": "tap|swipe|pause|resume|fail|success",
  "position": {"x": 0.5, "y": 0.3},
  "duration_ms": 150,
  "pressure": 0.8,
  "device_motion": {"pitch": 0.1, "roll": -0.05}
}
```

**Trajectory data**:
```json
{
  "trajectory_id": "traj_789",
  "user_id": "user_123",
  "session_id": "session_456",
  "steps": [
    {
      "timestamp": 1709876543210,
      "state": {
        "user_traits": [...],
        "env_state": [...],
        "emotion": {"valence": 0.6, "arousal": 0.5, "dominance": 0.7},
        "emotion_stats": {...}
      },
      "action": {
        "intensity_factor": 1.0,
        "feedback_intensity": 0.9,
        ...
      },
      "reward": 0.75
    }
  ]
}
```

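### 2.3 Distributed Training Architecture

#### 2.3.1 Three-Layer Training Architecture

**Layer 1: Client inference**
- Model: lightweight EDM + Director
- Update: pull models from the edge/cloud
- Frequency: at the start of each session

**Layer 2: Edge-node incremental learning**
- Model: per-user fine-tuned model
- Update: online incremental learning
- Frequency: every 30 seconds to 2 minutes

**Layer 3: Cloud batch training**
- Model: global pretrained model
- Update: batch training + model fusion
- Frequency: daily/weekly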

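#### 2.3.2 Federated Learning Mechanism

```
┌────────────────────────────────────────────────────────────────┐
│                Federated learning training flow                │
├────────────────────────────────────────────────────────────────┤
│                                                                │
│  Cloud ──→ distribute global model ──→ edge nodes              │
│    │                                       │                   │
│    │                                local training             │
│    │                                  (user data)              │
│    │                                       │                   │
│    │                        upload gradients / model deltas    │
│    │                                       │                   │
│    ←───────────────────────────────────────┘                   │
│    │                                                           │
│  Aggregate ──→ new global model ──→ distribute                 │
│                                                                │
└────────────────────────────────────────────────────────────────┘
```

**Privacy protection**:
- Differential privacy: Gaussian noise is added to gradients
- Model deltas: only changes in model parameters are uploaded
- Data locality: raw behavior data is never uploaded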
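
The aggregation step in the flow above can be sketched as a sample-weighted average of clipped model deltas, in the style of federated averaging. This is an illustration only: `ClientUpdate`, `clip_l2`, and `aggregate` are hypothetical names, the clipping bound is an assumed parameter, and the Gaussian noise that differential privacy additionally requires is deliberately left out because the Rust standard library ships no random number generator.

```rust
/// Hypothetical upload from one edge node: a parameter delta and the number
/// of local samples it was trained on.
pub struct ClientUpdate {
    pub delta: Vec<f32>,
    pub num_samples: usize,
}

/// Scale `delta` in place so that its L2 norm does not exceed `max_norm`.
/// Clipping bounds each client's influence; a differentially private setup
/// would add calibrated Gaussian noise after this step (omitted here).
pub fn clip_l2(delta: &mut [f32], max_norm: f32) {
    let norm = delta.iter().map(|d| d * d).sum::<f32>().sqrt();
    if norm > max_norm && norm > 0.0 {
        let scale = max_norm / norm;
        for d in delta.iter_mut() {
            *d *= scale;
        }
    }
}

/// Apply the sample-weighted mean of the clipped deltas to `global`.
/// Updates whose length does not match `global` are skipped. Returns `false`
/// when no usable update was available.
pub fn aggregate(global: &mut [f32], updates: &mut [ClientUpdate], max_norm: f32) -> bool {
    let total: usize = updates
        .iter()
        .filter(|u| u.delta.len() == global.len())
        .map(|u| u.num_samples)
        .sum();
    if total == 0 {
        return false;
    }
    for u in updates.iter_mut() {
        if u.delta.len() != global.len() {
            continue;
        }
        clip_l2(&mut u.delta, max_norm);
        let weight = u.num_samples as f32 / total as f32;
        for (g, d) in global.iter_mut().zip(&u.delta) {
            *g += weight * d;
        }
    }
    true
}
```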

### 2.4 Online Learning Algorithms

#### 2.4.1 Online EDM Updates

**Incremental learning strategy**:
```rust
struct OnlineEdmUpdater {
    model: RogueliteEdm,
    buffer: VecDeque<TrainingSample>,
    buffer_size: usize,
    learning_rate: f32,
    update_interval: Duration,
}

impl OnlineEdmUpdater {
    fn add_sample(&mut self, sample: TrainingSample) {
        self.buffer.push_back(sample);
        if self.buffer.len() > self.buffer_size {
            self.buffer.pop_front();
        }
    }
    
    fn update(&mut self) -> Result<f32, EdmError> {
        if self.buffer.len() < MIN_SAMPLES {
            return Ok(0.0);
        }
        
        let dataset = TrainingDataset::new(self.buffer.iter().cloned().collect());
        let mut trainer = RogueliteEdmTrainer::from_model(&self.model)?;
        
        let result = trainer.train(&dataset)?;
        self.model = trainer.to_model();
        
        Ok(result.final_loss)
    }
}
```

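**Update triggers**:
| Condition | Triggers an update when |
|------|----------|
| Time interval | Every 2 minutes |
| Sample count | ≥ 50 new samples have accumulated |
| Prediction error | Error exceeds the threshold for 10 consecutive windows |
| User feedback | Explicit negative feedback is received |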
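
A sketch of how the four triggers might be checked together is shown below. The `UpdateState` and `Trigger` types are hypothetical, and the order in which the conditions are tested (feedback first, timer last) is an assumption made for the example; the table does not define a priority.

```rust
use std::time::Duration;

/// Hypothetical snapshot of the counters the trigger table refers to.
pub struct UpdateState {
    pub since_last_update: Duration,
    pub new_samples: usize,
    pub consecutive_high_error_windows: usize,
    pub explicit_negative_feedback: bool,
}

/// Which row of the trigger table fired.
#[derive(Debug, PartialEq, Eq)]
pub enum Trigger {
    UserFeedback,
    PredictionError,
    SampleCount,
    Interval,
}

/// Return the first condition that calls for an update, if any.
/// Thresholds (10 windows, 50 samples, 2 minutes) come from the table.
pub fn first_trigger(state: &UpdateState) -> Option<Trigger> {
    if state.explicit_negative_feedback {
        Some(Trigger::UserFeedback)
    } else if state.consecutive_high_error_windows >= 10 {
        Some(Trigger::PredictionError)
    } else if state.new_samples >= 50 {
        Some(Trigger::SampleCount)
    } else if state.since_last_update >= Duration::from_secs(2 * 60) {
        Some(Trigger::Interval)
    } else {
        None
    }
}
```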

#### 2.4.2 Online Director Updates

**Online PPO learning**:
```rust
struct OnlineDirectorUpdater {
    actor: ActorNetwork,
    critic: CriticNetwork,
    trajectory_buffer: Vec<Trajectory>,
    buffer_size: usize,
    update_epochs: usize,
}

impl OnlineDirectorUpdater {
    fn add_trajectory(&mut self, trajectory: Trajectory) {
        self.trajectory_buffer.push(trajectory);
        if self.trajectory_buffer.len() > self.buffer_size {
            self.trajectory_buffer.remove(0);
        }
    }
    
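    fn ppo_update(&mut self) -> Result<f32, DirectorError> {
        // Nothing to train on: return early instead of dividing by zero
        // (0.0 / 0.0 would yield NaN).
        if self.update_epochs == 0 || self.trajectory_buffer.is_empty() {
            return Ok(0.0);
        }

        let mut total_loss = 0.0;

        for _ in 0..self.update_epochs {
            for trajectory in &self.trajectory_buffer {
                let loss = self.train_ppo_epoch(trajectory)?;
                total_loss += loss;
            }
        }

        // Mean loss over every (epoch, trajectory) pair.
        Ok(total_loss / (self.update_epochs * self.trajectory_buffer.len()) as f32)
    }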
}
```
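
`train_ppo_epoch` is not shown in this document. For orientation, the standard PPO clipped surrogate objective that such a step typically optimizes can be sketched as follows. The function names are hypothetical, and the clip range `eps` (0.2 is a common choice) is an assumption rather than a value taken from the crate.

```rust
/// PPO clipped surrogate for one step:
/// min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A),
/// where `ratio` is pi_new(a|s) / pi_old(a|s) and `A` is the advantage.
pub fn clipped_surrogate(ratio: f32, advantage: f32, eps: f32) -> f32 {
    let clipped = ratio.clamp(1.0 - eps, 1.0 + eps);
    (ratio * advantage).min(clipped * advantage)
}

/// Mean policy loss over a batch (the negated surrogate, so that minimizing
/// the loss maximizes the objective). Returns `None` for an empty batch or
/// for slices of different length.
pub fn ppo_policy_loss(ratios: &[f32], advantages: &[f32], eps: f32) -> Option<f32> {
    if ratios.is_empty() || ratios.len() != advantages.len() {
        return None;
    }
    let sum: f32 = ratios
        .iter()
        .zip(advantages.iter())
        .map(|(&r, &a)| clipped_surrogate(r, a, eps))
        .sum();
    Some(-sum / ratios.len() as f32)
}
```

The clipping removes the incentive to move the policy far from the one that collected the trajectories, which matters in an online setting where the buffer is small.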

**Handling delayed reward computation**:
```rust
struct DelayedRewardCalculator {
    pending_steps: Vec<PendingStep>,
    retention_window: Duration,
}

struct PendingStep {
    step: TrajectoryStep,
    timestamp: Instant,
    emotion_history: Vec<EmotionState>,
}

impl DelayedRewardCalculator {
    fn compute_reward(&self, step: &PendingStep) -> f32 {
        let progress = step.compute_progress_reward();
        
        let emotion_improvement = self.compute_emotion_improvement(
            &step.emotion_history
        );
        
        let retention = self.check_retention(
            step.timestamp,
            self.retention_window
        );
        
        0.4 * progress + 0.4 * emotion_improvement + 0.2 * retention
    }
    
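    fn compute_emotion_improvement(&self, history: &[EmotionState]) -> f32 {
        if history.len() < 2 {
            return 0.0;
        }
        
        let current = history.last().unwrap();
        let baseline = &history[0];
        
        // NOTE: cosine similarity scores how closely the current VAD vector
        // points in the same direction as the baseline, so an unchanged
        // emotion scores 1.0. It is a similarity, not a signed improvement.
        cosine_similarity(
            &current.to_vec(),
            &baseline.to_vec()
        )
    }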
}
```

### 2.5 Model Version Management

#### 2.5.1 Versioning Strategy

```
Model version naming: aip-{component}-{version}-{timestamp}

Examples:
- aip-edm-v1.0.0-20240101.safetensors
- aip-director-v1.2.3-20240215.safetensors
```
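
A small parser and formatter for this naming scheme could look like the sketch below. The `ModelFileName` type and both function names are hypothetical; the checks (a `v` prefix on the version, an 8-digit date, a `.safetensors` suffix) are inferred from the two examples above and are not a documented specification.

```rust
/// Hypothetical parsed form of `aip-{component}-{version}-{timestamp}.safetensors`.
#[derive(Debug, PartialEq, Eq)]
pub struct ModelFileName {
    pub component: String,
    pub version: String,
    pub date: String,
}

/// Parse a model file name; returns `None` if it does not follow the scheme.
pub fn parse_model_file_name(name: &str) -> Option<ModelFileName> {
    let stem = name.strip_suffix(".safetensors")?;
    let rest = stem.strip_prefix("aip-")?;
    // Split from the right so that only the last two '-' act as separators;
    // the version itself contains dots, not hyphens.
    let mut parts = rest.rsplitn(3, '-');
    let date = parts.next()?;
    let version = parts.next()?;
    let component = parts.next()?;

    let date_ok = date.len() == 8 && date.chars().all(|c| c.is_ascii_digit());
    if component.is_empty() || !version.starts_with('v') || !date_ok {
        return None;
    }
    Some(ModelFileName {
        component: component.to_string(),
        version: version.to_string(),
        date: date.to_string(),
    })
}

/// Render the file name back from its parts.
pub fn format_model_file_name(m: &ModelFileName) -> String {
    format!("aip-{}-{}-{}.safetensors", m.component, m.version, m.date)
}
```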

#### 2.5.2 A/B Testing Framework

```rust
struct ModelABTest {
    model_a: ModelVersion,
    model_b: ModelVersion,
    traffic_split: f32,
    metrics: ABTestMetrics,
}

impl ModelABTest {
    fn select_model(&self, user_id: &str) -> ModelVersion {
        let hash = compute_hash(user_id);
        if hash % 100 < (self.traffic_split * 100.0) as u64 {
            self.model_a.clone()
        } else {
            self.model_b.clone()
        }
    }
    
    fn evaluate(&self) -> ABTestResult {
        ABTestResult {
            model_a_metrics: self.metrics.compute(&self.model_a),
            model_b_metrics: self.metrics.compute(&self.model_b),
            significance: self.compute_statistical_significance(),
        }
    }
}
```
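
`compute_hash` is not defined in this document. For an A/B split the hash has to be stable across processes and releases so that a user always lands in the same group; the sketch below uses 64-bit FNV-1a for that purpose, which is an assumption, not necessarily the crate's choice. The function names are hypothetical. Unlike the snippet above, the cutoff is rounded rather than truncated, so that a split such as 0.29 maps to 29 buckets even when `0.29 * 100.0` evaluates slightly below 29.

```rust
/// 64-bit FNV-1a hash (offset basis and prime are the published constants).
pub fn fnv1a_64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
    for b in bytes {
        hash ^= u64::from(*b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }
    hash
}

/// Bucket in 0..100 derived from the user id.
pub fn bucket(user_id: &str) -> u64 {
    fnv1a_64(user_id.as_bytes()) % 100
}

/// True if the user falls into group A for the given split (0.0 to 1.0).
pub fn in_group_a(user_id: &str, traffic_split: f32) -> bool {
    let cutoff = (traffic_split.clamp(0.0, 1.0) * 100.0).round() as u64;
    bucket(user_id) < cutoff
}
```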

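### 2.6 Performance Monitoring

#### 2.6.1 Real-Time Monitoring Metrics

| Category | Metric | Target | Alert threshold |
|----------|--------|--------|----------|
| Inference performance | EDM latency | <5ms | >10ms |
| Inference performance | Director latency | <5ms | >10ms |
| Model quality | EDM Pearson r | ≥0.55 | <0.45 |
| Model quality | Director mean reward | ≥0.7 | <0.5 |
| System stability | Model update success rate | ≥99% | <95% |
| User experience | Emotional stability | variance ≤0.1 | variance >0.2 |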

#### 2.6.2 Monitoring Dashboard

```
┌────────────────────────────────────────────────────────────────┐
│               AIP real-time monitoring dashboard               │
├────────────────────────────────────────────────────────────────┤
│                                                                │
│  EDM performance                 Director performance          │
│  ├── Inference latency: 3.2ms    ├── Inference latency: 2.8ms  │
│  ├── Pearson r: 0.58             ├── Mean reward: 0.75         │
│  └── Daily predictions: 1.2M     └── Daily decisions: 1.2M     │
│                                                                │
│  User distribution               Emotion distribution          │
│  ├── Online users: 12,345        ├── Mean valence: 0.62        │
│  ├── Active sessions: 8,901      ├── Mean arousal: 0.55        │
│  └── New users: 234              └── Mean dominance: 0.58      │
│                                                                │
│  Model updates                   System health                 │
│  ├── Last update: 2min ago       ├── CPU: 45%                  │
│  ├── Update success: 99.2%       ├── Memory: 62%               │
│  └── Users pending: 1,234        └── GPU: 78%                  │
│                                                                │
└────────────────────────────────────────────────────────────────┘
```

---

## 3. Training Labeling Scheme

### 3.1 Labeled Data Requirements

#### 3.1.1 EDM Training Data

**Data source**: user behavior features + SAM self-reported emotion

**SAM scale design**:
```
┌─────────────────────────────────────────────────────────────┐
│         SAM self-assessment scale (every 3 minutes)         │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Valence: how do you feel right now?                        │
│  😟 ────────────────────────────────────────────── 😊       │
│  1    2    3    4    5    6    7    8    9                  │
│                                                             │
│  Arousal: how excited or agitated are you right now?        │
│  😴 ────────────────────────────────────────────── 😲       │
│  1    2    3    4    5    6    7    8    9                  │
│                                                             │
│  Dominance: how much control do you feel you have?          │
│  😰 ────────────────────────────────────────────── 😎       │
│  1    2    3    4    5    6    7    8    9                  │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```

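**Data format** (the `//` comments are explanatory only and are not valid JSON):
```json
{
  "sample_id": "sample_123",
  "user_id": "user_456",
  "session_id": "session_789",
  "timestamp": 1709876543210,
  "features": {
    "0": 0.65,   // tap frequency
    "1": 0.42,   // swipe speed
    "2": 0.15,   // multi-touch ratio
    "3": 0.08,   // device tilt standard deviation
    "4": 0.32,   // retry delay
    "5": 0.0,    // pause duration
    "6": 0.28,   // hesitation time
    "7": 0.75,   // path efficiency
    "8": 0.05,   // cancel rate
    "9": 0.82,   // completion rate
    "10": 0.45,  // mean pressure
    "11": 0.12,  // pressure standard deviation
    "12": 0.68,  // action precision
    "13": 0.35,  // mean reaction time
    "14": 0.08   // reaction time standard deviation
  },
  "emotion": {
    "valence": 0.75,
    "arousal": 0.60,
    "dominance": 0.70
  }
}
```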
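
The SAM scale is answered on 9 points, while the `emotion` values in the sample lie between 0 and 1. The document does not state how one is converted to the other; a linear mapping is the simplest candidate and is sketched here as an assumption, with hypothetical function names.

```rust
/// Map a 9-point SAM rating (1..=9) linearly onto [0.0, 1.0].
/// Returns `None` for ratings outside the scale.
pub fn sam_to_unit(score: u8) -> Option<f32> {
    if (1..=9).contains(&score) {
        Some((score as f32 - 1.0) / 8.0)
    } else {
        None
    }
}

/// Inverse mapping: the nearest SAM point for a value in [0.0, 1.0].
/// Out-of-range inputs are clamped first.
pub fn unit_to_sam(value: f32) -> u8 {
    (value.clamp(0.0, 1.0) * 8.0).round() as u8 + 1
}
```

Under this mapping a rating of 7 becomes 0.75, which matches the `valence` value in the sample above.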

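#### 3.1.2 Director Training Data

**Data source**: interaction trajectories + reward signal

**Reward signal components**:
```rust
struct RewardComponents {
    progress_reward: f32,        // task progress reward
    emotion_improvement: f32,    // emotion improvement reward
    retention_signal: f32,       // retention signal (delayed)
}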

impl RewardComponents {
    fn compute_total(&self) -> f32 {
        0.4 * self.progress_reward 
          + 0.4 * self.emotion_improvement 
          + 0.2 * self.retention_signal
    }
}
```

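### 3.2 Active-Learning Labeling Strategy

#### 3.2.1 Uncertainty Sampling

**Strategy**: prioritize labeling samples on which the model's prediction is highly uncertain.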

```rust
struct UncertaintySampler {
    model: RogueliteEdm,
    uncertainty_threshold: f32,
}

impl UncertaintySampler {
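    fn compute_uncertainty(&self, features: &HashMap<u32, f32>) -> f32 {
        // Run 10 stochastic forward passes with dropout enabled.
        // (Rust has no keyword arguments, so the count is passed positionally.)
        let predictions = self.model.predict_with_dropout(features, 10);
        
        // The spread of the predictions serves as the uncertainty estimate.
        compute_variance(&predictions)
    }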
    
    fn should_request_label(&self, features: &HashMap<u32, f32>) -> bool {
        self.compute_uncertainty(features) > self.uncertainty_threshold
    }
}
```

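#### 3.2.2 Diversity Sampling

**Strategy**: select samples that lie far from the already-labeled samples in feature space, so that the labeled set covers the space well.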

```rust
struct DiversitySampler {
    labeled_features: Vec<Vec<f32>>,
    diversity_threshold: f32,
}

impl DiversitySampler {
    fn compute_diversity(&self, features: &HashMap<u32, f32>) -> f32 {
        let feature_vec = features_to_vec(features);
        
        let min_distance = self.labeled_features.iter()
            .map(|labeled| euclidean_distance(&feature_vec, labeled))
            .min_by(|a, b| a.partial_cmp(b).unwrap())
            .unwrap_or(f32::MAX);
        
        min_distance
    }
    
    fn should_request_label(&self, features: &HashMap<u32, f32>) -> bool {
        self.compute_diversity(features) > self.diversity_threshold
    }
}
```

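#### 3.2.3 Label Request Strategy

**Trigger conditions**:
| Condition | Weight | Description |
|------|------|------|
| High uncertainty | 0.4 | Model prediction variance > threshold |
| High diversity | 0.3 | Feature-space distance > threshold |
| Key moment | 0.2 | After a failure, a success, or a long pause |
| Random sampling | 0.1 | Maintains coverage of the data distribution |

**User-experience safeguards**: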
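
The table gives weights but does not say how the four signals are combined. One plausible reading, sketched here purely as an assumption, is a weighted sum that is compared against a request threshold. `LabelSignals`, its fields, and the normalization of each signal to the 0 to 1 range are hypothetical.

```rust
/// Hypothetical per-window signals, each expected in [0.0, 1.0].
pub struct LabelSignals {
    pub uncertainty: f32, // normalized prediction variance
    pub diversity: f32,   // normalized distance to the labeled set
    pub key_moment: bool, // failure, success, or long pause just occurred
    pub random_draw: f32, // uniform random number supplied by the caller
}

/// Weighted sum using the weights from the trigger table (0.4/0.3/0.2/0.1).
pub fn label_request_score(s: &LabelSignals) -> f32 {
    let clamp01 = |v: f32| v.clamp(0.0, 1.0);
    let key = if s.key_moment { 1.0 } else { 0.0 };
    0.4 * clamp01(s.uncertainty)
        + 0.3 * clamp01(s.diversity)
        + 0.2 * key
        + 0.1 * clamp01(s.random_draw)
}

/// Ask for a label when the combined score exceeds `threshold`.
pub fn score_exceeds(s: &LabelSignals, threshold: f32) -> bool {
    label_request_score(s) > threshold
}
```

The random draw is passed in by the caller so that the function stays deterministic and easy to test.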
```rust
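struct LabelRequestScheduler {
    min_interval: Duration,       // minimum gap between requests (3 minutes)
    max_per_session: usize,       // maximum requests per session (6)
    adaptive_frequency: bool,     // adaptive frequency adjustment
    requests_this_session: usize, // requests already shown in this session
    last_request: Instant,        // when the previous request was shown
}

impl LabelRequestScheduler {
    fn should_request(&mut self, context: &GameContext) -> bool {
        if self.requests_this_session >= self.max_per_session {
            return false;
        }
        
        if self.last_request.elapsed() < self.min_interval {
            return false;
        }
        
        if context.is_in_intense_gameplay() {
            return false;
        }
        
        // Record the request so that the two limits above take effect.
        self.requests_this_session += 1;
        self.last_request = Instant::now();
        true
    }
}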
```

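### 3.3 Implicit Labeling Methods

#### 3.3.1 Behavior-Inferred Labels

**Strategy**: infer emotional state from user behavior to reduce the need for explicit labels.

**Inference rules**:
| Behavior pattern | Inferred emotion | Confidence |
|----------|----------|--------|
| Rapid retry | High arousal, positive valence | 0.7 |
| Long pause | Low arousal, negative valence | 0.6 |
| Keeps playing after repeated failures | High arousal, positive valence | 0.75 |
| Immediate quit | Negative valence | 0.8 |
| Share / rate | Positive valence | 0.85 |

**Implementation**: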
```rust
struct ImplicitLabelInferrer {
    rules: Vec<InferenceRule>,
    confidence_threshold: f32,
}

struct InferenceRule {
    pattern: BehaviorPattern,
    inferred_emotion: EmotionState,
    confidence: f32,
}

impl ImplicitLabelInferrer {
    fn infer(&self, behavior: &BehaviorSequence) -> Option<InferredLabel> {
        for rule in &self.rules {
            if rule.pattern.matches(behavior) {
                if rule.confidence >= self.confidence_threshold {
                    return Some(InferredLabel {
                        // `rule` is borrowed, so the emotion must be cloned rather than moved.
                        emotion: rule.inferred_emotion.clone(),
                        confidence: rule.confidence,
                        source: LabelSource::Implicit,
                    });
                }
            }
        }
        None
    }
}
```

#### 3.3.2 Game-Event Labels

**Strategy**: use in-game events as proxies for emotion labels.

**Event-to-emotion mapping**:
```rust
struct EventEmotionMapper {
    event_mappings: HashMap<GameEvent, EmotionState>,
}

impl EventEmotionMapper {
    fn get_mapping(&self) -> HashMap<GameEvent, EmotionState> {
        HashMap::from([
            (GameEvent::LevelComplete, EmotionState::new(0.8, 0.7, 0.8)),
            (GameEvent::LevelFailed, EmotionState::new(0.3, 0.6, 0.4)),
            (GameEvent::RareReward, EmotionState::new(0.9, 0.8, 0.7)),
            (GameEvent::TutorialComplete, EmotionState::new(0.6, 0.4, 0.6)),
            (GameEvent::AchievementUnlocked, EmotionState::new(0.85, 0.65, 0.75)),
        ])
    }
}
```

### 3.4 Label Quality Control

#### 3.4.1 Consistency Checking

**Strategy**: detect and filter out inconsistent labels.

```rust
struct LabelConsistencyChecker {
    history: Vec<LabeledSample>,
    consistency_threshold: f32,
}

impl LabelConsistencyChecker {
    fn check_consistency(&self, new_sample: &LabeledSample) -> ConsistencyResult {
        let similar_samples = self.find_similar_samples(&new_sample.features);
        
        if similar_samples.is_empty() {
            return ConsistencyResult::Accept;
        }
        
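        // `Iterator` has no `average()` method, so sum and divide explicitly.
        // `similar_samples` is non-empty here, so the division is safe.
        let total_distance: f32 = similar_samples.iter()
            .map(|s| emotion_distance(&s.emotion, &new_sample.emotion))
            .sum();
        let avg_distance = total_distance / similar_samples.len() as f32;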
        
        if avg_distance > self.consistency_threshold {
            ConsistencyResult::FlagForReview
        } else {
            ConsistencyResult::Accept
        }
    }
}
```

#### 3.4.2 Annotator Reliability Assessment

**Strategy**: assess how reliable each user's labels are and adjust the data weights accordingly.

```rust
struct AnnotatorReliability {
    user_id: String,
    total_labels: usize,
    consistency_score: f32,
    temporal_stability: f32,
}

impl AnnotatorReliability {
    fn compute_weight(&self) -> f32 {
        let consistency_weight = self.consistency_score;
        let stability_weight = self.temporal_stability;
        let volume_weight = (self.total_labels as f32 / 100.0).min(1.0);
        
        0.4 * consistency_weight + 0.4 * stability_weight + 0.2 * volume_weight
    }
}
```

### 3.5 Data Labeling Workflow

#### 3.5.1 Labeling Flowchart

```
┌─────────────────────────────────────────────────────────────────┐
│                       Data labeling flow                        │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Game in progress                                               │
│           │                                                     │
│           ↓                                                     │
│  ┌─────────────────┐                                            │
│  │ Behavior capture│                                            │
│  │ (10 s window)   │                                            │
│  └────────┬────────┘                                            │
│           │                                                     │
│           ↓                                                     │
│  ┌─────────────────┐    no    ┌─────────────────┐               │
│  │ Label needed?   │─────────→│ Continue playing│               │
│  └────────┬────────┘          └─────────────────┘               │
│           │ yes                                                 │
│           ↓                                                     │
│  ┌─────────────────┐                                            │
│  │ Pick label time │                                            │
│  │ (calm moment)   │                                            │
│  └────────┬────────┘                                            │
│           │                                                     │
│           ↓                                                     │
│  ┌─────────────────┐                                            │
│  │ Show SAM scale  │                                            │
│  │ (short, 3 items)│                                            │
│  └────────┬────────┘                                            │
│           │                                                     │
│           ↓                                                     │
│  ┌─────────────────┐                                            │
│  │ Quality check   │                                            │
│  │ (consistency)   │                                            │
│  └────────┬────────┘                                            │
│           │                                                     │
│     ┌─────┴─────┐                                               │
│     ↓           ↓                                               │
│   Pass       Anomaly                                            │
│     │           │                                               │
│     ↓           ↓                                               │
│ Store data   Flag for review                                    │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```

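#### 3.5.2 Choosing When to Ask for Labels

**Best moments to ask**:
| Moment | Priority | Reason |
|------|--------|------|
| After a level is completed or failed | High | Natural break point |
| Pause menu | High | The user paused voluntarily |
| Shop/inventory screen | Medium | Non-tense state |
| Waiting for matchmaking/loading | Medium | Idle time |
| After 5 minutes of continuous play | Low | Forced break |

**Moments to avoid**:
- During tense combat
- During time-sensitive tasks
- During multiplayer interaction
- When the user is visibly irritated (rapid taps to exit, etc.)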

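### 3.6 LLM-Assisted Labeling Scheme

#### 3.6.1 Overview

Using a large language model (LLM) to infer emotion labels from user behavior data is intended to cut manual labeling cost substantially while keeping label quality high.

**Core advantages**:
- **Lower cost**: targets a reduction of more than 80% in manual labeling demand
- **Scale**: supports batch labeling of large volumes of historical data
- **Consistency**: applies the same criteria to every sample, reducing the annotator-to-annotator subjectivity of manual labeling
- **Real-time use**: supports online labeling in real time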

#### 3.6.2 LLM Labeling Architecture

```
┌──────────────────────────────────────────────────────────────────────────┐
│                    LLM-assisted labeling architecture                    │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  ┌────────────────┐    ┌────────────────┐    ┌────────────────┐          │
│  │ Behavior data  │    │     Prompt     │    │ LLM inference  │          │
│  │  input layer   │───→│  construction  │───→│    service     │          │
│  └────────────────┘    └────────────────┘    └────────────────┘          │
│           │                     │                     │                  │
│           ↓                     ↓                     ↓                  │
│  Raw behavior events   Structured prompt     Emotion label output        │
│  Feature statistics    Context injection     (VAD scores)                │
│  Game state            Few-shot examples     Confidence estimate         │
│                                                                          │
│  ┌────────────────┐    ┌────────────────┐    ┌────────────────┐          │
│  │    Quality     │    │ Human-in-loop  │    │  Data storage  │          │
│  │validation layer│←───│    labeling    │───→│     layer      │          │
│  └────────────────┘    └────────────────┘    └────────────────┘          │
│           │                     │                     │                  │
│           ↓                     ↓                     ↓                  │
│  Consistency check     Low-confidence items  Training dataset            │
│  Anomaly detection     Human review          Incremental update          │
│  Confidence filter     Active learning       Version management          │
│                                                                          │
└──────────────────────────────────────────────────────────────────────────┘
```

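#### 3.6.3 Prompt Engineering Design

**Option 1: structured feature input**

```python
# Literal JSON braces are doubled ({{ }}) so that str.format() leaves them
# intact and only the named placeholders are substituted.
EMOTION_INFERENCE_PROMPT = """
You are an expert in affective analysis who infers emotional state from players' in-game behavior.

## Task
From the game behavior data below, infer the user's emotional state (valence, arousal, dominance).

## Behavior features (statistics over a 10-second window)
| Feature | Value | Notes |
|------|------|------|
| Tap frequency | {tap_freq} | Taps per second; high → high arousal |
| Swipe speed | {swipe_speed} | Swipe speed; fast → high arousal |
| Multi-touch ratio | {multi_touch} | Share of simultaneous touches; high → high arousal |
| Device motion | {device_motion} | Change in device tilt; large → high arousal |
| Retry delay | {retry_delay} | Time to retry after a failure; short → positive valence |
| Pause duration | {pause_duration} | Time paused; long → negative valence |
| Hesitation time | {hesitation} | Stimulus-response delay; long → negative valence |
| Path efficiency | {path_efficiency} | Efficiency of actions; high → high dominance |
| Cancel rate | {cancel_rate} | Share of cancelled actions; high → low dominance |
| Completion rate | {completion_rate} | Task completion rate; high → positive valence |

## Game context
- Current level: {level}
- Play time: {play_time} minutes
- Recent events: {recent_events}
- User style: {user_style}

## Output format
Respond with JSON:
{{
  "valence": <0.0-1.0>,
  "arousal": <0.0-1.0>,
  "dominance": <0.0-1.0>,
  "confidence": <0.0-1.0>,
  "reasoning": "<brief explanation>"
}}

## Few-shot examples

Example 1:
Input: tap frequency=0.8, retry delay=0.2, completion rate=0.9, recent event=level complete
Output: {{"valence": 0.85, "arousal": 0.75, "dominance": 0.80, "confidence": 0.9, "reasoning": "Frequent taps and quick retries indicate excitement and positive emotion; the high completion rate shows a strong sense of control"}}

Example 2:
Input: tap frequency=0.3, pause duration=0.8, hesitation time=0.7, recent event=repeated failures
Output: {{"valence": 0.25, "arousal": 0.35, "dominance": 0.30, "confidence": 0.85, "reasoning": "Sparse interaction, long pauses and hesitation indicate frustration and a low-arousal state"}}

Now analyze the current data:
"""
```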

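**Option 2: natural-language description input**

```python
# Assumes the template is filled with str.format(); the literal JSON braces
# are therefore doubled.
NARRATIVE_PROMPT = """
You are an expert in game affect analysis. Analyze the player behavior description below and infer the player's emotional state.

## Player behavior description
{behavior_narrative}

## Emotion dimensions
- valence: pleasantness, 0 = very unpleasant, 1 = very pleasant
- arousal: activation, 0 = very calm, 1 = very agitated
- dominance: sense of control, 0 = no control at all, 1 = full control

## Output requirements
Output JSON: {{"valence": float, "arousal": float, "dominance": float, "confidence": float}}
"""

def generate_narrative(features: dict, context: dict) -> str:
    """Turn normalized features into a short English description."""
    narrative_parts = []
    
    if features['tap_freq'] > 0.7:
        narrative_parts.append("The player is tapping the screen rapidly")
    elif features['tap_freq'] < 0.3:
        narrative_parts.append("The player taps infrequently and seems hesitant")
    
    if features['retry_delay'] < 0.3:
        narrative_parts.append("The player retries immediately after failing, showing strong interest")
    elif features['retry_delay'] > 0.7:
        narrative_parts.append("The player hesitates for a long time before retrying after a failure")
    
    if context.get('recent_events') == 'level_complete':
        narrative_parts.append("The player has just completed a level")
    
    return ". ".join(narrative_parts)
```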

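**Option 3: multimodal temporal input**

```python
TEMPORAL_PROMPT = """
Analyze the behavior sequence over the following 5 consecutive time windows and infer the current emotional state.

## Time-series data
| Window | Tap frequency | Swipe speed | Completion rate | Game event |
|----------|----------|----------|--------|----------|
| t-4 | {t4_tap} | {t4_swipe} | {t4_complete} | {t4_event} |
| t-3 | {t3_tap} | {t3_swipe} | {t3_complete} | {t3_event} |
| t-2 | {t2_tap} | {t2_swipe} | {t2_complete} | {t2_event} |
| t-1 | {t1_tap} | {t1_swipe} | {t1_complete} | {t1_event} |
| t-0 | {t0_tap} | {t0_swipe} | {t0_complete} | {t0_event} |

## Points to analyze
1. Observe the behavior trend (rising/falling/stable)
2. Identify the impact of key events
3. Form an overall judgment of the current emotional state

Output the emotion inference as JSON.
"""
```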

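#### 3.6.4 Choosing an LLM Service

**Comparison of recommended options**:

| Model | Cost | Latency | Quality | Suitable for |
|------|------|------|------|----------|
| GPT-4o | High | 1-2s | Highest | High-value samples, quality verification |
| GPT-4o-mini | Medium | 0.5-1s | High | Workhorse for batch labeling |
| Claude-3.5-Sonnet | Medium | 0.5-1s | High | Batch labeling, reasoning tasks |
| Qwen-Plus | Low | 0.3-0.5s | Medium-high | Large-scale batch labeling |
| Locally deployed model | Low | <0.1s | Medium | Real-time labeling, privacy-sensitive data |

**Hybrid strategy**: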
```rust
struct LLMAnnotationStrategy {
    high_value_threshold: f32,
    confidence_threshold: f32,
}

impl LLMAnnotationStrategy {
    fn select_model(&self, sample: &BehaviorSample) -> LLMModel {
        if sample.estimated_value > self.high_value_threshold {
            LLMModel::GPT4o
        } else if sample.estimated_value > 0.5 {
            LLMModel::GPT4oMini
        } else {
            LLMModel::QwenPlus
        }
    }
}
```

#### 3.6.5 Quality Assurance Mechanisms

**Multi-model voting**:
```python
from typing import List

import numpy as np


class MultiModelAnnotator:
    def __init__(self, models: List[LLMModel]):
        self.models = models
    
    def annotate(self, sample: BehaviorSample) -> AnnotatedResult:
        results = []
        for model in self.models:
            result = model.infer(sample)
            results.append(result)
        
        aggregated = self.aggregate(results)
        confidence = self.compute_confidence(results)
        
        return AnnotatedResult(
            emotion=aggregated,
            confidence=confidence,
            individual_results=results
        )
    
    def aggregate(self, results: List[EmotionState]) -> EmotionState:
        return EmotionState(
            valence=np.median([r.valence for r in results]),
            arousal=np.median([r.arousal for r in results]),
            dominance=np.median([r.dominance for r in results])
        )
    
    def compute_confidence(self, results: List[EmotionState]) -> float:
        valence_std = np.std([r.valence for r in results])
        arousal_std = np.std([r.arousal for r in results])
        dominance_std = np.std([r.dominance for r in results])
        
        avg_std = (valence_std + arousal_std + dominance_std) / 3
        return 1.0 - min(avg_std * 2, 1.0)
```

**Human-in-the-loop labeling**:
```rust
struct HumanInLoopAnnotator {
    llm_annotator: LLMAnnotator,
    confidence_threshold: f32,
    human_queue: Vec<PendingSample>,
}

impl HumanInLoopAnnotator {
    fn annotate(&mut self, sample: BehaviorSample) -> AnnotationResult {
        let llm_result = self.llm_annotator.annotate(&sample);
        
        if llm_result.confidence < self.confidence_threshold {
            self.human_queue.push(PendingSample {
                sample,
                llm_suggestion: llm_result.clone(),
            });
            
            AnnotationResult::PendingHumanReview
        } else {
            AnnotationResult::AutoApproved(llm_result)
        }
    }
}
```

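**Self-consistency check**:
```python
SELF_CONSISTENCY_PROMPT = """
Your previous judgment for this behavior sample was:
{previous_prediction}

Re-examine the data and confirm or revise your judgment:
1. Were any important features overlooked?
2. Is the reasoning sound?
3. Does the prediction need to be adjusted?

If an adjustment is needed, explain why and give the new prediction.
"""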

def self_consistency_check(model: LLMModel, sample: BehaviorSample, 
                          initial_prediction: EmotionState) -> EmotionState:
    consistency_prompt = SELF_CONSISTENCY_PROMPT.format(
        previous_prediction=initial_prediction.to_json()
    )
    
    revised = model.infer(sample, additional_prompt=consistency_prompt)
    
    if revised.confidence > initial_prediction.confidence:
        return revised
    return initial_prediction
```

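#### 3.6.6 Batch Labeling Workflow

```python
import asyncio
from typing import List


class BatchAnnotator:
    def __init__(self, llm_client, batch_size=100):
        self.llm_client = llm_client
        self.batch_size = batch_size
        # Samples whose LLM output cannot be parsed wait here for human review.
        self.human_review_queue: asyncio.Queue = asyncio.Queue()
    
    async def annotate_batch(self, samples: List[BehaviorSample]) -> List[AnnotatedSample]:
        annotated = []
        # Send at most `batch_size` prompts per request.
        for start in range(0, len(samples), self.batch_size):
            chunk = samples[start:start + self.batch_size]
            prompts = [self.build_prompt(s) for s in chunk]
            
            results = await self.llm_client.batch_infer(prompts)
            
            for sample, result in zip(chunk, results):
                try:
                    emotion = self.parse_result(result)
                    annotated.append(AnnotatedSample(
                        sample=sample,
                        emotion=emotion,
                        source=AnnotationSource.LLM
                    ))
                except ParseError:
                    await self.human_review_queue.put(sample)
        
        return annotated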
    
    def build_prompt(self, sample: BehaviorSample) -> str:
        return EMOTION_INFERENCE_PROMPT.format(
            tap_freq=sample.features.tap_freq,
            swipe_speed=sample.features.swipe_speed,
            multi_touch=sample.features.multi_touch,
            device_motion=sample.features.device_motion,
            retry_delay=sample.features.retry_delay,
            pause_duration=sample.features.pause_duration,
            hesitation=sample.features.hesitation,
            path_efficiency=sample.features.path_efficiency,
            cancel_rate=sample.features.cancel_rate,
            completion_rate=sample.features.completion_rate,
            level=sample.context.level,
            play_time=sample.context.play_time,
            recent_events=sample.context.recent_events,
            user_style=sample.context.user_style
        )
```

#### 3.6.7 LLM Annotation Quality Evaluation

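**Evaluation metrics**:
| Metric | How it is computed | Target |
|--------|--------------------|--------|
| Agreement with human annotation | Pearson correlation coefficient | ≥ 0.70 |
| Annotation stability | Variance across repeated annotations of the same sample | ≤ 0.05 |
| Reasoning soundness | Pass rate in expert spot checks | ≥ 85% |
| Coverage | Share of valid annotations | ≥ 95% |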
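
The stability and coverage rows can be computed along the following lines. This is a sketch under stated assumptions: the source does not say how the variance is aggregated, so it is taken here as the population variance across repeated runs, averaged over samples and the three VAD dimensions. The function names and array layout are hypothetical.

```python
import numpy as np


def annotation_stability(repeated_runs):
    """repeated_runs: array-like of shape (n_runs, n_samples, 3) with V/A/D scores.

    Returns the variance across runs, averaged over samples and dimensions
    (an assumed aggregation; the metric table only says "variance").
    """
    runs = np.asarray(repeated_runs, dtype=float)
    return float(runs.var(axis=0).mean())


def coverage(n_valid, n_total):
    """Share of samples for which the LLM produced a parseable annotation."""
    return n_valid / n_total if n_total else 0.0


def meets_targets(pearson_r, stability_var, expert_pass_rate, cov):
    """Check all four targets from the metric table."""
    return (pearson_r >= 0.70 and stability_var <= 0.05
            and expert_pass_rate >= 0.85 and cov >= 0.95)


# Two hypothetical runs over a single sample; only valence differs
runs = [[[0.5, 0.5, 0.5]], [[0.7, 0.5, 0.5]]]
stability = annotation_stability(runs)
```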

**Evaluation procedure**:
```python
from typing import List

from scipy.stats import pearsonr


def evaluate_llm_annotations(
    llm_annotations: List[AnnotatedSample],
    human_annotations: List[AnnotatedSample]
) -> EvaluationReport:
    matched = match_samples(llm_annotations, human_annotations)
    
    valence_r, _ = pearsonr(
        [a.emotion.valence for a in matched.llm],
        [a.emotion.valence for a in matched.human]
    )
    
    arousal_r, _ = pearsonr(
        [a.emotion.arousal for a in matched.llm],
        [a.emotion.arousal for a in matched.human]
    )
    
    dominance_r, _ = pearsonr(
        [a.emotion.dominance for a in matched.llm],
        [a.emotion.dominance for a in matched.human]
    )
    
    return EvaluationReport(
        valence_correlation=valence_r,
        arousal_correlation=arousal_r,
        dominance_correlation=dominance_r,
        overall_score=(valence_r + arousal_r + dominance_r) / 3
    )
```

#### 3.6.8 Cost-Benefit Analysis

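**Cost comparison**:
| Annotation method | Cost per sample | Cost for 1M samples | Quality (Pearson r) |
|-------------------|-----------------|---------------------|---------------------|
| Manual only | ¥0.5-1.0 | ¥500,000-1,000,000 | 0.90 |
| LLM | ¥0.01-0.05 | ¥10,000-50,000 | 0.70-0.80 |
| LLM + human review | ¥0.05-0.1 | ¥50,000-100,000 | 0.85 |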
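
One way to see where a hybrid figure in this range could come from is a simple blended-cost model: every sample is labeled by the LLM, and a fraction is additionally reviewed by a human. The sketch below assumes that a human review costs as much as a full manual annotation and uses a hypothetical 5% review fraction. Neither assumption comes from the source.

```python
def hybrid_cost_per_sample(llm_cost, human_cost, review_fraction):
    """Expected cost per sample when a fraction of LLM labels is also human-reviewed."""
    if not 0.0 <= review_fraction <= 1.0:
        raise ValueError("review_fraction must be within [0, 1]")
    return llm_cost + review_fraction * human_cost


def total_cost(per_sample, n_samples):
    return per_sample * n_samples


# Illustrative inputs picked from inside the table's ranges (in CNY)
per_sample = hybrid_cost_per_sample(llm_cost=0.03, human_cost=0.75, review_fraction=0.05)
million = total_cost(per_sample, 1_000_000)
```

With these inputs the blended cost is roughly ¥0.07 per sample, which falls inside the ¥0.05-0.1 band in the table. Raising the review fraction moves the cost linearly toward the manual-only figure.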

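**Recommended strategy**:
1. **Cold-start phase**: LLM batch annotation, validated by manual spot checks on 10% of samples
2. **Iterative optimization phase**: LLM annotation, with low-confidence samples sent to human review
3. **Steady-state phase**: fully automated LLM annotation with periodic quality audits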

#### 3.6.9 Real-Time LLM Annotation Service

```rust
struct RealtimeLLMAnnotator {
    client: LLMClient,
    cache: LruCache<BehaviorFingerprint, EmotionState>,
    fallback_model: RogueliteEdm,
}

impl RealtimeLLMAnnotator {
    async fn annotate(&mut self, features: &HashMap<u32, f32>) -> EmotionState {
        let fingerprint = self.compute_fingerprint(features);
        
        if let Some(cached) = self.cache.get(&fingerprint) {
            return cached.clone();
        }
        
        match self.client.infer(features).await {
            Ok(emotion) => {
                self.cache.put(fingerprint, emotion.clone());
                emotion
            }
            Err(_) => {
                self.fallback_model.infer(features)
            }
        }
    }
}
```
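
The code above calls `compute_fingerprint` without defining it. Raw floating-point features rarely repeat exactly, so a cache key usually needs some coarsening to get any hits. The following sketch shows one hypothetical approach: quantize each feature into fixed-width buckets, then hash the result. The bucket width of 0.05 and the serialization format are assumptions, not part of the original design.

```python
import hashlib
from typing import Dict


def compute_fingerprint(features: Dict[int, float], bucket: float = 0.05) -> str:
    """Quantize each feature to a bucket index, then hash the sorted (id, index) pairs."""
    # Sorting makes the key independent of dict insertion order
    quantized = sorted((fid, round(value / bucket)) for fid, value in features.items())
    payload = ";".join(f"{fid}:{q}" for fid, q in quantized)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Two nearly identical feature vectors and one clearly different vector
a = compute_fingerprint({1: 0.501, 2: 0.300})
b = compute_fingerprint({2: 0.302, 1: 0.499})
c = compute_fingerprint({1: 0.900, 2: 0.300})
```

A wider bucket raises the cache hit rate but returns the same cached emotion for less similar behavior. Values close to a bucket boundary can still land in different buckets.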

### 3.7 Annotation Data Management

#### 3.7.1 Data Storage Architecture

```
Annotation data storage layout:

/aip-data/
├── raw/
│   ├── behavior_events/
│   │   ├── year=2024/
│   │   │   ├── month=01/
│   │   │   │   ├── day=15/
│   │   │   │   │   ├── session_001.parquet
│   │   │   │   │   └── session_002.parquet
│   ├── emotion_labels/
│   │   ├── year=2024/
│   │   │   ├── month=01/
│   │   │   │   ├── day=15/
│   │   │   │   │   ├── labels_001.parquet
├── processed/
│   ├── edm_training/
│   │   ├── train.parquet
│   │   ├── val.parquet
│   │   └── test.parquet
│   ├── director_training/
│   │   ├── trajectories.parquet
├── models/
│   ├── edm/
│   │   ├── v1.0.0/
│   │   │   ├── model.safetensors
│   │   │   ├── config.json
│   │   │   └── metrics.json
│   ├── director/
│   │   ├── v1.0.0/
```
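
The `year=/month=/day=` directories follow the Hive-style `key=value` partitioning convention that many Parquet readers recognize. As a sketch of how writers might build paths consistent with this layout, the helpers below assemble raw-data and model-artifact paths. The function names are hypothetical.

```python
from datetime import date
from pathlib import PurePosixPath

ROOT = PurePosixPath("/aip-data")
RAW_KINDS = ("behavior_events", "emotion_labels")


def raw_partition_path(kind: str, day: date, filename: str) -> PurePosixPath:
    """Build a path under raw/<kind>/year=YYYY/month=MM/day=DD/."""
    if kind not in RAW_KINDS:
        raise ValueError(f"unknown raw dataset: {kind}")
    return (ROOT / "raw" / kind
            / f"year={day.year:04d}"
            / f"month={day.month:02d}"
            / f"day={day.day:02d}"
            / filename)


def model_artifact_path(model: str, version: str, artifact: str) -> PurePosixPath:
    """Build a path under models/<model>/v<version>/."""
    return ROOT / "models" / model / f"v{version}" / artifact


session = raw_partition_path("behavior_events", date(2024, 1, 15), "session_001.parquet")
weights = model_artifact_path("edm", "1.0.0", "model.safetensors")
```

Zero-padding the month and day keeps lexicographic and chronological ordering aligned when listing directories.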

#### 3.7.2 Data Privacy and Compliance

**Data anonymization**:
```rust
use std::collections::HashMap;

use rand::Rng;
use sha2::{Digest, Sha256};

struct DataAnonymizer {
    user_id_salt: String,
}

impl DataAnonymizer {
    fn anonymize_user_id(&self, user_id: &str) -> String {
        let mut hasher = Sha256::new();
        hasher.update(user_id.as_bytes());
        hasher.update(self.user_id_salt.as_bytes());
        format!("{:x}", hasher.finalize())
    }
    
    fn anonymize_features(&self, features: &mut HashMap<u32, f32>) {
        // Add small uniform noise to obscure exact values
        // (not a formal differential-privacy guarantee)
        for value in features.values_mut() {
            *value += rand::thread_rng().gen_range(-0.01..0.01);
        }
    }
}
```

**Compliance requirements**:
- Informed user consent
- Data minimization
- Opt-out mechanism
- Right to data deletion
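
Because the salted hash is deterministic for a fixed salt, a deletion request can in principle be served without storing the raw user ID: recompute the pseudonym and drop the matching rows. The sketch below mirrors the hashing order of `anonymize_user_id` (user ID first, then salt). The record layout with a `"user"` key and the `delete_user_records` helper are hypothetical.

```python
import hashlib
from typing import Dict, List


def anonymize_user_id(user_id: str, salt: str) -> str:
    """SHA-256 over user_id followed by salt, rendered as lowercase hex."""
    h = hashlib.sha256()
    h.update(user_id.encode("utf-8"))
    h.update(salt.encode("utf-8"))
    return h.hexdigest()


def delete_user_records(records: List[Dict], user_id: str, salt: str) -> List[Dict]:
    """Return the records with every row belonging to user_id removed."""
    pseudonym = anonymize_user_id(user_id, salt)
    return [r for r in records if r["user"] != pseudonym]


SALT = "example-salt"  # placeholder; a real salt must stay secret and stable
records = [
    {"user": anonymize_user_id("alice", SALT), "valence": 0.6},
    {"user": anonymize_user_id("bob", SALT), "valence": 0.4},
    {"user": anonymize_user_id("alice", SALT), "valence": 0.7},
]
remaining = delete_user_records(records, "alice", SALT)
```

This only works while the salt is unchanged. Rotating the salt would require re-keying stored records or keeping a mapping for deletion purposes.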

---

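## 4. Summary

### 4.1 Key Points

| Scheme | Core strategy | Key techniques |
|--------|---------------|----------------|
| Cold start | Pre-trained model + rapid calibration | Transfer learning, cohort clustering |
| Real-time training | Three-tier architecture + federated learning | Incremental learning, PPO |
| Training labeling | Active learning + implicit labeling | Uncertainty sampling, behavior inference |
| **LLM annotation** | **Prompt engineering + human-in-the-loop** | **Few-shot learning, multi-model voting** |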

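### 4.2 Annotation Method Comparison

| Annotation method | Cost | Quality (Pearson r) | Use case |
|-------------------|------|---------------------|----------|
| Manual SAM only | High | 0.90 | Gold standard, small-scale validation |
| LLM batch annotation | Low | 0.70-0.80 | Large-scale historical data |
| LLM + human review | Medium | 0.85 | Recommended for production |
| Implicit behavior inference | Very low | 0.60-0.70 | Real-time supplementary labels |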

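### 4.3 Implementation Roadmap

| Phase | Timeline | Goals |
|-------|----------|-------|
| Phase 1 | Months 1-2 | Build the LLM annotation system, optimize prompts |
| Phase 2 | Months 2-3 | Implement the cold-start scheme, hybrid LLM + human annotation |
| Phase 3 | Months 3-4 | Deploy the real-time training system, run A/B tests |
| Phase 4 | Months 4-6 | Full rollout, automated LLM annotation |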

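### 4.4 Risks and Mitigations

| Risk | Impact | Mitigation |
|------|--------|------------|
| Insufficient labeled data | Poor model performance | **LLM batch annotation**, active learning |
| LLM annotation bias | Lower data quality | Multi-model voting, manual spot checks |
| Poor cold-start experience | User churn | Conservative policy, rapid calibration |
| Real-time training latency | Inconsistent experience | Edge computing, asynchronous updates |
| Privacy compliance issues | Legal risk | Federated learning, data anonymization |
| Excessive LLM cost | Budget overrun | Tiered model strategy, cache optimization |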