# TGA Dataset Operation History System - Implementation Guide
## Overview
This document describes the implementation of the operation history tracking system for `TGADataset`: the data structures, the logging and query methods, and which functions have been instrumented so far.
## Core Data Structures
### 1. OperationRecord
```rust
#[derive(Clone, Debug)]
pub struct OperationRecord {
    pub timestamp: usize,                  // Sequential counter
    pub operation_name: String,            // Function name
    pub affected_columns: AffectedColumns,
    pub expr: Option<Expr>,                // Polars expression if applicable
    pub description: String,               // Human-readable description
    pub reversible: bool,                  // Can operation be undone?
}
```
### 2. AffectedColumns
```rust
#[derive(Clone, Debug)]
pub enum AffectedColumns {
    Specific(Vec<String>),      // Explicit column names
    All,                        // All columns affected
    Semantic(Vec<ColumnTypes>), // Semantic column types
}
```
### 3. ColumnHistory
```rust
#[derive(Clone, Debug)]
pub struct ColumnHistory {
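    pub column_name: String,
    pub operations: Vec<OperationRecord>,
}

impl ColumnHistory {
    /// Number of recorded operations flagged as reversible.
    pub fn reversible_count(&self) -> usize {
        self.operations.iter().filter(|op| op.reversible).count()
    }
    /// Number of recorded operations flagged as irreversible.
    pub fn irreversible_count(&self) -> usize {
        self.operations.iter().filter(|op| !op.reversible).count()
    }
    /// True if at least one recorded operation is irreversible.
    pub fn has_irreversible(&self) -> bool {
        self.operations.iter().any(|op| !op.reversible)
    }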
}
```
## TGADataset Integration
### Updated Structure
```rust
pub struct TGADataset {
    pub frame: LazyFrame,
    pub schema: TGASchema,
    pub oneframeplot: Option<OneFramePlot>,
    pub history_of_operations: Vec<OperationRecord>, // NEW
}
```
## Core Methods
### 1. Logging Operations
```rust
fn log_operation(
    &mut self,
    name: &str,
    affected: AffectedColumns,
    expr: Option<Expr>,
    description: String,
    reversible: bool,
)
```
**Usage Example:**
```rust
self.log_operation(
    "trim_edges",
    AffectedColumns::All,
    None,
    format!("Trimmed {} rows from left, {} from right", left, right),
    false, // irreversible - data is removed
);
```
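A minimal sketch of what the body of `log_operation` could look like, assuming the sequential `timestamp` is simply the number of records already in the history (the source only says "Sequential counter"). To compile without dependencies, `Expr` is a placeholder `String` rather than the Polars type, the `Semantic` variant is omitted, and `History` is a hypothetical stand-in for the history-related part of `TGADataset`.

```rust
// Placeholder for polars' `Expr` so the sketch compiles on its own (assumption).
type Expr = String;

#[derive(Clone, Debug, PartialEq)]
pub enum AffectedColumns {
    Specific(Vec<String>),
    All,
}

#[derive(Clone, Debug)]
pub struct OperationRecord {
    pub timestamp: usize,
    pub operation_name: String,
    pub affected_columns: AffectedColumns,
    pub expr: Option<Expr>,
    pub description: String,
    pub reversible: bool,
}

// Hypothetical stand-in for the history field of `TGADataset`.
#[derive(Default)]
pub struct History {
    pub history_of_operations: Vec<OperationRecord>,
}

impl History {
    // Assumption: the counter equals the number of records logged so far.
    pub fn log_operation(
        &mut self,
        name: &str,
        affected: AffectedColumns,
        expr: Option<Expr>,
        description: String,
        reversible: bool,
    ) {
        let timestamp = self.history_of_operations.len();
        self.history_of_operations.push(OperationRecord {
            timestamp,
            operation_name: name.to_string(),
            affected_columns: affected,
            expr,
            description,
            reversible,
        });
    }
}
```

Deriving the timestamp from the vector length keeps the counter gap-free without a separate field, at the cost of assuming records are never removed from the history.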
### 2. Query Operations
```rust
// Get all operations affecting a specific column
pub fn operations_on_column(&self, col: &str) -> Vec<&OperationRecord>
// Get complete history for a column
pub fn get_column_history(&self, col: &str) -> ColumnHistory
```
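The column query can be sketched as a filter over the history. This version assumes that `All` matches every column, `Specific` matches by name, and `Semantic` types are resolved to a concrete column name through a hypothetical `resolve` lookup (the source does not show how `TGASchema` does this). The `ColumnTypes` variants other than `Temperature` are also assumptions, and `OperationRecord` is trimmed to the fields the query needs.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ColumnTypes {
    Time,        // assumed variant
    Temperature, // appears in the guide
    Mass,        // assumed variant
}

#[derive(Clone, Debug)]
pub enum AffectedColumns {
    Specific(Vec<String>),
    All,
    Semantic(Vec<ColumnTypes>),
}

#[derive(Clone, Debug)]
pub struct OperationRecord {
    pub timestamp: usize,
    pub operation_name: String,
    pub affected_columns: AffectedColumns,
    pub reversible: bool,
}

/// Returns, in logging order, the records that touch `col`.
/// `resolve` is a hypothetical lookup from a semantic type to a column name.
pub fn operations_on_column<'a>(
    history: &'a [OperationRecord],
    col: &str,
    resolve: impl Fn(ColumnTypes) -> Option<String>,
) -> Vec<&'a OperationRecord> {
    history
        .iter()
        .filter(|rec| match &rec.affected_columns {
            AffectedColumns::All => true,
            AffectedColumns::Specific(names) => names.iter().any(|n| n == col),
            AffectedColumns::Semantic(types) => types
                .iter()
                .any(|t| resolve(*t).as_deref() == Some(col)),
        })
        .collect()
}
```

Because `All` always matches, a row-removing operation such as `trim_edges` shows up in the history of every column, including columns created after it ran.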
### 3. Display Methods
```rust
// Print all operations
pub fn print_operation_history(&self)
// Print operations for specific column
pub fn print_column_history(&self, col: &str)
```
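One way to back the display methods is a `Display` implementation per record. The sketch below follows the layout of the sample output in the Usage Patterns section; the four-space indentation of the second line, the `render_history` helper, and the trimmed-down types (no `expr` field, a single `ColumnTypes` variant) are assumptions for illustration.

```rust
use std::fmt;

#[derive(Clone, Copy, Debug)]
pub enum ColumnTypes {
    Temperature,
}

#[derive(Clone, Debug)]
pub enum AffectedColumns {
    Specific(Vec<String>),
    All,
    Semantic(Vec<ColumnTypes>),
}

impl fmt::Display for AffectedColumns {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            AffectedColumns::All => write!(f, "Columns: ALL"),
            AffectedColumns::Specific(names) => write!(f, "Columns: {:?}", names),
            AffectedColumns::Semantic(types) => write!(f, "Semantic: {:?}", types),
        }
    }
}

#[derive(Clone, Debug)]
pub struct OperationRecord {
    pub timestamp: usize,
    pub operation_name: String,
    pub affected_columns: AffectedColumns,
    pub description: String,
    pub reversible: bool,
}

impl fmt::Display for OperationRecord {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // First line: index, name, description, reversibility flag.
        writeln!(
            f,
            "[{}] {} - {} (reversible: {})",
            self.timestamp, self.operation_name, self.description, self.reversible
        )?;
        // Second line: affected columns (indentation is an assumption).
        write!(f, "    {}", self.affected_columns)
    }
}

/// Hypothetical helper: renders the whole history as one string.
pub fn render_history(history: &[OperationRecord]) -> String {
    let mut out = String::from("=== Operation History ===\n");
    for rec in history {
        out.push_str(&rec.to_string());
        out.push('\n');
    }
    out
}
```

Returning a `String` rather than printing directly keeps the formatting testable; `print_operation_history` could then be a thin `println!` wrapper around it.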
## Operation Classification
### Reversible Operations (reversible: true)
- **Column transformations**: scale, offset, unit conversions
- **Derived columns**: rates, dimensionless values
- **Algebraic operations**: exp, ln (with new column)
- **Renaming**: column name changes
**Characteristics:**
- Original data is either preserved (new column) or recoverable by a mathematical inverse (in-place transformation)
- No data loss
### Irreversible Operations (reversible: false)
- **Row removal**: trim_edges, filter_rows, cut_interval
- **Column deletion**: drop_column
- **Data aggregation**: operations that lose information
**Characteristics:**
- Data permanently removed
- Cannot be undone
- Information loss
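The distinction can be illustrated without Polars. In this sketch, plain `f64` slices stand in for dataset columns (an assumption made only to keep the example dependency-free), and `kelvin_to_celsius` is a hypothetical inverse that the guide does not list: the offset can be undone exactly up to floating-point rounding, while trimmed rows cannot be recovered from the result.

```rust
/// In-place offset: invertible by subtracting the same constant.
pub fn celsius_to_kelvin(values: &mut [f64]) {
    for v in values.iter_mut() {
        *v += 273.15;
    }
}

/// Hypothetical inverse of `celsius_to_kelvin`.
pub fn kelvin_to_celsius(values: &mut [f64]) {
    for v in values.iter_mut() {
        *v -= 273.15;
    }
}

/// Row removal: the dropped values are not present in the output,
/// so no function of the output alone can restore them.
pub fn trim_edges(values: &[f64], left: usize, right: usize) -> Vec<f64> {
    let end = values.len().saturating_sub(right);
    let start = left.min(end);
    values[start..end].to_vec()
}
```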
## Implementation Examples
### Example 1: Reversible Operation (Unit Conversion)
```rust
pub fn celsius_to_kelvin(mut self) -> Self {
    let col_name = self.schema.temperature.as_ref().unwrap().clone();
    let expr = (col(&col_name) + lit(273.15)).alias(&col_name);
    self.frame = self.frame.with_column(expr.clone());
    // Update metadata
    if let Some(meta) = self.schema.columns.get_mut(&col_name) {
        meta.unit = Unit::Kelvin;
        meta.origin = ColumnOrigin::PolarsDerived;
    }
    // Log operation
    self.log_operation(
        "celsius_to_kelvin",
        AffectedColumns::Semantic(vec![ColumnTypes::Temperature]),
        Some(expr),
        format!("Converted {} from Celsius to Kelvin", col_name),
        true, // reversible: can subtract 273.15
    );
    self
}
```
### Example 2: Irreversible Operation (Trim Edges)
```rust
pub fn trim_edges(mut self, left: usize, right: usize) -> Self {
    // `collect` consumes the frame, so collect a clone of the lazy plan;
    // moving `self.frame` out would make the `log_operation` call below fail to compile.
    let df = self.frame.clone().collect().unwrap();
    let total = df.height();
    let length = total.saturating_sub(left.saturating_add(right));
    let sliced_df = df.slice(left as i64, length);
    self.frame = sliced_df.lazy();
    self.oneframeplot = None; // plot state is reset when rows are removed
    self.log_operation(
        "trim_edges",
        AffectedColumns::All,
        None, // No Expr - physical row removal
        format!("Trimmed {} rows from left, {} from right", left, right),
        false, // irreversible: data is permanently removed
    );
    self
}
```
### Example 3: Derived Column (Rate Calculation)
```rust
pub fn derive_rate(mut self, source_col: &str, new_col: &str) -> Result<Self, TGADomainError> {
    // ... calculation logic (defines `time` and `out_unit`) ...
    // Central difference: (next - previous) for both value and time
    let dv = col(source_col).shift(lit(-1)) - col(source_col).shift(lit(1));
    let dt = col(&time).shift(lit(-1)) - col(&time).shift(lit(1));
    let rate_expr = (dv / dt).alias(new_col);
    self.frame = self.frame.with_column(rate_expr.clone());
    // ... metadata update ...
    self.log_operation(
        "derive_rate",
        AffectedColumns::Specific(vec![new_col.to_string()]),
        Some(rate_expr),
        format!("Computed rate {} from {} with unit {:?}", new_col, source_col, out_unit),
        true, // reversible: new column, original preserved
    );
    Ok(self)
}
```
## Usage Patterns
### Pattern 1: Track Processing Pipeline
```rust
let dataset = TGADataset::from_csv("data.csv", "time", "temp", "mass")?
    .trim_edges(5, 5)
    .celsius_to_kelvin()
    .derive_rate("mass", "dm_dt")?
    .dimensionless_mass(0.0, 10.0, "alpha")?;
// View complete history
dataset.print_operation_history();
```
**Output:**
```
=== Operation History ===
[0] trim_edges - Trimmed 5 rows from left, 5 from right (reversible: false)
Columns: ALL
[1] celsius_to_kelvin - Converted temp from Celsius to Kelvin (reversible: true)
Semantic: [Temperature]
[2] derive_rate - Computed rate dm_dt from mass with unit MilligramPerSecond (reversible: true)
Columns: ["dm_dt"]
[3] dimensionless_mass - Computed dimensionless mass alpha from mass (m0=10.5) (reversible: true)
Columns: ["alpha"]
```
### Pattern 2: Column-Specific History
```rust
// Get history for specific column
dataset.print_column_history("alpha");
```
**Output:**
```
=== History for column 'alpha' ===
Total operations: 2
Reversible: 1
Irreversible: 1
[0] trim_edges - Trimmed 5 rows from left, 5 from right (reversible: false)
[3] dimensionless_mass - Computed dimensionless mass alpha from mass (m0=10.5) (reversible: true)
```
### Pattern 3: Programmatic Query
```rust
let history = dataset.get_column_history("mass");
if history.has_irreversible() {
    println!("Warning: Column 'mass' has undergone irreversible operations!");
}
println!("Total transformations: {}", history.operations.len());
println!("Reversible: {}", history.reversible_count());
println!("Irreversible: {}", history.irreversible_count());
```
## Integration Checklist
### For Each Column-Modifying Function:
1. **Identify operation type**:
   - Does it remove rows, columns, or information? → `reversible: false`
   - Does it transform data invertibly or add a new column? → `reversible: true`
2. **Determine affected columns**:
   - Specific columns? → `AffectedColumns::Specific(vec![...])`
   - All columns? → `AffectedColumns::All`
   - Semantic type? → `AffectedColumns::Semantic(vec![...])`
3. **Capture expression** (if applicable):
   - Polars operation? → `Some(expr.clone())`
   - Physical operation? → `None`
4. **Add logging call**:
```rust
self.log_operation(
    "function_name",
    affected_columns,
    optional_expr,
    "Human-readable description".to_string(),
    is_reversible,
);
```
## Functions Updated with Logging
### In one_experiment_dataset.rs:
- ✅ `derive_rate0`
- ✅ `derive_rate`
- ✅ `add_numeric_column`
### In exp_kinetics_column_manipulation.rs (see exp_kinetics_column_manipulation_logging.rs):
- ✅ `with_column_expr`
- ✅ `filter_rows`
- ✅ `trim_edges`
- ✅ `trim_null_edges`
- ✅ `rename_column`
- ✅ `drop_column`
- ✅ `celsius_to_kelvin`
- ✅ `seconds_to_hours`
- ✅ `scale_columns`
- ✅ `scale_column`
- ✅ `offset_column`
- ✅ `dimensionless_mass`
- ✅ `conversion`
### Functions to Update (TODO):
- `cut_interval`, `cut_time_interval`, `cut_temperature_interval`, `cut_mass_interval`
- `trim_range`, `trim_range_inverse`
- `trim_null_edges_for_columns`
- `scale_column_in_range_by_reference`, `scale_column_in_its_range`
- `offset_column_in_range_by_reference`, `offset_column_in_its_range`
- `calibrate_mass_from_voltage`, `calibrate_mass`
- `unary_column_op`
- `exp_column`, `ln_column`
- `derive_dimensionless_mass`, `derive_conversion`
## Benefits
1. **Transparency**: Complete audit trail of all transformations
2. **Debugging**: Trace issues back to specific operations
3. **Reproducibility**: Understand exact processing pipeline
4. **Safety**: Flag columns that have already undergone irreversible operations before relying on them
5. **Documentation**: Self-documenting data processing workflow
## Future Enhancements
1. **Undo/Redo**: Implement operation reversal for reversible operations
2. **Export**: Save operation history to JSON/YAML
3. **Replay**: Reconstruct processing pipeline from history
4. **Validation**: Check operation compatibility before execution
5. **Visualization**: Generate flowchart of processing pipeline
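
As a starting point for the Undo/Redo item, the sketch below computes which operations would even be candidates for undo: the trailing run of reversible records after the last irreversible one. This is one possible policy, not something the current implementation provides; `undoable_tail` is a hypothetical name, and `OperationRecord` is reduced to the fields the check needs.

```rust
#[derive(Clone, Debug)]
pub struct OperationRecord {
    pub timestamp: usize,
    pub operation_name: String,
    pub reversible: bool,
}

/// Returns the records after the most recent irreversible operation.
/// Everything in this tail is flagged reversible, so undoing from the end
/// never has to cross an operation that lost data.
pub fn undoable_tail(history: &[OperationRecord]) -> &[OperationRecord] {
    let start = history
        .iter()
        .rposition(|rec| !rec.reversible) // index of the last irreversible record
        .map_or(0, |i| i + 1);            // none found: the whole history qualifies
    &history[start..]
}
```

Actually reverting each record would additionally require storing an inverse (for example an inverse `Expr`) alongside it, which the current `OperationRecord` does not do.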