# Pipelines
Pipelines transform raw data (DataFrames, arrays) into Sample objects that models can consume. This guide covers all available pipelines and how to use them.
## Overview
Pipelines follow a fit/transform pattern similar to scikit-learn:
```python
pipeline = SomePipeline(param1=value1)
train_samples = pipeline.fit_transform({'df': train_df})
test_samples = pipeline.transform({'df': test_df})  # Reuses the scalers fitted on train_df
```
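The reason `transform` must not refit can be seen in a minimal, library-free sketch of a standardizing step. `ToyStandardizer` is a hypothetical stand-in for the scalers a pipeline holds internally; it is not part of `src.pipelines`.

```python
from statistics import fmean, pstdev
from typing import List


class ToyStandardizer:
    """Hypothetical per-column standardizer illustrating the fit/transform split."""

    def fit(self, rows: List[List[float]]) -> "ToyStandardizer":
        columns = list(zip(*rows))
        # Statistics are learned from the training rows only.
        self.means = [fmean(col) for col in columns]
        # Fall back to 1.0 for constant columns so transform never divides by zero.
        self.stds = [pstdev(col) or 1.0 for col in columns]
        return self

    def transform(self, rows: List[List[float]]) -> List[List[float]]:
        # Reuse the fitted statistics; nothing is recomputed from `rows`.
        return [
            [(value - m) / s for value, m, s in zip(row, self.means, self.stds)]
            for row in rows
        ]

    def fit_transform(self, rows: List[List[float]]) -> List[List[float]]:
        return self.fit(rows).transform(rows)


train = [[1.0, 10.0], [3.0, 30.0]]
test = [[5.0, 50.0]]

scaler = ToyStandardizer()
train_scaled = scaler.fit_transform(train)
test_scaled = scaler.transform(test)  # scaled with the training mean and std
```

Refitting on `test` instead would center it on its own mean and hide how far it sits from the training data.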
## SummarySetPipeline
Extracts features from Performance Summary CSV files.
### Features
| Feature | Description |
|---|---|
| Cumulative Charge Throughput | Total charge passed during charging (Ah) |
| Cumulative Discharge Throughput | Total charge passed during discharging (Ah) |
| 0.1s Resistance | Short-timescale resistance measured at 0.1 s (Ohms) |
| 10s Resistance | Long-timescale resistance measured at 10 s (Ohms) |
| Temperature (K) | Temperature in Kelvin |
| Arrhenius Factor | exp(-Ea/(RT)), capturing temperature effects on reaction rates |
| Inverse Temperature | 1000/T, which linearizes Arrhenius behavior |
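The two temperature-derived columns can be computed as sketched below. This assumes the raw temperature is recorded in degrees Celsius and uses the gas constant R = 8.314 J/(mol·K); the helper name `temperature_features` is hypothetical, and its `ea` default simply mirrors the `arrhenius_Ea` default of 50000.0 J/mol.

```python
import math
from typing import Dict

R = 8.314  # universal gas constant, J/(mol*K)


def temperature_features(temp_c: float, ea: float = 50000.0) -> Dict[str, float]:
    """Hypothetical helper deriving the table's temperature features from a Celsius reading."""
    temp_k = temp_c + 273.15  # Celsius -> Kelvin (assumed input unit)
    return {
        "Temperature (K)": temp_k,
        # exp(-Ea / (R*T)): increases monotonically with temperature.
        "Arrhenius Factor": math.exp(-ea / (R * temp_k)),
        # Under Arrhenius kinetics, ln(rate) is linear in 1/T; 1000/T just rescales it.
        "Inverse Temperature": 1000.0 / temp_k,
    }


cold = temperature_features(25.0)
hot = temperature_features(45.0)
```

With Ea = 50 kJ/mol, going from 25 °C to 45 °C raises the Arrhenius factor by roughly 3.5 times.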
### Usage
```python
from src.pipelines.summary_set import SummarySetPipeline

pipeline = SummarySetPipeline(
    include_arrhenius=True,
    arrhenius_Ea=50000.0,  # activation energy, J/mol
    normalize=True,
)
samples = pipeline.fit_transform({'df': df})
```
### Parameters
- `include_arrhenius` (bool): Include Arrhenius temperature features
- `arrhenius_Ea` (float): Activation energy in J/mol (default: 50000.0)
- `normalize` (bool): Apply StandardScaler normalization
### When to Use
- Fast baseline experiments
- When summary statistics are sufficient
- For initial model development
## ICAPeaksPipeline
Extracts dQ/dV (Incremental Capacity Analysis) peak features from voltage curves.
### Features
For each detected peak:
- Peak Voltage: Position of the peak (V)
- Peak Height: Magnitude of dQ/dV at the peak
- Peak Width: Full width at half maximum (FWHM)
- Peak Area: Integrated area under the peak

Additional features:
- Total Area: Integrated area under the full dQ/dV curve
- Number of Peaks: Count of detected peaks
- Voltage at Max dQ/dV: Voltage at which dQ/dV reaches its maximum
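The features listed above can be sketched in plain Python for a dQ/dV curve already sampled on an increasing voltage grid. Several choices here are assumptions not stated in this guide: peaks are simple local maxima (positive-valued) ranked by height, FWHM uses linear interpolation at the half-maximum crossings, and "peak area" is integrated between the samples bracketing those crossings. `peak_features` and its output keys are hypothetical names.

```python
from typing import Dict, List, Sequence


def _trapz(x: Sequence[float], y: Sequence[float]) -> float:
    # Trapezoidal rule over consecutive (x, y) samples.
    return sum((x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2 for i in range(len(x) - 1))


def _cross(v: Sequence[float], y: Sequence[float], i: int, j: int, level: float) -> float:
    # Voltage where the segment between samples i and j crosses `level` (linear interpolation).
    return v[i] + (level - y[i]) * (v[j] - v[i]) / (y[j] - y[i])


def peak_features(v: List[float], y: List[float], num_peaks: int = 3) -> Dict[str, float]:
    """Hypothetical helper: per-peak and whole-curve features from a dQ/dV curve y(v)."""
    maxima = [i for i in range(1, len(y) - 1) if y[i - 1] < y[i] >= y[i + 1]]
    # Keep the `num_peaks` tallest maxima, then report them in voltage (index) order.
    peaks = sorted(sorted(maxima, key=lambda i: y[i], reverse=True)[:num_peaks])
    feats: Dict[str, float] = {
        "total_area": _trapz(v, y),
        "num_peaks": float(len(maxima)),
        "voltage_at_max": v[max(range(len(y)), key=lambda i: y[i])],
    }
    for k, p in enumerate(peaks, start=1):
        half = y[p] / 2
        lo, hi = p, p
        while lo > 0 and y[lo] > half:  # walk left until at or below half maximum
            lo -= 1
        while hi < len(y) - 1 and y[hi] > half:  # walk right likewise
            hi += 1
        left = _cross(v, y, lo, lo + 1, half) if y[lo] <= half else v[lo]
        right = _cross(v, y, hi - 1, hi, half) if y[hi] <= half else v[hi]
        feats[f"peak{k}_voltage"] = v[p]
        feats[f"peak{k}_height"] = y[p]
        feats[f"peak{k}_width"] = right - left  # FWHM
        feats[f"peak{k}_area"] = _trapz(v[lo:hi + 1], y[lo:hi + 1])
    return feats


# Synthetic two-peak curve on a 0.1 V grid (illustrative values only).
v = [3.0 + 0.1 * i for i in range(9)]
y = [0.0, 1.0, 2.0, 1.0, 0.0, 1.0, 3.0, 1.0, 0.0]
f = peak_features(v, y)
```

For production use, `scipy.signal.find_peaks` and `scipy.signal.peak_widths` offer more robust peak detection and width estimation than this hand-rolled version.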
### Usage
```python
from src.pipelines.ica_peaks import ICAPeaksPipeline

pipeline = ICAPeaksPipeline(
    sg_window=51,             # Savitzky-Golay window length (must be odd)
    sg_order=3,               # Savitzky-Golay polynomial order
    num_peaks=3,              # number of peaks to extract
    voltage_range=(3.0, 4.2),
    normalize=True,
    use_cache=True,           # cache expensive computations
)
samples = pipeline.fit_transform({
    'curves': voltage_curves,
    'targets': capacity_targets,
})
```
### Parameters
- `sg_window` (int): Savitzky-Golay smoothing window; must be odd (default: 51)
- `sg_order` (int): Polynomial order for smoothing (default: 3)
- `num_peaks` (int): Number of peaks to extract features for (default: 3)
- `voltage_range` (tuple): Voltage range for analysis (default: (3.0, 4.2))
- `resample_points` (int): Number of points each curve is resampled to (default: 500)
- `normalize` (bool): Apply StandardScaler (default: True)
- `use_cache` (bool): Cache computed features (default: True)
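The parameters above suggest the shape of the ICA computation: restrict Q(V) to `voltage_range`, resample it onto `resample_points` evenly spaced voltages, smooth, and differentiate. The sketch below follows that outline under stated assumptions: it swaps in a plain centered moving average for Savitzky-Golay smoothing (with SciPy available, `scipy.signal.savgol_filter(y, window_length, polyorder)` is the Savitzky-Golay counterpart), it requires voltage to be strictly increasing (reverse a discharge curve first), and it assumes the curve spans `voltage_range`. `dq_dv`, `resample` and `moving_average` are hypothetical names, not the pipeline's internals.

```python
from typing import List, Sequence, Tuple


def resample(x: Sequence[float], y: Sequence[float], grid: Sequence[float]) -> List[float]:
    """Linearly interpolate y(x) onto an ascending grid; x must be strictly increasing."""
    out: List[float] = []
    j = 0
    for g in grid:
        # Advance to the segment [x[j], x[j + 1]] containing g (clamped to the last one).
        while j < len(x) - 2 and x[j + 1] < g:
            j += 1
        t = (g - x[j]) / (x[j + 1] - x[j])
        out.append(y[j] + t * (y[j + 1] - y[j]))
    return out


def moving_average(y: Sequence[float], window: int) -> List[float]:
    """Centered moving average, shrinking at the edges (stand-in for Savitzky-Golay)."""
    half = window // 2
    out: List[float] = []
    for i in range(len(y)):
        lo, hi = max(0, i - half), min(len(y), i + half + 1)
        out.append(sum(y[lo:hi]) / (hi - lo))
    return out


def dq_dv(
    voltage: Sequence[float],
    capacity: Sequence[float],
    voltage_range: Tuple[float, float] = (3.0, 4.2),
    resample_points: int = 500,
    window: int = 51,
) -> Tuple[List[float], List[float]]:
    """Hypothetical ICA step: resample Q(V) on a uniform grid, smooth it, differentiate."""
    v_min, v_max = voltage_range
    step = (v_max - v_min) / (resample_points - 1)
    grid = [v_min + k * step for k in range(resample_points)]
    q = moving_average(resample(voltage, capacity, grid), window)
    # Central differences in the interior, one-sided differences at both ends.
    dq = [(q[1] - q[0]) / step]
    dq += [(q[i + 1] - q[i - 1]) / (2 * step) for i in range(1, len(q) - 1)]
    dq.append((q[-1] - q[-2]) / step)
    return grid, dq


# Synthetic check: Q grows linearly with V, so dQ/dV should be 2 away from the edges.
voltage = [3.0 + 0.012 * k for k in range(101)]
capacity = [2.0 * (vv - 3.0) for vv in voltage]
grid, dq = dq_dv(voltage, capacity, resample_points=101, window=5)
```

Here `window` counts resampled grid points; whether the pipeline's `sg_window` is measured the same way is an assumption.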
### ICA Theory
ICA features are highly diagnostic for degradation mechanisms:
- Peak Shifts: Indicate Loss of Lithium Inventory (LLI)
- Peak Height Changes: Indicate Loss of Active Material (LAM)
- Peak Width Changes: Indicate kinetic degradation / impedance rise
See ICA Analysis Theory for more details.
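The mapping above can be expressed as a comparison of one peak between a reference cycle and a later cycle. This is a deliberately crude, hypothetical heuristic (`flag_mechanisms`), not a validated diagnostic: the thresholds `dv_tol` (in V) and `rel_tol` (relative change) are arbitrary assumptions, the input numbers are made up for illustration, and in practice the mechanisms overlap and need more careful separation.

```python
from typing import Dict, List


def flag_mechanisms(
    ref: Dict[str, float],
    cur: Dict[str, float],
    dv_tol: float = 0.01,
    rel_tol: float = 0.05,
) -> List[str]:
    """Hypothetical heuristic: map one peak's changes onto the mechanisms listed above."""
    flags: List[str] = []
    # Peak shift along the voltage axis (absolute, in V).
    if abs(cur["voltage"] - ref["voltage"]) > dv_tol:
        flags.append("LLI (peak shift)")
    # Relative change in peak height.
    if abs(cur["height"] - ref["height"]) / ref["height"] > rel_tol:
        flags.append("LAM (height change)")
    # Relative change in peak width (FWHM).
    if abs(cur["width"] - ref["width"]) / ref["width"] > rel_tol:
        flags.append("kinetics/impedance (width change)")
    return flags


# Illustrative values only, not measurements.
fresh = {"voltage": 3.65, "height": 12.0, "width": 0.08}
aged = {"voltage": 3.68, "height": 10.5, "width": 0.082}
flags = flag_mechanisms(fresh, aged)
```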
### Caching
ICA computation is expensive. With `use_cache=True` (the default), the pipeline caches results automatically:
```python
# First run: computes features and caches them
samples1 = pipeline.fit_transform({'curves': curves, 'targets': targets})
# Second run: loads from the cache (much faster)
samples2 = pipeline.fit_transform({'curves': curves, 'targets': targets})
```
Cache is invalidated if pipeline parameters change.
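This guide does not say how cache entries are keyed. One common approach, sketched here purely as an assumption, hashes the pipeline parameters together with the input data so that changing either yields a new key. `cache_key` is a hypothetical helper, curves are simplified to lists of floats, and a real implementation would also have to decide where entries are stored.

```python
import hashlib
import json
from typing import Any, Dict, Sequence


def cache_key(params: Dict[str, Any], curves: Sequence[Sequence[float]]) -> str:
    """Hypothetical cache key covering both the pipeline parameters and the input curves."""
    digest = hashlib.sha256()
    # sort_keys=True makes the key independent of parameter order;
    # tuples such as voltage_range serialize as JSON lists.
    digest.update(json.dumps(params, sort_keys=True).encode("utf-8"))
    digest.update(json.dumps([list(c) for c in curves]).encode("utf-8"))
    return digest.hexdigest()


params = {"sg_window": 51, "sg_order": 3, "num_peaks": 3, "voltage_range": (3.0, 4.2)}
curves = [[3.0, 3.5, 4.0], [3.1, 3.6, 4.1]]
key = cache_key(params, curves)
```

Changing `sg_window` (or any other parameter) changes the key, which is one way to obtain the invalidation behavior described above.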
### When to Use
- Degradation mechanism analysis
- When voltage curve data is available
- For interpretable features (SHAP analysis)
## LatentODESequencePipeline
Creates time-series sequences with explicit time vectors for Neural ODE models.
### Features
- Sequential Features: Time-series of summary statistics
- Time Vector: Explicit time values for ODE integration
- Variable Length: Supports variable-length sequences with masking
### Usage
```python
from src.pipelines.latent_ode_seq import LatentODESequencePipeline

pipeline = LatentODESequencePipeline(
    time_unit="days",  # or "throughput_Ah"
    max_seq_len=50,    # maximum sequence length
    normalize=True,
)
# One sample per cell (the entire degradation trajectory)
samples = pipeline.fit_transform({'df': df})
```
### Parameters
- `time_unit` (str): Time unit, either "days" or "throughput_Ah" (default: "days")
- `max_seq_len` (int): Maximum sequence length (default: 50)
- `normalize` (bool): Apply StandardScaler (default: True)
### Output Format
Each sample represents one cell's degradation trajectory:
```python
sample.x.shape     # (seq_len, feature_dim)
sample.t.shape     # (seq_len,) - time vector
sample.mask.shape  # (seq_len,) - boolean mask for valid steps
```
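A variable-length trajectory can be turned into fixed-size `x`, `t` and `mask` arrays as sketched below. The details are assumptions rather than documented behavior: every trajectory is truncated or padded to `max_seq_len`, padded feature rows are zeros, times are measured from the first observation (in whatever unit `time_unit` selects), and padded steps repeat the last valid time so `t` stays non-decreasing. `SeqSample` and `to_padded_sample` are hypothetical stand-ins, not the project's `Sample` class.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class SeqSample:
    """Hypothetical stand-in for Sample, keeping only x, t and mask."""
    x: List[List[float]]
    t: List[float]
    mask: List[bool]


def to_padded_sample(
    features: Sequence[Sequence[float]], times: Sequence[float], max_seq_len: int = 50
) -> SeqSample:
    """Truncate or pad one cell's trajectory to max_seq_len and build its mask."""
    n = min(len(features), max_seq_len)  # keep at most max_seq_len steps
    pad = max_seq_len - n
    dim = len(features[0])
    t0 = times[0]
    x = [list(row) for row in features[:n]] + [[0.0] * dim for _ in range(pad)]
    # Times relative to the first observation; padded steps repeat the last valid time.
    t = [ti - t0 for ti in times[:n]] + [times[n - 1] - t0] * pad
    mask = [True] * n + [False] * pad
    return SeqSample(x=x, t=t, mask=mask)


# Three observations of a 2-feature trajectory, times in days (illustrative values).
s = to_padded_sample([[1.0, 0.5], [0.9, 0.6], [0.8, 0.7]], [10.0, 12.0, 15.0], max_seq_len=5)
```

A model consuming such samples would use `mask` to ignore the padded steps when computing losses.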
### When to Use
- Neural ODE models
- Continuous-time degradation modeling
- When temporal dynamics are important
## Creating Custom Pipelines
See Custom Pipeline Guide for step-by-step instructions.
### Pipeline Interface
All pipelines must inherit from `BasePipeline`:
```python
from typing import Any, Dict, List

from src.pipelines.base import BasePipeline
from src.pipelines.sample import Sample
from src.pipelines.registry import PipelineRegistry


@PipelineRegistry.register("my_pipeline")
class MyPipeline(BasePipeline):
    def fit(self, data: Dict[str, Any]) -> 'BasePipeline':
        # Fit scalers, compute statistics, etc. (training data only)
        return self

    def transform(self, data: Dict[str, Any]) -> List[Sample]:
        # Convert raw data into Sample objects using the fitted state
        samples: List[Sample] = []
        # ... create samples ...
        return samples

    def get_feature_names(self) -> List[str]:
        # Return feature names for interpretability
        return ['feature1', 'feature2']
```
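The `@PipelineRegistry.register("my_pipeline")` decorator maps a string name to a pipeline class. A minimal sketch of that pattern follows, assuming a class-level dictionary; `ToyRegistry` is hypothetical, and the project's actual `PipelineRegistry` may be implemented differently.

```python
from typing import Callable, Dict, List


class ToyRegistry:
    """Hypothetical name -> class registry mirroring the @register("name") pattern."""

    _pipelines: Dict[str, type] = {}

    @classmethod
    def register(cls, name: str) -> Callable[[type], type]:
        def decorator(pipeline_cls: type) -> type:
            if name in cls._pipelines:
                raise ValueError(f"pipeline {name!r} is already registered")
            cls._pipelines[name] = pipeline_cls
            return pipeline_cls  # return the class unchanged so it remains usable
        return decorator

    @classmethod
    def list_available(cls) -> List[str]:
        # Dicts preserve insertion order, so names come back in registration order.
        return list(cls._pipelines)


@ToyRegistry.register("toy_pipeline")
class ToyPipeline:
    """Placeholder class; a real pipeline would subclass BasePipeline."""
```

Returning the class from the decorator is what lets `MyPipeline` still be instantiated directly after registration.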
### Pipeline Registry
List available pipelines:
```python
from src.pipelines.registry import PipelineRegistry

available = PipelineRegistry.list_available()
print(available)  # ['summary_set', 'ica_peaks', 'latent_ode_seq']
```
Get pipeline by name:
## Best Practices
- Always fit on training data first: use `fit_transform` on training data and `transform` on test data
- Use caching for expensive pipelines: enable `use_cache=True` for ICA pipelines
- Normalize features: most models benefit from normalized features
- Check feature names: use `get_feature_names()` for interpretability
- Validate samples: check that `sample.feature_dim` and `sample.seq_len` match expectations
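The last check can be automated with a small validation helper. `FakeSample` and `check_samples` are hypothetical: the stand-in exposes only the two attributes named above, and the accepted `seq_len` range of 1 to `max_seq_len` is an assumption about what "expected" means for a given pipeline.

```python
from dataclasses import dataclass
from typing import Any, Iterable


@dataclass
class FakeSample:
    """Hypothetical stand-in exposing only the two attributes checked below."""
    feature_dim: int
    seq_len: int


def check_samples(samples: Iterable[Any], expected_dim: int, max_seq_len: int) -> int:
    """Raise ValueError on the first sample with an unexpected shape; return how many passed."""
    count = 0
    for i, s in enumerate(samples):
        if s.feature_dim != expected_dim:
            raise ValueError(f"sample {i}: feature_dim={s.feature_dim}, expected {expected_dim}")
        if not 1 <= s.seq_len <= max_seq_len:
            raise ValueError(f"sample {i}: seq_len={s.seq_len} outside [1, {max_seq_len}]")
        count += 1
    return count


n_ok = check_samples([FakeSample(7, 1), FakeSample(7, 1)], expected_dim=7, max_seq_len=1)
```

With a real pipeline, `expected_dim` would plausibly be `len(pipeline.get_feature_names())`; that correspondence is an assumption, not something this guide states.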
## Next Steps
- Models - Using models with pipeline outputs
- Training - Training workflows
- Custom Pipeline - Creating custom pipelines
- API Reference - Complete API documentation