Pipeline System
The pipeline system transforms raw data into Sample objects that models can consume.
Pipeline Architecture
graph TB
RawData[Raw Data] --> BasePipeline[BasePipeline]
BasePipeline -->|fit| FitState[Fitted State]
BasePipeline -->|transform| Sample[Sample Objects]
FitState -->|used by transform| BasePipeline
Registry[PipelineRegistry] -->|manages| BasePipeline
Cache[PipelineCache] -->|caches| BasePipeline
Pipeline Interface
All pipelines inherit from BasePipeline:
classDiagram
class BasePipeline {
+fit(data) BasePipeline
+transform(data) List[Sample]
+fit_transform(data) List[Sample]
+get_feature_names() List[str]
+get_params() dict
}
class SummarySetPipeline {
+include_arrhenius: bool
+arrhenius_Ea: float
+normalize: bool
}
class ICAPeaksPipeline {
+sg_window: int
+num_peaks: int
+use_cache: bool
}
class LatentODESequencePipeline {
+time_unit: str
+max_seq_len: int
}
BasePipeline <|-- SummarySetPipeline
BasePipeline <|-- ICAPeaksPipeline
BasePipeline <|-- LatentODESequencePipeline
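The interface in the class diagram could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `Sample` dataclass, the list-of-dicts input type, the `get_params` behaviour, and the toy `MeanCenterPipeline` subclass are all assumptions made for the example.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Sample:
    """Hypothetical stand-in for the project's Sample type."""
    features: List[float]
    meta: Dict[str, Any] = field(default_factory=dict)


class BasePipeline(ABC):
    """Sketch of the interface shown in the class diagram."""

    @abstractmethod
    def fit(self, data: List[Dict[str, float]]) -> "BasePipeline":
        """Learn any state (for example scaler statistics) and return self."""

    @abstractmethod
    def transform(self, data: List[Dict[str, float]]) -> List[Sample]:
        """Convert raw records into Sample objects using the fitted state."""

    def fit_transform(self, data: List[Dict[str, float]]) -> List[Sample]:
        # Fit on the data, then transform the same data.
        return self.fit(data).transform(data)

    @abstractmethod
    def get_feature_names(self) -> List[str]:
        ...

    def get_params(self) -> Dict[str, Any]:
        # Assumption: public instance attributes are the pipeline parameters.
        return {k: v for k, v in vars(self).items() if not k.startswith("_")}


class MeanCenterPipeline(BasePipeline):
    """Toy subclass for illustration only; not part of the documented system."""

    def __init__(self, columns: List[str]):
        self.columns = list(columns)
        self._means: Dict[str, float] = {}

    def fit(self, data):
        for c in self.columns:
            self._means[c] = sum(r[c] for r in data) / len(data)
        return self

    def transform(self, data):
        return [Sample([r[c] - self._means[c] for c in self.columns]) for r in data]

    def get_feature_names(self):
        return list(self.columns)
```

Returning `self` from `fit` is what lets `fit_transform` chain the two calls, which matches the `fit(data) BasePipeline` signature in the diagram.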
Pipeline Lifecycle
sequenceDiagram
participant User
participant Pipeline
participant Scaler
participant Cache
participant Sample
User->>Pipeline: fit_transform(train_data)
Pipeline->>Scaler: fit(train_data)
Scaler-->>Pipeline: Fitted scaler
Pipeline->>Cache: Check cache
Cache-->>Pipeline: Cache miss
Pipeline->>Sample: Create samples
Sample-->>Pipeline: Sample objects
Pipeline->>Cache: Save to cache
Pipeline-->>User: Samples
User->>Pipeline: transform(test_data)
Pipeline->>Scaler: transform(test_data)
Scaler-->>Pipeline: Scaled data
Pipeline->>Sample: Create samples
Sample-->>Pipeline: Sample objects
Pipeline-->>User: Samples
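The key point of the lifecycle above is that the scaler is fitted once on the training data and then reused unchanged for test data. A minimal sketch of that behaviour, assuming a simple z-score scaler (`StandardScaler1D` is a hypothetical stand-in, not the scaler the pipelines actually use):

```python
from statistics import mean, pstdev
from typing import List


class StandardScaler1D:
    """Minimal z-score scaler; hypothetical stand-in for the pipeline's scaler."""

    def fit(self, values: List[float]) -> "StandardScaler1D":
        self.mean_ = mean(values)
        self.std_ = pstdev(values) or 1.0  # fall back to 1.0 for constant input
        return self

    def transform(self, values: List[float]) -> List[float]:
        return [(v - self.mean_) / self.std_ for v in values]


train = [1.0, 2.0, 3.0]
test = [4.0]

# fit_transform(train_data): statistics come from the training data only.
scaler = StandardScaler1D().fit(train)
train_scaled = scaler.transform(train)

# transform(test_data): reuses the fitted statistics, no refitting.
test_scaled = scaler.transform(test)
```

Fitting on the test data as well would leak test statistics into the features, which is why `transform` in the diagram never calls the scaler's `fit`.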
Sample Creation
flowchart TD
Row[DataFrame Row] --> Extract[Extract Features]
Extract --> Features[Feature Array]
Features --> Normalize{Normalize?}
Normalize -->|Yes| Scaler[Apply Scaler]
Normalize -->|No| Sample
Scaler --> Sample[Create Sample]
Sample --> Meta[Add Metadata]
Meta --> Return[Return Sample]
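The flow above might look like this for a single row. This is a sketch under assumptions: a plain dict stands in for a DataFrame row, the `Sample` dataclass is hypothetical, and treating every non-feature column as metadata is a choice made for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Mapping, Optional, Sequence


@dataclass
class Sample:
    """Hypothetical stand-in for the project's Sample type."""
    features: List[float]
    meta: Dict[str, Any] = field(default_factory=dict)


def make_sample(
    row: Mapping[str, Any],
    feature_cols: Sequence[str],
    means: Optional[Sequence[float]] = None,
    stds: Optional[Sequence[float]] = None,
    normalize: bool = True,
) -> Sample:
    # Extract features from the row.
    features = [float(row[c]) for c in feature_cols]

    # Optionally apply the already-fitted scaler statistics.
    if normalize:
        if means is None or stds is None:
            raise ValueError("normalize=True requires fitted means and stds")
        features = [(x - m) / s for x, m, s in zip(features, means, stds)]

    # Create the sample and attach the remaining columns as metadata.
    meta = {k: v for k, v in row.items() if k not in feature_cols}
    return Sample(features=features, meta=meta)
```

For example, `make_sample({"cell_id": "c1", "r": 4.0}, ["r"], means=[2.0], stds=[2.0])` yields features `[1.0]` with `cell_id` kept as metadata.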
Caching Strategy
flowchart TD
Request[Pipeline Request] --> Params[Get Parameters]
Params --> Hash[Hash Parameters]
Hash --> Key[Cache Key]
Key --> Check{File Exists?}
Check -->|Yes| Load[Load Pickle]
Check -->|No| Compute[Compute]
Load --> Validate{Valid?}
Validate -->|Yes| Return[Return]
Validate -->|No| Compute
Compute --> Save[Save Pickle]
Save --> Return
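A rough sketch of this caching flow using only the standard library. The function names, the SHA-256 key, and the validity check (stored parameters must equal the requested ones) are assumptions for illustration; the actual `PipelineCache` may differ.

```python
import hashlib
import json
import pickle
from pathlib import Path
from typing import Any, Callable, Dict


def cache_key(params: Dict[str, Any]) -> str:
    # Hash a canonical JSON form so key order does not change the key.
    payload = json.dumps(params, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


def load_or_compute(
    cache_dir: Path, params: Dict[str, Any], compute: Callable[[], Any]
) -> Any:
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{cache_key(params)}.pkl"

    if path.exists():
        try:
            with path.open("rb") as fh:
                entry = pickle.load(fh)
            # Assumed validity check: the entry was written for these params.
            if isinstance(entry, dict) and entry.get("params") == params:
                return entry["value"]
        except Exception:
            # Corrupt or unreadable cache file: fall through and recompute.
            pass

    value = compute()
    with path.open("wb") as fh:
        pickle.dump({"params": params, "value": value}, fh)
    return value
```

Because `pickle.load` can execute arbitrary code, a cache directory like this should only ever contain files the pipeline itself wrote.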
Feature Extraction Flow
SummarySetPipeline
flowchart TD
DataFrame[DataFrame] --> Iterate[Iterate Rows]
Iterate --> Extract[Extract Features]
Extract --> Throughput[Cumulative Throughput]
Extract --> Resistance[Resistance Values]
Extract --> Temperature[Temperature]
Throughput --> Combine[Combine Features]
Resistance --> Combine
Temperature --> Combine
Combine --> Arrhenius{Include Arrhenius?}
Arrhenius -->|Yes| ComputeArrhenius[Compute Arrhenius Factor]
Arrhenius -->|No| Normalize
ComputeArrhenius --> Normalize[Normalize Features]
Normalize --> Sample[Create Sample]
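The source does not state how the Arrhenius factor is defined, so the following is only one plausible reading: an acceleration factor relative to a reference temperature, with the activation energy `arrhenius_Ea` in electronvolts. The column names (`throughput`, `resistance`, `temperature_c`), the 25 °C reference, and the default of 0.5 eV are all hypothetical. Normalization is left out of this sketch.

```python
import math
from typing import List, Mapping

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_factor(
    temp_celsius: float, ea_ev: float, ref_celsius: float = 25.0
) -> float:
    """exp(Ea/k_B * (1/T_ref - 1/T)), temperatures converted to kelvin."""
    t = temp_celsius + 273.15
    t_ref = ref_celsius + 273.15
    return math.exp(ea_ev / BOLTZMANN_EV_PER_K * (1.0 / t_ref - 1.0 / t))


def summary_features(
    row: Mapping[str, float],
    include_arrhenius: bool = True,
    arrhenius_Ea: float = 0.5,  # assumed unit: eV
) -> List[float]:
    # Combine the three feature groups from the flowchart.
    features = [row["throughput"], row["resistance"], row["temperature_c"]]
    if include_arrhenius:
        features.append(arrhenius_factor(row["temperature_c"], arrhenius_Ea))
    return features
```

With this definition the factor is 1.0 at the reference temperature and grows above 1.0 as the temperature rises.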
ICAPeaksPipeline
flowchart TD
Curves[Voltage Curves] --> Cache{Check Cache}
Cache -->|Hit| Load[Load Cached]
Cache -->|Miss| ComputeICA[Compute ICA]
ComputeICA --> Smooth[Savitzky-Golay Smoothing]
Smooth --> Peaks[Find Peaks]
Peaks --> Extract[Extract Peak Features]
Extract --> Save[Save to Cache]
Load --> Features[Feature Vector]
Save --> Features
Features --> Normalize[Normalize]
Normalize --> Sample[Create Sample]
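The compute branch above (ICA, smoothing, peak finding, feature extraction) can be sketched in plain Python. Several simplifications are assumed here: incremental capacity is taken as dQ/dV by finite differences, the Savitzky-Golay filter is fixed to the standard 5-point quadratic kernel rather than a configurable `sg_window`, peaks are simple local maxima, and the feature vector is zero-padded (position, height) pairs. A real implementation would more likely rely on a numerical library.

```python
from typing import List, Sequence, Tuple

# Standard 5-point quadratic Savitzky-Golay smoothing coefficients (sum = 35).
SG5 = (-3.0, 12.0, 17.0, 12.0, -3.0)


def incremental_capacity(voltage: Sequence[float], capacity: Sequence[float]) -> List[float]:
    """dQ/dV by central differences (one-sided at the ends).

    Assumes voltage is strictly monotonic, so no difference is zero.
    """
    n = len(voltage)
    dqdv = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        dqdv.append((capacity[hi] - capacity[lo]) / (voltage[hi] - voltage[lo]))
    return dqdv


def savgol5(y: Sequence[float]) -> List[float]:
    """Smooth interior points with the 5-point kernel; edges are left as-is."""
    out = list(y)
    for i in range(2, len(y) - 2):
        out[i] = sum(c * y[i + k - 2] for k, c in enumerate(SG5)) / 35.0
    return out


def top_peaks(x: Sequence[float], y: Sequence[float], num_peaks: int) -> List[Tuple[float, float]]:
    """Return up to num_peaks local maxima as (position, height), tallest first."""
    peaks = [(y[i], x[i]) for i in range(1, len(y) - 1) if y[i - 1] < y[i] >= y[i + 1]]
    peaks.sort(reverse=True)
    return [(pos, height) for height, pos in peaks[:num_peaks]]


def peak_features(voltage: Sequence[float], capacity: Sequence[float], num_peaks: int) -> List[float]:
    smoothed = savgol5(incremental_capacity(voltage, capacity))
    peaks = top_peaks(voltage, smoothed, num_peaks)
    flat = [value for peak in peaks for value in peak]
    # Pad so the feature vector always has 2 * num_peaks entries.
    return flat + [0.0] * (2 * num_peaks - len(flat))
```

Padding to a fixed length keeps the feature vector the same size for every curve, even when fewer than `num_peaks` peaks are found.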
Pipeline Registration
sequenceDiagram
participant Developer
participant PipelineClass
participant Registry
participant User
Developer->>PipelineClass: @PipelineRegistry.register("name")
PipelineClass->>Registry: Register class
Registry->>Registry: Store in _pipelines dict
User->>Registry: get("name", **kwargs)
Registry->>PipelineClass: Instantiate
PipelineClass-->>User: Pipeline instance
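The registration sequence maps naturally onto a class decorator. The sketch below follows the names in the diagram (`PipelineRegistry.register`, `get`, `_pipelines`), but the bodies are assumptions, including the choice to reject duplicate names and the `DemoPipeline` class used to exercise it.

```python
from typing import Any, Callable, Dict


class PipelineRegistry:
    """Sketch of a name-to-class registry; behaviour details are assumed."""

    _pipelines: Dict[str, type] = {}

    @classmethod
    def register(cls, name: str) -> Callable[[type], type]:
        def decorator(pipeline_cls: type) -> type:
            # Assumption: duplicate names are an error rather than an override.
            if name in cls._pipelines:
                raise ValueError(f"Pipeline {name!r} is already registered")
            cls._pipelines[name] = pipeline_cls
            return pipeline_cls  # leave the decorated class unchanged

        return decorator

    @classmethod
    def get(cls, name: str, **kwargs: Any) -> Any:
        try:
            pipeline_cls = cls._pipelines[name]
        except KeyError:
            available = sorted(cls._pipelines)
            raise KeyError(f"Unknown pipeline {name!r}; available: {available}") from None
        # Instantiate with the caller's keyword arguments.
        return pipeline_cls(**kwargs)


@PipelineRegistry.register("demo")
class DemoPipeline:
    """Hypothetical pipeline used only to show registration."""

    def __init__(self, normalize: bool = True):
        self.normalize = normalize


pipeline = PipelineRegistry.get("demo", normalize=False)
```

Looking pipelines up by name lets configuration files refer to a pipeline as a string, without importing its class directly.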
Error Handling
flowchart TD
Transform[Transform] --> Validate[Validate Input]
Validate -->|Invalid| Error[Raise Error]
Validate -->|Valid| Process[Process Data]
Process --> Check{Check Errors}
Check -->|Error| Handle[Handle Error]
Check -->|OK| Return[Return Samples]
Handle -->|Recoverable| Retry[Retry]
Handle -->|Fatal| Error
Retry --> Process
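One way to express this flow in code is a bounded retry loop. The exception classes, the empty-input check, and the `max_retries` limit are all assumptions; the diagram itself does not say how many retries are attempted or which errors count as recoverable.

```python
from typing import Any, Callable, Sequence


class RecoverableError(Exception):
    """Hypothetical error type for failures that are worth retrying."""


class FatalPipelineError(Exception):
    """Hypothetical error type for failures that must stop processing."""


def transform_with_retry(
    data: Sequence[Any],
    process: Callable[[Sequence[Any]], Any],
    max_retries: int = 2,
) -> Any:
    # Validate input: invalid data raises immediately, before any processing.
    if not data:
        raise ValueError("input data is empty")

    attempts = 0
    while True:
        try:
            return process(data)
        except RecoverableError:
            # Recoverable: retry, but give up after max_retries extra attempts.
            attempts += 1
            if attempts > max_retries:
                raise
        # Any other exception, including FatalPipelineError, propagates as-is.
```

Bounding the retries avoids the infinite loop that the Retry to Process cycle would allow if a "recoverable" error never clears.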
Next Steps
- Model System: model architecture
- Custom Pipeline: how to create a custom pipeline
- API Reference: complete API documentation