Data API Reference
The data module provides utilities for discovering experimental files, resolving symbolic paths for the LG M50T dataset, and performing unit conversions (e.g., mAh to Ah, °C to K). It also implements various data splitting strategies for cross-validation and holdout testing.
expt_paths
Path resolution for experiment data following Dataset.md conventions.
Classes
ExperimentPaths (dataclass)
Resolves all paths for a given experiment following Dataset.md conventions.
Handles differences between experiments:

- Expt 5 uses "Cell A"; Expt 1-4 use "cell A"
- Different folder naming conventions

Example usage

    paths = ExperimentPaths(5, Path("Raw Data"))
    summary_path = paths.performance_summary("A", 10)
    print(summary_path)

Functions
exists
Check if the experiment directory exists.
Returns:
| Type | Description |
|---|---|
| bool | True if directory exists |
list_all_files
List all files matching pattern in experiment directory.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| pattern | str | Glob pattern (default: "*.csv") | '*.csv' |

Returns:
| Type | Description |
|---|---|
| List[Path] | List of matching file paths |
Source code in src/data/expt_paths.py
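Glob-based listing of this kind can be sketched with `pathlib`. The helper name `list_matching_files` is hypothetical, and whether the real method searches recursively is not stated here; this sketch assumes a flat, non-recursive search.

```python
import tempfile
from pathlib import Path
from typing import List


def list_matching_files(directory: Path, pattern: str = "*.csv") -> List[Path]:
    """Hypothetical stand-in: sorted files in `directory` matching `pattern`."""
    if not directory.is_dir():
        return []
    # Path.glob is non-recursive unless the pattern contains "**".
    return sorted(p for p in directory.glob(pattern) if p.is_file())


# Demonstrate on a throwaway directory so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.csv").write_text("x\n")
    (root / "b.txt").write_text("y\n")
    found_names = [p.name for p in list_matching_files(root)]
```

Returning an empty list for a missing directory is a design choice of the sketch, not a documented behavior of `list_all_files`.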
list_available_rpts
List all available RPT indices for a cell.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cell_id | str | Cell identifier | required |
| curve_type | str | Type of curve | '0.1C' |

Returns:
| Type | Description |
|---|---|
| List[int] | Sorted list of available RPT indices |
Source code in src/data/expt_paths.py
performance_summary
Performance Summary CSV (set-level health indicators).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cell_id | str | Cell identifier ('A', 'B', ..., 'H') | required |
| temp_C | int | Temperature in Celsius | required |

Returns:
| Type | Description |
|---|---|
| Path | Path to the Performance Summary CSV file |
Source code in src/data/expt_paths.py
summary_per_cycle
Summary per Cycle CSV (cycle-level metrics).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cell_id | str | Cell identifier | required |

Returns:
| Type | Description |
|---|---|
| Path | Path to the Summary per Cycle CSV file |
Source code in src/data/expt_paths.py
summary_per_set
Summary per Set CSV.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cell_id | str | Cell identifier | required |

Returns:
| Type | Description |
|---|---|
| Path | Path to the Summary per Set CSV file |
Source code in src/data/expt_paths.py
voltage_curve
voltage_curve(cell_id: str, rpt: int, curve_type: str = '0.1C', direction: str = 'discharge') -> Path
Processed voltage curve CSV.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cell_id | str | Cell identifier | required |
| rpt | int | RPT measurement index | required |
| curve_type | str | Type of curve (e.g., "0.1C") | '0.1C' |
| direction | str | "discharge" or "charge" | 'discharge' |

Returns:
| Type | Description |
|---|---|
| Path | Path to the voltage curve CSV file |
Source code in src/data/expt_paths.py
tables
Data loaders for summary CSV files.
Classes
SummaryDataLoader
Load and normalize summary CSV data.
Example usage

    loader = SummaryDataLoader(5, Path("Raw Data"))
    df = loader.load_all_cells(
        cells=['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
        temp_map={10: ['A', 'B', 'C'], 25: ['D', 'E'], 40: ['F', 'G', 'H']}
    )

Initialize the loader.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| experiment_id | int | Experiment ID (1-5) | required |
| base_path | Path | Base path to raw data | required |
Source code in src/data/tables.py
Functions
get_available_cells
Get list of cells with available data.
Returns:
| Type | Description |
|---|---|
| List[str] | List of cell IDs that have data files |
Source code in src/data/tables.py
load_all_cells
Load performance summary for all specified cells.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cells | List[str] | List of cell IDs to load | required |
| temp_map | Dict[int, List[str]] | Mapping from temperature (°C) to cell IDs | required |

Returns:
| Type | Description |
|---|---|
| DataFrame | Combined DataFrame with all cells |
Source code in src/data/tables.py
load_performance_summary
Load Performance Summary with unit normalization.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cell_id | str | Cell identifier ('A', 'B', ..., 'H') | required |
| temp_C | int | Temperature in Celsius | required |

Returns:
| Type | Description |
|---|---|
| DataFrame | DataFrame with normalized units and metadata columns |

Raises:
| Type | Description |
|---|---|
| FileNotFoundError | If the summary file doesn't exist |
Source code in src/data/tables.py
load_summary_per_cycle
Load cycle-level summary with unit normalization.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cell_id | str | Cell identifier | required |

Returns:
| Type | Description |
|---|---|
| DataFrame | DataFrame with cycle-level metrics |

Raises:
| Type | Description |
|---|---|
| FileNotFoundError | If the summary file doesn't exist |
Source code in src/data/tables.py
load_summary_per_set
Load set-level summary with unit normalization.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cell_id | str | Cell identifier | required |

Returns:
| Type | Description |
|---|---|
| DataFrame | DataFrame with set-level metrics |

Raises:
| Type | Description |
|---|---|
| FileNotFoundError | If the summary file doesn't exist |
Source code in src/data/tables.py
TimeseriesDataLoader
Load voltage curve timeseries data.
Initialize the loader.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| experiment_id | int | Experiment ID (1-5) | required |
| base_path | Path | Base path to raw data | required |
Source code in src/data/tables.py
Functions
load_all_curves
load_all_curves(cell_id: str, curve_type: str = '0.1C', direction: str = 'discharge') -> Dict[int, pd.DataFrame]
Load all available voltage curves for a cell.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cell_id | str | Cell identifier | required |
| curve_type | str | Type of curve | '0.1C' |
| direction | str | "discharge" or "charge" | 'discharge' |

Returns:
| Type | Description |
|---|---|
| Dict[int, DataFrame] | Dictionary mapping RPT index to DataFrame |
Source code in src/data/tables.py
load_voltage_curve
load_voltage_curve(cell_id: str, rpt: int, curve_type: str = '0.1C', direction: str = 'discharge') -> pd.DataFrame
Load a single voltage curve.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cell_id | str | Cell identifier | required |
| rpt | int | RPT measurement index | required |
| curve_type | str | Type of curve (e.g., "0.1C") | '0.1C' |
| direction | str | "discharge" or "charge" | 'discharge' |

Returns:
| Type | Description |
|---|---|
| DataFrame | DataFrame with voltage curve data |

Raises:
| Type | Description |
|---|---|
| FileNotFoundError | If the curve file doesn't exist |
Source code in src/data/tables.py
splits
Data split strategies for battery degradation experiments.
Functions
leave_one_cell_out
Leave-one-cell-out split.
Use for testing generalization to unseen cells.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| samples | List[Sample] | List of Sample objects | required |
| test_cell | str | Cell ID to hold out for testing | required |

Returns:
| Type | Description |
|---|---|
| Tuple[List[Sample], List[Sample]] | Tuple of (train_samples, test_samples) |

Example

    train, test = leave_one_cell_out(samples, test_cell='A')

Source code in src/data/splits.py
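The hold-out logic described above can be sketched as follows. The `Sample` dataclass and its `cell_id` field are hypothetical stand-ins for the real `Sample` type, whose attributes are not documented on this page.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class Sample:
    """Hypothetical minimal sample; the real Sample class may differ."""
    cell_id: str
    meta: Dict[str, Any] = field(default_factory=dict)


def leave_one_cell_out_sketch(
    samples: List[Sample], test_cell: str
) -> Tuple[List[Sample], List[Sample]]:
    # Every sample from the held-out cell goes to test; the rest go to train.
    train = [s for s in samples if s.cell_id != test_cell]
    test = [s for s in samples if s.cell_id == test_cell]
    return train, test


samples = [Sample("A"), Sample("A"), Sample("B"), Sample("C")]
train, test = leave_one_cell_out_sketch(samples, test_cell="A")
```

Splitting by cell rather than by row keeps all measurements of one physical cell on the same side of the split, which is what makes the test a check of generalization to unseen cells.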
loco_cv_splits
Generate all leave-one-cell-out cross-validation splits.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| samples | List[Sample] | List of Sample objects | required |

Returns:
| Type | Description |
|---|---|
| List[Tuple[str, List[Sample], List[Sample]]] | List of (cell_id, train_samples, test_samples) tuples |

Example

    for cell_id, train, test in loco_cv_splits(samples):
        model.fit(train)
        score = model.evaluate(test)

Source code in src/data/splits.py
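One way to generate every fold is to hold out each distinct cell in turn. This is a sketch under the assumption that samples expose a `cell_id` attribute; the `Sample` class below is hypothetical, and the ordering of folds in the real function is not documented here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Sample:
    """Hypothetical minimal sample; the real Sample class may differ."""
    cell_id: str


def loco_cv_splits_sketch(
    samples: List[Sample],
) -> List[Tuple[str, List[Sample], List[Sample]]]:
    # One fold per distinct cell, in sorted order for determinism.
    folds = []
    for cell_id in sorted({s.cell_id for s in samples}):
        train = [s for s in samples if s.cell_id != cell_id]
        test = [s for s in samples if s.cell_id == cell_id]
        folds.append((cell_id, train, test))
    return folds


samples = [Sample("A"), Sample("B"), Sample("B"), Sample("C")]
folds = loco_cv_splits_sketch(samples)
```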
random_split
random_split(samples: List[Sample], train_fraction: float = 0.7, val_fraction: float = 0.15, seed: int = 42) -> Tuple[List[Sample], List[Sample], List[Sample]]
Random split with fixed seed.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| samples | List[Sample] | List of Sample objects | required |
| train_fraction | float | Fraction for training | 0.7 |
| val_fraction | float | Fraction for validation | 0.15 |
| seed | int | Random seed for reproducibility | 42 |

Returns:
| Type | Description |
|---|---|
| Tuple[List[Sample], List[Sample], List[Sample]] | Tuple of (train_samples, val_samples, test_samples) |
Source code in src/data/splits.py
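A seeded three-way split can be sketched with the standard library alone. The function name and the rounding rule are assumptions of this sketch; the real implementation may compute the boundaries differently.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_split_sketch(
    samples: Sequence[T],
    train_fraction: float = 0.7,
    val_fraction: float = 0.15,
    seed: int = 42,
) -> Tuple[List[T], List[T], List[T]]:
    # Shuffle a copy with a private RNG so the global random state is untouched.
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    n_val = round(len(shuffled) * val_fraction)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains after train and val
    return train, val, test


items = list(range(100))
train, val, test = random_split_sketch(items)
```

The test fraction is implicit: it is the remainder after the training and validation fractions, so the defaults leave roughly 15% for testing.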
stratified_temperature_split
stratified_temperature_split(samples: List[Sample], val_fraction: float = 0.2, seed: int = 42) -> Tuple[List[Sample], List[Sample]]
Stratified split maintaining temperature distribution.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| samples | List[Sample] | List of Sample objects | required |
| val_fraction | float | Fraction for validation | 0.2 |
| seed | int | Random seed | 42 |

Returns:
| Type | Description |
|---|---|
| Tuple[List[Sample], List[Sample]] | Tuple of (train_samples, val_samples) |
Source code in src/data/splits.py
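Stratification by temperature can be sketched by splitting each temperature group separately, so that both outputs keep roughly the original temperature mix. The `Sample` class and its `temp_C` field are hypothetical, and the per-group rounding is an assumption of this sketch.

```python
import random
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Sample:
    """Hypothetical minimal sample; the real Sample class may differ."""
    cell_id: str
    temp_C: int


def stratified_temperature_split_sketch(
    samples: List[Sample], val_fraction: float = 0.2, seed: int = 42
) -> Tuple[List[Sample], List[Sample]]:
    rng = random.Random(seed)
    by_temp: Dict[int, List[Sample]] = defaultdict(list)
    for s in samples:
        by_temp[s.temp_C].append(s)

    train: List[Sample] = []
    val: List[Sample] = []
    # Take the same fraction from every temperature group.
    for temp in sorted(by_temp):
        group = by_temp[temp]
        rng.shuffle(group)
        n_val = round(len(group) * val_fraction)
        val.extend(group[:n_val])
        train.extend(group[n_val:])
    return train, val


samples = [Sample("X", t) for t in (10, 25, 40) for _ in range(10)]
train, val = stratified_temperature_split_sketch(samples)
```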
temperature_split
temperature_split(samples: List[Sample], train_temps: List[int], val_temps: List[int]) -> Tuple[List[Sample], List[Sample]]
Split samples by temperature.
Typical configuration for Expt 5: train on [10, 40] and validate on [25], which tests temperature interpolation capability. Both temperature lists are required arguments.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| samples | List[Sample] | List of Sample objects | required |
| train_temps | List[int] | Temperatures for training (e.g., [10, 40]) | required |
| val_temps | List[int] | Temperatures for validation (e.g., [25]) | required |

Returns:
| Type | Description |
|---|---|
| Tuple[List[Sample], List[Sample]] | Tuple of (train_samples, val_samples) |

Example

    train, val = temperature_split(samples, train_temps=[10, 40], val_temps=[25])

Source code in src/data/splits.py
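The temperature-based split amounts to two membership filters. In this sketch the `Sample` class and its `temp_C` field are hypothetical; samples whose temperature appears in neither list are simply dropped, which may or may not match the real function.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Sample:
    """Hypothetical minimal sample; the real Sample class may differ."""
    cell_id: str
    temp_C: int


def temperature_split_sketch(
    samples: List[Sample], train_temps: List[int], val_temps: List[int]
) -> Tuple[List[Sample], List[Sample]]:
    train_set, val_set = set(train_temps), set(val_temps)
    train = [s for s in samples if s.temp_C in train_set]
    val = [s for s in samples if s.temp_C in val_set]
    return train, val


samples = [Sample("A", 10), Sample("D", 25), Sample("F", 40), Sample("G", 40)]
train, val = temperature_split_sketch(samples, train_temps=[10, 40], val_temps=[25])
```

Holding out the middle temperature means the validation set lies between the training conditions, so a good score indicates interpolation rather than extrapolation.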
temporal_split
temporal_split(samples: List[Sample], train_fraction: float = 0.7, val_fraction: float = 0.15) -> Tuple[List[Sample], List[Sample], List[Sample]]
Split samples temporally (early cycles for train, later for val/test).
Useful for testing extrapolation to future degradation states.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| samples | List[Sample] | List of Sample objects (should have 'set_idx' or 'cycle_idx' in meta) | required |
| train_fraction | float | Fraction of samples for training | 0.7 |
| val_fraction | float | Fraction of samples for validation | 0.15 |

Returns:
| Type | Description |
|---|---|
| Tuple[List[Sample], List[Sample], List[Sample]] | Tuple of (train_samples, val_samples, test_samples) |
Source code in src/data/splits.py
units
Centralized unit conversions for battery data.
Call these ONCE during data loading to ensure consistent internal units.
Classes
UnitConverter
Ensures all data uses consistent units internally.
Internal units (after conversion):

- Capacity: Ah (not mAh)
- Current: A (not mA)
- Temperature: K (for Arrhenius calculations)
- Time: seconds (or days for long-term analysis)
- Resistance: Ohms

Example usage

    capacity_mAh = 4800
    capacity_Ah = UnitConverter.mAh_to_Ah(capacity_mAh)
    print(capacity_Ah)  # 4.8

Functions
A_to_mA (staticmethod)
Convert amps to milliamps.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| value | Union[float, ndarray, Series] | Value(s) in A | required |

Returns:
| Type | Description |
|---|---|
| Union[float, ndarray, Series] | Value(s) in mA |
Ah_to_mAh (staticmethod)
Convert amp-hours to milliamp-hours.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| value | Union[float, ndarray, Series] | Value(s) in Ah | required |

Returns:
| Type | Description |
|---|---|
| Union[float, ndarray, Series] | Value(s) in mAh |
celsius_to_kelvin (staticmethod)
celsius_to_kelvin(value: Union[float, np.ndarray, pd.Series]) -> Union[float, np.ndarray, pd.Series]
Convert Celsius to Kelvin.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| value | Union[float, ndarray, Series] | Temperature(s) in °C | required |

Returns:
| Type | Description |
|---|---|
| Union[float, ndarray, Series] | Temperature(s) in K |
Source code in src/data/units.py
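The scalar conversions reduce to one-line arithmetic. The stand-alone functions below are an illustrative sketch for plain floats, not the `UnitConverter` source; the documented methods also accept NumPy arrays and pandas Series, which this sketch does not exercise.

```python
ZERO_CELSIUS_IN_KELVIN = 273.15  # 0 °C expressed in kelvin


def celsius_to_kelvin(value: float) -> float:
    return value + ZERO_CELSIUS_IN_KELVIN


def kelvin_to_celsius(value: float) -> float:
    return value - ZERO_CELSIUS_IN_KELVIN


def mAh_to_Ah(value: float) -> float:
    return value / 1000.0


room_temp_K = celsius_to_kelvin(25.0)
capacity_Ah = mAh_to_Ah(4800.0)
```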
compute_arrhenius_factor (staticmethod)
compute_arrhenius_factor(temp_K: Union[float, np.ndarray], Ea: float = 50000.0) -> Union[float, np.ndarray]
Compute Arrhenius factor exp(-Ea/RT).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| temp_K | Union[float, ndarray] | Temperature in Kelvin | required |
| Ea | float | Activation energy in J/mol (default: 50000) | 50000.0 |

Returns:
| Type | Description |
|---|---|
| Union[float, ndarray] | Arrhenius factor |
Source code in src/data/units.py
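For a scalar temperature, the factor exp(-Ea/RT) can be sketched with the standard library. The function name `arrhenius_factor` and the constant name `R_GAS` are choices of this sketch; R is the molar gas constant in J/(mol·K).

```python
import math

R_GAS = 8.314462618  # molar gas constant, J/(mol*K)


def arrhenius_factor(temp_K: float, Ea: float = 50000.0) -> float:
    """Scalar sketch of exp(-Ea / (R * T)) with Ea in J/mol and T in kelvin."""
    return math.exp(-Ea / (R_GAS * temp_K))


factor_25C = arrhenius_factor(298.15)
# Relative rate between 40 °C and 10 °C for the default activation energy.
ratio_40C_vs_10C = arrhenius_factor(313.15) / arrhenius_factor(283.15)
```

The absolute factor is tiny at room temperature (on the order of 1e-9 for the default Ea), so in practice it is the ratio between two temperatures that carries meaning: here a 30 K increase speeds the modeled process up by several times.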
kelvin_to_celsius (staticmethod)
kelvin_to_celsius(value: Union[float, np.ndarray, pd.Series]) -> Union[float, np.ndarray, pd.Series]
Convert Kelvin to Celsius.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| value | Union[float, ndarray, Series] | Temperature(s) in K | required |

Returns:
| Type | Description |
|---|---|
| Union[float, ndarray, Series] | Temperature(s) in °C |
Source code in src/data/units.py
mA_to_A (staticmethod)
Convert milliamps to amps.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| value | Union[float, ndarray, Series] | Value(s) in mA | required |

Returns:
| Type | Description |
|---|---|
| Union[float, ndarray, Series] | Value(s) in A |
mAh_to_Ah (staticmethod)
Convert milliamp-hours to amp-hours.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| value | Union[float, ndarray, Series] | Value(s) in mAh | required |

Returns:
| Type | Description |
|---|---|
| Union[float, ndarray, Series] | Value(s) in Ah |
normalize_all_capacity_columns (staticmethod)
Normalize all capacity-related columns to Ah.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | DataFrame to normalize | required |

Returns:
| Type | Description |
|---|---|
| DataFrame | DataFrame with all capacity columns normalized |
Source code in src/data/units.py
normalize_capacity_column (staticmethod)
Auto-detect mAh vs Ah and normalize to Ah.
Heuristic: if column name contains 'mA' or mean value > 100, assume mAh and convert.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | DataFrame containing the column | required |
| col | str | Column name to normalize | required |

Returns:
| Type | Description |
|---|---|
| DataFrame | DataFrame with normalized column (modified copy) |
Source code in src/data/units.py
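The detection heuristic can be illustrated without pandas on a column name and a list of values. Both helper names are hypothetical, and the sketch returns a new list rather than a DataFrame copy.

```python
from statistics import mean
from typing import List


def looks_like_mAh(column_name: str, values: List[float]) -> bool:
    # Documented heuristic: name contains 'mA' or the mean value exceeds 100.
    return "mA" in column_name or mean(values) > 100


def normalize_capacity(column_name: str, values: List[float]) -> List[float]:
    """Hypothetical list-based stand-in for the DataFrame method."""
    if looks_like_mAh(column_name, values):
        return [v / 1000.0 for v in values]
    return list(values)


converted = normalize_capacity("Capacity", [4800.0, 4700.0])
unchanged = normalize_capacity("Capacity (Ah)", [4.8, 4.7])
```

Because detection depends on the current values, applying the conversion twice to an unlabeled column would not convert it a second time (the mean is already below 100), but a column whose name contains 'mA' would be divided again, which is one reason to run normalization only once at load time.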
discovery
File discovery utilities for experiment data.
Functions
discover_experiment_files
Discover all available data files for an experiment.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| base_path | Path | Base path to data | required |
| experiment_id | int | Experiment ID (1-5) | required |

Returns:
| Type | Description |
|---|---|
| Dict[str, List[Path]] | Dictionary mapping string keys to lists of discovered file paths |
Source code in src/data/discovery.py
parse_filename_metadata
Extract metadata from standardized filename.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| filename | str | Filename to parse | required |

Returns:
| Type | Description |
|---|---|
| Dict[str, Any] | Dictionary with extracted metadata (experiment_id, cell_id, temp_C, rpt_id, etc.) |

Example

    meta = parse_filename_metadata("Expt 5 - cell A (10degC) - Processed Data.csv")
    print(meta)  # {'experiment_id': 5, 'cell_id': 'A', 'temperature_C': 10}

Source code in src/data/discovery.py
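Filename parsing of this kind is commonly done with a regular expression using named groups. The pattern below is an assumption built only from the example filename above; the real parser may recognize more fields (such as an RPT index) and other naming variants.

```python
import re
from typing import Any, Dict

# Assumed layout: "Expt <n> - cell <letter> (<temp>degC) - ...".
# IGNORECASE covers both the "cell A" and "Cell A" spellings.
_FILENAME_PATTERN = re.compile(
    r"Expt\s*(?P<experiment_id>\d+)\s*-\s*cell\s*(?P<cell_id>[A-Za-z])\s*"
    r"\((?P<temperature_C>-?\d+)degC\)",
    re.IGNORECASE,
)


def parse_filename_metadata_sketch(filename: str) -> Dict[str, Any]:
    match = _FILENAME_PATTERN.search(filename)
    if match is None:
        return {}  # sketch-only choice: no metadata when the name does not match
    return {
        "experiment_id": int(match["experiment_id"]),
        "cell_id": match["cell_id"].upper(),
        "temperature_C": int(match["temperature_C"]),
    }


meta = parse_filename_metadata_sketch("Expt 5 - cell A (10degC) - Processed Data.csv")
```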
validate_data_structure
Validate that expected data structure exists.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| base_path | Path | Base path to data | required |
| experiment_id | int | Experiment ID | required |

Returns:
| Type | Description |
|---|---|
| Dict[str, Any] | Dictionary with validation results |