Data Loading¶
This guide covers how to load data from the LG M50T dataset for use with BatteryML.
Data Structure¶
The LG M50T dataset is organized by experiment (1-5), with each experiment containing:
Raw Data/
└── Expt N - [Experiment Name]/
├── Summary Data/
│ ├── Performance Summary/
│ │ └── Cell_[ID]_[Temp]C_PerformanceSummary.csv
│ └── Ageing Sets Summary/
│ └── Cell_[ID]_AgeingSetsSummary.csv
└── Processed Timeseries Data/
└── 0.1C Voltage Curves/
└── Cell_[ID]_RPT_[N]_0.1C_Discharge.csv
Loading Summary Data¶
Basic Usage¶
from pathlib import Path
from src.data.tables import SummaryDataLoader
# Initialize loader for Experiment 5
loader = SummaryDataLoader(experiment_id=5, base_path=Path("Raw Data"))
# Load single cell
df = loader.load_performance_summary(cell_id='A', temp_C=25)
Loading Multiple Cells¶
# Define temperature mapping
temp_map = {
10: ['A', 'B', 'C'],
25: ['D', 'E'],
40: ['F', 'G', 'H']
}
# Load all cells
df = loader.load_all_cells(
cells=['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
temp_map=temp_map
)
Available Methods¶
load_performance_summary(cell_id, temp_C)¶
Loads Performance Summary CSV with: - Cumulative throughput (charge/discharge) - Resistance measurements (0.1s, 10s) - Capacity measurements - Cycle counts
Returns: DataFrame with normalized units (mAh → Ah) and metadata columns
load_summary_per_cycle(cell_id)¶
Loads cycle-level summary with per-cycle metrics.
load_summary_per_set(cell_id)¶
Loads ageing set-level summary (one row per RPT measurement).
load_all_cells(cells, temp_map)¶
Convenience method to load multiple cells and combine into single DataFrame.
Unit Normalization¶
BatteryML automatically normalizes units:
- Capacity: mAh → Ah
- Temperature: °C → K (Kelvin)
- Time: Various formats → consistent units
This ensures consistency across experiments and prevents unit-related bugs.
Experiment Path Resolution¶
The ExperimentPaths class handles path resolution for different experiment naming conventions:
from src.data.expt_paths import ExperimentPaths
paths = ExperimentPaths(experiment_id=5, base_path=Path("Raw Data"))
# Get paths
perf_summary_path = paths.performance_summary(cell_id='A', temp_C=25)
voltage_curve_path = paths.voltage_curve(cell_id='A', rpt_id=1)
Supported Experiments¶
- Experiment 1: Si-based Degradation
- Experiment 2: C-based Degradation
- Experiment 3: Cathode Degradation and Li-Plating
- Experiment 4: Drive Cycle Aging (Control)
- Experiment 5: Standard Cycle Aging (Control)
Loading Voltage Curves¶
For ICA analysis, load 0.1C discharge curves:
from src.data.discovery import find_voltage_curves
# Find all voltage curves for a cell
curves = find_voltage_curves(
experiment_id=5,
cell_id='A',
base_path=Path("Raw Data")
)
# Load specific curve
import pandas as pd
curve_df = pd.read_csv(curves[0]) # First RPT
Data Validation¶
The loader performs basic validation:
- Checks file existence before loading
- Validates required columns
- Handles missing values gracefully
- Logs warnings for data quality issues
Common Issues¶
File Not Found¶
Error: FileNotFoundError: Performance summary not found
Solutions:
1. Verify data path matches expected structure
2. Check experiment ID is correct (1-5)
3. Verify cell ID and temperature match file naming convention
4. Use ExperimentPaths to debug path resolution
Missing Columns¶
Error: KeyError for expected columns
Solutions: 1. Check CSV file structure matches expected format 2. Verify column names match exactly (case-sensitive) 3. Some experiments may have different column names
Unit Mismatches¶
Issue: Values seem incorrect (e.g., capacity in thousands)
Solution: Unit normalization should handle this automatically. Check that UnitConverter is being used.
Example: Complete Data Loading Workflow¶
from pathlib import Path
from src.data.tables import SummaryDataLoader
# Setup
BASE_PATH = Path("Raw Data")
EXPERIMENT_ID = 5
CELLS = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H']
TEMP_MAP = {
10: ['A', 'B', 'C'],
25: ['D', 'E'],
40: ['F', 'G', 'H']
}
# Load data
loader = SummaryDataLoader(EXPERIMENT_ID, BASE_PATH)
df = loader.load_all_cells(cells=CELLS, temp_map=TEMP_MAP)
# Verify
print(f"Loaded {len(df)} samples")
print(f"Columns: {df.columns.tolist()}")
print(f"Cells: {df['cell_id'].unique()}")
print(f"Temperatures: {df['temperature_C'].unique()}")
Next Steps¶
- Pipelines - Transform data into features
- Splits - Split data for training/validation
- API Reference - Complete API documentation