Performance Optimization¶
This guide covers performance optimization tips for BatteryML.
Data Loading Optimization¶
Use Caching¶
Enable caching for expensive computations:
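BatteryML's own caching API is not shown here, so as a minimal, library-agnostic sketch, standard-library memoization with `functools.lru_cache` illustrates the idea; `expensive_feature` is a hypothetical stand-in for something like an ICA extraction step:

```python
from functools import lru_cache

@lru_cache(maxsize=128)
def expensive_feature(cell_id: str) -> float:
    """Hypothetical stand-in for an expensive per-cell computation."""
    # Imagine an ICA or smoothing step here; the result is memoized by cell_id.
    return len(cell_id) * 1.5

expensive_feature("cell_001")  # computed on first call
expensive_feature("cell_001")  # second call is served from the cache
```

Repeated calls with the same argument skip the computation entirely, which matters most when the same cell is touched by several pipeline stages.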
Batch Data Loading¶
Load data in batches for large datasets:
import pandas as pd

def load_in_batches(loader, cells, batch_size=100):
    """Load cell data in batches to bound peak memory usage."""
    all_data = []
    for i in range(0, len(cells), batch_size):
        batch_cells = cells[i:i + batch_size]
        batch_data = loader.load_cells(batch_cells)
        all_data.append(batch_data)
    return pd.concat(all_data)
Training Optimization¶
Use GPU¶
Always use GPU when available:
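For instance, with PyTorch (using a stand-in `nn.Linear` model, since the real model class depends on your config):

```python
import torch

# Select the GPU when present, otherwise fall back to CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = torch.nn.Linear(15, 1).to(device)   # stand-in model
x = torch.randn(4, 15, device=device)       # batches must live on the same device
y = model(x)
```

Moving only the model (and not the batches) is a common source of "expected all tensors on the same device" errors.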
Mixed Precision Training¶
Enable AMP for faster GPU training:
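A minimal AMP training step with `torch.autocast` and a `GradScaler`, sketched with a stand-in linear model; AMP is only enabled when CUDA is available, so on CPU the same code runs unchanged in full precision:

```python
import torch

device = 'cuda' if torch.cuda.is_available() else 'cpu'
use_amp = device == 'cuda'

model = torch.nn.Linear(15, 1).to(device)            # stand-in model
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

x = torch.randn(8, 15, device=device)
target = torch.randn(8, 1, device=device)

# autocast runs eligible ops in float16; GradScaler rescales the loss so
# small gradients do not underflow in half precision.
with torch.autocast(device_type=device, enabled=use_amp):
    loss = torch.nn.functional.mse_loss(model(x), target)

scaler.scale(loss).backward()
scaler.step(optimizer)   # unscales gradients; skips the step on inf/nan
scaler.update()
```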
Optimize Batch Size¶
Find optimal batch size:
# Start with 32, then increase while memory allows
batch_sizes = [16, 32, 64, 128]
for bs in batch_sizes:
    try:
        config = {'batch_size': bs}
        # Train and measure time here
    except RuntimeError:  # CUDA out-of-memory raises RuntimeError
        break
DataLoader Workers¶
Use multiple workers for data loading:
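For example, with a PyTorch `DataLoader` (the tensor dataset here is synthetic):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(256, 15), torch.randn(256, 1))

# num_workers > 0 loads batches in worker processes so I/O and preprocessing
# overlap with training; pin_memory speeds up host-to-GPU transfers.
loader = DataLoader(dataset, batch_size=32, num_workers=2, pin_memory=True)

n_batches = sum(1 for _ in loader)
```

A good starting point is 2–4 workers; more workers help only when loading, not compute, is the bottleneck.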
Model Optimization¶
Choose Right Model¶
- LightGBM: Fastest for tabular data
- MLP: Fast neural baseline
- LSTM: Slower, for sequences
- Neural ODE: Slowest, for continuous-time
Model Pruning¶
Shrinking the hidden dimensions is the simplest way to reduce model size (structured pruning of trained weights is a more involved alternative):
# Smaller hidden dimensions
model = MLPModel(input_dim=15, hidden_dims=[32, 16]) # Instead of [64, 32]
Quantization (Future)¶
Post-training quantization can speed up inference:
# Quantize linear layers to int8 (dynamic post-training quantization)
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
Pipeline Optimization¶
Vectorize Operations¶
Use vectorized NumPy operations:
# Good: vectorized column extraction
features = df[['col1', 'col2']].values

# Bad: Python-level loop over rows
features = []
for _, row in df.iterrows():
    features.append([row['col1'], row['col2']])
Avoid Redundant Computations¶
Cache intermediate results:
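A minimal sketch of memoizing an intermediate result in a module-level dict; the rolling-mean "expensive step" is a placeholder:

```python
import pandas as pd

_feature_cache = {}

def get_features(df: pd.DataFrame, key: str = 'features') -> pd.DataFrame:
    """Compute the feature table once per key and reuse it afterwards."""
    if key not in _feature_cache:
        # Placeholder for an expensive transform computed once per key
        _feature_cache[key] = df[['col1', 'col2']].rolling(2).mean()
    return _feature_cache[key]

df = pd.DataFrame({'col1': [1.0, 2.0, 3.0], 'col2': [4.0, 5.0, 6.0]})
first = get_features(df)
second = get_features(df)   # same object, no recomputation
```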
Memory Optimization¶
Gradient Checkpointing¶
For very large models:
# Trade compute for memory: activations are recomputed during backward
from torch.utils.checkpoint import checkpoint
output = checkpoint(model, x, use_reentrant=False)
Clear Cache¶
Clear GPU cache if needed:
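For example (running `gc.collect()` first, so that tensors with no remaining Python references can actually be freed):

```python
import gc
import torch

def free_gpu_memory() -> None:
    """Drop unreferenced tensors, then return cached GPU blocks to the driver."""
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()

free_gpu_memory()
```

Note that `empty_cache()` does not free memory held by live tensors; delete or dereference them first.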
Profiling¶
Profile Training¶
Identify bottlenecks:
import torch.profiler as profiler

with profiler.profile(
    activities=[profiler.ProfilerActivity.CPU, profiler.ProfilerActivity.CUDA],
    record_shapes=True,
) as prof:
    trainer.fit(train_samples, val_samples)

print(prof.key_averages().table(sort_by="cuda_time_total"))
Time Operations¶
import time

start = time.perf_counter()  # monotonic, high-resolution timer
# Your operation
duration = time.perf_counter() - start
print(f"Operation took {duration:.2f}s")
Best Practices¶
- Profile First: Identify bottlenecks before optimizing
- Use GPU: Always use GPU when available
- Enable AMP: Use mixed precision for GPU training
- Cache Expensive Operations: Cache ICA and other expensive computations
- Batch Operations: Process data in batches
- Choose Right Model: Use fastest model that meets accuracy requirements
Benchmarking¶
Compare Configurations¶
import time

configs = [
    {'batch_size': 16, 'use_amp': False},
    {'batch_size': 32, 'use_amp': False},
    {'batch_size': 32, 'use_amp': True},
]

for config in configs:
    start = time.perf_counter()
    trainer = Trainer(model, config, tracker)
    trainer.fit(train_samples, val_samples)
    duration = time.perf_counter() - start
    print(f"Config {config}: {duration:.2f}s")
Next Steps¶
- Training Issues - Training problems
- Common Issues - Other issues
- Training Guide - Training documentation