Code Structure¶
This document explains the organization of the BatteryML codebase.
Directory Structure¶
battery-ml/
├── src/ # Source code
│ ├── data/ # Data loading and processing
│ │ ├── expt_paths.py # Experiment path resolution
│ │ ├── tables.py # CSV loaders
│ │ ├── splits.py # Data splitting strategies
│ │ ├── units.py # Unit conversions
│ │ └── discovery.py # File discovery utilities
│ │
│ ├── pipelines/ # Feature extraction pipelines
│ │ ├── base.py # BasePipeline ABC
│ │ ├── sample.py # Sample dataclass
│ │ ├── registry.py # PipelineRegistry
│ │ ├── cache.py # Hash-based caching
│ │ ├── summary_set.py # Summary features pipeline
│ │ ├── ica_peaks.py # ICA feature extraction
│ │ └── latent_ode_seq.py # Sequence pipeline
│ │
│ ├── models/ # ML models
│ │ ├── base.py # BaseModel ABC
│ │ ├── registry.py # ModelRegistry
│ │ ├── lgbm.py # LightGBM wrapper
│ │ ├── mlp.py # MLP model
│ │ ├── lstm_attn.py # LSTM + Attention
│ │ └── neural_ode.py # Neural ODE
│ │
│ ├── training/ # Training infrastructure
│ │ ├── trainer.py # Training loop
│ │ ├── metrics.py # Evaluation metrics
│ │ ├── losses.py # Loss functions
│ │ └── callbacks.py # Training callbacks
│ │
│ ├── tracking/ # Experiment tracking
│ │ ├── base.py # BaseTracker ABC
│ │ ├── local.py # Local file tracking
│ │ ├── mlflow_tracker.py # MLflow integration
│ │ └── dual_tracker.py # Combined tracking
│ │
│ ├── explainability/ # Interpretability
│ │ ├── shap_analysis.py # SHAP analysis
│ │ └── attention_viz.py # Attention visualization
│ │
│ └── config_schema.py # Pydantic config schemas
│
├── tests/ # Test suite
│ ├── conftest.py # Shared fixtures
│ ├── test_pipelines.py
│ ├── test_models.py
│ └── test_cache.py
│
├── examples/ # Example scripts
│ ├── milestone_a.py
│ ├── milestone_b.py
│ └── milestone_c.py
│
├── configs/ # Hydra configurations
│ ├── config.yaml
│ ├── data/
│ ├── pipeline/
│ ├── model/
│ ├── split/
│ └── tracking/
│
└── docs/ # Documentation
  ├── mkdocs.yml
  └── ... (documentation files)
Module Organization¶
Data Module (src/data/)¶
Purpose: Load and preprocess raw data
- expt_paths.py: Resolve file paths for different experiments
- tables.py: Load CSV files with unit normalization
- splits.py: Data splitting strategies
- units.py: Unit conversion utilities
- discovery.py: File discovery and validation
Pipelines Module (src/pipelines/)¶
Purpose: Transform raw data to Sample objects
- base.py: Abstract base class for all pipelines
- sample.py: Universal Sample dataclass
- registry.py: Pipeline registration system
- cache.py: Hash-based caching for expensive computations
- summary_set.py: Summary statistics features
- ica_peaks.py: ICA peak extraction
- latent_ode_seq.py: Time-series sequences for ODEs
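The hash-based caching idea behind cache.py can be illustrated with a minimal sketch. It assumes the cache key is derived from the pipeline name and its JSON-serializable parameters. The names `cache_key`, `cached_call`, and the in-memory `_CACHE` dict are hypothetical and are not taken from the BatteryML source, which may hash different inputs and persist results to disk.

```python
import hashlib
import json
from typing import Any, Callable, Dict

# Hypothetical in-memory store; a real cache would likely persist to disk.
_CACHE: Dict[str, Any] = {}


def cache_key(pipeline_name: str, params: Dict[str, Any]) -> str:
    """Build a deterministic key from the pipeline name and its parameters."""
    # sort_keys makes the JSON text independent of dict insertion order.
    payload = json.dumps({"pipeline": pipeline_name, "params": params}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def cached_call(
    pipeline_name: str, params: Dict[str, Any], compute: Callable[[], Any]
) -> Any:
    """Return the cached result for this configuration, computing it on a miss."""
    key = cache_key(pipeline_name, params)
    if key not in _CACHE:
        _CACHE[key] = compute()
    return _CACHE[key]
```

Because the key depends only on the configuration, two calls with the same parameters in a different order share one cache entry, while any parameter change produces a new key and forces a recomputation.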
Models Module (src/models/)¶
Purpose: Machine learning models
- base.py: Abstract base class for neural models
- registry.py: Model registration system
- lgbm.py: LightGBM gradient boosting
- mlp.py: Multi-layer perceptron
- lstm_attn.py: LSTM with self-attention
- neural_ode.py: Neural ODE for continuous-time modeling
Training Module (src/training/)¶
Purpose: Training infrastructure
- trainer.py: Training loop with automatic mixed precision (AMP) and early stopping
- metrics.py: Evaluation metrics (RMSE, MAE, MAPE, R²)
- losses.py: Loss functions
- callbacks.py: Training callbacks
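For reference, the four metrics named for metrics.py can be written out as below. This is a sketch of the standard definitions using only the standard library so that it runs anywhere. The function names and signatures are assumptions, and the actual module may operate on NumPy arrays or tensors instead of plain sequences.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error in percent; assumes no true value is zero."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination; assumes y_true is not constant."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

RMSE and MAE are in the units of the target, MAPE is relative to the true value and is undefined when a true value is zero, and R² compares the residual error to the variance of the targets.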
Tracking Module (src/tracking/)¶
Purpose: Experiment tracking
- base.py: Abstract base class for trackers
- local.py: Local file + TensorBoard tracking
- mlflow_tracker.py: MLflow integration
- dual_tracker.py: Combined local + MLflow
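One way a combined tracker can be built on a shared base class is to forward each call to every wrapped backend. The sketch below illustrates that idea only. The class names carry a `Sketch` suffix because the `log_metrics` method, its signature, and the in-memory stand-in backend are all assumptions rather than the real BaseTracker interface.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Tuple


class BaseTrackerSketch(ABC):
    """Hypothetical tracker interface; the real BaseTracker may differ."""

    @abstractmethod
    def log_metrics(self, metrics: Dict[str, float], step: Optional[int] = None) -> None:
        """Record a set of named metric values."""


class InMemoryTracker(BaseTrackerSketch):
    """Stand-in backend that records calls instead of writing files or calling MLflow."""

    def __init__(self) -> None:
        self.records: List[Tuple[Optional[int], Dict[str, float]]] = []

    def log_metrics(self, metrics: Dict[str, float], step: Optional[int] = None) -> None:
        self.records.append((step, dict(metrics)))


class DualTrackerSketch(BaseTrackerSketch):
    """Forward every call to each wrapped tracker."""

    def __init__(self, *trackers: BaseTrackerSketch) -> None:
        self.trackers = list(trackers)

    def log_metrics(self, metrics: Dict[str, float], step: Optional[int] = None) -> None:
        for tracker in self.trackers:
            tracker.log_metrics(metrics, step)
```

Training code written against the base interface does not need to know whether it is logging to one backend or several.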
Explainability Module (src/explainability/)¶
Purpose: Model interpretability
- shap_analysis.py: SHAP value computation and visualization
- attention_viz.py: Attention weight visualization
Design Patterns¶
Registry Pattern¶
Used in:
- PipelineRegistry (src/pipelines/registry.py)
- ModelRegistry (src/models/registry.py)
Purpose: Enable plugin-like extensibility
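A registry of this kind is commonly implemented as a mapping from string names to classes, filled in by a decorator. The sketch below shows the general pattern under that assumption. `SimpleRegistry`, its `register`, `get`, and `available` methods, and `DemoPipeline` are hypothetical names for illustration and do not describe the actual PipelineRegistry or ModelRegistry API.

```python
from typing import Any, Callable, Dict, List, Type


class SimpleRegistry:
    """Hypothetical minimal registry mapping string names to classes."""

    _registry: Dict[str, Type] = {}

    @classmethod
    def register(cls, name: str) -> Callable[[Type], Type]:
        """Return a class decorator that stores the class under `name`."""

        def decorator(target_cls: Type) -> Type:
            if name in cls._registry:
                raise KeyError(f"'{name}' is already registered")
            cls._registry[name] = target_cls
            return target_cls

        return decorator

    @classmethod
    def get(cls, name: str, **kwargs: Any) -> Any:
        """Instantiate the class registered under `name`."""
        if name not in cls._registry:
            raise KeyError(f"Unknown name '{name}'; available: {cls.available()}")
        return cls._registry[name](**kwargs)

    @classmethod
    def available(cls) -> List[str]:
        """List registered names in sorted order."""
        return sorted(cls._registry)


@SimpleRegistry.register("demo")
class DemoPipeline:
    """Illustrative plugin; registering it requires no change to the registry."""

    def __init__(self, window: int = 10) -> None:
        self.window = window
```

With this shape, a configuration file can refer to a component by name, for example `SimpleRegistry.get("demo", window=5)`, and new components become available simply by being imported.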
Strategy Pattern¶
Used in:
- Split strategies (src/data/splits.py)
- Models (src/models/)
- Pipelines (src/pipelines/)
Purpose: Interchangeable algorithms
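As an illustration of interchangeable algorithms, the sketch below defines two split functions with the same call shape and selects one by name. The function names, the `test_fraction` and `seed` parameters, and the `SPLIT_STRATEGIES` table are assumptions made for the example; the strategies actually provided by splits.py are not shown here.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Split = Tuple[List[str], List[str]]


def ordered_split(ids: Sequence[str], test_fraction: float = 0.25) -> Split:
    """Keep the original order and hold out the trailing fraction as the test set."""
    n_test = max(1, round(len(ids) * test_fraction))
    return list(ids[:-n_test]), list(ids[-n_test:])


def shuffled_split(ids: Sequence[str], test_fraction: float = 0.25, seed: int = 0) -> Split:
    """Shuffle with a fixed seed, then hold out the trailing fraction."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    return ordered_split(shuffled, test_fraction)


# Each strategy is called the same way, so callers can swap them freely.
SPLIT_STRATEGIES: Dict[str, Callable[..., Split]] = {
    "ordered": ordered_split,
    "shuffled": shuffled_split,
}


def make_split(strategy: str, ids: Sequence[str]) -> Split:
    """Look up a strategy by name and apply it."""
    return SPLIT_STRATEGIES[strategy](ids)
```

The calling code depends only on the shared signature, so changing the split strategy is a one-word change in configuration rather than a change in the training code.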
Template Method Pattern¶
Used in:
- BasePipeline.fit_transform() (src/pipelines/base.py)
- BaseModel.predict() (src/models/base.py)
Purpose: Define algorithm structure, allow customization
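The template method idea can be sketched as a base class that fixes the order of the steps and leaves the steps themselves to subclasses. The example below assumes that `fit_transform()` calls a `fit` step followed by a `transform` step. That decomposition, the class `BasePipelineSketch`, and the example subclass are hypothetical and may not match the real BasePipeline.

```python
from abc import ABC, abstractmethod
from typing import List


class BasePipelineSketch(ABC):
    """Hypothetical base class; only the call order is defined here."""

    def fit_transform(self, raw: List[float]) -> List[float]:
        """Template method: always fit first, then transform the same data."""
        self.fit(raw)
        return self.transform(raw)

    @abstractmethod
    def fit(self, raw: List[float]) -> None:
        """Learn whatever state the transform step needs."""

    @abstractmethod
    def transform(self, raw: List[float]) -> List[float]:
        """Apply the learned state to the data."""


class MeanCenterPipeline(BasePipelineSketch):
    """Example subclass that customizes both steps."""

    def fit(self, raw: List[float]) -> None:
        self.mean_ = sum(raw) / len(raw)

    def transform(self, raw: List[float]) -> List[float]:
        return [x - self.mean_ for x in raw]
```

Subclasses override the individual steps and inherit `fit_transform()` unchanged, so every pipeline runs its steps in the same order.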
Adding New Code¶
Where to Add¶
- New pipeline: src/pipelines/your_pipeline.py
- New model: src/models/your_model.py
- New split: src/data/splits.py (add function)
- New metric: src/training/metrics.py (add function)
- New tracker: src/tracking/your_tracker.py
Import Conventions¶
# Standard library
import os
from pathlib import Path
from typing import List, Dict, Optional
# Third-party
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
# Local imports
from .base import BasePipeline
from .sample import Sample
from .registry import PipelineRegistry
Code Style¶
- Type hints: Use type hints for all function signatures
- Docstrings: Google-style docstrings for all public functions/classes
- Line length: Maximum 100 characters
- Naming:
  - Classes: PascalCase
  - Functions/variables: snake_case
  - Constants: UPPER_SNAKE_CASE
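The short example below applies these rules together: type hints on every signature, Google-style docstrings, and the three naming conventions. The class, the constant, and its value are invented purely to demonstrate the style and do not exist in the codebase.

```python
from typing import List, Optional

# Constant: UPPER_SNAKE_CASE (illustrative value only).
DEFAULT_NOMINAL_CAPACITY_AH = 5.0


class CapacityNormalizer:
    """Scale capacity readings by a nominal capacity.

    Args:
        nominal_capacity_ah: Nominal cell capacity in ampere-hours. Falls back
            to DEFAULT_NOMINAL_CAPACITY_AH when omitted.
    """

    def __init__(self, nominal_capacity_ah: Optional[float] = None) -> None:
        if nominal_capacity_ah is None:
            nominal_capacity_ah = DEFAULT_NOMINAL_CAPACITY_AH
        self.nominal_capacity_ah = nominal_capacity_ah

    def normalize(self, capacities_ah: List[float]) -> List[float]:
        """Convert capacities to a fraction of the nominal capacity.

        Args:
            capacities_ah: Measured capacities in ampere-hours.

        Returns:
            Each capacity divided by the nominal capacity.
        """
        return [capacity / self.nominal_capacity_ah for capacity in capacities_ah]
```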
Next Steps¶
- Contributing Overview - Contribution workflow
- Adding Pipelines - Add new pipeline
- Adding Models - Add new model