Data Issues¶
This guide covers data loading and processing issues.
Path Resolution Issues¶
Experiment Path Not Found¶
Error: FileNotFoundError: Experiment path not found
Solutions: 1. Verify experiment ID is correct (1-5):
from pathlib import Path
from src.data.expt_paths import ExperimentPaths
paths = ExperimentPaths(experiment_id=5, base_path=Path("Raw Data"))
print(paths.base_dir)  # Check the resolved path
- Check the base path
- Verify that the experiment directory exists
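The two checks above can be sketched with `pathlib` alone. This is a minimal sketch rather than the project's own tooling: `describe_path` is a hypothetical helper, and `"Raw Data"` is simply the base path used in the earlier example.

```python
from pathlib import Path


def describe_path(path: Path) -> dict:
    """Report where a path resolves to and which subdirectories it contains."""
    resolved = path.expanduser().resolve()
    exists = resolved.is_dir()
    # Only list children when the directory is really there.
    subdirs = sorted(p.name for p in resolved.iterdir() if p.is_dir()) if exists else []
    return {"resolved": resolved, "exists": exists, "subdirs": subdirs}


# "Raw Data" is the base path from the example above; adjust it to your layout.
info = describe_path(Path("Raw Data"))
print(info["resolved"], "exists" if info["exists"] else "MISSING")
print(info["subdirs"])  # the experiment directory should appear in this list
```

Printing the resolved absolute path often reveals that a relative base path is being interpreted against an unexpected working directory.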
Cell ID Not Found¶
Error: FileNotFoundError: Cell A not found
Solutions: 1. List available cells:
from pathlib import Path
from src.data.discovery import discover_available_cells
cells = discover_available_cells(experiment_id=5, base_path=Path("Raw Data"))
print(cells)
- Check the cell naming convention (it may vary by experiment)
- Verify that the cell exists in the experiment directory
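When the naming convention is unclear, it can help to look for entries whose names plausibly refer to the cell. The sketch below is an assumption-laden illustration: `matches_cell` and `find_cell_entries` are hypothetical helpers, and the matching rule (a name token equal to the cell ID, or to `cell` followed by the ID) is a guess at common conventions, not the project's actual rule.

```python
import re
from pathlib import Path


def matches_cell(name: str, cell_id: str) -> bool:
    """True if a file or directory name appears to refer to the given cell."""
    wanted = cell_id.lower()
    # Split on anything that is not a letter or digit: "Cell_A.csv" -> cell, a, csv
    tokens = [t.lower() for t in re.split(r"[^0-9A-Za-z]+", name) if t]
    return wanted in tokens or f"cell{wanted}" in tokens


def find_cell_entries(experiment_dir: Path, cell_id: str) -> list[str]:
    """List entries in the experiment directory that look like the cell."""
    if not experiment_dir.is_dir():
        return []
    return sorted(p.name for p in experiment_dir.iterdir() if matches_cell(p.name, cell_id))
```

An empty result suggests either that the cell is absent or that the experiment uses a naming scheme this heuristic does not cover.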
CSV Loading Issues¶
Encoding Errors¶
Error: UnicodeDecodeError
Solutions: 1. Specify the encoding explicitly when reading the CSV
- Handle encoding fallbacks in the loader
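One way to handle encodings in a loader is to try a short list of candidates in order. This is a minimal sketch under assumptions: the candidate list is a guess at what lab-exported CSVs commonly use, and `read_text_with_fallback` is a hypothetical helper rather than part of the project.

```python
from pathlib import Path

# Candidate encodings, tried in order. "latin-1" accepts any byte sequence,
# so it works as a last resort but may mis-render some characters.
CANDIDATE_ENCODINGS = ("utf-8-sig", "cp1252", "latin-1")


def read_text_with_fallback(path, encodings=CANDIDATE_ENCODINGS):
    """Return (text, encoding) using the first encoding that decodes cleanly."""
    raw = Path(path).read_bytes()
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError(f"none of {encodings} could decode {path}")
```

Once a working encoding is known, it can be passed explicitly, for example through the `encoding` argument of `pandas.read_csv`.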
Missing Columns¶
Error: KeyError: 'column_name'
Solutions: 1. Check the CSV structure by inspecting the header row
- Different experiments may have different columns
- Handle missing columns gracefully
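The header check and the graceful handling can be sketched with the standard `csv` module. The column names in `EXPECTED_COLUMNS` are hypothetical, and `load_rows` is an illustrative helper; with pandas, the analogous check compares the expected names against `df.columns`.

```python
import csv
import io

# Hypothetical column names, used only for illustration.
EXPECTED_COLUMNS = ("cell_id", "temperature_C", "capacity")


def load_rows(handle, expected=EXPECTED_COLUMNS, fill=None):
    """Read CSV rows, reporting absent columns and filling them with a default."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [column for column in expected if column not in header]
    rows = []
    for row in reader:
        for column in missing:
            row[column] = fill  # keep every row the same shape
        rows.append(row)
    return rows, missing


sample_csv = io.StringIO("cell_id,capacity\nA,1.5\n")
rows, missing = load_rows(sample_csv)
print(missing)   # columns absent from the header
print(rows[0])   # row with the absent columns filled in
```

Returning the list of missing columns alongside the rows lets the caller decide whether to warn, skip the file, or continue.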
Data Type Issues¶
Error: ValueError: could not convert string to float
Solutions: 1. Check the data type of each column
- Clean the data: replace values that cannot be parsed as numbers
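A tolerant conversion makes it easy to find and neutralize the offending values. The sketch below uses only the standard library; `to_float` and `non_numeric_values` are hypothetical helpers. For whole pandas columns, `pandas.to_numeric(series, errors="coerce")` performs a comparable coercion to NaN.

```python
import math


def to_float(value, default=math.nan):
    """Convert a raw CSV value to float, returning `default` if it cannot be parsed."""
    try:
        return float(str(value).strip())
    except ValueError:
        return default


def non_numeric_values(values):
    """Return the raw values that do not parse to a number."""
    return [v for v in values if math.isnan(to_float(v))]


print(non_numeric_values(["1.0", "", "abc", "2e3"]))
```

Listing the unparseable values first shows whether they are placeholders (empty strings, "N/A") or signs of a misaligned column.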
Unit Conversion Issues¶
Incorrect Units¶
Issue: Values look wrong by orders of magnitude (e.g., capacity in the thousands, which may indicate mAh values being treated as Ah)
Solutions: 1. Verify unit conversion:
from src.data.units import UnitConverter
df_normalized = UnitConverter.normalize_all_capacity_columns(df)
- Check the original units in the CSV headers
- Verify that the conversion factors are correct
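To make the header and factor checks concrete, here is a minimal sketch that is independent of `UnitConverter`. It assumes headers carry the unit in parentheses or brackets, such as `Capacity (mAh)`; that header format, the `TO_AMP_HOURS` table, and both helper names are assumptions for illustration only.

```python
import re

# Conversion factors to ampere-hours (SI prefixes: milli = 1e-3, micro = 1e-6).
TO_AMP_HOURS = {"Ah": 1.0, "mAh": 1e-3, "uAh": 1e-6}


def unit_from_header(header: str):
    """Extract a unit written as '(unit)' or '[unit]' from a column header."""
    match = re.search(r"[\(\[]\s*([^\)\]]+?)\s*[\)\]]", header)
    return match.group(1) if match else None


def to_amp_hours(value: float, unit: str) -> float:
    """Convert a capacity value to ampere-hours; raises KeyError for unknown units."""
    return value * TO_AMP_HOURS[unit]


unit = unit_from_header("Capacity (mAh)")
print(unit, to_amp_hours(2500.0, unit))
```

Raising on an unknown unit, rather than passing the value through, keeps silently unconverted columns from reaching later stages.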
Temperature Conversion¶
Issue: Temperature values incorrect
Solutions: 1. Check that the temperature is in Celsius
- Verify the conversion to Kelvin
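The Celsius-to-Kelvin conversion adds 273.15, so a quick range check can reveal values that were converted twice or never converted. In the sketch below, `looks_like_kelvin` is a rough heuristic with an assumed threshold of 200, which only makes sense for experiments run near room temperature.

```python
ABSOLUTE_ZERO_C = -273.15


def celsius_to_kelvin(temp_c: float) -> float:
    """Convert Celsius to Kelvin, rejecting physically impossible inputs."""
    if temp_c < ABSOLUTE_ZERO_C:
        raise ValueError(f"{temp_c} C is below absolute zero")
    return temp_c - ABSOLUTE_ZERO_C  # same as temp_c + 273.15


def looks_like_kelvin(values, threshold=200.0) -> bool:
    """Heuristic: all values above the threshold suggests they are already in Kelvin."""
    values = list(values)
    return bool(values) and all(v > threshold for v in values)


print(celsius_to_kelvin(25.0))
print(looks_like_kelvin([298.15, 313.15]), looks_like_kelvin([25.0, 40.0]))
```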
Sample Creation Issues¶
Missing Metadata¶
Error: KeyError: 'cell_id' when reading sample.meta
Solutions: 1. Ensure metadata is added:
sample = Sample(
    meta={
        'cell_id': row['cell_id'],
        'temperature_C': row['temperature_C'],
        'experiment_id': row['experiment_id'],
    },
    x=features,
    y=target
)
- Check DataFrame has required columns before creating samples
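The required-column check can be done per row before a sample is built. This is a minimal sketch: `build_meta` is a hypothetical helper, and `REQUIRED_META` simply lists the three metadata keys from the example above. It works on any mapping, such as a dictionary produced from a DataFrame row.

```python
# The three metadata keys used in the Sample example above.
REQUIRED_META = ("cell_id", "temperature_C", "experiment_id")


def build_meta(row) -> dict:
    """Collect the required metadata from a row, failing early with a clear message."""
    missing = [key for key in REQUIRED_META if key not in row or row[key] is None]
    if missing:
        raise KeyError(f"row is missing metadata fields: {missing}")
    return {key: row[key] for key in REQUIRED_META}


row = {"cell_id": "A", "temperature_C": 25, "experiment_id": 5, "capacity": 1.5}
print(build_meta(row))
```

Failing at construction time names every missing field at once, instead of surfacing a bare `KeyError` later when `sample.meta` is read.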
Feature Dimension Mismatch¶
Error: Samples have different feature dimensions
Solutions: 1. Verify that feature extraction is consistent across samples
- Handle missing values consistently
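Both checks can be sketched on plain lists of feature values. The helpers `dimension_report` and `fill_non_finite` are hypothetical, and filling with 0.0 is only one possible policy; the point is to apply the same policy to every sample.

```python
import math
from collections import Counter


def dimension_report(feature_vectors) -> Counter:
    """Count how many feature vectors have each length."""
    return Counter(len(vector) for vector in feature_vectors)


def fill_non_finite(vector, fill=0.0):
    """Replace NaN and infinite entries with a fixed fill value."""
    return [x if math.isfinite(x) else fill for x in vector]


vectors = [[1.0, 2.0, 3.0], [1.0, 2.0], [4.0, float("nan"), 6.0]]
print(dimension_report(vectors))  # more than one key means inconsistent dimensions
print(fill_non_finite(vectors[2]))
```

Replacing non-finite entries in place, rather than dropping them, keeps every vector at the same length.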
Split Issues¶
Empty Splits¶
Issue: Split returns an empty list
Solutions: 1. Check the metadata values present in the samples
- Verify that the split criteria match the available data
- Check for None values in the metadata
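A summary of the values and types stored under a metadata key usually explains an empty split, for instance when the criterion compares against `25` but the metadata holds the string `"25"`. The sketch below assumes `sample.meta` is a dictionary, as in the `Sample` example earlier; `summarize_meta` is a hypothetical helper and `SimpleNamespace` stands in for real samples.

```python
from collections import Counter
from types import SimpleNamespace


def summarize_meta(samples, key):
    """Count the values and the value types stored under one metadata key."""
    values = [sample.meta.get(key) for sample in samples]
    by_value = Counter(values)
    by_type = Counter(type(value).__name__ for value in values)
    return by_value, by_type


# Stand-in samples: one int, one string, one with the key missing entirely.
samples = [
    SimpleNamespace(meta={"temperature_C": 25}),
    SimpleNamespace(meta={"temperature_C": "25"}),
    SimpleNamespace(meta={}),
]
by_value, by_type = summarize_meta(samples, "temperature_C")
print(by_value)
print(by_type)  # a "NoneType" entry counts samples where the key is missing or None
```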
Imbalanced Splits¶
Issue: One split is much larger than the other
Solutions: 1. Check data distribution:
from collections import Counter
temps = [s.meta['temperature_C'] for s in samples]
print(Counter(temps))
- Consider alternative split strategies
- Use stratification if possible
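Stratification means splitting each group separately so that both splits keep the same proportions. The following is a minimal standard-library sketch, not the project's split implementation: `stratified_split` is a hypothetical helper, and grouping by temperature is an assumed choice based on the example above. Libraries such as scikit-learn offer a comparable option through the `stratify` argument of `train_test_split`.

```python
import random
from collections import defaultdict


def stratified_split(items, key, test_fraction=0.2, seed=0):
    """Split items into (train, test) while preserving the proportion of each group."""
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, test = [], []
    for members in groups.values():
        members = members[:]  # copy so the caller's data is not shuffled
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test


# Stand-in data: (temperature, index) pairs, ten per temperature.
items = [(temp, i) for temp in (25, 40) for i in range(10)]
train, test = stratified_split(items, key=lambda item: item[0])
print(len(train), len(test))
```

Very small groups can round to zero test items, so it is worth checking the per-group counts after splitting.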
Best Practices¶
- Validate Data Early: Check data quality before processing
- Handle Missing Values: Replace NaN/inf appropriately
- Check Units: Verify unit conversions are correct
- Log Warnings: Log data quality issues
- Test with Small Data: Test pipelines on small subset first
Next Steps¶
- Common Issues - Other common problems
- Training Issues - Training-specific issues
- Data Loading Guide - Data loading documentation