Feature Extraction
The TimeSeriesFeatureExtractor class extracts more than 74 features from time series data, grouped into several categories, for use in machine learning applications.
Overview
The feature extractor is designed to capture the essential characteristics of time series data that are relevant for Hurst exponent estimation. It provides features in several categories:
Statistical Features: Basic statistical measures
Spectral Features: Frequency domain characteristics
Wavelet Features: Time-frequency analysis
Fractal Features: Self-similarity measures
Biomedical Features: Domain-specific characteristics
Basic Usage
from neurological_lrd_analysis import TimeSeriesFeatureExtractor
import numpy as np
# Create feature extractor
extractor = TimeSeriesFeatureExtractor()
# Generate sample data
data = np.random.randn(1000)
# Extract features
features = extractor.extract_features(data)
print(f"Extracted {len(features)} features")
# Display feature names and values
for name, value in features.items():
    print(f"{name}: {value:.4f}")
Feature Categories
Statistical Features
Basic statistical measures of the time series:
Mean: Average value
Variance: Measure of spread
Skewness: Measure of asymmetry
Kurtosis: Measure of tail heaviness
Percentiles: 25th, 50th, 75th, 90th, 95th percentiles
Range: Difference between max and min
Interquartile Range: Difference between 75th and 25th percentiles
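As a rough illustration of what these measures compute, the following NumPy sketch derives them for a short series. The function name basic_statistics and the dictionary keys are hypothetical and are not taken from neurological_lrd_analysis; the library may use different estimators (for example, a bias-corrected variance).

```python
import numpy as np

def basic_statistics(x):
    """Return basic statistical features of a 1-D series (illustrative only)."""
    x = np.asarray(x, dtype=float)
    mean = x.mean()
    variance = x.var()                      # population variance (ddof=0)
    z = (x - mean) / np.sqrt(variance)      # assumes a non-constant series
    q25, q50, q75, q90, q95 = np.percentile(x, [25, 50, 75, 90, 95])
    return {
        "mean": mean,
        "variance": variance,
        "skewness": np.mean(z ** 3),
        "kurtosis": np.mean(z ** 4) - 3.0,  # excess kurtosis: 0 for a Gaussian
        "p25": q25, "p50": q50, "p75": q75, "p90": q90, "p95": q95,
        "range": x.max() - x.min(),
        "iqr": q75 - q25,
    }

stats = basic_statistics([1.0, 2.0, 3.0, 4.0, 5.0])
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

The kurtosis here is reported as excess kurtosis, which is one common convention; whether the library subtracts 3 is not stated in this document.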
Autocorrelation Features
Autocorrelation at various lags:
Autocorrelation at lag 1: First-order autocorrelation
Autocorrelation at lag 2: Second-order autocorrelation
Autocorrelation at lag 5: Fifth-order autocorrelation
Autocorrelation at lag 10: Tenth-order autocorrelation
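The lag-k autocorrelation can be sketched as the normalized inner product of the mean-centered series with a copy of itself shifted by k samples. The helper name autocorrelation below is hypothetical, and the normalization (dividing by the full-series sum of squares) is one common choice that may differ from the library's.

```python
import numpy as np

def autocorrelation(x, lag):
    """Sample autocorrelation of x at a positive integer lag."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    denom = np.dot(centered, centered)  # lag-0 autocovariance times N
    return np.dot(centered[:-lag], centered[lag:]) / denom

rng = np.random.default_rng(0)
noise = rng.standard_normal(2000)

# The same lags as in the feature list: 1, 2, 5 and 10
acf = {lag: autocorrelation(noise, lag) for lag in (1, 2, 5, 10)}
for lag, value in acf.items():
    print(f"lag {lag}: {value:+.4f}")
```

For white noise all four values should be close to zero, whereas a long-range dependent series would show slowly decaying positive values.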
Spectral Features
Frequency domain characteristics:
Spectral Centroid: Center of mass of the spectrum
Spectral Bandwidth: Measure of spectral spread
Spectral Rolloff: Frequency below which 85% of energy lies
Spectral Flatness: Measure of noisiness
Zero Crossing Rate: Rate of sign changes
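The following sketch shows one way these quantities can be derived from an FFT power spectrum using NumPy alone. The function name spectral_summary, the returned keys, and details such as dropping the DC bin are assumptions made for illustration; the 85% rolloff threshold and the 250 Hz default sampling rate come from this document.

```python
import numpy as np

def spectral_summary(x, sampling_rate=250.0, rolloff=0.85):
    """Illustrative spectral features from the one-sided power spectrum."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(x.size, d=1.0 / sampling_rate)
    power, freqs = power[1:], freqs[1:]          # drop the DC bin
    total = power.sum()

    centroid = np.sum(freqs * power) / total      # power-weighted mean frequency
    bandwidth = np.sqrt(np.sum((freqs - centroid) ** 2 * power) / total)
    # First frequency at which cumulative power reaches the rolloff fraction
    rolloff_freq = freqs[np.searchsorted(np.cumsum(power), rolloff * total)]
    eps = 1e-12                                   # guards against log(0)
    flatness = np.exp(np.mean(np.log(power + eps))) / (np.mean(power) + eps)
    # Fraction of consecutive sample pairs whose signs differ
    zcr = np.mean(np.signbit(x[:-1]) != np.signbit(x[1:]))
    return {
        "spectral_centroid": centroid,
        "spectral_bandwidth": bandwidth,
        "spectral_rolloff": rolloff_freq,
        "spectral_flatness": flatness,
        "zero_crossing_rate": zcr,
    }

t = np.arange(1000) / 250.0
sine = np.sin(2 * np.pi * 10.0 * t)               # pure 10 Hz tone
summary = spectral_summary(sine)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

For a pure tone the centroid and rolloff both sit at the tone frequency and the flatness is near zero; white noise would instead give a flatness much closer to one.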
Frequency Band Power
Power in different frequency bands (biomedical relevance):
Delta Power: 0.5-4 Hz (deep sleep, unconsciousness)
Theta Power: 4-8 Hz (light sleep, meditation)
Alpha Power: 8-13 Hz (relaxed wakefulness)
Beta Power: 13-30 Hz (active concentration)
Gamma Power: 30-100 Hz (high-level cognitive processing)
Power Ratios
Relative power in different bands:
Delta Ratio: Delta power / Total power
Theta Ratio: Theta power / Total power
Alpha Ratio: Alpha power / Total power
Beta Ratio: Beta power / Total power
Gamma Ratio: Gamma power / Total power
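A minimal sketch of band power and power ratios, using the band edges listed above and an unnormalized FFT periodogram. The names BANDS and band_powers are hypothetical, and "total power" is assumed here to mean the power summed over all non-DC frequency bins; the library may instead use a Welch estimate or a different definition of the total.

```python
import numpy as np

# Band edges in Hz, taken from the list above
BANDS = {
    "delta": (0.5, 4.0),
    "theta": (4.0, 8.0),
    "alpha": (8.0, 13.0),
    "beta": (13.0, 30.0),
    "gamma": (30.0, 100.0),
}

def band_powers(x, sampling_rate=250.0):
    """Return (powers, ratios) per band from an unnormalized periodogram."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(x.size, d=1.0 / sampling_rate)
    total = power[1:].sum()  # assumption: total excludes only the DC bin
    powers = {
        name: power[(freqs >= lo) & (freqs < hi)].sum()
        for name, (lo, hi) in BANDS.items()
    }
    # The arbitrary periodogram scale cancels in the ratios
    ratios = {name: p / total for name, p in powers.items()}
    return powers, ratios

t = np.arange(1000) / 250.0
alpha_tone = np.sin(2 * np.pi * 10.0 * t)  # 10 Hz lies in the alpha band
powers, ratios = band_powers(alpha_tone)
for name, value in ratios.items():
    print(f"{name} ratio: {value:.4f}")
```

Note that the gamma band's upper edge of 100 Hz requires a sampling rate above 200 Hz; at the default 250 Hz the Nyquist frequency is 125 Hz, so all five bands are representable.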
Wavelet Features
Time-frequency analysis using wavelets:
Wavelet Energy: Energy at different scales
Wavelet Entropy: Measure of complexity
Wavelet Complexity: Measure of irregularity
Multiresolution Analysis: Features at multiple scales
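To make the idea of energy at different scales concrete, here is a dependency-free sketch based on a hand-written Haar transform. It is not the library's implementation: the API Reference indicates that the library works through PyWavelets, and the wavelet family and number of levels it uses are not stated in this document. The names haar_detail_energies and wavelet_entropy are hypothetical.

```python
import numpy as np

def haar_detail_energies(x, levels=4):
    """Energy of the Haar detail coefficients at each decomposition level."""
    approx = np.asarray(x, dtype=float)
    energies = []
    for _ in range(levels):
        if approx.size < 2:
            break
        approx = approx[: approx.size // 2 * 2]    # drop a trailing odd sample
        even, odd = approx[0::2], approx[1::2]
        detail = (even - odd) / np.sqrt(2.0)       # high-pass (detail) branch
        approx = (even + odd) / np.sqrt(2.0)       # low-pass (approximation) branch
        energies.append(float(np.sum(detail ** 2)))
    return np.array(energies)

def wavelet_entropy(energies):
    """Shannon entropy of the relative energy distribution across levels."""
    p = energies / energies.sum()                  # assumes nonzero total energy
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

rng = np.random.default_rng(0)
noise = rng.standard_normal(1024)
energies = haar_detail_energies(noise, levels=4)
print("energy per level:", np.round(energies, 2))
print("wavelet entropy:", round(wavelet_entropy(energies), 4))
```

A signal whose energy sits at a single scale gives an entropy of zero, while energy spread evenly over all levels gives the maximum, log(number of levels).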
Fractal Features
Self-similarity and fractal characteristics:
Detrended Fluctuation Analysis (DFA): Long-range correlations
Higuchi Fractal Dimension: Measure of complexity
Generalized Hurst Exponent: Multifractal analysis
Sample Entropy: Measure of regularity
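As an example of how a fractal feature relates to long-range correlations, the sketch below implements first-order detrended fluctuation analysis (DFA) with NumPy. The function name dfa_exponent and the choice of scales are assumptions made for illustration, not the library's settings. The returned scaling exponent is about 0.5 for white noise and about 1.5 for a random walk.

```python
import numpy as np

def dfa_exponent(x, scales=(16, 32, 64, 128, 256)):
    """Estimate the DFA-1 scaling exponent of a 1-D series."""
    x = np.asarray(x, dtype=float)
    profile = np.cumsum(x - x.mean())              # integrated, mean-centered series
    fluctuations = []
    for s in scales:
        n_windows = profile.size // s
        # Non-overlapping windows of length s; one window per column
        segments = profile[: n_windows * s].reshape(n_windows, s).T
        t = np.arange(s)
        slope, intercept = np.polyfit(t, segments, 1)   # linear fit per window
        trend = np.outer(t, slope) + intercept
        fluctuations.append(np.sqrt(np.mean((segments - trend) ** 2)))
    # Slope of log F(s) against log s is the scaling exponent
    alpha, _ = np.polyfit(np.log(scales), np.log(fluctuations), 1)
    return float(alpha)

rng = np.random.default_rng(0)
white = rng.standard_normal(8192)
alpha_white = dfa_exponent(white)
alpha_walk = dfa_exponent(np.cumsum(white))
print(f"white noise: {alpha_white:.3f}, random walk: {alpha_walk:.3f}")
```

The series must be longer than the largest scale, and short series give noisy estimates because few windows are available at large scales.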
Biomedical Features
Domain-specific features for biomedical signals:
EEG Features: Electrode characteristics, brain activity patterns
ECG Features: Heart rate variability, cardiac rhythms
Respiratory Features: Breathing patterns, respiratory variability
Advanced Usage
Custom Feature Extraction
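# Extract specific feature categories with the module-level functions
# documented in the API Reference
import pywt
from scipy import signal
from scipy.fft import fft, fftfreq  # assumed source of the FFT helpers
from neurological_lrd_analysis.ml_baselines import (
    extract_statistical_features, extract_spectral_features, extract_wavelet_features)
# Extract only statistical features
statistical_features = extract_statistical_features(data)
# Extract only spectral features (the SciPy helpers are passed in explicitly)
spectral_features = extract_spectral_features(data, signal, fft, fftfreq)
# Extract only wavelet features (the PyWavelets module is passed in explicitly)
wavelet_features = extract_wavelet_features(data, pywt)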
Batch Processing
# Extract features from multiple time series
data_list = [np.random.randn(1000) for _ in range(10)]
all_features = []
for data in data_list:
    features = extractor.extract_features(data)
    all_features.append(list(features.values()))
# Convert to numpy array for ML training
X = np.array(all_features)
print(f"Feature matrix shape: {X.shape}")
Feature Selection
# Get feature names for selection
feature_names = list(extractor.extract_features(data).keys())
print(f"Available features: {len(feature_names)}")
# Select specific features
selected_features = ['mean', 'variance', 'skewness', 'kurtosis']
selected_indices = [feature_names.index(f) for f in selected_features]
# Extract only selected features
X_selected = X[:, selected_indices]
print(f"Selected features shape: {X_selected.shape}")
Performance Considerations
Memory Usage
The feature extractor is designed to be memory-efficient:
Features are computed on-demand
No unnecessary data is stored
Efficient algorithms for large datasets
Computation Time
Feature extraction time scales with data length:
Short series (< 1000 points): ~1-10ms
Medium series (1000-5000 points): ~10-100ms
Long series (> 5000 points): ~100ms-1s
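The timings above depend on hardware and on which feature categories are enabled, so it can be worth measuring them directly. The sketch below is a small timing helper; time_call is a hypothetical name, and np.fft.rfft is used only as a stand-in workload where extractor.extract_features would normally be passed.

```python
import time
import numpy as np

def time_call(func, data, repeats=5):
    """Return the best wall-clock time in seconds over several runs of func(data)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(data)
        best = min(best, time.perf_counter() - start)
    return best

rng = np.random.default_rng(0)
for n in (500, 2000, 8000):
    series = rng.standard_normal(n)
    # Replace np.fft.rfft with extractor.extract_features to time the extractor
    elapsed = time_call(np.fft.rfft, series)
    print(f"{n} points: {elapsed * 1e3:.3f} ms")
```

Taking the minimum over a few repeats reduces the influence of one-off delays such as cache warm-up.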
Optimization Tips
For best performance:
Use appropriate data length: 1000-2000 points is optimal
Batch processing: Extract features for multiple series at once
Feature selection: Use only relevant features for your application
Memory management: Process data in chunks for very large datasets
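Processing in chunks can be sketched as splitting a long recording into fixed-length windows and extracting features from each window in turn. The generator name iter_windows is hypothetical; the 1000-point default follows the recommended data length above, and any trailing partial window is dropped.

```python
import numpy as np

def iter_windows(x, window=1000, step=None):
    """Yield consecutive windows of a 1-D series (views, not copies)."""
    x = np.asarray(x)
    step = window if step is None else step   # default: non-overlapping windows
    for start in range(0, x.size - window + 1, step):
        yield x[start : start + window]

long_series = np.arange(3500)
windows = list(iter_windows(long_series, window=1000))
print(f"{len(windows)} windows of length {windows[0].size}")

# With 50% overlap, more windows are produced from the same data
overlapping = list(iter_windows(long_series, window=1000, step=500))
print(f"{len(overlapping)} overlapping windows")
```

Each window could then be passed to extractor.extract_features, so that only one window's intermediate results are held in memory at a time.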
API Reference
TimeSeriesFeatureExtractor
- class neurological_lrd_analysis.TimeSeriesFeatureExtractor(include_spectral=True, include_wavelet=True, include_fractal=True, include_biomedical=True, sampling_rate=250.0)
Bases: object
Comprehensive feature extractor for time series data.
Extracts statistical, spectral, wavelet, fractal, and biomedical-specific features that are relevant for Hurst exponent estimation.
- __init__(include_spectral=True, include_wavelet=True, include_fractal=True, include_biomedical=True, sampling_rate=250.0)
Initialize the feature extractor.
Parameters:
- include_spectral : bool
Whether to include spectral features
- include_wavelet : bool
Whether to include wavelet features
- include_fractal : bool
Whether to include fractal features
- include_biomedical : bool
Whether to include biomedical-specific features
- sampling_rate : float
Sampling rate for biomedical feature extraction
Methods
- TimeSeriesFeatureExtractor.extract_features(data, true_hurst=None)
Extract comprehensive features from time series data.
Parameters:
- data : np.ndarray
Time series data
- true_hurst : float, optional
True Hurst exponent (for validation)
Returns:
FeatureSet
Extracted features
- neurological_lrd_analysis.ml_baselines.extract_statistical_features(data)
Extract statistical features from time series data.
Parameters:
- data : np.ndarray
Time series data
Returns:
Dict[str, float]
Statistical features
- neurological_lrd_analysis.ml_baselines.extract_spectral_features(data, scipy_signal, fft, fftfreq)
Extract spectral features from time series data.
Parameters:
- data : np.ndarray
Time series data
- scipy_signal : module
scipy.signal module
- fft : function
FFT function
- fftfreq : function
FFT frequency function
Returns:
Dict[str, float]
Spectral features
- neurological_lrd_analysis.ml_baselines.extract_wavelet_features(data, pywt)
Extract wavelet features from time series data.
Parameters:
- data : np.ndarray
Time series data
- pywt : module
PyWavelets module
Returns:
Dict[str, float]
Wavelet features
- neurological_lrd_analysis.ml_baselines.extract_fractal_features(data)
Extract fractal features from time series data.
Parameters:
- data : np.ndarray
Time series data
Returns:
Dict[str, float]
Fractal features
Examples
Complete Feature Extraction Pipeline
from neurological_lrd_analysis import TimeSeriesFeatureExtractor, fbm_davies_harte
import numpy as np
# Generate sample data
data = fbm_davies_harte(1000, 0.7, seed=42)
# Create feature extractor
extractor = TimeSeriesFeatureExtractor()
# Extract all features
features = extractor.extract_features(data)
# Display feature summary
print(f"Extracted {len(features)} features")
print("Feature categories:")
print(f" Statistical: {sum('statistical' in f for f in features.keys())}")
print(f" Spectral: {sum('spectral' in f for f in features.keys())}")
print(f" Wavelet: {sum('wavelet' in f for f in features.keys())}")
print(f" Fractal: {sum('fractal' in f for f in features.keys())}")
# Show the features with the largest absolute values
sorted_features = sorted(features.items(), key=lambda x: abs(x[1]), reverse=True)
print("\nTop 10 features by absolute value:")
for name, value in sorted_features[:10]:
    print(f" {name}: {value:.4f}")
Feature Analysis for ML
# Analyze feature importance for ML
# (reuses np, extractor and fbm_davies_harte from the previous example)
from sklearn.ensemble import RandomForestRegressor
# Generate training data: 20 series for each Hurst exponent
X_train = []
y_train = []
for hurst in [0.3, 0.5, 0.7, 0.9]:
    for _ in range(20):
        data = fbm_davies_harte(1000, hurst, seed=np.random.randint(0, 10000))
        features = extractor.extract_features(data)
        X_train.append(list(features.values()))
        y_train.append(hurst)
X_train = np.array(X_train)
y_train = np.array(y_train)
# Train model
rf = RandomForestRegressor(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)
# Get feature importance
feature_names = list(features.keys())
importance = rf.feature_importances_
# Show most important features
importance_pairs = list(zip(feature_names, importance))
importance_pairs.sort(key=lambda x: x[1], reverse=True)
print("Most important features for Hurst estimation:")
for name, imp in importance_pairs[:10]:
    print(f" {name}: {imp:.4f}")
Troubleshooting
Common Issues
Memory Errors
- Reduce data length or process in chunks
- Use feature selection to reduce dimensionality
Computation Time
- Use shorter time series for real-time applications
- Consider using only essential features
Feature Quality
- Ensure data is properly preprocessed
- Check for NaN or infinite values
- Use appropriate data length (1000-2000 points recommended)
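The feature quality checks can be sketched as a small validation step run before extraction. The helper name check_series and its messages are hypothetical; the 1000-point minimum follows the recommended data length, and the constant-series check is an added assumption, since several features divide by the variance.

```python
import numpy as np

def check_series(x, min_length=1000):
    """Return a list of human-readable problems found in a 1-D series."""
    x = np.asarray(x, dtype=float).ravel()
    problems = []
    if not np.all(np.isfinite(x)):
        problems.append("contains NaN or infinite values")
    if x.size < min_length:
        problems.append(f"too short: {x.size} < {min_length} points")
    if x.size > 0 and np.all(x == x[0]):
        problems.append("series is constant")
    return problems

rng = np.random.default_rng(0)
good = rng.standard_normal(1000)
bad = good.copy()
bad[10] = np.nan

print(check_series(good))       # no problems expected
print(check_series(bad[:500]))  # NaN present and too short
```

Running such a check first makes it easier to tell whether unexpected feature values come from the input data or from the extraction itself.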
Best Practices
Data Preprocessing: Clean and normalize data before feature extraction
Feature Selection: Use domain knowledge to select relevant features
Validation: Always validate features on test data
Documentation: Keep track of which features are used in your models