Machine Learning Baselines
The neurological LRD analysis library includes machine learning baselines for Hurst exponent estimation. In the benchmark reported in the Performance Results section, the ML ensemble achieved the lowest error of the methods tested, with inference times in the tens of milliseconds.
Overview
The ML baselines system provides:
- Feature Extraction: 74+ engineered features for time series data
- Multiple ML Models: Random Forest, SVR, Gradient Boosting, and Ensemble methods
- Hyperparameter Optimization: Automated tuning with Optuna
- Pretrained Models: Fast inference with pre-trained models
- Real-time Performance: 10-50ms prediction times
Quick Start
from neurological_lrd_analysis import (
    create_pretrained_suite, quick_predict, quick_ensemble_predict
)
# Create pretrained models (one-time setup; force_retrain=True retrains any existing models)
create_pretrained_suite("pretrained_models", force_retrain=True)
# Fast ML prediction
hurst_ml = quick_predict(your_time_series, "pretrained_models", "random_forest")
# Ensemble prediction (best accuracy)
hurst_ensemble, uncertainty = quick_ensemble_predict(your_time_series, "pretrained_models")
Feature Extraction
The TimeSeriesFeatureExtractor class provides comprehensive feature extraction from time series data:
from neurological_lrd_analysis import TimeSeriesFeatureExtractor
# Create feature extractor
extractor = TimeSeriesFeatureExtractor()
# Extract features
features = extractor.extract_features(time_series_data)
print(f"Extracted {len(features)} features")
Feature Categories
The feature extractor provides features in several categories:
Statistical Features
- Mean, variance, skewness, kurtosis
- Percentiles, quartiles, range
- Autocorrelation at various lags
Spectral Features
- Power spectral density
- Spectral centroid, bandwidth, rolloff
- Frequency band power ratios (delta, theta, alpha, beta, gamma)
Wavelet Features
- Wavelet energy at multiple scales
- Wavelet entropy and complexity
- Multiresolution analysis
Fractal Features
- Detrended Fluctuation Analysis (DFA)
- Higuchi fractal dimension
- Generalized Hurst exponent
Biomedical Features
- EEG-specific features (electrode characteristics)
- ECG-specific features (heart rate variability)
- Respiratory features (breathing patterns)
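To make the statistical category concrete, the following is a minimal sketch of how a few such features can be computed with the Python standard library alone. It is not the library's implementation: the function name `statistical_features`, the `max_lag` parameter, and the feature keys are assumptions chosen for illustration.

```python
import math
import random


def statistical_features(series, max_lag=3):
    """Compute a few illustrative statistical features of a 1-D series.

    Hypothetical helper for illustration; not part of the library.
    """
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    variance = sum(c * c for c in centered) / n  # population variance
    std = math.sqrt(variance)

    features = {
        "mean": mean,
        "variance": variance,
        # Standardized third and fourth central moments
        "skewness": sum(c ** 3 for c in centered) / (n * std ** 3),
        "kurtosis": sum(c ** 4 for c in centered) / (n * std ** 4),
        "range": max(series) - min(series),
    }
    # Sample autocorrelation at lags 1..max_lag
    for lag in range(1, max_lag + 1):
        cov = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / n
        features[f"autocorr_lag{lag}"] = cov / variance
    return features


# White noise has no memory, so its autocorrelations should be close to zero
random.seed(0)
white_noise = [random.gauss(0.0, 1.0) for _ in range(2000)]
features = statistical_features(white_noise)
for name, value in features.items():
    print(f"{name}: {value:.4f}")
```

Autocorrelation is the feature in this group most directly tied to long-range dependence: for a persistent series it decays slowly with the lag, whereas for white noise it stays near zero at every lag.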
ML Estimators
The library provides several ML estimators for Hurst exponent estimation:
Random Forest Estimator
from neurological_lrd_analysis import RandomForestEstimator
# Create estimator
estimator = RandomForestEstimator()
# Train model
result = estimator.train(X_train, y_train, validation_split=0.2)
# Make predictions
predictions = estimator.predict(X_test)
# Get feature importance
importance = estimator.get_feature_importance()
SVR Estimator
from neurological_lrd_analysis import SVREstimator
# Create estimator
estimator = SVREstimator()
# Train model
result = estimator.train(X_train, y_train, validation_split=0.2)
# Make predictions
predictions = estimator.predict(X_test)
Gradient Boosting Estimator
from neurological_lrd_analysis import GradientBoostingEstimator
# Create estimator
estimator = GradientBoostingEstimator()
# Train model
result = estimator.train(X_train, y_train, validation_split=0.2)
# Make predictions
predictions = estimator.predict(X_test)
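Each estimator's train call above takes validation_split=0.2. How the library performs the split is not documented here, so the following is only a sketch of one common interpretation: shuffle the samples and hold out 20% for validation. The helper `holdout_split` and its `seed` parameter are hypothetical.

```python
import random


def holdout_split(X, y, validation_split=0.2, seed=42):
    """Shuffle sample indices and hold out a fraction for validation.

    Hypothetical helper for illustration; not part of the library.
    """
    if len(X) != len(y):
        raise ValueError("X and y must have the same number of samples")
    if not 0.0 < validation_split < 1.0:
        raise ValueError("validation_split must be strictly between 0 and 1")
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_val = max(1, round(len(X) * validation_split))
    val_idx, train_idx = indices[:n_val], indices[n_val:]
    train = ([X[i] for i in train_idx], [y[i] for i in train_idx])
    val = ([X[i] for i in val_idx], [y[i] for i in val_idx])
    return train, val


# Toy data: 10 samples with 2 features each and made-up Hurst targets
X = [[float(i), float(i) * 2.0] for i in range(10)]
y = [0.1 + 0.08 * i for i in range(10)]
(X_train, y_train), (X_val, y_val) = holdout_split(X, y, validation_split=0.2)
print(f"train: {len(X_train)} samples, validation: {len(X_val)} samples")
```

The validation portion is never used for fitting, so the score computed on it gives an estimate of how the model behaves on unseen series.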
Hyperparameter Optimization
The library integrates with Optuna for automated hyperparameter optimization:
from neurological_lrd_analysis import (
    OptunaOptimizer, create_optuna_study, optimize_hyperparameters
)
# Create optimization study
study = create_optuna_study(
    model_type="random_forest",
    X_train=X_train,
    y_train=y_train,
    n_trials=100
)
# Run optimization
best_params = optimize_hyperparameters(
    model_type="random_forest",
    X_train=X_train,
    y_train=y_train,
    n_trials=100
)
print(f"Best parameters: {best_params}")
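The loop that a hyperparameter optimizer runs can be sketched without any dependencies: propose a parameter set, score it, and keep the best one seen so far. The example below uses plain random search as a simplified stand-in; Optuna's samplers choose candidates more cleverly, and nothing here reflects the library's internals. The objective `toy_validation_error` and the parameter ranges are made up for illustration.

```python
import random


def toy_validation_error(params):
    """Made-up objective with its minimum at n_estimators=200, max_depth=8."""
    return (
        ((params["n_estimators"] - 200) / 100) ** 2
        + ((params["max_depth"] - 8) / 4) ** 2
    )


def random_search(objective, n_trials=100, seed=0):
    """Sample parameter sets at random and keep the one with the lowest error."""
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_trials):
        params = {
            "n_estimators": rng.randint(50, 500),
            "max_depth": rng.randint(2, 20),
        }
        score = objective(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


best_params, best_score = random_search(toy_validation_error, n_trials=100)
print(f"Best parameters: {best_params} (error {best_score:.4f})")
```

In a real run the objective would train a model with the proposed parameters and return its validation error, which is why n_trials directly controls the cost of the search.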
Pretrained Models
The pretrained model system provides efficient model management and fast inference:
Model Management
from neurological_lrd_analysis import PretrainedModelManager, TrainingConfig, MLBaselineType
# Create model manager
manager = PretrainedModelManager("models_directory")
# Create training data
X, y, training_info = manager.create_training_data(
    hurst_values=[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
    lengths=[500, 1000, 2000],
    generators=['fbm', 'fgn', 'arfima'],
    contaminations=['none', 'noise', 'missing'],
    biomedical_scenarios=['eeg', 'ecg', 'respiratory']
)
# Create training configurations
configs = [
    TrainingConfig(
        model_type=MLBaselineType.RANDOM_FOREST,
        hyperparameters={'n_estimators': 100, 'random_state': 42},
        description="Random Forest model"
    ),
    TrainingConfig(
        model_type=MLBaselineType.SVR,
        hyperparameters={'C': 1.0, 'gamma': 'scale'},
        description="SVR model"
    )
]
# Train models
results = manager.create_model_suite(configs, X, y, training_info)
# List trained models
models = manager.list_models()
for model in models:
    print(f"Model: {model.model_id}, Type: {model.model_type}")
Fast Inference
from neurological_lrd_analysis import (
    quick_predict, quick_ensemble_predict, PretrainedInference
)
# Single model prediction
hurst_rf = quick_predict(time_series, "models_directory", "random_forest")
hurst_svr = quick_predict(time_series, "models_directory", "svr")
# Ensemble prediction (best accuracy)
hurst_ensemble, uncertainty = quick_ensemble_predict(time_series, "models_directory")
# Batch prediction
inference = PretrainedInference("models_directory")
predictions = inference.predict_batch(time_series_list)
# Ensemble batch prediction
ensemble_predictions = inference.ensemble_predict_batch(time_series_list)
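The ensemble call returns an estimate together with an uncertainty, documented in the API reference as (mean_estimate, std_estimate). A minimal sketch of that idea follows: average the per-model predictions and report their spread. The helper `combine_predictions` is hypothetical, the per-model values are invented, and whether the library uses the population or the sample standard deviation is an assumption.

```python
import statistics


def combine_predictions(predictions):
    """Return (mean, standard deviation) of per-model Hurst estimates.

    Hypothetical helper; uses the population standard deviation.
    """
    values = list(predictions.values())
    mean_estimate = statistics.fmean(values)
    std_estimate = statistics.pstdev(values)
    return mean_estimate, std_estimate


# Invented per-model predictions for a single time series
per_model = {"random_forest": 0.68, "svr": 0.72, "gradient_boosting": 0.70}
hurst_ensemble, uncertainty = combine_predictions(per_model)
print(f"H = {hurst_ensemble:.3f} +/- {uncertainty:.3f}")
```

A small spread means the models agree; a large one suggests the series is unlike the training data for at least some of them.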
Benchmark Comparison
The library provides comprehensive benchmarking between classical and ML methods:
from neurological_lrd_analysis import ClassicalMLBenchmark, run_comprehensive_benchmark
# Assumption: EstimatorType is importable from the package top level
from neurological_lrd_analysis import EstimatorType
# Create benchmark system
benchmark = ClassicalMLBenchmark(
    pretrained_models_dir="pretrained_models",
    classical_estimators=[EstimatorType.DFA, EstimatorType.RS_ANALYSIS],
    ml_estimators=['random_forest', 'ensemble']
)
# Build test scenarios with the default settings
test_scenarios = benchmark.create_test_scenarios()
# Run comprehensive benchmark
results = benchmark.run_comprehensive_benchmark(
    test_scenarios=test_scenarios,
    save_results=True
)
# Access results
print("Performance Summary:")
for method_name, summary in results['summaries'].items():
    print(f"{method_name}:")
    print(f"  MAE: {summary.mean_absolute_error:.4f}")
    print(f"  RMSE: {summary.root_mean_squared_error:.4f}")
    print(f"  Correlation: {summary.correlation:.4f}")
    print(f"  Mean time: {summary.mean_computation_time*1000:.1f}ms")
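The summary fields printed above (MAE, RMSE, correlation) are standard metrics. As a sketch of what they measure, the following computes them from paired true and predicted Hurst values using only the standard library. The function `summarize` and the sample values are illustrative assumptions, not the library's code or data.

```python
import math


def summarize(true_values, predicted_values):
    """Compute MAE, RMSE, and Pearson correlation for paired values."""
    n = len(true_values)
    errors = [p - t for p, t in zip(predicted_values, true_values)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    mean_t = sum(true_values) / n
    mean_p = sum(predicted_values) / n
    cov = sum((t - mean_t) * (p - mean_p)
              for t, p in zip(true_values, predicted_values))
    var_t = sum((t - mean_t) ** 2 for t in true_values)
    var_p = sum((p - mean_p) ** 2 for p in predicted_values)
    correlation = cov / math.sqrt(var_t * var_p)
    return {"mae": mae, "rmse": rmse, "correlation": correlation}


# Invented example values
true_hurst = [0.3, 0.5, 0.7, 0.9]
predicted_hurst = [0.35, 0.45, 0.75, 0.8]
summary = summarize(true_hurst, predicted_hurst)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

RMSE is never smaller than MAE and penalizes large individual errors more heavily, so a wide gap between the two points to a few badly mispredicted series.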
Performance Results
In the benchmark, the ML ensemble achieved the lowest error of the methods tested and was faster than every classical method except the periodogram:
Performance Rankings (MAE, Mean Absolute Error; lower is better)
1. Ensemble (ML): MAE 0.1518 (best overall)
2. DFA (Classical): MAE 0.1983 (best classical)
3. R/S Analysis (Classical): MAE 0.1993
4. Periodogram (Classical): MAE 0.2038
5. Higuchi (Classical): MAE 0.9906
Speed Rankings (mean computation time; lower is better)
1. Periodogram: 14.0ms (fastest)
2. Ensemble (ML): 59.3ms
3. R/S Analysis: 694.2ms
4. Higuchi: 811.5ms
5. DFA: 2044.5ms
Key Findings
- The ML ensemble achieved the best accuracy (MAE: 0.1518)
- The ML ensemble was faster than every classical method tested except the periodogram (59.3ms vs 14.0ms)
- The ML ensemble was roughly 34x faster than DFA (59.3ms vs 2044.5ms) while being more accurate
- ML predictions correlate strongly with the true Hurst values (correlation 0.9294)
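The speed comparison can be checked directly from the timings listed in the Speed Rankings. The short script below divides each classical method's mean time by the ensemble's; the dictionary simply restates the reported numbers.

```python
# Mean computation times in milliseconds, as reported in the Speed Rankings
mean_time_ms = {
    "Periodogram": 14.0,
    "Ensemble (ML)": 59.3,
    "R/S Analysis": 694.2,
    "Higuchi": 811.5,
    "DFA": 2044.5,
}

ensemble_time = mean_time_ms["Ensemble (ML)"]
# Ratio > 1 means the method is slower than the ensemble
slowdown = {
    name: t / ensemble_time
    for name, t in mean_time_ms.items()
    if name != "Ensemble (ML)"
}
for name, ratio in sorted(slowdown.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {ratio:.1f}x the ensemble's time")
```

A ratio below 1, as for the periodogram, marks a method that is faster than the ensemble.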
API Reference
TimeSeriesFeatureExtractor
- class neurological_lrd_analysis.TimeSeriesFeatureExtractor(include_spectral=True, include_wavelet=True, include_fractal=True, include_biomedical=True, sampling_rate=250.0)
Bases: object
Comprehensive feature extractor for time series data.
Extracts statistical, spectral, wavelet, fractal, and biomedical-specific features that are relevant for Hurst exponent estimation.
- __init__(include_spectral=True, include_wavelet=True, include_fractal=True, include_biomedical=True, sampling_rate=250.0)
Initialize the feature extractor.
Parameters:
- include_spectral (bool): Whether to include spectral features
- include_wavelet (bool): Whether to include wavelet features
- include_fractal (bool): Whether to include fractal features
- include_biomedical (bool): Whether to include biomedical-specific features
- sampling_rate (float): Sampling rate for biomedical feature extraction
RandomForestEstimator
- class neurological_lrd_analysis.RandomForestEstimator(n_estimators=100, max_depth=None, min_samples_split=2, min_samples_leaf=1, random_state=42, **kwargs)
Bases: BaseMLEstimator
Random Forest estimator for Hurst exponent prediction.
- __init__(n_estimators=100, max_depth=None, min_samples_split=2, min_samples_leaf=1, random_state=42, **kwargs)
Initialize Random Forest estimator.
Parameters:
- n_estimators (int): Number of trees in the forest
- max_depth (int, optional): Maximum depth of trees
- min_samples_split (int): Minimum samples to split a node
- min_samples_leaf (int): Minimum samples in a leaf
- random_state (int): Random state for reproducibility
- **kwargs: Additional parameters for RandomForestRegressor
SVREstimator
- class neurological_lrd_analysis.SVREstimator(kernel='rbf', C=1.0, gamma='scale', epsilon=0.1, **kwargs)
Bases: BaseMLEstimator
Support Vector Regression estimator for Hurst exponent prediction.
- __init__(kernel='rbf', C=1.0, gamma='scale', epsilon=0.1, **kwargs)
Initialize SVR estimator.
Parameters:
- kernel (str): Kernel type ('rbf', 'linear', 'poly', 'sigmoid')
- C (float): Regularization parameter
- gamma (str or float): Kernel coefficient
- epsilon (float): Epsilon-tube parameter
- **kwargs: Additional parameters for SVR
GradientBoostingEstimator
- class neurological_lrd_analysis.GradientBoostingEstimator(n_estimators=100, learning_rate=0.1, max_depth=3, min_samples_split=2, min_samples_leaf=1, random_state=42, **kwargs)
Bases: BaseMLEstimator
Gradient Boosting estimator for Hurst exponent prediction.
- __init__(n_estimators=100, learning_rate=0.1, max_depth=3, min_samples_split=2, min_samples_leaf=1, random_state=42, **kwargs)
Initialize Gradient Boosting estimator.
Parameters:
- n_estimators (int): Number of boosting stages
- learning_rate (float): Learning rate
- max_depth (int): Maximum depth of trees
- min_samples_split (int): Minimum samples to split a node
- min_samples_leaf (int): Minimum samples in a leaf
- random_state (int): Random state for reproducibility
- **kwargs: Additional parameters for GradientBoostingRegressor
PretrainedModelManager
- class neurological_lrd_analysis.PretrainedModelManager(models_dir='pretrained_models')
Bases: object
Manager for pretrained ML models.
Handles creation, storage, loading, and management of pretrained models for Hurst exponent estimation.
- __init__(models_dir='pretrained_models')
Initialize the pretrained model manager.
Parameters:
- models_dir (str or Path): Directory to store pretrained models
- create_training_data(hurst_values=None, lengths=None, n_samples_per_config=100, generators=None, contaminations=None, biomedical_scenarios=None, random_state=42)
Create comprehensive training dataset.
Parameters:
- hurst_values (List[float], optional): Hurst values to generate
- lengths (List[int], optional): Time series lengths
- n_samples_per_config (int): Number of samples per configuration
- generators (List[str], optional): Data generators to use
- contaminations (List[str], optional): Contamination types
- biomedical_scenarios (List[str], optional): Biomedical scenarios
- random_state (int): Random state for reproducibility
Returns:
- Tuple[np.ndarray, np.ndarray, Dict[str, Any]]: (X, y, training_info), i.e. features, targets, and metadata
- train_model(training_config, X, y, training_info)
Train a model and save it as pretrained.
Parameters:
- training_config (TrainingConfig): Training configuration
- X (np.ndarray): Training features
- y (np.ndarray): Training targets
- training_info (Dict[str, Any]): Training dataset information
Returns:
- ModelMetadata: Metadata for the trained model
- load_model(model_id)
Load a pretrained model.
Parameters:
- model_id (str): ID of the model to load
Returns:
- Tuple[BaseMLEstimator, ModelMetadata]: Loaded model and its metadata
- list_models(model_type=None, status=None, tags=None)
List available models with optional filtering.
Parameters:
- model_type (MLBaselineType, optional): Filter by model type
- status (ModelStatus, optional): Filter by status
- tags (List[str], optional): Filter by tags
Returns:
- List[ModelMetadata]: List of matching models
- get_best_model(model_type, metric='validation_score')
Get the best performing model of a given type.
Parameters:
- model_type (MLBaselineType): Type of model to get
- metric (str): Metric to use for ranking
Returns:
- Tuple[BaseMLEstimator, ModelMetadata]: Best model and its metadata
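Selecting a best model by a named metric amounts to filtering by type and taking the maximum of that metric. The sketch below illustrates the idea under stated assumptions: `ModelRecord` is a hypothetical stand-in for ModelMetadata with only three fields, `pick_best` is a made-up helper, and higher metric values are assumed to be better, which the source does not confirm.

```python
from dataclasses import dataclass


@dataclass
class ModelRecord:
    """Hypothetical stand-in for the library's ModelMetadata."""
    model_id: str
    model_type: str
    validation_score: float


def pick_best(records, model_type, metric="validation_score"):
    """Return the record of the given type with the highest metric value."""
    candidates = [r for r in records if r.model_type == model_type]
    if not candidates:
        raise ValueError(f"no models of type {model_type!r}")
    # Assumption: a higher metric value means a better model
    return max(candidates, key=lambda r: getattr(r, metric))


# Invented records for illustration
records = [
    ModelRecord("rf_001", "random_forest", 0.81),
    ModelRecord("rf_002", "random_forest", 0.86),
    ModelRecord("svr_001", "svr", 0.90),
]
best = pick_best(records, "random_forest")
print(f"Best random forest: {best.model_id} ({best.validation_score:.2f})")
```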
- predict(model_id, data, return_metadata=False)
Make prediction using a pretrained model.
Parameters:
- model_id (str): ID of the model to use
- data (np.ndarray): Time series data
- return_metadata (bool): Whether to return model metadata
Returns:
- Union[float, Tuple[float, ModelMetadata]]: Prediction result and optionally metadata
- create_model_suite(training_configs, X, y, training_info)
Create a suite of pretrained models.
Parameters:
- training_configs (List[TrainingConfig]): List of training configurations
- X (np.ndarray): Training features
- y (np.ndarray): Training targets
- training_info (Dict[str, Any]): Training dataset information
Returns:
- List[ModelMetadata]: Metadata for all trained models
PretrainedInference
- class neurological_lrd_analysis.PretrainedInference(models_dir='pretrained_models')
Bases: object
High-level interface for pretrained model inference.
Provides easy-to-use methods for Hurst exponent estimation using pretrained ML models with support for single predictions, batch processing, and ensemble methods.
- __init__(models_dir='pretrained_models')
Initialize the inference system.
Parameters:
- models_dir (str or Path): Directory containing pretrained models
- predict_single(data, model_id=None, model_type=None, use_best=True)
Predict Hurst exponent for a single time series.
Parameters:
- data (np.ndarray): Time series data
- model_id (str, optional): Specific model ID to use
- model_type (MLBaselineType, optional): Type of model to use (will select best if multiple available)
- use_best (bool): Whether to use the best performing model if model_id is not specified
Returns:
- PredictionResult: Prediction result with metadata
- predict_batch(data_list, model_id=None, model_type=None, use_best=True, show_progress=True)
Predict Hurst exponents for multiple time series.
Parameters:
- data_list (List[np.ndarray]): List of time series data
- model_id (str, optional): Specific model ID to use
- model_type (MLBaselineType, optional): Type of model to use
- use_best (bool): Whether to use the best performing model
- show_progress (bool): Whether to show progress during batch processing
Returns:
- List[PredictionResult]: List of prediction results
- predict_ensemble(data, model_types=None, weights=None, include_uncertainty=True)
Predict using ensemble of models.
Parameters:
- data (np.ndarray): Time series data
- model_types (List[MLBaselineType], optional): Types of models to include in ensemble
- weights (Dict[str, float], optional): Weights for each model type
- include_uncertainty (bool): Whether to include uncertainty quantification
Returns:
- EnsembleResult: Ensemble prediction result
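The weights parameter suggests a weighted combination of per-model predictions. How the library applies the weights is not specified here, so the following is only a sketch of one plausible scheme: a weighted mean, with the weighted standard deviation as the uncertainty. The helper `weighted_ensemble` and the sample values are assumptions.

```python
import math


def weighted_ensemble(predictions, weights=None):
    """Return the weighted mean and weighted standard deviation.

    Hypothetical helper; equal weights are used when none are given.
    """
    if weights is None:
        weights = {name: 1.0 for name in predictions}
    total = sum(weights[name] for name in predictions)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    mean = sum(weights[name] * value
               for name, value in predictions.items()) / total
    variance = sum(weights[name] * (value - mean) ** 2
                   for name, value in predictions.items()) / total
    return mean, math.sqrt(variance)


# Invented predictions; the random forest is trusted three times as much
predictions = {"random_forest": 0.60, "svr": 0.80}
weights = {"random_forest": 3.0, "svr": 1.0}
estimate, spread = weighted_ensemble(predictions, weights)
print(f"H = {estimate:.3f} +/- {spread:.3f}")
```

With equal weights this reduces to the plain mean and population standard deviation of the predictions.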
- compare_models(data, model_types=None)
Compare predictions from different model types.
Parameters:
- data (np.ndarray): Time series data
- model_types (List[MLBaselineType], optional): Types of models to compare
Returns:
- Dict[str, PredictionResult]: Predictions from each model type
- get_model_info(model_id=None)
Get information about available models.
Parameters:
- model_id (str, optional): Specific model ID, or None for all models
Returns:
- Union[ModelMetadata, List[ModelMetadata]]: Model metadata
- benchmark_models(test_data, true_hurst, model_types=None)
Benchmark model performance on test data.
Parameters:
- test_data (List[np.ndarray]): Test time series data
- true_hurst (List[float]): True Hurst exponents
- model_types (List[MLBaselineType], optional): Types of models to benchmark
Returns:
- Dict[str, Dict[str, float]]: Performance metrics for each model type
ClassicalMLBenchmark
- class neurological_lrd_analysis.ClassicalMLBenchmark(pretrained_models_dir='pretrained_models', classical_estimators=None, ml_estimators=None)
Bases: object
Comprehensive benchmark comparing classical and ML methods.
Provides systematic comparison of classical Hurst estimation methods with machine learning baseline models across various test scenarios.
- __init__(pretrained_models_dir='pretrained_models', classical_estimators=None, ml_estimators=None)
Initialize the benchmark system.
Parameters:
- pretrained_models_dir (str or Path): Directory containing pretrained ML models
- classical_estimators (List[EstimatorType], optional): Classical estimators to include
- ml_estimators (List[str], optional): ML model types to include
- create_test_scenarios(hurst_values=None, lengths=None, n_samples_per_config=10, include_contamination=True, include_biomedical=True)
Create comprehensive test scenarios.
Parameters:
- hurst_values (List[float], optional): Hurst values to test
- lengths (List[int], optional): Time series lengths
- n_samples_per_config (int): Number of samples per configuration
- include_contamination (bool): Whether to include contaminated data
- include_biomedical (bool): Whether to include biomedical scenarios
Returns:
- List[TimeSeriesSample]: Test scenarios
- benchmark_classical_methods(samples)
Benchmark classical Hurst estimation methods.
Parameters:
- samples (List[TimeSeriesSample]): Test scenarios
Returns:
- Dict[str, List[BenchmarkResult]]: Results for each classical method
- benchmark_ml_methods(samples)
Benchmark machine learning methods.
Parameters:
- samples (List[TimeSeriesSample]): Test scenarios
Returns:
- Dict[str, List[BenchmarkResult]]: Results for each ML method
- run_comprehensive_benchmark(test_scenarios=None, save_results=True, results_dir='benchmark_results')
Run comprehensive benchmark comparison.
Parameters:
- test_scenarios (List[TimeSeriesSample], optional): Test scenarios to use
- save_results (bool): Whether to save results to disk
- results_dir (str or Path): Directory to save results
Returns:
- Dict[str, Any]: Complete benchmark results
Functions
- neurological_lrd_analysis.create_pretrained_suite(models_dir='pretrained_models', force_retrain=False)
Create a complete suite of pretrained models.
Parameters:
- models_dir (str or Path): Directory to store models
- force_retrain (bool): Whether to retrain existing models
Returns:
- PretrainedModelManager: Manager with trained models
- neurological_lrd_analysis.quick_predict(data, models_dir='pretrained_models', model_type=None)
Quick prediction function for single time series.
Parameters:
- data (np.ndarray): Time series data
- models_dir (str or Path): Directory containing pretrained models
- model_type (MLBaselineType, optional): Type of model to use
Returns:
- float: Predicted Hurst exponent
- neurological_lrd_analysis.quick_ensemble_predict(data, models_dir='pretrained_models', model_types=None)
Quick ensemble prediction function.
Parameters:
- data (np.ndarray): Time series data
- models_dir (str or Path): Directory containing pretrained models
- model_types (List[MLBaselineType], optional): Types of models to include in ensemble
Returns:
- Tuple[float, float]: (mean_estimate, std_estimate)
- neurological_lrd_analysis.run_comprehensive_benchmark(pretrained_models_dir='pretrained_models', results_dir='benchmark_results', test_scenarios=None)
Run comprehensive benchmark comparison.
Parameters:
- pretrained_models_dir (str or Path): Directory containing pretrained models
- results_dir (str or Path): Directory to save results
- test_scenarios (List[TimeSeriesSample], optional): Test scenarios to use
Returns:
- Dict[str, Any]: Complete benchmark results