Machine Learning Baselines

The neurological LRD analysis library includes machine learning baselines for Hurst exponent estimation. They combine feature-based regression models with pretrained model files, so that inference on a new time series is fast.

Overview

The ML baselines system provides:

  • Feature Extraction: 74+ engineered features covering statistical, spectral, wavelet, fractal, and biomedical characteristics of a time series

  • Multiple ML Models: Random Forest, SVR, Gradient Boosting, and Ensemble methods

  • Hyperparameter Optimization: Automated tuning with Optuna

  • Pretrained Models: Fast inference with pre-trained models

  • Real-time Performance: prediction times on the order of 10-60ms (the ensemble measured 59.3ms in the benchmark below)

Quick Start

from neurological_lrd_analysis import (
    create_pretrained_suite, quick_predict, quick_ensemble_predict
)

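# Create pretrained models (one-time setup).
# force_retrain=True retrains even if saved models already exist;
# leave it at the default (False) on later runs to reuse them.
create_pretrained_suite("pretrained_models", force_retrain=True)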

# Fast ML prediction
hurst_ml = quick_predict(your_time_series, "pretrained_models", "random_forest")

# Ensemble prediction (best accuracy)
hurst_ensemble, uncertainty = quick_ensemble_predict(your_time_series, "pretrained_models")

Feature Extraction

The TimeSeriesFeatureExtractor class provides comprehensive feature extraction from time series data:

from neurological_lrd_analysis import TimeSeriesFeatureExtractor

# Create feature extractor
extractor = TimeSeriesFeatureExtractor()

# Extract features
features = extractor.extract_features(time_series_data)
print(f"Extracted {len(features)} features")

Feature Categories

The feature extractor provides features in several categories:

Statistical Features

  • Mean, variance, skewness, kurtosis

  • Percentiles, quartiles, range

  • Autocorrelation at various lags

Spectral Features

  • Power spectral density

  • Spectral centroid, bandwidth, rolloff

  • Frequency band power ratios (delta, theta, alpha, beta, gamma)

Wavelet Features

  • Wavelet energy at multiple scales

  • Wavelet entropy and complexity

  • Multiresolution analysis

Fractal Features

  • Detrended Fluctuation Analysis (DFA)

  • Higuchi fractal dimension

  • Generalized Hurst exponent

Biomedical Features

  • EEG-specific features (electrode characteristics)

  • ECG-specific features (heart rate variability)

  • Respiratory features (breathing patterns)
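To make two of these categories concrete, the sketch below computes a few statistical and spectral features with plain NumPy. It is an illustrative approximation only, not the implementation used by TimeSeriesFeatureExtractor; the function name sketch_features and the dictionary keys are hypothetical.

```python
import numpy as np

def sketch_features(x, sampling_rate=250.0):
    """Compute a handful of illustrative features (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    variance = float(centered.var())

    # Lag-1 autocorrelation (biased estimator); 0.0 for a constant series
    if variance > 0:
        acf_lag1 = float(np.dot(centered[:-1], centered[1:]) / (len(x) * variance))
    else:
        acf_lag1 = 0.0

    # Power spectrum from the real FFT, then its power-weighted mean frequency
    power = np.abs(np.fft.rfft(centered)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sampling_rate)
    total_power = power.sum()
    centroid = float((freqs * power).sum() / total_power) if total_power > 0 else 0.0

    return {
        "mean": float(x.mean()),
        "variance": variance,
        "acf_lag1": acf_lag1,
        "spectral_centroid": centroid,
    }

# Example: a 10 Hz sine sampled at 250 Hz has its spectral centroid near 10 Hz
t = np.arange(1000) / 250.0
print(sketch_features(np.sin(2 * np.pi * 10.0 * t)))
```

Features like these are cheap to compute, which is one reason a feature-based model can predict quickly once trained.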

ML Estimators

The library provides several ML estimators for Hurst exponent estimation:

Random Forest Estimator

from neurological_lrd_analysis import RandomForestEstimator

# Create estimator
estimator = RandomForestEstimator()

# Train model
result = estimator.train(X_train, y_train, validation_split=0.2)

# Make predictions
predictions = estimator.predict(X_test)

# Get feature importance
importance = estimator.get_feature_importance()
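The validation_split=0.2 argument suggests that a fraction of the training rows is held out for validation. How the library performs the split is not documented here; the following is a minimal sketch of one common approach (a shuffled hold-out), with holdout_split as a hypothetical helper name.

```python
import numpy as np

def holdout_split(X, y, validation_split=0.2, random_state=42):
    """Shuffle row indices and hold out a fraction for validation (sketch)."""
    X = np.asarray(X)
    y = np.asarray(y)
    rng = np.random.default_rng(random_state)
    order = rng.permutation(len(X))
    n_val = int(round(len(X) * validation_split))
    val_idx, train_idx = order[:n_val], order[n_val:]
    return X[train_idx], X[val_idx], y[train_idx], y[val_idx]

# Example with 10 rows: 8 go to training, 2 to validation
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.arange(10)
X_tr, X_val, y_tr, y_val = holdout_split(X_demo, y_demo)
print(X_tr.shape, X_val.shape)
```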

SVR Estimator

from neurological_lrd_analysis import SVREstimator

# Create estimator
estimator = SVREstimator()

# Train model
result = estimator.train(X_train, y_train, validation_split=0.2)

# Make predictions
predictions = estimator.predict(X_test)

Gradient Boosting Estimator

from neurological_lrd_analysis import GradientBoostingEstimator

# Create estimator
estimator = GradientBoostingEstimator()

# Train model
result = estimator.train(X_train, y_train, validation_split=0.2)

# Make predictions
predictions = estimator.predict(X_test)

Hyperparameter Optimization

The library integrates with Optuna for automated hyperparameter optimization:

from neurological_lrd_analysis import (
    OptunaOptimizer, create_optuna_study, optimize_hyperparameters
)

# Create optimization study
study = create_optuna_study(
    model_type="random_forest",
    X_train=X_train,
    y_train=y_train,
    n_trials=100
)

# Run optimization
best_params = optimize_hyperparameters(
    model_type="random_forest",
    X_train=X_train,
    y_train=y_train,
    n_trials=100
)

print(f"Best parameters: {best_params}")

Pretrained Models

The pretrained model system provides efficient model management and fast inference:

Model Management

from neurological_lrd_analysis import PretrainedModelManager, TrainingConfig, MLBaselineType

# Create model manager
manager = PretrainedModelManager("models_directory")

# Create training data
X, y, training_info = manager.create_training_data(
    hurst_values=[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
    lengths=[500, 1000, 2000],
    generators=['fbm', 'fgn', 'arfima'],
    contaminations=['none', 'noise', 'missing'],
    biomedical_scenarios=['eeg', 'ecg', 'respiratory']
)

# Create training configurations
configs = [
    TrainingConfig(
        model_type=MLBaselineType.RANDOM_FOREST,
        hyperparameters={'n_estimators': 100, 'random_state': 42},
        description="Random Forest model"
    ),
    TrainingConfig(
        model_type=MLBaselineType.SVR,
        hyperparameters={'C': 1.0, 'gamma': 'scale'},
        description="SVR model"
    )
]

# Train models
results = manager.create_model_suite(configs, X, y, training_info)

# List trained models
models = manager.list_models()
for model in models:
    print(f"Model: {model.model_id}, Type: {model.model_type}")

Fast Inference

from neurological_lrd_analysis import (
    quick_predict, quick_ensemble_predict, PretrainedInference
)

# Single model prediction
hurst_rf = quick_predict(time_series, "models_directory", "random_forest")
hurst_svr = quick_predict(time_series, "models_directory", "svr")

# Ensemble prediction (best accuracy)
hurst_ensemble, uncertainty = quick_ensemble_predict(time_series, "models_directory")

# Batch prediction
inference = PretrainedInference("models_directory")
predictions = inference.predict_batch(time_series_list)

# Ensemble batch prediction
ensemble_predictions = inference.ensemble_predict_batch(time_series_list)

Benchmark Comparison

The library provides comprehensive benchmarking between classical and ML methods:

# Assumption: EstimatorType is exported from the package top level
from neurological_lrd_analysis import ClassicalMLBenchmark, EstimatorType

# Create benchmark system
benchmark = ClassicalMLBenchmark(
    pretrained_models_dir="pretrained_models",
    classical_estimators=[EstimatorType.DFA, EstimatorType.RS_ANALYSIS],
    ml_estimators=['random_forest', 'ensemble']
)

# Build test scenarios (see create_test_scenarios in the API reference)
test_scenarios = benchmark.create_test_scenarios(n_samples_per_config=10)

# Run comprehensive benchmark
results = benchmark.run_comprehensive_benchmark(
    test_scenarios=test_scenarios,
    save_results=True
)

# Access results
print("Performance Summary:")
for method_name, summary in results['summaries'].items():
    print(f"{method_name}:")
    print(f"  MAE: {summary.mean_absolute_error:.4f}")
    print(f"  RMSE: {summary.root_mean_squared_error:.4f}")
    print(f"  Correlation: {summary.correlation:.4f}")
    print(f"  Mean time: {summary.mean_computation_time*1000:.1f}ms")
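The summary fields printed above correspond to standard regression metrics. As a rough sketch, assuming the usual definitions (the library's exact computation is not shown in this document), they can be reproduced with NumPy. The helper name summarize_errors is hypothetical.

```python
import numpy as np

def summarize_errors(true_hurst, estimated_hurst):
    """Return (MAE, RMSE, Pearson correlation) for paired Hurst values."""
    true_h = np.asarray(true_hurst, dtype=float)
    est_h = np.asarray(estimated_hurst, dtype=float)
    errors = est_h - true_h
    mae = float(np.mean(np.abs(errors)))
    rmse = float(np.sqrt(np.mean(errors ** 2)))
    correlation = float(np.corrcoef(true_h, est_h)[0, 1])
    return mae, rmse, correlation

# Example with three made-up estimates
mae, rmse, corr = summarize_errors([0.3, 0.5, 0.7], [0.35, 0.5, 0.6])
print(f"MAE={mae:.4f} RMSE={rmse:.4f} r={corr:.4f}")
```

RMSE penalizes large individual errors more than MAE does, so comparing the two gives a hint about outliers in a method's estimates.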

Performance Results

In the library's benchmark, the ML ensemble had the lowest error of the methods compared and the second-fastest runtime:

Performance Rankings (MAE, Mean Absolute Error)

  1. Ensemble (ML): MAE 0.1518 (best overall)

  2. DFA (Classical): MAE 0.1983 (best classical)

  3. R/S Analysis (Classical): MAE 0.1993

  4. Periodogram (Classical): MAE 0.2038

  5. Higuchi (Classical): MAE 0.9906

Speed Rankings (computation time)

  1. Periodogram: 14.0ms (fastest)

  2. Ensemble (ML): 59.3ms

  3. R/S Analysis: 694.2ms

  4. Higuchi: 811.5ms

  5. DFA: 2044.5ms

Key Findings

  • ML Ensemble method achieved the best accuracy (MAE: 0.1518)

  • The ML ensemble (59.3ms) is faster than three of the four classical methods; only the periodogram (14.0ms) is faster

  • The ML ensemble is roughly 34x faster than DFA (59.3ms vs. 2044.5ms) while being more accurate

  • The ML estimates correlate strongly (0.9294) with the true Hurst values
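The ratios behind these findings can be checked directly from the timings and errors listed in the rankings. The short calculation below uses only the reported numbers.

```python
# Reported mean computation times in milliseconds
timings_ms = {
    "Periodogram": 14.0,
    "Ensemble (ML)": 59.3,
    "R/S Analysis": 694.2,
    "Higuchi": 811.5,
    "DFA": 2044.5,
}

ensemble_ms = timings_ms["Ensemble (ML)"]

# Speedup of the ensemble relative to each classical method (>1 means the ensemble is faster)
speedup = {
    name: ms / ensemble_ms
    for name, ms in timings_ms.items()
    if name != "Ensemble (ML)"
}

# Relative MAE reduction of the ensemble versus the best classical method (DFA)
mae_ensemble, mae_dfa = 0.1518, 0.1983
mae_reduction = (mae_dfa - mae_ensemble) / mae_dfa

for name, ratio in speedup.items():
    print(f"{name}: {ratio:.1f}x")
print(f"MAE reduction vs. DFA: {mae_reduction:.1%}")
```

A ratio below 1 (as for the periodogram) means that method is faster than the ensemble.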

API Reference

TimeSeriesFeatureExtractor

class neurological_lrd_analysis.TimeSeriesFeatureExtractor(include_spectral=True, include_wavelet=True, include_fractal=True, include_biomedical=True, sampling_rate=250.0)

Bases: object

Comprehensive feature extractor for time series data.

Extracts statistical, spectral, wavelet, fractal, and biomedical-specific features that are relevant for Hurst exponent estimation.

__init__(include_spectral=True, include_wavelet=True, include_fractal=True, include_biomedical=True, sampling_rate=250.0)

Initialize the feature extractor.

Parameters:

include_spectral : bool

Whether to include spectral features

include_wavelet : bool

Whether to include wavelet features

include_fractal : bool

Whether to include fractal features

include_biomedical : bool

Whether to include biomedical-specific features

sampling_rate : float

Sampling rate for biomedical feature extraction

extract_features(data, true_hurst=None)

Extract comprehensive features from time series data.

Return type:

FeatureSet

Parameters:

data : np.ndarray

Time series data

true_hurst : float, optional

True Hurst exponent (for validation)

Returns:

FeatureSet

Extracted features

RandomForestEstimator

class neurological_lrd_analysis.RandomForestEstimator(n_estimators=100, max_depth=None, min_samples_split=2, min_samples_leaf=1, random_state=42, **kwargs)

Bases: BaseMLEstimator

Random Forest estimator for Hurst exponent prediction.

__init__(n_estimators=100, max_depth=None, min_samples_split=2, min_samples_leaf=1, random_state=42, **kwargs)

Initialize Random Forest estimator.

Parameters:

n_estimators : int

Number of trees in the forest

max_depth : int, optional

Maximum depth of trees

min_samples_split : int

Minimum samples to split a node

min_samples_leaf : int

Minimum samples in a leaf

random_state : int

Random state for reproducibility

**kwargs

Additional parameters for RandomForestRegressor

SVREstimator

class neurological_lrd_analysis.SVREstimator(kernel='rbf', C=1.0, gamma='scale', epsilon=0.1, **kwargs)

Bases: BaseMLEstimator

Support Vector Regression estimator for Hurst exponent prediction.

__init__(kernel='rbf', C=1.0, gamma='scale', epsilon=0.1, **kwargs)

Initialize SVR estimator.

Parameters:

kernel : str

Kernel type ('rbf', 'linear', 'poly', 'sigmoid')

C : float

Regularization parameter

gamma : str or float

Kernel coefficient

epsilon : float

Epsilon-tube parameter

**kwargs

Additional parameters for SVR

GradientBoostingEstimator

class neurological_lrd_analysis.GradientBoostingEstimator(n_estimators=100, learning_rate=0.1, max_depth=3, min_samples_split=2, min_samples_leaf=1, random_state=42, **kwargs)

Bases: BaseMLEstimator

Gradient Boosting estimator for Hurst exponent prediction.

__init__(n_estimators=100, learning_rate=0.1, max_depth=3, min_samples_split=2, min_samples_leaf=1, random_state=42, **kwargs)

Initialize Gradient Boosting estimator.

Parameters:

n_estimators : int

Number of boosting stages

learning_rate : float

Learning rate

max_depth : int

Maximum depth of trees

min_samples_split : int

Minimum samples to split a node

min_samples_leaf : int

Minimum samples in a leaf

random_state : int

Random state for reproducibility

**kwargs

Additional parameters for GradientBoostingRegressor

PretrainedModelManager

class neurological_lrd_analysis.PretrainedModelManager(models_dir='pretrained_models')

Bases: object

Manager for pretrained ML models.

Handles creation, storage, loading, and management of pretrained models for Hurst exponent estimation.

__init__(models_dir='pretrained_models')

Initialize the pretrained model manager.

Parameters:

models_dir : str or Path

Directory to store pretrained models

create_training_data(hurst_values=None, lengths=None, n_samples_per_config=100, generators=None, contaminations=None, biomedical_scenarios=None, random_state=42)

Create comprehensive training dataset.

Return type:

Tuple[ndarray, ndarray, Dict[str, Any]]

Parameters:

hurst_values : List[float], optional

Hurst values to generate

lengths : List[int], optional

Time series lengths

n_samples_per_config : int

Number of samples per configuration

generators : List[str], optional

Data generators to use

contaminations : List[str], optional

Contamination types

biomedical_scenarios : List[str], optional

Biomedical scenarios

random_state : int

Random state for reproducibility

Returns:

Tuple[np.ndarray, np.ndarray, Dict[str, Any]]

(X, y, training_info): features, targets, and metadata
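The option lists passed to create_training_data determine how large the training set becomes. If the options are crossed as a full Cartesian grid (an assumption; the actual sampling scheme is not documented here, and biomedical scenarios are left out of this sketch because their role in the grid is unclear), the size can be estimated with itertools.product using the values from the Model Management example.

```python
from itertools import product

# Option lists taken from the Model Management example
hurst_values = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
lengths = [500, 1000, 2000]
generators = ["fbm", "fgn", "arfima"]
contaminations = ["none", "noise", "missing"]
n_samples_per_config = 100  # documented default

# Assumed full grid: every combination of the four option lists
configurations = list(product(hurst_values, lengths, generators, contaminations))
total_series = len(configurations) * n_samples_per_config

print(f"{len(configurations)} configurations, {total_series} series in total")
```

Under this assumption the example produces 243 configurations, so trimming any one list is an effective way to shorten training-data generation.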

train_model(training_config, X, y, training_info)

Train a model and save it as pretrained.

Return type:

ModelMetadata

Parameters:

training_config : TrainingConfig

Training configuration

X : np.ndarray

Training features

y : np.ndarray

Training targets

training_info : Dict[str, Any]

Training dataset information

Returns:

ModelMetadata

Metadata for the trained model

load_model(model_id)

Load a pretrained model.

Return type:

Tuple[BaseMLEstimator, ModelMetadata]

Parameters:

model_id : str

ID of the model to load

Returns:

Tuple[BaseMLEstimator, ModelMetadata]

Loaded model and its metadata

list_models(model_type=None, status=None, tags=None)

List available models with optional filtering.

Return type:

List[ModelMetadata]

Parameters:

model_type : MLBaselineType, optional

Filter by model type

status : ModelStatus, optional

Filter by status

tags : List[str], optional

Filter by tags

Returns:

List[ModelMetadata]

List of matching models

get_best_model(model_type, metric='validation_score')

Get the best performing model of a given type.

Return type:

Tuple[BaseMLEstimator, ModelMetadata]

Parameters:

model_type : MLBaselineType

Type of model to get

metric : str

Metric to use for ranking

Returns:

Tuple[BaseMLEstimator, ModelMetadata]

Best model and its metadata

predict(model_id, data, return_metadata=False)

Make prediction using a pretrained model.

Return type:

Union[float, Tuple[float, ModelMetadata]]

Parameters:

model_id : str

ID of the model to use

data : np.ndarray

Time series data

return_metadata : bool

Whether to return model metadata

Returns:

Union[float, Tuple[float, ModelMetadata]]

Prediction result and optionally metadata

create_model_suite(training_configs, X, y, training_info)

Create a suite of pretrained models.

Return type:

List[ModelMetadata]

Parameters:

training_configs : List[TrainingConfig]

List of training configurations

X : np.ndarray

Training features

y : np.ndarray

Training targets

training_info : Dict[str, Any]

Training dataset information

Returns:

List[ModelMetadata]

Metadata for all trained models

cleanup_models(keep_best=True, max_models_per_type=5)

Clean up old or redundant models.

Return type:

None

Parameters:

keep_best : bool

Whether to keep the best performing model of each type

max_models_per_type : int

Maximum number of models to keep per type
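The interaction between keep_best and max_models_per_type is not spelled out in this reference. One plausible reading, sketched below purely as an assumption, is that the newest models of each type are kept up to the cap and the best-scoring model is retained even if it is older. ModelRecord and select_models_to_keep are hypothetical names, not part of the library.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class ModelRecord:
    """Hypothetical stand-in for the library's ModelMetadata."""
    model_id: str
    model_type: str
    validation_score: float
    created_at: int  # larger means newer

def select_models_to_keep(records, keep_best=True, max_models_per_type=5):
    """Return the set of model IDs that would survive a cleanup (sketch)."""
    by_type = defaultdict(list)
    for record in records:
        by_type[record.model_type].append(record)

    keep = set()
    for group in by_type.values():
        # Newest models first, truncated to the per-type cap
        kept = sorted(group, key=lambda r: r.created_at, reverse=True)[:max_models_per_type]
        if keep_best:
            best = max(group, key=lambda r: r.validation_score)
            if best not in kept:
                # Swap out the oldest kept model so the cap still holds
                kept = kept[:-1] + [best] if kept else [best]
        keep.update(r.model_id for r in kept)
    return keep

records = [
    ModelRecord("a", "random_forest", 0.9, 1),
    ModelRecord("b", "random_forest", 0.5, 2),
    ModelRecord("c", "random_forest", 0.6, 3),
]
print(select_models_to_keep(records, keep_best=True, max_models_per_type=2))
```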

PretrainedInference

class neurological_lrd_analysis.PretrainedInference(models_dir='pretrained_models')

Bases: object

High-level interface for pretrained model inference.

Provides easy-to-use methods for Hurst exponent estimation using pretrained ML models with support for single predictions, batch processing, and ensemble methods.

__init__(models_dir='pretrained_models')

Initialize the inference system.

Parameters:

models_dir : str or Path

Directory containing pretrained models

predict_single(data, model_id=None, model_type=None, use_best=True)

Predict Hurst exponent for a single time series.

Return type:

PredictionResult

Parameters:

data : np.ndarray

Time series data

model_id : str, optional

Specific model ID to use

model_type : MLBaselineType, optional

Type of model to use (will select best if multiple available)

use_best : bool

Whether to use the best performing model if model_id not specified

Returns:

PredictionResult

Prediction result with metadata

predict_batch(data_list, model_id=None, model_type=None, use_best=True, show_progress=True)

Predict Hurst exponents for multiple time series.

Return type:

List[PredictionResult]

Parameters:

data_list : List[np.ndarray]

List of time series data

model_id : str, optional

Specific model ID to use

model_type : MLBaselineType, optional

Type of model to use

use_best : bool

Whether to use the best performing model

show_progress : bool

Whether to show progress during batch processing

Returns:

List[PredictionResult]

List of prediction results

predict_ensemble(data, model_types=None, weights=None, include_uncertainty=True)

Predict using ensemble of models.

Return type:

EnsembleResult

Parameters:

data : np.ndarray

Time series data

model_types : List[MLBaselineType], optional

Types of models to include in ensemble

weights : Dict[str, float], optional

Weights for each model type

include_uncertainty : bool

Whether to include uncertainty quantification

Returns:

EnsembleResult

Ensemble prediction result
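The weights and include_uncertainty parameters suggest that per-model estimates are combined into a weighted mean with a spread measure. The exact formula is not given in this reference; the following minimal sketch assumes a weighted mean and a weighted standard deviation, with combine_predictions as a hypothetical helper name.

```python
import math

def combine_predictions(predictions, weights=None):
    """Combine per-model Hurst estimates into (mean, spread) (sketch).

    predictions: dict mapping model name to its estimate
    weights: optional dict mapping model name to a non-negative weight;
             models missing from it default to weight 1.0
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    weights = weights or {}
    w = {name: float(weights.get(name, 1.0)) for name in predictions}
    total = sum(w.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    mean = sum(w[name] * value for name, value in predictions.items()) / total
    variance = sum(w[name] * (value - mean) ** 2 for name, value in predictions.items()) / total
    return mean, math.sqrt(variance)

estimates = {"random_forest": 0.6, "svr": 0.7, "gradient_boosting": 0.8}
print(combine_predictions(estimates))
print(combine_predictions(estimates, weights={"random_forest": 2.0}))
```

A large spread relative to the mean signals that the individual models disagree, which is a reasonable cue to treat the estimate with caution.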

compare_models(data, model_types=None)

Compare predictions from different model types.

Return type:

Dict[str, PredictionResult]

Parameters:

data : np.ndarray

Time series data

model_types : List[MLBaselineType], optional

Types of models to compare

Returns:

Dict[str, PredictionResult]

Predictions from each model type

get_model_info(model_id=None)

Get information about available models.

Return type:

Union[ModelMetadata, List[ModelMetadata]]

Parameters:

model_id : str, optional

Specific model ID, or None for all models

Returns:

Union[ModelMetadata, List[ModelMetadata]]

Model metadata

benchmark_models(test_data, true_hurst, model_types=None)

Benchmark model performance on test data.

Return type:

Dict[str, Dict[str, float]]

Parameters:

test_data : List[np.ndarray]

Test time series data

true_hurst : List[float]

True Hurst exponents

model_types : List[MLBaselineType], optional

Types of models to benchmark

Returns:

Dict[str, Dict[str, float]]

Performance metrics for each model type

ClassicalMLBenchmark

class neurological_lrd_analysis.ClassicalMLBenchmark(pretrained_models_dir='pretrained_models', classical_estimators=None, ml_estimators=None)

Bases: object

Comprehensive benchmark comparing classical and ML methods.

Provides systematic comparison of classical Hurst estimation methods with machine learning baseline models across various test scenarios.

__init__(pretrained_models_dir='pretrained_models', classical_estimators=None, ml_estimators=None)

Initialize the benchmark system.

Parameters:

pretrained_models_dir : str or Path

Directory containing pretrained ML models

classical_estimators : List[EstimatorType], optional

Classical estimators to include

ml_estimators : List[str], optional

ML model types to include

create_test_scenarios(hurst_values=None, lengths=None, n_samples_per_config=10, include_contamination=True, include_biomedical=True)

Create comprehensive test scenarios.

Return type:

List[TimeSeriesSample]

Parameters:

hurst_values : List[float], optional

Hurst values to test

lengths : List[int], optional

Time series lengths

n_samples_per_config : int

Number of samples per configuration

include_contamination : bool

Whether to include contaminated data

include_biomedical : bool

Whether to include biomedical scenarios

Returns:

List[TimeSeriesSample]

Test scenarios

benchmark_classical_methods(samples)

Benchmark classical Hurst estimation methods.

Return type:

Dict[str, List[BenchmarkResult]]

Parameters:

samples : List[TimeSeriesSample]

Test scenarios

Returns:

Dict[str, List[BenchmarkResult]]

Results for each classical method

benchmark_ml_methods(samples)

Benchmark machine learning methods.

Return type:

Dict[str, List[BenchmarkResult]]

Parameters:

samples : List[TimeSeriesSample]

Test scenarios

Returns:

Dict[str, List[BenchmarkResult]]

Results for each ML method

run_comprehensive_benchmark(test_scenarios=None, save_results=True, results_dir='benchmark_results')

Run comprehensive benchmark comparison.

Return type:

Dict[str, Any]

Parameters:

test_scenarios : List[TimeSeriesSample], optional

Test scenarios to use

save_results : bool

Whether to save results to disk

results_dir : str or Path

Directory to save results

Returns:

Dict[str, Any]

Complete benchmark results

calculate_summaries(results)

Calculate performance summaries for all methods.

Return type:

Dict[str, BenchmarkSummary]

print_benchmark_summary(summaries)

Print benchmark summary.

Return type:

None

save_benchmark_results(benchmark_data, results_dir)

Save benchmark results to disk.

Return type:

None

create_visualizations(benchmark_data, save_path=None)

Create comprehensive visualizations of benchmark results.

Return type:

None

Functions

neurological_lrd_analysis.create_pretrained_suite(models_dir='pretrained_models', force_retrain=False)

Create a complete suite of pretrained models.

Return type:

PretrainedModelManager

Parameters:

models_dir : str or Path

Directory to store models

force_retrain : bool

Whether to retrain existing models

Returns:

PretrainedModelManager

Manager with trained models

neurological_lrd_analysis.quick_predict(data, models_dir='pretrained_models', model_type=None)

Quick prediction function for single time series.

Return type:

float

Parameters:

data : np.ndarray

Time series data

models_dir : str or Path

Directory containing pretrained models

model_type : MLBaselineType, optional

Type of model to use

Returns:

float

Predicted Hurst exponent

neurological_lrd_analysis.quick_ensemble_predict(data, models_dir='pretrained_models', model_types=None)

Quick ensemble prediction function.

Return type:

Tuple[float, float]

Parameters:

data : np.ndarray

Time series data

models_dir : str or Path

Directory containing pretrained models

model_types : List[MLBaselineType], optional

Types of models to include in ensemble

Returns:

Tuple[float, float]

(mean_estimate, std_estimate)

neurological_lrd_analysis.run_comprehensive_benchmark(pretrained_models_dir='pretrained_models', results_dir='benchmark_results', test_scenarios=None)

Run comprehensive benchmark comparison.

Return type:

Dict[str, Any]

Parameters:

pretrained_models_dir : str or Path

Directory containing pretrained models

results_dir : str or Path

Directory to save results

test_scenarios : List[TimeSeriesSample], optional

Test scenarios to use

Returns:

Dict[str, Any]

Complete benchmark results