Neurological LRD Analysis - Tutorial Guide
Table of Contents
Installation and Setup
Prerequisites
Python 3.11 or higher
Basic understanding of time series analysis
Familiarity with NumPy and pandas
Quick Setup
# Clone the repository
git clone <repository-url>
cd neurological_lrd_analysis
# Set up virtual environment
./setup_venv.sh
# Activate environment
source neurological_env/bin/activate
# Verify installation
python -c "from neurological_lrd_analysis import BiomedicalHurstEstimatorFactory; print('Installation successful!')"
Manual Setup
# Create virtual environment
python3 -m venv neurological_env
source neurological_env/bin/activate
# Install dependencies
pip install -e .
# Install optional dependencies
pip install jax jaxlib numba pywavelets scikit-learn matplotlib seaborn
Basic Usage
Your First Hurst Estimation
import numpy as np
from neurological_lrd_analysis import BiomedicalHurstEstimatorFactory, EstimatorType
# Generate sample data (fractional Brownian motion)
np.random.seed(42)
n = 1000
hurst_true = 0.7
data = np.cumsum(np.random.randn(n) * 0.1) # Simplified fBm
# Create factory instance
factory = BiomedicalHurstEstimatorFactory()
# Estimate Hurst exponent using DFA
result = factory.estimate(data, EstimatorType.DFA)
# Display results
print(f"Estimated Hurst exponent: {result.hurst_estimate:.3f}")
print(f"Confidence interval: [{result.confidence_interval[0]:.3f}, {result.confidence_interval[1]:.3f}]")
print(f"Data quality score: {result.data_quality_score:.3f}")
print(f"Computation time: {result.computation_time:.3f} seconds")
Comparing Multiple Methods
# Compare all available methods
group_result = factory.estimate(data, EstimatorType.ALL)
print(f"Ensemble estimate: {group_result.ensemble_estimate:.3f}")
print(f"Best method: {group_result.best_method}")
print(f"Method agreement: {group_result.method_agreement:.3f}")
# Display individual results
for result in group_result.individual_results:
print(f"{result.estimator_name:15s}: {result.hurst_estimate:.3f} ± {result.standard_error:.3f}")
Understanding Hurst Exponents
What is the Hurst Exponent?
The Hurst exponent (H) is a measure of long-range dependence in time series data:
H = 0.5: Random walk (no long-range dependence)
H > 0.5: Persistent behavior (trends tend to continue)
H < 0.5: Anti-persistent behavior (trends tend to reverse)
Interpretation in Biomedical Context
EEG signals: H ≈ 0.5-0.8 (brain activity shows persistence)
Heart rate variability: H ≈ 0.5-0.7 (healthy heart shows some persistence)
Blood pressure: H ≈ 0.5-0.6 (regulatory mechanisms)
Respiratory signals: H ≈ 0.5-0.8 (breathing patterns)
Example: Different Hurst Values
import matplotlib.pyplot as plt
# Generate data with different Hurst values
hurst_values = [0.3, 0.5, 0.7]
fig, axes = plt.subplots(3, 1, figsize=(12, 8))
for i, h in enumerate(hurst_values):
# Generate fBm-like data
np.random.seed(42)
if h == 0.5:
data = np.cumsum(np.random.randn(500))
else:
# Simplified fBm generation
data = np.cumsum(np.random.randn(500) * (h - 0.5 + 0.1))
axes[i].plot(data)
axes[i].set_title(f'Hurst ≈ {h} - {"Anti-persistent" if h < 0.5 else "Random" if h == 0.5 else "Persistent"}')
axes[i].set_ylabel('Value')
plt.xlabel('Time')
plt.tight_layout()
plt.show()
Choosing the Right Estimator
Estimator Categories
1. Temporal Methods
DFA: Robust, works well with trends
R/S Analysis: Classic method, sensitive to trends
Higuchi: Good for short time series
GHE: Generalized approach, flexible
2. Spectral Methods
Periodogram: Fast, good for stationary data
GPH: Robust to trends, good for long series
Whittle MLE: Maximum likelihood, statistically optimal
3. Wavelet Methods
DWT: Good time-frequency localization
NDWT: Better for non-stationary data
Abry-Veitch: Robust wavelet method
4. Multifractal Methods
MFDFA: Captures multifractal properties
MF-DMA: Alternative to MFDFA
Selection Guide
def choose_estimator(data_length, data_type, has_trends=False):
"""Guide for choosing the right estimator."""
if data_length < 100:
return EstimatorType.HIGUCHI
elif data_length < 500:
if has_trends:
return EstimatorType.DFA
else:
return EstimatorType.PERIODOGRAM
else:
if has_trends:
return EstimatorType.DFA
elif data_type == 'stationary':
return EstimatorType.WHITTLE_MLE
else:
return EstimatorType.ABRY_VEITCH
# Example usage
data_length = len(your_data)
estimator = choose_estimator(data_length, 'non-stationary', has_trends=True)
result = factory.estimate(your_data, estimator)
Method Comparison Example
# Compare methods for your specific data
methods_to_compare = [
EstimatorType.DFA,
EstimatorType.PERIODOGRAM,
EstimatorType.WHITTLE_MLE,
EstimatorType.ABRY_VEITCH
]
results = {}
for method in methods_to_compare:
try:
result = factory.estimate(data, method)
results[method.value] = {
'hurst': result.hurst_estimate,
'ci': result.confidence_interval,
'r_squared': result.regression_r_squared,
'time': result.computation_time
}
except Exception as e:
print(f"Method {method.value} failed: {e}")
# Display comparison
print("Method Comparison:")
print("-" * 50)
for method, metrics in results.items():
print(f"{method:15s}: H={metrics['hurst']:.3f}, R²={metrics['r_squared']:.3f}, Time={metrics['time']:.3f}s")
Data Preprocessing
Quality Assessment
from neurological_lrd_analysis import BiomedicalDataProcessor
processor = BiomedicalDataProcessor()
# Assess data quality
quality_metrics = processor.assess_data_quality(data)
print("Data Quality Assessment:")
print(f"Overall score: {quality_metrics['data_quality_score']:.3f}")
print(f"Missing data: {quality_metrics['missing_data_fraction']:.1%}")
print(f"Outliers: {quality_metrics['outlier_fraction']:.1%}")
print(f"SNR: {quality_metrics.get('signal_to_noise_ratio', 'N/A')} dB")
print(f"Stationarity p-value: {quality_metrics.get('stationarity_p_value', 'N/A')}")
Preprocessing Options
# Basic preprocessing
processed_data, log = processor.preprocess_biomedical_data(
data,
handle_missing='interpolate', # or 'remove', 'forward_fill'
remove_outliers=True,
detrend=True,
filter_artifacts=True
)
print("Preprocessing log:")
for key, value in log.items():
print(f" {key}: {value}")
# Advanced preprocessing with convergence trimming
processed_data, log = processor.preprocess_biomedical_data(
data,
trim_convergence=True,
stability_threshold=0.05,
min_stable_fraction=0.1
)
Handling Missing Data
# Example with missing data
data_with_missing = data.copy()
data_with_missing[100:110] = np.nan # Add missing values
# Different handling strategies
strategies = ['interpolate', 'remove', 'forward_fill']
results = {}
for strategy in strategies:
processed, _ = processor.preprocess_biomedical_data(
data_with_missing,
handle_missing=strategy
)
result = factory.estimate(processed, EstimatorType.DFA)
results[strategy] = result.hurst_estimate
print("Missing data handling comparison:")
for strategy, hurst in results.items():
print(f" {strategy}: H = {hurst:.3f}")
Advanced Features
Custom Confidence Intervals
from neurological_lrd_analysis import ConfidenceMethod
# Bootstrap confidence intervals
result = factory.estimate(
data,
EstimatorType.DFA,
confidence_method=ConfidenceMethod.BOOTSTRAP,
confidence_level=0.99,
n_bootstrap=1000
)
# Theoretical confidence intervals
result = factory.estimate(
data,
EstimatorType.DFA,
confidence_method=ConfidenceMethod.THEORETICAL,
confidence_level=0.95
)
Method-Specific Parameters
# DFA with custom parameters
result = factory.estimate(
data,
EstimatorType.DFA,
min_window=20,
max_window=200,
polynomial_order=2,
overlap=0.3
)
# Wavelet methods with custom parameters
result = factory.estimate(
data,
EstimatorType.DWT,
wavelet='db8',
max_level=6
)
# Multifractal methods
result = factory.estimate(
data,
EstimatorType.MFDFA,
q=2.0,
min_window=50,
max_window=500
)
Batch Processing
# Process multiple time series
time_series_list = [data1, data2, data3]
results = []
for i, ts in enumerate(time_series_list):
result = factory.estimate(ts, EstimatorType.DFA)
results.append({
'series_id': i,
'hurst': result.hurst_estimate,
'ci': result.confidence_interval,
'quality': result.data_quality_score
})
# Convert to DataFrame for analysis
import pandas as pd
df = pd.DataFrame(results)
print(df)
Performance Optimization
Lazy Loading
The library uses lazy loading for heavy dependencies:
# Heavy modules are loaded only when needed
# This reduces startup time and memory usage
# First use of wavelet methods loads PyWavelets
result = factory.estimate(data, EstimatorType.DWT)
# First use of optimization loads scipy.optimize
result = factory.estimate(data, EstimatorType.WHITTLE_MLE)
Memory Management
# For large datasets, consider chunking
def process_large_dataset(data, chunk_size=10000):
results = []
for i in range(0, len(data), chunk_size):
chunk = data[i:i+chunk_size]
result = factory.estimate(chunk, EstimatorType.DFA)
results.append(result.hurst_estimate)
return results
# Process in chunks
large_data = np.random.randn(100000)
chunk_results = process_large_dataset(large_data)
Parallel Processing
from concurrent.futures import ThreadPoolExecutor
import multiprocessing as mp
def estimate_hurst_parallel(data, method):
return factory.estimate(data, method)
# Parallel estimation of multiple methods
methods = [EstimatorType.DFA, EstimatorType.PERIODOGRAM, EstimatorType.WHITTLE_MLE]
with ThreadPoolExecutor(max_workers=mp.cpu_count()) as executor:
futures = [executor.submit(estimate_hurst_parallel, data, method) for method in methods]
results = [future.result() for future in futures]
for result in results:
print(f"{result.estimator_name}: {result.hurst_estimate:.3f}")
Troubleshooting
Common Issues
1. “Data too short” Error
# Minimum data length requirements
min_lengths = {
'DFA': 50,
'Higuchi': 20,
'Periodogram': 100,
'R/S': 50,
'GPH': 100,
'Whittle MLE': 100,
'GHE': 50,
'DWT': 100,
'Abry-Veitch': 100,
'NDWT': 100,
'MFDFA': 50,
'MF-DMA': 50
}
# Check data length
if len(data) < min_lengths['DFA']:
print(f"Data too short for DFA. Minimum: {min_lengths['DFA']}, Got: {len(data)}")
2. “Insufficient valid windows” Error
# Adjust window parameters for short data
if len(data) < 500:
result = factory.estimate(
data,
EstimatorType.DFA,
min_window=10,
max_window=len(data)//4
)
3. “Too many missing values” Error
# Check missing data fraction
missing_fraction = np.sum(np.isnan(data)) / len(data)
if missing_fraction > 0.2:
print(f"Too many missing values: {missing_fraction:.1%}")
# Consider different handling strategy
processed, _ = processor.preprocess_biomedical_data(
data,
handle_missing='remove' # or 'forward_fill'
)
4. Convergence Issues
# Check convergence flags
result = factory.estimate(data, EstimatorType.DFA)
if not result.convergence_flag:
print("Method did not converge. Try:")
print("1. Different window parameters")
print("2. Data preprocessing")
print("3. Alternative method")
# Try with different parameters
result = factory.estimate(
data,
EstimatorType.DFA,
min_window=20,
max_window=100,
polynomial_order=1
)
Debugging Tips
# Enable verbose logging
import logging
logging.basicConfig(level=logging.DEBUG)
# Check data quality
quality = processor.assess_data_quality(data)
if quality['data_quality_score'] < 0.5:
print("Poor data quality detected")
print(f"Missing: {quality['missing_data_fraction']:.1%}")
print(f"Outliers: {quality['outlier_fraction']:.1%}")
# Test with synthetic data
synthetic_data = np.cumsum(np.random.randn(1000))
result = factory.estimate(synthetic_data, EstimatorType.DFA)
print(f"Synthetic test: H = {result.hurst_estimate:.3f}")
Best Practices
1. Data Preparation
# Always assess data quality first
quality = processor.assess_data_quality(data)
if quality['data_quality_score'] < 0.7:
print("Warning: Poor data quality")
# Preprocess data appropriately
processed_data, log = processor.preprocess_biomedical_data(
data,
handle_missing='interpolate',
remove_outliers=True,
detrend=True,
filter_artifacts=True
)
2. Method Selection
# Use multiple methods for robust estimation
methods = [EstimatorType.DFA, EstimatorType.PERIODOGRAM, EstimatorType.WHITTLE_MLE]
results = []
for method in methods:
try:
result = factory.estimate(processed_data, method)
if result.convergence_flag and result.regression_r_squared > 0.7:
results.append(result)
except Exception as e:
print(f"Method {method.value} failed: {e}")
# Use ensemble estimate
if results:
ensemble_hurst = np.mean([r.hurst_estimate for r in results])
print(f"Ensemble estimate: {ensemble_hurst:.3f}")
3. Validation
# Validate results
def validate_hurst_estimate(result):
"""Validate Hurst estimate results."""
checks = []
# Check convergence
checks.append(("Convergence", result.convergence_flag))
# Check R-squared
if result.regression_r_squared:
checks.append(("R-squared > 0.7", result.regression_r_squared > 0.7))
# Check confidence interval width
ci_width = result.confidence_interval[1] - result.confidence_interval[0]
checks.append(("CI width < 0.3", ci_width < 0.3))
# Check data quality
checks.append(("Quality > 0.5", result.data_quality_score > 0.5))
return checks
# Validate result
checks = validate_hurst_estimate(result)
for check_name, passed in checks:
status = "✓" if passed else "✗"
print(f"{status} {check_name}")
4. Reporting
def generate_report(data, result):
"""Generate a comprehensive report."""
report = f"""
Hurst Exponent Analysis Report
=============================
Data Information:
- Length: {len(data)}
- Mean: {np.mean(data):.3f}
- Std: {np.std(data):.3f}
- Quality Score: {result.data_quality_score:.3f}
Estimation Results:
- Method: {result.estimator_name}
- Hurst Exponent: {result.hurst_estimate:.3f}
- Confidence Interval: [{result.confidence_interval[0]:.3f}, {result.confidence_interval[1]:.3f}]
- Standard Error: {result.standard_error:.3f}
- Computation Time: {result.computation_time:.3f}s
Quality Metrics:
- Convergence: {'Yes' if result.convergence_flag else 'No'}
- R-squared: {result.regression_r_squared:.3f if result.regression_r_squared else 'N/A'}
- Missing Data: {result.missing_data_fraction:.1%}
- Outliers: {result.outlier_fraction:.1%}
Interpretation:
- H = {result.hurst_estimate:.3f} indicates {'persistent' if result.hurst_estimate > 0.5 else 'anti-persistent' if result.hurst_estimate < 0.5 else 'random'} behavior
- {'High' if result.data_quality_score > 0.8 else 'Moderate' if result.data_quality_score > 0.6 else 'Low'} data quality
"""
return report
# Generate and print report
report = generate_report(data, result)
print(report)
5. Reproducibility
# Set random seeds for reproducibility
np.random.seed(42)
# Use consistent parameters
default_params = {
'confidence_method': ConfidenceMethod.BOOTSTRAP,
'confidence_level': 0.95,
'n_bootstrap': 100,
'preprocess': True,
'assess_quality': True
}
# Save results with metadata
import json
from datetime import datetime
result_data = {
'timestamp': datetime.now().isoformat(),
'data_length': len(data),
'method': result.estimator_name,
'hurst_estimate': result.hurst_estimate,
'confidence_interval': result.confidence_interval,
'parameters': default_params,
'quality_score': result.data_quality_score
}
# Save to file
with open('hurst_analysis_results.json', 'w') as f:
json.dump(result_data, f, indent=2)
## Machine Learning Estimators
The library includes advanced machine learning estimators that often outperform classical methods on short or contaminated biomedical signals.
### 🎯 Fast ML Prediction
Using pre-trained models for instantaneous Hurst estimation:
```python
from neurological_lrd_analysis.ml_baselines import quick_predict
# Estimate using a pre-trained Random Forest model
# Note: 'pretrained_models' is the directory where model weights are stored
hurst_rf = quick_predict(
data=your_data,
model_dir="pretrained_models",
method="random_forest"
)
print(f"ML Estimate (RF): {hurst_rf:.3f}")
🤝 Ensemble Prediction (Recommended)
For the highest accuracy and automated uncertainty quantification, use the ensemble inference:
from neurological_lrd_analysis.ml_baselines import quick_ensemble_predict
# Get ensemble estimate and uncertainty
hurst_ensemble, uncertainty = quick_ensemble_predict(
data=your_data,
model_dir="pretrained_models"
)
print(f"Ensemble Estimate: {hurst_ensemble:.3f}")
print(f"Model Uncertainty: {uncertainty:.3f}")
🧠 Training & Optimization
If you have custom data, you can optimize hyperparameters and train your own models:
from neurological_lrd_analysis.ml_baselines import optimize_hyperparameters, MLBaselineType
# Optimize Random Forest for your specific data
best_params = optimize_hyperparameters(
method_type=MLBaselineType.RANDOM_FOREST,
data=X_train,
labels=y_train,
n_trials=50
)
📊 Comparing Classical and ML
You can use the specialized benchmark class to see which approach works best for your data:
from neurological_lrd_analysis.ml_baselines import ClassicalMLBenchmark
benchmark = ClassicalMLBenchmark(model_dir="pretrained_models")
comparison = benchmark.run_comprehensive_benchmark(n_samples=20)
print(comparison['leaderboard'])
This tutorial provides a comprehensive guide to using the Neurological LRD Analysis library. For more advanced topics and specific use cases, refer to the API Reference and the project documentation.