Pilot Research on Machine Learning-driven Mainshock Prediction for Earthquake Precaution
Exploring earthquake patterns with data-driven methods for safer communities
Motivation
In recent months, Bangladesh has experienced a surge in seismic activity, culminating in a magnitude 5.7 earthquake near Narsingdi and Ghorashal that caused multiple fatalities, widespread building damage, and public panic. For a developing country like Bangladesh, where infrastructure is comparatively weak and urban density is extremely high, knowing whether an initial quake is a mainshock or a foreshock is critically important for emergency response and public safety. However, traditional seismological practice can usually assign these labels only in hindsight, once the sequence has unfolded, which leaves communities without timely guidance.
This motivated us to explore data-driven approaches that leverage decades of historical earthquake data to predict whether an event is likely to be a mainshock and to provide actionable insights for risk mitigation.
Dataset Preparation
We used the global earthquake dataset spanning 1930–2018, sourced from Kaggle (“Earthquakes for ML Prediction” by Gustavo Martins) with data from the United States Geological Survey (USGS). The dataset contains over 797,000 events with latitude, longitude, depth, magnitude, and station measurements.
Our preprocessing pipeline included:
- Converting all magnitudes to moment magnitude (Mw) for consistency.
- Iterative imputation of missing numeric values to retain the full dataset.
- Feature engineering from timestamps: year, month, day, day of week, hour, minute, second, and elapsed time since the previous quake.
- Algorithmic labeling of each event in a seismic sequence as a foreshock, mainshock, or aftershock, based on 30-day temporal and 50 km spatial windows.
- Retention of only the mainshock label as the modeling target (mainshock vs. non-mainshock), discarding the separate foreshock and aftershock labels for a focused prediction task.
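The magnitude-conversion step can be illustrated with a minimal sketch. Conversions to Mw are empirical piecewise-linear regressions that depend on the original magnitude type and are often region-specific; the study does not state which relations it used, so the CONVERSIONS table and all of its coefficients below are hypothetical placeholders, not the values behind the reported results.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical table: magnitude type -> list of (lower, upper, slope, intercept),
# applied as Mw = slope * mag + intercept when lower <= mag < upper.
# All coefficients are illustrative placeholders, NOT the study's values.
CONVERSIONS: Dict[str, List[Tuple[float, float, float, float]]] = {
    "mw": [(float("-inf"), float("inf"), 1.0, 0.0)],  # already moment magnitude
    "mb": [(3.5, 6.3, 0.85, 1.03)],
    "ms": [(3.0, 6.2, 0.67, 2.07), (6.2, 8.3, 0.99, 0.08)],
}


def to_mw(mag: float, mag_type: str) -> Optional[float]:
    """Convert a magnitude to Mw; return None if the type or range is unsupported."""
    for lower, upper, slope, intercept in CONVERSIONS.get(mag_type.lower(), []):
        if lower <= mag < upper:
            return slope * mag + intercept
    return None
```

Returning None for unsupported types or ranges leaves those values missing, so they could be handled by the imputation step rather than silently mis-converted.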
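The timestamp features listed above can be derived with Python's standard datetime module. This sketch assumes events are sorted by time and that "elapsed time" means seconds since the immediately preceding event in the whole catalog (it could instead be measured against the previous nearby event); the helper names and the elapsed_s key are hypothetical.

```python
from datetime import datetime
from typing import Dict, List, Optional


def time_features(ts: datetime, prev_ts: Optional[datetime]) -> Dict[str, Optional[float]]:
    """Calendar features plus seconds elapsed since the previous event (None for the first)."""
    return {
        "year": ts.year,
        "month": ts.month,
        "day": ts.day,
        "day_of_week": ts.weekday(),  # Monday = 0 ... Sunday = 6
        "hour": ts.hour,
        "minute": ts.minute,
        "second": ts.second,
        "elapsed_s": (ts - prev_ts).total_seconds() if prev_ts is not None else None,
    }


def add_time_features(timestamps: List[datetime]) -> List[Dict[str, Optional[float]]]:
    """Sort events by time, then compute features for each one in catalog order."""
    ordered = sorted(timestamps)
    return [
        time_features(ts, ordered[i - 1] if i > 0 else None)
        for i, ts in enumerate(ordered)
    ]
```

The first event has no predecessor, so its elapsed time is left missing, which is another value the imputation step would need to handle.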
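One plausible reading of the 30-day / 50 km labeling rule is sketched below: each event is compared with larger events inside the space-time window. The precedence rule (aftershock takes priority over foreshock) is an assumption, since the text does not spell it out, and the Quake class and label_sequences function are hypothetical names. The pairwise loop is O(n²), so a catalog of roughly 797,000 events would in practice need time-sorted scanning or a spatial index.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List

WINDOW = timedelta(days=30)  # temporal window stated in the text
RADIUS_KM = 50.0             # spatial window stated in the text
EARTH_RADIUS_KM = 6371.0     # mean Earth radius


@dataclass
class Quake:  # hypothetical minimal event record
    time: datetime
    lat: float
    lon: float
    mag_mw: float


def haversine_km(a: Quake, b: Quake) -> float:
    """Great-circle distance between two epicentres, in kilometres."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlam = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def label_sequences(quakes: List[Quake]) -> List[str]:
    """Assumed rule: 'aftershock' if a larger event preceded it inside the window,
    otherwise 'foreshock' if a larger event follows inside the window,
    otherwise 'mainshock'. O(n^2), for illustration only."""
    labels = []
    for q in quakes:
        larger = [
            o for o in quakes
            if o.mag_mw > q.mag_mw
            and abs(o.time - q.time) <= WINDOW
            and haversine_km(q, o) <= RADIUS_KM
        ]
        if any(o.time < q.time for o in larger):
            labels.append("aftershock")
        elif larger:
            labels.append("foreshock")
        else:
            labels.append("mainshock")
    return labels
```

For the mainshock/non-mainshock target used in the results, these labels can be collapsed to 1 for "mainshock" and 0 otherwise.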
Results
We trained gradient-boosting models, XGBoost and LightGBM, on the preprocessed dataset (88,457 mainshock/non-mainshock samples). The models were evaluated on accuracy, precision, recall, F1-score, and AUC.
XGBoost: Accuracy: 0.78 | AUC: 0.851 | F1-score (mainshock): 0.82
LightGBM: Accuracy: 0.76 | AUC: 0.830 | F1-score (mainshock): 0.81
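For reference, the metrics reported above follow their standard definitions. The sketch below computes them from scratch for the positive (mainshock = 1) class, with ROC AUC expressed as the probability that a randomly chosen mainshock receives a higher score than a randomly chosen non-mainshock (ties count as one half). It is illustrative only; the pairwise AUC loop is quadratic, so at the scale of 88,457 samples a library routine such as scikit-learn's roc_auc_score is the practical choice.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy plus precision, recall, and F1 for the positive (mainshock = 1) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```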
SHAP analysis revealed the most influential features for predicting mainshocks:
- Magnitude (mag_in_mw)
- Latitude & Longitude
- Depth
- Time since previous quake
- Number of seismic stations, azimuthal gap, RMS residuals
These results suggest that, even with event-level features alone, predictive signals for mainshock identification exist and can be captured by modern machine learning models.
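Global SHAP feature rankings like the one listed above are commonly obtained by averaging the absolute SHAP value of each feature across samples. The sketch below assumes a SHAP matrix (samples × features) has already been computed, for tree models for example with shap.TreeExplainer(model).shap_values(X), and shows only the ranking step. The function name mean_abs_shap is hypothetical, and any feature names other than mag_in_mw would be assumed column names.

```python
from typing import List, Sequence, Tuple


def mean_abs_shap(
    shap_values: Sequence[Sequence[float]], feature_names: Sequence[str]
) -> List[Tuple[str, float]]:
    """Rank features by mean |SHAP value| across all samples, largest first."""
    n_samples = len(shap_values)
    totals = [0.0] * len(feature_names)
    for row in shap_values:
        for j, value in enumerate(row):
            totals[j] += abs(value)  # sign is ignored: only the magnitude of impact counts
    ranking = [(name, total / n_samples) for name, total in zip(feature_names, totals)]
    return sorted(ranking, key=lambda item: item[1], reverse=True)
```

Averaging absolute values keeps features that push predictions strongly in both directions from cancelling out, which is why this is the usual summary for a global ranking.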
Future Direction & Conclusion
This pilot study provides a reproducible pipeline for preparing seismic data and building baseline predictive models for mainshock identification. Future work can focus on:
- Incorporating physical modeling of tectonic stress and fault zones to enhance prediction accuracy.
- Testing sequence-aware models (e.g., LSTM, Transformers) to capture temporal dependencies between foreshocks and mainshocks.
- Building region-specific models for high-risk areas such as Bangladesh, integrating local soil, urban density, and infrastructure data.
- Deploying real-time early-warning frameworks for public safety applications.
In conclusion, this pilot study indicates that data-driven machine learning methods can extract meaningful predictive signals from historical earthquake records. While further refinement is required, these results represent a first step toward actionable earthquake hazard assessment and preparedness. The deliverables of this research will be published on E1 web after the research paper is published.