Earthquake AI: Predicting Seismic Risk for Turkey With Machine Learning

Turkey sits where the Eurasian, African, and Arabian plates converge, with most of the country riding on the smaller Anatolian plate between them. It is one of the most seismically active countries in the world. The 1999 İzmit earthquake killed over 17,000 people. The 2023 Kahramanmaraş earthquakes killed over 50,000 in Turkey alone. Seismic risk here is not theoretical. It is a present, recurring condition that affects millions of people's decisions about where to live, what to build, and how to prepare.

We built Earthquake AI to ask a specific question: can a machine learning model, trained on historical seismic data, produce meaningful probability estimates for earthquake occurrence in different regions — and if so, how does that model perform compared to existing global seismic hazard frameworks?

What We Built

The system is a web application that provides earthquake probability estimates across 10 regions covering Turkey and the surrounding seismic geography. Each estimate is band-based: rather than predicting a single magnitude, the model outputs probability distributions across magnitude bands — M4–M5, M5–M6, M6–M7, and M7+. This reflects how seismic hazard is actually discussed and used in engineering and policy contexts.
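To make the band-based output concrete, here is a minimal sketch of how one region's estimate could be represented and queried. It assumes each band carries the probability of at least one event in that band within a forecast window, and that bands can be combined as if independent. Both are simplifying assumptions for illustration, and the names `BandEstimate`, `BAND_LOWER_EDGE`, and `prob_at_least` are hypothetical rather than part of the application.

```python
from dataclasses import dataclass
from math import prod

# Band labels and lower edges follow the article; everything else is illustrative.
BAND_LOWER_EDGE = {"M4-M5": 4.0, "M5-M6": 5.0, "M6-M7": 6.0, "M7+": 7.0}


@dataclass(frozen=True)
class BandEstimate:
    """Hypothetical container for one region's band-based output."""
    region: str
    window_days: int
    probabilities: dict  # band label -> P(at least one event in that band)

    def __post_init__(self):
        if set(self.probabilities) != set(BAND_LOWER_EDGE):
            raise ValueError("expected exactly one probability per band")
        for band, p in self.probabilities.items():
            if not 0.0 <= p <= 1.0:
                raise ValueError(f"{band}: probability {p} outside [0, 1]")

    def prob_at_least(self, min_magnitude: float) -> float:
        """P(at least one event at or above a band's lower edge).

        Treats bands as independent, which is a simplification.
        """
        if min_magnitude not in BAND_LOWER_EDGE.values():
            raise ValueError("min_magnitude must be a band lower edge")
        relevant = [p for band, p in self.probabilities.items()
                    if BAND_LOWER_EDGE[band] >= min_magnitude]
        return 1.0 - prod(1.0 - p for p in relevant)


# Made-up numbers for illustration, not model output.
example = BandEstimate(
    "Marmara", 365,
    {"M4-M5": 0.5, "M5-M6": 0.2, "M6-M7": 0.1, "M7+": 0.0},
)
```

Keeping the bands separate, and only combining them on request, preserves the distinction between frequent moderate events and rare large ones that a single headline number would blur.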

A personal risk report feature aggregates regional probability estimates, historical frequency data, and distance-from-fault information into a user-readable risk profile for a specified location. The goal is to translate technical probability estimates into something actionable for individuals and organizations.

The Model Architecture

The core of the system is a champion-competitor architecture. Rather than deploying a single model, we run a pool of candidate models that are continuously evaluated against incoming seismic data. The "champion" is whichever model currently performs best on a defined accuracy metric. Competitors are either updated versions of the champion or alternative approaches evaluated in parallel.

When a competitor model outperforms the champion on a statistically significant sample of recent events, it replaces the champion. This allows the system to improve continuously as new seismic data is collected, without requiring manual retraining decisions.
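The promotion rule can be sketched as follows. This is an illustration under stated assumptions, not the system's actual rule: it assumes the accuracy metric is the Brier score (lower is better) and that significance is judged with a paired bootstrap over recent events. The function names and the default thresholds are hypothetical.

```python
import random
from statistics import mean


def brier(prob: float, outcome: int) -> float:
    """Squared error between a forecast probability and a 0/1 outcome."""
    return (prob - outcome) ** 2


def should_promote(champion_probs, competitor_probs, outcomes,
                   n_boot=2000, alpha=0.05, seed=0):
    """Return True if the competitor's Brier score is significantly lower.

    Uses a one-sided paired bootstrap: resample per-event score differences
    and require the (1 - alpha) quantile of the mean difference to be below 0.
    """
    if not (len(champion_probs) == len(competitor_probs) == len(outcomes)):
        raise ValueError("inputs must have the same length")
    if not outcomes:
        raise ValueError("need at least one event")

    # Negative difference means the competitor did better on that event.
    diffs = [brier(c, y) - brier(ch, y)
             for ch, c, y in zip(champion_probs, competitor_probs, outcomes)]

    rng = random.Random(seed)  # fixed seed keeps the decision reproducible
    n = len(diffs)
    boot_means = sorted(mean(rng.choices(diffs, k=n)) for _ in range(n_boot))
    upper = boot_means[int((1 - alpha) * n_boot) - 1]
    return upper < 0.0
```

Pairing the scores event by event matters here: both models are judged on exactly the same earthquakes, so a quiet or active period affects them equally and does not masquerade as a difference in skill.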

Each model in the pool takes 102 features as input. These include seismic features (recent micro-earthquake frequency in surrounding zones, depth distribution of recent activity, strain rate estimates, time since last significant event per fault segment), geographic features (proximity to known fault systems, crustal thickness estimates, regional geological classifications), and temporal features (seasonal patterns in seismic activity, which appear in the historical record for some regions).

Ten Regions, Different Data Densities

The 10 regions covered are not uniform in data quality. The Marmara region, which includes Istanbul, has the densest seismic monitoring network in Turkey: the Kandilli Observatory and AFAD maintain extensive instrumentation, and even micro-earthquakes are routinely recorded and catalogued. Historical records here go back centuries for major events and decades for instrumentally recorded ones.

Other regions, particularly in eastern Anatolia and along the Syrian border, have sparser instrumentation and less complete historical records. This creates a real modelling problem: the regions where prediction accuracy matters most (high-risk zones with historically catastrophic events) are often the regions with the least complete data.

We handle this by widening uncertainty bands for data-sparse regions and explicitly reporting prediction confidence alongside probability estimates. A prediction with a narrow confidence interval in a data-rich region means something different than the same nominal probability in a region where the historical record has gaps.
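A small worked example shows why the same nominal probability deserves a wider band when it rests on less data. The sketch below uses the Wilson score interval for a proportion. It is purely illustrative and is not how the model derives its confidence bands. The observation counts are invented.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials and trials > 0")
    p_hat = successes / trials
    z2 = z * z
    denom = 1.0 + z2 / trials
    centre = (p_hat + z2 / (2 * trials)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1.0 - p_hat) / trials + z2 / (4 * trials * trials)
    )
    return centre - half_width, centre + half_width


# Same observed rate (10%), very different amounts of evidence.
# Counts are invented for illustration.
dense_low, dense_high = wilson_interval(successes=40, trials=400)
sparse_low, sparse_high = wilson_interval(successes=4, trials=40)

print(f"data-rich region:   {dense_low:.3f} to {dense_high:.3f}")
print(f"data-sparse region: {sparse_low:.3f} to {sparse_high:.3f}")
```

Both cases have the same point estimate, but the interval from 40 observations is markedly wider than the one from 400, which is the effect the confidence reporting is meant to surface.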

Benchmarking Against Global Models

Part of the project was comparative: how does a locally-trained ML model compare against established global seismic hazard frameworks for the same regions?

We benchmarked against two reference frameworks that publish probabilistic seismic hazard assessments for Turkey: the Global Earthquake Model (GEM) Foundation's hazard data, and the USGS PAGER regional hazard estimates. These frameworks use physics-based approaches — they model fault geometry, slip rates, and rupture propagation mechanics directly, rather than learning from historical data statistically.

The comparison is not straightforward, because the frameworks produce different output types and use different probability windows. We normalized the outputs to a common format: probability of exceeding magnitude thresholds in a 50-year window, disaggregated by the same regional boundaries we use.
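The window conversion behind this kind of normalization can be sketched with the standard time-independent Poisson assumption, under which the probability of at least one exceedance in T years is 1 - exp(-rate * T). Whether every input framework fits that assumption is not something this sketch settles, and the function names are hypothetical.

```python
import math


def annual_rate_from_probability(prob: float, window_years: float) -> float:
    """Poisson rate (events/year) implied by P(at least one event in window)."""
    if not 0.0 <= prob < 1.0:
        raise ValueError("prob must be in [0, 1)")
    if window_years <= 0:
        raise ValueError("window_years must be positive")
    return -math.log1p(-prob) / window_years


def probability_in_window(annual_rate: float, window_years: float) -> float:
    """P(at least one event in window) for a Poisson process."""
    return -math.expm1(-annual_rate * window_years)


def rescale_probability(prob: float, from_years: float, to_years: float) -> float:
    """Re-express a probability stated for one window in another window."""
    rate = annual_rate_from_probability(prob, from_years)
    return probability_in_window(rate, to_years)


# The familiar engineering convention: 10% in 50 years corresponds to a
# return period of roughly 475 years.
return_period_years = 1.0 / annual_rate_from_probability(0.10, 50)
```

Going through an annual rate rather than scaling probabilities linearly avoids results above 1 and keeps the conversion reversible, so a 1-year figure rescaled to 50 years and back returns the original value.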

The results were instructive. For the Marmara and Aegean regions, where historical data is dense and fault systems are well characterised, our ML model's estimates aligned closely with the physics-based frameworks. For the eastern Anatolian regions where the 2023 earthquakes occurred, the divergences were larger. We cannot yet say whether they reflect genuine differences in how the two approaches model that fault system, or gaps in the historical training data that the ML model could not overcome.

Personal Risk Reports

Beyond the regional probability estimates, the application generates personal risk reports for specific locations. A user enters an address or coordinates, and the system produces a report that combines regional seismic probability estimates with site-specific factors: distance to the nearest major fault, local geological conditions where data is available, and a historical frequency count of significant events within specified distances.
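Two of the site-specific factors, distance to the nearest fault and the count of significant events within a radius, reduce to great-circle distance calculations. The sketch below assumes fault traces are available as sampled (latitude, longitude) points and events as (latitude, longitude, magnitude) tuples. Those data shapes, the function names, and the sample coordinates are all illustrative assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_fault_km(site, fault_points):
    """Distance from site to the closest sampled point on any fault trace."""
    if not fault_points:
        raise ValueError("no fault points supplied")
    return min(haversine_km(site[0], site[1], lat, lon)
               for lat, lon in fault_points)


def count_events_within(site, events, radius_km, min_magnitude):
    """Count catalogued events at or above min_magnitude inside the radius."""
    return sum(
        1 for lat, lon, mag in events
        if mag >= min_magnitude
        and haversine_km(site[0], site[1], lat, lon) <= radius_km
    )


# Illustrative inputs only; not real catalogue or fault data.
site = (41.0, 29.0)
events = [(41.0, 29.1, 5.2), (38.0, 37.0, 7.0), (41.0, 29.05, 3.0)]
fault_points = [(40.7, 29.0), (40.7, 29.5)]

nearby_m5 = count_events_within(site, events, radius_km=50, min_magnitude=5.0)
fault_distance = nearest_fault_km(site, fault_points)
```

Distance to sampled points only approximates distance to the fault trace itself, so the sampling interval along each fault sets a floor on how precise the reported distance can be.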

The report is deliberately non-prescriptive. It presents probability estimates and context, not recommendations. Seismic risk communication is a field where oversimplification creates its own harms, so we wanted to make the uncertainty explicit rather than hide it behind false precision.

What This Project Is and Isn't

Earthquake AI is a Talivio Labs project — a research and development effort rather than a production SaaS product. It does not claim to predict when or where specific earthquakes will occur. No such prediction is possible with current scientific understanding. What it does is estimate probability distributions based on historical patterns and current seismic conditions — the same kind of probabilistic reasoning that underlies building codes and insurance actuarial models.

The champion-competitor architecture means the system is designed to improve as data accumulates. The benchmark comparison with established global models gives us a basis for evaluating whether improvements in the ML model represent genuine gains in predictive accuracy or just overfitting to recent data.

We're continuing to develop the model and expand the regional coverage. If you're working in seismic risk assessment, emergency planning, or building code policy in Turkey or the surrounding region and want to explore what this kind of ML-augmented approach can offer alongside conventional hazard modelling, we're interested in the conversation.

Eren Bostan
