Download this notebook

Step 3: Calculate and Prepare Metrics

This notebook walks through the metric workflow outputs used by the rest of Spatial-VTK. You will resolve metric settings from the config, write a task preview, create a long metrics table, and export dashboard-ready files.

Imports

These helpers resolve metric settings and write standard downstream metric outputs.

[1]:
from spatial_vtk.config.notebook import notebook_timer, register_svtk_cell_timer

with notebook_timer():
    from IPython.display import display
    from spatial_vtk.config import SpatialVTKConfig
    from spatial_vtk.config.metrics import metric_settings_summary, metrics_settings_from_config
    from spatial_vtk.io import read_config_table, load_output_table, write_output_table
    from spatial_vtk.metrics.workflow import summarize_metric_tasks, write_metric_outputs
    from spatial_vtk.metrics.plot import plot_band_score_distribution, plot_residuals_vs_distance, plot_score_trends
    from spatial_vtk.spatial.map import plot_station_metric_map
    register_svtk_cell_timer()
Run time: 38.84 s

Configuration

Load the config, then choose a small set of metrics and transforms for this tutorial run.

[2]:
from pathlib import Path

# Use the repository root so paths match the public source checkout.
repo_root = Path.cwd()
config_path = repo_root / "data/examples/configuration/example_spatial_vtk_config.yaml"

# Load the tutorial run scenario and make it the active config for later package calls.
cfg = SpatialVTKConfig.from_file(config_path, run_scenario="tutorial").activate()

notebook_overrides = {
    "groups": ["amplitude", "spectral"],
    "transforms": ["log2_residual", "anderson_2004_gof"],
    "output_mode": "full",
}
Run time: 22.2 ms

Resolve the Metric Settings

The config file provides the defaults. The override above narrows the tutorial to amplitude and spectral metrics.

[3]:
# Resolve the metric plan from the active config plus the small notebook override.
metric_settings = metrics_settings_from_config(overrides=notebook_overrides)

# Display the metric settings with human-readable labels.
metric_settings_summary(metric_settings)
[3]:
Setting Value
0 Groups amplitude, spectral
1 Metrics PGA, PGV, PGD, PSA, FAS
2 Transforms log2(observed / synthetic), Anderson 2004 GOF
3 Components Z, R, T
4 Passbands 1-2 sec, 2-3 sec
5 Output mode full
6 Spectral periods 1 s, 2 s, 3 s, 5 s
7 Synthetic max frequency 1 Hz
Run time: 29.5 ms

Write a Task Preview

A full run would create one task per event, station, component, model, passband, and metric set. This preview shows the kind of task table you can batch locally or submit on SLURM.

[4]:
# Read the example metric snapshot from the configured input path.
metric_snapshot = read_config_table("paths.metric_snapshot")

# Build the complete task table from the unique event, station, component, model, and passband combinations.
metric_tasks = metric_snapshot[["event_id", "station", "component", "model", "band"]].drop_duplicates().copy()
metric_tasks["metrics"] = ", ".join(metric_settings.metrics)
metric_tasks["transforms"] = ", ".join(metric_settings.transforms)
metric_tasks["output_mode"] = metric_settings.output_mode
metric_tasks["use_qc"] = True

# Estimate the number of tasks and rough resources needed for this metric run.
task_estimate = summarize_metric_tasks(
    metric_tasks,
    seconds_per_task=60.0,
    memory_gb_per_task=2.0,
    cpus_per_task=1,
    parallel_tasks=4,
)

# Save the full task table so a larger run can be launched in batches or on a cluster.
write_output_table("metric_tasks", metric_tasks)

# Show the resource estimate, then preview the first few tasks.
display(task_estimate)
metric_tasks.head(12)
Estimate Value Notes
0 Metric tasks 20 One task is one event-station-component-model-...
1 Approximate metric evaluations 100 Task count multiplied by the requested metrics...
2 Unique events 5
3 Unique stations 5
4 Components R, T, Z
5 Models cvmsi
6 Passbands 1-2 sec, 2-3 sec, 3-5 sec Period-band labels, for example 1-2 sec.
7 Approximate CPU-hours 0.33 Uses 60 seconds/task and 1 CPU/task.
8 Memory per task 2 GB Planning estimate; benchmark a small batch for...
9 Wall time at 4 parallel tasks 5.0 min Approximate elapsed time if every task takes t...
10 Peak memory at 4 parallel tasks 8 GB Memory per task multiplied by concurrent tasks.
[4]:
event_id station component model band metrics transforms output_mode use_qc
0 ci38038071 BFS Z cvmsi 1-2 sec PGA, PGV, PGD, PSA, FAS log2_residual, anderson_2004_gof full True
1 ci38038071 BHP Z cvmsi 1-2 sec PGA, PGV, PGD, PSA, FAS log2_residual, anderson_2004_gof full True
2 ci38695658 BLC R cvmsi 2-3 sec PGA, PGV, PGD, PSA, FAS log2_residual, anderson_2004_gof full True
3 ci39812319 BRE T cvmsi 3-5 sec PGA, PGV, PGD, PSA, FAS log2_residual, anderson_2004_gof full True
4 ci38695658 CAC Z cvmsi 1-2 sec PGA, PGV, PGD, PSA, FAS log2_residual, anderson_2004_gof full True
5 ci39812319 BHP Z cvmsi 1-2 sec PGA, PGV, PGD, PSA, FAS log2_residual, anderson_2004_gof full True
6 ci39756418 BLC Z cvmsi 1-2 sec PGA, PGV, PGD, PSA, FAS log2_residual, anderson_2004_gof full True
7 ci40699207 BRE Z cvmsi 1-2 sec PGA, PGV, PGD, PSA, FAS log2_residual, anderson_2004_gof full True
8 ci38695658 BFS Z cvmsi 1-2 sec PGA, PGV, PGD, PSA, FAS log2_residual, anderson_2004_gof full True
9 ci39812319 CAC Z cvmsi 1-2 sec PGA, PGV, PGD, PSA, FAS log2_residual, anderson_2004_gof full True
10 ci39756418 BHP Z cvmsi 1-2 sec PGA, PGV, PGD, PSA, FAS log2_residual, anderson_2004_gof full True
11 ci40699207 BLC Z cvmsi 1-2 sec PGA, PGV, PGD, PSA, FAS log2_residual, anderson_2004_gof full True
Run time: 105.2 ms

Write Standard Metric Outputs

The long metrics table keeps observed values, synthetic values, residuals, scores, and metadata in one row-oriented format.

[5]:
# Read prepared station metadata for metric enrichment.
stations = load_output_table("prepared_stations")

# Read prepared event metadata for metric enrichment.
events = load_output_table("prepared_events")

# Write the standard long metrics, path summary, and dashboard-ready metric outputs.
write_metric_outputs(
    metric_snapshot,
    events=events,
    stations=stations,
    residual_column="log2_residual",
    score_column="anderson_2004_gof",
    table_format="parquet",
)

# Read the long metrics table written by the workflow helper.
metrics_long = load_output_table("metrics_long")

# Save the enriched metrics table used by the spatial and mapping notebooks.
write_output_table("metrics_enriched", metrics_long)
metrics_long.head()
[5]:
event_id station network component model band metric metric_group period_s value_obs ... strike dip rake usgs_url observed_pickle event_json synthetic_mseed selected_station_count overlapping_broadband_station_count event_count
0 ci38038071 BFS CI Z cvmsi 1-2 sec PGA amplitude NaN 0.118 ... 312.0 79.0 178.0 https://earthquake.usgs.gov/earthquakes/eventp... data/examples/example_five_event_subset/obser... data/examples/example_five_event_subset/obser... data/examples/example_five_event_subset/synth... 30 52 4.0
1 ci38038071 BHP CI Z cvmsi 1-2 sec PGV amplitude NaN 5.410 ... 312.0 79.0 178.0 https://earthquake.usgs.gov/earthquakes/eventp... data/examples/example_five_event_subset/obser... data/examples/example_five_event_subset/obser... data/examples/example_five_event_subset/synth... 30 52 5.0
2 ci38695658 BLC CI R cvmsi 2-3 sec PSA spectral 2.0 0.284 ... 96.0 48.0 108.0 https://earthquake.usgs.gov/earthquakes/eventp... data/examples/example_five_event_subset/obser... data/examples/example_five_event_subset/obser... data/examples/example_five_event_subset/synth... 30 75 4.0
3 ci39812319 BRE CI T cvmsi 3-5 sec FAS spectral 3.0 0.037 ... 122.0 85.0 179.0 https://earthquake.usgs.gov/earthquakes/eventp... data/examples/example_five_event_subset/obser... data/examples/example_five_event_subset/obser... data/examples/example_five_event_subset/synth... 30 77 5.0
4 ci38695658 CAC CI Z cvmsi 1-2 sec PGA amplitude NaN 0.096 ... 96.0 48.0 108.0 https://earthquake.usgs.gov/earthquakes/eventp... data/examples/example_five_event_subset/obser... data/examples/example_five_event_subset/obser... data/examples/example_five_event_subset/synth... 30 75 NaN

5 rows × 47 columns

Run time: 5.75 s

Make Metric Diagnostic Figures

These figures use a larger QC-passed metrics table so the trends are easier to inspect than they would be with the tiny task-preview snapshot. The distance plots use LOWESS fits so the trend lines stay flexible without forcing a straight-line relationship.

[6]:
# Load the larger QC-passed metric table for plotting examples.
figure_metrics = read_config_table("paths.metric_figure_snapshot")

# Keep these example figures focused on metrics with enough rows to show clear trends.
figure_metrics = figure_metrics.loc[figure_metrics["metric"].isin(["PGA", "PGV", "PGD"])].copy()

# Plot residuals against source-to-station distance with a LOWESS fit for each metric.
distance_fig = plot_residuals_vs_distance(
    figure_metrics,
    y_col="log2_residual",
    group_col="metric",
    fit="lowess",
    connect_points=False,
    title="Residuals vs Distance",
    showfig=True,
    savefig=True,
)

# Plot GOF scores against distance with a LOWESS fit for each metric.
score_fig = plot_score_trends(
    figure_metrics,
    score_col="anderson_2004_gof",
    group_col="metric",
    fit="lowess",
    connect_points=False,
    title="Anderson 2004 GOF vs Distance",
    showfig=True,
    savefig=True,
)
../_images/examples_step_03_calculate_metrics_13_0.png
../_images/examples_step_03_calculate_metrics_13_1.png
Run time: 1.61 s

Map Station Residuals

A station map is a useful first look at whether residuals are spatially organized. Here, each station is colored by its mean PGA log2 residual.

[7]:
# Average PGA residuals by station so each station appears once on the map.
station_pga = (
    figure_metrics.loc[figure_metrics["metric"].eq("PGA")]
    .groupby(["station", "sta_lat", "sta_lon", "model", "component", "metric"], dropna=False, as_index=False)["log2_residual"]
    .mean()
)

# Map mean station residuals with a basemap for geographic context.
station_metric_fig = plot_station_metric_map(
    station_pga,
    value_col="log2_residual",
    title="Mean Station PGA Log2 Residual",
    showfig=True,
    savefig=True,
)
../_images/examples_step_03_calculate_metrics_15_0.png
Run time: 2.70 s

Compare Residuals by Period Band

This distribution groups log2 residuals by period band and metric so you can quickly see whether residual behavior changes across the requested bands.

[8]:
# Compare log2 residual distributions across period bands for the same metric set.
band_residual_fig = plot_band_score_distribution(
    figure_metrics,
    band_col="band",
    score_col="log2_residual",
    color_col="metric",
    title="Band Residual Distribution (CVM-SI)",
    showfig=True,
    savefig=True,
)
../_images/examples_step_03_calculate_metrics_17_0.png
Run time: 428.0 ms