Configuration
Spatial-VTK can read one project configuration file so your notebooks,
scripts, and CLI commands do not need long path lists or repeated settings.
Use it for project paths, output folders, named map bounds, metric choices,
synthetic frequency limits, waveform preprocessing, and defaults you want to
reuse across workflows.
The top-level sections are the defaults for your project. Optional
run_scenarios let you reuse a focused set of overrides for a particular
analysis without editing the main config.
Start With This Example
Save a config file such as spatial-vtk.yaml in your project folder, then
edit the paths and settings for your data. The comments in this example are
meant to show what each section is for.
The downloadable example includes a tutorial run scenario that points at
the lightweight LA Basin metadata committed in data/examples/. That
scenario still expects you to download or generate the companion waveform
bundle before you run the full notebooks.
Download example_spatial_vtk_config.yaml
# PROJECT
# Give this workflow a name and set the project root. Relative paths below are
# resolved from the directory that contains this config file unless you use
# {root_dir} or {config_dir} explicitly.
project:
  name: example_ground_motion_validation
  root_dir: ../../..
# PATHS
# Point Spatial-VTK to the input files you already have on disk.
# Templates can use {root_dir}, {config_dir}, {model}, and {event_id}.
paths:
  observed_root: data/observed
  synthetic_template: data/synthetics/{model}/{event_id}.mseed
  station_metadata: data/metadata/stations.csv
  event_metadata: data/metadata/events.csv
  event_station_table: data/metadata/event_station_records.csv
  region_geojson: data/metadata/regions.geojson
# OUTPUTS
# Choose where generated tables, figures, maps, and dashboard files should go.
outputs:
  root: outputs
  tables: outputs/tables
  prepared_inputs: outputs/prepared
  preprocessed_waveforms: outputs/preprocessed_waveforms
  qc: outputs/qc
  metrics: outputs/metrics/metrics_long.parquet
  spatial: outputs/spatial
  figures: outputs/figures
  dashboards: outputs/dashboards
# NOTEBOOKS
# Set this to false if you do not want tutorial cells to print run-time lines.
notebooks:
  show_cell_timing: true
# COMPUTE
# Optional SLURM settings for long-running workflows. Task-specific sections
# such as qc.slurm or metrics.slurm can override any value below.
compute:
  slurm:
    python_command: python
    # environment_setup:
    #   - module load mamba
    #   - mamba activate spatial-vtk-py312
    partition:
    account:
    walltime: "12:00:00"
    memory: 16G
    cpus_per_task: 1
    max_concurrent: 10
    log_dir: outputs/logs
# BOUNDS
# Named bounds let you reuse map windows or station/event subsets.
# Bounds are always [lon_min, lon_max, lat_min, lat_max].
bounds:
  presets:
    study_area:
      lon_min: -119.5
      lon_max: -116.5
      lat_min: 33.0
      lat_max: 35.0
    specific_area_of_interest:
      lon_min: -118.8
      lon_max: -117.8
      lat_min: 33.5
      lat_max: 34.5
  # Optional: load more named bounds from a CSV with columns
  # keyword, lon_min, lon_max, lat_min, lat_max.
  presets_csv: data/metadata/bounds_presets.csv
# METRICS
# Choose either groups OR individual metrics. Do not set both in one run.
metrics:
  # Group options: all, duration, amplitude, spectral, intensity,
  # delay, cross_correlation.
  #   duration: arias_duration, energy_duration
  #   amplitude: PGA, PGV, PGD
  #   spectral: PSA, FAS
  #   intensity: arias_intensity, energy_intensity, CAV
  #   delay: traveltime_delay
  #   cross_correlation: original_cc, delay_corrected_cc
  groups: [amplitude, spectral, cross_correlation]
  # If you prefer to select individual metrics, comment out groups above and
  # use metrics instead. Metric options: all, PGA, PGV, PGD, PSA, FAS, CAV,
  # arias_duration, energy_duration, arias_intensity, energy_intensity,
  # traveltime_delay, original_cc, delay_corrected_cc.
  # metrics: [PGA, PGV, PSA, original_cc]
  # Transform options: residual, log2_residual, ln_residual,
  # anderson_2004_gof, olsen_mayhew_gof.
  transforms: [log2_residual, anderson_2004_gof]
  # Output mode options: observed, synthetic, residual, gof, full.
  output_mode: full
  # Component examples: Z, N, E, R, T.
  components: [R, T, Z]
  # Passbands are period bands in seconds.
  passbands:
    - [1, 2]
    - [2, 3]
    - [3, 5]
  models: [example_model]
  spectral:
    # Periods, in seconds, where PSA/FAS values should be stored.
    periods_s: [1.0, 2.0, 3.0, 5.0]
    # A period passes spectral QC only if its amplitude is at least this
    # fraction of the maximum supported spectral amplitude.
    relative_amplitude_threshold: 0.25
    # Require this many cycles in the record before accepting a period.
    min_cycles_in_record: 3.0
  # Optional SLURM overrides for metric task arrays.
  slurm:
    job_name: svtk-metrics
    walltime: "24:00:00"
    memory: 32G
# SYNTHETICS
# Set the maximum valid synthetic frequency in Hz. Synthetic spectral periods
# shorter than 1 / max_frequency_hz will be rejected.
synthetics:
  max_frequency_hz: 0.5
# WAVEFORMS
# Optional project-specific preprocessing applied to observed and synthetic
# waveforms before QC, metric calculations, and waveform figures.
# Leave values empty when your inputs are already filtered/sampled as needed.
# Use either bandpass_low_hz/bandpass_high_hz OR highpass_hz/lowpass_hz.
waveforms:
  preprocessing:
    lowpass_hz:
    highpass_hz:
    bandpass_low_hz:
    bandpass_high_hz:
    resample_hz:
    filter_order: 4
# QC
# These automatic checks are used when waveform-level QC is requested.
qc:
  automatic:
    min_record_length_s: 60.0
    min_end_after_origin_s: 60.0
    snr_threshold: 3.0
  # Optional SLURM overrides for building qc_trace_summary and qc_inventory.
  slurm:
    job_name: svtk-qc
    walltime: "24:00:00"
    memory: 32G
# RUN SCENARIOS
# Optional named scenarios let you reuse focused overrides without editing the
# main defaults above. Select one with --run-scenario or run_scenario="...".
run_scenarios:
  tutorial:
    # This scenario expects the companion five-event LA Basin waveform bundle
    # under data/examples/example_five_event_subset/. The repo keeps the
    # metadata tables in git and ignores the large MiniSEED products.
    paths:
      observed_root: "{root_dir}/data/examples/example_five_event_subset/observed"
      synthetic_root: "{root_dir}/data/examples/example_five_event_subset/synthetics/cvmsi_20260506_material_0p6x1p2_asdf"
      synthetic_template: "{root_dir}/data/examples/example_five_event_subset/synthetics/{model}/{event_id}.mseed"
      station_metadata: "{root_dir}/data/examples/example_five_event_subset/metadata/selected_stations.csv"
      event_metadata: "{root_dir}/data/examples/example_five_event_subset/metadata/events.csv"
      event_station_table: "{root_dir}/data/examples/example_five_event_subset/metadata/selected_event_stations.csv"
      site_metadata: "{root_dir}/data/examples/data_formats/example_site_metadata.csv"
      region_geojson: "{root_dir}/data/examples/example_five_event_subset/metadata/example_path_regions.geojson"
      metric_snapshot: "{root_dir}/data/examples/data_formats/example_metrics_snapshot.csv"
      metric_figure_snapshot: "{root_dir}/data/examples/data_formats/example_metrics_large_qc_passed.parquet"
    outputs:
      tutorials_root: "{root_dir}/outputs/tutorials"
      root: "{root_dir}/outputs/tutorials"
      tables: "{root_dir}/outputs/tutorials/tables"
      preprocessed_waveforms: "{root_dir}/outputs/tutorials/preprocessed_waveforms"
      figures: "{root_dir}/outputs/tutorials/figures"
      dashboards: "{root_dir}/outputs/tutorials/dashboards"
    metrics:
      groups: [amplitude, spectral]
      transforms: [log2_residual, anderson_2004_gof]
      output_mode: full
      components: [Z, R, T]
      models: [cvmsi_20260506_material_0p6x1p2_asdf]
      passbands:
        - [1, 2]
        - [2, 3]
      spectral:
        periods_s: [1.0, 2.0, 3.0, 5.0]
    synthetics:
      max_frequency_hz: 1.0
    waveforms:
      preprocessing:
        lowpass_hz: 1.0
        highpass_hz:
        bandpass_low_hz:
        bandpass_high_hz:
        resample_hz:
        filter_order: 4
    qc:
      automatic:
        min_record_length_s: 60.0
        min_end_after_origin_s: 60.0
        snr_threshold: 3.0
    spatial:
      metric: all
      field_mode: log2_residual
      value_column: log2_residual
      min_stations_per_event: 2
      min_events_per_station: 1
      moran_neighbors: 2
      moran_permutations: 99
      distance_bin_width_km: 20
      cluster_min_k: 2
      cluster_max_k: 4
      pca_components: 2
      # Compare average residuals between these geology class sets.
      # The reported contrast is mean(geology_left_values) - mean(geology_right_values).
      geology_group_column: mapped_region_type
      geology_left_values: [Basin]
      geology_right_values: [Mountains]
      geology_statistic: mean
      # The tutorial subset is small; use a larger value for full datasets.
      geology_min_stations_per_group: 1
      geology_bootstrap_samples: 100
      random_seed: 42
  quick_amplitude_check:
    metrics:
      groups: [amplitude]
      transforms: [log2_residual]
    outputs:
      metrics: outputs/metrics/quick_amplitude_metrics.parquet
  spectral_period_review:
    metrics:
      metrics: [PSA, FAS]
      transforms: [log2_residual, anderson_2004_gof]
      spectral:
        periods_s: [1.0, 2.0, 3.0, 5.0, 7.5]
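The path templates in the example config can be pictured with a small sketch. This is not Spatial-VTK's own resolver: resolve_template is a hypothetical helper, and it assumes that a relative root_dir is taken relative to the config folder and that any path without a placeholder prefix is anchored at the config folder, as the PROJECT comment describes.

```python
import os
from pathlib import Path


def resolve_template(template, config_dir, root_dir=".", **fields):
    """Expand {root_dir}, {config_dir}, and extra fields such as {model}."""
    config_dir = Path(config_dir)
    # Assumption: a relative root_dir is interpreted from the config folder.
    root = Path(os.path.normpath(config_dir / root_dir))
    text = template.format(root_dir=str(root), config_dir=str(config_dir), **fields)
    path = Path(text)
    # Relative results are anchored at the folder that holds the config file.
    return path if path.is_absolute() else config_dir / path


config_dir = Path("/proj/docs/source/guide")
print(resolve_template("{root_dir}/outputs/tutorials", config_dir, "../../.."))
print(resolve_template(
    "data/synthetics/{model}/{event_id}.mseed",
    config_dir,
    "../../..",
    model="example_model",
    event_id="ev01",
))
```

The first call uses {root_dir} explicitly, so the result lands under the project root; the second has no placeholder prefix, so it stays relative to the config folder.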
Choose Metric Groups Or Metrics
In the metrics section, choose either groups or metrics. Do not set
both for the same run.
groups: all calculates every available metric. You can also choose from
duration (arias_duration, energy_duration), amplitude (PGA,
PGV, PGD), spectral (PSA, FAS), intensity
(arias_intensity, energy_intensity, CAV), delay
(traveltime_delay), and cross_correlation (original_cc,
delay_corrected_cc).
metrics: all also calculates every available metric. Use metrics when
you want a short custom list, such as [PGA, PSA, original_cc].
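As a rough illustration of the either/or rule, the sketch below expands group names into metric names using the group membership listed above. select_metrics is a hypothetical helper, not the package's API, and returning an empty list when neither key is set is an assumption made only for this sketch.

```python
# Group membership copied from the list above.
METRIC_GROUPS = {
    "duration": ["arias_duration", "energy_duration"],
    "amplitude": ["PGA", "PGV", "PGD"],
    "spectral": ["PSA", "FAS"],
    "intensity": ["arias_intensity", "energy_intensity", "CAV"],
    "delay": ["traveltime_delay"],
    "cross_correlation": ["original_cc", "delay_corrected_cc"],
}
ALL_METRICS = [name for members in METRIC_GROUPS.values() for name in members]


def select_metrics(groups=None, metrics=None):
    """Return the metric names implied by groups OR metrics, never both."""
    if groups and metrics:
        raise ValueError("Set either groups or metrics, not both.")
    if groups:
        if "all" in groups:
            return list(ALL_METRICS)
        return [name for group in groups for name in METRIC_GROUPS[group]]
    if metrics:
        if "all" in metrics:
            return list(ALL_METRICS)
        unknown = [name for name in metrics if name not in ALL_METRICS]
        if unknown:
            raise ValueError(f"Unknown metrics: {unknown}")
        return list(metrics)
    # Assumption for this sketch only: nothing selected means nothing to run.
    return []


print(select_metrics(groups=["amplitude", "spectral"]))
print(select_metrics(metrics=["PGA", "PSA", "original_cc"]))
```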
Set Waveform Preprocessing
If your observed data and synthetics need the same preprocessing before they
are comparable, put that in waveforms.preprocessing. For example, the
tutorial LA Basin scenario lowpasses both observed and synthetic waveforms at
1 Hz:
waveforms:
  preprocessing:
    lowpass_hz: 1.0
    highpass_hz:
    bandpass_low_hz:
    bandpass_high_hz:
    resample_hz:
    filter_order: 4
Leave the cutoff and resampling values empty when your input waveforms are
already filtered and sampled the way you want. Use either
bandpass_low_hz/bandpass_high_hz or separate highpass_hz and
lowpass_hz settings. resample_hz writes traces at a new sampling rate
after filtering.
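The rules in this section can be checked before any waveform is touched. The sketch below is a hypothetical validator, not part of Spatial-VTK: it turns a waveforms.preprocessing mapping into an ordered list of steps, rejects a mix of bandpass and separate cutoffs, and places resampling after filtering as described above. Requiring both bandpass corners is an assumption of the sketch.

```python
def filter_plan(preprocessing):
    """Return ordered (step, value...) tuples for a preprocessing mapping."""
    band = (preprocessing.get("bandpass_low_hz"), preprocessing.get("bandpass_high_hz"))
    highpass = preprocessing.get("highpass_hz")
    lowpass = preprocessing.get("lowpass_hz")
    has_band = any(value is not None for value in band)
    has_separate = highpass is not None or lowpass is not None

    if has_band and has_separate:
        raise ValueError("Use bandpass_* or highpass_hz/lowpass_hz, not both.")
    # Assumption: a bandpass needs both corners to be meaningful.
    if has_band and None in band:
        raise ValueError("Set both bandpass_low_hz and bandpass_high_hz.")

    steps = []
    if has_band:
        steps.append(("bandpass", band[0], band[1]))
    if highpass is not None:
        steps.append(("highpass", highpass))
    if lowpass is not None:
        steps.append(("lowpass", lowpass))
    # Resampling comes last, after any filtering.
    if preprocessing.get("resample_hz") is not None:
        steps.append(("resample", preprocessing["resample_hz"]))
    return steps


tutorial = {"lowpass_hz": 1.0, "highpass_hz": None, "bandpass_low_hz": None,
            "bandpass_high_hz": None, "resample_hz": None, "filter_order": 4}
print(filter_plan(tutorial))
```

An empty list means the inputs are used as they are, which matches leaving every cutoff and resample_hz empty.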
For a full run, preprocess the waveform files once and point later QC, metric, dashboard, and figure steps at the saved processed files:
svtk io preprocess-waveforms \
--config spatial-vtk.yaml \
--run-scenario tutorial \
--records data/metadata/event_station_records.csv
The command writes processed waveforms, a preprocessing manifest, trace
metadata, and an updated event-station table under
outputs.preprocessed_waveforms. You can pass --output-root when you
want to write one run somewhere else. Spatial-VTK only filters or resamples
waveforms when your config, run scenario, Python call, or CLI override asks it
to.
Set Automatic QC Thresholds
Automatic waveform QC can use project-specific thresholds from qc.automatic.
These settings are used when you ask Spatial-VTK to inspect waveform files
before metric calculations:
qc:
  automatic:
    min_record_length_s: 60.0
    min_end_after_origin_s: 60.0
    snr_threshold: 3.0
Use these as starting values, then tune them for the record lengths, noise windows, and signal levels in your project.
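To make the three thresholds concrete, here is a minimal sketch of how such checks could be applied to one trace. The function name, its inputs, and the use of inclusive "at least" comparisons are assumptions read off the key names; the package's real QC logic may differ.

```python
def passes_automatic_qc(record_length_s, end_after_origin_s, snr, thresholds):
    """Return (passed, per-check results) for one trace.

    Assumes each threshold is a lower limit: the record must be long enough,
    must end late enough after the event origin, and must have enough SNR.
    """
    checks = {
        "record_length": record_length_s >= thresholds["min_record_length_s"],
        "end_after_origin": end_after_origin_s >= thresholds["min_end_after_origin_s"],
        "snr": snr >= thresholds["snr_threshold"],
    }
    return all(checks.values()), checks


thresholds = {
    "min_record_length_s": 60.0,
    "min_end_after_origin_s": 60.0,
    "snr_threshold": 3.0,
}
passed, detail = passes_automatic_qc(120.0, 95.0, 2.1, thresholds)
print(passed, detail)
```

Keeping the per-check results makes it easier to see which threshold to tune when many traces fail.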
Submit Heavy Work To Slurm
Long-running QC and metric workflows can run locally from notebooks, or you
can write a Slurm batch script from the same config. Put shared cluster
defaults in compute.slurm:
compute:
  slurm:
    python_command: python
    environment_setup:
      - module load mamba
      - mamba activate spatial-vtk-py312
    partition: main
    account: my_account
    walltime: "12:00:00"
    memory: 16G
    cpus_per_task: 1
    max_concurrent: 10
    log_dir: outputs/logs
Task-specific sections such as qc.slurm and metrics.slurm override
only the values you set there; every other value still comes from
compute.slurm. For example, give QC inventory generation and metric task
arrays a longer walltime and more memory than the shared defaults:
qc:
  slurm:
    job_name: svtk-qc
    walltime: "24:00:00"
    memory: 32G
metrics:
  slurm:
    job_name: svtk-metrics
    walltime: "24:00:00"
    memory: 32G
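One way to picture the result for a single task is a key-by-key update of the shared settings. slurm_settings below is a hypothetical helper that works on a config already parsed into plain dictionaries; it is a sketch of the described behavior, not the package's code.

```python
def slurm_settings(config, task):
    """Shared compute.slurm values, updated by <task>.slurm where set."""
    shared = config.get("compute", {}).get("slurm", {})
    overrides = config.get(task, {}).get("slurm", {})
    # Later keys win, so only the values set in the task section change.
    return {**shared, **overrides}


config = {
    "compute": {"slurm": {"walltime": "12:00:00", "memory": "16G", "cpus_per_task": 1}},
    "qc": {"slurm": {"job_name": "svtk-qc", "walltime": "24:00:00", "memory": "32G"}},
}
print(slurm_settings(config, "qc"))
print(slurm_settings(config, "metrics"))
```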
Write a QC inventory Slurm script:
svtk qc slurm \
--config spatial-vtk.yaml \
--event-stations outputs/tables/event_station_records.csv \
--output outputs/slurm/build_qc.slurm
Submit it in the same command when your login node can run sbatch:
svtk qc slurm \
--config spatial-vtk.yaml \
--event-stations outputs/tables/event_station_records.csv \
--output outputs/slurm/build_qc.slurm \
--submit
QC Slurm jobs call the same checkpointed builders used in Python, so rerunning
the job resumes from the saved qc_trace_summary and qc_inventory tables
when those checkpoint paths already exist.
Metric Slurm jobs run as task arrays from a manifest produced by the metric workflow. After creating the manifest, write or submit the array script:
svtk metrics slurm \
--config spatial-vtk.yaml \
--manifest outputs/metrics/metric_manifest.json \
--output outputs/slurm/run_metrics.slurm \
--submit
Show Or Hide Notebook Run Times
The tutorial notebooks register a Spatial-VTK cell timer once near the top of
the notebook. After that, each code cell prints one compact line, such as
Run time: 19.2 ms. You can turn those lines off in the config:
notebooks:
  show_cell_timing: false
Leave this set to true when you want the notebooks to show how long each
step took on your machine.
Point Spatial-VTK At Your Config
Spatial-VTK looks for a config file in this order:
1. A path you pass directly, such as --config spatial-vtk.yaml.
2. The SVTK_CONFIG_FILE environment variable.
3. A standard filename in your current folder or one of its parent folders:
   spatial-vtk.yaml, spatial-vtk.yml, svtk_config.yaml, svtk_config.yml,
   svtk.yaml, or svtk.yml.
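The lookup order can be sketched with the standard library alone. find_config is a hypothetical stand-in for the package's own discovery code; the order in which the standard filenames are tried inside each folder is an assumption.

```python
import os
from pathlib import Path

STANDARD_NAMES = (
    "spatial-vtk.yaml", "spatial-vtk.yml",
    "svtk_config.yaml", "svtk_config.yml",
    "svtk.yaml", "svtk.yml",
)


def find_config(explicit=None, start=None, environ=None):
    """Return the first config path found, or None."""
    environ = os.environ if environ is None else environ
    # 1. A path passed directly always wins.
    if explicit:
        return Path(explicit)
    # 2. Then the SVTK_CONFIG_FILE environment variable.
    if environ.get("SVTK_CONFIG_FILE"):
        return Path(environ["SVTK_CONFIG_FILE"])
    # 3. Then a standard filename in the start folder or any parent folder.
    here = Path(start or Path.cwd()).resolve()
    for folder in (here, *here.parents):
        for name in STANDARD_NAMES:
            candidate = folder / name
            if candidate.is_file():
                return candidate
    return None


print(find_config(explicit="spatial-vtk.yaml"))
```

Passing environ explicitly keeps the function easy to test without changing the real shell environment.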
For a one-time command, pass the file directly:
svtk config show --config spatial-vtk.yaml
svtk metrics plan --config spatial-vtk.yaml --observed-inventory observed.parquet --synthetic-inventory synthetic.parquet --output outputs/tasks.csv
For a whole terminal session, set the environment variable:
export SVTK_CONFIG_FILE=/path/to/spatial-vtk.yaml
svtk config find
In a notebook or script, load the same file with Python:
from spatial_vtk.config import SpatialVTKConfig
cfg = SpatialVTKConfig.from_file("spatial-vtk.yaml")
metrics_path = cfg.path("outputs.metrics")
Use Configs In Python
Spatial-VTK supports three patterns: two in Python and one for the CLI. Use the one that best matches how you are working.
If you want short notebook cells, activate the config once near the top of the notebook. Plotting, table-reading, table-writing, and metric-setting helpers will use that active config when you do not pass paths or config objects.
from spatial_vtk.config import SpatialVTKConfig
from spatial_vtk.config.metrics import metrics_settings_from_config
from spatial_vtk.io import load_output_table
from spatial_vtk.io import prepare_station_metadata
from spatial_vtk.visualize.context import plot_record_coverage
cfg = SpatialVTKConfig.from_file("spatial-vtk.yaml", run_scenario="tutorial").activate()
stations = prepare_station_metadata()
metrics = load_output_table("metrics_enriched")
metric_settings = metrics_settings_from_config()
plot_record_coverage(record_coverage, showfig=True, savefig=True)
If you prefer each call to be self-contained, pass the config directly.
from spatial_vtk.config import SpatialVTKConfig
from spatial_vtk.config import resolve_output_path
cfg = SpatialVTKConfig.from_file("spatial-vtk.yaml", run_scenario="tutorial")
path = resolve_output_path("record_coverage", kind="figure", cfg=cfg)
For CLI workflows, set SVTK_CONFIG_FILE in your shell and let commands
discover it.
export SVTK_CONFIG_FILE=/path/to/spatial-vtk.yaml
svtk config find
Default Output Names
Keep your main config focused on folders:
outputs:
  root: outputs
  tables: outputs/tables
  figures: outputs/figures
  dashboards: outputs/dashboards
Spatial-VTK keeps default table and figure filenames in its package defaults,
so notebooks do not need long filename lists. For example,
plot_record_coverage(..., savefig=True) writes
outputs/figures/record_coverage.png when that key is not overridden.
Likewise, write_output_table("prepared_stations", stations) writes
outputs/tables/prepared_stations.csv.
When a workflow step creates several standard tables, write them by output key and let the next step read them the same way:
from spatial_vtk.io import load_output_table, write_output_tables
write_output_tables(
    prepared_stations=stations,
    prepared_events=events,
    event_station_records=event_stations,
)
stations = load_output_table("prepared_stations")
You can still override a single output directly:
plot_record_coverage(record_coverage, savefig=True, outpath="figures/custom_record_coverage.png")
The explicit outpath wins over the active config and package defaults.
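The relationship between output folders, default filenames, and an explicit outpath can be sketched as follows. DEFAULT_FILENAMES is a hypothetical stand-in holding only the two defaults named in this section, and default_output_path is not a Spatial-VTK function.

```python
from pathlib import Path

# Hypothetical stand-in for the package defaults; only the two filenames
# mentioned in this section are included.
DEFAULT_FILENAMES = {
    ("figure", "record_coverage"): "record_coverage.png",
    ("table", "prepared_stations"): "prepared_stations.csv",
}
FOLDER_KEYS = {"figure": "figures", "table": "tables"}


def default_output_path(outputs, key, kind, outpath=None):
    """Explicit outpath first, otherwise config folder plus default filename."""
    if outpath is not None:
        return Path(outpath)
    return Path(outputs[FOLDER_KEYS[kind]]) / DEFAULT_FILENAMES[(kind, key)]


outputs = {"tables": "outputs/tables", "figures": "outputs/figures"}
print(default_output_path(outputs, "record_coverage", "figure"))
print(default_output_path(outputs, "record_coverage", "figure",
                          outpath="figures/custom_record_coverage.png"))
```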
Use A Run Scenario
The main config file should hold the defaults you normally want. If you need a
repeatable variation, add it under run_scenarios. A scenario is a small
overlay: any section inside the scenario replaces or extends the same section
from the main config.
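A scenario overlay behaves much like a recursive dictionary merge. The sketch below assumes that nested mappings are merged key by key while lists and scalar values are replaced outright; how Spatial-VTK treats lists is not stated here, so take that part as an assumption. apply_scenario is a hypothetical helper.

```python
import copy


def _merge(base, overlay):
    """Merge overlay into a copy of base, recursing into nested mappings."""
    merged = dict(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = _merge(merged[key], value)
        else:
            # Assumption: lists and scalars replace the default outright.
            merged[key] = copy.deepcopy(value)
    return merged


def apply_scenario(config, name):
    """Return the top-level defaults with one named scenario laid on top."""
    defaults = {k: copy.deepcopy(v) for k, v in config.items() if k != "run_scenarios"}
    return _merge(defaults, config["run_scenarios"][name])


config = {
    "metrics": {"groups": ["amplitude", "spectral"], "components": ["R", "T", "Z"]},
    "outputs": {"root": "outputs", "metrics": "outputs/metrics/metrics_long.parquet"},
    "run_scenarios": {
        "quick_amplitude_check": {
            "metrics": {"groups": ["amplitude"], "transforms": ["log2_residual"]},
            "outputs": {"metrics": "outputs/metrics/quick_amplitude_metrics.parquet"},
        }
    },
}
resolved = apply_scenario(config, "quick_amplitude_check")
print(resolved["metrics"])
print(resolved["outputs"])
```

Settings the scenario does not mention, such as components and outputs.root here, keep their main-config values.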
Use a scenario from the CLI:
svtk config show --config spatial-vtk.yaml --run-scenario quick_amplitude_check --section metrics
svtk metrics plan --config spatial-vtk.yaml --run-scenario quick_amplitude_check --observed-inventory observed.parquet --synthetic-inventory synthetic.parquet --output outputs/quick_tasks.csv
Use the same scenario in Python:
from spatial_vtk.config import SpatialVTKConfig
from spatial_vtk.io import metric_plan_from_config
cfg = SpatialVTKConfig.from_file("spatial-vtk.yaml", run_scenario="quick_amplitude_check")
plan = metric_plan_from_config(cfg)
Override One Run
You can also override one setting for a single CLI or notebook run. This is useful when you want to try something quickly without changing your config file.
For example, this CLI command uses the config file but calculates only PGA on the Z component for this run:
svtk metrics plan --config spatial-vtk.yaml --metric PGA --component Z --observed-inventory observed.parquet --synthetic-inventory synthetic.parquet --output outputs/pga_z_tasks.csv
This map command selects a named bounds preset from the config for this run:
svtk map spatial station-metric --config spatial-vtk.yaml --bounds specific_area_of_interest --input metrics.parquet --output figures/station_metric_map.png
In a notebook, pass an override dictionary to the metric-plan helper:
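from spatial_vtk.config import SpatialVTKConfig
from spatial_vtk.io import metric_plan_from_config

cfg = SpatialVTKConfig.from_file("spatial-vtk.yaml")
# These overrides apply to this call only; the config file is unchanged.
plan = metric_plan_from_config(
    cfg,
    overrides={
        "metrics": ["PGA"],
        "components": ["Z"],
    },
)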
How Values Are Chosen
Most workflows use this precedence:
1. Explicit CLI arguments, notebook variables, or function arguments for the current run.
2. The selected run_scenarios overlay, if you selected one.
3. Top-level sections such as paths, outputs, metrics, and synthetics.
4. Package defaults.
If you pass a specific CLI argument such as --metric or set an override in
a notebook, that explicit value wins for that run. If you select
--run-scenario quick_amplitude_check, that scenario overrides the main
config defaults before the command runs.
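For a single section, this precedence maps neatly onto collections.ChainMap, which looks a key up in each mapping from left to right. The sketch is flat (one section, no nesting), and the package default values shown are invented purely for illustration.

```python
from collections import ChainMap


def resolve_section(explicit, scenario, top_level, package_defaults):
    """Earlier mappings win: explicit > scenario > top-level > defaults."""
    return dict(ChainMap(explicit, scenario, top_level, package_defaults))


explicit = {"components": ["Z"]}                      # e.g. --component Z
scenario = {"groups": ["amplitude"], "transforms": ["log2_residual"]}
top_level = {
    "groups": ["amplitude", "spectral", "cross_correlation"],
    "components": ["R", "T", "Z"],
    "output_mode": "full",
}
# Hypothetical package defaults, not taken from Spatial-VTK.
package_defaults = {"output_mode": "observed", "models": []}

resolved = resolve_section(explicit, scenario, top_level, package_defaults)
print(resolved["components"], resolved["groups"], resolved["output_mode"])
```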
Check The Current Settings
Use these quick checks when you are not sure which file or values are active:
# Show which config file Spatial-VTK found.
svtk config find
# Print the full config file after YAML parsing.
svtk config show
# Print one section.
svtk config show --section metrics
svtk config show --run-scenario quick_amplitude_check --section metrics
# List named map/station bounds from inline config and bounds CSV files.
svtk config bounds
For metric workflows, you can also inspect the resolved plan in Python:
from spatial_vtk.config import SpatialVTKConfig
from spatial_vtk.io import metric_plan_from_config
cfg = SpatialVTKConfig.from_file("spatial-vtk.yaml")
plan = metric_plan_from_config(cfg, command="metrics.calculate")
print(plan)
If a value appears in svtk config show, it came from your config file. If a
value appears only after you add --run-scenario, it came from that selected
scenario. If a value only appears in a command you typed or an override
dictionary you passed in a notebook, it is an override for that run. If none of
those places set it, Spatial-VTK falls back to its package defaults.