with application to geographic and seismic data

by C. Grillenzoni and D. Migliavacca


This WebGIS performs 2D spatial clustering of 3D space-time point data (typically produced by seismic catalogs, GPS and laser surveys).
The online software estimates the ridges (crests) of 2D point clouds using iterative principal component mean shift (PCMS) methods.
The mean shift (MS) moves the points toward the modes of the cloud, while the principal components (PC) deflect the points toward the crests.
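
The interplay between the mean shift and the principal components can be illustrated with a minimal Python sketch. It is not the code run by this website: the function name pcms_step, the Gaussian kernel and the projection onto the minor eigenvector of the local covariance (in the spirit of subspace-constrained mean shift) are all assumptions made for illustration.

```python
import numpy as np

def pcms_step(point, data, b):
    """One illustrative PCMS-like update of `point` against a 2D cloud `data`.

    Hypothetical helper: Gaussian weights and the projection rule are assumptions.
    """
    diff = data - point
    w = np.exp(-0.5 * np.sum(diff**2, axis=1) / b**2)  # Gaussian kernel weights
    w = w / w.sum()
    local_mean = w @ data                # weighted local mean (mean shift target)
    shift = local_mean - point           # plain mean shift vector
    centered = data - local_mean
    cov = (centered * w[:, None]).T @ centered  # weighted local covariance
    # eigh returns eigenvalues in ascending order, so column 0 is the
    # direction of smallest local variance, i.e. across the ridge.
    _, eigvecs = np.linalg.eigh(cov)
    minor = eigvecs[:, :1]
    # Keep only the part of the shift that points across the ridge.
    return point + minor @ (minor.T @ shift)

# Points lying on the line y = 0 act as a noise-free ridge.
ridge = np.column_stack([np.linspace(-5, 5, 201), np.zeros(201)])
moved = pcms_step(np.array([0.3, 1.0]), ridge, b=1.0)
```

In this toy case the point is pulled onto the line y = 0 while its x coordinate is left essentially unchanged, which is the behavior that distinguishes ridge estimation from plain mode seeking.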

STEP 1 - Load the Data

The data file must be stored on your PC. It must be in ASCII format (csv or txt), with values separated by commas or semicolons:
DATA = [x,y,z,m,t]; (at present, "t" is not required).

The rows represent the data units (i.e. the events). The columns provide the features of each event:
x = 1st coordinate (longitude)
y = 2nd coordinate (latitude)
z = 3rd coordinate (depth)
m = mark (magnitude)
t = time (occurrence)

Since the data file may be very large, the user may analyze only a portion of it by setting the minimum mark (magnitude) of the observations, e.g. m >= 3.0.
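
For users who wish to check a file before uploading it, the loading and filtering step can be reproduced offline with a short Python sketch. The function name load_events, the sample values and the handling of both comma and semicolon separators are assumptions; the website may parse files differently.

```python
import io
import numpy as np

def load_events(source, min_magnitude=3.0):
    """Read rows [x, y, z, m, t] and keep those with m >= min_magnitude.

    Hypothetical helper: `source` is any open text file or file-like object.
    """
    # Normalize semicolons to commas so one delimiter covers both cases.
    text = source.read().replace(";", ",")
    data = np.loadtxt(io.StringIO(text), delimiter=",", ndmin=2)
    return data[data[:, 3] >= min_magnitude]  # column 3 holds the mark m

# Three made-up events, used only to show the call.
sample = io.StringIO(
    "11.2;46.1;5.0;2.4;1\n"
    "11.5;46.3;8.0;3.1;2\n"
    "11.9;45.9;12.0;4.0;3\n"
)
events = load_events(sample, min_magnitude=3.0)
```

With a real file, the same function can be called as load_events(open("catalog.csv")), where the file name is a placeholder.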

STEP 2 - Select the Algorithm

The user may choose between the CLASSICAL and BLURRING PCMS methods; both iteratively move the points toward the ridges of the cloud.
BLURRING is an accelerated version, which uses the original data only in the first iteration and then re-clusters the previous estimates. It is efficient, but suffers from asymptotic bias, hence its number of iterations must be small (less than 10).
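
The difference between the two schemes lies in the reference set used at each iteration, as the following Python sketch suggests. To keep it short, it uses a plain Gaussian mean shift instead of the full PCMS update; the function names and the stopping rule (mean displacement below a tolerance) are assumptions, not the website's code.

```python
import numpy as np

def kernel_means(points, reference, b):
    """Gaussian-weighted mean of `reference` around each row of `points`."""
    d2 = ((points[:, None, :] - reference[None, :, :]) ** 2).sum(axis=2)
    w = np.exp(-0.5 * d2 / b**2)
    return (w @ reference) / w.sum(axis=1, keepdims=True)

def mean_shift(data, b, blurring=False, max_iter=30, tol=1e-3):
    """Iterate mean shift; return the estimates and the iterations used."""
    est = data.copy()
    for it in range(1, max_iter + 1):
        # CLASSICAL: always average the original data.
        # BLURRING: average the previous estimates (equal to the data at it = 1).
        reference = est if blurring else data
        new = kernel_means(est, reference, b)
        step = np.linalg.norm(new - est, axis=1).mean()  # mean displacement
        est = new
        if step < tol:
            break
    return est, it

# Two synthetic clusters, used only to exercise both variants.
rng = np.random.default_rng(0)
cloud = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(10, 0.3, (50, 2))])
classical, n_classical = mean_shift(cloud, b=1.0)
blurred, n_blurred = mean_shift(cloud, b=1.0, blurring=True, max_iter=9)
```

Because the blurring variant replaces the data by ever more concentrated estimates, it contracts faster, which is consistent with the advice to keep its number of iterations small.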
The BANDWIDTH SELECTION option selects the bandwidths of the PCMS algorithms in an automatic (data-driven) way, by minimizing a global fitting criterion based on the sum of the Euclidean distances between the estimates and the data points and among the estimates themselves. The search grid is defined by:
MIN = minimum bandwidth value, e.g. 0.5
MAX = maximum bandwidth value, e.g. 2
N = number of bandwidth points, e.g. 5
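
A grid search over these three values might look like the Python sketch below. Equal spacing of the grid is an assumption, and the fitting criterion is passed in as an arbitrary function because its exact form is not given here; the quadratic used in the example is only a stand-in.

```python
import numpy as np

def bandwidth_grid(b_min=0.5, b_max=2.0, n=5):
    """Equally spaced candidate bandwidths from MIN to MAX (assumed spacing)."""
    return np.linspace(b_min, b_max, n)

def select_bandwidth(grid, criterion):
    """Return the grid value that minimizes the supplied criterion."""
    scores = [criterion(b) for b in grid]
    return grid[int(np.argmin(scores))]

grid = bandwidth_grid(0.5, 2.0, 5)
# Stand-in criterion with its minimum near 1.3, for illustration only.
best = select_bandwidth(grid, lambda b: (b - 1.3) ** 2)
```

In practice the criterion would run the chosen PCMS algorithm for each candidate and return its fitting score, so the cost grows linearly with N.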

STEP 3 - Select the Coefficients

MAX ITERATIONS (1-30) : maximum number of iterations allowed for the estimates. For the BLURRING algorithm, it must be less than 10.
TOLERANCE (0.001) : maximum mean Euclidean distance allowed between the estimates of two consecutive iterations; the iterations stop once the distance falls below this value. In the CLASSICAL algorithm, it serves to reduce the computation time (e.g. 0.001).
BANDWIDTH SIZE (0.5-2) : PCMS algorithms are weighted local means of the data, and the bandwidth "b" tunes the number of data points entering each mean. In the present program, it is proportional to Silverman's rule, which is based on the standard errors (SE) of the data. In practice, it is given by b = a*SE/N^0.2, where N is the number of data points and 0 < a < infinity. If a = 1, then b equals Silverman's rule; the recommended range is 0.5 < a < 2. Values a > 1 reduce the variance of the estimates but increase their bias; values a < 1 reveal multiple ridges but increase the noise.
N. of BANDWIDTHS (1,2) : indicates single (1) or multiple (2) bandwidths. Since Silverman's rule is based on the SE of the data x,y,z, option 1 uses the average SE, whereas option 2 uses the individual SE of each coordinate. This option has proven useful on simulated data.
COVARIANCE TYPE (0,1,2) : indicates the type of variance-covariance matrix: (0) is global, (1) is local, (2) is intermediate. PCMS algorithms are based on the spectral factorization of the weighted covariance matrix of the data (x,y,z). Choose (0) if all the ridges have the same direction in space.
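
The bandwidth and covariance options above can be made concrete with a small Python sketch. It follows the formula b = a*SE/N^0.2, taking N as the number of data rows and SE as the sample standard deviation of each column; the function names, the Gaussian weights and the toy data are assumptions. The intermediate covariance type (2) is not sketched because its definition is not given here.

```python
import numpy as np

def silverman_bandwidths(data, a=1.0, n_bandwidths=1):
    """b = a * SE / N^0.2 per coordinate (hypothetical helper).

    n_bandwidths=1 uses the average SE for every coordinate,
    n_bandwidths=2 uses the individual SE of each coordinate.
    """
    n = data.shape[0]
    se = data.std(axis=0, ddof=1)
    if n_bandwidths == 1:
        se = np.full_like(se, se.mean())
    return a * se / n ** 0.2

def weighted_covariance(data, point=None, b=None):
    """Global covariance if `point` is None, else a kernel-weighted local one."""
    if point is None:
        w = np.full(len(data), 1.0 / len(data))      # type (0): equal weights
    else:
        d2 = (((data - point) / b) ** 2).sum(axis=1)
        w = np.exp(-0.5 * d2)                        # type (1): Gaussian weights
        w = w / w.sum()
    centered = data - w @ data
    return (centered * w[:, None]).T @ centered

# Toy (x, y, z) rows with column standard deviations 2, 4 and 6.
xyz = np.array([[0.0, 0.0, 0.0], [2.0, 4.0, 6.0], [4.0, 8.0, 12.0]])
b_single = silverman_bandwidths(xyz, a=1.0, n_bandwidths=1)
b_multi = silverman_bandwidths(xyz, a=1.0, n_bandwidths=2)
cov_global = weighted_covariance(xyz)
cov_local = weighted_covariance(xyz, point=xyz[1], b=b_multi)
# Spectral factorization: eigenvalues in ascending order, eigenvectors in columns.
eigvals, eigvecs = np.linalg.eigh(cov_local)
```

A global matrix yields the same principal directions at every point, which is why it suits clouds whose ridges are parallel, whereas a local matrix lets the directions follow curved or crossing ridges.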

FILES (Sample data + User guide)



This website tool has been developed by Prof. Dr. Carlo Grillenzoni and Dr. Diego Migliavacca.