iuav

ON-LINE PRINCIPAL CURVE AND SURFACE ESTIMATION

with application to geographic and seismic data

USER GUIDE

This WebGis performs 2D and 3D spatial clustering of marked point data (typically produced by seismic catalogs, GPS and laser scanners).
The on-line software estimates the ridges (crests) of 2D point clouds using iterative principal component mean shift (PCMS) algorithms.
The mean shift (MS) moves the points toward the modes of the cloud, while the principal components (PC) deflect the points onto the crests.
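
As an illustration of the idea, the following Python sketch performs one PCMS-like update of a single 2D point. It is not the code run by the WebGis: the Gaussian kernel, the function name pcms_step and the synthetic cloud are assumptions made for this example. The mean shift vector is computed first, then only its component along the minor principal axis of the local weighted covariance is kept, so the point moves across the crest rather than along it.

```python
import math

def pcms_step(p, data, b):
    """One illustrative PCMS update of the 2D point p with bandwidth b."""
    # Gaussian kernel weights of every data point relative to p (assumed kernel)
    w = [math.exp(-0.5 * ((x - p[0]) ** 2 + (y - p[1]) ** 2) / b ** 2)
         for x, y in data]
    sw = sum(w)
    # Mean shift target: the weighted local mean of the data
    mx = sum(wi * x for wi, (x, _) in zip(w, data)) / sw
    my = sum(wi * y for wi, (_, y) in zip(w, data)) / sw
    # Weighted local covariance matrix [[sxx, sxy], [sxy, syy]]
    sxx = sum(wi * (x - mx) ** 2 for wi, (x, _) in zip(w, data)) / sw
    syy = sum(wi * (y - my) ** 2 for wi, (_, y) in zip(w, data)) / sw
    sxy = sum(wi * (x - mx) * (y - my) for wi, (x, y) in zip(w, data)) / sw
    # Closed-form angle of the major principal axis of a symmetric 2x2 matrix
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    minor = (-math.sin(theta), math.cos(theta))  # direction across the crest
    # Keep only the part of the mean shift vector that crosses the crest
    step = (mx - p[0]) * minor[0] + (my - p[1]) * minor[1]
    return (p[0] + step * minor[0], p[1] + step * minor[1])

# Synthetic cloud scattered around the horizontal line y = 0 (invented data)
cloud = [(-5 + 0.1 * i, 0.1 * math.sin(7 * (-5 + 0.1 * i))) for i in range(101)]
moved = pcms_step((1.0, 0.5), cloud, b=1.0)
```

In this example the point starts at (1.0, 0.5) and is pulled toward the line y = 0, while its x coordinate remains almost unchanged.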

A) GET DATA: Download the Data

An on-line utility is available to download data from two seismic catalogs: INGV (Italy), the National Institute of Geophysics and Volcanology; and NCEDC (USA), the Northern California Earthquake Data Center.
To obtain data, go to the main page and click on "GET DATA", then:
- select the geographic zone of interest with the rectangular window (you may shift and zoom the map, or change the window size). The Latitude and Longitude of the corners of the selected zone are displayed on the right;
- select the proper catalog (INGV, if the window is over Italy), the time period of interest (starting and ending dates), and the min-max depth and magnitude of the earthquakes.
The system provides the resulting data in numerical format, whereas the original INGV and NCEDC files contain many alphabetic fields that hinder their direct use in numerical software.
WARNING 1: a maximum of 15000 records can be downloaded. If necessary, restrict the area of interest, the time span, or the magnitude and depth ranges.

B) GEO DATA: PCMS Estimation

STEP 1 - Upload the Data

If the data are geo-referenced, go to the "GEO DATA" section; otherwise (e.g. synthetic data), go to the "NON GEO" section. Then upload the data file from your PC.
The data file must be in CSV format (values separated by commas or semicolons):
DATA = [x,y,z,m,t]; (at present, "t" is not required).

The rows represent the data units (i.e. the events). The columns provide the features of each event:
x = 1st coordinate (longitude)
y = 2nd coordinate (latitude)
z = 3rd coordinate (depth)
m = mark (magnitude)
t = time (occurrence time, in days, from the first event)

Since the data files may be very large, the user may analyze only a portion of the data by setting a minimum mark (magnitude) for the observations, e.g. >= 3.0.
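
Before uploading, the file layout and the magnitude threshold can be checked locally. The Python sketch below is only an example: the function name load_events and the three sample rows are invented, and it assumes a file without a header line, with comma or semicolon separators and decimal points.

```python
import csv
import io

def load_events(text, min_mark=3.0):
    """Parse x,y,z,m[,t] rows and keep those with mark >= min_mark."""
    events = []
    # Treat semicolons as commas so that both separators are accepted
    reader = csv.reader(io.StringIO(text.replace(";", ",")))
    for row in reader:
        if not row:
            continue  # skip blank lines
        values = [float(v) for v in row]
        x, y, z, m = values[:4]
        t = values[4] if len(values) > 4 else None  # "t" is optional
        if m >= min_mark:
            events.append((x, y, z, m, t))
    return events

# Invented sample rows in the order x, y, z, m, t
sample = """13.23;42.70;8.1;6.0;0.0
13.19;42.71;9.4;2.4;0.3
13.11;42.83;8.7;5.9;65.2
"""
strong = load_events(sample, min_mark=3.0)
```

With a threshold of 3.0, the second sample row (magnitude 2.4) is discarded and the other two are kept.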

STEP 2 - Select the Algorithm

The user may choose between the CLASSICAL and BLURRING PCMS methods. Both iteratively move the points toward the ridges of the cloud.
BLURRING is an accelerated version, which uses the original data only in the first iteration and then re-clusters the previous estimates. It is efficient but suffers from asymptotic bias, hence its number of iterations must be small (less than 10).
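
The difference between the two schemes can be sketched in Python on a plain 1D mean shift (the principal component step is omitted for brevity). This is an illustration under assumptions, not the WebGis implementation: the Gaussian kernel, the function name mean_shift and the sample points are chosen for the example. In the classical scheme the kernel is always evaluated against the original data; in the blurring scheme the data are replaced by the previous estimates after each iteration.

```python
import math

def mean_shift(data, b, max_iter=30, tol=0.001, blurring=False):
    """Plain 1D Gaussian mean shift; returns the shifted points."""
    ref = list(data)      # points the kernel is evaluated against
    est = list(data)      # current estimates
    for _ in range(max_iter):
        new = []
        for p in est:
            w = [math.exp(-0.5 * ((p - q) / b) ** 2) for q in ref]
            new.append(sum(wi * q for wi, q in zip(w, ref)) / sum(w))
        # Mean distance between two consecutive sets of estimates
        change = sum(abs(n - p) for n, p in zip(new, est)) / len(est)
        est = new
        if blurring:
            ref = list(est)   # blurring: re-cluster the previous estimates
        if change < tol:
            break             # estimates no longer move appreciably
    return est

points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]   # two well-separated groups
classical = mean_shift(points, b=0.5)
blurred = mean_shift(points, b=0.5, max_iter=5, blurring=True)
```

In both runs each group of three points collapses onto its own mode (near 0.1 and near 5.1). The early exit on the mean distance between consecutive estimates plays the role of a tolerance setting.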
The BANDWIDTH SELECTION option selects the bandwidths of the PCMS algorithms in an automatic (data-driven) way, by minimizing a global fitting criterion based on the sum of the Euclidean distances between the estimates and the data points, and between the estimates themselves.

MIN = minimum bandwidth value, e.g. 0.5
MAX = maximum bandwidth value, e.g. 2
N = number of bandwidth points, e.g. 5
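
The three values above define a grid of candidate bandwidths. The Python sketch below assumes that the N points are equally spaced between MIN and MAX, which this guide does not state explicitly; the function names and the toy criterion are hypothetical and stand in for the fitting criterion actually used by the system.

```python
def bandwidth_grid(b_min, b_max, n):
    """Return n equally spaced bandwidth values from b_min to b_max."""
    if n == 1:
        return [b_min]
    step = (b_max - b_min) / (n - 1)
    return [b_min + i * step for i in range(n)]

def select_bandwidth(grid, criterion):
    """Return the grid value with the smallest criterion value."""
    return min(grid, key=criterion)

grid = bandwidth_grid(0.5, 2, 5)
# Toy criterion with a minimum at 1.2, used only to exercise the search
best = select_bandwidth(grid, lambda b: (b - 1.2) ** 2)
```

With MIN = 0.5, MAX = 2 and N = 5, the grid is 0.5, 0.875, 1.25, 1.625, 2.0, and the toy criterion selects 1.25.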

STEP 3 - Select the Coefficients

MAX ITERATIONS (1-30) : maximum number of iterations allowed for the estimates (for the BLURRING algorithm, it must be less than 10).
TOLERANCE (0.001) : maximum mean Euclidean distance allowed between the estimates of two consecutive iterations. In the CLASSICAL algorithm it allows the computation time to be minimized (e.g. 0.001).
BANDWIDTH SIZE (0.5-2) : PCMS algorithms are weighted local means of the data; the bandwidth "b" tunes the number of data points entering each mean. In the present program, it is proportional to Silverman's rule (based on the standard errors (SE) of the data). In practice, it is given by b = a*SE/N^0.2, where N is the number of data points and 0 < a < infinity. If a = 1, then b equals Silverman's rule; the recommended range is 0.5 < a < 2. Values a > 1 reduce the variance of the estimates but increase the bias; values a < 1 provide multiple ridges but increase the noise.
N. of BANDWIDTHS (1,2) : indicates single (1) or multiple (2) bandwidths. As Silverman's rule is based on the SE of the data x,y,z, '1' uses the average SE, whereas '2' uses the individual SEs. This option has proven useful with simulated data.
COVARIANCE TYPE (0,1,2) : indicates the type of variance-covariance matrix: (0) is global, (1) is local, (2) is intermediate. PCMS algorithms are based on the spectral factorization of the weighted covariance matrix of the data (x,y,z). Choose (0) if the ridges have the same direction throughout space.
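
The bandwidth rule b = a*SE/N^0.2 and the single/multiple option can be sketched in Python as follows. This is an illustration only: it assumes that N is the number of data points and that SE is the sample standard deviation of each coordinate; the function name silverman_bandwidths and the small data set are invented for the example.

```python
import statistics

def silverman_bandwidths(data, a=1.0, n_bandwidths=1):
    """Return one bandwidth per coordinate, b = a * SE / N**0.2."""
    n = len(data)
    columns = list(zip(*data))                       # one tuple per coordinate
    se = [statistics.stdev(col) for col in columns]  # spread of each coordinate
    if n_bandwidths == 1:
        se = [statistics.mean(se)] * len(se)         # single: use the average SE
    return [a * s / n ** 0.2 for s in se]

# Invented (x, y, z) rows; y varies twice as much as x
xyz = [(0, 0, 0), (1, 2, 0), (2, 4, 1), (3, 6, 1), (4, 8, 2)]
single = silverman_bandwidths(xyz, a=1.0, n_bandwidths=1)
multiple = silverman_bandwidths(xyz, a=1.0, n_bandwidths=2)
```

With the single option all three coordinates share one bandwidth; with the multiple option the bandwidth of y is twice that of x, reflecting its larger spread.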

C) 3D DATA: Sliced Surfaces

This section deals with planar "sliced" (2D) estimates of spatial 3D surfaces. The 3D point cloud is vertically buffered into horizontal strata of data (which may overlap), which are subsequently smoothed with 2D PCMS methods. The resulting estimate is a vertical sequence of level curves of the surface.

USAGE:
- The first two steps are as above;
- Then click on "Sliced" and select:
   a) Number of level curves to be estimated;
   b) Size of the vertical rectangular bandwidth.
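
The slicing step can be sketched in Python as follows. The sketch rests on assumptions not stated in this guide: the levels are taken as equally spaced between the minimum and maximum z, and the size of the rectangular bandwidth is read as the full height of each stratum. The function name slice_strata and the test cloud are invented for the example.

```python
def slice_strata(points, n_levels, height):
    """Split (x, y, z) points into n_levels horizontal strata of given height."""
    zs = [p[2] for p in points]
    z_min, z_max = min(zs), max(zs)
    if n_levels == 1:
        centers = [(z_min + z_max) / 2.0]
    else:
        gap = (z_max - z_min) / (n_levels - 1)
        centers = [z_min + i * gap for i in range(n_levels)]
    strata = []
    for c in centers:
        # Rectangular window: keep the points within height/2 of the level
        layer = [(x, y) for x, y, z in points if abs(z - c) <= height / 2.0]
        strata.append((c, layer))
    return strata

cloud = [(float(i), 0.0, float(i)) for i in range(11)]   # z = 0, 1, ..., 10
levels = slice_strata(cloud, n_levels=3, height=6.0)
```

Here the levels are at z = 0, 5 and 10. Because the height (6) exceeds the gap between levels (5), neighbouring strata overlap: the points with z = 2 and z = 3 belong to both the first and the second stratum. Each stratum is a 2D point set, ready for a 2D PCMS estimate.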


FILES (Sample data + User guide)

Amatrice.txt
California.txt

InfoWebGis.txt