Data Files

SERIES C
Applied Statistics

Modelling heterogeneous space-time occurrences of earthquakes and its residual analysis, by Y. Ogata et al.
Appl. Statist., 52 (2003), 499-509

OUTLINE of the DATA "AlljpM50.config":
This dataset consists of records of the earthquake occurrences associated with some geometrical and topological structures, as described below, in order to apply the Hierarchical Space-Time Epidemic-Type Aftershock Sequence (HIST-ETAS) model.
First, we list hypocentre data of earthquakes throughout Japan (within the rectangular region bounded by the 128 degrees E and 149 degrees E meridians and the 30 degrees N and 47 degrees N parallels) with magnitude (M) 5.0 or larger and depths shallower than 100 km, for the period from 1926 through 1995, selected from the Hypocenter Catalogue of the Japan Meteorological Agency (JMA).
Then, the mainshocks and their clusters of aftershocks are identified by the procedure in Ogata (1998), where a bivariate Normal distribution is assumed for each cluster of aftershocks. The location coordinate of the mainshock in the catalogue is replaced by the average (the centroid) of the locations of its cluster members when the former is rejected as the centroid. In addition, the coefficients of the 2x2 variance-covariance matrix are listed for significantly anisotropic spatial clusters of aftershocks; otherwise, the identity variance-covariance matrix is adopted for isotropic clusters, including single events. Refer to Ogata (1998) for the statistical procedure using the Akaike Information Criterion (AIC).
Furthermore, the present dataset includes topological information associated with the Delaunay tessellation connecting the above locations of earthquakes together with some additionally placed points on the boundary and vertices of the whole rectangular region of Japan.

DATA FORMAT of "AlljpM50.config":
The first row of the dataset gives the number of earthquakes (neqs=4573), the total number of points forming the vertices of the Delaunay triangles, including the additional boundary points (npt=4646; see Figure 1b of the paper), the number of Delaunay triangles (ndt=9217), and the ranges of the whole region (xd=21.0 longitudinal degrees and yd=17 latitudinal degrees) relative to the origin at (128.0E, 30.0N). Note that the longitude component of any distance, and therefore the area of each Delaunay triangle, should be shrunk by the factor cos(theta) in the computing program, where theta=38.5 degrees is the latitudinal centre of the region.
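The cos(theta) correction described above can be sketched as follows. This is a minimal illustration, assuming a flat approximation with one fixed factor for the whole region; the function names are hypothetical and are not taken from the authors' program. Whether the triangle areas stored in the file already include this factor is not stated here, so the sketch computes areas from coordinates only.

```python
import math

THETA_DEG = 38.5  # latitudinal centre of the region, as stated above
LON_SHRINK = math.cos(math.radians(THETA_DEG))  # about 0.7826


def scaled_distance(x1, y1, x2, y2):
    """Distance between two points in file coordinates, in latitudinal degrees.

    The longitude difference is shrunk by cos(theta); the latitude
    difference is used as is.
    """
    dx = (x2 - x1) * LON_SHRINK
    dy = y2 - y1
    return math.hypot(dx, dy)


def scaled_triangle_area(p, q, r):
    """Area of triangle p, q, r (each an (x, y) pair) after the shrinkage.

    Shrinking one axis by a constant factor scales every area by that
    same factor, so the raw planar area is simply multiplied by it.
    """
    raw = 0.5 * abs((q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1]))
    return raw * LON_SHRINK
```

For example, `scaled_distance(0.0, 0.0, 1.0, 0.0)` returns the length of one degree of longitude expressed in latitudinal degrees at the centre of the region.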

The second to (neqs+1)-th rows of the dataset give each earthquake's code, its longitude and latitude coordinates relative to the origin at (128.0E, 30.0N), the magnitude, and the occurrence time in days counted from 1 January 1926. The last three columns give the standard deviations of the clustering for the longitude and latitude components, respectively, and the correlation (cf. equation (6) in the paper).
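The last three columns can be turned back into a 2x2 variance-covariance matrix with the standard construction below. This is a sketch only: the exact parametrisation used in the model is the one in equation (6) of the paper, which is not reproduced here, and the function name is hypothetical.

```python
def cluster_covariance(sd_lon, sd_lat, corr):
    """Build the 2x2 variance-covariance matrix from two standard
    deviations and a correlation coefficient.

    Returns [[var_lon, cov], [cov, var_lat]] as nested lists.
    """
    if not -1.0 <= corr <= 1.0:
        raise ValueError("correlation must lie in [-1, 1]")
    cov = corr * sd_lon * sd_lat
    return [[sd_lon ** 2, cov], [cov, sd_lat ** 2]]
```

Under this construction, unit standard deviations with zero correlation give the identity matrix that the outline describes for isotropic clusters and single events.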

The (neqs+2)-th to (npt+1)-th rows give the code of each boundary point, numbered consecutively after the last earthquake code, and its location coordinates.
The (npt+2)-th to (npt+ndt+1)-th rows give the code of each Delaunay triangle, its three vertices in terms of point codes, and the calculated area of the triangle.
The (npt+ndt+2)-th to (2npt+ndt+1)-th rows give, for each point in order, its code, the number of points with greater codes that are directly connected to it through the tessellation, and the codes of those points, if any.
The bottom row gives the maximum, over all points, of the number of such connected points with larger codes.
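The row layout above can be summarised by a small helper that maps the header counts to 1-indexed, inclusive row ranges. This is a sketch under two assumptions: the function and key names are illustrative, and the bottom row is taken to follow the connection rows immediately, as row 2npt+ndt+2.

```python
def section_rows(neqs, npt, ndt):
    """Return 1-indexed inclusive (first, last) row ranges per section."""
    last = 2 * npt + ndt + 2  # assumed position of the bottom row
    return {
        "header": (1, 1),
        "earthquakes": (2, neqs + 1),
        "boundary_points": (neqs + 2, npt + 1),
        "triangles": (npt + 2, npt + ndt + 1),
        "connections": (npt + ndt + 2, 2 * npt + ndt + 1),
        "max_connections": (last, last),
    }


# Counts quoted above for "AlljpM50.config".
layout = section_rows(neqs=4573, npt=4646, ndt=9217)
```

With these counts the boundary-point section spans npt - neqs = 73 rows, and the file is expected to have 18511 rows in total.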

OUTLINE of the DATA "Alljpm50.3Dcfg3":
This dataset includes the records of the earthquake occurrences and some topological structure associated with the three-dimensional Delaunay tessellation, in order to estimate the space-time ratio of the real seismicity rate to the theoretical rate given by the estimated HIST-ETAS model (cf. Section 5 in the present paper). The whole three-dimensional space-time volume is divided into Delaunay tetrahedra; their vertices are the epicentre coordinates and origin times of the earthquakes, together with some additionally placed points on the boundary surfaces, edges and vertices of the whole space-time volume.

DATA FORMAT of "Alljpm50.3Dcfg3":
The first row of the dataset gives the number of earthquakes (neqs=4573), the number of all points, including earthquakes and boundary points, for which the Delaunay tessellation is made (npt=5378), the number of Delaunay tetrahedra (ndt=31467), and the longitudinal and latitudinal ranges of the whole region (xd=21.0 degrees and yd=17 degrees, respectively) relative to the origin at (128.0E, 30.0N). It also gives the scaled time range td=18.89652623799, the scale of the time axis tscl=1353.0, so that td*tscl=25567.0 days for the whole time span of 70 years, and the space-time origin (128.0E degrees, 30.0N degrees, 0.0 day).
The second to (neqs+1)-th rows give each earthquake's code, its longitude and latitude coordinates relative to the origin, and the occurrence time in units of days/tscl, counted from 1 January 1926.
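The time scaling can be checked and inverted with a few lines of code. The sketch below assumes that 0.0 day corresponds to 00:00 on 1 January 1926 and ignores time zones, neither of which is spelled out in this description; the function names are illustrative.

```python
from datetime import datetime, timedelta

TSCL = 1353.0          # scale of the time axis, from the first row
TD = 18.89652623799    # scaled time range, from the first row
ORIGIN = datetime(1926, 1, 1)  # assumed instant for 0.0 day


def scaled_time_to_days(t_scaled):
    """Convert a scaled occurrence time (days/tscl) back to days."""
    return t_scaled * TSCL


def scaled_time_to_datetime(t_scaled):
    """Convert a scaled occurrence time to a calendar date and time."""
    return ORIGIN + timedelta(days=scaled_time_to_days(t_scaled))
```

As a consistency check, `scaled_time_to_days(TD)` is 25567.0 days to within rounding, which equals 70 x 365 days plus the 17 leap days between 1926 and 1995.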

The (neqs+2)-th to (npt+1)-th rows give the code of each boundary point, numbered consecutively after the last earthquake code, and its location coordinates.
The (npt+2)-th to (npt+ndt+1)-th rows give the code of each Delaunay tetrahedron, its four vertices in terms of point codes, and the calculated volume of the tetrahedron.
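For reference, the volume of a tetrahedron can be recomputed from its four vertices with the usual determinant formula. This is a sketch in the file's own units (degrees, degrees, days/tscl); whether the stored volumes include any cos(theta) shrinkage of the longitude axis is not stated here, so none is applied. The function name is hypothetical.

```python
def tetrahedron_volume(a, b, c, d):
    """Volume of the tetrahedron with vertices a, b, c, d.

    Each vertex is an (x, y, t) triple. The volume is |det[u, v, w]| / 6,
    where u, v, w are the edge vectors from a to b, c and d.
    """
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    w = [d[i] - a[i] for i in range(3)]
    det = (
        u[0] * (v[1] * w[2] - v[2] * w[1])
        - u[1] * (v[0] * w[2] - v[2] * w[0])
        + u[2] * (v[0] * w[1] - v[1] * w[0])
    )
    return abs(det) / 6.0
```

Comparing the recomputed values with the last column of these rows is one way to find out which scaling convention the file uses.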

The (npt+ndt+2)-th to (2npt+ndt+1)-th rows give, for each point in order, its code, the number of points with greater codes that are directly connected to it through the tessellation, and the codes of those points, if any.
The bottom row gives the maximum, over all points, of the number of such connected points with larger codes.
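The connection rows can in principle be rebuilt from the tetrahedron rows, since two points are directly connected when they share a tetrahedron edge. The sketch below assumes point codes run from 1 to npt and that each tetrahedron is given as a tuple of its vertex codes; the function name is illustrative, and the same logic applies to the triangles of the two-dimensional file.

```python
from itertools import combinations


def upper_neighbours(simplices, npt):
    """Map each point code to the sorted codes of its directly connected
    points with greater codes.

    simplices: iterable of vertex-code tuples (triangles or tetrahedra).
    npt: total number of points; codes are assumed to be 1..npt.
    """
    higher = {code: set() for code in range(1, npt + 1)}
    for vertices in simplices:
        # Every pair of vertices in a simplex is joined by an edge.
        for a, b in combinations(vertices, 2):
            lo, hi = (a, b) if a < b else (b, a)
            higher[lo].add(hi)
    return {code: sorted(nbrs) for code, nbrs in higher.items()}


# Toy example: two tetrahedra sharing a face.
example = upper_neighbours([(1, 2, 3, 4), (2, 3, 4, 5)], npt=5)
max_connected = max(len(nbrs) for nbrs in example.values())
```

Here `max_connected` plays the role of the value stored in the bottom row.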

REFERENCES:
Ogata, Y. (1998) Space-time point-process models for earthquake occurrences, Ann. Inst. Statist. Math., 50, 379-402.

Yosihiko Ogata
Institute of Statistical Mathematics
Minami-azabu 4-6-7
Minato-ku
Tokyo 106-8569
Japan
E-mail: ogata@ism.ac.jp

  • Dataset (alljp95m50.3Dcfg3, size - 930kb)