two-dimensional garch model with application to anomaly detection

two-dimensional garch model with application to anomaly detection two-dimensional garch model with application to anomaly detection

signal.ee.bilkent.edu.tr
from signal.ee.bilkent.edu.tr More from this publisher
12.07.2015 Views

TWO-DIMENSIONAL GARCH MODEL WITH APPLICATION TO ANOMALY DETECTIONAmir Noiboar and Israel CohenDepartment of Electrical Engineering, Technion - Israel Institute of TechnologyTechnion City, Haifa 32000, Israelnoiboar@tx.technion.ac.il; icohen@ee.technion.ac.ilABSTRACTIn this paper, we introduce a two-dimensional Generalized AutoregressiveConditional Heteroscedasticity (GARCH) model forclutter modeling and anomaly detection. The one-dimensionalGARCH model is widely used for modeling financial time series.Extending the one-dimensional GARCH model into two dimensionsyields a novel clutter model which is capable of takinginto account important characteristics of natural clutter, namelyheavy tailed distribution and innovations clustering. We show thatthe two-dimensional GARCH model generalizes the casual GaussMarkov random field (GMRF) model, and develop a matched subspacedetector (MSD) for detecting anomalies in GARCH clutter.Experimental results demonstrate that a reduced false alarm ratecan be achieved without compromising the detection rate by usingan MSD under GARCH clutter modeling, rather than GMRFclutter modeling.1. INTRODUCTIONAnomaly detection is the process of detecting a portion of the data,which differs in some statistical sense from the background clutter.Anomaly detection is a well-studied problem with many practicalapplications including detection of targets in images [1–4], detectionof defects in silicon wafers [5], detection of faults in seismicdata [6], etc. The greatest factor in anomaly detection is thechoice of an adequate statistical model for the clutter, which wouldenable to discriminate between anomalies and the clutter components.Unfortunately, clutter modeling is often obtained by utilizinga Gauss Markov random field (GMRF) model [6, 7], whichis insufficient in its capability to model natural clutter. To overcomethis limitation, and in order to decrease the false alarm ratewhile retaining the desired detection rate, various methods havebeen proposed. Among these methods we find multiscale representationsfor detecting anomalies in different scales [5, 8], performingsegmentation as a preprocessing stage to anomaly detection[2], utilizing a priori information about the shape, size, andother characteristics of the anomaly [9], applying sliding windowsto the image through which the model estimation and anomaly detectioncan be carried out locally [10], etc.In this paper, we introduce a two-dimensional Generalized AutoregressiveConditional Heteroscedasticity (GARCH) model forclutter modeling and anomaly detection. The one-dimensionalGARCH model [11] is widely used for modeling financial timeseries. Extending the one-dimensional GARCH model into twodimensions yields a novel clutter model which is capable of takinginto account important characteristics of natural clutter, namelyheavy tailed distribution and innovations clustering. We show thatthe two-dimensional GARCH model generalizes the casual GMRFmodel, and develop a matched subspace detector (MSD) for detectinganomalies in GARCH clutter. The performance of the proposedanomaly detection method is evaluated on synthetic and realdata. Experimental results demonstrate that reduced false alarmrate can be achieved without compromising the detection rate byusing an MSD under GARCH clutter modeling, rather than GMRFclutter modeling.The paper is organized as follows: In Sec. 2 we introduce thetwo-dimensional GARCH model. In Sec. 3 we address the modelestimation problem. In Sec. 4 we develop an MSD for detectinganomalies in GARCH clutter. Finally, in Sec. 5 we demonstratethe advantage of the proposed anomaly detection approach overusing the GMRF clutter modeling.2. TWO DIMENSIONAL GARCH MODELLet q 1 , q 2 , p 1 , p 2 ≥ 0 denote the order of the GARCH model, andlet Γ 1 and Γ 2 denote two neighborhood sets which are defined byΓ 1 = {kl | 0 ≤ k ≤ q 1, 0 ≤ l ≤ q 2, (kl) ≠ (0, 0)}Γ 2 = {kl | 0 ≤ k ≤ p 1 , 0 ≤ l ≤ p 2 , (kl) ≠ (0, 0)} .Let ɛ ij represent a 2D stochastic process, and let h ij denoteits variance conditioned upon the information set ψ ij ={{ɛ i−k,j−l } kl∈Γ1 , {h i−k,j−l } kl∈Γ2 }. Define the 2D neighborhoodof location ij as: Γ = {kl | k ≤ i, l ≤ j} andiidlet η ij ∼ N(0, 1) be a stochastic 2D process independent ofh kl , ∀kl ∈ Γ. The 2D GARCH(p 1 , p 2 , q 1 , q 2 ) process is definedas:ɛ ij = √ h ij η ij (1)h ij = α 0 + ∑α kl ɛ 2 i−k,j−l + ∑β kl h i−k,j−l , (2)kl∈Γ 1 kl∈Γ 2and is therefore conditionally distributed as:ɛ ij | ψ ij ∼ N(0, h ij ) . (3)In order to guarantee a non-negative conditional variance we require:α 0 > 0α kl ≥ 0, kl ∈ Γ 1β kl ≥ 0, kl ∈ Γ 2 . (4)From (2) we see that at every spatial location, both the neighboringsample variances and the neighboring conditional variances playa role in the current conditional variance. This yields clustering

TWO-DIMENSIONAL GARCH MODEL WITH APPLICATION TO ANOMALY DETECTIONAmir Noiboar and Israel CohenDepartment of Electrical Engineering, Technion - Israel Institute of TechnologyTechnion City, Haifa 32000, Israelnoiboar@tx.technion.ac.il; icohen@ee.technion.ac.ilABSTRACTIn this paper, we introduce a <strong>two</strong>-<strong>dimensional</strong> Generalized Au<strong>to</strong>regressiveConditional Heteroscedasticity (GARCH) <strong>model</strong> forclutter <strong>model</strong>ing and <strong>anomaly</strong> <strong>detection</strong>. The one-<strong>dimensional</strong>GARCH <strong>model</strong> is widely used for <strong>model</strong>ing financial time series.Extending the one-<strong>dimensional</strong> GARCH <strong>model</strong> in<strong>to</strong> <strong>two</strong> dimensionsyields a novel clutter <strong>model</strong> which is capable of takingin<strong>to</strong> account important characteristics of natural clutter, namelyheavy tailed distribution and innovations clustering. We show thatthe <strong>two</strong>-<strong>dimensional</strong> GARCH <strong>model</strong> generalizes the casual GaussMarkov random field (GMRF) <strong>model</strong>, and develop a matched subspacedetec<strong>to</strong>r (MSD) for detecting anomalies in GARCH clutter.Experimental results demonstrate that a reduced false alarm ratecan be achieved <strong>with</strong>out compromising the <strong>detection</strong> rate by usingan MSD under GARCH clutter <strong>model</strong>ing, rather than GMRFclutter <strong>model</strong>ing.1. INTRODUCTIONAnomaly <strong>detection</strong> is the process of detecting a portion of the data,which differs in some statistical sense from the background clutter.Anomaly <strong>detection</strong> is a well-studied problem <strong>with</strong> many practical<strong>application</strong>s including <strong>detection</strong> of targets in images [1–4], <strong>detection</strong>of defects in silicon wafers [5], <strong>detection</strong> of faults in seismicdata [6], etc. The greatest fac<strong>to</strong>r in <strong>anomaly</strong> <strong>detection</strong> is thechoice of an adequate statistical <strong>model</strong> for the clutter, which wouldenable <strong>to</strong> discriminate between anomalies and the clutter components.Unfortunately, clutter <strong>model</strong>ing is often obtained by utilizinga Gauss Markov random field (GMRF) <strong>model</strong> [6, 7], whichis insufficient in its capability <strong>to</strong> <strong>model</strong> natural clutter. To overcomethis limitation, and in order <strong>to</strong> decrease the false alarm ratewhile retaining the desired <strong>detection</strong> rate, various methods havebeen proposed. Among these methods we find multiscale representationsfor detecting anomalies in different scales [5, 8], performingsegmentation as a preprocessing stage <strong>to</strong> <strong>anomaly</strong> <strong>detection</strong>[2], utilizing a priori information about the shape, size, andother characteristics of the <strong>anomaly</strong> [9], applying sliding windows<strong>to</strong> the image through which the <strong>model</strong> estimation and <strong>anomaly</strong> <strong>detection</strong>can be carried out locally [10], etc.In this paper, we introduce a <strong>two</strong>-<strong>dimensional</strong> Generalized Au<strong>to</strong>regressiveConditional Heteroscedasticity (GARCH) <strong>model</strong> forclutter <strong>model</strong>ing and <strong>anomaly</strong> <strong>detection</strong>. The one-<strong>dimensional</strong>GARCH <strong>model</strong> [11] is widely used for <strong>model</strong>ing financial timeseries. Extending the one-<strong>dimensional</strong> GARCH <strong>model</strong> in<strong>to</strong> <strong>two</strong>dimensions yields a novel clutter <strong>model</strong> which is capable of takingin<strong>to</strong> account important characteristics of natural clutter, namelyheavy tailed distribution and innovations clustering. We show thatthe <strong>two</strong>-<strong>dimensional</strong> GARCH <strong>model</strong> generalizes the casual GMRF<strong>model</strong>, and develop a matched subspace detec<strong>to</strong>r (MSD) for detectinganomalies in GARCH clutter. The performance of the proposed<strong>anomaly</strong> <strong>detection</strong> method is evaluated on synthetic and realdata. Experimental results demonstrate that reduced false alarmrate can be achieved <strong>with</strong>out compromising the <strong>detection</strong> rate byusing an MSD under GARCH clutter <strong>model</strong>ing, rather than GMRFclutter <strong>model</strong>ing.The paper is organized as follows: In Sec. 2 we introduce the<strong>two</strong>-<strong>dimensional</strong> GARCH <strong>model</strong>. In Sec. 3 we address the <strong>model</strong>estimation problem. In Sec. 4 we develop an MSD for detectinganomalies in GARCH clutter. Finally, in Sec. 5 we demonstratethe advantage of the proposed <strong>anomaly</strong> <strong>detection</strong> approach overusing the GMRF clutter <strong>model</strong>ing.2. TWO DIMENSIONAL GARCH MODELLet q 1 , q 2 , p 1 , p 2 ≥ 0 denote the order of the GARCH <strong>model</strong>, andlet Γ 1 and Γ 2 denote <strong>two</strong> neighborhood sets which are defined byΓ 1 = {kl | 0 ≤ k ≤ q 1, 0 ≤ l ≤ q 2, (kl) ≠ (0, 0)}Γ 2 = {kl | 0 ≤ k ≤ p 1 , 0 ≤ l ≤ p 2 , (kl) ≠ (0, 0)} .Let ɛ ij represent a 2D s<strong>to</strong>chastic process, and let h ij denoteits variance conditioned upon the information set ψ ij ={{ɛ i−k,j−l } kl∈Γ1 , {h i−k,j−l } kl∈Γ2 }. Define the 2D neighborhoodof location ij as: Γ = {kl | k ≤ i, l ≤ j} andiidlet η ij ∼ N(0, 1) be a s<strong>to</strong>chastic 2D process independent ofh kl , ∀kl ∈ Γ. The 2D GARCH(p 1 , p 2 , q 1 , q 2 ) process is definedas:ɛ ij = √ h ij η ij (1)h ij = α 0 + ∑α kl ɛ 2 i−k,j−l + ∑β kl h i−k,j−l , (2)kl∈Γ 1 kl∈Γ 2and is therefore conditionally distributed as:ɛ ij | ψ ij ∼ N(0, h ij ) . (3)In order <strong>to</strong> guarantee a non-negative conditional variance we require:α 0 > 0α kl ≥ 0, kl ∈ Γ 1β kl ≥ 0, kl ∈ Γ 2 . (4)From (2) we see that at every spatial location, both the neighboringsample variances and the neighboring conditional variances playa role in the current conditional variance. This yields clustering


of variations, which is an important characteristic of the GARCHprocess. Note that if q 1 = q 2 = p 1 = p 2 = 0 then ɛ ij is simplywhite Gaussian noise (WGN).It is shown in [11] that a sufficient condition for wide sensestationarity of the 1D GARCH process is that the sum of all parametersis smaller than one. A similar result is obtained in the 2Dcase as we prove in the following theorem.Theorem 1: The GARCH(p 1, p 2, q 1, q 2) process as definedin (1) and (2) is wide-sense stationary <strong>with</strong>:E(ɛ ij ) = 0var(ɛ ij ) = α 0⎡cov(ɛ ij , ɛ kl ) = 0, ∀(ij) ≠ (kl) ,∑if and only if α kl +∑ β kl < 1.kl∈Γ 1 kl∈Γ 2Proof: Substituting (1) in<strong>to</strong> (2) yields:h ij = α 0⎤⎣1 − ∑α kl − ∑β kl⎦kl∈Γ 1 kl∈Γ 2∑ ∞g=0where M(i, j, g) involves all terms of the form:forandIn general∏α a klklkl∈Γ 1∑a kl + ∑kl∈Γ 1∏β b klklkl∈Γ 1−1M(i, j, g) , (5)∏nkl∈Γ 2b kl = g ;r=1η 2 (ij)−s r,∑kl∈Γ 1a kl = n0 ≤ |s 1| ≤ |s 2| ≤ · · · ≤ |s n|s r ≡ (s ri , s rj )s ri ≤ max {kq 1 , (g − 1)q 1 + p 1 }s rj ≤ max {kq 2, (g − 1)q 2 + p 2} .M(i, j, g) = ∑kl∈Γ 1α kl η 2 i−k,j−lM(i − k, j − l, g) ++ ∑kl∈Γ 2β kl M(i − k, j − l, g) . (6)Since η ij is i.i.d., the moments of M(i, j, g) are independent of(ij), and in particularE {M(i, j, g)} = E {M(k, l, g)} ∀ ijklg . (7)From (6) and (7) we have:⎡⎤E {M(i, j, g + 1)} = ⎣ ∑α kl + ∑β kl⎦kl∈Γ 1 kl∈Γ 2Finally by (5) and (8):E { ɛ 2 }ij⎡⎤= α 0⎣1 − ∑α kl − ∑β kl⎦kl∈Γ 1 kl∈Γ 2g+1−1,. (8)if and only if∑α kl + ∑kl∈Γ 1kl∈Γ 2β kl < 1 .E(ɛ ij ) = 0 and cov(ɛ ij , ɛ kl ) = 0, ∀(ij) ≠ (kl) follows immediately.3. MODEL ESTIMATIONIn this section we find a maximum likelihood estimate for theGARCH <strong>model</strong>. We let ɛ ij be innovations of a 2D linear regression,where y ij is the dependent variable, x ij a vec<strong>to</strong>r of explana<strong>to</strong>ryvariables and b a vec<strong>to</strong>r of unknown parameters:ɛ ij = y ij − x T ijb , (9)If ɛ ij in (9) is WGN the regression <strong>model</strong> is a casual GMRF. Thisis a special case of the GARCH process. Using (9) we can write(2) as:h ij = α 0 + ∑k,l∈Γ 1α kl (y i−k,j−l − x T i−k,j−lb) 2 ++ ∑kl∈Γ 2β kl h i−k,j−l . (10)The conditional distribution of y ij is Gaussian <strong>with</strong> mean x T ijb andvariance h ij:()1 −(y ij − x T ijb) 2f(y ij | x ij, ψ ij) = √ exp. (11)2πh ij 2h ijLetz T ij = [1, ɛ 2 i−1,j, . . . , ɛ 2 i−q 1 ,j−q 2, h i−1,j , . . . , h i−p1 ,j−p 2] =and let= [1, (y i−1,j − x T i−1,jb) 2 , . . . , (y i−q1 ,j−q 2− x T i−q 1 ,j−q 2b) 2 ,h i−1,j , . . . , h i−p1 ,j−p 2]δ T = [α 0 , α 0,1 , . . . , α q1 ,q 2, β 0,1 , . . . , β p1 ,p 2] ,then (10) can be written as:h ij = [z ij(b)] T δ . (12)The unknown parameters are collected in<strong>to</strong> a column vec<strong>to</strong>r θ =[b T , δ T ] T . Define the sample space Ω as a <strong>two</strong> <strong>dimensional</strong> latticeof size N × M: Ω = {ij | 1 ≤ i ≤ N, 1 ≤ j ≤ M}. Theconditional sample log likelihood is:L(θ) = ∑log f(y ij | x ij, ψ ij) =ij∈Ω= − 1 ∑[(N + M)log(2π) + log([z ij (b)] T δ) +2ij∈Ω+ ∑(y ij − x T ijb) 2 /([z ij (b)] T δ)] . (13)ij∈ΩThe parameter vec<strong>to</strong>r θ is found by numerically solving a constrainedmaximization problem on the log likelihood function <strong>with</strong>respect <strong>to</strong> the unknown parameters (see for example [12]). Theconstraints used are those presented in (4) and in Theorem 1. Tosolve the maximization problem knowledge of values of ɛ ij and


(a)(b)(a)(b)(c)Fig. 1. Anomaly <strong>detection</strong> in synthetic GARCH clutter: (a) Originalimage <strong>with</strong> embedded rectangular <strong>anomaly</strong>; (b) GLR based onGMRF clutter <strong>model</strong>ing; (c) GLR based on GARCH clutter <strong>model</strong>ing;3. Calculate the conditional variance field h(s) using (12).4. Calculate the GARCH innovation fields ɛ k (s), k ∈ {0, 1}.for every spatial location s, using (17),(18)5. Find the GLR for every spatial location s using (20).6. Compare the GLR <strong>to</strong> the chosen threshold <strong>to</strong> achieve the<strong>anomaly</strong> <strong>detection</strong> image.5. EXPERIMENTAL RESULTS AND DISCUSSIONIn this section we demonstrate the performance of our <strong>anomaly</strong> <strong>detection</strong>approach on synthetic and real data, and show the advantageof using GARCH clutter <strong>model</strong>ing compared <strong>to</strong> using GMRF<strong>model</strong>ing.5.1. Synthetic DataWe use synthetic image data, which was generated using aGARCH(1,1,1,1) <strong>model</strong> <strong>with</strong> the following parameters: α 0 =0.002, α 0,1 = 0.3, α 1,0 = 0.25, α 1,1 = 0.1, β 0,1 =0.03, β 1,0 = 0.02, β 1,1 = 0.04, b = [0.04, 0.03, 0.05] T andx ij = [y i,j−1 , y i−1,j , y i−1,j−1 ] T . A 5 × 5 <strong>anomaly</strong> is plantedin the synthetic image. Figure 1(a) shows the synthetic image <strong>with</strong>the inserted <strong>anomaly</strong>. The <strong>anomaly</strong> does not stand out as much asthe clustered variations <strong>to</strong> its upper left. We set the <strong>anomaly</strong> size <strong>to</strong>K = L = 7 and create an <strong>anomaly</strong> subspace using a single imagechip. No interference subspace is assumed. We perform parameterestimation as described in Section 3 and <strong>anomaly</strong> <strong>detection</strong> asdetailed in Section 4. Figure 1(b) shows the GLR when performingan MSD based <strong>anomaly</strong> <strong>detection</strong> in a strongly casual GMRFclutter. Figure 1(c) shows the GLR obtained by (20) when <strong>model</strong>ingthe image as a GARCH process. Typically, as demonstrated inFigure 1, the MSD under GARCH <strong>model</strong>ing yields a lower falsealarm rate than under GMRF <strong>model</strong>ing.5.2. Real DataWe now demonstrate the performance of our <strong>anomaly</strong> <strong>detection</strong>approach on a real silicon wafer image. Figure 2(a) shows an imageof a silicon wafer containing a manufacturing defect. Figures2(b) and 2(c) show the <strong>detection</strong> results using a GMRF clutter<strong>model</strong> and a GARCH clutter <strong>model</strong>, respectively. We used an(c)Fig. 2. Silicon wafer defect <strong>detection</strong>: (a) Original image; (b) GLRbased on GMRF clutter <strong>model</strong>ing; (c) GLR based on GARCH clutter<strong>model</strong>ing;<strong>anomaly</strong> subspace constructed from a single image chip of size5 × 5. Note that we did not use training images of typical defects<strong>to</strong> create the <strong>anomaly</strong> subspace. The advantage of GARCH<strong>model</strong>ing over GMRF <strong>model</strong>ing is clearly evident.6. REFERENCES[1] I. S. Reed and X. Yu, “Adaptive multiple-band CFAR <strong>detection</strong> ofan optical pattern <strong>with</strong> unknown spectral distribution,” IEEE Trans.Acoust., Speech, Signal Processing, vol. 38, no. 10, pp. 1760–1770,Oc<strong>to</strong>ber 1990.[2] E. A. Ash<strong>to</strong>n, “Detection of subpixel anomalies in multispectral infraredimagery using an adaptive Bayesian classifier,” IEEE Trans.Geosci. Remote Sensing, vol. 36, no. 2, pp. 506–517, March 1998.[3] D. W. J. Stein, S. G. Beaven, L. E. Hoff, E. M. Winter, A. P. Schaumand A. D. S<strong>to</strong>cker, “Anomaly <strong>detection</strong> from hyperspectral imagery,”IEEE Signal Processing Mag., vol. 19, no. 1, pp. 58–69, January 2002.[4] D. Manolakis and G. Shaw, “Detection algorithms for hyperspectralimaging <strong>application</strong>s,” IEEE Signal Processing Mag., vol. 19, no. 1, pp.29–43, January 2002.[5] A. Goldman and I. Cohen, “Anomaly subspace <strong>detection</strong> based on amulti-scale Markov random field <strong>model</strong>,” <strong>to</strong> appear in Signal Processing.[6] A. Noiboar and I. Cohen, “Anomaly <strong>detection</strong> in three <strong>dimensional</strong>data based on Gauss Markov random field <strong>model</strong>ing,” in Proc. 23rdIEEE Convention of Electrical and Electronics Engineers, Israel, Sep.2004, pp. 448–451.[7] S. M. Schweizer and J. M. F. Moura, “Hyperspectral imagery: Clutteradaptation in <strong>anomaly</strong> <strong>detection</strong>,” IEEE Trans. Inform. Theory, vol. 46,no. 5, pp. 1855–1871, August 2000.[8] R. N. Strickland and H. I. Hahn, “Wavelet transform methods forobject <strong>detection</strong> and recovery,” IEEE Trans. Image Processing, vol. 6,no. 5, pp. 724–735, May 1997.[9] L. L. Scharf and B. Friedlander, “Matched subspace detec<strong>to</strong>rs,” IEEETrans. Signal Processing, vol. 42, no. 8, pp. 2146–2157, August 1994.[10] S. M. Schweizer and J. M. F. Moura, “Efficient <strong>detection</strong> in hyperspectralimagery,” IEEE Trans. Image Processing, vol. 10, no. 4, pp.584–597, April 2001.[11] T. Bollerslev, “Generalized au<strong>to</strong>regressive conditional heteroscedasticity,”Journal of Econometrics, vol. 31, pp. 307–327, 1986.[12] E. K. Berndt, B. H. Hall, R. E. Hall and J. A. Hausman, “Estimationinference in nonlinear structural <strong>model</strong>s,” Annals of Economics andSocial Measurments, vol. 4, pp. 653–655, 1974.

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!