Skip to contents

The purpose of this vignette is to present the calculations of the costs for various distribution.

In the follow variables identified by Greek letters are considered known a priori.

Univariate Independent Gaussian

Data belongs to group kk whose time stamps are the set TkT_{k} can have additive mean anomaly mkm_{k} and multiplicative variance anomaly sks_{k} which are common for tTkt \in T_{k}. For tTkt \in T_{k} this gives P(yt|μt,mk,σk,sk)=12πσtskexp(12σtsk(ytμtmk)2) P\left(y_t \left| \mu_t,m_k,\sigma_k,s_k\right.\right) = \frac{1}{\sqrt{2\pi\sigma_{t}s_{k}}}\exp\left(-\frac{1}{2\sigma_{t}s_{k}}\left(y_{t} - \mu_t - m_{k}\right)^2\right)

with the likelihood of of the nkn_{k} observations ytTky_{t \in T_{k}} being L(ytTk|μt,mk,σk,sk)=(2πsk)nk/2tTkσt1/2exp(12sktTk(ytμtmk)2σt) L\left(y_{t \in T_{k}} \left| \mu_t,m_k,\sigma_k,s_k\right.\right) = \left(2\pi s_{k}\right)^{-n_{k}/2} \prod\limits_{t \in T_{k}} \sigma_{t}^{-1/2} \exp\left(-\frac{1}{2s_{k}}\sum\limits_{t \in T_{k}} \frac{\left(y_{t} - \mu_t - m_{k}\right)^2}{\sigma_{t}}\right)

The log-likelihood of ytTky_{t \in T_{k}} is then l(ytTk|μt,mk,σk,sk)=nk2log(2πsk)12tTklog(σt)12sktTk(ytμtmk)2σt l\left(y_{t \in T_{k}} \left| \mu_t,m_k,\sigma_k,s_k\right.\right) = -\frac{n_{k}}{2} \log\left(2\pi s_{k}\right) -\frac{1}{2}\sum\limits_{t \in T_{k}} \log\left(\sigma_{t}\right) -\frac{1}{2s_{k}}\sum\limits_{t \in T_{k}} \frac{\left(y_{t} - \mu_t - m_{k}\right)^2}{\sigma_{t}}

with the cost being twice the negative log likelihood plus a penalty β\beta giving

C(ytTk|μt,mk,σk,sk)=nklog(2πsk)+tTklog(σt)+1sktTk(ytμtmk)2σt+β C\left(y_{t \in T_{k}} \left| \mu_t,m_k,\sigma_k,s_k\right.\right) = n_{k} \log\left(2\pi s_{k}\right) +\sum\limits_{t \in T_{k}} \log\left(\sigma_{t}\right) +\frac{1}{s_{k}}\sum\limits_{t \in T_{k}} \frac{\left(y_{t} - \mu_t - m_{k}\right)^2}{\sigma_{t}} + \beta

Anomaly in mean and varinace

Estimates m̂\hat{m} of mm and σ̂\hat{\sigma} of σ\sigma can be selected to minimise the cost by taking m̂k=(tTkytμtσt)(tTk1σt)1 \hat{m}_{k} = \left( \sum\limits_{t \in T_k} \frac{y_t-\mu_t}{\sigma_t} \right)\left( \sum\limits_{t \in T_k} \frac{1}{\sigma_t}\right)^{-1} and ŝk=1nktTk(ytμtm̂k)2σt \hat{s}_{k} = \frac{1}{n_{k}} \sum\limits_{t \in T_k} \frac{ \left(y_t-\mu_t - \hat{m}_{k}\right)^2}{\sigma_t}

Subsituting these into the cost gives CMV(ytTk|μt,m̂k,σk,ŝk)=nklog(2πŝk)+tTklog(σt)+nk+β C_{MV}\left(y_{t \in T_{k}} \left| \mu_t,\hat{m}_k,\sigma_k,\hat{s}_k\right.\right) = n_{k} \log\left(2\pi \hat{s}_{k}\right) +\sum\limits_{t \in T_{k}} \log\left(\sigma_{t}\right) +n_{k} + \beta

Anomaly in Mean

There is no change in variance so sk=1s_{k}=1. Estimate of m̂k\hat{m}_{k} is unchanged from that for an anomaly in mean and variance so the cost

CM(ytTk|μt,m̂k,σk)=nklog(2π)+tTklog(σt)+tTk(ytμtm̂k)2σt+β C_{M}\left(y_{t \in T_{k}} \left| \mu_t,\hat{m}_k,\sigma_k\right.\right) = n_{k} \log\left(2\pi\right) +\sum\limits_{t \in T_{k}} \log\left(\sigma_{t}\right) +\sum\limits_{t \in T_k} \frac{ \left(y_t-\mu_t - \hat{m}_{k}\right)^2}{\sigma_t} + \beta

can be written as

CM(ytTk|μt,m̂k,σk)=nklog(2π)+tTklog(σt)+tTk(ytμt)2σtm̂2tTk1σt+β C_{M}\left(y_{t \in T_{k}} \left| \mu_t,\hat{m}_k,\sigma_k\right.\right) = n_{k} \log\left(2\pi\right) +\sum\limits_{t \in T_{k}} \log\left(\sigma_{t}\right) +\sum\limits_{t \in T_k} \frac{ \left(y_t-\mu_t\right)^2}{\sigma_t} -\hat{m}^{2} \sum\limits_{t \in T_k} \frac{ 1}{\sigma_t} + \beta

Anomaly in Variance

These is no mean anomaly so mk=0m_{k}=0. Estimate of ŝk\hat{s}_{k} therfore changes to

ŝk=1nktTk(ytμt)2σt \hat{s}_{k} = \frac{1}{n_{k}} \sum\limits_{t \in T_k} \frac{ \left(y_t-\mu_t\right)^2}{\sigma_t} and cost is CV(ytTk|μt,σk,ŝk)=nklog(2πŝk)+tTklog(σt)+nk+β C_{V}\left(y_{t \in T_{k}} \left| \mu_t,\sigma_k,\hat{s}_k\right.\right) = n_{k} \log\left(2\pi \hat{s}_{k}\right) +\sum\limits_{t \in T_{k}} \log\left(\sigma_{t}\right) +n_{k} + \beta

No Anomaly (Baseline)

Here mk=0m_{k}=0 and sk=1s_{k}=1 and there is no penalty so

CB(ytTk|μt,σt)=nklog(2π)+tTklog(σt)+tTk(ytμt)2σt C_{B}\left(y_{t \in T_{k}} \left| \mu_t,\sigma_t\right.\right) = n_{k} \log\left(2\pi \right) +\sum\limits_{t \in T_{k}} \log\left(\sigma_{t}\right) +\sum\limits_{t \in T_{k}} \frac{\left(y_{t} - \mu_t\right)^2}{\sigma_{t}}

Point anomaly

A point anomaly at time tt is treated as a sinle time step with a variance anomaly. Naively the cost could be computed using the formulea for a variance anomaly as Cp(yt|μt,σt)=log(2π(ytμt)2σt)+log(σt)+1+β=log(2π)+log((ytμt)2)+1+β C_{p}\left(y_{t}\left| \mu_{t},\sigma_{t}\right.\right) = \log\left(2\pi \frac{ \left(y_t-\mu_t\right)^2}{\sigma_t}\right) + \log\left(\sigma_{t}\right) + 1 + \beta = \log\left(2\pi\right) + \log \left( \left(y_t-\mu_t\right)^2 \right) + 1 + \beta

However the cost of the point anomaly should be higher then the background cost when yty_{t} is, in some sense, close to the background.

Follow Fisch et al. in intorducing a term γ\gamma to control this. Using the standardised variable zt=ytμtσtz_{t} = \frac{y_t-\mu_t}{\sqrt{\sigma_{t}}} the modified cost of a point anomaly is expressed as Cp(yt|μt,σt)=log(2π)+log(σt)+log(γ+zt2)+1+β C_{p}\left(y_{t}\left| \mu_{t},\sigma_{t}\right.\right) = \log\left(2\pi\right) + \log\left(\sigma_{t}\right) + \log \left( \gamma + z_{t}^{2} \right) + 1 + \beta

Relating this to the background cost we see that point anomalies may be accepted in the capa search when f(zt2,γ,β)=Cp(yt|μt,σt)CB(yt|μt,σt)=log(γ+zt2)+1+βzt2<0 f\left(z_{t}^2,\gamma,\beta\right) = C_{p}\left(y_{t}\left| \mu_{t},\sigma_{t}\right.\right) - C_{B}\left(y_{t}\left| \mu_{t},\sigma_{t}\right.\right) = \log \left( \gamma + z_{t}^{2} \right) + 1 + \beta - z_{t}^{2} < 0

This implies that γ\gamma should be selected such that f(0,γ,β)0f\left(0,\gamma,\beta\right) \geq 0

The gradient wrt zt2z_{t}^2 is zt2f(zt2,γ,β)=1γ+zt21 \frac{\partial}{\partial z_{t}^2} f\left(z_{t}^2,\gamma,\beta\right) = \frac{1}{\gamma + z_{t}^{2}} - 1

Requiring γ<1\gamma < 1; which maintains a positive gradient for small z_{t}^2; indicates that there will no anomalies near 0.

Consider three different definitions of γ\gamma.

  • The non corection of γ0=0\gamma_{0} = 0 which allows point anomalies as z_{t}^2 approaches 0
  • The correction γ1=exp(β)\gamma_{1} = \exp\left(-\beta\right) proposed by Fisch et al.
  • The minimal correction γ2=exp((1+β))\gamma_{2} = \exp\left(-\left(1+\beta\right)\right) for which f(0,γ2,β)=0f\left(0,\gamma_{2},\beta\right) = 0.

To see the impact of the correct for small zz the following figure shows f(zt2,γ,β)f\left(z_{t}^2,\gamma,\beta\right) for β=10\beta=10 for the three options.

It is clear that the difference become small as zz increases. This is supported by the plot below shows the value of ztz_{t} at which an point anomaly might occur as β\beta varies. Area above the line are potential anomaly values.

Poisson

Data belongs to group kk whose time stamps are the set TkT_{k} can have a multiplicative rate anomaly rkr_{k} which is common for tTkt \in T_{k}. For tTkt \in T_{k} this gives P(yt|λt,rk)=λtytrkytexp(rkλt)yt! P\left(y_t \left| \lambda_t,r_k\right.\right) = \frac{\lambda_{t}^{y_{t}} r_{k}^{y_{t}} \exp\left(-r_{k}\lambda_{t}\right)}{y_{t}!}

with the likelihood of of the nkn_{k} observations ytTky_{t \in T_{k}} being L(ytTk|λt,rk)=rktTkytexp(rktTkλt)tTkλtytyt! L\left(y_{t \in T_{k}} \left| \lambda_t,r_k\right.\right) = r_{k}^{\sum\limits_{t \in T_{k}} y_{t}} \exp\left(-r_{k} \sum\limits_{t \in T_{k}} \lambda_{t}\right) \prod\limits_{t \in T_{k}} \frac{ \lambda_{t}^{y_{t}} }{ y_{t}!}

The log-likelihood of ytTky_{t \in T_{k}} is then l(ytTk|λt,rk)=tTkytlog(rk)rktTkλt+tTkytlog(λt)tTklog(yt!) l\left(y_{t \in T_{k}} \left| \lambda_t,r_k\right.\right) = \sum\limits_{t \in T_{k}} y_{t} \log\left(r_{k}\right) - r_{k} \sum\limits_{t \in T_{k}} \lambda_{t} + \sum\limits_{t \in T_{k}} y_{t} \log\left(\lambda_{t}\right) - \sum\limits_{t \in T_{k}} \log\left( y_{t}! \right)

with the cost being twice the negative log likelihood plus a penalty β\beta giving

C(ytTk|λt,rk)=2rktTkλt2tTkytlog(rk)2tTkytlog(λt)+2tTklog(yt!)+β C\left(y_{t \in T_{k}} \left| \lambda_t,r_k\right.\right) = 2 r_{k} \sum\limits_{t \in T_{k}} \lambda_{t} - 2 \sum\limits_{t \in T_{k}} y_{t} \log\left(r_{k}\right) - 2 \sum\limits_{t \in T_{k}} y_{t} \log\left(\lambda_{t}\right) + 2 \sum\limits_{t \in T_{k}} \log\left( y_{t}! \right) +\beta

Anomaly in rate

Estimates r̂k\hat{r}_{k} of rkr_{k} can be selected to minimise the cost by taking r̂k=tTkyttTkλt \hat{r}_{k} = \frac{ \sum\limits_{t \in T_{k}} y_{t} }{\sum\limits_{t \in T_{k}} \lambda_{t}}

This gives a cost of C(ytTk|λt,rk)=2tTkyt2tTkytlog(r̂k)2tTkytlog(λt)+2tTklog(yt!)+β C\left(y_{t \in T_{k}} \left| \lambda_t,r_k\right.\right) = 2 \sum\limits_{t \in T_{k}} y_{t} - 2 \sum\limits_{t \in T_{k}} y_{t} \log\left(\hat{r}_{k}\right) - 2 \sum\limits_{t \in T_{k}} y_{t} \log\left(\lambda_{t}\right) + 2 \sum\limits_{t \in T_{k}} \log\left( y_{t}! \right) + \beta

No Anomaly (Baseline)

Here rk=1r_{k}=1 and there is no penalty so

CB(ytTk|λt,rk)=2tTkλt2tTkytlog(λt)+2tTklog(yt!) C_{B}\left(y_{t \in T_{k}} \left| \lambda_t,r_k\right.\right) = 2 \sum\limits_{t \in T_{k}} \lambda_{t} - 2 \sum\limits_{t \in T_{k}} y_{t} \log\left(\lambda_{t}\right) + 2 \sum\limits_{t \in T_{k}} \log\left( y_{t}! \right)

Point anomaly

A point anomaly at time tt is treated as a single time step rate anomaly. Naively the cost could be computed using the formulea for a rate anomaly with r̂k=yt/λt\hat{r}_{k} = y_{t}/\lambda_{t} giving

CP(yt|λt,rk)=2yt2ytlog(yk)+2log(yt!)+β C_{P}\left(y_{t} \left| \lambda_t,r_k\right.\right) = 2 y_{t} - 2 y_{t} \log\left(y_{k}\right) + 2 \log\left( y_{t}! \right) + \beta

However the cost of the point anomaly should be higher then the background cost when rkr_{k} is, in some sense, close to the background value of 1.

Comparing to the baseline cost for a single point shows that a point anomaly will exist when

2yt2ytlog(yk)+2log(yt!)+β<2λt2ytlog(λt)+2log(yt!) 2 y_{t} - 2 y_{t} \log\left(y_{k}\right) + 2 \log\left( y_{t}! \right) + \beta < 2 \lambda_{t} - 2 y_{t} \log\left(\lambda_{t}\right) + 2 \log\left( y_{t}! \right)

Rearrangement gives

$$\begin{equation} \beta < 2 \left( \lambda_{t} - y_{t} \right) + 2 y_{t} \left( \log\left(y_{t}\right) - \log\left(\lambda_{t}\right) \right) \\ < 2 \lambda_{t} \left( 1 - r_{k} \right) + 2 y_{t} \log\left(r_{k}\right) \\ < 2 \lambda_{t} \left( 1 - r_{k} + r_{k} \log\left(r_{k}\right) \right) \end{equation}$$

The follwoing plot shows that selection of β\beta can be selected as a multiple of λ\lambda to avoid rr being to close to 1.

Multivariate Gaussian

A vector of data 𝐲\mathbf{y} of length pp follows a multivariate Gaussian with mean 𝛍\mathbf{\mu} and precision 𝚲\mathbf{\Lambda}.

Background

Without any anomalous periods the log liklihood is given by

l(𝐲|𝛍,𝚲)=p2log(2π)12log|𝚲1|12(𝐲𝛍)T𝚲(𝐲𝛍) l\left(\mathbf{y}\left| \mathbf{\mu}, \mathbf{\Lambda} \right. \right) = -\frac{p}{2}\log\left(2\pi\right) - \frac{1}{2}\log\left|\mathbf{\Lambda}^{-1}\right| - \frac{1}{2} \left(\mathbf{y}-\mathbf{\mu}\right)^{T} \mathbf{\Lambda} \left(\mathbf{y}-\mathbf{\mu}\right)

or as a cost

C(𝐲|𝛍,𝚺)=plog(2π)log|𝚲|+(𝐲𝛍)T𝚲(𝐲𝛍) C\left(\mathbf{y}\left| \mathbf{\mu}, \mathbf{\Sigma} \right. \right) = p\log\left(2\pi\right) - \log\left|\mathbf{\Lambda}\right| + \left(\mathbf{y}-\mathbf{\mu}\right)^{T} \mathbf{\Lambda} \left(\mathbf{y}-\mathbf{\mu}\right)

Modelling anomalies

Consider the iith of KK anomalies occurs of consecuative time steps starting at time s[i]s^\left[i\right] and finished at e[i]e^\left[i\right]. Anomalies do not overlap so e[i]<s[j]e^\left[i\right] < s^\left[j\right] for all 0<i<j<K0<i<j<K

To model change in variance let Ω\Omega be a p×pp \times p diagonal matrix where Ωt,t=ω[i]\Omega_{t,t} = \omega^{\left[i\right]} if there exists ii such that s[i]te[i]s^\left[i\right] \leq t \leq e^\left[i\right] and 1 otherwise.

Let δ\delta be a length KK vector of mean changes and 𝐗\mathbf{X} a p×Kp \times K matrix where 𝐗t,i=1\mathbf{X}_{t,i}=1 if s[i]te[i]s^\left[i\right] \leq t \leq e^\left[i\right] and zero otherwise.

Decomposing Λ\Lambda such that 𝚲=𝐔T𝐔\mathbf{\Lambda} = \mathbf{U}^{T}\mathbf{U} the costs including anomalies becomes

C(𝐲|𝛍,𝚺,δ,Ω)=plog(2π)log|𝐔TΩ𝐔|+(𝐲𝛍𝐗δ)T𝐔TΩ𝐔(𝐲𝛍𝐗δ) C\left(\mathbf{y}\left| \mathbf{\mu}, \mathbf{\Sigma}, \delta, \Omega \right. \right) = p\log\left(2\pi\right) - \log\left| \mathbf{U}^{T}\Omega\mathbf{U} \right| + \left(\mathbf{y}-\mathbf{\mu}-\mathbf{X}\delta\right)^{T} \mathbf{U}^{T}\Omega\mathbf{U} \left(\mathbf{y}-\mathbf{\mu}-\mathbf{X}\delta\right)

Solving for ω\omega and δ\delta

Let 𝐝=𝐔(𝐲=μ)\mathbf{d} = \mathbf{U}\left(\mathbf{y}=\mu\right) and 𝐃=𝐔𝐗\mathbf{D} = \mathbf{U} \mathbf{X}. Using this the cost is C(𝐲|𝛍,𝚺,δ,Ω)=plog(2π)log|𝐔TΩ𝐔|+(𝐝𝐃δ)TΩ(𝐝𝐃δ) C\left(\mathbf{y}\left| \mathbf{\mu}, \mathbf{\Sigma}, \delta, \Omega \right. \right) = p\log\left(2\pi\right) - \log\left| \mathbf{U}^{T}\Omega\mathbf{U} \right| + \left(\mathbf{d} - \mathbf{D}\delta\right)^{T} \Omega \left(\mathbf{d} - \mathbf{D}\delta\right)

Since Ω\Omega is diagonal (𝐝𝐃δ)TΩ(𝐝𝐃δ)=t=1:pΩt,t(𝐝t𝐃t,.δ)2 \left(\mathbf{d} - \mathbf{D}\delta\right)^{T} \Omega \left(\mathbf{d} - \mathbf{D}\delta\right) = \sum\limits_{t=1:p} \Omega_{t,t}\left(\mathbf{d}_{t} - \mathbf{D}_{t,.}\delta\right)^{2}

For an non-anoalous time step tTt \in T^{\prime} we see that Ωt,t=1\Omega_{t,t}=1 and 𝐃t,.=𝟎\mathbf{D}_{t,.}=\mathbf{0} allowing the summation to be written t=1:pΩt,t(𝐝t𝐃t,.δ)2=tT𝐝t2+k=1:Kωkt=s[k]:e[k](𝐝t𝐃t,.δ)2 \sum\limits_{t=1:p} \Omega_{t,t}\left(\mathbf{d}_{t} - \mathbf{D}_{t,.}\delta\right)^{2}= \sum\limits_{t \in T^{\prime}} \mathbf{d}_{t}^{2} + \sum\limits_{k=1:K} \omega_{k} \sum\limits_{t=s^{\left[k\right]}:e^{\left[k\right]}} \left(\mathbf{d}_{t} - \mathbf{D}_{t,.}\delta\right)^{2}

Using the identities for determinants gives log|𝐔TΩ𝐔|=log|𝐔T𝐔|+log|Ω| \log\left| \mathbf{U}^{T}\Omega\mathbf{U} \right| = \log\left| \mathbf{U}^{T}\mathbf{U} \right| + \log\left| \Omega \right| and log|Ω|=k=1:K(s[k]e[k]+1)logωk \log\left| \Omega \right| = \sum\limits_{k=1:K} \left(s^{\left[k\right]}-e^{\left[k\right]}+1\right) \log \omega_{k}

Subsituation of these terms into the cost function gives ωkC(𝐲|𝛍,𝚺,δ,Ω)=(s[k]e[k]+1)ωk+t=s[k]:e[k](𝐝t𝐃t,.δ)2 \frac{\partial}{\partial \omega_{k}} C\left(\mathbf{y}\left| \mathbf{\mu}, \mathbf{\Sigma}, \delta, \Omega \right. \right) = - \frac{\left(s^{\left[k\right]}-e^{\left[k\right]}+1\right)}{\omega_{k}} + \sum\limits_{t=s^{\left[k\right]}:e^{\left[k\right]}} \left(\mathbf{d}_{t} - \mathbf{D}_{t,.}\delta\right)^{2}