
The purpose of this vignette is to present the calculations of the costs for the univariate Gaussian distribution.

Each time step $t$ belongs to a group $k$ whose time stamps form the set $T_{k}$. A group can have an additive mean anomaly $\mu_{k}$ and a multiplicative variance anomaly $\sigma_{k}$, which are common to all $t \in T_{k}$. Assuming the known mean $m_{t}$ and variance $s_{t}$ of the data generating distribution gives, for $t \in T_{k}$,

$$P\left(y_t \left| m_{t},s_{t}, \mu_k,\sigma_k\right.\right) = \frac{1}{\sqrt{2\pi\sigma_{k}s_{t}}}\exp\left(-\frac{1}{2\sigma_{k}s_{t}}\left(y_{t} - m_t - \mu_{k}\right)^2\right)$$

Writing $n_{k}$ for the number of time steps in $T_{k}$, the cost is computed as twice the negative log likelihood plus a penalty term $\beta$, giving

$$C\left(y_{t \in T_{k}} \left| m_{t \in T_{k}}, s_{t \in T_{k}}, \mu_k,\sigma_k\right.\right) = n_{k} \log\left(2\pi \sigma_{k}\right) +\sum\limits_{t \in T_{k}} \log\left(s_{t}\right) +\frac{1}{\sigma_{k}}\sum\limits_{t \in T_{k}} \frac{\left(y_{t} - m_t - \mu_{k}\right)^2}{s_{t}} + \beta$$
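
As a minimal sketch of the general cost above, the following Python function evaluates it for one group. The function name `gaussian_cost` and its argument names are illustrative assumptions, not part of any package API; the default arguments reproduce the no-anomaly case (zero mean shift, unit variance scaling, no penalty).

```python
import math

def gaussian_cost(y, m, s, mu=0.0, sigma=1.0, beta=0.0):
    """Twice the negative log likelihood of one group plus the penalty beta.

    y, m, s are equal-length sequences of observations, known means and
    known variances; mu and sigma are the group's additive mean anomaly
    and multiplicative variance anomaly (hypothetical argument names).
    """
    n = len(y)
    log_norm = n * math.log(2.0 * math.pi * sigma)
    log_var = sum(math.log(st) for st in s)
    quad = sum((yt - mt - mu) ** 2 / st for yt, mt, st in zip(y, m, s)) / sigma
    return log_norm + log_var + quad + beta

# Example: two observations, no anomaly, no penalty
baseline = gaussian_cost([1.0, 2.0], [0.0, 0.0], [1.0, 1.0])
```

Calling the function with `mu`, `sigma` and `beta` left at their defaults gives the cost of a group with no anomaly.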

No Anomaly (Baseline)

Here $\mu_{k}=0$ and $\sigma_{k}=1$ and there is no penalty, so the cost is

$$C_{B}\left(y_{t \in T_{k}} \left| m_{t \in T_{k}}, s_{t \in T_{k}} \right.\right) = n_{k} \log\left(2\pi \right) +\sum\limits_{t \in T_{k}} \log\left(s_{t}\right) +\sum\limits_{t \in T_{k}} \frac{\left(y_{t} - m_t\right)^2}{s_{t}}$$

Collective Anomalies

Collective anomalies last more than a single time step and change the mean and/or variance.

Anomaly in Mean and Variance

Estimates $\hat{\mu}_{k}$ of $\mu_{k}$ and $\hat{\sigma}_{k}$ of $\sigma_{k}$ can be selected to minimise the cost by taking

$$\hat{\mu}_{k} = \left( \sum\limits_{t \in T_k} \frac{y_t-m_t}{s_t} \right)\left( \sum\limits_{t \in T_k} \frac{1}{s_t}\right)^{-1}$$

and

$$\hat{\sigma}_{k} = \frac{1}{n_{k}} \sum\limits_{t \in T_k} \frac{ \left(y_t-m_t - \hat{\mu}_{k}\right)^2}{s_t}$$

Substituting these into the cost gives

$$C_{MV}\left(y_{t \in T_{k}} \left| m_{t \in T_{k}}, s_{t \in T_{k}}, \hat{\mu}_k,\hat{\sigma}_k\right.\right) = n_{k} \log\left(2\pi \hat{\sigma}_{k}\right) +\sum\limits_{t \in T_{k}} \log\left(s_{t}\right) +n_{k} + \beta$$
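
The estimates and the resulting cost for an anomaly in mean and variance can be sketched in Python as follows. The function names are hypothetical and chosen only for this illustration. The sketch assumes the group contains enough distinct values for the variance estimate to be strictly positive, since the logarithm is otherwise undefined.

```python
import math

def fit_mean_variance(y, m, s):
    """Return the weighted mean shift and the variance scaling estimate."""
    w = [1.0 / st for st in s]                      # precision weights 1/s_t
    r = [yt - mt for yt, mt in zip(y, m)]           # residuals y_t - m_t
    mu_hat = sum(ri * wi for ri, wi in zip(r, w)) / sum(w)
    sigma_hat = sum((ri - mu_hat) ** 2 * wi for ri, wi in zip(r, w)) / len(y)
    return mu_hat, sigma_hat

def cost_mean_variance(y, m, s, beta):
    """Cost after substituting both estimates; requires sigma_hat > 0."""
    n = len(y)
    _, sigma_hat = fit_mean_variance(y, m, s)
    return (n * math.log(2.0 * math.pi * sigma_hat)
            + sum(math.log(st) for st in s) + n + beta)

mu_hat, sigma_hat = fit_mean_variance([1.0, 3.0], [0.0, 0.0], [1.0, 1.0])
```

With unit variances the mean estimate reduces to the ordinary sample mean of the residuals.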

Anomaly in Mean

There is no change in variance, so $\sigma_{k}=1$. The estimate $\hat{\mu}_{k}$ is unchanged from that for an anomaly in mean and variance, so the cost is

$$C_{M}\left(y_{t \in T_{k}} \left| m_{t \in T_{k}}, s_{t \in T_{k}}, \hat{\mu}_k\right.\right) = n_{k} \log\left(2\pi\right) +\sum\limits_{t \in T_{k}} \log\left(s_{t}\right) +\sum\limits_{t \in T_k} \frac{ \left(y_t - m_t - \hat{\mu}_{k}\right)^2}{s_t} + \beta$$

Expanding the square and using the definition of $\hat{\mu}_{k}$, this can be written as

$$C_{M}\left(y_{t \in T_{k}} \left| m_{t \in T_{k}}, s_{t \in T_{k}}, \hat{\mu}_k\right.\right) = n_{k} \log\left(2\pi\right) +\sum\limits_{t \in T_{k}} \log\left(s_{t}\right) +\sum\limits_{t \in T_k} \frac{ \left(y_t - m_t\right)^2}{s_t} -\hat{\mu}_{k}^{2} \sum\limits_{t \in T_k} \frac{ 1}{s_t} + \beta$$
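
The equivalence of the two forms of the mean anomaly cost can be checked numerically. The sketch below, with hypothetical function names and made-up data, computes the cost both directly from the shifted residuals and from the expanded form, which needs only the sums of the weighted residuals, their squares and the weights.

```python
import math

def _mu_hat(y, m, s):
    """Precision-weighted mean of the residuals y_t - m_t."""
    return (sum((yt - mt) / st for yt, mt, st in zip(y, m, s))
            / sum(1.0 / st for st in s))

def cost_mean_direct(y, m, s, beta):
    """Mean anomaly cost using the residuals after removing mu_hat."""
    mu = _mu_hat(y, m, s)
    quad = sum((yt - mt - mu) ** 2 / st for yt, mt, st in zip(y, m, s))
    return (len(y) * math.log(2.0 * math.pi)
            + sum(math.log(st) for st in s) + quad + beta)

def cost_mean_expanded(y, m, s, beta):
    """Same cost written as the baseline sum minus mu_hat^2 * sum(1/s_t)."""
    mu = _mu_hat(y, m, s)
    quad0 = sum((yt - mt) ** 2 / st for yt, mt, st in zip(y, m, s))
    correction = mu ** 2 * sum(1.0 / st for st in s)
    return (len(y) * math.log(2.0 * math.pi)
            + sum(math.log(st) for st in s) + quad0 - correction + beta)

# Illustrative data only
y, m, s = [1.0, 2.0, 4.0], [0.5, 0.5, 0.5], [1.0, 2.0, 4.0]
direct = cost_mean_direct(y, m, s, 2.0)
expanded = cost_mean_expanded(y, m, s, 2.0)
```

The expanded form is convenient when cumulative sums are maintained, because the cost of any segment then follows from differences of running totals.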

Anomaly in Variance

There is no mean anomaly, so $\mu_{k}=0$. The estimate $\hat{\sigma}_{k}$ therefore changes to

$$\hat{\sigma}_{k} = \frac{1}{n_{k}} \sum\limits_{t \in T_k} \frac{ \left(y_t-m_t\right)^2}{s_t}$$

and the cost is

$$C_{V}\left(y_{t \in T_{k}} \left| m_{t \in T_{k}}, s_{t \in T_{k}}, \hat{\sigma}_k\right.\right) = n_{k} \log\left(2\pi \hat{\sigma}_{k}\right) +\sum\limits_{t \in T_{k}} \log\left(s_{t}\right) +n_{k} + \beta$$
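
A short Python sketch of the variance-only case follows, again with a hypothetical function name. It assumes at least one observation differs from its known mean, so that the variance estimate is positive.

```python
import math

def cost_variance(y, m, s, beta):
    """Cost of a variance anomaly with the mean shift fixed at zero."""
    n = len(y)
    # Variance scaling estimate: mean of the squared standardised residuals
    sigma_hat = sum((yt - mt) ** 2 / st for yt, mt, st in zip(y, m, s)) / n
    return (n * math.log(2.0 * math.pi * sigma_hat)
            + sum(math.log(st) for st in s) + n + beta)

# Illustrative data: sigma_hat = (4 + 0) / 2 = 2
example = cost_variance([2.0, 0.0], [0.0, 0.0], [1.0, 1.0], 1.0)
```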

Point anomaly

A point anomaly at time $t$ is treated as a single time step with a change in mean or variance. However, the cost of the point anomaly should be higher than the background cost when $y_{t}$ is, in some sense, close to the background.

The cost of a point anomaly in mean is expressed as

$$C_{P_{M}}\left(y_{t}\left| m_{t},s_{t},\hat{\mu}_{k}\right.\right) = \log\left(2\pi s_{t}\right) + \beta$$

while its value relative to the baseline cost can be expressed using the standardised variable $z_{t} = \frac{y_t-m_t}{\sqrt{s_{t}}}$ as

$$C_{P_{M}}\left(y_{t}\left| m_{t},s_{t},\hat{\mu}_{k}\right.\right) - C_{B}\left(y_{t}\left| m_{t},s_{t}\right.\right) = \beta - z_{t}^{2}$$
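
The relative cost of a point anomaly in mean is negative exactly when the squared standardised value exceeds the penalty, so the absolute standardised value must exceed the square root of the penalty. The Python sketch below illustrates this with hypothetical function names.

```python
import math

def point_mean_relative_cost(y, m, s, beta):
    """Point-anomaly-in-mean cost minus baseline cost for one time step."""
    z2 = (y - m) ** 2 / s          # squared standardised value
    return beta - z2

def z_threshold(beta):
    """Number of standard deviations beyond which the relative cost is negative."""
    return math.sqrt(beta)

# A penalty of 9 corresponds to a threshold of 3 standard deviations
threshold = z_threshold(9.0)
```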

The penalty value in this case can then be clearly linked to the number of standard deviations away from the mean at which to declare a point anomaly: the relative cost is negative when $\left|z_{t}\right| > \sqrt{\beta}$.

In the case of a point anomaly in variance, a naive computation of the cost gives

$$C_{P_{V}}\left(y_{t}\left| m_{t},s_{t},\hat{\sigma}_{k}\right.\right)= \log\left(2\pi s_{t} \right) + \log\left(z_{t}^{2}\right) + 1 + \beta$$

and

$$C_{P_{V}}\left(y_{t}\left| m_{t},s_{t},\hat{\sigma}_{k}\right.\right) - C_{B}\left(y_{t}\left| m_{t},s_{t}\right.\right) = \log\left(z_{t}^{2}\right) + 1 + \beta - z_{t}^2$$

Since $\log\left(z_{t}^{2}\right) \rightarrow -\infty$ as $z_{t}^{2} \rightarrow 0$, the naive definition of a point anomaly in variance will always produce point anomalies when $z_{t}$ is close to 0. Fisch et al. introduce a term $\gamma$ to control this. The modified cost of a point anomaly in variance is expressed as

$$C_{P_{V}}\left(y_{t}\left| m_{t},s_{t},\hat{\sigma}_{k},\gamma\right.\right)= \log\left(2\pi s_{t} \right) + \log\left(\gamma + z_{t}^{2}\right) + 1 + \beta$$

Relating this to the background cost, we see that point anomalies may be accepted in the capa search when

$$f\left(z_{t},\gamma,\beta\right) = C_{P_{V}}\left(y_{t}\left| m_{t},s_{t},\hat{\sigma}_{k},\gamma\right.\right) - C_{B}\left(y_{t}\left| m_{t},s_{t}\right.\right) = \log\left(\gamma + z_{t}^{2}\right) + 1 + \beta - z_{t}^2 < 0$$

To ensure that anomalies are not declared when $z_{t}$ is close to 0, this implies that $\gamma$ should be selected such that

  1. $f\left(0,\gamma,\beta\right) \geq 0$

  2. $\gamma < 1$, so the gradient $\frac{\partial}{\partial z_{t}^2} f\left(z_{t},\gamma,\beta\right) = \frac{1}{\gamma + z_{t}^{2}} - 1 > 0$ for $z_{t}$ close to zero.

The following plot shows the impact, for small $z$, of three different choices of $\gamma$:

  • No correction, $\gamma_{0} = 0$, which allows point anomalies as $z_{t}$ approaches 0
  • The correction $\gamma_{1} = \exp\left(-\beta\right)$ proposed by Fisch et al.
  • The minimal correction $\gamma_{2} = \exp\left(-\left(1+\beta\right)\right)$ for which $f\left(0,\gamma_{2},\beta\right) = 0$.
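
The behaviour of the three choices near zero can be reproduced with a few lines of Python. This is a sketch only: the function name `f` mirrors the notation in the text, and the penalty value of 2 is an arbitrary assumption for illustration. The uncorrected case is handled explicitly because the logarithm of zero is undefined.

```python
import math

def f(z, gamma, beta):
    """Corrected point-anomaly-in-variance cost minus the baseline cost."""
    arg = gamma + z * z
    if arg == 0.0:
        return -math.inf          # limit of the uncorrected cost at z = 0
    return math.log(arg) + 1.0 + beta - z * z

beta = 2.0                        # illustrative penalty, not a recommended value
gammas = {
    "none": 0.0,
    "fisch": math.exp(-beta),
    "minimal": math.exp(-(1.0 + beta)),
}
at_zero = {name: f(0.0, g, beta) for name, g in gammas.items()}
```

At zero the uncorrected cost difference is unbounded below, the correction of Fisch et al. gives a value of 1, and the minimal correction gives 0 up to rounding.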

It is clear that the differences become small as $z$ increases. This is supported by the plot below, which shows the value of $z_{t}$ at which a point anomaly might occur as $\beta$ varies. The area above the line contains the potential anomaly values.
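
The value of the standardised variable at which a point anomaly in variance first becomes possible can be found numerically. The sketch below uses bisection on the squared standardised value; it is not taken from the package, the function names are hypothetical, and it assumes a non-negative penalty and a non-negative correction term. Viewed as a function of the squared value, the relative cost increases up to one minus the correction term and decreases afterwards, so the search starts from that maximum.

```python
import math

def relative_cost(u, gamma, beta):
    """Relative cost written in terms of u = z^2."""
    return math.log(gamma + u) + 1.0 + beta - u

def anomaly_threshold(gamma, beta):
    """Smallest |z| beyond the maximum at which the relative cost reaches zero."""
    lo = max(0.0, 1.0 - gamma)            # location of the maximum in u
    hi = lo + 1.0
    while relative_cost(hi, gamma, beta) >= 0.0:
        hi *= 2.0                         # expand until the cost is negative
    for _ in range(200):                  # plain bisection on [lo, hi]
        mid = 0.5 * (lo + hi)
        if relative_cost(mid, gamma, beta) >= 0.0:
            lo = mid
        else:
            hi = mid
    return math.sqrt(0.5 * (lo + hi))

# Illustrative penalty of 2 with the three correction terms discussed above
z_none = anomaly_threshold(0.0, 2.0)
z_fisch = anomaly_threshold(math.exp(-2.0), 2.0)
z_minimal = anomaly_threshold(math.exp(-3.0), 2.0)
```

A larger correction term raises the relative cost everywhere, so the threshold grows slightly with the correction.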