The Kalman filter is one type of statistical filter. It was introduced around 1960 and is still widely used today. There are several variations of the Kalman filter. We focus on the vector Kalman filter here.
For the system model, we have two equations:
\[s[n]=\textbf{A}s[n-1]+\textbf{B}u[n] \qquad (1)\]
\[x[n]=\textbf{H}[n]s[n]+w[n] \qquad (2)\]
\(s[n], x[n], u[n], w[n]\) are vectors; \(\textbf{A}, \textbf{B}, \textbf{H}[n]\) are matrices. \(s[n]\) is the signal, the quantity we want to estimate. Equation (1) shows that the signal at time n and the signal at time n-1 are correlated: \(s[n]\) is generated from \(s[n-1]\) plus a driving noise term \(\textbf{B}u[n]\), where \(u[n]\) is white noise distributed as \(N(0, \textbf{Q})\). But equation (1) alone does not give a good estimate of \(s[n]\). As time goes by, \(u[n]\) keeps adding uncertainty to the estimate and eventually renders it unusable. That is where equation (2) comes into the picture. \(x[n]\) in equation (2) is the observation. At each time step, the observation \(x[n]\) is used to refine the estimate of \(s[n]\). \(x[n]\) is also corrupted by noise \(w[n]\), which is distributed as \(N(0,\textbf{C}[n])\).
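To make the two model equations concrete, here is a minimal simulation sketch. Everything specific in it is an assumption for illustration: the function name `simulate`, a two-element state (position and velocity), a scalar driving noise \(u[n]\) and a scalar observation, and a constant \(\textbf{H}\) instead of a time-varying \(\textbf{H}[n]\). Plain Python lists are used to keep it dependency-free.

```python
import random

def simulate(num_steps, A, B, H, q_std, c_std, s0, seed=0):
    """Generate states s[n] and observations x[n] from equations (1) and (2).

    Assumptions (illustrative only): u[n] is scalar with N(0, q_std**2),
    w[n] is scalar with N(0, c_std**2), B is a single column given as a
    flat list, and H is a single, constant row given as a flat list.
    """
    rng = random.Random(seed)
    s = list(s0)
    p = len(s)
    states, observations = [], []
    for _ in range(num_steps):
        u = rng.gauss(0.0, q_std)
        # Equation (1): s[n] = A s[n-1] + B u[n]
        s = [sum(A[i][j] * s[j] for j in range(p)) + B[i] * u
             for i in range(p)]
        w = rng.gauss(0.0, c_std)
        # Equation (2): x[n] = H s[n] + w[n]
        x = sum(H[j] * s[j] for j in range(p)) + w
        states.append(s)
        observations.append(x)
    return states, observations

# Hypothetical model: position advances by velocity, only position is observed.
A = [[1.0, 1.0], [0.0, 1.0]]
B = [0.5, 1.0]
H = [1.0, 0.0]
states, observations = simulate(50, A, B, H, q_std=0.1, c_std=1.0, s0=[0.0, 1.0])
```

Setting `q_std` and `c_std` to zero removes both noise sources, which is a quick way to check that the state evolves as equation (1) says.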
Now the question is: how should we use the observation? If we had only equation (2) and not equation (1), then MMSE or LS estimators would be natural choices. But with equation (1), these estimators are sub-optimal since they ignore the previous estimates of s. The Kalman filter was invented to make full use of the information in both equations. Under the Kalman filter, the estimate of s, \(\hat{s}\), is predicted from equation (1) and then corrected by the observation in equation (2).
The first step of the Kalman filter is prediction:
\[\hat{s}[n|n-1]=\textbf{A}\hat{s}[n-1|n-1]\qquad(3)\]
The new MSE of \(\hat{s}[n|n-1]\) becomes
\[\textbf{M}[n|n-1]=\textbf{AM}[n-1|n-1]\textbf{A}'+\textbf{BQB}'\qquad(4)\]
where \(\textbf{A}'\) is the conjugate transpose of the matrix \(\textbf{A}\). Equation (4) shows that without correction, every prediction step adds \(\textbf{BQB}'\) to the MSE.
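The prediction step in equations (3) and (4) can be sketched in a few lines. This is only an illustration under stated assumptions: all quantities are real (so the prime is a plain transpose), \(u[n]\) is scalar (so \(\textbf{B}\) is a single column and \(\textbf{Q}\) a variance), and the function name `predict` and the numbers are made up.

```python
def predict(s_prev, M_prev, A, B, Q):
    """Equations (3) and (4), assuming real values and a scalar u[n].

    s_prev: list of length p; M_prev, A: p x p nested lists;
    B: list of length p (a single column); Q: variance of u[n].
    """
    p = len(s_prev)
    # (3): s_hat[n|n-1] = A s_hat[n-1|n-1]
    s_pred = [sum(A[i][j] * s_prev[j] for j in range(p)) for i in range(p)]
    # Intermediate product A M[n-1|n-1]
    AM = [[sum(A[i][k] * M_prev[k][j] for k in range(p)) for j in range(p)]
          for i in range(p)]
    # (4): (A M) A' + B Q B'
    M_pred = [[sum(AM[i][k] * A[j][k] for k in range(p)) + B[i] * Q * B[j]
               for j in range(p)] for i in range(p)]
    return s_pred, M_pred

# Hypothetical values for a position/velocity state.
A = [[1.0, 1.0], [0.0, 1.0]]
B = [0.5, 1.0]
s_pred, M_pred = predict([0.0, 1.0], [[1.0, 0.0], [0.0, 1.0]], A, B, Q=0.04)
```

In this example the trace of `M_pred` is larger than the trace of the starting matrix, so the uncertainty grew during prediction.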
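For correction, we use the MMSE filter, which by definition minimizes the mean square error. The formula for the MMSE filter is
\[\textbf{K}[n]=E(s[n]x[n]')\big(E(x[n]x[n]')\big)^{-1}\qquad(5)\]
where \((.)^{-1}\) means matrix inverse. Here \(s[n]\) and \(x[n]\) stand for the zero-mean prediction errors \(s[n]-\hat{s}[n|n-1]\) and \(x[n]-\textbf{H}[n]\hat{s}[n|n-1]\), so that \(E(s[n]x[n]')=\textbf{M}[n|n-1]\textbf{H}[n]'\) and \(E(x[n]x[n]')=\textbf{H}[n]\textbf{M}[n|n-1]\textbf{H}[n]'+\textbf{C}[n]\). Thus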
\[\textbf{K}[n]=\textbf{M}[n|n-1]\textbf{H}[n]'(\textbf{H}[n]\textbf{M}[n|n-1]\textbf{H}[n]'+\textbf{C}[n])^{-1}\qquad(6)\]
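To get a feel for equation (6), it can help to look at a scalar special case. The following is only an illustration under assumed values: take \(s[n]\) and \(x[n]\) to be scalars, \(\textbf{H}[n]=1\), and write \(M\) for \(\textbf{M}[n|n-1]\) and \(C\) for \(\textbf{C}[n]\). Equation (6) then reduces to
\[K=\frac{M}{M+C}\]
With the assumed values \(M=2\) and \(C=1\), this gives \(K=2/3\). When the observation noise is small (\(C\to 0\)), \(K\to 1\) and the filter trusts the observation. When it is large (\(C\to\infty\)), \(K\to 0\) and the filter keeps the prediction.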
Then the correction formula for Kalman becomes
\[\hat{s}[n|n]=\hat{s}[n|n-1]+\textbf{K}[n](x[n]-\textbf{H}[n]\hat{s}[n|n-1])\qquad(7)\]
\(\textbf{H}[n]\hat{s}[n|n-1]\) is the predicted value of the observation \(x[n]\). The difference between the two is used to correct \(\hat{s}\).
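The gain and correction in equations (6) and (7) can be sketched as follows. The sketch assumes real values and a scalar observation, so that \(\textbf{H}[n]\) is a single row and the matrix inverse in (6) becomes a division. The function name `correct` and the numbers are hypothetical.

```python
def correct(s_pred, M_pred, H, C, x):
    """Equations (6) and (7), assuming real values and a scalar observation x.

    s_pred: list of length p; M_pred: p x p nested list;
    H: list of length p (a single row); C: variance of w[n].
    """
    p = len(s_pred)
    # M[n|n-1] H', a column of length p
    MHt = [sum(M_pred[i][j] * H[j] for j in range(p)) for i in range(p)]
    # H M[n|n-1] H' + C is a scalar here, so its inverse is a division
    denom = sum(H[i] * MHt[i] for i in range(p)) + C
    K = [v / denom for v in MHt]                                  # (6)
    # Difference between the observation and its predicted value
    residual = x - sum(H[j] * s_pred[j] for j in range(p))
    s_new = [s_pred[i] + K[i] * residual for i in range(p)]       # (7)
    return s_new, K

# Hypothetical numbers: predicted position 1, observed position 3.
s_new, K = correct([1.0, 1.0], [[2.0, 1.0], [1.0, 1.0]], [1.0, 0.0], C=2.0, x=3.0)
```

Although only the position is observed in this example, the velocity estimate is corrected as well, because the off-diagonal entries of `M_pred` carry the correlation between the two.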
Correction should reduce MSE. The new MSE is
\[E\big((s[n]-\textbf{K}[n]x[n])(s[n]-\textbf{K}[n]x[n])'\big)\]
By the orthogonality principle of the MMSE filter,
\[E((s[n]-\textbf{K}[n]x[n])x[n]')=0\]
Therefore, the MSE simplifies to
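\[\begin{aligned}
&E\big((s[n]-\textbf{K}[n]x[n])(s[n]-\textbf{K}[n]x[n])'\big)\\
&=E\big((s[n]-\textbf{K}[n]x[n])s[n]'\big)\\
&=E(s[n]s[n]')-\textbf{K}[n]E(x[n]s[n]')\\
&=(\textbf{I}-\textbf{K}[n]\textbf{H}[n])\textbf{M}[n|n-1]\\
&=\textbf{M}[n|n]
\end{aligned}\]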
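Putting the pieces together, one full cycle of the filter can be sketched as a single function: prediction (3) and (4), gain (6), correction (7), and the MSE update \(\textbf{M}[n|n]=(\textbf{I}-\textbf{K}[n]\textbf{H}[n])\textbf{M}[n|n-1]\). As before, this assumes real values, a scalar \(u[n]\), and a scalar observation. The name `kalman_step`, the model, and the observation values are all made up for illustration.

```python
def kalman_step(s_prev, M_prev, x, A, B, Q, H, C):
    """One prediction + correction cycle of the Kalman filter.

    Assumes real values, scalar u[n] (B is a column, Q a variance) and a
    scalar observation (H is a row, C a variance).
    Returns the corrected estimate, M[n|n], and M[n|n-1].
    """
    idx = range(len(s_prev))
    # Prediction, equations (3) and (4)
    s_pred = [sum(A[i][j] * s_prev[j] for j in idx) for i in idx]
    AM = [[sum(A[i][k] * M_prev[k][j] for k in idx) for j in idx] for i in idx]
    M_pred = [[sum(AM[i][k] * A[j][k] for k in idx) + B[i] * Q * B[j]
               for j in idx] for i in idx]
    # Gain, equation (6); the inverse is a division for a scalar observation
    MHt = [sum(M_pred[i][j] * H[j] for j in idx) for i in idx]
    denom = sum(H[i] * MHt[i] for i in idx) + C
    K = [v / denom for v in MHt]
    # Correction, equation (7)
    residual = x - sum(H[j] * s_pred[j] for j in idx)
    s_new = [s_pred[i] + K[i] * residual for i in idx]
    # MSE update: M[n|n] = (I - K H) M[n|n-1]
    HM = [sum(H[k] * M_pred[k][j] for k in idx) for j in idx]
    M_new = [[M_pred[i][j] - K[i] * HM[j] for j in idx] for i in idx]
    return s_new, M_new, M_pred

A = [[1.0, 1.0], [0.0, 1.0]]
B = [0.5, 1.0]
H = [1.0, 0.0]
s_hat = [0.0, 0.0]                          # deliberately poor initial guess
M = [[10.0, 0.0], [0.0, 10.0]]              # large initial uncertainty
observations = [1.1, 1.9, 3.2, 3.9, 5.1]    # made-up position readings
for x in observations:
    s_hat, M, M_pred = kalman_step(s_hat, M, x, A, B, 0.01, H, 1.0)
    print(s_hat, M[0][0] + M[1][1])         # estimate and trace of M[n|n]
```

The printed trace of \(\textbf{M}[n|n]\) gives a rough single-number view of how the uncertainty evolves as observations arrive.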
I don't want to put people to sleep, but this is how things are. Kalman filtering is not easy. To use a Kalman filter, you need to come up with models for A, B, C, Q, and H. Depending on the sizes of s and x, you may also need to invert a large matrix.
Steven Kay's classic book, Fundamentals of Statistical Signal Processing: Estimation Theory, has one chapter dedicated to the Kalman filter. You can refer to that book for more on Kalman filtering.