A power iteration - also known as a Von Mises iteration - is a simple mechanism to calculate
the largest Eigenvalue and the corresponding Eigenvector of a diagonalizable matrix.
This short blog post gives a simple introduction to what a power iteration does and how
one can implement it - with a little less mathematical formalism than is usually used
in such an introduction. Note that the formalities are important, so this post should only
be used to grasp the idea before looking at the topic more formally. If you're a mathematician
or already know what the power iteration does this blog post will be too inaccurate
and not really useful to you.
So first - what is an Eigenvector and an Eigenvalue? Basically it's a vector $\vec{v}$
and a scalar value $\lambda$ that fulfill the condition $A * \vec{v} = \lambda * \vec{v}$.
Such a characteristic vector is thus a vector that the linear transformation represented
by the matrix $A$ only stretches but does not rotate. One can imagine the Eigenvectors
as a principal basis of the vector space that the matrix projects into. Note that for
negative Eigenvalues the direction of the vector is flipped - it points into exactly
the opposite direction after each application of the matrix.
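To make the definition concrete, here is a minimal check in NumPy. The $2 \times 2$ matrix and its Eigenpair are made-up illustration values, not taken from the example used later in this post:

```python
import numpy as np

# Hypothetical 2x2 example, chosen only to illustrate the condition A*v = lambda*v.
A = np.array([[2.0, 0.0],
              [0.0, 3.0]])
v = np.array([0.0, 1.0])   # an Eigenvector of A
lam = 3.0                  # the corresponding Eigenvalue

# Applying A only stretches v by the factor lam, it does not rotate it.
print(np.allclose(A @ v, lam * v))  # prints True
```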
Eigenvalues and Eigenvectors can be used to study the principal axes of rigid body motion,
to detect main propagation directions in graph models, they are used heavily in quantum
mechanics, they are a utility in the diagonalization of matrices, they show up in many
big data analysis projects (such as social network analysis or ranking search
engine results) and they may be used to characterize objects (face recognition, computational
linguistics, fingerprints, etc.).
One of the simplest methods to calculate an Eigenvector is based on how a linear
transformation projects vectors onto its principal components - but it only works in
case the dominant Eigenvalue is clearly separated from the others. One can imagine
that a vector transformed by a linear transformation is basically projected onto the
principal components - i.e. the Eigenvectors - of that transformation. This projection
includes stretching or contraction along these components. If one now multiplies a
vector by the matrix and then renormalizes it, the vector gains the most length in the
direction of the Eigenvector with the largest Eigenvalue. If one repeats that step the
vector again gets projected mainly onto the most dominant principal component - the
renormalization simply keeps the length bounded. Applying this iteratively one can see
that the vector converges as long as the matrix is diagonalizable and the initial vector
contains at least some component that is not orthogonal to the dominant Eigenvector.
This is in fact what a power iteration is - simply generate a random initial vector,
multiply it by the transformation matrix, normalize the vector and repeat the process
till it converges:
[
\vec{v}_1 = A * \vec{v}_0 \\
\vec{v}_2 = A * \vec{v}_1 = A * A * \vec{v}_0 \\
\vec{v}_3 = A * \vec{v}_2 = A * A * A * \vec{v}_0 \\
\ldots \\
\vec{v}_i = \overbrace{A * A * \ldots * A}^{i \text{ times}} * \vec{v}_0 \\
\vec{v}_i = A^i * \vec{v}_0
]
From this notation one can easily see why it's called the power method. Instead
of computing $A^n$ directly - which would involve at least $log(n)$ matrix-matrix
multiplications - one calculates the matrix-vector product $A * \vec{v}$ in a tight loop.
Even if this looks more cumbersome it usually requires way fewer operations (especially
for sparse matrices) and makes this algorithm really usable for large scale problems.
For an $m \times m$ matrix each matrix-matrix multiplication requires $m^3$ multiplications
and $m^3 - m^2$ additions, whereas a matrix-vector product only requires $m^2$ multiplications
and $m^2 - m$ additions.
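As a small illustration of that trade-off, the following sketch applies the matrix in a tight loop of matrix-vector products and compares the result against computing the matrix power directly. NumPy and the helper name are my choices here, this is not the listing referenced later in this post:

```python
import numpy as np

def repeated_products(A, v0, iterations):
    """Apply A to v0 over and over using only matrix-vector products
    (m^2 multiplications per step instead of m^3 for a matrix-matrix product)."""
    v = v0
    for _ in range(iterations):
        v = A @ v
    return v

# The sample matrix used further below; without normalization the entries would
# quickly over- or underflow over many iterations.
A = np.array([[2.0, -3.0, 1.0],
              [-3.0, 1.0, 3.0],
              [1.0, 3.0, -4.0]])
v0 = np.random.rand(3)
print(np.allclose(repeated_products(A, v0, 5), np.linalg.matrix_power(A, 5) @ v0))
```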

To keep the entries from over- or underflowing one additionally renormalizes the vector after each iteration:
[
\vec{v}_{i,0} = \frac{1}{\mid \vec{v}_i \mid} * \vec{v}_i \\
\vec{v}_{i+1} = A * \vec{v}_{i,0}
]
To check for convergence one can simply determine the squared distance between
two iteration steps:
[
e^2 = (\vec{v}_{i,0} - \vec{v}_{i-1,0})^2
]
As soon as the distance falls below a given threshold the algorithm stops and has
hopefully found the first Eigenvector. Note that there is a special case that one
has to handle - in case the dominant Eigenvalue is negative the vector swaps its
sign on every iteration. There are two ways to deal with this - perform the distance
calculation only every second iteration or simply check for this special situation explicitly.
Another even better method to check for convergence is just looking at the squared
sine of the angle between the two vectors:
[
\cos(\alpha) = \frac{v_i}{\mid v_i \mid} * \frac{v_{i+1}}{\mid v_{i+1} \mid} \\
\sin^2(\alpha) = 1 - \cos^2(\alpha) \\
\sin^2(\alpha) = 1 - \frac{(v_i * v_{i+1})^2}{(\mid v_{i} \mid * \mid v_{i+1} \mid)^2}
]
One could of course also look at $\sin(\alpha)$ directly but using the square
avoids the square root and speeds up the calculation a little bit. Since the cosine
enters squared this test is also insensitive to the sign flip caused by a negative
dominant Eigenvalue.
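Putting the matrix-vector product, the normalization and the squared-sine test together, a minimal sketch of the iteration might look as follows. The function name, the default threshold and the iteration cap are my own choices and not from the original listing:

```python
import numpy as np

def power_iteration(A, eps=1e-13, max_iterations=10000):
    """Normalized power iteration with the squared-sine convergence test.
    Returns an approximation of the dominant Eigenvector of A."""
    v = np.random.rand(A.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(max_iterations):
        w = A @ v
        w /= np.linalg.norm(w)
        # sin^2 of the angle between successive iterates; squaring the cosine
        # makes the test blind to the sign flip of a negative dominant Eigenvalue.
        sin_squared = 1.0 - np.dot(v, w) ** 2
        v = w
        if sin_squared < eps:
            break
    return v
```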
Since the eigenvalue is defined by
[
A * \vec{v} = \lambda * \vec{v}
]
one might calculate the eigenvalue by simply using
[
\underbrace{A * \vec{v}_{i,0}}_{\vec{v}_{i+1}} = \lambda * \vec{v}_{i,0} \\
\vec{v}_{i+1} = \lambda * \vec{v}_{i,0} \\
\to v_{i+1,n} = \lambda * v_{i,0,n} \quad \forall n \\
\to \lambda = \frac{v_{i+1,n}}{v_{i,0,n}} \quad \forall n
]
In this case one can simply select one of the non-zero components of the
vectors $\vec{v}_{i+1}$ and $\vec{v}_{i,0}$ and directly calculate $\lambda$. Usually
it's a better idea to use the Rayleigh quotient to calculate the associated
Eigenvalue though:
[
R(A,\vec{v}) = \frac{\vec{v}^{* } A * \vec{v}}{\vec{v}^{* } * \vec{v}}
]
To imagine what this quotient does one can recall how one would calculate the
length of a vector $\mid \vec{v} \mid = \sqrt{ \sum_i v_i * v_i } = \sqrt{ \vec{v} * \vec{v} }$.
In case the vector $\vec{v}$ is an Eigenvector the operation $A * \vec{v}$
just stretches the vector. Thus the numerator and the denominator only
differ by the factor of the Eigenvalue (i.e. the factor by which the vector
gets stretched or contracted).
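For real vectors the conjugate transpose reduces to an ordinary transpose, so a sketch of this quotient is a one-liner (the function name is mine):

```python
import numpy as np

def rayleigh_quotient(A, v):
    """Rayleigh quotient R(A, v) = (v^T A v) / (v^T v); for an (approximate)
    Eigenvector v this evaluates to the associated Eigenvalue."""
    return np.dot(v, A @ v) / np.dot(v, v)
```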
Of course the method described here is only half of the reality for many scientific
applications but it's sufficient as a basis to develop a more sophisticated implementation.
Deflation (Hotelling deflation)
Now that one has determined the first Eigenvector and Eigenvalue one might extend
the method to determine the remaining ones. Since the Eigenvectors span the space
that the linear transformation works on, one can imagine moving into a subspace
that no longer contains the already known component. This is done by a simple
transformation that removes the contribution of the known Eigenvector - its
Eigenvalue is mapped to zero so that the second largest Eigenvalue becomes the
dominant one:
[
B = A - \lambda_i * (v_i * v_i^T)
]
This transformation is sometimes called Hotelling deflation. In practice this
method gets unstable due to accumulating rounding errors for larger matrices - a good
alternative is either using one of the decomposition methods instead of a power
iteration with deflation or using something like the Wielandt deflation. The main
limitation of this transformation though is that it only works for real symmetric
matrices. This is based on the fact that the Eigenvectors of a real symmetric matrix
are orthogonal.
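A minimal sketch of this deflation step, assuming a real symmetric matrix and normalizing the supplied Eigenvector to unit length (the function name is again my own):

```python
import numpy as np

def hotelling_deflation(A, eigenvalue, eigenvector):
    """Remove a known Eigenpair from a real symmetric matrix:
    B = A - lambda * (v v^T), with v scaled to unit length."""
    v = eigenvector / np.linalg.norm(eigenvector)
    return A - eigenvalue * np.outer(v, v)
```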
A sample
How does such a calculation look in practice? To demonstrate that this method works
one can compare the analytic result for a small matrix with the numerical
results. To do this I'll illustrate the steps taken by the algorithm for the matrix
[
\begin{pmatrix}
2 & -3 & 1 \\
-3 & 1 & 3 \\
1 & 3 & -4
\end{pmatrix}
]
This is a pretty simple example since one can calculate the Eigenvalues simply
by evaluating the characteristic polynomial $-\lambda^3 - \lambda^2 + 29\lambda - 9 = 0$
which yields the Eigenvalues $\lambda_1 = -6.04428, \lambda_2 = 4.72944, \lambda_3 = 0.314839$. Solving
the simple equation $(A - \lambda_i * \mathbb{1}) * \vec{v}_i = 0$ one gets the three
associated Eigenvectors:
| Eigenvalue | Eigenvector | Normalized Eigenvector |
|---|---|---|
| $\lambda_1 = -6.04428$ | $\begin{pmatrix} -0.336597 \\ -0.569277 \\ 1 \end{pmatrix}$ | $\begin{pmatrix} -0.28075362483 \\ -0.47483067669 \\ 0.83409425762 \end{pmatrix}$ |
| $\lambda_2 = 4.72944$ | $\begin{pmatrix} -4.46933 \\ 4.39959 \\ 1 \end{pmatrix}$ | $\begin{pmatrix} -0.7037546818 \\ 0.69277320326 \\ 0.15746312798 \end{pmatrix}$ |
| $\lambda_3 = 0.314839$ | $\begin{pmatrix} 1.2345 \\ 1.02678 \\ 1 \end{pmatrix}$ | $\begin{pmatrix} 0.65261146214 \\ 0.54280145573 \\ 0.52864435977 \end{pmatrix}$ |
Running the implementation presented in the listing below one sees that the
first Eigenvector converges pretty fast. After only 54 iterations the vector
deviates by less than $1.3 * 10^{-13}$ from one iteration to the next. Of course
the threshold is one of the tuning parameters for this type of algorithm.

[
v_{1,1} = \begin{pmatrix} -0.019532\\0.755822\\-0.654486\\\end{pmatrix} \\
v_{1,2} = \begin{pmatrix} -0.509576\\-0.197744\\0.837394\\\end{pmatrix} \\
v_{1,3} = \begin{pmatrix} 0.069789\\0.651825\\-0.755152\\\end{pmatrix} \\
v_{1,4} = \begin{pmatrix} -0.432158\\-0.306421\\0.848142\\\end{pmatrix} \\
v_{1,5} = \begin{pmatrix} 0.150906\\0.590611\\-0.792720\\\end{pmatrix} \\
v_{1,6} = \begin{pmatrix} -0.376686\\-0.372944\\0.847951\\\end{pmatrix} \\
v_{1,7} = \begin{pmatrix} 0.201524\\0.548225\\-0.811688\\\end{pmatrix} \\
v_{1,8} = \begin{pmatrix} -0.340514\\-0.413166\\0.844597\\\end{pmatrix} \\
v_{1,9} = \begin{pmatrix} 0.232468\\0.520612\\-0.821536\\\end{pmatrix} \\
v_{1,10} = \begin{pmatrix} -0.317680\\-0.437397\\0.841287\\\end{pmatrix} \\
v_{1,11} = \begin{pmatrix} 0.251303\\0.503147\\-0.826855\\\end{pmatrix} \\
v_{1,12} = \begin{pmatrix} -0.303477\\-0.452038\\0.838787\\\end{pmatrix} \\
v_{1,13} = \begin{pmatrix} 0.262773\\0.492261\\-0.829837\\\end{pmatrix} \\
v_{1,14} = \begin{pmatrix} -0.294708\\-0.460918\\0.837079\\\end{pmatrix} \\
v_{1,15} = \begin{pmatrix} 0.269766\\0.485529\\-0.831558\\\end{pmatrix} \\
v_{1,16} = \begin{pmatrix} -0.289314\\-0.466321\\0.835968\\\end{pmatrix} \\
v_{1,17} = \begin{pmatrix} 0.274036\\0.481382\\-0.832572\\\end{pmatrix} \\
v_{1,18} = \begin{pmatrix} -0.286002\\-0.469616\\0.835262\\\end{pmatrix} \\
v_{1,19} = \begin{pmatrix} 0.276646\\0.478834\\-0.833177\\\end{pmatrix} \\
v_{1,20} = \begin{pmatrix} -0.283971\\-0.471629\\0.834821\\\end{pmatrix} \\
v_{1,21} = \begin{pmatrix} 0.278242\\0.477271\\-0.833543\\\end{pmatrix} \\
v_{1,22} = \begin{pmatrix} -0.282726\\-0.472859\\0.834548\\\end{pmatrix} \\
v_{1,23} = \begin{pmatrix} 0.279219\\0.476313\\-0.833764\\\end{pmatrix} \\
v_{1,24} = \begin{pmatrix} -0.281964\\-0.473611\\0.834379\\\end{pmatrix} \\
v_{1,25} = \begin{pmatrix} 0.279816\\0.475726\\-0.833899\\\end{pmatrix} \\
v_{1,26} = \begin{pmatrix} -0.281497\\-0.474072\\0.834275\\\end{pmatrix} \\
v_{1,27} = \begin{pmatrix} 0.280182\\0.475366\\-0.833981\\\end{pmatrix} \\
v_{1,28} = \begin{pmatrix} -0.281211\\-0.474353\\0.834212\\\end{pmatrix} \\
v_{1,29} = \begin{pmatrix} 0.280406\\0.475146\\-0.834032\\\end{pmatrix} \\
v_{1,30} = \begin{pmatrix} -0.281036\\-0.474526\\0.834173\\\end{pmatrix} \\
v_{1,31} = \begin{pmatrix} 0.280543\\0.475011\\-0.834062\\\end{pmatrix} \\
v_{1,32} = \begin{pmatrix} -0.280929\\-0.474632\\0.834149\\\end{pmatrix} \\
v_{1,33} = \begin{pmatrix} 0.280627\\0.474929\\-0.834081\\\end{pmatrix} \\
v_{1,34} = \begin{pmatrix} -0.280863\\-0.474696\\0.834134\\\end{pmatrix} \\
v_{1,35} = \begin{pmatrix} 0.280678\\0.474878\\-0.834093\\\end{pmatrix} \\
v_{1,36} = \begin{pmatrix} -0.280823\\-0.474736\\0.834125\\\end{pmatrix} \\
v_{1,37} = \begin{pmatrix} 0.280710\\0.474847\\-0.834100\\\end{pmatrix} \\
v_{1,38} = \begin{pmatrix} -0.280798\\-0.474760\\0.834120\\\end{pmatrix} \\
v_{1,39} = \begin{pmatrix} 0.280729\\0.474828\\-0.834104\\\end{pmatrix} \\
v_{1,40} = \begin{pmatrix} -0.280783\\-0.474775\\0.834116\\\end{pmatrix} \\
v_{1,41} = \begin{pmatrix} 0.280741\\0.474817\\-0.834107\\\end{pmatrix} \\
v_{1,42} = \begin{pmatrix} -0.280774\\-0.474784\\0.834114\\\end{pmatrix} \\
v_{1,43} = \begin{pmatrix} 0.280748\\0.474809\\-0.834108\\\end{pmatrix} \\
v_{1,44} = \begin{pmatrix} -0.280768\\-0.474789\\0.834113\\\end{pmatrix} \\
v_{1,45} = \begin{pmatrix} 0.280752\\0.474805\\-0.834109\\\end{pmatrix} \\
v_{1,46} = \begin{pmatrix} -0.280765\\-0.474793\\0.834112\\\end{pmatrix} \\
v_{1,47} = \begin{pmatrix} 0.280755\\0.474802\\-0.834110\\\end{pmatrix} \\
v_{1,48} = \begin{pmatrix} -0.280763\\-0.474795\\0.834112\\\end{pmatrix} \\
v_{1,49} = \begin{pmatrix} 0.280757\\0.474801\\-0.834110\\\end{pmatrix} \\
v_{1,50} = \begin{pmatrix} -0.280761\\-0.474796\\0.834111\\\end{pmatrix} \\
v_{1,51} = \begin{pmatrix} 0.280758\\0.474800\\-0.834110\\\end{pmatrix} \\
v_{1,52} = \begin{pmatrix} -0.280761\\-0.474797\\0.834111\\\end{pmatrix} \\
v_{1,53} = \begin{pmatrix} 0.280758\\0.474799\\-0.834111\\\end{pmatrix} \\
v_{1,54} = \begin{pmatrix} -0.280760\\-0.474797\\0.834111\\\end{pmatrix}
]
As one can see the first normalized Eigenvector is determined to be
[
v_{1} = \begin{pmatrix} -0.280760 \\ -0.474797 \\ 0.834111 \end{pmatrix}
]
This matches the analytic Eigenvector belonging to the Eigenvalue with the largest
absolute value. Calculating the Eigenvalue using the Rayleigh quotient yields:
[
\lambda_1 = -6.044248
]
Now comes the deflation step. In this simple sample the matrix is modified to be
[
B = A - \lambda_1 * (v_1 \otimes v_1) \\ B = \begin{pmatrix} 2 & -3 & 1 \\ -3 & 1 & 3 \\ 1 & 3 & -4 \end{pmatrix} - (-6.044248) * \left( \begin{pmatrix} -0.280760 \\ -0.474797 \\ 0.834111 \end{pmatrix} \otimes \begin{pmatrix} -0.280760 \\ -0.474797 \\ 0.834111 \end{pmatrix} \right) \\
B = \begin{pmatrix} 2.476445 & -2.194277 & -0.415472\\-2.194277 & 2.362571 & 0.606274\\-0.415472 & 0.606274 & 0.205232\\ \end{pmatrix}
]
Running the same iteration again yields the following sequence (the start vector is
of course again chosen at random):
[
v_{2,0} = \begin{pmatrix} 0.163004\\0.592286\\0.789068\\\end{pmatrix} \\
v_{2,1} = \begin{pmatrix} -0.610858\\0.758720\\0.226266\\\end{pmatrix} \\
v_{2,2} = \begin{pmatrix} -0.697907\\0.697586\\0.162173\\\end{pmatrix} \\
v_{2,3} = \begin{pmatrix} -0.703368\\0.693094\\0.157778\\\end{pmatrix} \\
v_{2,4} = \begin{pmatrix} -0.703730\\0.692794\\0.157485\\\end{pmatrix} \\
v_{2,5} = \begin{pmatrix} -0.703754\\0.692774\\0.157466\\\end{pmatrix} \\
v_{2,6} = \begin{pmatrix} -0.703755\\0.692772\\0.157464\\\end{pmatrix} \\
v_{2,7} = \begin{pmatrix} -0.703755\\0.692772\\0.157464\\\end{pmatrix} \\
v_{2,8} = \begin{pmatrix} -0.703755\\0.692772\\0.157464\\\end{pmatrix}
]
Calculating the Eigenvalue using the Rayleigh quotient again yields
[
\lambda_2 = 4.729438
]
As one can see some numerical rounding error starts to show up when comparing
with the analytic $\lambda_2 = 4.72944$ and the normalized
Eigenvector $\begin{pmatrix} -0.7037546818 & 0.69277320326 & 0.15746312798 \end{pmatrix}^T$.
The next deflation step yields
[
C = \begin{pmatrix} 0.134089 & 0.111524 & 0.108626\\0.111524 & 0.092755 & 0.090354\\0.108626 & 0.090354 & 0.087966\\\end{pmatrix}
]
Running the power iteration again and calculating the Eigenvalue using the
Rayleigh quotient one last time yields the third Eigenvector.
[
v_{3,0} = \begin{pmatrix} 0.239342\\0.923750\\0.298999\\\end{pmatrix} \\
v_{3,1} = \begin{pmatrix} 0.652604\\0.542786\\0.528669\\\end{pmatrix} \\
v_{3,2} = \begin{pmatrix} 0.652613\\0.542800\\0.528644\\\end{pmatrix} \\
v_{3,3} = \begin{pmatrix} 0.652613\\0.542800\\0.528644\\\end{pmatrix} \\
v_{3,4} = \begin{pmatrix} 0.652613\\0.542800\\0.528644\\\end{pmatrix} \\
\lambda_3 = 0.314839
]
Now the rounding error gets even more significant (again comparing with the analytic
$\lambda_3 = 0.314839$ and the Eigenvector $\begin{pmatrix} 0.65261146214 & 0.54280145573 & 0.52864435977 \end{pmatrix}^T$).
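As a rough sketch of how the pieces fit together - this is not the listing referenced above but reuses the power_iteration, rayleigh_quotient and hotelling_deflation sketches from earlier in this post - the whole three-Eigenpair computation could be driven like this:

```python
import numpy as np

# The sample matrix from above; power_iteration, rayleigh_quotient and
# hotelling_deflation are the sketches defined earlier in this post.
A = np.array([[2.0, -3.0, 1.0],
              [-3.0, 1.0, 3.0],
              [1.0, 3.0, -4.0]])

B = A.copy()
for i in range(3):
    v = power_iteration(B)              # dominant Eigenvector of the (deflated) matrix
    lam = rayleigh_quotient(B, v)       # associated Eigenvalue
    print(f"lambda_{i + 1} = {lam:+.6f}, v_{i + 1} = {v}")
    B = hotelling_deflation(B, lam, v)  # remove the pair before the next round
```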
Pitfalls
Since the power iteration is one of the simplest iterative numerical methods to
calculate Eigenvalues one can guess that there is a number of potential pitfalls:
- First it's not the fastest or most efficient method. Based on the power iteration
one can use Krylov subspace methods (a Krylov subspace is the space spanned
by the vectors $\left(v_{0}, A * v_{0}, A^2 * v_{0}, A^3 * v_{0}, \ldots, A^n * v_{0} \right)$).
One can use a modified Gram-Schmidt method to generate an orthonormal basis
for this Krylov subspace - the basis vectors are then for example a good approximation
to the real Eigenvectors (this is usually called Arnoldi iteration; a small sketch of
building such a basis follows after this list). The most commonly used algorithm today
is the Francis iteration.
- If two Eigenvalues lie close together convergence is slow - or even impossible. To resolve
that problem one could add a stochastic momentum term or apply some kind of restart.
- Since one uses random numbers for the initial vector the system might not converge
or miss an Eigenvector. To counter that one might have to implement timeouts
and restarts - or shifts in case one has a good guess for the Eigenvalues one's
searching for.
- The deflation method used above only works for real symmetric matrices. The
QR method / Francis iteration is a much better choice in case one
works with non-symmetric matrices.
- The power iteration has a huge problem with accumulating rounding errors - especially
when determining more than only the largest Eigenvalue / Eigenvector, though it
can still be pretty useful as the basis for some large scale distributed
Eigenvector algorithms like ranking search engine results …
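To illustrate the first point, here is a minimal sketch of how one might build an orthonormal Krylov basis with a modified Gram-Schmidt sweep; the function name, the basis size argument and the breakdown tolerance are my own choices and this is by no means a full Arnoldi or Francis implementation:

```python
import numpy as np

def krylov_basis(A, v0, n):
    """Build an orthonormal basis of the Krylov subspace spanned by
    (v0, A v0, ..., A^(n-1) v0) using modified Gram-Schmidt - the core
    building block of an Arnoldi iteration."""
    m = A.shape[0]
    Q = np.zeros((m, n))
    Q[:, 0] = v0 / np.linalg.norm(v0)
    for k in range(1, n):
        w = A @ Q[:, k - 1]
        for j in range(k):                  # modified Gram-Schmidt sweep
            w -= np.dot(Q[:, j], w) * Q[:, j]
        norm = np.linalg.norm(w)
        if norm < 1e-12:                    # the subspace became invariant
            return Q[:, :k]
        Q[:, k] = w / norm
    return Q
```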
This article is tagged: Math, Programming, How stuff works, Machine learning, Computational linear algebra