A less mathematical introduction to power iterations

03 Jul 2021 - tsp
Last update 02 Jan 2022
Reading time 12 mins

A power iteration is a simple mechanism to calculate the largest Eigenvalue and corresponding Eigenvector of a diagonalizable matrix - also known as a Von Mises iteration. This short blog post should give a simple introduction on what a power iteration does and how one can implement them - with a little bit less mathematical terms than usually used during such an introduction. Note that the formalities usually used are important so this should only be used to grasp the idea before looking at this stuff more formally. If you’re a mathematician or already know what the power iteration does this blog post will be too inaccurate and really useless to you.

So first - what is an Eigenvector and an Eigenvalue? Basically it’s an vector $\vec{v}$ and a scalar value $\lambda$ that fulfills the condition $A * \vec{v} = \lambda * \vec{v}$. This characteristic vector thus is a vector into which the linear transformation that the matrix $A$ represents is only stretched but not rotated into. One can imagine this as being some principal basis of the vector space that the matrix projects into. Note that for negative eigenvalues the direction of the vector is changed - it’s pointing exactly into the opposite direction on each iteration.

Eigenvalues and Eigenvector can be used to study principal axis of rigid body motion, detecting main propagation directions in graph models, are used heavily in quantum mechanics, are a utility used in diagonalization of matrices, are used in many big data analysis projects (such as social network analysis or ranking search engine results), may be used to characterize objects (face recognition, computational linguistics, fingerprints, etc.).

One of the simplest methods to calculate an Eigenvector is based on the projective transformation - but only works in case Eigenvectors are clearly separated. One can imagine that a vector that is transformed by a linear transformation is basically projected onto the principal components - i.e. the eigenvectors - of a transformation. This projection includes stretching or contraction along these components. If one now multiplies an vector with a matrix, transforms it and then renormalizes it again the vector will gain length into the direction of the vector with the largest eigenvalue the most. If one repeats that step the vector again gets mainly projected onto the most major principal component - renormalization will then again ensure the effect affects the main vector at most. If one applies this iteratively one can see that the vector converges in case the matrix fulfills diagonalizable property and the vector has contained at least some components that are not orthogonal to the most major principal vector.

This is in fact what a power iteration is - simply generate a random initial vector, apply it to the transformation matrix, normalize the vector and repeat the process till it converges:

[ \vec{v_1} = A * \vec{v_0} \\ \vec{v_2} = A * \vec{v_1} = A * A * \vec{v_0} \\ \vec{v_3} = A * \vec{v_2} = A * A * A * \vec{v_0} \\ \ldots \vec{v}_{i} = \overbrace{A * A * \ldots * A}^{i times} * \vec{v_0} \\ \vec{v}_{i} = (A)^i * \vec{v_0} ]

From this notation one can easily see why it’s called the power method. Instead of computing $A^n$ directly which would involve at least $log(n)$ matrix multiplications one can directly calculate the matrix vector product $A * v$ in a tight loop. Even if this now looks more cumbersome it usually requires way less operations (especially for sparse matrices) and makes this algorithm really usable for large scale problems.

For a $m \times m$ matrix each multiplication requires $m^3$ multiplications and $m^3 - m^2$ additions. When one always multiplies the vector one only requires $m^2$ multiplications and $m^2 - m$ additions.

Comparison of matrix and vector multiplications

On the other hand one can also renormalize the vector after each iteration:

[ \vec{v}_{i,0} = \frac{1}{\mid \vec{v}_i \mid} * \vec{v}_i \\ \vec{v}_{i+1} = A * \vec{v}_{i,0} ]

To check for convergence one can simply determine the quadratic distance between two iteration steps:

[ e^2 = (\vec{v}_{i,0} - \vec{v}_{i-1,0})^2 ]

As soon as the distance falls below a given threshold the algorithm aborts and has hopefully found the first eigenvector. Note that there is a special case that one has to handle - in case a negative Eigenvalue is present the vector will swap it’s sign every time. There are two ways to deal with this - do the distance calculation every two iteration steps or simply checking for this special situation explicitly. Another even better method to check for convergence is just looking at the squared sine of the angle between the two vectors:

[ cos(\alpha) = \frac{v_i}{\mid v_i \mid} * \frac{v_{i+1}}{\mid v_{i+1} \mid} \\ sin^2(\alpha) = 1 - cos^2(\alpha) \\ sin^2(\alpha) = 1 - \frac{(v_i * v_{i+1})^2}{(\mid v_{i} \mid * \mid v_{i+1} \mid)^2} ]

One could of course also look at $sin(\alpha)$ directly but using the square allows one to speed up the calculation a little bit.

Since the eigenvalue is defined by

[ A * \vec{v} = \lambda * \vec{v} ]

one might calculate the eigenvalue by simply using

[ \underbrace{A * \vec{v}_{i,0}}_{\vec{v}_{i+1}} = \lambda * \vec{v}_{i,0} \\ \vec{v}_{i+1} = \lambda * \vec{v}_{i,0} \\ \to v_{i+1,n} = \lambda * \vec{v}_{i,0,n} \forall n \\ \to \lambda = \frac{v_{i+1,n}}{\vec{v}_{i,0,n}} \forall n ]

In this case one can simply select one of the non zero components of the vectors $\vec{v}_{i+1}$ and $\vec{v}_{i}$ and directly calculate $\lambda$. Usually it’s a better idea to use the Rayleigh quotient to calculate the associated Eigenvalue though:

[ R(A,\vec{v}) = \frac{\vec{v}^{* } A * \vec{v}}{\vec{v}^{* } * \vec{v}} ]

To imagine what this quotient does one can recall how one would calculate the length of a vector $\mid \vec{v} \mid = \sqrt{ \sum_i v_i * v_i } = \sqrt{ \vec{v} * \vec{v} }$. In case the vector $\vec{v}$ is an Eigenvector the operation $A * \vec{v}$ just stretches the vector. Thus the upper side and the lower side only differ by the factor of the Eigenvalue (i.e. the factor that the vector gets stretched or contracted).

Of course the method described here is only half of the reality for many scientific applications but it’s sufficient as a basis to develop a sophisticated implementation.

Deflation (Hotelling deflation)

Now that one has determined the first Eigenvector and Eigenvalue one might extend the method to determine the remaining ones. Since the Eigenvectors span the space that the linear transform works on one can imagine moving into a projection subspace that does not contain the already known component. This is done by rotating the whole coordinate system so that the largest Eigenvalue gets to be the smallest one by a simple transformation.

[ B = A - \lambda_i * (v_i * v_i^T) ]

This transformation is sometimes called Hotelling deflation. In practice this method gets unstable due to rounding errors for larger matrices - a good alternative is either using one of the composition methods instead a power iteration with deflation or using something like the Wielandt deflation. The main problem with this transformation though is that it only works for symmetric real matrices. This is based on the fact that for real symmetric matrices Eigenvectors are orthogonal.

A sample

How does such a calculation look in practice? To demonstrate that this method works intuitively one can compare an analytic result for a small matrix with the numerical results. To do this I’ll illustrate the steps taken by the algorithm for the matrix

[ \begin{pmatrix} 2 & -3 & 1 \\ -3 & 1 & 3 \\ 1 & 3 & -4 \end{pmatrix} ]

This is a pretty simple example since one can calculate the Eigenvalues simply by evaluating the characteristic polynomial $-\lambda^3 - \lambda^2 + 2\lambda = 0$ which yields the Eigenvalues $\lambda_1 = -6.04428, \lambda_2 = 4.72944, \lambda_3 = 0.314839$. Solving the simple equation $(A - \lambda_i * \mathbb{1}) * v_i = 0$ one gets the three associated Eigenvectors:

Eigenvalue	Eigenvector	Normalized Eigenvector
$\lambda_1 = -6.04428 $	$\begin{pmatrix} -0.336597 \ -0.569277 \ 1 \end{pmatrix}$	$\begin{pmatrix} -0.28075362483 \ -0.47483067669 \ 0.83409425762 \end{pmatrix}$
$\lambda_2 = 4.72944 $	$\begin{pmatrix} -4.46933 \ 4.39959 \ 1 \end{pmatrix}$	$\begin{pmatrix} -0.7037546818 \ 0.69277320326 \ 0.15746312798 \end{pmatrix}$
$\lambda_3 = 0.314839 $	$\begin{pmatrix} 1.2345 \ 1.02678 \ 1 \end{pmatrix}$	$\begin{pmatrix} 0.65261146214 \ 0.54280145573 \ 0.52864435977 \end{pmatrix}$

Running the implementation presented in the listing below one sees that the first Eigenvector converges pretty fast. After only 54 iterations the vector only deviates less than $1.3 * 10^{-13}$. Of course the threshold is one of the tuning parameters for this type of algorithm.

Convergence speed of the first Eigenvector

[ v_{1,1} = \begin{pmatrix} -0.019532\\0.755822\\-0.654486\\\end{pmatrix} \\ v_{1,2} = \begin{pmatrix} -0.509576\\-0.197744\\0.837394\\\end{pmatrix} \\ v_{1,3} = \begin{pmatrix} 0.069789\\0.651825\\-0.755152\\\end{pmatrix} \\ v_{1,4} = \begin{pmatrix} -0.432158\\-0.306421\\0.848142\\\end{pmatrix} \\ v_{1,5} = \begin{pmatrix} 0.150906\\0.590611\\-0.792720\\\end{pmatrix} \\ v_{1,6} = \begin{pmatrix} -0.376686\\-0.372944\\0.847951\\\end{pmatrix} \\ v_{1,7} = \begin{pmatrix} 0.201524\\0.548225\\-0.811688\\\end{pmatrix} \\ v_{1,8} = \begin{pmatrix} -0.340514\\-0.413166\\0.844597\\\end{pmatrix} \\ v_{1,9} = \begin{pmatrix} 0.232468\\0.520612\\-0.821536\\\end{pmatrix} \\ v_{1,10} = \begin{pmatrix} -0.317680\\-0.437397\\0.841287\\\end{pmatrix} \\ v_{1,11} = \begin{pmatrix} 0.251303\\0.503147\\-0.826855\\\end{pmatrix} \\ v_{1,12} = \begin{pmatrix} -0.303477\\-0.452038\\0.838787\\\end{pmatrix} \\ v_{1,13} = \begin{pmatrix} 0.262773\\0.492261\\-0.829837\\\end{pmatrix} \\ v_{1,14} = \begin{pmatrix} -0.294708\\-0.460918\\0.837079\\\end{pmatrix} \\ v_{1,15} = \begin{pmatrix} 0.269766\\0.485529\\-0.831558\\\end{pmatrix} \\ v_{1,16} = \begin{pmatrix} -0.289314\\-0.466321\\0.835968\\\end{pmatrix} \\ v_{1,17} = \begin{pmatrix} 0.274036\\0.481382\\-0.832572\\\end{pmatrix} \\ v_{1,18} = \begin{pmatrix} -0.286002\\-0.469616\\0.835262\\\end{pmatrix} \\ v_{1,19} = \begin{pmatrix} 0.276646\\0.478834\\-0.833177\\\end{pmatrix} \\ v_{1,20} = \begin{pmatrix} -0.283971\\-0.471629\\0.834821\\\end{pmatrix} \\ v_{1,21} = \begin{pmatrix} 0.278242\\0.477271\\-0.833543\\\end{pmatrix} \\ v_{1,22} = \begin{pmatrix} -0.282726\\-0.472859\\0.834548\\\end{pmatrix} \\ v_{1,23} = \begin{pmatrix} 0.279219\\0.476313\\-0.833764\\\end{pmatrix} \\ v_{1,24} = \begin{pmatrix} -0.281964\\-0.473611\\0.834379\\\end{pmatrix} \\ v_{1,25} = \begin{pmatrix} 0.279816\\0.475726\\-0.833899\\\end{pmatrix} \\ v_{1,26} = \begin{pmatrix} -0.281497\\-0.474072\\0.834275\\\end{pmatrix} \\ v_{1,27} = \begin{pmatrix} 0.280182\\0.475366\\-0.833981\\\end{pmatrix} \\ v_{1,28} = \begin{pmatrix} -0.281211\\-0.474353\\0.834212\\\end{pmatrix} \\ v_{1,29} = \begin{pmatrix} 0.280406\\0.475146\\-0.834032\\\end{pmatrix} \\ v_{1,30} = \begin{pmatrix} -0.281036\\-0.474526\\0.834173\\\end{pmatrix} \\ v_{1,31} = \begin{pmatrix} 0.280543\\0.475011\\-0.834062\\\end{pmatrix} \\ v_{1,32} = \begin{pmatrix} -0.280929\\-0.474632\\0.834149\\\end{pmatrix} \\ v_{1,33} = \begin{pmatrix} 0.280627\\0.474929\\-0.834081\\\end{pmatrix} \\ v_{1,34} = \begin{pmatrix} -0.280863\\-0.474696\\0.834134\\\end{pmatrix} \\ v_{1,35} = \begin{pmatrix} 0.280678\\0.474878\\-0.834093\\\end{pmatrix} \\ v_{1,36} = \begin{pmatrix} -0.280823\\-0.474736\\0.834125\\\end{pmatrix} \\ v_{1,37} = \begin{pmatrix} 0.280710\\0.474847\\-0.834100\\\end{pmatrix} \\ v_{1,38} = \begin{pmatrix} -0.280798\\-0.474760\\0.834120\\\end{pmatrix} \\ v_{1,39} = \begin{pmatrix} 0.280729\\0.474828\\-0.834104\\\end{pmatrix} \\ v_{1,40} = \begin{pmatrix} -0.280783\\-0.474775\\0.834116\\\end{pmatrix} \\ v_{1,41} = \begin{pmatrix} 0.280741\\0.474817\\-0.834107\\\end{pmatrix} \\ v_{1,42} = \begin{pmatrix} -0.280774\\-0.474784\\0.834114\\\end{pmatrix} \\ v_{1,43} = \begin{pmatrix} 0.280748\\0.474809\\-0.834108\\\end{pmatrix} \\ v_{1,44} = \begin{pmatrix} -0.280768\\-0.474789\\0.834113\\\end{pmatrix} \\ v_{1,45} = \begin{pmatrix} 0.280752\\0.474805\\-0.834109\\\end{pmatrix} \\ v_{1,46} = \begin{pmatrix} -0.280765\\-0.474793\\0.834112\\\end{pmatrix} \\ v_{1,47} = \begin{pmatrix} 0.280755\\0.474802\\-0.834110\\\end{pmatrix} \\ v_{1,48} = \begin{pmatrix} -0.280763\\-0.474795\\0.834112\\\end{pmatrix} \\ v_{1,49} = \begin{pmatrix} 0.280757\\0.474801\\-0.834110\\\end{pmatrix} \\ v_{1,50} = \begin{pmatrix} -0.280761\\-0.474796\\0.834111\\\end{pmatrix} \\ v_{1,51} = \begin{pmatrix} 0.280758\\0.474800\\-0.834110\\\end{pmatrix} \\ v_{1,52} = \begin{pmatrix} -0.280761\\-0.474797\\0.834111\\\end{pmatrix} \\ v_{1,53} = \begin{pmatrix} 0.280758\\0.474799\\-0.834111\\\end{pmatrix} \\ v_{1,54} = \begin{pmatrix} -0.280760\\-0.474797\\0.834111\\\end{pmatrix} ]

As one can see the first normalized Eigenvector is determined to be

[ v_{1} = \begin{pmatrix} -0.280760 \\ -0.474797 \\ 0.834111 \end{pmatrix} ]

This is equal to the largest Eigenvalue (after applying the absolute function). Calculating the Eigenvalue using the Rayleigh coefficient:

[ \lambda_1 = -6.044248 ]

Now comes the deflation step. In this simple sample the matrix is modified to be

[ B = A - \lambda_1 * (v_1 \otimes v_1) \\ B = \begin{pmatrix} 2 & -3 & 1 \\ -3 & 1 & 3 \\ 1 & 3 & -4 \end{pmatrix} - (-6.044248) * \left( \begin{pmatrix} -0.280760 \\ -0.474797 \\ 0.834111 \end{pmatrix} \otimes \begin{pmatrix} -0.280760 \\ -0.474797 \\ 0.834111 \end{pmatrix} \right) \\ B = \begin{pmatrix} 2.476445 & -2.194277 & -0.415472\\-2.194277 & 2.362571 & 0.606274\\-0.415472 & 0.606274 & 0.205232\\ \end{pmatrix} ]

Running the same iteration again yields the sequence (again start vector is of course random):

[ v_{2,0} = \begin{pmatrix} 0.163004\\0.592286\\0.789068\\\end{pmatrix} \\ v_{2,1} = \begin{pmatrix} -0.610858\\0.758720\\0.226266\\\end{pmatrix} \\ v_{2,2} = \begin{pmatrix} -0.697907\\0.697586\\0.162173\\\end{pmatrix} \\ v_{2,3} = \begin{pmatrix} -0.703368\\0.693094\\0.157778\\\end{pmatrix} \\ v_{2,4} = \begin{pmatrix} -0.703730\\0.692794\\0.157485\\\end{pmatrix} \\ v_{2,5} = \begin{pmatrix} -0.703754\\0.692774\\0.157466\\\end{pmatrix} \\ v_{2,6} = \begin{pmatrix} -0.703755\\0.692772\\0.157464\\\end{pmatrix} \\ v_{2,7} = \begin{pmatrix} -0.703755\\0.692772\\0.157464\\\end{pmatrix} \\ v_{2,8} = \begin{pmatrix} -0.703755\\0.692772\\0.157464\\\end{pmatrix} ]

Calculating the Eigenvalue using the Rayleigh coefficient again yields

[ \lambda_2 = 4.729438 ]

As one can see there is starting to be some numerical rounding error to show up when comparing to the analytic $\lambda_2 = 4.72944$ and normalized Eigenvector $\begin{pmatrix} -0.7037546818 & 0.69277320326 & 0.15746312798 \end{pmatrix}^T$.

The next deflation step yields

[ C = \begin{pmatrix} 0.134089 & 0.111524 & 0.108626\\0.111524 & 0.092755 & 0.090354\\0.108626 & 0.090354 & 0.087966\\\end{pmatrix} ]

Running the power iteration again and calculating the Eigenvalue using the Rayleigh coefficient a last time yields the third Eigenvector.

[ v_{3,0} = \begin{pmatrix} 0.239342\\0.923750\\0.298999\\\end{pmatrix} \\ v_{3,1} = \begin{pmatrix} 0.652604\\0.542786\\0.528669\\\end{pmatrix} \\ v_{3,2} = \begin{pmatrix} 0.652613\\0.542800\\0.528644\\\end{pmatrix} \\ v_{3,3} = \begin{pmatrix} 0.652613\\0.542800\\0.528644\\\end{pmatrix} \\ v_{3,4} = \begin{pmatrix} 0.652613\\0.542800\\0.528644\\\end{pmatrix} \\ \lambda_3 = 0.314839 ]

Now rounding error even gets more significant (again comparing with $\lambda_3 = 0.314839$ and the Eigenvector $\begin{pmatrix} 0.65261146214 & 0.54280145573 & 0.52864435977 \end{pmatrix}^T$.

Pitfalls

Since the power iteration is one of the simplest iterative numerical methods to calculate Eigenvalues one can guess that there is a number if potential pitfalls:

First it’s not the fastest or most efficient method. Based on the power iteration one can use Krylov subspace methods (A Krylov subspace is the space spawned by the vectors $\left(v_{0}, A * v_{0}, A^2 * v_{0}, A^3 * v_{0}, \ldots, A^n * v_{0} \right)$. One can use a modified Gram-Schmidt method to generate an orthonormal basis for this Krylov subspace - the basis vectors are then a good approximation to the real Eigenvectors for example (this is usually called Arnoldi iteration). The most commonly used algorithm today is the Francis iteration.
If two Eigenvalues lie close convergence is slow - or even impossible. To resolve that problem one could add a stochastic momentum term or apply some kind of restart.
Since one uses random numbers for the initial vector the system might not converge or miss an Eigenvector. To counter that one might have to implement timeouts and restarts - or shifts in case one has a good guess for the Eigenvalues one’s searching for.
The deflation method used above only works for symmetric real matrices. The QR method / Francis iteration is a way better idea to be used in case one works with non symmetric matrices.
The power iteration has a huge problem with accumulating rounding errors - especially when determining more than only the largest Eigenvalue / Eigenvector, though it can still be pretty useful as the basis for some large scale distributed Eigenvector algorithms like ranking search engine results …