In a previous post I talked about the Dynamic Mode Decomposition (DMD) and how to implement it in Python. I mentioned near the end of that post that there exist a number of variants of the DMD. Some variants attempt to resolve certain shortcomings of the DMD; others attempt to extend the core concepts in interesting ways. One such variant is the sparsitypromoting DMD (spDMD) developed by Jovanović et al.^{1}
The spDMD is mostly motivated by the question of how to find the best modes for a system. It is a nontrivial question. Typically, it is necessary for a human expert to be “intheloop”, making a judgment on which DMD modes to use. This is great (ideal even) when the application is just a onetime analysis. But what about when the application of the DMD needs to be automated?
There are several approaches to automating rankreduction in a principled manner. Previously, I posted on an approach called the optimal SVHT. It is an easy to implement method that computes an optimal cutoff value for truncating singular values. It can applied to any SVDbased method – including the DMD.
The spDMD is another approach that applies specifically to the DMD. It uses some clever optimization tricks to try to reconstruct the original data with as few DMD modes as possible. The phrase “sparsitypromoting” in the method’s name refers to the concept of preferring DMD solutions with sparse sets of nonzero modes.
Here I introduce the core concepts of the spDMD and provide a rudimentary implementation in Python.
We Start with the DMD
As per usual, let’s start the investigation by constructing a toy example. Here is code for building a relatively simple data set consisting of one strong mode, one weak mode, and some normally distributed noise. The resulting data is centered around the origin, so we won’t worry about removing the mean and variance (normalizing the data). From left to right, the four images below show each mode matrix, the noise matrix, and the final summation. The summed data matrix is what we’ll pass into the DMD algorithm.
import numpy as np import matplotlib as mpl import matplotlib.pyplot as plt from mpl_toolkits.mplot3d import Axes3D from numpy import dot, multiply, diag from numpy.linalg import inv, eig, pinv, norm, solve, cholesky from scipy.linalg import svd, svdvals from scipy.sparse import csc_matrix as sparse from scipy.sparse import vstack as spvstack from scipy.sparse import hstack as sphstack from scipy.sparse.linalg import spsolve # define time and space domains x = np.linspace(10, 10, 60) t = np.linspace(0, 20, 80) Xm,Tm = np.meshgrid(x, t) # create data D1 = 5 * (1/np.cosh(Xm/2)) * np.tanh(Xm/2) * np.exp((0.8j)*Tm) # strong primary mode D2 = 0.2 * np.sin(2 * Xm) * np.exp(2j * Tm) # weak secondary mode D3 = 0.1 * np.random.randn(*Xm.shape) # noise D = (D1 + D2 + D3).T
Now we call the DMD routine to find the DMD modes
def dmd(X, Y, truncate=None): U2,Sig2,Vh2 = svd(X, False) # SVD of input matrix r = len(Sig2) if truncate is None else truncate # rank truncation U = U2[:,:r] Sig = diag(Sig2)[:r,:r] V = Vh2.conj().T[:,:r] Atil = dot(dot(dot(U.conj().T, Y), V), inv(Sig)) # build A tilde mu,W = eig(Atil) Phi = dot(dot(dot(Y, V), inv(Sig)), W) # build DMD modes return mu, Phi # extract inputoutput matrices X = D[:,:1] Y = D[:,1:] # do dmd r = 30 # new rank mu,Phi = dmd(X, Y, r)
The $b$ Vector
The next step in the DMD workflow is to reconstruct a matrix corresponding to the time evolution of the system. Specifically, we want to find
Normally when computing the DMD, we’d simply use the system’s initial conditions
# compute time evolution (verbose way) b = dot(pinv(Phi), X[:,0]) Psi = np.zeros([r, len(t)], dtype='complex') dt = t[2]  t[1] for i,_t in enumerate(t): Psi[:,i] = multiply(np.power(mu, _t/dt), b)
But there’s a more concise method to calculate for
loop and then do a simple product. The product can be written analytically like so:
where
# compute time evolution (concise way) b = dot(pinv(Phi), X[:,0]) Vand = np.vander(mu, len(t), True) Psi = (Vand.T * b).T # equivalently, Psi = dot(diag(b), Vand)
Then the DMD reconstruction in Python would become
D_dmd = dot(Phi, Psi)
Now here’s the critical insight: we’re not restricted to using
Basically, we want to find a solution to the optimization problem
where
where
An asterisk denotes the complex conjugate transpose, an overline denotes the complex conjugate, and
# vars for the objective function U,sv,Vh = svd(D, False) Vand = np.vander(mu, len(t), True) P = multiply(dot(Phi.conj().T, Phi), np.conj(dot(Vand, Vand.conj().T))) q = np.conj(diag(dot(dot(Vand, (dot(dot(U, diag(sv)), Vh)).conj().T), Phi))) s = norm(diag(sv), ord='fro')**2
The objective function is minimized where its gradient is zero. The solution can be determined analytically:
Finding the optimal
# the optimal solution b_opt = solve(P, q)
Promoting Sparsity in $b_\text{optimal}$
Finding
The spDMD incorporates the alternating direction method of multipliers (ADMM) algorithm^{4}^{5} to search for candidate
Basically, we are optimizing the same problem as before, but with an
# find optimum solutions gamma_vec = np.logspace(np.log10(0.05), np.log10(200), 150) answer = admm_for_dmd(P, q, s, gamma_vec)
The spDMD is complete. Now we must analyze the information it yielded. Keep in mind that we are mostly interested in understanding how to gain as much accuracy as we can with as few DMD modes as possible. Take a look at the admm_for_dmd
implementation in the following section to see what the return variable looks like.
First thing we should do is look to see how the different gamma values relate to the number of nonzero elements in
We can also take a look at the different
Finally, for some variety, here’s a 3D version of the data. The dotted surface represents the original data (two modes plus noise), while the smooth grid of blue lines represents the 1mode reconstruction corresponding to a large
ADMM
The ADMM^{4}^{5} is very advanced and well beyond the scope of this short post. Suffice it to say that it is very generalizable and can be applied in many contexts; the spDMD is but one of its many applications.
Here I list the code for the admm_for_dmd
method used earlier. It is a direct Python translation of the dmdsp
MATLAB function used by Jovanović et al.^{2}. Of course, it is not optimized for speed and is highly specialized to the spDMD problem. Furthermore, it has only been tested with the toy problem above. Perhaps in a future post I’ll tackle the problem of making the code productionquality.
def admm_for_dmd(P, q, s, gamma_vec, rho=1, maxiter=10000, eps_abs=1e6, eps_rel=1e4): # blank return value answer = type('ADMMAnswer', (object,), {})() # check input vars P = np.squeeze(P) q = np.squeeze(q)[:,np.newaxis] gamma_vec = np.squeeze(gamma_vec) if P.ndim != 2: raise ValueError('invalid P') if q.ndim != 2: raise ValueError('invalid q') if gamma_vec.ndim != 1: raise ValueError('invalid gamma_vec') # number of optimization variables n = len(q) # identity matrix I = np.eye(n) # allocate memory for gammadependent output variables answer.gamma = gamma_vec answer.Nz = np.zeros([len(gamma_vec),]) # number of nonzero amplitudes answer.Jsp = np.zeros([len(gamma_vec),]) # square of Frobenius norm (before polishing) answer.Jpol = np.zeros([len(gamma_vec),]) # square of Frobenius norm (after polishing) answer.Ploss = np.zeros([len(gamma_vec),]) # optimal performance loss (after polishing) answer.xsp = np.zeros([n, len(gamma_vec)], dtype='complex') # vector of amplitudes (before polishing) answer.xpol = np.zeros([n, len(gamma_vec)], dtype='complex') # vector of amplitudes (after polishing) # Cholesky factorization of matrix P + (rho/2)*I Prho = P + (rho/2) * I Plow = cholesky(Prho) Plow_star = Plow.conj().T # sparse P (for KKT system) Psparse = sparse(P) for i,gamma in enumerate(gamma_vec): # initial conditions y = np.zeros([n, 1], dtype='complex') # Lagrange multiplier z = np.zeros([n, 1], dtype='complex') # copy of x # Use ADMM to solve the gammaparameterized problem for step in range(maxiter): # xminimization step u = z  (1/rho) * y # x = solve((P + (rho/2) * I), (q + rho * u)) xnew = solve(Plow_star, solve(Plow, q + (rho/2) * u)) # zminimization step a = (gamma/rho) * np.ones([n, 1]) v = xnew + (1/rho) * y # softthresholding of v znew = multiply(multiply(np.divide(1  a, np.abs(v)), v), (np.abs(v) > a)) # primal and dual residuals res_prim = norm(xnew  znew, 2) res_dual = rho * norm(znew  z, 2) # Lagrange multiplier update step y += rho * (xnew  znew) # stopping criteria eps_prim = np.sqrt(n) * eps_abs + eps_rel * np.max([norm(xnew, 2), norm(znew, 2)]) eps_dual = np.sqrt(n) * eps_abs + eps_rel * norm(y, 2) if (res_prim < eps_prim) and (res_dual < eps_dual): break else: z = znew # record output data answer.xsp[:,i] = z.squeeze() # vector of amplitudes answer.Nz[i] = np.count_nonzero(answer.xsp[:,i]) # number of nonzero amplitudes answer.Jsp[i] = ( np.real(dot(dot(z.conj().T, P), z))  2 * np.real(dot(q.conj().T, z)) + s) # Frobenius norm (before polishing) # polishing of the nonzero amplitudes # form the constraint matrix E for E^T x = 0 ind_zero = np.flatnonzero(np.abs(z) < 1e12) # find indices of zero elements of z m = len(ind_zero) # number of zero elements if m > 0: # form KKT system for the optimality conditions E = I[:,ind_zero] E = sparse(E, dtype='complex') KKT = spvstack([ sphstack([Psparse, E], format='csc'), sphstack([E.conj().T, sparse((m, m), dtype='complex')], format='csc'), ], format='csc') rhs = np.vstack([q, np.zeros([m, 1], dtype='complex')]) # stack vertically # solve KKT system sol = spsolve(KKT, rhs) else: sol = solve(P, q) # vector of polished (optimal) amplitudes xpol = sol[:n] # record output data answer.xpol[:,i] = xpol.squeeze() # polished (optimal) leastsquares residual answer.Jpol[i] = ( np.real(dot(dot(xpol.conj().T, P), xpol))  2 * np.real(dot(q.conj().T, xpol)) + s) # polished (optimal) performance loss answer.Ploss[i] = 100 * np.sqrt(answer.Jpol[i]/s) print(i) return answer
References

Jovanović, Mihailo R., Peter J. Schmid, and Joseph W. Nichols. “Sparsitypromoting dynamic mode decomposition.” Physics of Fluids (1994present) 26.2 (2014): 024103. ↩

Link: http://people.ece.umn.edu/users/mihailo/software/dmdsp/ ↩

Tu, Jonathan H., et al. “On dynamic mode decomposition: theory and applications.” arXiv preprint arXiv:1312.0041 (2013). ↩

Boyd, Stephen, et al. “Distributed optimization and statistical learning via the alternating direction method of multipliers.” Foundations and Trends® in Machine Learning 3.1 (2011): 1122. ↩

Wikipedia contributors. “Vandermonde matrix.” Wikipedia, The Free Encyclopedia. Wikipedia, The Free Encyclopedia, 21 Jun. 2016. Web. 2 Aug. 2016. ↩