CS 267: Applications of Parallel Computers
Lecture 19: Graph Partitioning Part II
Kathy Yelick
http://www-inst.eecs.berkeley.edu/cs267
5/4/2019, CS267, Yelick

Recap of Last Lecture
- Partitioning with nodal coordinates:
    - Inertial method
    - Projection onto a sphere
    - Algorithms are efficient
    - Rely on graphs having nodes connected (mostly) to “nearest neighbors” in space
- Partitioning without nodal coordinates:
    - Breadth-First Search: simple, but not a great partition
    - Kernighan-Lin: good corrector given a reasonable partition
    - Spectral Method: good partitions, but slow
- Today:
    - Spectral methods revisited
    - Multilevel methods

Basic Definitions
- Definition: The Laplacian matrix L(G) of a graph G(N,E) is an |N| by |N| symmetric matrix, with one row and column for each node. It is defined by
    - L(G)(i,i) = degree of node i (number of incident edges)
    - L(G)(i,j) = -1 if i != j and there is an edge (i,j)
    - L(G)(i,j) = 0 otherwise
- [Figure: example graph G with nodes 1, 2, 3, 4, 5]

$$L(G) = \begin{pmatrix} 2 & -1 & -1 & 0 & 0 \\ -1 & 2 & -1 & 0 & 0 \\ -1 & -1 & 4 & -1 & -1 \\ 0 & 0 & -1 & 2 & -1 \\ 0 & 0 & -1 & -1 & 2 \end{pmatrix}$$

Properties of Laplacian Matrix
- Theorem 1: Given G, L(G) has the following properties (proof on web page)
    - L(G) is symmetric. This means the eigenvalues of L(G) are real and its eigenvectors are real and orthogonal.
    - Rows of L sum to zero: let $e = [1, \dots, 1]^T$, i.e. the column vector of all ones. Then L(G)*e = 0.
    - The eigenvalues of L(G) are nonnegative: $0 = \lambda_1 \le \lambda_2 \le \dots \le \lambda_n$
    - The number of connected components of G is equal to the number of $\lambda_i$ equal to 0.
- Definition: $\lambda_2(L(G))$ is the algebraic connectivity of G
    - The magnitude of $\lambda_2$ measures connectivity
    - In particular, $\lambda_2 \ne 0$ if and only if G is connected.

Spectral Bisection Algorithm
- Spectral Bisection Algorithm:

    Compute eigenvector v2 corresponding to $\lambda_2(L(G))$
    For each node n of G
        if v2(n) < 0
            put node n in partition N-
        else
            put node n in partition N+

- Why does this make sense? Recall $\lambda_2(L(G))$ is the algebraic connectivity of G
- Theorem (Fiedler): Let G1(N,E1) be a subgraph of G(N,E), so that G1 is “less connected” than G. Then $\lambda_2(L(G_1)) \le \lambda_2(L(G))$, i.e. the algebraic connectivity of G1 is less than or equal to the algebraic connectivity of G. (proof on web page)

Motivation for Spectral Bisection
- Vibrating string has modes of vibration, or harmonics
- Modes computable as follows
    - Model string as masses connected by springs (a 1D mesh)
    - Write down F=ma for coupled system, get matrix A
    - Eigenvalues and eigenvectors of A are frequencies and shapes of modes
- Label nodes by whether mode is - or + to get N- and N+
- Same idea for other graphs (e.g. a planar graph behaves like a trampoline)

Eigenvectors of L(1D mesh)
- [Figure: Eigenvector 1 (all ones), Eigenvector 2, Eigenvector 3]

2nd eigenvector of L(planar mesh)
- [Figure]

Computing v2 and $\lambda_2$ of L(G) using Lanczos
- Given any n-by-n symmetric matrix A (such as L(G)), Lanczos computes a k-by-k “approximation” T by doing k matrix-vector products, k << n
- Approximate A’s eigenvalues/vectors using T’s

    Choose an arbitrary starting vector r
    b(0) = ||r||, q(0) = 0
    j = 0
    repeat
        j = j+1
        q(j) = r/b(j-1)            ... scale a vector
        r = A*q(j)                 ... matrix vector multiplication, the most expensive step
        r = r - b(j-1)*q(j-1)      ... “saxpy”, or scalar*vector + vector
        a(j) = q(j)^T * r          ... dot product
        r = r - a(j)*q(j)          ... “saxpy”
        b(j) = ||r||               ... compute vector norm
    until convergence              ... details omitted

$$T = \begin{pmatrix} a(1) & b(1) & & & & \\ b(1) & a(2) & b(2) & & & \\ & b(2) & a(3) & b(3) & & \\ & & \ddots & \ddots & \ddots & \\ & & & b(k-2) & a(k-1) & b(k-1) \\ & & & & b(k-1) & a(k) \end{pmatrix}$$

Spectral Bisection: Summary
- Laplacian matrix represents graph connectivity
- Second eigenvector gives a graph bisection
    - Roughly equal “weights” in two parts
    - Weak connection in the graph will be separator
- Implementation via the Lanczos Algorithm
    - To optimize sparse-matrix-vector multiply, we graph partition
    - To graph partition, we find an eigenvector of a matrix associated with the graph
    - To find an eigenvector, we do sparse-matrix-vector multiply
    - Have we made progress?
        - The first matrix-vector multiplies are slow, but use them to learn how to make the rest faster

Introduction to Multilevel Partitioning
- If we want to partition G(N,E), but it is too big to do efficiently, what can we do?
    - 1) Replace G(N,E) by a coarse approximation Gc(Nc,Ec), and partition Gc instead
    - 2) Use partition of Gc to get a rough partitioning of G, and then iteratively improve it
- What if Gc is still too big?
    - Apply same idea recursively

Multilevel Partitioning - High Level Algorithm

    (N+, N-) = Multilevel_Partition( N, E )
        ... recursive partitioning routine returns N+ and N- where N = N+ U N-
        if |N| is small
            (1) Partition G = (N,E) directly to get N = N+ U N-
                Return (N+, N-)
        else
            (2) Coarsen G to get an approximation Gc = (Nc, Ec)
            (3) (Nc+, Nc-) = Multilevel_Partition( Nc, Ec )
            (4) Expand (Nc+, Nc-) to a partition (N+, N-) of N
            (5) Improve the partition (N+, N-)
                Return (N+, N-)
        endif

- [Figure: “V-cycle”. Steps (2,3) repeat on the way down to the coarsest graph, step (1) is applied at the bottom, and steps (4) and (5) repeat on the way back up.]
- How do we
    - Coarsen?
    - Expand?
    - Improve?

Multilevel Kernighan-Lin
- Coarsen graph and expand partition using maximal matchings
- Improve partition using Kernighan-Lin

Maximal Matching
- Definition: A matching of a graph G(N,E) is a subset Em of E such that no two edges in Em share an endpoint
- Definition: A maximal matching of a graph G(N,E) is a matching Em to which no more edges can be added and remain a matching
- A simple greedy algorithm computes a maximal matching:

    let Em be empty
    mark all nodes in N as unmatched
    for i = 1 to |N|               ... visit the nodes in any order
        if i has not been matched
            if there is an edge e=(i,j) where j is also unmatched
                add e to Em
                mark i and j as matched
            endif
        endif
    endfor
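As a concrete illustration of the greedy algorithm above, here is a minimal Python sketch. The adjacency-list representation (a dict mapping each node to a list of neighbours), the function names, and the small 5-node example graph are assumptions made for illustration, not part of the lecture.

```python
def greedy_maximal_matching(adj):
    """Return a maximal matching as a list of edges (i, j).

    adj is assumed to map each node to a list of its neighbours
    (undirected graph, no self-loops).
    """
    matched = set()
    matching = []
    for i in adj:                      # visit the nodes in any order
        if i in matched:
            continue
        for j in adj[i]:
            if j not in matched:       # first unmatched neighbour wins
                matching.append((i, j))
                matched.update((i, j))
                break
    return matching


def is_maximal_matching(adj, matching):
    """Check that no two edges share an endpoint and no edge can be added."""
    endpoints = [v for edge in matching for v in edge]
    if len(endpoints) != len(set(endpoints)):
        return False                   # two matching edges share a node
    covered = set(endpoints)
    # Maximal: every edge of the graph touches at least one matched node.
    return all(i in covered or j in covered for i in adj for j in adj[i])


# Hypothetical example: edges (1,2), (1,3), (2,3), (3,4), (3,5), (4,5).
example = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4, 5], 4: [3, 5], 5: [3, 4]}
matching = greedy_maximal_matching(example)
print(matching)
```

On this example the sketch picks edges (1,2) and (3,4) and leaves node 5 unmatched. A maximal matching is not necessarily a maximum one, and a different visiting order can give a different matching.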

Maximal Matching: Example
- [Figure]

Coarsening using a maximal matching

    1) Construct a maximal matching Em of G(N,E)
    for all edges e=(j,k) in Em                2) collapse matched nodes into a single one
        Put node n(e) in Nc
        W(n(e)) = W(j) + W(k)                  ... the W(...) statements update node/edge weights
    for all nodes n in N not incident on an edge in Em      3) add unmatched nodes
        Put n in Nc                            ... do not change W(n)
    ... Now each node r in N is “inside” a unique node n(r) in Nc
    4) Connect two nodes in Nc if nodes inside them are connected in E
    for all edges e=(j,k) in Em
        for each other edge e'=(j,r) in E incident on j
            Put edge ee = (n(e),n(r)) in Ec
            W(ee) = W(e')
        for each other edge e'=(r,k) in E incident on k
            Put edge ee = (n(r),n(e)) in Ec
            W(ee) = W(e')
    If there are multiple edges connecting two nodes in Nc, collapse them, adding edge weights

Example of Coarsening
- [Figure]

Expanding a partition of Gc to a partition of G
- [Figure]

Multilevel Spectral Bisection
- Coarsen graph and expand partition using maximal independent sets
- Improve partition using Rayleigh Quotient Iteration

Maximal Independent Sets
- Definition: An independent set of a graph G(N,E) is a subset Ni of N such that no two nodes in Ni are connected by an edge
- Definition: A maximal independent set of a graph G(N,E) is an independent set Ni to which no more nodes can be added and remain an independent set
- A simple greedy algorithm computes a maximal independent set:

    let Ni be empty
    for k = 1 to |N|               ... visit the nodes in any order
        if node k is not adjacent to any node already in Ni
            add k to Ni
        endif
    endfor

Coarsening using Maximal Independent Sets

    ... Build “domains” D(k) around each node k in Ni to get nodes in Nc
    ... Add an edge to Ec whenever it would connect two such domains
    Ec = empty set
    for all nodes k in Ni
        D(k) = ( {k}, empty set )
            ... first set contains nodes in D(k), second set contains edges in D(k)
    unmark all edges in E
    repeat
        choose an unmarked edge e = (k,j) from E
        if exactly one of k and j (say k) is in some D(m)
            mark e
            add j and e to D(m)
        else if k and j are in two different D(m)’s (say D(mk) and D(mj))
            mark e
            add edge (mk, mj) to Ec
        else if both k and j are in the same D(m)
            mark e
            add e to D(m)
        else
            leave e unmarked
        endif
    until no unmarked edges

Example of Coarsening
- [Figure: each closed curve encloses a domain D(k), which becomes one node of Nc]
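The greedy maximal independent set and the domain-growing loop described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the graph is a dict mapping integer node ids to sets of neighbours, node and edge weights are ignored, the function names are hypothetical, and edges whose endpoints are both still outside every domain are simply put back on a queue and retried later.

```python
from collections import deque


def greedy_mis(adj):
    """Greedy maximal independent set; adj maps node -> set of neighbours."""
    mis, blocked = [], set()
    for k in adj:                       # visit the nodes in any order
        if k not in blocked:            # k is not adjacent to any node in mis
            mis.append(k)
            blocked.add(k)
            blocked.update(adj[k])
    return mis


def coarsen_with_mis(adj):
    """Grow a domain around each MIS node and connect touching domains.

    Returns (mis, domain, coarse_edges), where domain[v] is the MIS node
    whose domain contains v. Weights are ignored in this sketch.
    """
    mis = greedy_mis(adj)
    domain = {k: k for k in mis}
    unmarked = deque(sorted({tuple(sorted((u, v))) for u in adj for v in adj[u]}))
    coarse_edges = set()
    while unmarked:
        k, j = unmarked.popleft()
        if k in domain and j in domain:
            if domain[k] != domain[j]:  # edge joins two different domains
                coarse_edges.add(tuple(sorted((domain[k], domain[j]))))
            # otherwise the edge is interior to one domain
        elif k in domain:
            domain[j] = domain[k]       # pull j into k's domain
        elif j in domain:
            domain[k] = domain[j]       # pull k into j's domain
        else:
            unmarked.append((k, j))     # neither endpoint reached yet; retry later
    return mis, domain, coarse_edges


# Hypothetical example: a 1D mesh (path) of 9 nodes, numbered 0..8.
path = {i: set() for i in range(9)}
for i in range(8):
    path[i].add(i + 1)
    path[i + 1].add(i)

mis, domain, coarse_edges = coarsen_with_mis(path)
print(mis)
print(sorted(coarse_edges))
```

On the 9-node path the sketch selects nodes 0, 2, 4, 6, 8 and produces a coarse path with 5 nodes and 4 edges. Which domain absorbs a given fine node depends on the order in which edges are visited.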

Expanding a partition of Gc to a partition of G
- Need to convert an eigenvector vc of L(Gc) to an approximate eigenvector v of L(G)
- Use interpolation:

    For each node j in N
        if j is also a node in Nc, then
            v(j) = vc(j)           ... use same eigenvector component
        else
            v(j) = average of vc(k) for all neighbors k of j in Nc
        endif
    endfor

Example: 1D mesh of 9 nodes
- [Figure]

Improve eigenvector: Rayleigh Quotient Iteration

    j = 0
    pick starting vector v(0)      ... from expanding vc
    repeat
        j = j+1
        r(j) = v(j-1)^T * L(G) * v(j-1)
            ... r(j) = Rayleigh Quotient of v(j-1) = good approximate eigenvalue
        v(j) = (L(G) - r(j)*I)^(-1) * v(j-1)
            ... expensive to do exactly, so solve approximately using an iteration
            ... called SYMMLQ, which uses matrix-vector multiply (no surprise)
        v(j) = v(j) / ||v(j)||     ... normalize v(j)
    until v(j) converges

- Convergence is very fast: cubic

Example of convergence for 1D mesh
- [Figure]

Available Implementations
- Multilevel Kernighan/Lin
    - METIS (www.cs.umn.edu/metis)
    - ParMETIS - parallel version
- Multilevel Spectral Bisection
    - S. Barnard and H. Simon, “A fast multilevel implementation of recursive spectral bisection”, Proc. 6th SIAM Conf. On Parallel Processing, 1993
    - Chaco (www.cs.sandia.gov/CRF/papers_chaco.html)
- Hybrids possible
    - Ex: Using Kernighan/Lin to improve a partition from spectral bisection

Comparison of methods
- Compare only methods that use edges, not nodal coordinates
    - CS267 webpage and KK95a (see below) have other comparisons
- Metrics
    - Speed of partitioning
    - Number of edge cuts
    - Other application dependent metrics
- Summary
    - No one method best
    - Multi-level Kernighan/Lin fastest by far, comparable to Spectral in the number of edge cuts
        - www-users.cs.umn.edu/karypis/metis/publications/mail.html
        - see publications KK95a and KK95b
    - Spectral gives much better cuts for some applications
        - Ex: image segmentation
        - www.cs.berkeley.edu/jshi/Grouping/overview.html
        - see “Normalized Cuts and Image Segmentation”

Number of edges cut for a 64-way partition
- For Multilevel Kernighan/Lin, as implemented in METIS (see KK95a)

    Graph      # of Nodes  # of Edges  Description      # Edges cut for    Expected # cuts  Expected # cuts
                                                        64-way partition   for 2D mesh      for 3D mesh
    144            144649     1074393  3D FE Mesh                  88806             6427            31805
    4ELT            15606       45878  2D FE Mesh                   2965             2111             7208
    ADD32            4960        9462  32 bit adder                  675             1190             3357
    AUTO           448695     3314611  3D FE Mesh                 194436            11320            67647
    BBMAT           38744      993481  2D Stiffness M.             55753             3326            13215
    FINAN512        74752      261120  Lin. Prog.                  11388             4620            20481
    LHR10           10672      209093  Chem. Eng.                  58784             1746             5595
    MAP1           267241      334931  Highway Net.                 1388             8736            47887
    MEMPLUS         17758       54196  Memory circuit              17894             2252             7856
    SHYY161         76480      152002  Navier-Stokes                4365             4674            20796
    TORSO          201142     1479989  3D FE Mesh                 117997             7579            39623

- Expected # cuts for 64-way partition of 2D mesh of n nodes:

$$n^{1/2} + 2(n/2)^{1/2} + 4(n/4)^{1/2} + \dots + 32(n/32)^{1/2} \approx 17\, n^{1/2}$$

- Expected # cuts for 64-way partition of 3D mesh of n nodes:

$$n^{2/3} + 2(n/2)^{2/3} + 4(n/4)^{2/3} + \dots + 32(n/32)^{2/3} \approx 11.5\, n^{2/3}$$
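The two expected-cut estimates above can be checked numerically. The sketch below reads each sum as six levels of recursive bisection, where level k has 2^k sub-meshes of n/2^k nodes each, and evaluates it directly. The function name and its parameters are assumptions made for illustration.

```python
def expected_cuts(n, dim, parts=64):
    """Estimate edges cut when recursively bisecting a dim-D mesh of n nodes.

    Assumes parts is a power of two and that bisecting a mesh of m nodes
    cuts about m**((dim - 1) / dim) edges.
    """
    levels = parts.bit_length() - 1          # log2(parts) bisection levels
    exponent = (dim - 1) / dim
    # Level k has 2**k sub-meshes of n / 2**k nodes each.
    return sum(2**k * (n / 2**k) ** exponent for k in range(levels))


# Leading coefficients: the sums with n = 1.
coef_2d = expected_cuts(1, dim=2)            # about 16.9, rounded to 17 on the slide
coef_3d = expected_cuts(1, dim=3)            # about 11.5

# Graph "144" from the table has n = 144649 nodes.
print(round(coef_2d, 2), round(coef_3d, 2))
print(round(expected_cuts(144649, dim=2)), round(expected_cuts(144649, dim=3)))
```

For graph 144 the two estimates come out close to the 6427 and 31805 listed in the table, which suggests the table's expected-cut columns were computed from these sums rather than from the rounded coefficients.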

Speed of 256-way partitioning (from KK95a)
- Partitioning time in seconds

    Graph      # of Nodes  # of Edges  Description      Multilevel Spectral  Multilevel
                                                        Bisection            Kernighan/Lin
    144            144649     1074393  3D FE Mesh                     607.3           48.1
    4ELT            15606       45878  2D FE Mesh                      25.0            3.1
    ADD32            4960        9462  32 bit adder                    18.7            1.6
    AUTO           448695     3314611  3D FE Mesh                    2214.2          179.2
    BBMAT           38744      993481  2D Stiffness M.                474.2           25.5
    FINAN512        74752      261120  Lin. Prog.                     311.0           18.0
    LHR10           10672      209093  Chem. Eng.                     142.6            8.1
    MAP1           267241      334931  Highway Net.                   850.2           44.8
    MEMPLUS         17758       54196  Memory circuit                 117.9            4.3
    SHYY161         76480      152002  Navier-Stokes                  130.0           10.1
    TORSO          201142     1479989  3D FE Mesh                    1053.4           63.9

- Kernighan/Lin much faster than Spectral Bisection!

Coordinate-Free Partitioning: Summary
- Several techniques for partitioning without coordinates
    - Breadth-First Search: simple, but not a great partition
    - Kernighan-Lin: good corrector given a reasonable partition
    - Spectral Method: good partitions, but slow
- Multilevel methods
    - Used to speed up problems that are too large/slow
    - Coarsen, partition, expand, improve
    - Can be used with K-L and Spectral methods and others
- Speed/quality
    - For load balancing of grids, multi-level K-L probably best
    - For other partitioning problems (vision, clustering, etc.) spectral may be better
- Good software available

Is Graph Partitioning a Solved Problem?
- Myths of partitioning due to Bruce Hendrickson
    - Edge cut = communication cost
    - Simple graphs are sufficient
    - Edge cut is the right metric
    - Existing tools solve the problem
    - Key is finding the right partition
    - Graph partitioning is a solved problem
- Slides and myths based on Bruce Hendrickson’s “Load Balancing Myths, Fictions & Legends”

Myth 1: Edge Cut = Communication Cost
- Myth 1: The edge-cut deceit
    - edge-cut = communication cost
- Not quite true:
    - # of vertices on the boundary is the actual communication volume
        - Do not communicate same node value twice
    - Cost of communication depends on # of messages too (the $\alpha$ term)
    - Congestion may also affect communication cost
- Why is this OK for most applications?
    - Mesh-based problems match the model: cost is roughly the edge cut
    - Other problems (data mining, etc.) do not

Myth 2: Simple Graphs are Sufficient
- Graphs often used to encode data dependencies
    - Do X before doing Y
- Graph partitioning determines data partitioning
    - Assumes graph nodes can be evaluated in parallel
    - Communication on edges can also be done in parallel
    - Only dependence is between sweeps over the graph
- More general graph models include:
    - Hypergraph: nodes are computation, edges are communication, but connected to a set (>= 2) of nodes
    - Bipartite model: use bipartite graph for directed graph
    - Multi-object, Multi-Constraint model: use when single structure may involve multiple computations with differing costs

Myth 3: Partition Quality is Paramount
- When structures are changing dynamically during a simulation, need to partition dynamically
    - Speed may be more important than quality
    - Partitioner must run fast in parallel
    - Partition should be incremental
        - Change minimally relative to prior one
    - Must not use too much memory
- Example from Touheed, Selwood, Jimack and Berzins
    - 1 M elements with adaptive refinement on SGI Origin
    - Timing data for different partitioning algorithms:
        - Repartition time: 3.0 to 15.2 secs
        - Migration time: 17.8 to 37.8 secs
        - Solve time: 2.54 to 3.11 secs

Load Balancing in General
- In some communities, load balancing is equated with graph partitioning
- Some load balancing problems do not fit this model
- Made several assumptions about the problem
    - Task costs (node weights) are known
    - Communication volumes (edge weights) are known
    - Dependencies are known
- For basic partitioning techniques covered in class, the dependencies were only between iterations
- What if we have less information?