From: rusin@vesuvius.math.niu.edu (Dave Rusin) Newsgroups: sci.math Subject: Re: Graph Theory Date: 9 Jan 1998 05:28:19 GMT In article <34B119C6.4A84@info.bt.co.uk>, Michael Schroeder wrote: >Given a table of distances betwenn cities, I want to draw a map >for these cities. > >If one uses euclidean distance, the problem amounts to solving >a system of quadratic equations. When do solutions exist? >How to compute them? How to approximate solutions if none exists? You've asked about a very pretty thread of ideas. It isn't really Graph Theory as you suggest (the complete graph on n vertices is pretty boring!) but it does lead to a lot of other areas of mathematics. Allow me to take you for the ten-cent tour. Let's see, you have n cities, and an array M of the n^2 distances, M_ij = dist(city_i, city_j). You want to know if there are points P_i in the plane (I guess) which have dist(P_i,P_j) = M_ij. Let's first see if there are such points in a bigger Euclidean space, namely R^n. If there are, we may shift them all over so that the center of mass is at the origin; then the distances are easily related to the inner products. Indeed, using the assumption Sum(P_i) = 0, one may show that the inner products are = A_ij, where C_i = (1/n) Sum(M_ij^2, j=1..n) D = (1/n^2) Sum(M_ij^2, i,j=1..n) A_ij = (-1/2) ( M_ij^2 - C_i - C_j + D) Conversely, if vectors P_i in R^n can be found whose inner products are the A_ij, we readily compute that dist(P_i, P_j) = M_ij. (Well, the squares are equal; I'm assuming that your pairwise distances at least follow two of the axioms for a metric: M_ij = 0 if i=j and M_ij >0 otherwise M_ij = M_ji for all i,j I haven't yet used the third axiom, the triangle inequality: M_ij <= M_ik+M_ki for all i, j, and k.) So now you ask, given a symmetric matrix A, how do we decide if there are vectors whose inner products are the entries of A? How do we know if they lie in a plane? How do we find them if they exist, and fudge it if they don't? It turns out these are all really the same question. Putting the coordinates of the vectors into a matrix X, we readily see the problem is precisely to decide if there is an X with X^t * X = A. If you want the points to be in the plane, you want X to be a 2 x n matrix. A necessary condition is clear: A must be positive semi-definite, since if v is any vector in R^n, v^t A v = (X v)^t (X v) >= 0. It turns out that this is equivalent to the triangle inequality. But conversely, if A is a positive-definite symmetric matrix, it has an orthonormal basis of real eigenvectors v_i, and corresponding positive eigenvalues l_i; moreover, A = Sum( l_i v_i * v_i^t ), as is checked by comparing the action of each side on the basis v_i. In other words, A = X^t * X if X is the matrix whose i-th ROW is sqrt(l_i) v_i. It's easy to see that X and A have the same kernel; in particular, the minimum number of rows of X is the rank of A. So here's your complete procedure: From the distance matrix M compute the symmetric matrix A as above. Compute the eigensystem {v_i, l_i; i=1..n} (and hence X) If some l_i <0, there is no embedding at all (not a metric space); Else, if 3 or more l_i>0, there's only an embedding into R^k. Else, the columns of X give the coordinates of the P_i. But, wait, there's more! 1. It's easy to argue (using rotations, say) that such an X exists iff an upper-triangular one exists, that is, we may assume P_1 lies on one axis, P_2 lies in a plane containing that axis, etc. Then the corresponding X is upper-triangular, and may be assumed (using reflections) to have all diagonal entries positive. Then X is unique (by geometry). The factorization A = X^t X is the Cholesky decomposition of X. It's readily computed in linear algebra computer libraries such as netlib. 2. In the previous paragraph, the n-volume of the convex hull of the points is the product of the diagonal entries of X, hence it's det(X)=sqrt(det(A)). Since A in turn is computed using only the lengths of the edges of this simplex (that's what's in M), we have a formula for the n-volume of a n-simplex in terms of its edge lengths. For n=2 this is Heron's formula; the corresponding formula for n=3 shows up from time to time in this newsgroup. (I'm glossing over a single dimension drop since Sum(P_i)=0). 3. If some l_i <0, we can still play the game: the only consequence is that since "dist" isn't really a metric, what we obtain in exactly this way is a set of points in R^n whose "distances" are as desired when R^n is given a metric (in the differential geometry sense) of mixed signs, e.g. the Lorentz metric of relativity. 4. If too many l_i >0, then our points lie in some R^k (k=rank(A)). Think of a galaxy of stars in R^3. But some galaxies (like ours!), while contained in R^3, are pretty close to planar. There's a good choice of projection to a plane which will be overall close to preserving interpoint distances. You can make this precise in the following way: Look for the projection to R^2 (or any R^m with k < m < n) which minimizes the residuals in the least squares sense. Know what? You've already found it! Just arrange the l_i in decreasing order l_1 > l_2 > ... and form X from the first m of the v_i. Moreover, this gives you a "goodness of fit" criterion: if you have n points and find all n eigenvalues positive, say, you can still say the set of points is "basically m-dimensional" if there is a noticeable drop in the magnitude of the l_i after i=m. (You could also ask for the rank-m symmetric matrix closest to A; this comes to the same conclusion). 5. Did you ever really say the distance matrix M was symmetric? In some settings it may not be. There's still hope. Note that if we let U be the orthogonal matrix whose rows are exactly the v_i, and D the diagonal matrix of eigenvalues, then our decomposition may be written A = U^t D U. If A is not symmetric, there's no chance for this factorization, but it turns out every matrix has a (essentially unique) decomposition A = U^t D V, U and V orthogonal and D diagonal. This is the Singular Value Decomposition (SVD). Mimicking the symmetric discussion, we may project to the subspace spanned by the first few rows of V (or U, take your pick). That is, the columns of this submatrix give reasonable suggestions of where to locate the cities to be as consistent as possible with incompatible distance information. (Of course in the symmetric case, SVD and Cholesky are essentially the same). The SVD is even available in Maple. 6. This process is used all the time in Statistics and applications. You'll see references to "Multidimensional Scaling" and "Reduction of dimension". Did you ever take the Myers-Briggs personality inventory? That's this game! Your responses to n questions make you a vector in R^n; your "ISTJ" or whatever simply checks your coordinates in the directions of the four largest eigenvectors. Ever use a Web search engine? Did you wonder how it worked so fast and managed to find the "Cars of Asia" page when you asked for "Automobiles in India"? A site's use of words puts it in some huge Euclidean space R^(# of English words). But with a comparatively small sample of sites one can already suggest say 100 dominant directions (that is, one splits the whole web into 2^100 kinds of sites, rather than the 2^4 kinds of human personalities...). The search engine people only record that vector in R^100 for each site, a tremendous savings in time and space when matching your request for a site "pointing in a certain direction". Some people come perilously close to mysticism touting the virtues of this process, but it certainly is clear that this circle of ideas is very useful when there really is something low-dimensional to which you are close, e.g. if you're pretty close to having the correct distances between some cities in your country or region. dave