============================================================================== Notes to myself following a talk I heard on "multidimensional scaling", which engendered some discussion among the faculty regarding the role of the triangle inequality. File was dated 1996/9/23. -- rusin@math.niu.edu ============================================================================== The problem: a set of points is given with the distance between all pair of points known. How to find a distance-preserving embedding into Euclidean space? Remark: in general, one can only embed into Euclidean spaces of dimensions which are very high (e.g. equal to the number of points). For example if d(x, y) = 1 for all x<>y. Thus we may need to approximate the embedding. Also note that if the triangle inequality is not satisfied then no embedding is possible; however, in that case we will see there is an embedding into R^n with a non-positive-definite metric. If an embedding exists it is unique up to isometry (translation, reflection, and proper isometry (rotation) in SO(n)). We will choose the embedding which has the center of mass at the origin and the axes oriented in the directions of principal separation of the points. Theorem: (Schoenberg, Householder, and Young) Given a symmetric matrix B (of squares of distances) we can compute a matrix A of inner products, e.g. a_ij = (-1/2)*(b_ij - c_i - c_j + d) where c_i = (1/n) Sum(b_ij, j); d = (1/n^2) Sum(b_ij, i,j). [Mohsen: Ernst Young ca 1900, if I heard him right] Note that if there is an isometric embedding of the points to points v_i in an inner-product space, then b_ij := d(v_i, v_j)^2 = = = + -2; if the embedding has center-of-mass at the origin, then n c_i = Sum(b_ij, j) = n |v_i|^2 + Sum( |v_j|^2 , j) - 2 = n |v_i|^2 + Sum( |v_j|^2 , j). Then n^2 d = 2n Sum( | v_i|^2, i), so a_ij = (-1/2)(b_ij - c_i - c_j + d) = . Thus A represents a quadratic form. Polarize this form: that is, find an orthogonal symmetric U so that A = U D U^t with D real diagonal. We may assume the eigenvalues of D are in decreasing order. Then A = d1 u1 u1^t + d2 u2 u2^t + ... where u_i is the ith column of U. In particular, if all d_i are positive and d_k >> d_(k+1), then A' = d1 u1 u1^t + ... + dk uk uk^t is a good approximation of A. Taking w_i = (d_i)^(1/2)*u_i for the columns of a matrix W (having only k columns) we have A' = W W^t. Thus the rows of W are n vectors in R^k whose inner products are roughly the same as those of the n given points. Equivalently, the u_i give coordinate functions for the n points: w_i = (x_1(P_i), ..., x_k(P_i) ) where the coordinate functions are linear (x_i(P) = v_i * u_i) Remark: A' is the best approximation to A in the least squares sense among matrices of rank <= k. Mohsen: recent luck using nonlinear functions x_i. Example: 3 points with distances 3, 4, 5: B = ((0,9,16),(9,0,25),(25,16,0)) A = (1/9)((25,-2,-23),(-2, 52, -50), (-23, -50, 73)) e'values: 13.0, 3.7, 0 (exact) Note that the method tends to place points with well-defined axes (in the directions of the greatest eigenvalues' eigenvectors, evidently the directions most useful for separating points -- the longest directions of the eventual cloud of points.) These axes are well-defined to the extent the eigenvalues are really distinct. Application: given points whose distances are defined according to several possibly related criteria, one can label the principal axes according to properties evident when comparing points for which x_i(P) is very large to those where x_i(P) is very negative.