<< Chapter < Page Chapter >> Page >
MDS at work. A matrix of interpoint distances D is used to compute a set of Euclidean coordinates for each of the input points.

If the distance measure between points has the metric properties as explained above, then n Euclidean coordinates can always be found for a set of n points, such that the Euclidean distance between them matches the original distances. In order to obtain these Euclidean coordinates, such coordinates are assumed to have a mean of zero, i.e., they are centered. In this way, the matrix of pairwise distances D can be converted into a matrix of dot products by squaring the distances and performing a "double centering" on them to produce the matrix B of pairwise dot products:

Let's assume that n Euclidean coordinates exist for each data point and that such coordinates can be placed in a matrix X . Then, multiplying X by its transpose should equal the matrix B of dot products computed before. Finally, in order to retrieve the coordinates in the unknown matrix X , we can perform an SVD of B which can be expressed as:

Where the left and right singular vectors coincide because B is a symmetric matrix. The diagonal matrix of eigenvalues can be split into two identical matrices, each having the square root of the eigenvalues, and by doing this a solution for X can be found:

Since X has been computed through an SVD, the coordinates are ordered by data variance like in PCA, so the first component of X is the most important, and so on. Keeping this in mind, the dimensionality reduction can be performed by ignoring the higher dimensions in X and only keeping the first few, like in PCA.

Geodesic distances. The notion of "geodesic" distance was originally defined as the length of a geodesic path, where the geodesic path between two points on the surface of the Earth is considered the "shortest path" since one is constrained to travel on the Earth's surface. The concept of geodesic distance can be generalized to any mathematical surface, and defined as "the length of the shortest path between two points that lie on a surface, when the path is constrained to lie on the surface." Figure 10 shows a schematic of this generalized concept of geodesic path and distance.

Geodesic distance. The geodesic distance between the two red points is the length of the geodesic path, which is the shortest path between the points, that lies on the surface.

The Isomap algorithm. The Isomap method augments classical MDS with the notion of geodesic distance, in order to capture the non-linearity of a data set. To achieve its goal, Isomap uses MDS to compute few Euclidean coordinates that best preserve pairwise geodesic distances , rather than direct distances . Since the coordinates computed by MDS are Euclidean, these can be plotted on a Cartesian set of axes. The effect of Isomap is similar to "unrolling" the non-linear surface into a natural parameterization, as depicted in figure 11. Isomap approximates the geodesic distances from the data itself, by first building a neighborhood graph for the data. A neighborhood graph consists of the original set of points, together with a connection between "neighboring" points (neighboring points are defined as the closest points to a given point, as determined by the distance measure used) as shown in figure 11-b. After the neighborhood graph has been built, it can be used to approximate the geodesic distance between all pairs of points as the shortest path distance along the graph (red path in figure 11-b). Naturally, the sampling of the data set has to be enough to capture the inherent topology of the non-linear space for this approximation to work. MDS then takes these geodesic distances and produces the Euclidean coordinates for the set. Figure 11-c shows two Euclidean coordinates for each point in the example.

Get Jobilize Job Search Mobile App in your pocket Now!

Get it on Google Play Download on the App Store Now




Source:  OpenStax, Geometric methods in structural computational biology. OpenStax CNX. Jun 11, 2007 Download for free at http://cnx.org/content/col10344/1.6
Google Play and the Google Play logo are trademarks of Google Inc.

Notification Switch

Would you like to follow the 'Geometric methods in structural computational biology' conversation and receive update notifications?

Ask