Visualizing Keyword Lists and Other High-Dimensional Binary Response Variables
Keywords: keywords, data visualization, high-dimensionality, binary data
Abstract: Keyword lists provide an efficient and highly informative basis for clustering documents, marketed drugs, and other complex objects, and for analyzing relationships amongst such objects. Each list can be viewed as a compact representation of the bits that are turned on in a particular outcome of a very high-dimensional binary response vector. By ordering the keyword-list representations lexicographically, one can construct a one-dimensional histogram of the keyword-list outcomes. This histogram provides a "forest" view of the structure in the associated high-dimensional point cloud, and that view often exposes the dominant high-dimensional interactions. Coloring the bars according to the number of keywords in each list, subsetting the data by important keywords, and relating an ordered keyword-list representation to other ordered classifications of the objects all yield more detailed insight into the structure of this highly complex data. These ideas will be illustrated by examining the pharmacological profiles of 1800 compounds taken from an early version of the Derwent Standard Drug File. The visual operations will be executed using Spotfire, a dynamic data-visualization package.
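The lexicographic histogram described in the abstract can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the author's Spotfire workflow: each keyword list is assumed to be canonicalized by sorting its keywords alphabetically (so that the same set of "on" bits always yields the same tuple), and the distinct tuples are then ordered lexicographically, which places lists sharing leading keywords next to each other. The function names `keyword_list_histogram` and `subset_by_keyword` are hypothetical, and the pharmacological keywords in the example are invented for illustration, not taken from the Derwent Standard Drug File. Each bar also carries its keyword count, the quantity the abstract uses to color the bars.

```python
from collections import Counter


def keyword_list_histogram(records):
    """Return bars (keyword_tuple, count, n_keywords) in lexicographic order.

    Each record is an iterable of keywords, i.e. the set of bits that are
    "on" in one outcome of a high-dimensional binary response vector.
    (Hypothetical helper; canonical ordering is an assumption.)
    """
    # Canonicalize each list: drop duplicate keywords and sort them, so the
    # same binary outcome always maps to the same tuple.
    canonical = [tuple(sorted(set(r))) for r in records]
    counts = Counter(canonical)
    # Python compares tuples lexicographically; a shorter prefix sorts first,
    # so lists sharing leading keywords become adjacent bars.
    return [(kw, counts[kw], len(kw)) for kw in sorted(counts)]


def subset_by_keyword(records, keyword):
    """Keep only the objects whose keyword list contains `keyword`."""
    return [r for r in records if keyword in r]


# Invented example data: one keyword list per compound.
records = [
    ["antipyretic", "analgesic"],
    ["analgesic", "antipyretic"],
    ["analgesic"],
    ["diuretic", "antihypertensive"],
    ["analgesic", "sedative"],
]

# Text rendering of the "forest": one bar per distinct keyword list,
# annotated with the number of keywords (the coloring variable).
for kw, n, k in keyword_list_histogram(records):
    print(f"{' | '.join(kw):<32} k={k} {'#' * n}")
```

Sorting within each list before sorting the lists is what makes the ordering meaningful: without it, `["antipyretic", "analgesic"]` and `["analgesic", "antipyretic"]` would form separate bars even though they encode the same binary outcome. Applying `subset_by_keyword(records, "analgesic")` before building the histogram mimics the abstract's "subsetting the data by important keywords".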