Interface 1999 Invited Session

Robust and Visualization Methods

Organizer: Ehsan S. Soofi, esoofi@csd.uwm.edu


Saturday, June 12, 10:30 a.m. - 12:15 p.m.

Speakers

10:30 a.m.
Deep Regression
Peter J. Rousseeuw, Department of Mathematics and Computing, University of Antwerp
(with Stefan Van Aelst)

In this talk we introduce the notion of depth in the regression setting. It provides the `rank' of any line (fit), rather than ranks of observations or residuals. In simple regression we can compute the depth of any line by an $O(n \log n)$ algorithm. For any bivariate data set $Z_n$ of size $n$ there exists a line with depth at least $n/3$. The largest depth in $Z_n$ can be used as a measure of linearity versus convexity. In both simple and multiple settings we consider the deepest regression, which generalizes the univariate median and is equivariant for monotone transformations of the response. Throughout, the model is semiparametric: the functional form is the (parametric) linear model, whereas the distributional model of the error term is nonparametric. For instance, the errors may be skewed and non-identically distributed (e.g. heteroskedastic). Making use of computational geometry, we consider algorithms to compute regression depth and the deepest regression. Because of the need for computational scalability, approximate algorithms are also of interest.
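
The abstract does not spell out how the depth of a line is computed, so the following is only a minimal sketch under an assumed formulation. It takes the depth of the line $y = a + bx$ to be the smallest value, over all split points $v$ on the $x$-axis, of the number of observations that block a "tilt" of the line at $v$: those to the left of $v$ with nonnegative residual plus those to the right with nonpositive residual, or the mirror-image count. The function name `regression_depth` and the tolerance `tol` are illustrative choices, not taken from the talk. Sorting dominates the cost, which is consistent with the stated $O(n \log n)$ bound.

```python
import numpy as np

def regression_depth(a, b, x, y, tol=1e-12):
    """Depth of the line y = a + b*x relative to the points (x, y).

    Assumed formulation (not spelled out in the abstract): minimum over
    split points v of min(L+(v) + R-(v), R+(v) + L-(v)), where L/R count
    points with x <= v / x > v and +/- mean residual >= 0 / <= 0.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    order = np.argsort(x, kind="stable")
    xs = x[order]
    r = y[order] - (a + b * xs)          # residuals in order of increasing x

    nonneg = (r >= -tol).astype(int)     # residual counted as >= 0
    nonpos = (r <= tol).astype(int)      # residual counted as <= 0
    cum_pos = np.cumsum(nonneg)
    cum_neg = np.cumsum(nonpos)

    # Split only after the last occurrence of each distinct x value,
    # so tied x values always fall on the same side of the split.
    last = np.append(xs[1:] != xs[:-1], True)
    left_pos, left_neg = cum_pos[last], cum_neg[last]
    right_pos = cum_pos[-1] - left_pos
    right_neg = cum_neg[-1] - left_neg

    return int(min((left_pos + right_neg).min(),
                   (right_pos + left_neg).min()))
```

Under this formulation a line passing through all $n$ observations has depth $n$, and a line lying entirely above or below the data has depth 0.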

11:00 a.m.
Robust Analysis of Large Data Sets
Peter J. Rousseeuw, Department of Mathematics and Computing, University of Antwerp
(with Katrien Van Driessen)

Data mining aims to extract previously unknown patterns or substructures from large databases. In statistics, this is what methods of robust estimation and outlier detection were constructed for. We will focus on least trimmed squares (LTS) regression (Rousseeuw 1984), which is based on the subset of h cases (out of n) whose least squares fit possesses the smallest sum of squared residuals. The coverage h may be set between n/2 and n. The computation time of existing LTS algorithms grows too much with the size of the data set, precluding their use for data mining. Here we develop a new algorithm called FAST-LTS. The basic ideas are an inequality involving order statistics and sums of squared residuals, and techniques which we call `selective iteration' and `nested extensions'. We also use an intercept adjustment technique to improve the precision. For small data sets FAST-LTS typically finds the exact LTS, whereas for larger data sets it gives more accurate results than existing algorithms for LTS and is faster by orders of magnitude. This allows us to apply FAST-LTS to large databases.
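
To make the LTS criterion concrete, here is a minimal sketch for simple regression. It is not FAST-LTS: it omits selective iteration, nested extensions, and the intercept adjustment, and its details (function names `lts_objective` and `lts_sketch`, the number of random starts, the step limit) are illustrative assumptions. Each start fits a line to a random pair of cases, then repeatedly refits least squares on the h cases with the smallest squared residuals. A refit of this kind cannot increase the sum of the h smallest squared residuals, so each start descends to a local solution.

```python
import numpy as np

def lts_objective(beta, X, y, h):
    """Sum of the h smallest squared residuals for coefficients beta."""
    r2 = (y - X @ beta) ** 2
    return np.sort(r2)[:h].sum()

def lts_sketch(x, y, h, n_starts=50, n_steps=20, seed=0):
    """Approximate LTS line by random starts plus subset refitting.

    Illustrative only; not the FAST-LTS algorithm of the talk.
    Returns (beta, objective) with beta = [intercept, slope].
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])   # design matrix with intercept
    n, p = X.shape

    best_beta, best_obj = None, np.inf
    for _ in range(n_starts):
        subset = np.sort(rng.choice(n, size=p, replace=False))
        for _ in range(n_steps):
            # Least squares fit on the current subset.
            beta, *_ = np.linalg.lstsq(X[subset], y[subset], rcond=None)
            # Keep the h cases with the smallest squared residuals.
            r2 = (y - X @ beta) ** 2
            new_subset = np.sort(np.argsort(r2, kind="stable")[:h])
            if np.array_equal(new_subset, subset):
                break                            # subset is stable: converged
            subset = new_subset
        obj = lts_objective(beta, X, y, h)
        if obj < best_obj:
            best_beta, best_obj = beta, obj
    return best_beta, best_obj
```

For example, with 20 cases of which 15 lie exactly on a line and 5 are gross outliers, calling `lts_sketch(x, y, h=15)` should recover the line through the 15 clean cases, since that fit has an objective of essentially zero.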

11:30 a.m.
Visualization Via Figural Animation
Kurt Pflughoeft, Ehsan S. Soofi and Mariam (Fatemeh) Zahedi, School of Business Administration, University of Wisconsin - Milwaukee

Due to the proliferation of computer technology, organizational decision makers have massive amounts of data on a large number of variables at their disposal. This situation calls for technologies that can display massive amounts of multidimensional data and their summaries through objects that are interesting to non-mathematically oriented users and are sufficiently powerful to engage the viewer and hold attention for the period of time needed to detect important features of the data. We report on a visualization technique that we are developing and refer to as {\em Figural Animation (FA)}. The FA technique combines size, shape, orientation, color, and sound to display a multidimensional data point or summaries of the multivariate data distribution. A {\em figural} refers to a familiar object such as a human being. In each static position, a figural represents a single multidimensional point. Each feature of the figural represents one to four dimensions of the data point. Traversing through data points animates the figural. The dynamism of the animation of a colorful figural engages the viewer with the data. Outlying data points can be shown by the disappearance or unusual enlargement of some features of the figural.

Temporal and spatial dynamics, hyper-linear structures, and clusters can be explored by the figural animation. In this presentation, we will show animations that reveal various types of bivariate relationships.

