| Organizers: | Arnold Goodman (Chair), agoodman@uci.edu; John Elder, elder@datamininglab.com |
Speakers
8:15 a.m.
An Overview of KDD-98
John Elder, Elder Research
The International Conference on Knowledge Discovery and Data Mining (KDD) is probably the gathering most like the Interface Symposium, and shares its goal of pursuing data-based inference applications at the juncture of several fields. KDD has been held with the cooperation of other societies (ACM, AAAI, VLDB), and this year (August in San Diego) it is cooperating with the ASA Statistical Computing Section and the Interface. KDD may co-locate with the Interface in 2001 (if our cultural differences can be set aside!).
KDD has grown rapidly and has attracted an admirable amount of participation from industry. The majority of technical contributors are from outside Statistics (principally Computer Science) and often have a different perspective on problems of common interest. I will highlight some of the key strands of research presented at last year's 4th KDD conference (in New York), and discuss what we might <gasp> learn from their distinctive approaches.
8:45 a.m.
The State of Boosting
Greg Ridgeway,
University of Washington
Keywords: boosting, prediction, generalized linear model
In many problem domains, combining the predictions of several models results in a model with improved predictive performance. Boosting is one such method that has shown great promise.
On the applied side, empirical studies have shown that combining models using boosting methods produces more accurate classification and regression models. In particular, applying boosting to the naive Bayes classifier creates a scalable, accurate, and interpretable procedure. Boosting has also produced a new class of nonparametric regression procedures. These methods extend to the generalized linear model, including models for survival analysis.
I will introduce boosting, discuss the current state of boosting, and show how these methods connect to more standard statistical practice.
9:15 a.m.
Occam's Two Razors: The Sharp and the Blunt
Pedro Domingos,
Instituto Superior Técnico, Portugal
Occam's razor has been the subject of much controversy. This is partly because it has been interpreted in two quite different ways, the first of which (simplicity is a goal in itself) is essentially correct, while the second (simplicity leads to greater accuracy) is not. A critical review of the theoretical arguments for and against the "second razor" shows that it is unfounded as a universal principle, and demonstrably false. A review of empirical evidence shows that it also fails as a practical heuristic. In particular, I will build on the case made by Schaffer (1993) and Webb (1996) by considering additional theoretical arguments and recent empirical evidence that the second razor fails in most domains. I will propose a more appropriate version of the first razor, and argue that continuing to apply the second razor risks missing significant opportunities.