Visualization-based Model Analysis

Penny Rheingans and Marie desJardins

Discovering interesting information in large, high-dimensional data spaces is a challenging problem. Using inductive machine learning techniques to construct classification models has proven to be one useful approach to this problem. A typical machine learning application involves a great deal of manual effort to iteratively construct a representation of the domain (feature engineering), set the parameters of the learning algorithm, induce a set of models, and analyze the resulting models. To support this process, we are developing a set of visualization methods with the goal of improving a user's ability to evaluate the quality of learned models. These methods include techniques for high-dimensional data space projection, display of probabilistic predictions, variable/class correlation, and instance mapping.

Traditional model analysis methods primarily consist of numerical and statistical tools for assessing the quality of a learned model. These tools include classification accuracy, confusion matrices, and receiver operating characteristic (ROC) curves. Our visualization techniques provide a richer representation of the information that the statistical tools summarize by a single number or curve, and are meant to augment, not replace, these statistical tools. To that end, we discuss in this paper how the visualization methods can be used to gain insights into how the behavior of the model varies across the data space. These insights could be used to guide the application development process by pinpointing, for example, regions of the data space (groups of individuals) with high misclassification rates, thus helping the user to determine what additional data to gather, or how to modify the set of features to improve differentiation.
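As a rough illustration of the contrast drawn above, the following sketch computes two of the summary statistics mentioned (classification accuracy and a confusion matrix) and then breaks the misclassification rate down by region of the data space. It is not the authors' method: the function names, the binning of a single feature into fixed-width regions, and the toy data are all assumptions made for this example, and the per-region rates stand in for the kind of information a visualization would display.

```python
from collections import Counter, defaultdict


def accuracy(actual, predicted):
    """Fraction of instances whose predicted label matches the actual label."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


def confusion_matrix(actual, predicted):
    """Count of instances for each (actual, predicted) label pair."""
    return Counter(zip(actual, predicted))


def error_rate_by_region(values, actual, predicted, bin_width):
    """Misclassification rate within fixed-width bins of one feature.

    Binning on a single feature is an assumption made for illustration;
    a real data space would have many dimensions.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for v, a, p in zip(values, actual, predicted):
        region = int(v // bin_width)
        totals[region] += 1
        errors[region] += int(a != p)
    return {r: errors[r] / totals[r] for r in sorted(totals)}


# Hypothetical toy data: one numeric feature plus actual and predicted classes.
ages = [22, 25, 31, 38, 44, 47, 53, 58]
actual = ["yes", "no", "yes", "yes", "no", "no", "yes", "no"]
predicted = ["yes", "no", "yes", "no", "yes", "yes", "yes", "no"]

overall = accuracy(actual, predicted)
matrix = confusion_matrix(actual, predicted)
by_region = error_rate_by_region(ages, actual, predicted, bin_width=10)

print(f"accuracy: {overall:.3f}")
for (a, p), n in sorted(matrix.items()):
    print(f"actual={a} predicted={p}: {n}")
for region, rate in by_region.items():
    print(f"feature range {region * 10}-{region * 10 + 9}: error rate {rate:.2f}")
```

In this toy data the single accuracy figure hides the fact that every error falls in the two middle bins, which is the kind of regional pattern that would suggest where to gather additional data or refine the features.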

Publications

Penny Rheingans and Marie desJardins (2000). Visualizing High-Dimensional Predictive Model Quality. Proceedings of IEEE Visualization '00, (to appear).

Marie desJardins and Penny Rheingans (2000). Visualization of High-Dimensional Model Characteristics. Proceedings of New Paradigms in Information Visualization, ACM Press, pp. 6-9.

© 2000