\magnification=1200
\baselineskip=20pt
\nopagenumbers
\font\big=cmr12 scaled \magstep2

\centerline{\bf STANFORD UNIVERSITY}
\centerline{\bf DEPARTMENT OF STATISTICS}
\centerline{\big DEPARTMENTAL SEMINAR}
\bigskip

\baselineskip=12pt
\centerline{4:15 p.m., Tuesday, December 4, 2001}
\centerline{Sequoia Hall Room 200}
\centerline{(Cookies at 3:45 in 1st Floor Lounge)}
\bigskip

\baselineskip=15pt
\centerline{\sl Robert Tibshirani}
\centerline{\sl Department of Statistics}
\centerline{\sl Stanford University}
\centerline{\sl Stanford, CA 94305}
\bigskip

\centerline{\bf Cluster Validation by Prediction Strength}
\bigskip

We propose a new quantity for assessing the number of groups, or clusters,
in a dataset. The key idea is to view clustering as a supervised
classification problem in which we must also estimate the ``true'' class
labels. The resulting ``prediction strength'' measure assesses how many
groups can be predicted from the data, and how well. Along the way, we
develop novel notions of bias and variance for unlabelled data. Prediction
strength performs well in simulation studies, and we apply it to clusters
of breast cancer samples from a DNA microarray study. Finally, we establish
some consistency properties of the method.
(This is joint work with Guenther Walther, Pat Brown, and David Botstein.)

\bye