\documentclass[11pt]{article}

\setlength{\oddsidemargin}{0.0truein}
\setlength{\evensidemargin}{0.0truein}
\setlength{\textwidth}{6.5truein}
\setlength{\topmargin}{0.0truein}
\setlength{\textheight}{9.0truein}
\setlength{\headsep}{0.0truein}
\setlength{\headheight}{0.0truein}
\setlength{\topskip}{10.0pt}
\setlength{\parskip}{5mm}

\usepackage{url}

\begin{document}

\begin{center}
\textbf{\textsc{STANFORD UNIVERSITY}}\\[5pt]
\textbf{\textsc{DEPARTMENT OF STATISTICS}}\\[5pt]
\Large\textbf{\textsc{DEPARTMENTAL SEMINAR}}
\end{center}

\begin{center}
4:15 p.m., Tuesday, March 14, 2006\\
Sequoia Hall Room 200\\
(Cookies at 3:45 in 1st Floor Lounge)
\end{center}

\begin{center}
\textsl{Lutz Duembgen}\\
University of Berne
\end{center}

\begin{center}
\textbf{P-Values for Classification}
\end{center}

\noindent
Abstract: Let $(X,Y)$ be a complete observation consisting of a feature vector $X$ and a class label $Y \in \{1,2,\ldots,K\}$. Classification means predicting the unobserved class label $Y$ from the observed feature vector $X$. The joint distribution of $(X,Y)$ is typically unknown and is estimated from training data. We propose to replace classifiers (i.e.\ point predictors) or the (estimated) posterior distribution $L(Y \mid X)$ with a tuple of p-values for the $K$ potential class memberships. After a brief discussion of the potential benefits of this approach, we extend the classical theory of optimal classifiers to optimal p-values. Thereafter we discuss the impact of estimating the joint distribution from training data and describe a general method based on permutation tests. Some theoretical results and numerical examples illustrate the method's potential.

\end{document}