Title: Computer Vision and Metrics Learning for Hypothesis Testing: An Application of Q-Q Plot for Normality Test
Speaker: Dr. Ke-Wei Huang, Department of Information Systems and Analytics at the School of Computing, National University of Singapore
Venue: Room 214, Building 37, Wushan Campus
Introduction to the speaker:
Dr. Ke-Wei Huang is an Associate Professor in the Department of Information Systems and Analytics at the School of Computing, National University of Singapore (NUS). Before joining NUS in 2007, he obtained a Bachelor’s degree in electrical engineering (1995) and an M.B.A. in finance (1997) from National Taiwan University, followed by an M.S. degree (2002) and a Ph.D. degree in Information Systems (2007) from the Stern School of Business at New York University. His research interests include using machine learning to improve social science research methods, machine learning applications in finance or accounting, entrepreneurship and innovation research for IT and knowledge workers, and the pricing of information goods. His work has been published in Information Systems Research, Strategic Management Journal, Production and Operations Management, Journal of Economics & Management Strategy, Quantitative Marketing and Economics, IEEE Transactions on Engineering Management, Decision Support Systems, and ACM Transactions on MIS.
Abstract:
This paper proposes a new deep-learning method that constructs test statistics by means of computer vision and metrics learning. The application highlighted in the paper applies computer vision to Q-Q plots to construct a new test statistic for normality testing. To the best of our knowledge, no similar application has been documented in the literature. Traditionally, there are two families of approaches for verifying the probability distribution of a random variable: researchers either assess the Q-Q plot subjectively, or apply a mathematical formula, such as the Kolmogorov-Smirnov test, to conduct a formal normality test. Graphical assessment by human beings is not rigorous, whereas normality test statistics may not be accurate enough when no uniformly most powerful test exists. It may take statisticians decades to develop a new test statistic with greater statistical power. Our proposed method integrates four deep-learning-based components: an image representation learning component for a Q-Q plot, a dimension reduction component, a metrics learning component that best quantifies the differences between two Q-Q plots for normality testing, and a new normality hypothesis testing process. Our experimental results show that the machine-learning-based test statistic can outperform several widely used traditional normality tests. This study provides convincing evidence that the proposed method can objectively create a powerful test statistic based on Q-Q plots, and that the method could be adapted to construct more powerful test statistics for other applications in the future.
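For attendees less familiar with the two traditional approaches mentioned in the abstract, the following is a minimal Python sketch, assuming NumPy and SciPy are available. It computes the points of a normal Q-Q plot and the p-values of two classical normality tests for a simulated normal sample and a simulated skewed sample. It does not reproduce the speaker's deep-learning method; the helper names `qq_points` and `classical_normality_pvalues`, the sample size, and the random seed are illustrative assumptions.

```python
import numpy as np
from scipy import stats


def qq_points(sample):
    """Return the coordinates of a normal Q-Q plot and its correlation r.

    theoretical: quantiles expected under a normal distribution
    ordered:     the sorted sample values
    r:           correlation of the least-squares line through the points
                 (values near 1 mean the points lie close to a straight line)
    """
    (theoretical, ordered), (_slope, _intercept, r) = stats.probplot(
        sample, dist="norm"
    )
    return theoretical, ordered, r


def classical_normality_pvalues(sample):
    """Return p-values of two widely used normality tests."""
    # Standardize before the Kolmogorov-Smirnov test. Estimating the mean and
    # standard deviation from the same data makes this p-value only
    # approximate (assumption: acceptable for an illustration).
    standardized = (sample - sample.mean()) / sample.std(ddof=1)
    ks_result = stats.kstest(standardized, "norm")
    sw_result = stats.shapiro(sample)
    return {
        "kolmogorov_smirnov": ks_result.pvalue,
        "shapiro_wilk": sw_result.pvalue,
    }


# Simulated data (illustrative only): one normal and one skewed sample.
rng = np.random.default_rng(0)
normal_sample = rng.normal(size=500)
skewed_sample = rng.exponential(size=500)

for name, sample in [("normal", normal_sample), ("skewed", skewed_sample)]:
    theoretical, ordered, r = qq_points(sample)
    pvalues = classical_normality_pvalues(sample)
    print(name, "Q-Q correlation:", round(r, 4), "p-values:", pvalues)
```

In this sketch the Q-Q points are what a human would inspect visually, while the p-values are the formula-based alternative. The talk concerns learning a test statistic directly from the plot image instead.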