When Your Big Data Seems Too Small: Accurate Inferences Beyond the Empirical Distribution

By Dayna Bilderback

October 28, 2019

Speaker: Prof. Greg Valiant
Affiliation: Stanford University

Abstract: I will discuss several problems related to the general challenge of making accurate inferences about a complex phenomenon in the regime where the amount of available data (i.e., the sample size) is too small for the empirical distribution of the data to accurately represent the phenomenon in question. For several fundamental and practically relevant settings, I will describe how the empirical distribution of the data can be significantly “denoised.” I will also describe how one can often make accurate inferences about the “unseen” portion of the distribution, corresponding to events that were never observed in the given dataset. In addition, I will discuss the problem of estimating the “learnability” of a dataset: given too little labeled data to train an accurate model, it is often still possible to estimate the extent to which a good model exists. Framed differently, even in the regime where there is insufficient data to learn, it is possible to estimate the performance that could be achieved if additional data (drawn from the same source) were obtained. Finally, I will discuss a number of practical applications of this work.

Biography: Greg Valiant is an Assistant Professor in Stanford’s Computer Science Department. Some of his recent projects focus on designing algorithms that accurately infer information about complex distributions when given surprisingly little data. More broadly, his research interests are in algorithms, learning, applied probability and statistics, and evolution. Prior to joining Stanford, he was a postdoc at Microsoft Research New England. He received his PhD in Computer Science from Berkeley and his BA in Math from Harvard.

For more information, contact Prof. Suhas Diggavi (suhas@ee.ucla.edu)

Date/Time:
Oct 28, 2019, 12:30 pm - 1:30 pm

Location:
EE-IV Shannon Room #54-134
420 Westwood Plaza, 5th Floor, Los Angeles, CA 90095