Friday, February 22, 2008

[ML Lunch] Shuheng Zhou on High Dimensional Sparse Regression and Structure Estimation

Speaker: Shuheng Zhou
Title: High Dimensional Sparse Regression and Structure Estimation
Venue: NSH 1507
Date: Monday, February 25
Time: 12:00 noon

Abstract:
Recent research has demonstrated that sparsity is a powerful technique in signal reconstruction and in statistical inference. Recent work shows that $\ell_1$-regularized least squares regression can accurately estimate a sparse model from $n$ noisy samples in $p$ dimensions, even if $p$ is much larger than $n$. My talk focuses on the role of sparsity in high dimensional regression when the original noisy samples are compressed, and on structure estimation in Gaussian graphical models when the graphs evolve over time.
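To fix ideas, here is a minimal numerical sketch of that first claim, using scikit-learn's Lasso for the $\ell_1$-regularized least squares step; the dimensions, sparsity level, noise scale, and regularization weight alpha are arbitrary illustrative choices, not constants from the theory:

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, k = 100, 500, 5          # n samples, p >> n dimensions, k nonzeros

beta = np.zeros(p)
beta[rng.choice(p, size=k, replace=False)] = rng.normal(0, 3, size=k)

X = rng.normal(size=(n, p))
y = X @ beta + 0.1 * rng.normal(size=n)   # Y = X beta + epsilon

# l1-regularized least squares; alpha is the regularization weight
lasso = Lasso(alpha=0.1).fit(X, y)
print("true support:     ", np.flatnonzero(beta))
print("estimated support:", np.flatnonzero(lasso.coef_))

Even though the system is underdetermined ($p > n$), the $\ell_1$ penalty typically recovers the handful of truly nonzero coefficients.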

In high-dimensional regression, the sparse object is a vector $\beta$ in $Y = X \beta + \epsilon$, where $X$ is an $n \times p$ matrix such that $n \ll p$. Recovering $\beta$ is in general ill-posed for $p > n$, even for the case when $\epsilon = 0$. However, when the vector $\beta$ is sparse, one can recover an empirical $\hat\beta$ that is consistent in terms of its support with the true $\beta$. In joint work with John Lafferty and Larry Wasserman, we studied the regression problem under the setting that the original $n$ input variables are compressed by a random Gaussian ensemble to $m$ examples in $p$ dimensions, where $m$ is much smaller than $n$ or $p$. A primary motivation for this compression procedure is to anonymize the data and preserve privacy by revealing little information about the original data. We established sufficient mutual incoherence conditions on $X$, under which a sparse linear model can be successfully recovered from the compressed data. We characterized the number of random projections required for $\ell_1$-regularized compressed regression to identify the nonzero coefficients in the true model with probability approaching one. In addition, we showed that $\ell_1$-regularized compressed regression asymptotically predicts as well as an oracle linear model, a property called ``persistence''. Finally, we established upper bounds on the mutual information between the compressed and uncompressed data that decay to zero.
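To make the compression setup concrete, here is a minimal sketch in which only the compressed pair $(AX, AY)$ is used for fitting; the random Gaussian ensemble $A$, the dimensions, and the use of scikit-learn's Lasso for the $\ell_1$ step are illustrative assumptions, not the paper's exact procedure or constants:

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n, p, m, k = 200, 400, 80, 4    # compress n rows down to m << n

beta = np.zeros(p)
beta[rng.choice(p, size=k, replace=False)] = 2.0

X = rng.normal(size=(n, p))
y = X @ beta + 0.1 * rng.normal(size=n)

# Random Gaussian ensemble: the data holder releases (A X, A y), not (X, y)
A = rng.normal(size=(m, n)) / np.sqrt(m)
X_c, y_c = A @ X, A @ y

# l1-regularized regression on the compressed data only
lasso = Lasso(alpha=0.1).fit(X_c, y_c)
print("true support:     ", np.flatnonzero(beta))
print("estimated support:", np.flatnonzero(lasso.coef_))

Note that the model is preserved exactly under compression: $AY = (AX)\beta + A\epsilon$, so the compressed problem is again a sparse linear regression, now with only $m$ rows.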

Undirected graphs are often used to describe high dimensional distributions. Under sparsity conditions, the graph can be estimated using $\ell_1$ penalization methods. However, current methods assume that the data are independent and identically distributed. If the distribution, and hence the graph, evolves over time, then the data are no longer identically distributed. In the second part of the talk, I show how to estimate the sequence of graphs for non-identically distributed data and establish theoretical results on convergence rates in the predictive risk and in the Frobenius norm of the inverse covariance matrix. This is joint work with John Lafferty and Larry Wasserman.
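One plausible way to implement such an estimator is to localize in time with kernel weights and then apply an $\ell_1$-penalized inverse covariance fit at each time point of interest. The sketch below assumes sklearn.covariance.graphical_lasso, a Gaussian kernel, and arbitrary bandwidth and penalty values; it illustrates the idea rather than the exact estimator analyzed in the talk:

import numpy as np
from sklearn.covariance import graphical_lasso

rng = np.random.default_rng(2)
T, p = 200, 10                       # T time points, p variables
X = rng.normal(size=(T, p))          # one observation x_t per time point

def graph_at(t0, bandwidth=0.1, alpha=0.2):
    """Estimate the graph at time t0/T from non-i.i.d. data by
    down-weighting observations whose times are far from t0/T."""
    times = np.arange(T) / T
    w = np.exp(-0.5 * ((times - t0 / T) / bandwidth) ** 2)
    w /= w.sum()
    mu = w @ X
    Xc = X - mu
    emp_cov = (Xc * w[:, None]).T @ Xc   # kernel-weighted covariance
    _, precision = graphical_lasso(emp_cov, alpha=alpha)
    # nonzero off-diagonal entries of the precision matrix are edges
    return np.abs(precision) > 1e-6

adj = graph_at(T // 2)
np.fill_diagonal(adj, False)
print("estimated edges at the midpoint:", adj.sum() // 2)

Sweeping t0 across [0, T) yields the estimated sequence of graphs; the bandwidth controls the trade-off between tracking the evolving distribution and the effective sample size at each time point.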
