




LJK
Probability & Statistics Seminar

On Thursday June 21 2012 at 14h00 in Salle 2, Tour IRMA

Seminar by Asis Kumar CHATTOPADHYAY (Department of Statistics, Calcutta University)

Dimension reduction and identification of data clusters under Gaussian and non-Gaussian set-up

Summary

In many real-life situations, both the number of variables under consideration and the number of observations are very large. To analyze such multivariate data, the dimension must first be reduced appropriately, because further analyses such as classification or clustering need a smaller dimension. In statistics, Principal Component Analysis (PCA) is the most popular dimension reduction technique. Although PCA is basically an exploratory technique, making inferences with it requires a normality assumption on the underlying multivariate distribution.

The eigenvalues and eigenvectors of the covariance or correlation matrix are the main ingredients of a PCA. The eigenvectors determine the directions of maximum variability, whereas the eigenvalues give the variances along those directions. In practice, decisions about the quality of the principal component approximation should be based on the eigenvalue-eigenvector pairs. Studying the sampling distribution of their estimates requires a multivariate normality assumption, as otherwise the problem becomes too difficult.

Principal components (PCs) are a sequence of projections of the data, constructed so that they are mutually uncorrelated and ordered by decreasing variance. The PCs of a p-dimensional data set therefore provide a sequence of best linear approximations to it. Since only a few (say m ≪ p) of these linear combinations may explain a large percentage of the variation in the data, one can keep those m components instead of the original p variables for further analysis.
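The PCA steps described in the summary (eigendecomposition of the covariance matrix, uncorrelated components ordered by variance, and keeping only m ≪ p of them) can be illustrated with a short NumPy sketch. It is not the speaker's method: the function name `pca_eig`, the 90% cumulative-variance cut-off used to choose m, and the synthetic two-factor data are all illustrative assumptions. The sketch works on the covariance matrix; using the correlation matrix instead amounts to standardizing each column before calling the function.

```python
import numpy as np


def pca_eig(X, threshold=0.90):
    """Minimal PCA sketch via the eigendecomposition of the sample covariance matrix.

    X: (n, p) data matrix, one observation per row.
    threshold: hypothetical cut-off on the cumulative proportion of explained
    variance used to choose m (an illustrative assumption, not from the talk).
    """
    Xc = X - X.mean(axis=0)                 # centre each of the p variables
    S = np.cov(Xc, rowvar=False)            # p x p sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)    # S is symmetric; eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]       # reorder the pairs by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()     # proportion of total variance per component
    # smallest m whose cumulative explained variance reaches the threshold
    m = int(np.searchsorted(np.cumsum(explained), threshold)) + 1
    m = min(m, len(eigvals))                # guard against floating-point round-off
    scores = Xc @ eigvecs[:, :m]            # the first m principal components (n x m)
    return eigvals, eigvecs, explained, m, scores


# Usage on synthetic data (an assumption): p = 10 variables driven by 2 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
loadings = rng.normal(size=(2, 10))
X = latent @ loadings + 0.1 * rng.normal(size=(500, 10))

eigvals, eigvecs, explained, m, scores = pca_eig(X)
print("components kept:", m)
print("cumulative explained variance:", np.round(np.cumsum(explained)[:m], 3))

# The retained PCs are uncorrelated and their variances are the leading eigenvalues.
C = np.atleast_2d(np.cov(scores, rowvar=False))
print("PC covariance is diagonal:", np.allclose(C, np.diag(eigvals[:m])))
```

Because `np.linalg.eigh` returns eigenvalues in ascending order, the sketch reverses them so that the first column of `eigvecs` is the direction of maximum variability. The final check confirms the two properties stated in the summary: the retained components are uncorrelated, and their variances equal the leading eigenvalues.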

