Working Paper (2021)
By Inge Koch and Kanta Naito
Principal component analysis (PCA) is the tool of choice for summarising multivariate and high-dimensional data in a lower-dimensional space. PCA works well for normal data, but may not do so well for high-dimensional, heavy-tailed data and data with outliers as encountered in practice. We consider nonlinear PCA based on the spatial sign, the spatial rank, and Kendall’s τ covariance matrix and examine properties of these nonlinear covariance matrices and the asymptotic behaviour of their sample analogues. At the population level we review relationships between the canonical covariance matrix and these nonlinear covariance matrices. For the random sample we consider estimators of the population eigenvectors derived from these nonlinear covariance matrices, examine asymptotic properties of the estimators, and present relationships between the nonlinear sample covariance matrices. The third part focusses on data: real multivariate and high-dimensional low sample size data as well as simulated data from a wide range of distributions and sample sizes. We investigate and compare the behaviour and performance of the first few PC directions of the different covariance matrices at various dimensions as the sample size increases. The synthesis of the theoretical properties at the population level, their sample-based estimators and the realisation of these ideas on data provides insight into the performance of the different estimators, when they work well and under what conditions. Overall, the rank-based covariance matrix emerges as a strong contender to the natural covariance matrix, with superior properties for data for which the sample covariance matrix has been known to perform poorly. It also outperforms the spatial sign covariance matrix, which can be unstable. These properties render rank-based PCA a serious competitor for dimension reduction and feature selection while retaining most of the features valued in PCA.
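As a rough illustration of the three nonlinear covariance matrices named in the abstract, the sketch below computes sample versions with NumPy and extracts their leading eigenvectors. It follows definitions that are common in the robust-statistics literature, which are an assumption here: the paper's exact definitions and normalisations may differ. The function names (`spatial_sign`, `sign_cov`, `rank_cov`, `kendall_cov`, `leading_directions`) are hypothetical, and centring the spatial sign covariance matrix at the coordinate-wise median is a convenience choice made for this sketch, not something taken from the paper.

```python
import numpy as np


def spatial_sign(v):
    """Scale each vector along the last axis to unit length; zero vectors stay zero."""
    v = np.asarray(v, dtype=float)
    norms = np.linalg.norm(v, axis=-1, keepdims=True)
    return np.divide(v, norms, out=np.zeros_like(v), where=norms > 0)


def sign_cov(x, center=None):
    """Sample spatial sign covariance matrix: average outer product of centred signs."""
    x = np.asarray(x, dtype=float)
    if center is None:
        # Assumption of this sketch: coordinate-wise median as the location estimate.
        center = np.median(x, axis=0)
    s = spatial_sign(x - center)
    return s.T @ s / len(x)


def rank_cov(x):
    """Sample spatial rank covariance matrix.

    The spatial rank of x_i is taken to be the average sign of x_i - x_j over all j.
    """
    x = np.asarray(x, dtype=float)
    pair_signs = spatial_sign(x[:, None, :] - x[None, :, :])  # shape (n, n, d)
    ranks = pair_signs.mean(axis=1)                           # shape (n, d)
    return ranks.T @ ranks / len(x)


def kendall_cov(x):
    """Sample Kendall's tau covariance matrix: average outer product of pairwise signs."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    pair_signs = spatial_sign(x[:, None, :] - x[None, :, :])  # shape (n, n, d)
    # Sum over all ordered pairs; the i == j terms are zero vectors and add nothing.
    total = np.einsum("ijk,ijl->kl", pair_signs, pair_signs)
    return total / (n * (n - 1))


def leading_directions(cov, k=1):
    """Return the k eigenvectors with the largest eigenvalues, as columns."""
    _, vecs = np.linalg.eigh(cov)  # eigh returns eigenvalues in ascending order
    return vecs[:, ::-1][:, :k]


# Example: heavy-tailed (multivariate t, 3 degrees of freedom) data in 3 dimensions.
rng = np.random.default_rng(1)
z = rng.standard_normal((150, 3)) * np.array([5.0, 1.0, 1.0])
x = z / np.sqrt(rng.chisquare(3, size=(150, 1)) / 3)
for name, estimator in [("sign", sign_cov), ("rank", rank_cov), ("Kendall", kendall_cov)]:
    print(name, leading_directions(estimator(x), k=1).ravel())
```

The pairwise versions build an n × n × d array of differences, so memory grows quadratically in the sample size; for larger samples the pairs would need to be processed in blocks.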
Koch I. and Naito K. (2021) Principal components of spatial, rank and Kendall's tau-covariance matrices for the population, random samples, real and simulated data. Working Paper.