Demixed Principal Component Analysis

- 2 mins

We often encounter brain areas where neurons are modulated by more than one task variable; e.g., neurons in the prefrontal cortex (PFC) respond simultaneously to both the stimuli and the decisions of a subject in various working memory tasks. Such neurons are said to have mixed selectivity (a mixed representation), which makes it difficult to understand what information they encode and how it is represented.

We want to unravel the neural selectivity for each task parameter from a large population of neurons. The problem is reminiscent of standard dimensionality reduction methods such as principal component analysis (PCA) or linear discriminant analysis (LDA), because they share the same objective: extracting a few key low-dimensional components that capture the essence of high-dimensional data. Unfortunately, vanilla PCA has no mechanism for actively separating the task parameters, so the mixed selectivity remains in all principal components, whereas LDA typically maximizes the separation between groups of data with respect to only one parameter. Kobak et al. [1] introduced an interesting dimensionality reduction technique, demixed principal component analysis (dPCA), to overcome the limitations of existing methods and achieve the aforementioned goal.

In a nutshell, the first major step of dPCA is to decompose the data \(\mathbf{X}\) into uncorrelated marginalized terms \(\mathbf{X}_{\phi}\): \(\mathbf{X} = \sum_{\phi}\mathbf{X}_{\phi} + \mathbf{X}_{\mathrm{noise}}\), where \(\mathbf{X}_{\phi}\) is regarded as the portion that depends only on the parameter \(\phi\) (obtained through appropriate mean subtractions followed by averaging over the remaining parameters \(\backslash\phi\)). The next step is a PCA-like dimensionality reduction that demixes the inherently entangled selectivity of neurons, with the objective of minimizing the Frobenius norm of the difference between \(\mathbf{X}_{\phi}\) and \(\hat{\mathbf{X}}_{\phi}\).
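To make the marginalization step concrete, here is a minimal NumPy sketch on toy trial-averaged data with only a stimulus and a time axis. The array shapes and variable names are my own choices, and the trial-to-trial noise term \(\mathbf{X}_{\mathrm{noise}}\) is omitted for brevity; it is not the paper's implementation.

```python
import numpy as np

# Toy trial-averaged data: X[neuron, stimulus, time]
rng = np.random.default_rng(0)
n_neurons, n_stim, n_time = 50, 4, 100
X = rng.normal(size=(n_neurons, n_stim, n_time))

# Center each neuron by its grand mean over all conditions and time points.
X = X - X.mean(axis=(1, 2), keepdims=True)

# Marginalize: average over the parameters a term should NOT depend on.
X_time = X.mean(axis=1, keepdims=True)   # depends on time only
X_stim = X.mean(axis=2, keepdims=True)   # depends on stimulus only
X_interaction = X - X_time - X_stim      # stimulus-time interaction

# The marginalized terms add back up to the (centered) data ...
assert np.allclose(X_time + X_stim + X_interaction, X)
# ... and the stimulus and time terms are uncorrelated.
assert abs(np.sum(X_time * X_stim)) < 1e-8
```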

The core of dPCA lies in designing a linear decoder \(\mathbf{D}_{\phi}\) and encoder \(\mathbf{F}_{\phi}\) such that \(\hat{\mathbf{X}}_{\phi} (= \mathbf{F}_{\phi}\mathbf{D}_{\phi}\mathbf{X})\) matches \(\mathbf{X}_{\phi}\) as closely as possible. Here, \(\mathbf{D}_{\phi}\) and \(\mathbf{F}_{\phi}\) are estimated analytically by solving a linear regression problem with a low-rank constraint on the regression coefficient matrix, which is known as the reduced-rank regression problem. I said “PCA-like” because in PCA the encoder and decoder are constrained to be transposes of each other (the same principal axes), whereas dPCA allows them to differ. Nevertheless, the PCA machinery (an SVD) is used to impose the rank constraint and, as a consequence, to determine the final \(\mathbf{D}_{\phi}\) and \(\mathbf{F}_{\phi}\).
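Here is a minimal sketch of the closed-form reduced-rank regression solution as I understand it: an ordinary least-squares fit followed by an SVD of the fitted values to enforce the rank constraint. The function name, matrix shapes, and the small ridge term are my own; the paper additionally deals with regularization and cross-validation, as mentioned below.

```python
import numpy as np

def reduced_rank_regression(X_phi, X, q, reg=1e-6):
    """Minimize ||X_phi - F @ D @ X||_F^2 subject to rank(F @ D) <= q.

    Closed-form recipe: full-rank least squares first, then an SVD of the
    fitted values to keep only the top-q components.  `reg` is a small
    ridge term for numerical stability (my own addition).
    """
    n = X.shape[0]
    # Full-rank least-squares coefficients: A = X_phi X^T (X X^T + reg*I)^{-1}
    A = X_phi @ X.T @ np.linalg.inv(X @ X.T + reg * np.eye(n))
    # Rank constraint via the top-q left singular vectors of the prediction A X
    U, _, _ = np.linalg.svd(A @ X, full_matrices=False)
    U_q = U[:, :q]
    F = U_q          # encoder (n_neurons x q)
    D = U_q.T @ A    # decoder (q x n_neurons)
    return F, D

# Illustrative usage on random matrices (neurons x samples, flattened over conditions/time).
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 400))        # full data
X_phi = rng.normal(size=(50, 400))    # one marginalization, same shape
F, D = reduced_rank_regression(X_phi, X, q=3)
X_phi_hat = F @ D @ X                 # rank-3 approximation of X_phi
```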

The authors provided extensive approaches to deal with practical issues such as overfitting and unbalanced/missing data, and they also discussed the current limitations of dPCA, which I strongly encourage you to read.

I posted a Jupyter notebook on dPCA that revisits a toy example from the paper and visualizes what dPCA actually does. I hope it helps you get to know dPCA better.

[1] D. Kobak+, W. Brendel+, C. Constantinidis, C. E. Feierstein, A. Kepecs, Z. F. Mainen, X.-L. Qi, R. Romo, N. Uchida, and C. K. Machens, “Demixed principal component analysis of neural population data,” eLife, 2016.