Canonical Correlation Analysis (CCA) is a multivariate statistical technique that explores the relationship between two sets of variables by finding linear combinations (projections) of each set that are maximally correlated with each other.

In formal terms, given two random vectors $X \in \mathbb{R}^p$ and $Y \in \mathbb{R}^q$, CCA finds vectors $a \in \mathbb{R}^p$ and $b \in \mathbb{R}^q$ such that the canonical variables $U = a^T X$ and $V = b^T Y$ have the highest possible Pearson correlation coefficient. It produces a sequence of up to $\min(p, q)$ such pairs $(U_1, V_1), (U_2, V_2), \dots$, where each subsequent pair captures the largest remaining correlation subject to being uncorrelated with all previous pairs. These $U_i$ and $V_i$ are called canonical variates. By examining the coefficients (the elements of $a$ and $b$), one can interpret how the original variables contribute to the patterns shared between the two datasets.

CCA is particularly useful when two different sets of features describe the same observations and the goal is to understand their common underlying factors. It has been applied in fields such as psychology (e.g., relating test scores to physiological measurements) and neuroscience (relating brain activity features to stimulus features), among others. Notably, CCA generalizes several other techniques: for instance, many classical multivariate procedures (such as MANOVA and multivariate regression) can be framed as special cases of CCA.

The method was first introduced by Harold Hotelling in 1936 and remains a cornerstone of multi-view learning, with modern extensions such as kernel CCA and deep CCA that capture nonlinear relationships between the two sets. In summary, CCA finds the pairs of linear projections that best summarize the cross-covariance between two sets of variables, helping to uncover latent associations that are not apparent from the correlations within either set alone.
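The definition of canonical variates can be made concrete with a small numerical example. What follows is a minimal sketch, not a reference implementation. It assumes sample covariance estimates and adds a small ridge term `reg` for numerical stability. The helper names `cca` and `inv_sqrt` are illustrative and do not come from any library. The approach whitens each block of variables and takes a singular value decomposition of the whitened cross-covariance; under these assumptions the singular values are the canonical correlations and the columns of `A` and `B` play the roles of $a$ and $b$.

```python
import numpy as np


def inv_sqrt(S):
    """Inverse matrix square root of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T


def cca(X, Y, reg=1e-8):
    """Illustrative CCA via whitening + SVD (hypothetical helper).

    X: (n, p) array, Y: (n, q) array, rows are paired observations.
    Returns A (p, k), B (q, k) and the k = min(p, q) canonical
    correlations in descending order.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)  # center each block
    Yc = Y - Y.mean(axis=0)

    # Sample covariance blocks; `reg` is an assumed small ridge term.
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)

    Wx, Wy = inv_sqrt(Sxx), inv_sqrt(Syy)

    # SVD of the whitened cross-covariance matrix.
    U_, s, Vt = np.linalg.svd(Wx @ Sxy @ Wy, full_matrices=False)

    A = Wx @ U_    # columns are the vectors a_1, a_2, ...
    B = Wy @ Vt.T  # columns are the vectors b_1, b_2, ...
    return A, B, s


# Synthetic demo: both views contain a noisy copy of one shared factor z.
rng = np.random.default_rng(0)
n = 2000
z = rng.normal(size=n)
X = np.column_stack([z + 0.5 * rng.normal(size=n),
                     rng.normal(size=n),
                     rng.normal(size=n)])
Y = np.column_stack([rng.normal(size=n),
                     z + 0.5 * rng.normal(size=n)])

A, B, corrs = cca(X, Y)
U = (X - X.mean(axis=0)) @ A  # canonical variates U_i as columns
V = (Y - Y.mean(axis=0)) @ B  # canonical variates V_i as columns

# The population value of the first correlation is 0.8 for this setup;
# the sample estimate will differ slightly.
print("canonical correlations:", corrs)
print("first-pair x-weights:", A[:, 0])
```

In this synthetic setup the first pair of weights should load mainly on the two columns that carry the shared factor, which is the kind of interpretation of $a$ and $b$ described above. For real analyses, an established implementation such as scikit-learn's `sklearn.cross_decomposition.CCA` is likely a better starting point than this sketch.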