Correlation and Variance-Covariance Matrices

Variance-covariance and correlation matrices are among the most important quantitative measures of a data set. They characterize the statistical dependence between its variables.

Specifically, the covariance measures the extent to which two variables “fluctuate together” (that is, co-vary). The correlation is the covariance normalized by the standard deviations of the two variables, so it always lies between -1 and +1. A positive correlation indicates that the variables tend to increase or decrease together. A negative correlation indicates that one variable tends to increase while the other decreases. Values close to +1 or -1 indicate a high degree of linear dependence between the variables.
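The sign and magnitude of the correlation can be illustrated with a minimal sketch in plain Python. The helper name pearson and the sample data are assumptions made for this illustration. They are not part of the library API.

```python
import math

def pearson(x, y):
    """Sample correlation of two equal-length sequences (illustrative helper)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # The 1/(n-1) factor appears in the covariance and in both variances,
    # so it cancels in the ratio and is omitted here.
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
up = [2.0 * v + 1.0 for v in x]      # increases linearly with x
down = [10.0 - 3.0 * v for v in x]   # decreases linearly as x increases
bowl = [(v - 3.0) ** 2 for v in x]   # depends on x, but not linearly

r_up = pearson(x, up)
r_down = pearson(x, down)
r_bowl = pearson(x, bowl)
print(r_up, r_down, r_bowl)
```

For this data the three values come out as +1, -1, and 0 up to floating-point rounding. The last case shows that a correlation near zero rules out linear dependence only, not dependence in general.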

Details

Given a set \(X\) of \(n\) feature vectors \(x_1 = (x_{11}, \ldots, x_{1p}), \ldots, x_n = (x_{n1}, \ldots, x_{np})\) of dimension \(p\), the problem is to compute the sample means and variance-covariance matrix or correlation matrix:

Table: Correlation and Variance-Covariance Matrices

Statistic | Definition
Means | \(M = (m(1), \ldots, m(p))\), where \(m(j)=\frac{1}{n}\sum_{i=1}^{n}x_{ij}\)
Variance-covariance matrix | \(Cov = (v_{ij})\), where \(v_{ij}=\frac{1}{n-1}\sum_{k=1}^{n}(x_{ki}-m(i))(x_{kj}-m(j))\), \(i=\overline{1,p}\), \(j=\overline{1,p}\)
Correlation matrix | \(Cor = (c_{ij})\), where \(c_{ij}=\frac{v_{ij}}{\sqrt{v_{ii}\cdot v_{jj}}}\), \(i=\overline{1,p}\), \(j=\overline{1,p}\)
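The definitions above translate directly into code. The following is a minimal sketch in plain Python rather than the library implementation. The function names and the sample data are assumptions made for illustration. It also assumes \(n \ge 2\) and that no feature has zero variance, because otherwise the divisions are undefined.

```python
import math

def means(X):
    """m(j) = (1/n) * sum_i x_ij for each of the p features."""
    n, p = len(X), len(X[0])
    return [sum(row[j] for row in X) / n for j in range(p)]

def covariance_matrix(X):
    """v_ij = (1/(n-1)) * sum_k (x_ki - m(i)) * (x_kj - m(j)); requires n >= 2."""
    n, p = len(X), len(X[0])
    m = means(X)
    return [
        [sum((row[i] - m[i]) * (row[j] - m[j]) for row in X) / (n - 1)
         for j in range(p)]
        for i in range(p)
    ]

def correlation_matrix(cov):
    """c_ij = v_ij / sqrt(v_ii * v_jj); requires non-zero variances."""
    p = len(cov)
    return [
        [cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(p)]
        for i in range(p)
    ]

# n = 4 feature vectors of dimension p = 3 (made-up data).
X = [
    [1.0, 2.0, 9.0],
    [2.0, 4.0, 7.0],
    [3.0, 6.0, 2.0],
    [4.0, 8.0, 2.0],
]

M = means(X)
Cov = covariance_matrix(X)
Cor = correlation_matrix(Cov)

print("Means:", M)
for row in Cor:
    print(["%.3f" % c for c in row])
```

In this data the second feature is exactly twice the first, so their correlation is 1 up to rounding, while the third feature decreases as the first increases and so has a negative correlation with it. Both matrices are symmetric, and the diagonal of the correlation matrix is 1.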

Computation

The following computation modes are available:

Examples

Performance Considerations

To get the best overall performance when computing correlation or variance-covariance matrices:

  • If the input data is homogeneous, provide the input and store the results in homogeneous numeric tables whose data type matches the algorithmFPType class template parameter.

  • If the input data is non-homogeneous, use the AOS (array of structures) layout rather than the SOA (structure of arrays) layout.
