Principal Component Analysis (PCA), also known as Empirical Orthogonal Function (EOF) analysis, is a procedure that reduces the dimensionality of multivariate data by identifying a set of orthogonal vectors (eigenvectors or EOFs) that represent directions of maximum variance in the dataset. The term EOF analysis is often used interchangeably with geographically weighted PCA. As the two are mathematically identical, a single pca procedure is offered, with optional arguments and outputs that also make it usable as a classic EOF analysis.
For a classic PCA, the input matrix `x` is assumed to contain observations in rows and variables in columns.
For a classic EOF analysis, the input matrix `x` is assumed to contain time in rows and space in columns.
Optionally, the data can be standardised (using the correlation matrix) and/or column-wise weights can be applied prior to analysis. While the latter is unusual for a standard PCA, it is common for EOF analysis (geographically weighted PCA as often applied in geographical sciences).
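The preprocessing step can be illustrated with a short NumPy sketch. It is not the routine's Fortran implementation: the helper name `preprocess`, the sample data, and the use of the sample standard deviation (`ddof=1`) are assumptions made for illustration, and column weights are left out because their exact placement is not specified here.

```python
import numpy as np

def preprocess(x, standardise=True):
    """Centre each column of x and optionally scale it to unit variance."""
    x = np.asarray(x, dtype=float)
    centred = x - x.mean(axis=0)              # remove the column means
    if standardise:
        # Assumption: sample standard deviation (ddof=1) is used for scaling
        centred = centred / centred.std(axis=0, ddof=1)
    return centred

# Hypothetical data: 4 observations (rows) of 2 variables (columns)
x = np.array([[1.0, 10.0],
              [2.0, 30.0],
              [3.0, 20.0],
              [4.0, 40.0]])

z = preprocess(x)
# With standardised columns, the cross-product matrix is the correlation matrix
corr = z.T @ z / (x.shape[0] - 1)
```

Standardising the columns first is what makes the subsequent analysis operate on the correlation matrix rather than the covariance matrix.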
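The covariance or correlation matrix is computed as:

$$ C = \frac{1}{m - 1} X^\top X $$

where:
- $X$ is the preprocessed (centred and optionally standardised) data matrix,
- $m$ is the number of observations (rows in `x`).

A symmetric eigen-decomposition is performed:

$$ C = \Phi \Lambda \Phi^\top $$

where:
- $\Phi$ contains the eigenvectors (EOFs),
- $\Lambda$ is a diagonal matrix of eigenvalues representing variance explained.

The principal components (PCs) are given by:

$$ PC = X \Phi $$

The explained variance for each component is computed as:

$$ r^2_k = \frac{\lambda_k}{\sum_j \lambda_j} \times 100 $$

where $\lambda_k$ is the $k$-th diagonal entry of $\Lambda$.

EOFs may optionally be scaled for plotting:

$$ EOF_{scaled} = \Phi \sqrt{\Lambda} $$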
This subroutine uses `eigh` from the `stdlib_linalg` module to compute eigenvalues and eigenvectors of the symmetric covariance matrix.
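The sequence of steps described above (covariance or correlation matrix, symmetric eigen-decomposition, scores, explained variance, scaled EOFs) can be sketched in NumPy, where `numpy.linalg.eigh` plays a role comparable to the `eigh` routine mentioned here. This is a minimal sketch, not the subroutine itself: the function name `pca_sketch`, the `m - 1` normalisation, the descending ordering of modes, and the random test data are assumptions, and column weights are omitted.

```python
import numpy as np

def pca_sketch(x, standardise=True):
    """Illustrative PCA/EOF analysis of an (observations x variables) matrix."""
    x = np.asarray(x, dtype=float)
    m = x.shape[0]

    # Preprocess: centre the columns, optionally scale them to unit variance
    z = x - x.mean(axis=0)
    if standardise:
        z = z / z.std(axis=0, ddof=1)

    # Covariance (or correlation) matrix; m - 1 normalisation is an assumption
    c = z.T @ z / (m - 1)

    # Symmetric eigen-decomposition; eigh returns eigenvalues in ascending order
    ev, eof = np.linalg.eigh(c)
    order = np.argsort(ev)[::-1]              # assumed: largest variance first
    ev, eof = ev[order], eof[:, order]

    pc = z @ eof                              # principal components (scores)
    r2 = 100.0 * ev / ev.sum()                # explained variance in percent
    # Scale each eigenvector (column) by the square root of its eigenvalue
    eof_scaled = eof * np.sqrt(np.clip(ev, 0.0, None))
    return pc, eof, ev, r2, eof_scaled

# Hypothetical input: 50 observations of 3 variables
rng = np.random.default_rng(0)
x = rng.normal(size=(50, 3))
pc, eof, ev, r2, eof_scaled = pca_sketch(x)
```

In this sketch the columns of `eof` are orthonormal, the sample variance of each column of `pc` equals the corresponding eigenvalue, and `eof_scaled @ eof_scaled.T` reproduces the matrix that was decomposed.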
- `x(m,n)`: Input data matrix (observations × variables)
- `m`: Number of rows (observations)
- `n`: Number of columns (variables)
- `opt`: (Optional) Use 0 for covariance matrix, 1 for correlation matrix (default: 1)
- `wt(n)`: (Optional) Column weights (default: equal weights)
- `pc(m,n)`: Principal components (scores)
- `eof(n,n)`: EOFs / eigenvectors (unweighted)
- `ev(n)`: Eigenvalues (explained variance)
- `r2(n)`: (Optional) Percentage of variance explained by each component
- `eof_scaled(n,n)`: (Optional) EOFs scaled by square root of eigenvalues

The number of valid EOF/PC modes is determined by the number of non-zero eigenvalues. Arrays are initialised to zero and populated only where eigenvalues are strictly positive.
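The effect of keeping only modes with positive eigenvalues is easiest to see when there are fewer observations than variables, since centred data with `m` rows has rank at most `m - 1`. The NumPy sketch below is illustrative only: the random data, the descending ordering, and the relative tolerance `tol` (used in place of an exact comparison with zero, to absorb floating-point round-off) are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 4, 6                                   # fewer observations than variables
x = rng.normal(size=(m, n))                   # hypothetical data

z = x - x.mean(axis=0)                        # centred data, rank at most m - 1
c = z.T @ z / (m - 1)                         # covariance matrix (n x n)

vals, vecs = np.linalg.eigh(c)
order = np.argsort(vals)[::-1]                # assumed: largest variance first
vals, vecs = vals[order], vecs[:, order]

# Assumption: treat eigenvalues below a small relative tolerance as zero
tol = 1e-10 * vals[0]
valid = vals > tol

# Output arrays start as zeros and are filled only for the valid modes
ev = np.zeros(n)
eof = np.zeros((n, n))
pc = np.zeros((m, n))
ev[valid] = vals[valid]
eof[:, valid] = vecs[:, valid]
pc[:, valid] = z @ vecs[:, valid]

n_modes = int(valid.sum())                    # number of valid EOF/PC modes
```

Trailing entries of `ev` and the corresponding columns of `eof` and `pc` stay at zero, which makes the unused modes easy to identify downstream.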
Empirical Orthogonal Function (EOF) analysis / Principal Component Analysis (PCA)
| Type | Intent | Optional | Attributes | Name | Description |
|---|---|---|---|---|---|
| real(kind=wp) | intent(in) | | | x(m,n) | input data |
| integer(kind=i4) | intent(in) | | | m | number of rows |
| integer(kind=i4) | intent(in) | | | n | number of columns |
| integer(kind=i4) | intent(in) | optional | | opt | 0 = covariance, 1 = correlation |
| real(kind=wp) | intent(in) | optional | | wt(n) | optional weights (default = 1.0/n) |
| real(kind=wp) | intent(out) | | | pc(m,n) | principal components |
| real(kind=wp) | intent(out) | | | eof(n,n) | EOFs/eigenvectors (unweighted) |
| real(kind=wp) | intent(out) | | | ev(n) | eigenvalues |
| real(kind=wp) | intent(out) | optional | | eof_scaled(n,n) | EOFs/eigenvectors scaled for plotting |
| real(kind=wp) | intent(out) | optional | | r2(n) | explained variance (%) |