interface fsml_lda_2class

The 2-class multivariate Linear Discriminant Analysis (LDA) is a statistical procedure for classifying elements into two groups (or classes) and for investigating and explaining the differences between those groups with regard to their attribute variables. It quantifies the discriminability of the groups and the contribution of each attribute variable to this discriminability.
The procedure finds a discriminant function that best separates the two groups. The function can be expressed as a linear combination of the attribute variables:

$$ Y = a_0 + a_1 x_1 + a_2 x_2 + \dots + a_m x_m $$

where $Y$ is the discriminant function, $x_i$ are the attribute variables used in evaluating the differences between the groups, $a_i$ are the discriminant coefficients associated with each variable, $m$ (`nv`) is the number of variables, and $a_0$ is the y-intercept.

(Note: Mathematically, it is analogous to a multivariate linear regression function.)
Each attribute variable contains $n$ elements (`x`), where $n$ (`nd`) is the number of elements in each group. Each element $j$ is associated with a discriminant value described by:

$$ Y_j = a_0 + a_1 x_{1j} + a_2 x_{2j} + \dots + a_m x_{mj} $$

Geometrically, this can be visualised as elements being projected onto the discriminant axis $Y$. The optimal discriminant function is then determined by finding the axis on which the projected elements of the two groups are best separated.
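The projection described above can be sketched in a few lines. This is an illustrative Python snippet, not the library's Fortran implementation; the function name `discriminant_values`, the data, and the coefficient values are all hypothetical and chosen only to show the arithmetic.

```python
def discriminant_values(x, a, a0=0.0):
    """Return the discriminant value of each element (one row of attribute values)."""
    return [a0 + sum(ai * xi for ai, xi in zip(a, row)) for row in x]


# Hypothetical group with 3 elements and 2 attribute variables.
group_a = [[1.0, 2.0], [2.0, 3.0], [3.0, 3.0]]

# Assumed coefficients and intercept, picked for illustration only.
a = [0.5, -1.0]
y_a = discriminant_values(group_a, a, a0=1.0)
print(y_a)  # [-0.5, -1.0, -0.5]
```

Each row collapses to a single number, which is the position of that element on the discriminant axis.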
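The best separation is given by maximising the discriminant criterion $\Gamma$ (`g`), a signal-to-noise ratio, so that:

$$ \Gamma = \frac{\left(\bar{Y}_A - \bar{Y}_B\right)^2}{\sum_{j=1}^{n_A} \left(Y_{Aj} - \bar{Y}_A\right)^2 + \sum_{j=1}^{n_B} \left(Y_{Bj} - \bar{Y}_B\right)^2} \rightarrow \max $$

where $\bar{Y}_A$ and $\bar{Y}_B$ are the mean discriminant values of the two groups, and $n_A$ and $n_B$ are the number of elements in groups $A$ and $B$, respectively. The procedure assumes that these are the same (`nd`) and only accepts 2 groups (`nc = 2`).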
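To make the criterion concrete, the sketch below evaluates it for two candidate axes on a small made-up data set. It assumes the criterion takes the classical Fisher form (squared distance between the projected group means divided by the pooled within-group sum of squares); the exact scaling used inside the subroutine may differ. The helper names `project` and `criterion` are hypothetical, and the code is illustrative Python rather than the library's Fortran.

```python
def project(x, a):
    """Project each element (row of attribute values) onto the axis given by a."""
    return [sum(ai * xi for ai, xi in zip(a, row)) for row in x]


def criterion(y_a, y_b):
    """Squared distance of the projected group means over the pooled within-group scatter."""
    mean_a = sum(y_a) / len(y_a)
    mean_b = sum(y_b) / len(y_b)
    within = sum((y - mean_a) ** 2 for y in y_a) + sum((y - mean_b) ** 2 for y in y_b)
    return (mean_a - mean_b) ** 2 / within


# Hypothetical data: two groups, 3 elements each, 2 attribute variables.
group_a = [[1.0, 1.0], [2.0, 3.0], [3.0, 2.0]]
group_b = [[4.0, 5.0], [5.0, 4.0], [6.0, 6.0]]

# Compare two candidate axes; the intercept is omitted because it cancels in the criterion.
g_first = criterion(project(group_a, [1.0, 0.0]), project(group_b, [1.0, 0.0]))
g_both = criterion(project(group_a, [1.0, 1.0]), project(group_b, [1.0, 1.0]))
print(g_first, g_both)  # 2.25 3.0
```

The axis that weights both variables equally separates these two groups better than the axis that uses only the first variable. The criterion is unchanged when the coefficients are multiplied by a common factor, so only the direction of the axis matters.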
The discriminant coefficients are then standardised (`sa`) using the standard deviations of the respective variables. The discriminant function represents a model that best separates the groups and can be used as a classification model. The skill of that model is determined by disregarding the known association of each element with its group and using the model to reclassify the elements. The score (`score`) is the fraction of correct classifications and can be interpreted as a measure of how well the function works as a classification model.
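The reclassification step can be illustrated as follows. This Python sketch assumes a nearest-projected-mean rule (equivalent to a threshold at the midpoint between the two group means when the groups are equally sized); the rule used inside the subroutine is not stated here and may differ. The function name `reclassification_score` and the discriminant values are hypothetical.

```python
def reclassification_score(y_a, y_b):
    """Fraction of elements assigned to their own group by a nearest-mean rule."""
    mean_a = sum(y_a) / len(y_a)
    mean_b = sum(y_b) / len(y_b)

    def closer_to_a(y):
        # Ties are assigned to group B in this sketch.
        return abs(y - mean_a) < abs(y - mean_b)

    correct = sum(closer_to_a(y) for y in y_a) + sum(not closer_to_a(y) for y in y_b)
    return correct / (len(y_a) + len(y_b))


# Hypothetical discriminant values for two groups of 4 elements each.
y_a = [1.0, 2.0, 3.0, 6.0]
y_b = [6.0, 7.0, 7.0, 8.0]
score = reclassification_score(y_a, y_b)
print(score)  # 0.875
```

Here one element of the first group lies closer to the mean of the second group, so 7 of the 8 elements are reclassified correctly.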
The procedure optionally returns the Mahalanobis distance (`mh`) as a measure of the distance between the groups.
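As a rough illustration of what such a distance measures, the sketch below computes the squared Mahalanobis distance between two group means using the pooled within-group covariance matrix, restricted to 2 variables so that the matrix inverse can be written in closed form. Whether `mh` holds the distance or its square, and how the covariance is normalised, are assumptions here; all function names and data are hypothetical, and the code is illustrative Python rather than the library's Fortran.

```python
import math


def group_mean(x):
    """Mean of each of the 2 attribute variables."""
    n = len(x)
    return [sum(row[k] for row in x) / n for k in range(2)]


def scatter(x, mean):
    """2x2 matrix of summed squares and cross-products of deviations from the mean."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for row in x:
        d = [row[0] - mean[0], row[1] - mean[1]]
        for i in range(2):
            for k in range(2):
                s[i][k] += d[i] * d[k]
    return s


def mahalanobis_squared(x_a, x_b):
    """Squared Mahalanobis distance between group means (2 variables only)."""
    mean_a, mean_b = group_mean(x_a), group_mean(x_b)
    s_a, s_b = scatter(x_a, mean_a), scatter(x_b, mean_b)
    dof = len(x_a) + len(x_b) - 2
    # Pooled within-group covariance matrix (assumed normalisation).
    c = [[(s_a[i][k] + s_b[i][k]) / dof for k in range(2)] for i in range(2)]
    det = c[0][0] * c[1][1] - c[0][1] * c[1][0]
    d = [mean_a[0] - mean_b[0], mean_a[1] - mean_b[1]]
    # Solve c * w = d with the closed-form 2x2 inverse.
    w = [(c[1][1] * d[0] - c[0][1] * d[1]) / det,
         (c[0][0] * d[1] - c[1][0] * d[0]) / det]
    return d[0] * w[0] + d[1] * w[1]


# Hypothetical data: two groups, 3 elements each, 2 attribute variables.
group_a = [[1.0, 1.0], [2.0, 3.0], [3.0, 2.0]]
group_b = [[4.0, 5.0], [5.0, 4.0], [6.0, 6.0]]
d2 = mahalanobis_squared(group_a, group_b)
print(d2, math.sqrt(d2))
```

Unlike the Euclidean distance between the group means, this measure accounts for the spread and correlation of the variables within the groups.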
Note: This subroutine uses `eigh` from the `stdlib_linalg` module.
2-class multivariate Linear Discriminant Analysis (LDA)
| Type | Intent | Optional | Attributes | Name | Description |
|---|---|---|---|---|---|
| real(kind=wp) | intent(in) | | | x(nd,nv,nc) | input data (nd samples × nv variables × nc classes) |
| integer(kind=i4) | intent(in) | | | nd | number of datapoints per class |
| integer(kind=i4) | intent(in) | | | nv | number of variables |
| integer(kind=i4) | intent(in) | | | nc | number of classes (must be 2) |
| real(kind=wp) | intent(out) | | | sa(nv) | standardised discriminant coefficients |
| real(kind=wp) | intent(out) | | | g | discriminant criterion |
| real(kind=wp) | intent(out) | | | score | classification score |
| real(kind=wp) | intent(out) | optional | | mh | Mahalanobis distance |