fsml_lda_2class

public interface fsml_lda_2class

The 2-class multivariate Linear Discriminant Analysis (LDA) is a statistical procedure for classification and for investigating and explaining differences between two groups (or classes) with regard to their attribute variables. It quantifies the discriminability of the groups and the contribution of each attribute variable to this discriminability.

The procedure finds a discriminant function that best separates the two groups. The function can be expressed as a linear combination of the attribute variables:

$Y = \nu_0 + \nu_1 X_1 + \nu_2 X_2 + \dots + \nu_m X_m + \dots + \nu_M X_M$

where $Y$ is the discriminant function, $X_m (m=1 \dots M)$ are the attribute variables used in evaluating the differences between the groups, $\nu_m (m=1 \dots M)$ are the discriminant coefficients associated with each variable, $M$ (nv) is the number of variables, and $\nu_0$ is the intercept. (Note: Mathematically, it is analogous to a multivariate linear regression function.)

Each attribute variable $X_m$ contains elements $x_{mn} (n=1…N)$ (x), where $N$ (nd) is the number of elements in each group. Each element is associated with a discriminant value $y_n$ described by:

$y_n = \nu_1 x_{1n} + \nu_2 x_{2n} + \dots + \nu_m x_{mn} + \dots + \nu_M x_{Mn}$

Geometrically, this can be visualised as the elements being projected onto the discriminant axis $Y$, with $y_n$ as their positions on that axis. The optimal discriminant function is then determined by finding the axis on which the projected elements of the two groups are best separated. The best separation is obtained by maximising the discriminant criterion $\Gamma$ (g), a signal-to-noise ratio, so that:

$\Gamma = \frac{\text{scatter between groups }}{\text{scatter within groups }} = \frac{(\bar{y}_{G1} - \bar{y}_{G2})^2} {\sum_{j=1}^{n_1} (y_{G1j} - \bar{y}_{G1})^2 + \sum_{j=1}^{n_2} (y_{G2j} - \bar{y}_{G2})^2} \rightarrow \max$

where $n_1$ and $n_2$ are the numbers of elements in groups $G1$ and $G2$, respectively. The procedure assumes that both groups have the same number of elements (nd) and only accepts two groups (nc = 2).
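
The projection and the criterion above can be sketched in a few lines of Fortran. This is only an illustration of the formulas, not the library's implementation: the toy data and the trial coefficients nu are made up, nu is not the optimum, and the program name is hypothetical. The array layout follows the x(nd,nv,nc) argument documented on this page.

```fortran
program lda_criterion_sketch
  use iso_fortran_env, only: wp => real64
  implicit none
  integer, parameter :: nd = 4, nv = 2
  real(wp) :: x(nd, nv, 2)   ! samples x variables x classes
  real(wp) :: nu(nv)         ! hypothetical trial coefficients (not optimised)
  real(wp) :: y1(nd), y2(nd) ! projected elements of group 1 and group 2
  real(wp) :: m1, m2, g

  ! toy data (invented for illustration)
  x(:, 1, 1) = [1.0_wp, 2.0_wp, 3.0_wp, 4.0_wp]
  x(:, 2, 1) = [2.0_wp, 1.0_wp, 2.0_wp, 1.0_wp]
  x(:, 1, 2) = [5.0_wp, 6.0_wp, 7.0_wp, 8.0_wp]
  x(:, 2, 2) = [4.0_wp, 3.0_wp, 4.0_wp, 3.0_wp]
  nu = [1.0_wp, 0.5_wp]

  ! y_n = nu_1 x_1n + ... + nu_M x_Mn, evaluated per group
  y1 = matmul(x(:, :, 1), nu)
  y2 = matmul(x(:, :, 2), nu)

  ! group means on the discriminant axis
  m1 = sum(y1) / real(nd, wp)
  m2 = sum(y2) / real(nd, wp)

  ! scatter between groups divided by scatter within groups
  g = (m1 - m2)**2 / (sum((y1 - m1)**2) + sum((y2 - m2)**2))

  print '(a, f8.4)', 'criterion for trial coefficients: ', g
end program lda_criterion_sketch
```

Trying different values of nu changes g; the LDA procedure returns the coefficients for which this ratio is largest.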

The discriminant coefficients are then standardised (sa) using the standard deviations of the respective variables. The discriminant function represents a model that best separates the groups and can be used as a classification model. The skill of that model is determined by discarding the group membership of each element and using the model to reclassify the elements. The score (score) is the fraction of correct classifications and can be interpreted as a measure of how well the function works as a classification model.
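
The reclassification step can be illustrated as follows. The text above does not state the decision rule, so this sketch assumes a simple one: an element is assigned to the group whose projected mean lies on the same side of the midpoint between the two group means. The projected values and the program name are invented for the example.

```fortran
program lda_score_sketch
  use iso_fortran_env, only: wp => real64
  implicit none
  integer, parameter :: nd = 4
  real(wp) :: y1(nd), y2(nd) ! projected elements of group 1 and group 2
  real(wp) :: m1, m2, cut, score
  integer  :: n_ok

  ! toy projected values; one element per group lies on the "wrong" side
  y1 = [2.0_wp, 2.5_wp, 4.0_wp, 6.5_wp]
  y2 = [7.0_wp, 7.5_wp, 9.0_wp, 5.0_wp]

  m1 = sum(y1) / real(nd, wp)
  m2 = sum(y2) / real(nd, wp)

  ! assumed decision boundary: midpoint between the projected group means
  cut = 0.5_wp * (m1 + m2)

  ! an element counts as correct if it lies on the same side of the
  ! boundary as the mean of the group it actually belongs to
  n_ok = count((y1 - cut) * (m1 - cut) > 0.0_wp) &
       + count((y2 - cut) * (m2 - cut) > 0.0_wp)

  ! fraction of correct classifications over both groups
  score = real(n_ok, wp) / real(2 * nd, wp)

  print '(a, f6.3)', 'score: ', score
end program lda_score_sketch
```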

The procedure optionally returns the Mahalanobis distance (mh) as a measure of distance between the groups.
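
For reference, the conventional definition of the Mahalanobis distance between two groups is given below. It is stated here as general background; this page does not say whether mh holds $D$ or $D^2$, nor how the covariance matrix is estimated, so the symbols $\bar{\mathbf{x}}_{G1}$, $\bar{\mathbf{x}}_{G2}$ (vectors of variable means per group) and $\mathbf{S}$ (pooled within-group covariance matrix) are assumptions of this note:

$D^2 = (\bar{\mathbf{x}}_{G1} - \bar{\mathbf{x}}_{G2})^T \mathbf{S}^{-1} (\bar{\mathbf{x}}_{G1} - \bar{\mathbf{x}}_{G2})$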

Note: This subroutine uses eigh from the stdlib_linalg module.

Module Procedures

public subroutine s_lin_lda_2c(x, nd, nv, nc, sa, g, score, mh)

2-class multivariate Linear Discriminant Analysis (LDA)

Arguments

Type	Intent	Optional		Name	Description
real(kind=wp),	intent(in)		::	x(nd,nv,nc)	input data (nd samples × nv variables × nc classes)
integer(kind=i4),	intent(in)		::	nd	number of datapoints per class
integer(kind=i4),	intent(in)		::	nv	number of variables
integer(kind=i4),	intent(in)		::	nc	number of classes (must be 2)
real(kind=wp),	intent(out)		::	sa(nv)	standardised discriminant coefficients
real(kind=wp),	intent(out)		::	g	discriminant criterion
real(kind=wp),	intent(out)		::	score	classification score
real(kind=wp),	intent(out),	optional	::	mh	Mahalanobis distance
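
A call to the interface might look like the sketch below. The argument list is taken from the table above; everything else is an assumption made for illustration: that the interface is exported by a module named fsml, that the library's wp and i4 kinds correspond to real64 and int32, and the toy data and program name. It must be compiled and linked against the library.

```fortran
program lda_usage_sketch
  ! assumption: the interface is exported by a module named fsml
  use fsml, only: fsml_lda_2class
  ! assumption: the library's wp and i4 kinds match real64 and int32
  use iso_fortran_env, only: wp => real64, i4 => int32
  implicit none
  integer(i4), parameter :: nd = 5, nv = 2, nc = 2
  real(wp) :: x(nd, nv, nc)  ! nd samples x nv variables x nc classes
  real(wp) :: sa(nv)         ! standardised discriminant coefficients
  real(wp) :: g, score, mh

  ! toy data (invented): class 1
  x(:, 1, 1) = [1.0_wp, 2.1_wp, 2.9_wp, 4.2_wp, 5.0_wp]
  x(:, 2, 1) = [2.3_wp, 1.1_wp, 2.8_wp, 1.7_wp, 2.0_wp]
  ! toy data (invented): class 2
  x(:, 1, 2) = [5.5_wp, 6.4_wp, 7.1_wp, 8.3_wp, 9.0_wp]
  x(:, 2, 2) = [4.1_wp, 3.2_wp, 4.6_wp, 3.0_wp, 3.9_wp]

  ! mh is optional and can be left out of the call
  call fsml_lda_2class(x, nd, nv, nc, sa, g, score, mh)

  print *, 'standardised coefficients: ', sa
  print *, 'discriminant criterion:    ', g
  print *, 'classification score:      ', score
  print *, 'Mahalanobis distance:      ', mh
end program lda_usage_sketch
```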
