The procedure implements a hybrid clustering approach combining agglomerative hierarchical
clustering and k-means clustering, both using the Mahalanobis distance as the similarity measure.
The hierarchical step first partitions the data into `nc` clusters by iteratively merging the most
similar clusters. The resulting centroids are then used as initial centroids (`cm_in`)
for the k-means procedure, which refines them iteratively.
The input matrix (`x`) holds observations in rows (`nd`) and variables in columns (`nv`).
The number of clusters (`nc`) must be at least 1 and not greater than the number of data points.
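
The hierarchical step can be sketched as follows. This is a minimal Python illustration, not the library's Fortran code. The linkage rule is not stated here, so the sketch assumes centroid linkage (the two clusters with the closest centroids are merged). It also takes the inverse covariance matrix as a hypothetical argument `prec` rather than deriving it from standardised data, and the function name `agglomerate` is an assumption.

```python
def agglomerate(x, nc, prec):
    """Merge clusters until nc remain and return their centroids.

    x    : list of observations (rows), each a list of nv values
    nc   : target number of clusters
    prec : inverse covariance matrix (assumed to be given)
    """
    if not 1 <= nc <= len(x):
        raise ValueError("nc must be between 1 and the number of data points")
    nv = len(x[0])

    def dist2(a, b):
        # squared Mahalanobis distance (a - b)^T prec (a - b)
        d = [ai - bi for ai, bi in zip(a, b)]
        return sum(d[i] * prec[i][j] * d[j] for i in range(nv) for j in range(nv))

    def centroid(members):
        # column-wise mean of the member observations
        return [sum(col) / len(members) for col in zip(*members)]

    clusters = [[list(p)] for p in x]  # start with one cluster per observation
    while len(clusters) > nc:
        cents = [centroid(c) for c in clusters]
        # find the pair of clusters whose centroids are closest (assumed linkage)
        i, j = min(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda ab: dist2(cents[ab[0]], cents[ab[1]]),
        )
        clusters[i].extend(clusters.pop(j))  # j > i, so index i stays valid
    return [centroid(c) for c in clusters]
```

The returned list plays the role of the initial centroids `cm_in`: one centroid per cluster, each with `nv` components.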
In the hierarchical clustering step, variables are standardised before computing the covariance matrix
on the transformed data. The covariance matrix is passed to the k-means clustering procedure along
with the initial cluster centroids. The k-means clustering step then assigns each observation to the
nearest centroid, recomputes centroids from cluster memberships, and iterates until convergence or
the iteration limit is reached. Final centroids are sorted by the first variable, and assignments
are updated accordingly.
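
The k-means refinement and the final sorting can be sketched as follows. This is a hedged Python illustration under stated assumptions, not the library's implementation: convergence is taken to mean that memberships no longer change, empty clusters keep their previous centroid, labels are 0-based, centroids are stored one per row rather than as `cm(nv,nc)`, and the inverse covariance is passed in as a hypothetical argument `prec`. The name `kmeans_refine` is also an assumption.

```python
def kmeans_refine(x, cm_in, prec, max_iter=100):
    """Refine initial centroids cm_in on the rows of x.

    Returns (cm, cl, cc): sorted centroids, assignments and cluster sizes.
    max_iter is assumed to be at least 1.
    """
    nv = len(x[0])

    def dist2(p, c):
        # squared Mahalanobis distance (p - c)^T prec (p - c)
        d = [pi - ci for pi, ci in zip(p, c)]
        return sum(d[i] * prec[i][j] * d[j] for i in range(nv) for j in range(nv))

    cm = [list(c) for c in cm_in]
    cl = None
    for _ in range(max_iter):
        # assignment step: index of the nearest centroid for each observation
        new_cl = [min(range(len(cm)), key=lambda k: dist2(p, cm[k])) for p in x]
        if new_cl == cl:
            break  # assumed convergence test: memberships unchanged
        cl = new_cl
        # update step: each centroid becomes the mean of its members
        for k in range(len(cm)):
            members = [p for p, lab in zip(x, cl) if lab == k]
            if members:  # assumption: an empty cluster keeps its centroid
                cm[k] = [sum(col) / len(members) for col in zip(*members)]
    # sort centroids by the first variable and relabel assignments to match
    order = sorted(range(len(cm)), key=lambda k: cm[k][0])
    rank = {old: new for new, old in enumerate(order)}
    cm = [cm[k] for k in order]
    cl = [rank[lab] for lab in cl]
    cc = [cl.count(k) for k in range(len(cm))]
    return cm, cl, cc
```

Relabelling through `rank` keeps `cl` consistent with the reordered centroids, which is the point of updating the assignments after the sort.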
The procedure returns the global mean (`gm`), final cluster centroids (`cm`), membership
assignments (`cl`), and cluster sizes (`cc`), together with the covariance matrix (`cov`) and
standard deviations (`sigma`) used in the distance calculations.
Note: This procedure calculates the Mahalanobis distance with the pure procedure
`f_lin_mahalanobis_core`, which uses `chol` from the `stdlib_linalg` module.
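
To illustrate why a Cholesky factorisation is useful here: if the covariance matrix is factored as L Lᵀ with L lower triangular, the Mahalanobis distance between a point and a centroid is the Euclidean norm of z, where L z equals their difference, so no explicit matrix inverse is needed. The Python sketch below shows this idea with the standard library only. The function names `cholesky_lower` and `mahalanobis` are hypothetical and do not reflect the interface of `f_lin_mahalanobis_core`; whether the library returns the distance or its square is not stated here, and the sketch returns the distance.

```python
import math

def cholesky_lower(a):
    """Return lower-triangular L with a = L L^T (a symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[i][j] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low

def mahalanobis(x, c, cov):
    """Mahalanobis distance between points x and c under covariance cov."""
    low = cholesky_lower(cov)
    diff = [xi - ci for xi, ci in zip(x, c)]
    z = []
    for i, d in enumerate(diff):
        # forward substitution: solve low @ z = diff row by row
        z.append((d - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
    return math.sqrt(sum(v * v for v in z))
```

With the identity matrix as `cov` the result reduces to the Euclidean distance, so `mahalanobis([3.0, 4.0], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])` gives `5.0`.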
Impure wrapper procedure for `s_nlp_hkmeans_core`.
Type | Intent | Optional | Attributes | Name | Description
---|---|---|---|---|---
real(kind=wp) | intent(in) | | | x(nd,nv) | input data matrix (samples, variables)
integer(kind=i4) | intent(in) | | | nd | number of data points
integer(kind=i4) | intent(in) | | | nv | number of variables
integer(kind=i4) | intent(in) | | | nc | number of clusters (target)
real(kind=wp) | intent(out) | | | gm(nv) | global means for each variable
real(kind=wp) | intent(out) | | | cm(nv,nc) | cluster centroids
integer(kind=i4) | intent(out) | | | cl(nd) | cluster assignments for each data point
integer(kind=i4) | intent(out) | | | cc(nc) | cluster sizes
real(kind=wp) | intent(out) | | | cov(nv,nv) | covariance matrix
real(kind=wp) | intent(out) | | | sigma(nv) | standard deviation per variable