This is the API documentation for all nonlinear procedures.
fsml_hclust
The procedure is an implementation of the agglomerative hierarchical clustering method that groups data points into clusters by iteratively merging the most similar clusters. The procedure uses centroid linkage and the Mahalanobis distance as a measure of similarity.
The input matrix (x
) holds observations in rows (nd
) and variables in columns (nv
).
The target number of clusters (nc
) must be at least 1 and not greater than the number
of data points.
The variables are standardised before computing the covariance matrix on the transformed data. The matrix is used for calculating the Mahalanobis distance.
Clusters are merged iteratively until the target number of clusters is reached.
The global mean (gm
), cluster centroids (cm
), membership assignments (cl
),
and cluster sizes (cc
), the covariance matrix (cov
) and standard deviations
(sigma
) used in the distance calculations are returned.
Note: This procedure uses the pure procedure for calculating the Mahalanobis distance
f_lin_mahalanobis_core
, which useschol
from the stdlib_linalg
module.
Note
The current implementation of the procedure is exact, but very slow for large nd
.
call
fsml_hclust(x, nd, nv, nc, gm, cm, cl, cc, cov, sigma)
x
: A rank-2 array of type real
with dimensions nd
, nv
.
nd
: A scalar of type integer
.
nv
: A scalar of type integer
.
nc
: A scalar of type integer
.
Invalid argument values will result in the return of a sentinel value.
gm
: A rank-1 array of type real
with dimension nv
.
cm
: A rank-2 array of type real
with dimensions nv
, nv
.
cl
: A rank-1 array of type integer
with dimension nd
.
cc
: A rank-1 array of type integer
with dimension nc
.
cov
: A rank-2 array of type real
with dimensions nv
, nv
.
sigma
: A rank-1 array of type real
with dimension nv
.
fsml_kmeans
The procedure implements the k-means clustering algorithm using the Mahalanobis
distance as the similarity measure. It accepts initial centroids (cm_in
), refines
them iteratively, and returns the final centroids (cm
).
The input matrix (x
) holds observations in rows (nd
) and variables in columns (nv
).
The number of clusters (nc
) must be at least 1 and not greater than the number of
data points. The procedure assigns each observation to the nearest centroid using the
Mahalanobis distance, recomputes centroids from cluster memberships, and iterates until
convergence or the iteration limit is reached. Final centroids are sorted by the first
variable, and assignments are updated accordingly.
If the covariance matrix (cov_in
) is passed, it will be used to calculate the
Mahalanobis distance. If it is not passed, the variables are standardised before
computing the covariance matrix on the transformed data.
The global mean (gm
), cluster centroids (cm
), membership assignments (cl
),
and cluster sizes (cc
), the covariance matrix (cov
- either cov_in
if passed,
or internally calculated if cov_in
is not passed) and standard deviations (sigma
)
used in the distance calculations are returned.
Note: This procedure uses the pure procedure for calculating the Mahalanobis distance
f_lin_mahalanobis_core
, which useschol
from the stdlib_linalg
module.
call
fsml_kmeans(x, nd, nv, nc, cm_in, gm, cm, cl, cc, cov, sigma, [cov_in])
x
: A rank-2 array of type real
with dimensions nd
, nv
.
nd
: A scalar of type integer
.
nv
: A scalar of type integer
.
nc
: A scalar of type integer
.
cm_in
: A rank-2 array of type real
with dimensions nv
, nc
.
cov_in
(optional): A rank-2 array of type real
with dimensions nv
, nv
.
Invalid argument values will result in the return of a sentinel value.
gm
: A rank-1 array of type real
with dimension nv
.
cm
: A rank-2 array of type real
with dimensions nv
, nc
.
cl
: A rank-1 array of type integer
with dimension nd
.
cc
: A rank-1 array of type integer
with dimension nc
.
cov
: A rank-2 array of type real
with dimensions nv
, nv
.
sigma
: A rank-1 array of type real
with dimension nv
.
fsml_hkmeans
The procedure implements a hybrid clustering approach combining agglomerative hierarchical
clustering and k-means clustering, both using the Mahalanobis distance as the similarity measure.
The hierarchical step first partitions the data into nc
clusters by iteratively merging the most
similar clusters. The resulting centroids from are then used as initial centroids (cm_in
)
for the k-means procedure, which refines them iteratively.
The input matrix (x
) holds observations in rows (nd
) and variables in columns (nv
).
The number of clusters (nc
) must be at least 1 and not greater than the number of data points.
In the hierarchical clustering step, variables are standardised before computing the covariance matrix
on the transformed data. The covariance matrix is passed to the k-means clustering procedure along
with the initial cluster centroids. The k-means clustering step then assigns each observation to the
nearest centroid, recomputes centroids from cluster memberships, and iterates until convergence or
the iteration limit is reached. Final centroids are sorted by the first variable, and assignments
are updated accordingly.
The global mean (gm
), final cluster centroids (cm
), membership assignments (cl
), and cluster
sizes (cc
), the covariance matrix (cov
) and standard deviations (sigma
) used in the distance
calculations are returned.
Note: This procedure uses the pure procedure for calculating the Mahalanobis distance
f_lin_mahalanobis_core
, which useschol
from the stdlib_linalg
module.
call
fsml_hkmeans(x, nd, nv, nc, gm, cm, cl, cc, cov, sigma)
x
: A rank-2 array of type real
with dimensions nd
, nv
.
nd
: A scalar of type integer
. It must be at least 1.
nv
: A scalar of type integer
. It must be at least 1.
nc
: A scalar of type integer
. It must be at least 1 and not greater than nd
.
Invalid argument values will result in the return of a sentinel value.
gm
: A rank-1 array of type real
with dimension nv
.
cm
: A rank-2 array of type real
with dimensions nv
, nc
.
cl
: A rank-1 array of type integer
with dimension nd
.
cc
: A rank-1 array of type integer
with dimension nc
.
cov
: A rank-2 array of type real
with dimensions nv
, nv
.
sigma
: A rank-1 array of type real
with dimension nv
.
program example_nlp
! |--------------------------------------------------------------------|
! | fsml - fortran statistics and machine learning library |
! | |
! | about |
! | ----- |
! | Examples for nonlinear procedures (nlp module). |
! | |
! | license : MIT |
! | author : Sebastian G. Mutz (sebastian@sebastianmutz.com) |
! |--------------------------------------------------------------------|
use :: fsml
use :: fsml_ini ! import wp; alternatively: iso_fortran_env, wp => real64
implicit none
! ==== Description
!! The programme demonstrates the use of nonlinear procedures.
! ==== Declarations
integer(i4), parameter :: nd = 100 ! number of data points
integer(i4), parameter :: nv = 5 ! number of variables ("features")
integer(i4), parameter :: nc = 4 ! number of target clusters for hcluster
real(wp) :: x(nd, nv) ! data
real(wp) :: gm(nv) ! global mean per variable
real(wp) :: cm_h(nv, nc) ! cluster centroids (hcluster)
real(wp) :: cm_k(nv, nc) ! cluster centroids (k-means)
real(wp) :: cov(nv, nv) ! covariance matrix (hcluster)
real(wp) :: cov_k(nv, nv)
real(wp) :: sigma(nv) ! standard dev per variable
integer(i4) :: cl(nd) ! cluster assignment per data point
integer(i4) :: cc(nc) ! cluster sizes
integer(i4) :: i, j ! for loops
! ==== Instructions
! create data
do i = 1, nd
do j = 1, nv
x(i,j) = sin(real(i*j, wp)) ! example data, replace with real input
enddo
enddo
! ---- hierarchical clustering
! conduct hierarchical clustering
call fsml_hclust(x, nd, nv, nc, gm, cm_h, cl, cc, cov, sigma)
write(*,'(A)') "> hierarchical cluster analysis"
print*
write(*,'(A)') " cluster centroids: "
do j = 1, nc
write(*,'(A,I2,A,I2,A,5F12.6)'), " cluster", j, " (size: ", cc(j), ")", cm_h(1:nv, j)
enddo
print*
write(*,'(A)'), ' global means (gm):'
write(*,'(5F12.6)'), gm(1:nv)
print*
write(*,'(A)'), ' covariance matrix:'
do i = 1, nv
write(*,'(5F12.6)'), cov(i, 1:nv)
enddo
print*
write(*,'(A)'), ' standard deviations (sigma):'
write(*,'(5F12.6)'), sigma(1:nv)
print*
! cluster centroids:
! cluster 1 (size: 10) 1.114982 1.280285 0.426739 -0.620822 -0.979638
! cluster 2 (size: 74) 0.046810 -0.284541 -0.044039 -0.064682 0.347077
! cluster 3 (size: 9) -0.540211 0.961225 -1.147117 1.125295 -0.846290
! cluster 4 (size: 7) -1.393120 -0.056833 1.330794 0.123859 -1.181532
!
! global means (gm):
! -0.001272 -0.002720 -0.004636 -0.007745 -0.014948
!
! covariance matrix:
! 1.000000 0.000134 0.000297 0.000800 0.005803
! 0.000134 1.000000 0.000932 0.006108 -0.003596
! 0.000297 0.000932 1.000000 -0.003435 -0.002111
! 0.000800 0.006108 -0.003435 1.000000 -0.001955
! 0.005803 -0.003596 -0.002111 -0.001955 1.000000
!
! standard deviations (sigma):
! 0.712573 0.712679 0.714744 0.711827 0.711722
! ---- k-means clustering
! conduct k-means clustering with clusters and covariance matrix from hcluster
call fsml_kmeans(x, nd, nv, nc, cm_h, gm, cm_k, cl, cc, cov_k, sigma, cov_in=cov)
write(*,'(A)') "> k-means cluster analysis"
print*
write(*,'(A)') " cluster centroids: "
do j = 1, nc
write(*,'(A,I2,A,I2,A,5F12.6)'), " cluster", j, " (size: ", cc(j), ")", cm_k(1:nv, j)
enddo
print*
write(*,'(A)'), ' global means (gm):'
write(*,'(5F12.6)'), gm(1:nv)
print*
write(*,'(A)'), ' covariance matrix:'
do i = 1, nv
write(*,'(5F12.6)'), cov_k(i, 1:nv)
enddo
print*
write(*,'(A)'), ' standard deviations (sigma):'
write(*,'(5F12.6)'), sigma(1:nv)
print*
! cluster centroids:
! cluster 1 (size: 20) 0.897497 1.076016 0.566456 0.025259 -0.064641
! cluster 2 (size: 50) 0.268886 -0.806361 -0.226511 -0.142963 0.169572
! cluster 3 (size: 16) -0.772046 1.132262 -0.927534 0.439275 0.033617
! cluster 4 (size: 14) -1.360107 0.048683 1.059784 -0.027532 -0.551692
!
! global means (gm):
! -0.001272 -0.002720 -0.004636 -0.007745 -0.014948
!
! covariance matrix:
! 1.000000 0.000134 0.000297 0.000800 0.005803
! 0.000134 1.000000 0.000932 0.006108 -0.003596
! 0.000297 0.000932 1.000000 -0.003435 -0.002111
! 0.000800 0.006108 -0.003435 1.000000 -0.001955
! 0.005803 -0.003596 -0.002111 -0.001955 1.000000
!
! standard deviations (sigma):
! 0.712573 0.712679 0.714744 0.711827 0.711722
! ---- hybrid hierarchical - k-means clustering
! conduct clustering (combines 2 steps above)
call fsml_hkmeans(x, nd, nv, nc, gm, cm_k, cl, cc, cov_k, sigma)
write(*,'(A)') "> hierarchical clustering with k-means refinements"
print*
write(*,'(A)') " cluster centroids: "
do j = 1, nc
write(*,'(A,I2,A,I2,A,5F12.6)'), " cluster", j, " (size: ", cc(j), ")", cm_k(1:nv, j)
enddo
print*
write(*,'(A)'), ' global means (gm):'
write(*,'(5F12.6)'), gm(1:nv)
print*
write(*,'(A)'), ' covariance matrix:'
do i = 1, nv
write(*,'(5F12.6)'), cov_k(i, 1:nv)
enddo
print*
write(*,'(A)'), ' standard deviations (sigma):'
write(*,'(5F12.6)'), sigma(1:nv)
print*
! cluster centroids:
! cluster 1 (size: 20) 0.897497 1.076016 0.566456 0.025259 -0.064641
! cluster 2 (size: 50) 0.268886 -0.806361 -0.226511 -0.142963 0.169572
! cluster 3 (size: 16) -0.772046 1.132262 -0.927534 0.439275 0.033617
! cluster 4 (size: 14) -1.360107 0.048683 1.059784 -0.027532 -0.551692
!
! global means (gm):
! -0.001272 -0.002720 -0.004636 -0.007745 -0.014948
!
! covariance matrix:
! 1.000000 0.000134 0.000297 0.000800 0.005803
! 0.000134 1.000000 0.000932 0.006108 -0.003596
! 0.000297 0.000932 1.000000 -0.003435 -0.002111
! 0.000800 0.006108 -0.003435 1.000000 -0.001955
! 0.005803 -0.003596 -0.002111 -0.001955 1.000000
!
! standard deviations (sigma):
! 0.712573 0.712679 0.714744 0.711827 0.711722
end program example_nlp