fsml Module

FSML interface module.


Uses

  • module~~fsml~~UsesGraph module~fsml fsml module~fsml_dat fsml_dat module~fsml->module~fsml_dat module~fsml_dst fsml_dst module~fsml->module~fsml_dst module~fsml_ini fsml_ini module~fsml->module~fsml_ini module~fsml_lin fsml_lin module~fsml->module~fsml_lin module~fsml_nlp fsml_nlp module~fsml->module~fsml_nlp module~fsml_sts fsml_sts module~fsml->module~fsml_sts module~fsml_tst fsml_tst module~fsml->module~fsml_tst module~fsml_typ fsml_typ module~fsml->module~fsml_typ module~fsml_utl fsml_utl module~fsml->module~fsml_utl module~fsml_dat->module~fsml_ini module~fsml_dat->module~fsml_typ module~fsml_dat->module~fsml_utl module~fsml_dst->module~fsml_ini module~fsml_con fsml_con module~fsml_dst->module~fsml_con module~fsml_err fsml_err module~fsml_dst->module~fsml_err iso_fortran_env iso_fortran_env module~fsml_ini->iso_fortran_env stdlib_ansi stdlib_ansi module~fsml_ini->stdlib_ansi stdlib_linalg stdlib_linalg module~fsml_ini->stdlib_linalg module~fsml_lin->module~fsml_ini module~fsml_lin->module~fsml_sts module~fsml_lin->module~fsml_utl module~fsml_lin->module~fsml_con module~fsml_lin->module~fsml_err module~fsml_nlp->module~fsml_ini module~fsml_nlp->module~fsml_lin module~fsml_nlp->module~fsml_sts module~fsml_nlp->module~fsml_utl module~fsml_nlp->module~fsml_con module~fsml_nlp->module~fsml_err module~fsml_sts->module~fsml_ini module~fsml_sts->module~fsml_utl module~fsml_sts->module~fsml_con module~fsml_sts->module~fsml_err module~fsml_tst->module~fsml_dst module~fsml_tst->module~fsml_ini module~fsml_tst->module~fsml_sts module~fsml_tst->module~fsml_utl module~fsml_tst->module~fsml_con module~fsml_tst->module~fsml_err module~fsml_typ->module~fsml_ini module~fsml_utl->module~fsml_ini module~fsml_utl->module~fsml_con module~fsml_con->module~fsml_ini module~fsml_err->module~fsml_ini module~fsml_err->module~fsml_utl module~fsml_err->module~fsml_con

Interfaces

public interface fsml_anova_1way

The one-way ANOVA (Analysis of Variance) tests whether three or more population means are equal.

Hypotheses:

The null hypothesis and alternative hypothesis are defined as: : , and : At least one differs from the others.

Procedure:

The data is passed to the procedure as a rank-2 array x, where each column is a group of observations. The procedure partitions then the total variability in the data ( ) into variability between groups ( ; variability explained by groups ), and variability within groups ( ; unexplained or residual variability ), so that

The F-statistic (f) is the ratio of the mean sum of squares between groups to the mean sum of squares within groups: where is the number of groups, is the total number of observations, is the sum of squares between groups, and is the sum of squares within groups.

The degrees of freedom are between groups (df_b) and within groups (df_w).

The resulting p-value (p) is computed from the F-distribution:

It is computed with the elemental procedure f_dst_f_cdf_core.

The ANOVA makes the assumptions that a) the groups are independent, b) the observations within each group are normally distributed, and c) The variances within groups are equal.

  • public impure subroutine s_tst_anova_1w(x, f, df_b, df_w, p)

    Impure wrapper procedure for s_tst_anova_1w_core.

    Arguments

    Type IntentOptional Attributes Name
    real(kind=wp), intent(in) :: x(:,:)

    2D array, each column is a group

    real(kind=wp), intent(out) :: f

    F-statistic

    real(kind=wp), intent(out) :: df_b

    degrees of freedom between groups

    real(kind=wp), intent(out) :: df_w

    degrees of freedom within groups

    real(kind=wp), intent(out) :: p

    p-value from F distribution

public interface fsml_chi2_cdf

Cumulative distribution function for the chi-squared distribution.

  • public impure function f_dst_chi2_cdf(x, df, loc, scale, tail) result(p)

    Impure wrapper function for f_dst_chi2_cdf_core. Handles optional arguments and invalid values for arguments.

    Arguments

    Type IntentOptional Attributes Name
    real(kind=wp), intent(in) :: x

    sample position

    real(kind=wp), intent(in) :: df

    degrees of freedom

    real(kind=wp), intent(in), optional :: loc

    location parameter

    real(kind=wp), intent(in), optional :: scale

    scale parameter

    character(len=*), intent(in), optional :: tail

    tail options

    Return Value real(kind=wp)

    resulting CDF value

public interface fsml_chi2_pdf

Probability density function for the chi-squared distribution. Uses intrinsic exp and gamma function. where = degrees of freedom (df) and is the gamma function.

  • public impure function f_dst_chi2_pdf(x, df, loc, scale) result(fx)

    Impure wrapper function for f_dst_chi2_pdf_core. Handles optional arguments and invalid values for arguments.

    Arguments

    Type IntentOptional Attributes Name
    real(kind=wp), intent(in) :: x

    sample position

    real(kind=wp), intent(in) :: df

    degrees of freedom

    real(kind=wp), intent(in), optional :: loc

    location parameter

    real(kind=wp), intent(in), optional :: scale

    scale parameter

    Return Value real(kind=wp)

    resulting PDF value

public interface fsml_chi2_ppf

Percent point function/quantile function for the chi-squared distribution. Uses the bisection method for numerical inversion of the CDF.

  • public impure function f_dst_chi2_ppf(p, df, loc, scale) result(x)

    Impure wrapper function for f_dst_chi2_ppf_core. Handles optional arguments and invalid values for arguments.

    Arguments

    Type IntentOptional Attributes Name
    real(kind=wp), intent(in) :: p

    probability between 0.0 and 1.0

    real(kind=wp), intent(in) :: df

    degrees of freedom

    real(kind=wp), intent(in), optional :: loc

    location parameter

    real(kind=wp), intent(in), optional :: scale

    scale parameter

    Return Value real(kind=wp)

    sample position

public interface fsml_cov

Computes the population or sample covariance (depending on passed arguments). where is the size of (or number of observations in) vectors x and y, and are individual elements in x and y, (ddof) is a degrees of freedom adjustment (ddof = 0.0 for population variance, ddof = 1.0 for sample variance), and and are the arithmetic means of x and y.

Vectors x and y must be the same size.

  • public impure function f_sts_cov(x, y, ddf) result(cov)

    Impure wrapper function for f_sts_cov_core.

    Arguments

    Type IntentOptional Attributes Name
    real(kind=wp), intent(in) :: x(:)

    x vector (assumed size array)

    real(kind=wp), intent(in) :: y(:)

    y vector (assumed size array)

    real(kind=wp), intent(in), optional :: ddf

    delta degrees of freedom

    Return Value real(kind=wp)

    covariance

public interface fsml_eof

Empirical Orthogonal Function (EOF) analysis is a procedure to reduce the dimensionality of multivariate data by identifying a set of orthogonal vectors (EOFs or eigenvectores) that represent directions of maximum variance in the dataset. The term EOF analysis is often used interchangably with the geographically weighted principal component analysis (PCA). The procedures are mathematically equivalent, but procedures for EOF analysis offer some additional options that are mostly relevant for geoscience. The procedure fsml_pca is a wrapper for fsml_eof that offers a simpler, more familiar interface for non-geoscientists.

For a classic EOF analysis, the input matrix x holds data or observations that have been discretised in time and space. Rows (m) and columns (n) can therefore be interpreted as time and space dimensions, respectively. EOF analysis allows for geographical weighting, which translates to column-wise weighting prior to analysis in the procedure. Weights can be set by bassing the rank-1 array wt of dimension n. If this optional argument is not passed, the procedure will default to equal weights of value . It is numerically more stable than 1.0, which is the default for many implementations of a PCA.

After the weighting is applied, the covariance or correlation matrix is computed: where is the preprocessed (centred and optionally standardised) data matrix, and is the number of observations (rows in x). The value of the optional argument opt determines if the covariance matrix (opt = 0) or correlation matrix (opt = 1) is constructed. If the argument is not passed, the procedure will default to the use of the covariance matrix, as is the standard for a regular PCA.

A symmetric eigen-decomposition is then performed: where contains the EOFs (eof), and is a diagonal matrix of eigenvalues (ew).

The principal components or scores (PCs, pc) are given by: The number of valid EOF/PC modes is determined by the number of non-zero eigenvalues. Arrays are initialised to zero and populated only where eigenvalues are strictly positive.

The explained variance (r2) for each component is computed as a fraction: where is the PC index, and spans all retained eigenvalues, representing all principal components that explain variability in the data.

EOFs may optionally be scaled (eof_scaled) for more convenient plotting:

Note: This subroutine uses eigh from the stdlib_linalg module to compute eigenvalues and eigenvectors of the symmetric covariance matrix.

  • public subroutine s_lin_eof(x, nd, nv, pc, eof, ew, opt, wt, r2, eof_scaled)

    Empirical Orthogonal Function (EOF) analysis

    Arguments

    Type IntentOptional Attributes Name
    real(kind=wp), intent(in) :: x(nd,nv)

    input data

    integer(kind=i4), intent(in) :: nd

    number of rows

    integer(kind=i4), intent(in) :: nv

    number of columns

    real(kind=wp), intent(out) :: pc(nd,nv)

    principal components

    real(kind=wp), intent(out) :: eof(nv,nv)

    EOFs/eigenvectors (unweighted)

    real(kind=wp), intent(out) :: ew(nv)

    eigenvalues

    integer(kind=i4), intent(in), optional :: opt

    0 = covariance, 1 = correlation

    real(kind=wp), intent(in), optional :: wt(nv)

    optional weights (default = 1.0/n)

    real(kind=wp), intent(out), optional :: r2(nv)

    explained variance (fraction)

    real(kind=wp), intent(out), optional :: eof_scaled(nv,nv)

    EOFs/eigenvectors scaled for plotting

public interface fsml_exp_cdf

Cumulative distribution function for exponential distribution.

  • public impure function f_dst_exp_cdf(x, lambda, loc, tail) result(p)

    Impure wrapper function for f_dst_exp_cdf_core. Handles optional arguments and invalid values for arguments.

    Arguments

    Type IntentOptional Attributes Name
    real(kind=wp), intent(in) :: x

    sample position

    real(kind=wp), intent(in), optional :: lambda

    lambda parameter, beta(scale) = 1/lambda = mu/mean

    real(kind=wp), intent(in), optional :: loc

    location parameter

    character(len=*), intent(in), optional :: tail

    tail options

    Return Value real(kind=wp)

    returned probability integral

public interface fsml_exp_pdf

Probability density function for exponential distribution. Uses intrinsic exp function.

  • public impure function f_dst_exp_pdf(x, lambda, loc) result(fx)

    Impure wrapper function for f_dst_exp_pdf_core. Handles optional arguments and invalid values for arguments.

    Arguments

    Type IntentOptional Attributes Name
    real(kind=wp), intent(in) :: x

    sample position

    real(kind=wp), intent(in), optional :: lambda

    lambda parameter, beta(scale) = 1/lambda = mu/mean

    real(kind=wp), intent(in), optional :: loc

    location parameter

    Return Value real(kind=wp)

public interface fsml_exp_ppf

Percent point function/quantile function for exponential distribution. Procedure uses bisection method. p should be between 0.0 and 1.0.

  • public impure function f_dst_exp_ppf(p, lambda, loc) result(x)

    Impure wrapper function for f_dst_exp_ppf_core. Handles optional arguments and invalid values for arguments.

    Arguments

    Type IntentOptional Attributes Name
    real(kind=wp), intent(in) :: p

    probability between 0.0 - 1.0

    real(kind=wp), intent(in), optional :: lambda

    lambda parameter, beta(scale) = 1/lambda = mu/mean

    real(kind=wp), intent(in), optional :: loc

    location parameter

    Return Value real(kind=wp)

    sample position

public interface fsml_f_cdf

Cumulative density function for the F distribution.

  • public impure function f_dst_f_cdf(x, d1, d2, loc, scale, tail) result(p)

    Impure wrapper function for f_dst_f_cdf_core. Handles optional arguments and invalid values for arguments.

    Arguments

    Type IntentOptional Attributes Name
    real(kind=wp), intent(in) :: x

    sample position

    real(kind=wp), intent(in) :: d1

    numerator degrees of freedom

    real(kind=wp), intent(in) :: d2

    denominator degrees of freedom

    real(kind=wp), intent(in), optional :: loc

    location parameter

    real(kind=wp), intent(in), optional :: scale

    scale parameter

    character(len=*), intent(in), optional :: tail

    tail option

    Return Value real(kind=wp)

    output probability

public interface fsml_f_pdf

Probability density function for the F distribution. where = numerator degrees of freedom, = denominator degrees of freedom and is the complete beta function. (Uses intrinsic gamma function for beta.)

The F distribution is the distribution of , where and are are random variables with chi-square distributions with and degrees of freedom, respectively.

  • public impure function f_dst_f_pdf(x, d1, d2, loc, scale) result(fx)

    Impure wrapper function for f_dst_f_pdf_core. Handles optional arguments and invalid values for arguments.

    Arguments

    Type IntentOptional Attributes Name
    real(kind=wp), intent(in) :: x

    sample position

    real(kind=wp), intent(in) :: d1

    numerator degrees of freedom

    real(kind=wp), intent(in) :: d2

    denominator degrees of freedom

    real(kind=wp), intent(in), optional :: loc

    location parameter

    real(kind=wp), intent(in), optional :: scale

    scale parameter

    Return Value real(kind=wp)

public interface fsml_f_ppf

Percent point function / quantile function for the F distribution. Uses the bisection method to numerically invert the CDF.

  • public impure function f_dst_f_ppf(p, d1, d2, loc, scale) result(x)

    Impure wrapper function for f_dst_f_ppf_core. Handles optional arguments and invalid values for arguments.

    Arguments

    Type IntentOptional Attributes Name
    real(kind=wp), intent(in) :: p

    probability (0.0 < p < 1.0)

    real(kind=wp), intent(in) :: d1

    numerator degrees of freedom

    real(kind=wp), intent(in) :: d2

    denominator degrees of freedom

    real(kind=wp), intent(in), optional :: loc

    location parameter

    real(kind=wp), intent(in), optional :: scale

    scale parameter

    Return Value real(kind=wp)

public interface fsml_gamma_cdf

Cumulative distribution function for gamma distribution.

  • public impure function f_dst_gamma_cdf(x, alpha, beta, loc, tail) result(p)

    Impure wrapper function for f_dst_gamma_cdf_core. Handles optional arguments and invalid values for arguments.

    Arguments

    Type IntentOptional Attributes Name
    real(kind=wp), intent(in) :: x

    sample position

    real(kind=wp), intent(in), optional :: alpha

    shape parameter

    real(kind=wp), intent(in), optional :: beta

    scale parameter

    real(kind=wp), intent(in), optional :: loc

    location parameter

    character(len=*), intent(in), optional :: tail

    tail options

    Return Value real(kind=wp)

    returned probability integral

public interface fsml_gamma_pdf

Probability density function for gamma distribution. Uses intrinsic exp function.

  • public impure function f_dst_gamma_pdf(x, alpha, beta, loc) result(fx)

    Impure wrapper function for f_dst_gamma_pdf_core. Handles optional arguments and invalid values for arguments.

    Arguments

    Type IntentOptional Attributes Name
    real(kind=wp), intent(in) :: x

    sample position

    real(kind=wp), intent(in), optional :: alpha

    shape parameter

    real(kind=wp), intent(in), optional :: beta

    scale parameter

    real(kind=wp), intent(in), optional :: loc

    location parameter

    Return Value real(kind=wp)

public interface fsml_gamma_ppf

Percent point function/quantile function for gamma distribution. Procedure uses bisection method. p should be between 0.0 and 1.0.

  • public impure function f_dst_gamma_ppf(p, alpha, beta, loc) result(x)

    Impure wrapper function for f_dst_gamma_ppf_core. Handles optional arguments and invalid values for arguments.

    Arguments

    Type IntentOptional Attributes Name
    real(kind=wp), intent(in) :: p

    probability between 0.0 - 1.0

    real(kind=wp), intent(in), optional :: alpha

    shape parameter

    real(kind=wp), intent(in), optional :: beta

    scale parameter

    real(kind=wp), intent(in), optional :: loc

    location parameter

    Return Value real(kind=wp)

    sample position

public interface fsml_gpd_cdf

Cumulative distribution function for generalised pareto distribution.

  • public impure function f_dst_gpd_cdf(x, xi, mu, sigma, tail) result(p)

    Impure wrapper function for f_dst_gpd_cdf_core. Handles optional arguments and invalid values for arguments.

    Arguments

    Type IntentOptional Attributes Name
    real(kind=wp), intent(in) :: x

    sample position

    real(kind=wp), intent(in) :: xi

    distribution shape parameter

    real(kind=wp), intent(in), optional :: mu

    distribution location

    real(kind=wp), intent(in), optional :: sigma

    distribution dispersion/scale (must be positive)

    character(len=*), intent(in), optional :: tail

    tail options

    Return Value real(kind=wp)

    returned probability integral

public interface fsml_gpd_pdf

Probability density function for generalised pareto distribution. where is a shape parameter (xi), is the scale parameter (sigma), (mu) is the location (not mean).

  • public impure function f_dst_gpd_pdf(x, xi, mu, sigma) result(fx)

    Impure wrapper function for f_dst_gpd_pdf_core. Handles optional arguments and invalid values for arguments.

    Arguments

    Type IntentOptional Attributes Name
    real(kind=wp), intent(in) :: x

    sample position

    real(kind=wp), intent(in) :: xi

    distribution shape parameter

    real(kind=wp), intent(in), optional :: mu

    distribution location

    real(kind=wp), intent(in), optional :: sigma

    distribution dispersion/scale (must be positive)

    Return Value real(kind=wp)

public interface fsml_gpd_ppf

Percent point function/quantile function for generalised pareto distribution. Procedure uses bisection method. p must be between 0.0 and 1.0.

  • public impure function f_dst_gpd_ppf(p, xi, mu, sigma) result(x)

    Impure wrapper function for f_dst_gpd_ppf_core. Handles optional arguments and invalid values for arguments.

    Arguments

    Type IntentOptional Attributes Name
    real(kind=wp), intent(in) :: p

    probability between 0.0 - 1.0

    real(kind=wp), intent(in), optional :: xi

    distribution shape parameter

    real(kind=wp), intent(in), optional :: mu

    distribution location

    real(kind=wp), intent(in), optional :: sigma

    distribution dispersion/scale (must be positive)

    Return Value real(kind=wp)

    sample position

public interface fsml_hclust

The procedure is an implementation of the agglomerative hierarchical clustering method that groups data points into clusters by iteratively merging the most similar clusters. The procedure uses centroid linkage and the Mahalanobis distance as a measure of similarity.

The input matrix (x) holds observations in rows (nd) and variables in columns (nv). The target number of clusters (nc) must be at least 1 and not greater than the number of data points.

The variables are standardised before computing the covariance matrix on the transformed data. The matrix is used for calculating the Mahalanobis distance.

Clusters are merged iteratively until the target number of clusters is reached. The global mean (gm), cluster centroids (cm), membership assignments (cl), and cluster sizes (cc), the covariance matrix (cov) and standard deviations (sigma) used in the distance calculations are returned.

  • public impure subroutine s_nlp_hclust(x, nd, nv, nc, gm, cm, cl, cc, cov, sigma)

    Impure wrapper procedure for s_nlp_hclust_core.

    Arguments

    Type IntentOptional Attributes Name
    real(kind=wp), intent(in) :: x(nd,nv)

    input data matrix (samples, variables)

    integer(kind=i4), intent(in) :: nd

    number of data points

    integer(kind=i4), intent(in) :: nv

    number of variables

    integer(kind=i4), intent(in) :: nc

    number of clusters (target)

    real(kind=wp), intent(out) :: gm(nv)

    global means for each variable

    real(kind=wp), intent(out) :: cm(nv,nc)

    cluster centroids

    integer(kind=i4), intent(out) :: cl(nd)

    cluster assignments for each data point

    integer(kind=i4), intent(out) :: cc(nc)

    cluster sizes

    real(kind=wp), intent(out) :: cov(nv,nv)

    covariance matrix

    real(kind=wp), intent(out) :: sigma(nv)

    standard deviation per variable

public interface fsml_hkmeans

The procedure implements a hybrid clustering approach combining agglomerative hierarchical clustering and k-means clustering, both using the Mahalanobis distance as the similarity measure. The hierarchical step first partitions the data into nc clusters by iteratively merging the most similar clusters. The resulting centroids from are then used as initial centroids (cm_in) for the k-means procedure, which refines them iteratively.

The input matrix (x) holds observations in rows (nd) and variables in columns (nv). The number of clusters (nc) must be at least 1 and not greater than the number of data points. In the hierarchical clustering step, variables are standardised before computing the covariance matrix on the transformed data. The covariance matrix is passed to the k-means clustering procedure along with the initial cluster centroids. The k-means clustering step then assigns each observation to the nearest centroid, recomputes centroids from cluster memberships, and iterates until convergence or the iteration limit is reached. Final centroids are sorted by the first variable, and assignments are updated accordingly.

The global mean (gm), final cluster centroids (cm), membership assignments (cl), and cluster sizes (cc), the covariance matrix (cov) and standard deviations (sigma) used in the distance calculations are returned.

Note: This procedure uses the pure procedure for calculating the Mahalanobis distance f_lin_mahalanobis_core, which useschol from the stdlib_linalg module.

  • public impure subroutine s_nlp_hkmeans(x, nd, nv, nc, gm, cm, cl, cc, cov, sigma)

    Impure wrapper procedure for s_nlp_hkmeans_core.

    Arguments

    Type IntentOptional Attributes Name
    real(kind=wp), intent(in) :: x(nd,nv)

    input data matrix (samples, variables)

    integer(kind=i4), intent(in) :: nd

    number of data points

    integer(kind=i4), intent(in) :: nv

    number of variables

    integer(kind=i4), intent(in) :: nc

    number of clusters (target)

    real(kind=wp), intent(out) :: gm(nv)

    global means for each variable

    real(kind=wp), intent(out) :: cm(nv,nc)

    cluster centroids

    integer(kind=i4), intent(out) :: cl(nd)

    cluster assignments for each data point

    integer(kind=i4), intent(out) :: cc(nc)

    cluster sizes

    real(kind=wp), intent(out) :: cov(nv,nv)

    covariance matrix

    real(kind=wp), intent(out) :: sigma(nv)

    standard deviation per variable

public interface fsml_kmeans

The procedure implements the K-means clustering algorithm using the Mahalanobis distance as the similarity measure. It accepts initial centroids (cm_in), refines them iteratively, and returns the final centroids (cm).

The input matrix (x) holds observations in rows (nd) and variables in columns (nv). The number of clusters (nc) must be at least 1 and not greater than the number of data points. The procedure assigns each observation to the nearest centroid using the Mahalanobis distance, recomputes centroids from cluster memberships, and iterates until convergence or the iteration limit is reached. Final centroids are sorted by the first variable, and assignments are updated accordingly.

If the covariance matrix (cov_in) is passed, it will be used to calculate the Mahalanobis distance. If it is not passed, the variables are standardised before computing the covariance matrix on the transformed data.

The global mean (gm), cluster centroids (cm), membership assignments (cl), and cluster sizes (cc), the covariance matrix (cov - either cov_in or internally calculated) and standard deviations (sigma) used in the distance calculations are returned.

  • public impure subroutine s_nlp_kmeans(x, nd, nv, nc, cm_in, gm, cm, cl, cc, cov, sigma, cov_in)

    Impure wrapper procedure for s_nlp_kmeans_core.

    Arguments

    Type IntentOptional Attributes Name
    real(kind=wp), intent(in) :: x(nd,nv)

    raw data (samples, variables)

    integer(kind=i4), intent(in) :: nd

    number of data points

    integer(kind=i4), intent(in) :: nv

    number of variables

    integer(kind=i4), intent(in) :: nc

    number of clusters

    real(kind=wp), intent(in) :: cm_in(nv,nc)

    initial centroids (raw, not standardised)

    real(kind=wp), intent(out) :: gm(nv)

    global means

    real(kind=wp), intent(out) :: cm(nv,nc)

    centroids (refined, standardised)

    integer(kind=i4), intent(out) :: cl(nd)

    cluster assignments

    integer(kind=i4), intent(out) :: cc(nc)

    cluster sizes

    real(kind=wp), intent(out) :: cov(nv,nv)

    covariance matrix

    real(kind=wp), intent(out) :: sigma(nv)

    standard deviations per variable

    real(kind=wp), intent(in), optional :: cov_in(nv,nv)

    optional covariance matrix

public interface fsml_kruskalwallis

The Kruskal-Wallis H-test is used to determine whether samples originate from the same distribution without assuming normality. It is therefore considered a nonparametric alternative to the one-way ANOVA (Analysis of Variance).

Hypotheses:

The null hypothesis and alternative hypothesis are defined as: : The populations have the same distribution (medians are equal), and : At least one population differs from the others.

Procedure:

The data is passed to the procedure as a rank-2 array x, where each column is a group of observations. All values are ranked across the entire dataset, with tied values assigned the average rank.

The Kruskal-Wallis H-statistic (h) is computed as: where: - is the total number of observations, - is the number of groups, - is the number of observations in group , and - is the sum of ranks in group .

The degrees of freedom are: and returned as df.

The p-value (p) is computed from the chi-squared distribution:

It is computed using the elemental procedure f_dst_chi2_cdf_core.

The Kruskal-Wallis test assumes that: a) all groups are independent, b) the response variable is ordinal or continuous, c) the group distributions have the same shape, and d) observations are independent both within and between groups.

  • public impure subroutine s_tst_kruskalwallis(x, h, df, p)

    Impure wrapper procedure for s_tst_kruskalwallis_core.

    Arguments

    Type IntentOptional Attributes Name
    real(kind=wp), intent(in) :: x(:,:)

    2D array, each column is a group

    real(kind=wp), intent(out) :: h

    Kruskal-Wallis H-statistic

    real(kind=wp), intent(out) :: df

    degrees of freedom (k - 1)

    real(kind=wp), intent(out) :: p

    p-value from chi-squared distribution

public interface fsml_lda_2class

interface fsml_lda_2class The 2-class multivariate Linear Discriminant Analysis (LDA) is a statistical procedure for classification and the investigation and explanation of differences between two groups (or classes) with regard to their attribute variables. It quantifies the discriminability of the groups and the contribution of each of the attribute variables to this discriminability.

The procedure finds a discriminant function that best separates the two groups. The function can be expressed as a linear combination of the attribute variables:

where is the discriminant function, are the attribute variables used in evaluating the differences between the groups, are the discriminant coefficients associated with each variable, (nv) is the number of variables, and is the y-intercept. (Note: Mathematically, it is analogous to a multivariate linear regression function.)

Each attribute variable contains elements (x), where (nd) is the number of elements in each group. Each element is associated with a discriminant value described by:

Geometrically, this can be visualised as elements being projected on the discriminant axis . The optimal discriminant function is then determined by finding an axis, on which the projected elements for the two groups are best separated. The best separation is given by maximising the discriminant criterion (g), a signal to noise ratio, so that:

where and are the number of elements in groups and , respectively. The procedure assumes that these are the same (nd) and only accepts 2 groups (nc = 2).

The discriminant coefficients are then standardised (sa) using the standard deviations of respective variables. The discriminant function represents a model that best seperates the groups and can be used as a classification model. The skill of that model is determined by forgetting the association of each element with the groups and using the model to reclassify the elements. The score (score) is the fraction of correct classifications and can be interpreted as a measure of how well the function works as a classification model.

The procedure optionally returns the Mahalanobis distance (mh) as a measure of distance between the groups.

Note: This subroutine uses eigh from the stdlib_linalg module.

  • public subroutine s_lin_lda_2c(x, nd, nv, nc, sa, g, score, mh)

    2-class multivariate Linear Discriminant Analysis (LDA)

    Performs classification and returns: - Standardised discriminant coefficients - Reclassification accuracy - Mahalanobis distance - Discriminant criterion

    Arguments

    Type IntentOptional Attributes Name
    real(kind=wp), intent(in) :: x(nd,nv,nc)

    input data (nd samples × nv variables × nc classes)

    integer(kind=i4), intent(in) :: nd

    number of datapoints per class

    integer(kind=i4), intent(in) :: nv

    number of variables

    integer(kind=i4), intent(in) :: nc

    number of classes (must be 2)

    real(kind=wp), intent(out) :: sa(nv)

    standardised discriminant coefficients

    real(kind=wp), intent(out) :: g

    discriminant criterion

    real(kind=wp), intent(out) :: score

    classification score

    real(kind=wp), intent(out), optional :: mh

    Mahalanobis distance

public interface fsml_mahalanobis

Computes the Mahalanobis distance between two input feature vectors x and y. If a covariance matrix cov is provided, it is used directly in the calculation. Otherwise, the procedure estimates the covariance matrix from the two-sample dataset formed by x and y. A Cholesky-based solver is used to perform the distance calculation.

The Mahalanobis distance is defined as:

where is the covariance matrix. The inverse is applied via the Cholesky decomposition for numerical stability.

Note: If passed, the covariance matrix (cov) must be positive definite for the factorisation to succeed.

  • public impure function f_lin_mahalanobis(x, y, cov) result(dist)

    Impure wrapper function for f_lin_mahalanobis_core.

    Arguments

    Type IntentOptional Attributes Name
    real(kind=wp), intent(in) :: x(:)

    input vector 1

    real(kind=wp), intent(in) :: y(:)

    input vector 2

    real(kind=wp), intent(in), optional :: cov(:,:)

    covariance matrix

    Return Value real(kind=wp)

    Mahalanobis distance

public interface fsml_mean

Computes arithmetic mean. where is the size of (or number of observations in) vector x, are individual elements in x, and is the arithmetic mean of x.

  • public impure function f_sts_mean(x) result(mean)

    Impure wrapper function for f_sts_mean_core.

    Arguments

    Type IntentOptional Attributes Name
    real(kind=wp), intent(in) :: x(:)

    x vector (assumed size array)

    Return Value real(kind=wp)

    arithmetic mean

public interface fsml_median

Computes median of vector x. The procedures can handle tied ranks.

  • public impure function f_sts_median(x) result(median)

    Impure wrapper function for f_sts_median_core.

    Arguments

    Type IntentOptional Attributes Name
    real(kind=wp), intent(in) :: x(:)

    x vector (assumed size array)

    Return Value real(kind=wp)

    median

public interface fsml_norm_cdf

Cumulative distribution function for normal distribution.

The location parameter (mu) is an optional argument and will default to 0.0 if not passed. The scale parameter (sigma) is an optional argument. If passed, it must be non-zero positive. It will default to 1.0 if not passed. The tail option (tail) is an optional argument. If passed, it must be one of the following: "left", "right", "two", or "confidence". If not passed, it will default to "left".

  • public impure function f_dst_norm_cdf(x, mu, sigma, tail) result(p)

    Impure wrapper function for f_dst_norm_cdf_core. Handles optional arguments and invalid values for arguments.

    Arguments

    Type IntentOptional Attributes Name
    real(kind=wp), intent(in) :: x

    sample position

    real(kind=wp), intent(in), optional :: mu

    distribution location (mean)

    real(kind=wp), intent(in), optional :: sigma

    distribution dispersion/scale (standard deviation)

    character(len=*), intent(in), optional :: tail

    tail options

    Return Value real(kind=wp)

    returned probability integral

public interface fsml_norm_pdf

Probability density function for normal distribution.

The location parameter (mu) is an optional argument and will default to 0.0 if not passed. The scale parameter (sigma) is an optional argument. If passed, it must be non-zero positive. It will default to 1.0 if not passed.

  • public impure function f_dst_norm_pdf(x, mu, sigma) result(fx)

    Impure wrapper function for f_dst_norm_pdf_core. Handles optional arguments and invalid values for arguments.

    Arguments

    Type IntentOptional Attributes Name
    real(kind=wp), intent(in) :: x

    sample position

    real(kind=wp), intent(in), optional :: mu

    distribution location (mean)

    real(kind=wp), intent(in), optional :: sigma

    distribution dispersion/scale (standard deviation)

    Return Value real(kind=wp)

public interface fsml_norm_ppf

Percent point function/quantile function for normal distribution.

The probability (p)must be between 0.0 and 1.0. The location parameter (mu) is an optional argument and will default to 0.0 if not passed. The scale parameter (sigma) is an optional argument. If passed, it must be non-zero positive. It will default to 1.0 if not passed.

The procedure uses bisection method. Conditions p=0.0 and p=1.0 cannot return negative and positive infinity; will return large negative or positive numbers (highly dependent on the tolerance threshold).

  • public impure function f_dst_norm_ppf(p, mu, sigma) result(x)

    Impure wrapper function for f_dst_norm_ppf_core. Handles optional arguments and invalid values for arguments.

    Arguments

    Type IntentOptional Attributes Name
    real(kind=wp), intent(in) :: p

    probability between 0.0 - 1.0

    real(kind=wp), intent(in), optional :: mu

    distribution location (mean)

    real(kind=wp), intent(in), optional :: sigma

    distribution dispersion/scale (standard deviation)

    Return Value real(kind=wp)

    sample position

public interface fsml_ols

The multiple linear Ordinary Least Squares (OLS) regression models the relationship or linear dependence between a dependent (predictand) variable and and one or more independent (predictor) variables. The procedure estimates the linear regression coefficients by minimising the sum of squared residuals.

The estimated regression model is of the form:

where is the predictand variable, are the predictor variables (x) with nd observations, is the y-intercept (b0), (b) are the regression coefficients, and (nv) is the number of predictors (excluding the intercept).

The subroutine constructs a full matrix internally by prepending a column of ones to account for the intercept. The regression coefficients are estimated as:

where is the extended design matrix including the intercept term.

The coefficient of determination (r2) which represents the proportion of the total variance of (y) explained by the predictors. The predicted values (y_hat), standard errors (se) of the coefficients, and the covariance matrix of the predictors (cov_b) can optionally be returned by the procedure, too.

Note: This subroutine uses eigh from the stdlib_linalg module. Note: The intercept and predictor coefficients are computed separately and returned explicitly.

  • public subroutine s_lin_ols(x, y, nd, nv, b0, b, r2, y_hat, se, cov_b)

    Multiple Linear Ordinary Least Squares (OLS) Regression with intercept. NOTE: OLS could be wrapper for ridge (with lambda = 0 or presence checks if mande an optional argument). However, it would increase computation slightly and make code less readable. OLS is often used in teaching and therefore, an easily readable standalone is kept.

    Computes: - Intercept b0 (scalar) - Predictor coefficients b(nv) - Coefficient of determination R² - Standard errors se(nv) - Covariance matrix of predictors covb(nv,nv)

    Arguments

    Type IntentOptional Attributes Name
    real(kind=wp), intent(in) :: x(nd,nv)

    predictor data matrix (no intercept column)

    real(kind=wp), intent(in) :: y(nd)

    response vector

    integer(kind=i4), intent(in) :: nd

    number of datapoints

    integer(kind=i4), intent(in) :: nv

    number of predictors (excluding intercept)

    real(kind=wp), intent(out) :: b0

    intercept coefficient

    real(kind=wp), intent(out) :: b(nv)

    predictor coefficients

    real(kind=wp), intent(out) :: r2

    coefficient of determination R²

    real(kind=wp), intent(out), optional :: y_hat(nd)

    predicted y values

    real(kind=wp), intent(out), optional :: se(nv)

    standard errors of predictor coefficients

    real(kind=wp), intent(out), optional :: cov_b(nv,nv)

    covariance matrix of predictor coefficients

public interface fsml_pca

Principal Component Analysis (PCA) is a procedure to reduce the dimensionality of multivariate data by identifying a set of orthogonal vectors (eigenvectores) that represent directions of maximum variance in the dataset.

The procedure fsml_pca is a wrapper for fsml_eof and offers a simpler, more familiar interface for non-geoscientists. The EOF interface allows for more options to be passed that are irrelevant to standard applications of PCA. The PCA procedure calls the EOF procedures with weights (wt) set to 1.0, and matrix options set to opt = 0 to force the use of the covariance matrix to be comparable to other common implementations of a PCA (e.g., sklearn).

The covariance matrix is computed as: where is the preprocessed (centred and optionally standardised) data matrix, and is the number of observations (rows in x).

A symmetric eigen-decomposition is then performed: where contains the EOFs (ev), and is a diagonal matrix of eigenvalues (ew).

The principal components or scores (PCs, pc) are given by: The number of valid PC modes is determined by the number of non-zero eigenvalues. Arrays are initialised to zero and populated only where eigenvalues are strictly positive.

The explained variance (r2) for each component is computed as a fraction: where is the PC index, and spans all retained eigenvalues, representing all principal components that explain variability in the data.

Note: This subroutine uses eigh from the stdlib_linalg module to compute eigenvalues and eigenvectors of the symmetric covariance matrix.

  • public subroutine s_lin_pca(x, nd, nv, pc, ev, ew, r2)

    Principal Component Analysis (PCA). It is a special (simplified) case of EOF analysis offered as a separate procedure for clarity/familiarity. It calls s_lin_eof with equal weights.

    Arguments

    Type IntentOptional Attributes Name
    real(kind=wp), intent(in) :: x(nd,nv)

    input data

    integer(kind=i4), intent(in) :: nd

    number of rows

    integer(kind=i4), intent(in) :: nv

    number of columns

    real(kind=wp), intent(out) :: pc(nd,nv)

    principal components

    real(kind=wp), intent(out) :: ev(nv,nv)

    eigenvectors (unweighted)

    real(kind=wp), intent(out) :: ew(nv)

    eigenvalues

    real(kind=wp), intent(out), optional :: r2(nv)

    explained variance (fraction)

public interface fsml_pcc

Computes Pearson correlation coefficient (PCC). where is the Pearson correlation coefficient for vectors x and y, is the covariance of x and y, and and are the standard deviations of x and y.

Vectors x and y must be the same size.

  • public impure function f_sts_pcc(x, y) result(corr)

    Impure wrapper function for f_sts_trend_core.

    Arguments

    Type IntentOptional Attributes Name
    real(kind=wp), intent(in) :: x(:)

    x vector (assumed size array)

    real(kind=wp), intent(in) :: y(:)

    y vector (assumed size array)

    Return Value real(kind=wp)

    Pearson correlation coefficient

public interface fsml_rank

Ranks all samples such that the smallest value obtains rank 1 and the largest rank n. Handles tied ranks and assigns average rank to tied elements within one group of tied elements.

  • public pure subroutine s_utl_rank(x, ranks)

    Ranks all samples such that the smallest value obtains rank 1 and the largest rank n. Handles tied ranks and assigns average rank to tied elements within one group of tied elements.

    Arguments

    Type IntentOptional Attributes Name
    real(kind=wp), intent(in) :: x(:)

    x array

    real(kind=wp), intent(out), allocatable :: ranks(:)

    ranks of x

public interface fsml_ranksum

The ranks sum test (Wilcoxon rank-sum test or Mann–Whitney U test) is a non-parametric test to determine if two independent samples and are have the same distribution. It can be regarded as the non-parametric equivalent of the 2-sample t-test.

Hypotheses:

The null hypothesis and alternative hypothesis can be written as: : the distributions of and are equal. : the distributions of and are not equal.

Procedure:

The Mann–Whitney U statistic is calculated for each sample as follows: where is the sum of ranks of sample set and is the sample size of sample set . The final U statistic is:

The procedure takes into consideration tied ranks.

  • public impure subroutine s_tst_ranksum(x1, x2, u, p, h1)

    Impure wrapper procedure for s_tst_ranksum_core.

    Arguments

    Type IntentOptional Attributes Name
    real(kind=wp), intent(in) :: x1(:)

    x1 vector (samples)

    real(kind=wp), intent(in) :: x2(:)

    x2 vector (samples)

    real(kind=wp), intent(out) :: u

    U statistic

    real(kind=wp), intent(out) :: p

    p-value

    character(len=*), intent(in), optional :: h1

    option: "two" (default), "lt", or "gt"

public interface fsml_read_csv

Read CSV file directly into dataframe.

  • public subroutine s_dat_read_csv(infile, df, labelcol, labelrow, delimiter)

    Read CSV file directly into dataframe.

    Arguments

    Type IntentOptional Attributes Name
    character(len=*), intent(in) :: infile

    read csv file

    type(fsml_typ_df), intent(inout) :: df

    dataframe

    logical, intent(in), optional :: labelcol

    true if first column contains row labels

    logical, intent(in), optional :: labelrow

    true if first row contains column lavels

    character(len=1), intent(in), optional :: delimiter

    single char delimiter

public interface fsml_ridge

The multiple linear Ridge regression models the relationship or linear dependence between a dependent (predictand) variable and one or more independent (predictor) variables, incorporating a penalty term on the size of the regression coefficients to reduce multicollinearity and overfitting.

The procedure estimates the linear regression coefficients by minimising the sum of squared residuals plus a penalty proportional to the square of the magnitude of coefficients:

where (lambda) is the ridge penalty parameter, and is the identity matrix with the first diagonal element corresponding to the intercept set to zero (no penalty on intercept).

The estimated regression model is of the form:

where is the predictand variable, are the predictor variables (x) with nd observations, is the y-intercept (b0), (b) are the ridge regression coefficients, and (nv) is the number of predictors (excluding the intercept).

The subroutine constructs a full matrix internally by prepending a column of ones to account for the intercept. The coefficient of determination (r2), predicted values (y_hat), ridge-adjusted standard errors (se) of the coefficients, and the ridge-adjusted covariance matrix of the predictors (cov_b) can optionally be returned. The covariance matrix and standard errors are adjusted for the ridge penalty as:

where is the residual variance estimate.

Note: This subroutine uses eigh from the stdlib_linalg module.

  • public subroutine s_lin_ridge(x, y, nd, nv, lambda, b0, b, r2, y_hat, se, cov_b)

    Multiple Linear Ridge Regression (λ >= 0) with intercept.

    Computes: - Intercept b0 (scalar) - Predictor coefficients b(nv) - Coefficient of determination R² - Standard errors se(nv) (ridge-adjusted) - Covariance matrix of predictors covb(nv,nv) (ridge-adjusted)

    Notes: - When lambda (λ) = 0, this reduces to ordinary least squares (OLS). - Ridge modifies the variance-covariance formula: cov(β) = σ² (XᵀX + λI)⁻¹ XᵀX (XᵀX + λI)⁻¹ This shrinks coefficients and affects SEs.

    Arguments

    Type IntentOptional Attributes Name
    real(kind=wp), intent(in) :: x(nd,nv)

    predictor data matrix (no intercept column)

    real(kind=wp), intent(in) :: y(nd)

    response vector

    integer(kind=i4), intent(in) :: nd

    number of datapoints

    integer(kind=i4), intent(in) :: nv

    number of predictors (excluding intercept)

    real(kind=wp), intent(in) :: lambda

    ridge penalty parameter (≥ 0, non-optional)

    real(kind=wp), intent(out) :: b0

    intercept coefficient

    real(kind=wp), intent(out) :: b(nv)

    predictor coefficients

    real(kind=wp), intent(out) :: r2

    coefficient of determination R²

    real(kind=wp), intent(out), optional :: y_hat(nd)

    predicted y values

    real(kind=wp), intent(out), optional :: se(nv)

    standard errors of predictor coefficients

    real(kind=wp), intent(out), optional :: cov_b(nv,nv)

    covariance matrix of predictor coefficients

public interface fsml_scc

Computes the Spearman rank correlation coefficient (SCC). The procedure gets the ranks of cectors x and y, then calculates the Pearson correlation coefficient on these ranks.

Vectors x and y must be the same size.

  • public impure function f_sts_scc(x, y) result(corr)

    Impure wrapper for f_sts_scc_core.

    Arguments

    Type IntentOptional Attributes Name
    real(kind=wp), intent(in) :: x(:)

    x vector (assumed size array)

    real(kind=wp), intent(in) :: y(:)

    y vector (assumed size array)

    Return Value real(kind=wp)

    Spearman correlation coefficient

public interface fsml_signedrank_1sample

The 1-sample Wilcoxon signed rank test is a non-parametric test that determines if data comes from a symmetric population with centre . It can be regarded as a non-parametric version of the 1-sample t-test.

Hypotheses:

If the data consists of independent and similarly distributed samples from distribution , the null hypothesis can be expressed as:

is symmetric around .

The default alternative hypothesis is two-sided and also be set explicitly (h1 = "two"). It can be expressed as:

is symmetric around

If the alternative hypothesis is set to "greater than" (h1 = "gt"), it is:

is symmetric around

If the alternative hypothesis is set to "less than" (h1 = "lt"), it is:

is symmetric around

Procedure:

The test statistic is the smaller of the sum of positive and negative signed ranks:

The procedure takes into consideration tied ranks.

  • public impure subroutine s_tst_signedrank_1s(x, mu0, w, p, h1)

    Impure wrapper procedure for s_tst_signedrank_1s_core.

    Arguments

    Type IntentOptional Attributes Name
    real(kind=wp), intent(in) :: x(:)

    x vector (samples)

    real(kind=wp), intent(in) :: mu0

    population mean (null hypothesis expected value)

    real(kind=wp), intent(out) :: w

    W statistic (sum of signed ranks)

    real(kind=wp), intent(out) :: p

    p-value

    character(len=*), intent(in), optional :: h1

    : "two" (default), "lt", "gt"

public interface fsml_signedrank_paired

The Wilcoxon signed rank test is a non-parametric test that determines if two related paired samples come from the same distribution. It can be regarded as a non-parametric version of the paired t-test.

Hypotheses:

The Wilcoxon signed rank test is mathematically equivalent to the 1-sample Wilcoxon signed rank test conducted on the difference vector with set to zero. Consequently, the the null hypothesis can be expressed as:

Samples are symmetric around .

The default alternative hypothesis is two-sided and also be set explicitly (h1 = "two"). It can be expressed as:

Samples are symmetric around

If the alternative hypothesis is set to "greater than" (h1 = "gt"), it is:

Samples are symmetric around

If the alternative hypothesis is set to "less than" (h1 = "lt"), it is:

Samples are symmetric around

The procedure takes into consideration tied ranks.

  • public impure subroutine s_tst_signedrank_2s(x1, x2, w, p, h1)

    Impure wrapper procedure for s_tst_signedrank_2s_core.

    Arguments

    Type IntentOptional Attributes Name
    real(kind=wp), intent(in) :: x1(:)

    sample 1 (paired data)

    real(kind=wp), intent(in) :: x2(:)

    sample 2 (paired data)

    real(kind=wp), intent(out) :: w

    W statistic (sum of signed ranks)

    real(kind=wp), intent(out) :: p

    p-value

    character(len=*), intent(in), optional :: h1

    : "two" (default), "lt", "gt"

public interface fsml_std

Computes the population or sample standard deviation (depending on passed arguments). where is the variance of vector x. (ddof) can also be passed and serves as a degrees of freedom adjustment when the variance is caulculated. (ddof = 0.0 for population standard deviation, ddof = 1.0 for sample standard deviation)

  • public impure function f_sts_std(x, ddf) result(std)

    Impure wrapper function for f_sts_std_core.

    Arguments

    Type IntentOptional Attributes Name
    real(kind=wp), intent(in) :: x(:)

    x vector (assumed size array)

    real(kind=wp), intent(in), optional :: ddf

    delta degrees of freedom

    Return Value real(kind=wp)

    standard deviation

public interface fsml_t_cdf

Cumulative distribution function for student t distribution.

The value for degrees of freedom (df) must be 1.0 or higher. The location parameter (mu) is an optional argument and will default to 0.0 if not passed. The scale parameter (sigma) is an optional argument. If passed, it must be non-zero positive. It will default to 1.0 if not passed. The tail option (tail) is an optional argument. If passed, it must be one of the following: "left", "right", "two", or "confidence". If not passed, it will default to "left".

  • public impure function f_dst_t_cdf(x, df, mu, sigma, tail) result(p)

    Impure wrapper function for f_dst_t_cdf_core. Handles optional arguments and invalid values for arguments.

    Arguments

    Type IntentOptional Attributes Name
    real(kind=wp), intent(in) :: x

    sample position

    real(kind=wp), intent(in) :: df

    degrees of freedom

    real(kind=wp), intent(in), optional :: mu

    distribution location (mean)

    real(kind=wp), intent(in), optional :: sigma

    distribution dispersion/scale (standard deviation)

    character(len=*), intent(in), optional :: tail

    tail options

    Return Value real(kind=wp)

    returned probability integral

public interface fsml_t_pdf

Probability density function for student t distribution. Uses intrinsic gamma function (Fortran 2008 and later). where = degrees of freedom (df) and is the gamma function.

The value for degrees of freedom (df) must be 1.0 or higher. The location parameter (mu) is an optional argument and will default to 0.0 if not passed. The scale parameter (sigma) is an optional argument. If passed, it must be non-zero positive. It will default to 1.0 if not passed.

  • public impure function f_dst_t_pdf(x, df, mu, sigma) result(fx)

    Impure wrapper function for f_dst_t_pdf_core. Handles optional arguments and invalid values for arguments.

    Arguments

    Type IntentOptional Attributes Name
    real(kind=wp), intent(in) :: x

    sample position

    real(kind=wp), intent(in) :: df

    degrees of freedom

    real(kind=wp), intent(in), optional :: mu

    distribution location (~mean)

    real(kind=wp), intent(in), optional :: sigma

    distribution dispersion/scale (~standard deviation)

    Return Value real(kind=wp)

public interface fsml_t_ppf

Percent point function/quantile function for t distribution.

Procedure uses bisection method. Conditions p=0.0 and p=1.0 cannot return negative and positive infinity; will return large negative or positive numbers (highly dependent on the tolerance threshold).

The value for degrees of freedom (df) must be 1.0 or higher. The location parameter (mu) is an optional argument and will default to 0.0 if not passed. The scale parameter (sigma) is an optional argument. If passed, it must be non-zero positive. It will default to 1.0 if not passed.

  • public impure function f_dst_t_ppf(p, df, mu, sigma) result(x)

    Impure wrapper function for f_dst_t_ppf_core. Handles optional arguments and invalid values for arguments.

    Arguments

    Type IntentOptional Attributes Name
    real(kind=wp), intent(in) :: p

    probability between 0.0 - 1.0

    real(kind=wp), intent(in) :: df

    degrees of freedom

    real(kind=wp), intent(in), optional :: mu

    distribution location (mean)

    real(kind=wp), intent(in), optional :: sigma

    distribution dispersion/scale (standard deviation)

    Return Value real(kind=wp)

    sample position

public interface fsml_trend

Computes regression coefficient/trend. where is the slope of the regression line (linear trend), is the covariance of x and y, and is the variance of x.

Vectors x and y must be the same size.

  • public impure function f_sts_trend(x, y) result(trend)

    Impure wrapper function for f_sts_trend_core.

    Arguments

    Type IntentOptional Attributes Name
    real(kind=wp), intent(in) :: x(:)

    x vector (assumed size array)

    real(kind=wp), intent(in) :: y(:)

    y vector (assumed size array)

    Return Value real(kind=wp)

    trend/regression slope

public interface fsml_ttest_1sample

The 1-sample t-test determines if the sample mean has the value specified in the null hypothesis.

Hypotheses:

The null hypothesis and alternative hypothesis can be written as: : , and :

Procedure:

The test statstic is calculated as follows: where is the sample mean, is the sample standard deviation, is the sample size, and is the population mean.

The degrees of freedom is calculated as follows:

  • public impure subroutine s_tst_ttest_1s(x, mu0, t, df, p, h1)

    Impure wrapper procedure for s_tst_ttest_1s_core.

    Arguments

    Type IntentOptional Attributes Name
    real(kind=wp), intent(in) :: x(:)

    x vector (samples)

    real(kind=wp), intent(in) :: mu0

    population mean (null hypothesis expected value)

    real(kind=wp), intent(out) :: t

    test statistic

    real(kind=wp), intent(out) :: df

    degrees of freedom

    real(kind=wp), intent(out) :: p

    p-value

    character(len=*), intent(in), optional :: h1

    option: two (default), le, ge

public interface fsml_ttest_2sample

The 2-sample t-test determines if two population means and are the same. The procedure can handle 2-sample t-tests for equal variances and Welch's t-tests for unequal variances.

Hypotheses:

The null hypothesis and alternative hypothesis can be written as: : , and :

Procedure:

The procedure defaults to Welch's t-test for unequal variances if eq_var is not specified. In this case, the test statstic is calculated as follows: where and are the sample means and are the sample standard deviations, and and are the sample sizes. The degrees of freedom is approximated with the Welch–Satterthwaite equation:

If variances are assumed to be equal (eq_var = .true.), the procedure conducts a 2 sample t-test for equal variances, using the pooled standard deviation to calculate the t-statistic:

In case of assumed equal variances, the degrees of freedom is calculated as follows:

  • public impure subroutine s_tst_ttest_2s(x1, x2, t, df, p, eq_var, h1)

    Impure wrapper procedure for s_tst_ttest_2s_core.

    Arguments

    Type IntentOptional Attributes Name
    real(kind=wp), intent(in) :: x1(:)

    x1 vector (samples)

    real(kind=wp), intent(in) :: x2(:)

    x2 vector (samples)

    real(kind=wp), intent(out) :: t

    test statistic

    real(kind=wp), intent(out) :: df

    degrees of freedom

    real(kind=wp), intent(out) :: p

    p-value

    logical, intent(in), optional :: eq_var

    true if equal variances assumed

    character(len=*), intent(in), optional :: h1

    option: two (default), le, ge

public interface fsml_ttest_paired

The paired sample t-test (or dependent sample t-test) determines if the mean difference between two sample sets are zero. It is mathematically equivalent to the 1-sample t-test conducted on the difference vector with .

Hypotheses:

The null hypothesis and alternative hypothesis can be written as: : , and :

Procedure:

The test statstic is calculated as follows: where is the mean of the differences between the sample sets, is the standard deviation of the differences, and is the number of paired samples.

The degrees of freedom is calculated as follows:

  • public impure subroutine s_tst_ttest_paired(x1, x2, t, df, p, h1)

    Impure wrapper procedure for s_tst_ttest_paired_core.

    Arguments

    Type IntentOptional Attributes Name
    real(kind=wp), intent(in) :: x1(:)

    x1 vector (samples)

    real(kind=wp), intent(in) :: x2(:)

    x2 vector (samples); must be same length as x1

    real(kind=wp), intent(out) :: t

    test statistic

    real(kind=wp), intent(out) :: df

    degrees of freedom

    real(kind=wp), intent(out) :: p

    p-value

    character(len=*), intent(in), optional :: h1

    option: two (default), le, ge

public interface fsml_var

Computes the population or sample variance (depending on passed arguments). where is the size of (or number of observations in) vector x, are individual elements in x, (ddof) is a degrees of freedom adjustment (ddof = 0.0 for population variance, ddof = 1.0 for sample variance), and is the arithmetic mean of x.

  • public impure function f_sts_var(x, ddf) result(var)

    Impure wrapper function for f_sts_var_core.

    Arguments

    Type IntentOptional Attributes Name
    real(kind=wp), intent(in) :: x(:)

    x vector (assumed size array)

    real(kind=wp), intent(in), optional :: ddf

    delta degrees of freedom

    Return Value real(kind=wp)

    variance