FSML (Fortran Statistics and Machine Learning) is a scientific toolkit for statistics and machine learning, ranging from basic correlation and statistical hypothesis testing to multivariate regression and clustering procedures. It is written in modern Fortran (2008+) and offered as an FPM package for easy integration into your projects. The source code is hosted on GitHub and released under the MIT licence.

FSML


Description

FSML (Fortran Statistics and Machine Learning) is a scientific toolkit consisting of common statistical and machine learning procedures, including basic statistics (e.g., mean, variance, correlation), common statistical tests (e.g., t-test, Mann–Whitney U), linear parametric methods and models (e.g., multiple OLS regression, discriminant analysis), and non-linear statistical and machine learning procedures (e.g., k-means clustering).

Key features

  • Common statistics and machine learning techniques (as used in modern research).
  • Familiar/intuitive interface (similarities to popular Python or R libs).
  • Core procedures are kept pure (to simplify parallelisation and testing), while impure wrappers handle optional arguments and errors for safe conventional use.
  • Minimal requirements/dependencies (Fortran 2008 or later, and stdlib).
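The split between pure core procedures and impure wrappers named in the list above can be sketched in plain Fortran. The following is a minimal illustration of the pattern only, not FSML's actual code: the module name variance_sketch, the procedures var_core and var_safe, and the argument ddf are all hypothetical names chosen for this example.

```fortran
! Illustrative sketch only: these names are hypothetical and are not FSML's API.
module variance_sketch
  use, intrinsic :: iso_fortran_env, only: dp => real64, error_unit
  use, intrinsic :: ieee_arithmetic, only: ieee_value, ieee_quiet_nan
  implicit none
  private
  public :: dp, var_core, var_safe

contains

  ! Pure core: every argument is required, no I/O, no side effects.
  ! The caller must guarantee size(x) > ddf.
  pure function var_core(x, ddf) result(v)
    real(dp), intent(in) :: x(:)   ! sample values
    real(dp), intent(in) :: ddf    ! delta degrees of freedom
    real(dp) :: v
    real(dp) :: xbar
    xbar = sum(x) / real(size(x), dp)
    v = sum((x - xbar)**2) / (real(size(x), dp) - ddf)
  end function var_core

  ! Impure wrapper: fills in the optional argument, validates the input,
  ! reports problems on stderr, and only then calls the pure core.
  function var_safe(x, ddf) result(v)
    real(dp), intent(in) :: x(:)
    real(dp), intent(in), optional :: ddf
    real(dp) :: v
    real(dp) :: ddf_w

    ddf_w = 1.0_dp                      ! default: sample variance
    if (present(ddf)) ddf_w = ddf

    if (real(size(x), dp) <= ddf_w) then
      write(error_unit, '(a)') "var_safe: too few elements, returning NaN"
      v = ieee_value(v, ieee_quiet_nan)
      return
    end if

    v = var_core(x, ddf_w)
  end function var_safe

end module variance_sketch

program demo
  use variance_sketch, only: dp, var_core, var_safe
  implicit none
  real(dp) :: a(8, 2), v(2), v_pop, v_bad
  integer :: j

  a(:, 1) = [2.0_dp, 4.0_dp, 4.0_dp, 4.0_dp, 5.0_dp, 5.0_dp, 7.0_dp, 9.0_dp]
  a(:, 2) = 2.0_dp * a(:, 1)

  ! Conventional use through the wrapper.
  v_pop = var_safe(a(:, 1), ddf=0.0_dp)   ! population variance, 32/8 = 4
  v_bad = var_safe(a(1:1, 1))             ! one element with ddf = 1: NaN
  print *, v_pop, v_bad

  ! Because var_core is pure, it may be referenced inside do concurrent.
  do concurrent (j = 1:2)
    v(j) = var_core(a(:, j), 1.0_dp)      ! sample variance per column
  end do
  print *, v
end program demo
```

Keeping the default value, the input check, and the error message in the wrapper leaves the core free of I/O and optional-argument handling, which is what allows the compiler to accept it in a do concurrent construct and makes it straightforward to unit test.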

Modules

FSML has five thematic modules: basic statistics (STS), hypothesis tests (TST), linear procedures (LIN), non-linear procedures (NLP), and statistical distribution functions (DST).

Figure: FSML modules.


Handbook

The FSML Handbook includes a short tutorial, detailed API documentation, information for contributors, and licence (MIT) details. The documentation pages were generated with FORD.


Development

Aims and Scope

The aim is to create an easy-to-use library for modern Fortran applications that covers many statistics and machine learning procedures that are commonly used in research.

Background

FSML started as an effort to rewrite, restructure, clean up, and enhance old Fortran code I have written over the past 15 years, and to bundle and publish it as a well-organised and well-documented library.

The published research below uses some of the to-be-reworked code and demonstrates some applications of the above-mentioned methods:

  • Mutz and Ehlers (2019) (k-means and hierarchical clustering, and discriminant analysis).
  • Mutz et al. (2015) (multiple regression in cross-validation and bootstrap settings, principal component analysis, and Bayesian classifier).

Progress

Currently covered are procedures for basic statistics (STS), statistical distributions (DST), statistical tests (TST), procedures that rely heavily on linear algebra (LIN), and non-linear algorithmic procedures (NLP). See the full list here. Additionally planned are machine learning framework extensions (e.g., cross-validation) and further additions to the NLP module.


Distribution, Installation, Testing

FSML is offered as an FPM package with examples and tests.
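As a rough sketch of what depending on an FPM package usually involves: the consuming project lists the package under [dependencies] in its own fpm.toml manifest. The dependency key fsml and the repository URL below are placeholders assumed for illustration; take the real address from the project's GitHub page.

```toml
# fpm.toml of a project that uses FSML (sketch only)
# The key name and URL are placeholders, not the project's confirmed values.
[dependencies]
fsml = { git = "https://github.com/<user>/<repo>" }
```

With the dependency declared, running fpm build in the consuming project fetches and compiles it along with the project's own sources. Running fpm test inside a package's source tree builds and runs that package's tests.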

Developer Info

Sebastian G. Mutz

Climate/Earth scientist (assoc. professor) with a passion for statistics, modelling, AI, games, music, open culture & coding (Fortran & Python). 🇪🇺 🦊🌱