FSML (Fortran Statistics and Machine Learning) is a scientific toolkit for statistics and machine learning, ranging from basic correlation and statistical hypothesis testing to random forest regression and clustering. It is written in modern Fortran (2008+) and offered as an FPM package for easy integration into your projects. The source code is hosted on GitHub and released under the MIT licence.

FSML

Description

FSML (Fortran Statistics and Machine Learning) is a scientific toolkit consisting of common statistical and machine learning procedures, including basic descriptive statistics (e.g., mean, variance, correlation), common statistical tests (e.g., t-test, Mann–Whitney U), linear parametric methods and models (e.g., multiple OLS regression, discriminant analysis), and non-linear statistical and machine learning procedures (e.g., k-means clustering).

Key features:

  • Common statistics and machine learning techniques (as used in modern research).
  • Familiar, intuitive interface (similar to popular Python and R libraries).
  • Core procedures are kept pure (to simplify parallelisation and testing), while impure wrappers handle optional arguments and errors for safe conventional use.
  • Minimal dependencies (a Fortran 2008 or later compiler, and stdlib).
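
The split between pure cores and impure wrappers can be sketched as follows. This is only an illustration of the general pattern, not FSML's actual API: the module `demo_stats` and the procedures `core_var` and `safe_var` are hypothetical names chosen for this example, and the choice to return NaN on invalid input is an assumption.

```fortran
! Illustrative sketch only: module and procedure names are hypothetical,
! not taken from FSML.
module demo_stats
  use, intrinsic :: iso_fortran_env, only: dp => real64, error_unit
  use, intrinsic :: ieee_arithmetic, only: ieee_value, ieee_quiet_nan
  implicit none
  private
  public :: dp, core_var, safe_var

contains

  ! Pure core: fixed argument list, no I/O, no side effects.
  ! Assumes size(x) > ddof; the caller is responsible for checking this.
  pure function core_var(x, ddof) result(v)
    real(dp), intent(in) :: x(:)
    real(dp), intent(in) :: ddof   ! delta degrees of freedom
    real(dp) :: v
    real(dp) :: xbar

    xbar = sum(x) / real(size(x), dp)
    v = sum((x - xbar)**2) / (real(size(x), dp) - ddof)
  end function core_var

  ! Impure wrapper: fills in defaults, validates input, reports errors,
  ! then delegates the arithmetic to the pure core.
  function safe_var(x, ddof) result(v)
    real(dp), intent(in) :: x(:)
    real(dp), intent(in), optional :: ddof
    real(dp) :: v
    real(dp) :: ddof_w

    ddof_w = 1.0_dp                      ! default: sample variance
    if (present(ddof)) ddof_w = ddof

    if (real(size(x), dp) <= ddof_w) then
      write(error_unit, '(a)') "safe_var: too few elements, returning NaN"
      v = ieee_value(v, ieee_quiet_nan)
      return
    end if

    v = core_var(x, ddof_w)
  end function safe_var

end module demo_stats
```

In this sketch, a program would `use demo_stats` and call `safe_var(x)` for everyday work, or `safe_var(x, ddof=0.0_dp)` for the population variance. Because `core_var` is `pure`, it may also be referenced directly inside a `do concurrent` construct or from other pure procedures, which is where the pattern helps with parallelisation and unit testing.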


Handbook

The FSML Handbook includes a short tutorial, detailed API documentation, information for contributors, and licence (MIT) details. The documentation pages were generated by FORD.


Development

Aims and Scope

The aim is to create an easy-to-use library for modern Fortran applications that covers many statistics and machine learning procedures that are commonly used in research.

Background

FSML started as an effort to rewrite, restructure, clean up, and enhance old Fortran code I have written over the past 15 years, and to bundle and publish it as a well-organised and well-documented library.

The published research below uses some of the to-be-reworked code and demonstrates some applications of the above-mentioned methods:

  • Mutz and Ehlers (2019) (k-means and hierarchical clustering, and discriminant analysis).
  • Mutz et al. (2015) (multiple regression in cross validation and bootstrap setting, principal component analysis, and Bayesian classifier).

Progress

Procedures for sample statistics (STS), statistical distributions (DST), and statistical tests (TST) are currently covered. See the full list here. Also planned are procedures that rely heavily on linear algebra (e.g., PCA), nonlinear algorithmic procedures (e.g., k-means clustering), and machine learning framework extensions (e.g., cross-validation).

Alpha

I will consider the library to be in "alpha" once FSML covers all of the originally planned functionality.

Beta

This stage is reached once FSML:

  • has undergone substantial testing (incl. comparisons to other libs).
  • has proper documentation.
  • works fully with the GFortran and LFortran compilers.

Warning

FSML is in a pre-alpha state. Existing procedures and API may change significantly.

Developer Info

Sebastian G. Mutz

Climate/Earth scientist (assoc. professor) with passion for statistics, modelling, AI, games, music, open culture & coding (Fortran & Python). 🇪🇺 🦊🌱