Category

Data mining and machine learning software

MATLAB
MATLAB (Matrix Laboratory) is a proprietary multi-paradigm programming language and numeric computing environment developed by MathWorks. MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages.
R
programming language for statistical analysis
GNU Octave
numerical computation software
Mathematica
computational software program
SPSS
SPSS Statistics is a statistical software suite developed by IBM for data management, advanced analytics, multivariate analysis, business intelligence, and criminal investigation. Long produced by SPSS Inc., it was acquired by IBM in 2009. Versions of the software released since 2015 have the brand name IBM SPSS Statistics.
Maple
computer algebra system
Julia
high-performance dynamic programming language
Folding@home
Folding@home (FAH or F@h) is a distributed computing project that aims to help scientists develop new therapeutics for a variety of diseases by simulating protein dynamics. This includes the process of protein folding and the movements of proteins, and it relies on simulations run on volunteers' personal computers. Folding@home is currently based at the University of Pennsylvania and led by Greg Bowman, a former student of Vijay Pande.
SAS
statistical software
Weka
suite of machine learning software written in Java
Apache Spark
open-source data analytics cluster computing framework
Stata
Stata (occasionally stylized as STATA) is a general-purpose statistical software package developed by StataCorp for data manipulation, visualization, statistics, and automated reporting. It is used by researchers in many fields, including biomedicine, economics, epidemiology, and sociology.
scikit-learn
scikit-learn (formerly scikits.learn and also known as sklearn) is a free and open-source machine learning library for the Python programming language. It features various classification, regression and clustering algorithms including support-vector machines, random forests, gradient boosting, k-means and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy. Scikit-learn is a NumFOCUS fiscally sponsored project.
XGBoost
XGBoost (eXtreme Gradient Boosting) is an open-source software library which provides a regularizing gradient boosting framework for C++, Java, Python, R, Julia, Perl, and Scala. It works on Linux, Microsoft Windows, and macOS. According to the project description, it aims to provide a "Scalable, Portable and Distributed Gradient Boosting (GBM, GBRT, GBDT) Library". It runs on a single machine, as well as on the distributed processing frameworks Apache Hadoop, Apache Spark, Apache Flink, and Dask.
Orange
component-based data mining and machine learning software suite
KNIME
KNIME, the Konstanz Information Miner, is a data analytics, reporting, and integration platform. KNIME integrates various components for machine learning and data mining through its modular data pipelining "Building Blocks of Analytics" concept. A graphical user interface and the use of Java Database Connectivity (JDBC) allow nodes that blend different data sources to be assembled, including preprocessing (extract, transform, load (ETL)), for modeling, data analysis, and visualization with minimal or no programming. It is free and open-source software released under a GNU General Public License.
Apache Mahout
open-source machine learning algorithms
FICO
FICO (legal name: Fair Isaac Corporation), originally Fair, Isaac and Company, is an American data analytics company based in Bozeman, Montana, focused on credit scoring services. It was founded by Bill Fair and Earl Isaac in 1956. Its FICO score, a measure of consumer credit risk, has become a fixture of consumer lending in the United States.
RapidMiner
RapidMiner is a data science platform that analyses the collective impact of an organization's data. It was acquired by Altair Engineering in September 2022; Altair was in turn acquired by Siemens for about $10 billion in March 2025.
General Architecture for Text Engineering
human language processing system
SenseTime
SenseTime is a partly state-owned publicly traded artificial intelligence company headquartered in Hong Kong. The company develops technologies including facial recognition, image recognition, object detection, optical character recognition, medical image analysis, video analysis, autonomous driving, and remote sensing. Since 2019, SenseTime has been repeatedly sanctioned by the U.S. government due to allegations that its facial recognition technology has been deployed in the surveillance and internment of the Uyghurs and other ethnic and religious minorities. SenseTime denies the allegations.
Apache UIMA
UIMA, short for Unstructured Information Management Architecture, is an OASIS standard for content analytics, originally developed at IBM. It provides a component software architecture for the development, discovery, composition, and deployment of multi-modal analytics for the analysis of unstructured information and integration with search technologies.
Gremlin
graph traversal language
ELKI
ELKI (Environment for Developing KDD-Applications Supported by Index-Structures) is a data mining (KDD, knowledge discovery in databases) software framework developed for use in research and teaching. It was originally created by the database systems research unit at LMU Munich, Germany, led by Professor Hans-Peter Kriegel. The project has continued at the Technical University of Dortmund, Germany. It aims at allowing the development and evaluation of advanced data mining algorithms and their interaction with database index structures.
watsonx
IBM's AI and data platform
L-1 Identity Solutions
American biometric technology corporation
Megvii
Megvii is a Chinese technology company that designs image recognition and deep-learning software. Based in Beijing, the company develops artificial intelligence (AI) technology for businesses and for the public sector.
CatBoost
CatBoost is an open-source software library developed by Yandex. It provides a gradient boosting framework which, among other features, handles categorical features using a permutation-driven alternative to the classical algorithm. It runs on Linux, Windows, and macOS, and is available in Python and R; models built using CatBoost can be used for predictions in C++, Java, C#, Rust, Core ML, ONNX, and PMML. The source code is licensed under the Apache License and is available on GitHub.
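The permutation-driven handling of categorical features mentioned in the CatBoost entry can be illustrated with a small pure-Python sketch of ordered target statistics. This is a simplified illustration under stated assumptions, not CatBoost's actual implementation: the function name `ordered_target_stats` and the parameters `prior`, `weight`, and `seed` are hypothetical, and only a single random permutation is used.

```python
import random
from collections import defaultdict


def ordered_target_stats(categories, targets, prior=0.5, weight=1.0, seed=0):
    """Encode each categorical value using only the targets of rows that
    come earlier in a random permutation (hypothetical helper, not a
    CatBoost API)."""
    n = len(categories)
    order = list(range(n))
    random.Random(seed).shuffle(order)  # one random permutation of the rows

    sums = defaultdict(float)   # running sum of targets per category
    counts = defaultdict(int)   # running count of rows per category
    encoded = [0.0] * n

    for i in order:
        c = categories[i]
        # Smoothed mean of targets seen so far for this category;
        # the current row's own target is not used, which avoids leakage.
        encoded[i] = (sums[c] + weight * prior) / (counts[c] + weight)
        sums[c] += targets[i]
        counts[c] += 1
    return encoded


colors = ["red", "blue", "red", "red", "blue"]
labels = [1, 0, 1, 0, 1]
codes = ordered_target_stats(colors, labels)
print(codes)
```

Because each row is encoded before its own label is added to the running totals, the first row of every category in the permutation receives exactly the prior value.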