Category

Big data products

Apache Hadoop
distributed data processing framework
Apache Cassandra
free and open-source, distributed, wide-column NoSQL database management system
Apache Spark
open-source data analytics cluster computing framework
XGBoost
XGBoost (eXtreme Gradient Boosting) is an open-source software library that provides a regularizing gradient boosting framework for C++, Java, Python, R, Julia, Perl, and Scala. It works on Linux, Microsoft Windows, and macOS. According to the project description, it aims to provide a "Scalable, Portable and Distributed Gradient Boosting (GBM, GBRT, GBDT) Library". It runs on a single machine as well as on the distributed processing frameworks Apache Hadoop, Apache Spark, Apache Flink, and Dask.
Snowflake Inc.
cloud-based data-warehousing company
Apache Mahout
open-source machine learning algorithms
SAP HANA
relational database management system
BigQuery
BigQuery is a managed, serverless data warehouse product by Google, offering scalable analysis over large quantities of data. It is a Platform as a Service (PaaS) that supports querying using a dialect of SQL and Graph Query Language. It also has built-in machine learning capabilities. BigQuery was announced in May 2010 and made generally available in November 2011.
Apache Airflow
open-source workflow management platform written in Python, where workflows are created via Python scripts
DuckDB
DuckDB is an open-source column-oriented relational database management system (RDBMS). It is designed to provide high performance on complex queries against large databases in an embedded configuration, such as combining tables with hundreds of columns and billions of rows. Unlike other embedded databases (for example, SQLite), DuckDB does not focus on transactional (OLTP) applications and is instead specialized for online analytical processing (OLAP) workloads. The project has over 6 million downloads per month.
Apache Beam
open-source unified programming model for defining batch and streaming data processing pipelines