Category: Apache Hadoop
Apache Hadoop
distributed data processing framework
Apache Spark
open-source data analytics cluster computing framework
Apache Mahout
open-source machine learning algorithms
Apache HBase
open-source, non-relational, distributed database system
Apache Hive
database engine
Deeplearning4j
Eclipse Deeplearning4j is a programming library written in Java for the Java virtual machine (JVM). It is a framework with wide support for deep learning algorithms. Deeplearning4j includes implementations of the restricted Boltzmann machine, deep belief net, deep autoencoder, stacked denoising autoencoder, recursive neural tensor network, word2vec, doc2vec, and GloVe. These algorithms all include distributed parallel versions that integrate with Apache Hadoop and Spark.
Sqoop
Sqoop is a command-line interface application for transferring data between relational databases and Hadoop.
Cloudera
Cloudera, Inc. is an American data lake software company.

Trino
Open-source distributed SQL query engine
Apache ZooKeeper
system for distributed coordination
Apache Pig
open-source data analytics software
MapR
MapR was a business software company headquartered in Santa Clara, California. MapR software provided access to a variety of data sources from a single computer cluster, including big data workloads such as Apache Hadoop and Apache Spark, a distributed file system, a multi-model database management system, and event stream processing, combining real-time analytics with operational applications. Its technology ran on both commodity hardware and public cloud computing services. In August 2019, following financial difficulties, the technology and intellectual property of the company were sold.
Apache Beam
Unified programming model
Cloudera Impala
open-source massively parallel processing SQL query engine for data stored in a computer cluster running Apache Hadoop
Hortonworks
Hortonworks, Inc. was a data software company based in Santa Clara, California that developed and supported open-source software (primarily around Apache Hadoop) designed to manage big data and associated processing.
Apache Accumulo
Open-source Bigtable implementation
Oozie
Workflow Scheduler for Hadoop
Gremlin
graph traversal language
Presto
distributed SQL query engine