Ericsson – what happens in 1 minute.
The amount of data that is generated every minute.
The Hadoop ecosystem visualised by Rich Taylor from Datameer.
Interesting post from Silvius Rus from the Cluster Team at Quantcast.
He implemented a simple Sawzall program that processes 1 TB of text in 7 seconds, starting from disk, without using Hadoop (though he mentions that Quantcast’s proprietary MapReduce cluster is loosely based on Hadoop).
He also made a very interesting design decision – to drop the sort phase of MapReduce and run the Reducer concurrently with the Mapper.
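To make that design decision concrete, here is a minimal Python sketch of a word count in which the reducer consumes mapper output while the mappers are still running, with no sort in between. This is not Silvius Rus's Sawzall program or Quantcast's implementation; the names `mapper`, `reducer` and `word_count` are hypothetical, and the sketch assumes the reduce operation is commutative and associative (here, addition), so arrival order does not matter.

```python
import queue
import threading
from collections import Counter

_DONE = object()  # sentinel marking the end of one mapper's output


def mapper(lines, out):
    """Emit (word, 1) pairs as soon as they are produced."""
    for line in lines:
        for word in line.split():
            out.put((word, 1))
    out.put(_DONE)


def reducer(inbox, n_mappers, totals):
    """Fold pairs into a hash table as they arrive.

    No sort is needed: addition is commutative and associative,
    so the order in which pairs arrive does not change the result.
    """
    finished = 0
    while finished < n_mappers:
        item = inbox.get()
        if item is _DONE:
            finished += 1
            continue
        key, value = item
        totals[key] += value


def word_count(chunks):
    """Run one mapper thread per chunk and a single concurrent reducer."""
    inbox = queue.Queue(maxsize=1024)  # bounded, so mappers cannot run far ahead
    totals = Counter()
    red = threading.Thread(target=reducer, args=(inbox, len(chunks), totals))
    red.start()
    maps = [threading.Thread(target=mapper, args=(c, inbox)) for c in chunks]
    for t in maps:
        t.start()
    for t in maps:
        t.join()
    red.join()
    return dict(totals)
```

The trade-off is that the reducer has to keep one running total per key in memory, instead of seeing each key's values grouped together as it would after a sort.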
I also intend to update this collection of Hadoop MapReduce algorithms based on my growing experience with the platform.
Artificial Intelligence / Machine Learning / Data Mining
- NIMBLE: a toolkit for the implementation of parallel data mining and machine learning algorithms on MapReduce
- Distributed Evolutionary Algorithm Using the MapReduce Paradigm – A Case Study for Data Compaction Problem
- On Using Pattern Matching Algorithms in MapReduce Applications
- Using Variational Inference and MapReduce to Scale Topic Modeling
- A MapReduce-based distributed SVM algorithm for automatic image annotation
- Scalable and Parallel Boosting with MapReduce
- Master-Slave Parallel Genetic Algorithm Based on MapReduce Using Cloud Computing
- Fast clustering using MapReduce
- K-Means Clustering with Bagging and MapReduce
- In-situ MapReduce for Log Processing
- Clustering Very Large Multi-dimensional Datasets with MapReduce
- Large Scale Fuzzy pD* Reasoning Using MapReduce
- MapReduce network enabled algorithms for classification based on association rules
- PARABLE: A PArallel RAndom-partition Based HierarchicaL ClustEring Algorithm for the MapReduce Framework
- A MapReduce based parallel SVM for large scale spam filtering
- Clustering Systems with Kolmogorov Complexity and MapReduce
- Rapid parallel genome indexing with MapReduce
- CloudAligner: A fast and full-featured MapReduce based tool for sequence mapping
- Nephele: genotyping via complete composition vectors and MapReduce
- Genome Analysis with MapReduce
- Parallel Metagenomic Sequence Clustering via Sketching and Maximal Quasi-clique Enumeration on Map-reduce Clouds
- Hadoop-GIS: A High Performance Query System for Analytical Medical Imaging with MapReduce
Image and Video Processing
- Multi-layer graph-based semi-supervised learning for large-scale image datasets using MapReduce
- Skyline web service selection with MapReduce
- HIPI: A Hadoop Image Processing Interface for Image-based MapReduce Tasks
- An Approach for Processing Large and Non-uniform Media Objects on MapReduce-Based Clusters
- Building Wavelet Histograms on Large Data in MapReduce
Sets & Graphs
- MapReduce in MPI for Large-scale Graph Algorithms
- Design Distributed Digraph Algorithms using MapReduce
- An Intermediate Algebra for Optimizing RDF Graph Pattern Matching on MapReduce
- Processing theta-joins using MapReduce
- Clause-Iteration with MapReduce to Scalably Query Data Graphs in the SHARD Graph-Store
- Mining Tera-Scale Graphs with MapReduce: Theory, Engineering and Discoveries
- Filtering: a method for solving graph problems in MapReduce
- Colorful Triangle Counting and a MapReduce Implementation
- A parallel computing model for large-graph mining with MapReduce
- P2LSA and P2LSA+: two paralleled probabilistic latent semantic analysis algorithms based on the MapReduce model
- Processing Wikipedia Dumps: A Case-Study comparing the XGrid and MapReduce Approaches
- MapReduce for HITS Algorithm with Application to Chinese Word Networks
- Implementing MapReduce over language and literature data over the UK National Grid Service
- Representing n-gram language models for compact storage and fast retrieval
I love this visualisation.
It puts the recent push for Big Data analytics within the enterprise into perspective.
In pioneer days they used oxen for heavy pulling, and when one ox couldn’t budge a log, they didn’t try to grow a larger ox. We shouldn’t be trying for bigger computers, but for more systems of computers. – Grace Hopper