Orange 3 Natural Language Processing Reusable Template

The Natural Language Processing workflow below can be used to generate Topic Models, along with their associated Word Clouds, from a monolingual corpus.

Latent Semantic Indexing, Latent Dirichlet Allocation and Hierarchical Dirichlet Process are the three techniques available for Topic Modelling in the Orange 3 toolkit.

In my experience, Latent Semantic Indexing (LSI) produced the most useful and relevant Topics, because it can correlate semantically related terms that are latent in a collection of text. LSI uses a mathematical technique called singular value decomposition (SVD) to identify patterns in the relationships between the terms and concepts contained in an unstructured collection of text.
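To make the SVD step concrete, here is a minimal sketch of LSI on a toy corpus using NumPy. It is not how Orange 3 implements its Topic Modelling widget; the corpus, the raw word-count weighting, and the helper names (`build_term_document_matrix`, `lsi_topics`, `top_terms`) are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical toy corpus: three documents about pets, two about finance.
docs = [
    "cat dog pet",
    "dog pet animal",
    "cat pet animal",
    "stock market trade",
    "market trade price",
]


def build_term_document_matrix(docs):
    """Return (vocab, matrix), where matrix[i, j] counts term i in document j."""
    vocab = sorted({word for doc in docs for word in doc.split()})
    index = {word: i for i, word in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for word in doc.split():
            matrix[index[word], j] += 1.0
    return vocab, matrix


def lsi_topics(matrix, k):
    """Truncated SVD: keep the k strongest latent dimensions (topics)."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    # Columns of u weight terms per topic, s holds topic strengths,
    # rows of vt weight documents per topic.
    return u[:, :k], s[:k], vt[:k, :]


def top_terms(vocab, term_topics, topic, n=3):
    """Terms with the largest absolute weight in the given topic."""
    weights = np.abs(term_topics[:, topic])
    order = np.argsort(-weights)[:n]
    return [vocab[i] for i in order]


vocab, matrix = build_term_document_matrix(docs)
term_topics, strengths, doc_topics = lsi_topics(matrix, k=2)

for topic in range(2):
    print(f"Topic {topic}: {top_terms(vocab, term_topics, topic, n=4)}")
```

On this toy corpus the first topic should be dominated by the pet-related terms and the second by the finance-related terms, because words that co-occur across documents end up sharing a latent dimension. A real corpus would normally be tokenised, filtered, and weighted (for example with TF-IDF) before the decomposition.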

[Image: screenshot of the workflow, 2019-07-13]


Useful Hadoop and MapReduce Algorithms

I’ve reproduced the list below from Amund Tveit’s article so that I have a backed-up personal reference.

I also intend to update this collection of Hadoop MapReduce algorithms as my experience with the platform grows.

Artificial Intelligence / Machine Learning / Data Mining

  1. NIMBLE: a toolkit for the implementation of parallel data mining and machine learning algorithms on map reduce 

  2. Distributed Evolutionary Algorithm Using the MapReduce Paradigm–A Case Study for Data Compaction Problem 

  3. On Using Pattern Matching Algorithms in MapReduce Applications

  4. Using Variational Inference and MapReduce to Scale Topic Modeling 

  5. A MapReduce-based distributed SVM algorithm for automatic image annotation 

  6. Scalable and Parallel Boosting with MapReduce 

  7. Master-Slave Parallel Genetic Algorithm Based on MapReduce Using Cloud Computing

  8. Fast clustering using MapReduce

  9. K-Means Clustering with Bagging and MapReduce

  10. In-situ MapReduce for Log Processing 

  11. Clustering Very Large Multi-dimensional Datasets with MapReduce 

  12. Large Scale Fuzzy pD* Reasoning Using MapReduce

  13. MapReduce network enabled algorithms for classification based on association rules 

  14. PARABLE: A PArallel RAndom-partition Based HierarchicaL ClustEring Algorithm for the MapReduce Framework

  15. A MapReduce based parallel SVM for large scale spam filtering 

  16. Clustering Systems with Kolmogorov Complexity and MapReduce  

Bioinformatics / Medical Informatics