Apriori Algorithm for Arabic Data Using MapReduce

  • Ola Abed El-nasser El-khoudary -----> Dr. Rebhi S. Baraka

Apriori is one of the most popular algorithms for extracting frequent itemsets from large data sets; these frequent itemsets can then be used to generate association rules. Such rules serve as a basis for knowledge discovery, for example detecting previously unknown relationships, and they produce results that support decision making and prediction.
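The two stages described above, finding frequent itemsets and then deriving rules from them, can be sketched as a minimal level-wise Apriori in Python. The sketch assumes that each Arabic document has already been tokenized into a set of terms that is treated as one transaction, and that support is an absolute document count. The function names `apriori` and `association_rules`, as well as the toy transliterated terms, are hypothetical illustrations and are not taken from the thesis.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Level-wise Apriori: return {frozenset(itemset): support_count}.

    transactions: list of sets (assumed: the distinct terms of each document).
    min_support: minimum absolute support count.
    """
    # Level 1: count every single term.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: n for s, n in counts.items() if n >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        # Join: combine frequent (k-1)-itemsets into size-k candidates.
        prev = list(frequent)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                # Prune: keep a candidate only if all its (k-1)-subsets are frequent.
                if len(union) == k and all(
                    frozenset(sub) in frequent for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)
        # Support counting: one pass over the transactions per level.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {s: n for s, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
    return result


def association_rules(frequent, min_confidence):
    """Rules X -> Y with confidence = support(X u Y) / support(X)."""
    rules = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                lhs = frozenset(lhs)
                # Every subset of a frequent itemset is frequent, so frequent[lhs] exists.
                confidence = support / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, support, confidence))
    return rules


# Hypothetical toy corpus: each document reduced to its set of transliterated Arabic terms.
docs = [
    {"kitab", "qalam", "madrasa"},
    {"kitab", "qalam"},
    {"kitab", "madrasa"},
    {"qalam", "madrasa"},
    {"kitab", "qalam", "madrasa"},
]
freq = apriori(docs, min_support=3)
for lhs, rhs, sup, conf in association_rules(freq, min_confidence=0.7):
    print(sorted(lhs), "->", sorted(rhs), sup, round(conf, 2))
```

The join step here simply unions every pair of frequent (k-1)-itemsets, which is quadratic in their number. Many implementations join only itemsets that share their first k-2 items, but after the prune step both approaches yield the same candidate set.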
When the data set is very large, both memory usage and computational cost become high, and a single processor's memory and CPU resources are too limited for the algorithm to run efficiently.
Parallel and distributed computing is an effective way to improve the algorithm's performance.
In this research, we propose a parallel Apriori approach for large volumes of Arabic text documents that uses MapReduce to improve speedup and performance. Apriori is a popular algorithm for collecting frequently occurring itemsets from which association rules are composed, and MapReduce is a scalable data-processing model that processes massive volumes of data in parallel.
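The abstract does not detail how the work is divided across the cluster. One common MapReduce formulation of Apriori runs one job per level. In each job, every mapper counts the current candidate itemsets within its own input split, the framework groups the partial counts by itemset, and the reducers sum these counts and apply the support threshold. A driver then generates the next level's candidates. The sketch below simulates that flow on a single machine in plain Python. The split layout, the function names (`map_phase`, `shuffle`, `reduce_phase`, `generate_candidates`, `mapreduce_apriori`), and the per-split local aggregation that stands in for a Hadoop combiner are all hypothetical assumptions. It is not the thesis's Hadoop code.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(split, candidates):
    """Mapper (simulated): emit (candidate, local_count) pairs for one input split.

    Counts are pre-aggregated per split, as an assumed combiner would do.
    """
    local = defaultdict(int)
    for transaction in split:
        for c in candidates:
            if c <= transaction:
                local[c] += 1
    return list(local.items())


def shuffle(mapped_outputs):
    """Group all emitted values by key, as the MapReduce framework does."""
    grouped = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_phase(grouped, min_support):
    """Reducer (simulated): sum partial counts and keep only frequent itemsets."""
    totals = {key: sum(values) for key, values in grouped.items()}
    return {key: total for key, total in totals.items() if total >= min_support}


def generate_candidates(frequent, k):
    """Driver step: build size-k candidates whose (k-1)-subsets are all frequent."""
    prev = list(frequent)
    candidates = set()
    for i in range(len(prev)):
        for j in range(i + 1, len(prev)):
            union = prev[i] | prev[j]
            if len(union) == k and all(
                frozenset(s) in frequent for s in combinations(union, k - 1)
            ):
                candidates.add(union)
    return candidates


def mapreduce_apriori(splits, min_support):
    """One simulated MapReduce job per Apriori level, looping until nothing survives."""
    candidates = {frozenset([i]) for split in splits for t in split for i in t}
    k, result = 1, {}
    while candidates:
        # On a real cluster, each split's map task would run on a different node.
        mapped = [map_phase(split, candidates) for split in splits]
        frequent = reduce_phase(shuffle(mapped), min_support)
        result.update(frequent)
        k += 1
        candidates = generate_candidates(frequent, k)
    return result


# Hypothetical toy corpus of transliterated Arabic terms, split across two "mappers".
docs = [
    {"kitab", "qalam", "madrasa"},
    {"kitab", "qalam"},
    {"kitab", "madrasa"},
    {"qalam", "madrasa"},
    {"kitab", "qalam", "madrasa"},
]
splits = [docs[:3], docs[3:]]
result = mapreduce_apriori(splits, min_support=3)
print(len(result), "frequent itemsets")
```

Each mapper needs only its own split plus the candidate set, so the counting pass can run on many machines at once. The driver loop between levels remains sequential and adds one job launch per Apriori level.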
The experiments show that the parallel Apriori approach processes large volumes of Arabic text efficiently on a MapReduce cluster of 16 computers, significantly improving execution time and speedup while also generating strong association rules.
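The abstract does not state how speedup is computed. In parallel computing, speedup is conventionally defined as the ratio of sequential to parallel execution time, and that standard form is assumed here:

$$S_p = \frac{T_1}{T_p}$$

Here $T_1$ is the execution time on one computer, $T_p$ is the execution time on $p$ computers ($p = 16$ in the cluster described above), and $S_p > 1$ means the parallel run is faster.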

Keywords: Apriori, frequent itemset, association rule, MapReduce, Hadoop.