Research Trends in the Fields of Arabic Natural Language Processing Tasks and Arabic Information Extraction Applications: A Survey Study



This survey explores the literature on Arabic NLP tasks and Arabic IE applications to analyze state-of-the-art trends, identify research gaps in these fields, and recommend ways to fill those gaps. Relevant research articles were gathered from academic search engines and academic databases and then surveyed for three aspects of research trends: the contributions achieved, the methodologies applied, and the technical and linguistic resources utilized. The study follows systematic review procedures to meet the requirements of high-quality survey studies. The reviewed articles cover a range of contributions, for instance morphological resolution among Arabic NLP tasks and Sentiment Analysis (SA) among Arabic IE applications. The findings can be summarized as follows. First, most researchers in the field of Arabic NLP tasks contribute to Named Entity Recognition (NER), followed by morphological resolution; in the field of Arabic IE, most contribute to SA applications, followed by Question Answering applications. Second, most of the reviewed articles applied language-independent methodologies, tools, techniques, and algorithms, such as Machine Learning, Artificial Neural Networks, and Deep Learning algorithms. Lastly, this study provides the first comprehensive assessment of the associations between dataset source domain types and dataset source ownership types, as well as the relation between articles' contribution fields and dataset ownership types. It confirms that, in the field of Arabic NLP tasks, the largest number of reviewed articles utilize existing, available dataset sources, specifically in the Linguistic domain. In the field of Arabic IE applications, by contrast, the largest number of reviewed articles are those whose authors collected and created the dataset sources themselves, also in the Linguistic domain.

Author Biographies

Abduladem Aljamel, Misurata University

Abduladem Aljamel received his Ph.D. degree in Knowledge-based Information Extraction and Exploration and a Postgraduate Certificate in Professional Research Practice from Nottingham Trent University. He is currently a Lecturer and a Researcher at the School of Information Technology at Misurata University. His research interests include Information Extraction and Knowledge Representation and Exploration. In addition, he is a member of the Arabic Computational Linguistics research group at Misurata University.

Yousef Aburawi, Misurata University

Yousef Abdurahman Aburawi is an Assistant Professor of Information Technology at Misurata University, Libya. His work focuses on Artificial Intelligence, Natural Language Processing, Website Development, and e-learning technologies.

