Research Trends in the Fields of Arabic Natural Language Processing Tasks and Arabic Information Extraction Applications: A Survey Study

Abduladem Aljamel, Hussein Khalil, Yousef Aburawi

Abstract


This survey explores the literature on Arabic Natural Language Processing (NLP) tasks and Arabic Information Extraction (IE) applications in order to analyze state-of-the-art trends, identify research gaps in these fields, and recommend solutions to fill those gaps. The study set out to gather relevant research articles in the targeted fields from academic search engines and academic databases. These articles were then surveyed to obtain information on three aspects of research trends: the contributions achieved, the methodologies applied, and the technical and linguistic resources utilized. The review followed systematic review procedures to meet the requirements of high-quality survey studies. The collected and reviewed articles cover a range of research contributions, for instance, morphological resolution in the field of Arabic NLP tasks and Sentiment Analysis (SA) applications in the field of Arabic IE applications. The findings of this study can be summarized as follows. First, most researchers in the field of Arabic NLP tasks prefer to contribute to Named Entity Recognition (NER) and then to morphological resolution tasks, whereas in the field of Arabic IE they prefer to contribute to SA applications and then to Question Answering applications. Second, most of the reviewed articles applied methodologies, tools, techniques, and algorithms that are not language-specific, such as Machine Learning, Artificial Neural Networks, and Deep Learning algorithms. Lastly, this study provides the first comprehensive assessment of the associations between the domain types and the ownership types of dataset sources, as well as of the relation between the articles' contribution fields and dataset ownership types. It confirms that, in the field of Arabic NLP tasks, the largest number of reviewed articles utilize existing, available dataset sources, specifically linguistic-domain dataset sources.
In the field of Arabic IE applications, however, the largest number of reviewed articles are those whose authors collected and created their own dataset sources, again in linguistic-domain dataset sources.
