Roadmap for an Arabic Controlled Language

Hoyam Salah El Fahal, Mohammed Nasri, Karim Bouzoubaa, Adil Kabbaj

Abstract


Controlled Natural Languages (CNLs) are artificial subsets of natural languages that aim to make communication clearer and more precise. CNLs are used in communication between humans or with computers, particularly when clarity and unambiguity are required. Existing CNLs have been applied in areas such as technical documentation, machine translation, and database querying. So far, many CNLs have been developed for Western languages, especially English, but no concrete CNL has yet been proposed for Arabic, despite the growing number of Arabic-speaking Internet users over the last two decades. In this paper, we propose a roadmap for developing an Arabic CNL that would provide new and advanced natural language services for Arabic speakers. Methodologically, we review the most important existing CNLs for English and other languages, which yields statistics on vocabulary size and number of grammar rules that can guide the design of the new CNL. We consider two major approaches: one builds on existing CNLs, whereas the other starts from scratch. A survey of Arabic NLP challenges, together with the available resources and tools, led us to favor the second approach as the basis for the proposed roadmap.

