Arabic Stemmer System based on Rules of Roots

Waed Waleed Al-Abweeny, Nahed Abu Zaid


Stemming is an automated process that produces a base string intended to represent a set of related words. It is a main data-processing step in many types of applications, such as text mining, information retrieval, and natural language processing. The task of a stemmer is to reduce words to their base form. The more a system analyses and understands the syntax and semantics of the documents, the more accurate the result. Arabic stemming is not an easy task, because the morphological variants of certain words are not always semantically related. This paper introduces an Arabic stemmer system based on Arabic morphological rules that extracts triliteral (three-radical) and quadriliteral (four-radical) roots and, where available, quinqueliteral (five-radical) and six-radical roots. In addition, the paper compares the proposed stemmer with other stemmer systems. The stemmer was evaluated by four specialists who are native Arabic speakers, and it achieved an accuracy of 96.8%.
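As a rough illustration of what rule-based root extraction can look like, the following minimal Python sketch strips one prefix and one suffix and then matches the remaining stem against a few morphological patterns. It is not the rule set proposed in this paper: the affix lists, the pattern list, and the function names (`strip_affixes`, `match_pattern`, `extract_root`) are hypothetical and chosen only for this example. In each pattern, the letters ف, ع and ل act as placeholders for root radicals, following the usual convention of Arabic grammar.

```python
# Illustrative sketch only: these tiny affix and pattern lists are
# assumptions made for this example, not the rules used in the paper.
PREFIXES = ["وال", "بال", "ال", "و"]   # longest candidates first
SUFFIXES = ["ون", "ات", "ين", "ة"]
# Patterns are tried in order; ف, ع, ل mark root-radical positions.
PATTERNS = ["مفعول", "فاعل", "فعلل", "فعل"]
PLACEHOLDERS = "فعل"


def strip_affixes(word, min_len=3):
    """Remove at most one listed prefix and one listed suffix,
    never leaving fewer than min_len letters."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= min_len:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_len:
            word = word[:-len(suffix)]
            break
    return word


def match_pattern(stem, pattern):
    """Return the radicals of stem if it fits pattern, else None."""
    if len(stem) != len(pattern):
        return None
    radicals = []
    for letter, slot in zip(stem, pattern):
        if slot in PLACEHOLDERS:
            radicals.append(letter)      # placeholder: keep as a radical
        elif letter != slot:
            return None                  # fixed pattern letter must match
    return "".join(radicals)


def extract_root(word):
    """Strip affixes, then return the root from the first matching pattern."""
    stem = strip_affixes(word)
    for pattern in PATTERNS:
        root = match_pattern(stem, pattern)
        if root is not None:
            return root
    return None


if __name__ == "__main__":
    for w in ["مكتوب", "الكاتب", "ترجم"]:
        print(w, extract_root(w))
```

In this sketch, مكتوب and الكاتب both reduce to the triliteral root كتب, while ترجم is kept as a quadriliteral root. A realistic system would need far larger rule sets and handling for weak letters, diacritics, and irregular forms, none of which is attempted here.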

