Meaningful Information Extraction from Unstructured Clinical Documents

Asim Abbas, Muhammad Afzal, Jamil Hussain, Sungyoung Lee


Extracting medical concepts and entities from unstructured clinical narratives is a crucial step in most contemporary health systems. The Unified Medical Language System (UMLS) Metathesaurus is a major source of biomedical and health-related concepts for this task. Tools such as Sophia, MetaMap, and cTAKES, along with many rule-based methods and algorithms such as QuickUMLS, have recently been developed and applied successfully to medical concept extraction. The goal of this paper is to design a generic algorithm that identifies, for the medical phrases and terms used in unstructured clinical documents, a package consisting of standard concepts, their semantic types, and their entity types. The proposed algorithm uses the UMLS terminology service (UTS), customized to extract concepts for all meaningful phrases and terms in the narratives and to determine their semantic and entity types so that each concept can be categorized exactly. The results are useful for labeling biomedical data, which could in turn be used to train data-driven approaches such as machine learning.
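
The kind of output package described above (concept, semantic type, entity type per phrase) can be illustrated with a minimal sketch. This is not the authors' algorithm and it does not call UTS: it assumes a tiny in-memory lexicon in place of the Metathesaurus and uses greedy longest-phrase matching. The names `Concept`, `TOY_LEXICON`, and `extract_concepts`, the placeholder identifiers, and the entity-type labels are all hypothetical.

```python
import re
from typing import NamedTuple


class Concept(NamedTuple):
    cui: str            # placeholder identifier, NOT a real UMLS CUI
    name: str           # preferred concept name
    semantic_type: str  # semantic type label
    entity_type: str    # coarse entity label (hypothetical scheme)


# Toy stand-in for a terminology lookup; a real system would query UMLS.
TOY_LEXICON = {
    "diabetes mellitus": Concept("CUI-0001", "Diabetes Mellitus",
                                 "Disease or Syndrome", "Problem"),
    "hypertension": Concept("CUI-0002", "Hypertensive disease",
                            "Disease or Syndrome", "Problem"),
    "aspirin": Concept("CUI-0003", "Aspirin",
                       "Pharmacologic Substance", "Medication"),
}

MAX_PHRASE_LEN = 3  # longest phrase (in tokens) tried at each position


def extract_concepts(text, lexicon=TOY_LEXICON):
    """Return (phrase, Concept) pairs found by greedy longest-match lookup."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    found = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shrink.
        for n in range(min(MAX_PHRASE_LEN, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in lexicon:
                found.append((phrase, lexicon[phrase]))
                i += n  # skip past the matched phrase
                break
        else:
            i += 1  # no phrase starts at this token
    return found


note = "Patient has diabetes mellitus and hypertension; started on aspirin."
for phrase, concept in extract_concepts(note):
    print(phrase, "|", concept.semantic_type, "|", concept.entity_type)
```

Pairs like these could serve as labels for a narrative, which is the use for training data-driven models that the abstract points to.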
