Text Similarity Based on English Morphological Analyzer Approach

  • Abeer K Al-Mashhadany Department of Computer Science, College of Science, Al-Nahrain University, Baghdad-Iraq.
  • Sura M Ali Department of Software Eng., Baghdad College of Economic Sciences University.
  • Sawsan K Thamer Department of Computer Science, College of Science, Al-Nahrain University, Baghdad-Iraq.
Keywords: word frequencies, EMA Approach, Keyword Extraction, text summarization, Text Similarity

Abstract

Many applications now require text similarity measurement, which has become important for comparing texts on websites. Keywords are useful for a variety of purposes, including summarizing, indexing, labeling, information retrieval, text similarity, clustering, and searching. The objective of the proposed system is to test text similarity automatically and compute a similarity ratio. The system is based on several techniques, chiefly an English Morphological Analyzer (EMA). In this work, keyword extraction and text summarization are used to determine text similarity for long and very long texts. The proposed system addresses the text similarity problem by applying several statistical and linguistic approaches, in particular ones based on morphological rules. The linguistic approaches in this system also include synonyms, word frequencies, word position, and Part-Of-Speech (POS). It is shown that keyword extraction and text summarization built on the EMA approach, together with the other statistical and linguistic approaches, are very useful for building a highly accurate text similarity method. The system was tested, and the accuracy of the results ranged from 98.85% to 100%.
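
To make the general idea concrete, the following is a minimal Python sketch of keyword-based similarity, not the authors' system. It assumes a crude suffix stripper in place of the EMA, a small hypothetical stop-word list in place of the linguistic filtering, and a Jaccard overlap of the top frequency-ranked keywords as the similarity ratio. The abstract does not specify any of these details, and synonyms, word position, and POS are left out entirely.

```python
import re
from collections import Counter

# Hypothetical stop-word list (an assumption, not taken from the paper).
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "are",
              "to", "in", "on", "for", "with"}


def crude_stem(word):
    """Strip a few common suffixes; a placeholder, not the paper's EMA."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def extract_keywords(text, top_n=5):
    """Return the top_n most frequent stems after stop-word removal."""
    tokens = re.findall(r"[a-z]+", text.lower())
    stems = [crude_stem(t) for t in tokens if t not in STOP_WORDS]
    return {stem for stem, _ in Counter(stems).most_common(top_n)}


def similarity_ratio(text_a, text_b, top_n=5):
    """Jaccard overlap of the two keyword sets, as a percentage."""
    keys_a = extract_keywords(text_a, top_n)
    keys_b = extract_keywords(text_b, top_n)
    if not keys_a and not keys_b:
        return 100.0  # two texts with no keywords are treated as identical
    return 100.0 * len(keys_a & keys_b) / len(keys_a | keys_b)
```

Reducing inflected forms to a shared stem before counting is what lets "walking", "walked", and "walks" contribute to a single keyword; a real morphological analyzer would do this with proper rules rather than blind suffix stripping.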

 

Published
2018-06-26
How to Cite
Al-Mashhadany, A. K., Ali, S. M., & Thamer, S. K. (2018). Text Similarity Based on English Morphological Analyzer Approach. Al-Nahrain Journal of Science, 17(4), 203-212. Retrieved from https://anjs.edu.iq/index.php/anjs/article/view/397
Section
Articles