PMCN: combining pdf-modified similarity and complex network in multi-document summarization

Authors

  • Yi-Ning Tu, Fu Jen Catholic University
  • Wei-Tse Hsu, Master's student, National Taipei University, Taiwan

Keywords:

TF-PDF, PDF-modified similarity, document summarization, complex network

Abstract

This study combines the concept of degree centrality in complex networks with the Term Frequency * Proportional Document Frequency (TF*PDF) algorithm; the combined method, called PMCN, constructs relationship networks among sentences for writing news summaries. PMCN extends to multi-document summarization the ideas of Bun and Ishizuka (2002), who first published the TF*PDF algorithm for detecting hot topics. In their TF*PDF algorithm, Bun and Ishizuka defined the publisher of a news item as its channel; a term whose PDF weight is higher than the weights of other terms is considered hotter than those terms. This study, however, aims to produce summaries of news items rather than to detect hot topics. Because PMCN summarizes daily news, it replaces the concept of “channel” with “the date of the news event” and uses the resulting chronological ordering in a multi-document summarization algorithm. Its F-measure scores were 0.042 and 0.051 higher than those of LexRank on the well-known d30001t and d30003t tasks, respectively.
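
The pipeline the abstract outlines can be illustrated with a minimal Python sketch. It assumes the TF*PDF weighting as it is commonly stated from Bun and Ishizuka (2002), with the news date standing in for the channel. The function names, the weighted-cosine similarity, and the `threshold` parameter are hypothetical choices made for illustration; they are not the paper's pdf-modified similarity or its actual implementation.

```python
import math
from collections import Counter, defaultdict


def tf_pdf_weights(docs_by_channel):
    """Compute a TF*PDF-style weight per term.

    docs_by_channel maps a channel (here, assumed to be a news date)
    to a list of tokenized documents. For each channel, a term's
    normalized frequency is multiplied by exp(n / N), where n is the
    number of documents in the channel containing the term and N is
    the number of documents in the channel. Weights are summed over
    channels.
    """
    weights = defaultdict(float)
    for docs in docs_by_channel.values():
        if not docs:
            continue
        freq = Counter(tok for doc in docs for tok in doc)
        doc_freq = Counter(tok for doc in docs for tok in set(doc))
        norm = math.sqrt(sum(f * f for f in freq.values()))
        for term, f in freq.items():
            weights[term] += (f / norm) * math.exp(doc_freq[term] / len(docs))
    return dict(weights)


def weighted_cosine(a, b, weights):
    """Cosine similarity of two token lists, with each term count scaled
    by its weight (an illustrative stand-in, not the paper's measure)."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] * weights.get(t, 0.0) ** 2
              for t in ca.keys() & cb.keys())
    na = math.sqrt(sum((ca[t] * weights.get(t, 0.0)) ** 2 for t in ca))
    nb = math.sqrt(sum((cb[t] * weights.get(t, 0.0)) ** 2 for t in cb))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def rank_by_degree(sentences, weights, threshold=0.1):
    """Link sentences whose similarity exceeds the threshold, then return
    sentence indices ordered by degree centrality (highest first)."""
    n = len(sentences)
    degree = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if weighted_cosine(sentences[i], sentences[j], weights) > threshold:
                degree[i] += 1
                degree[j] += 1
    return sorted(range(n), key=lambda i: (-degree[i], i))
```

In this sketch, a summary would be assembled by taking the top-ranked sentences and presenting them in date order; how many sentences to keep and how ties are broken are left open here.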

URL: http://ijkcdt.net/xml/21159/21159.pdf

Author Biography

Yi-Ning Tu, Fu Jen Catholic University

Department of Statistics and Information Science

Published

2019-09-30

How to Cite

Tu, Y.-N., & Hsu, W.-T. (2019). PMCN: combining pdf-modified similarity and complex network in multi-document summarization. International Journal of Knowledge Content Development & Technology, 9(3). Retrieved from https://journals.sfu.ca/ijkcdt/index.php/ijkcdt/article/view/180

Section

Articles