Mining Emerging News from Text Data

S. A. DAR, M. MUZAMMAL, H. ZAHEER, I. A. KOREJO

Abstract


We live in the information age. So much information emerges on the internet that it is practically impossible to go through all of it. This work focuses on extracting “interesting” information from the web. As a first step, we assume that newspapers report the most interesting information, and we propose a framework that extracts interesting information from the internet using the news feeds of news websites. We collect RSS feeds from a set of user-specified sources and obtain the news titles from them. Next, we remove insignificant words from each news title, and a tokenization procedure transforms the remaining keywords into tokens. These tokens are combined to form sets of items (itemsets). An itemset mining algorithm is then applied to extract the most interesting patterns, and a de-tokenization procedure maps those patterns back to the most interesting news.
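The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The stop-word list, the sample titles, the support threshold, and all function names are assumptions made for illustration. RSS collection is skipped and the titles are hard-coded. The mining step is a plain brute-force count of small itemsets, standing in for whichever itemset mining algorithm the paper actually uses.

```python
from collections import Counter
from itertools import combinations

# Hypothetical stop-word list; the abstract does not say which words are removed.
STOP_WORDS = {"a", "an", "the", "in", "on", "of", "to", "for", "and", "as", "at", "is", "after"}


def keywords(title):
    """Lowercase a title, strip surrounding punctuation, and drop stop words."""
    words = (w.strip(".,:;!?\"'()").lower() for w in title.split())
    return [w for w in words if w and w not in STOP_WORDS]


def tokenize(titles):
    """Map each distinct keyword to an integer token, one transaction per title."""
    vocab = {}
    transactions = []
    for title in titles:
        items = {vocab.setdefault(w, len(vocab)) for w in keywords(title)}
        transactions.append(frozenset(items))
    return transactions, vocab


def frequent_itemsets(transactions, min_support=2, max_size=3):
    """Count every itemset of up to max_size tokens; keep those seen min_support times."""
    counts = Counter()
    for t in transactions:
        for k in range(1, max_size + 1):
            for combo in combinations(sorted(t), k):
                counts[combo] += 1
    return {items: n for items, n in counts.items() if n >= min_support}


def detokenize(itemsets, vocab):
    """Translate token itemsets back into keyword tuples."""
    inverse = {tok: word for word, tok in vocab.items()}
    return {tuple(inverse[t] for t in items): n for items, n in itemsets.items()}


# Made-up titles standing in for titles read from RSS feeds.
titles = [
    "Earthquake hits northern region",
    "Rescue teams reach region after earthquake",
    "Stock markets fall as oil prices rise",
    "Oil prices rise for third week",
]

transactions, vocab = tokenize(titles)
patterns = detokenize(frequent_itemsets(transactions), vocab)

# Show the largest patterns first, since they describe a story most specifically.
for words, support in sorted(patterns.items(), key=lambda p: (-len(p[0]), p[0])):
    print(support, " ".join(words))
```

Enumerating all small itemsets is workable here only because news titles contain few keywords. For longer transactions, a pruning algorithm such as Apriori or FP-Growth would be the usual choice.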


Copyright (c) 2016 Sindh University Research Journal - SURJ (Science Series)

 Copyright © University of Sindh, Jamshoro. 2017 All Rights Reserved.
Printing and Publication by: Sindh University Press.