DATABASE TECHNOLOGY ON THE WEB: QUERY INTERFACE DETERMINING ALGORITHM FOR DEEP WEB BASED ON HTML FEATURES AND HIERARCHICAL CLUSTERING

R. A. SHAIKH, I. MEMON, J. A. MAHAR, H. SHAIKH

Abstract


In Hypertext Markup Language (HTML) pages, interactive elements sit at the leaf nodes of the Document Object Model (DOM) tree and lie close to one another within a local region. Based on these features, we propose a method for finding web query interfaces that combines models and rules. After building a tree model of the HTML page, the method locates candidate interface regions by interaction density and hierarchically clusters interactive groups by the similarity of their local structure. A content filter composed of rules then removes some non-query interfaces. The method avoids excessive dependence on the “form” tag and outperforms traditional methods in accuracy and generality. In experiments, accuracy reached 90.1% on the common TEL-8 dataset and 92% on a self-organized dataset.
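The density-based location step can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not define interaction density, so the sketch assumes it is the share of interactive elements among all nodes of a subtree. The names `Node`, `TreeBuilder`, `subtree_stats`, `densest_region`, and `find_candidate`, the `INTERACTIVE` tag set, and the `min_interactive` threshold are all hypothetical. The hierarchical clustering and the rule-based content filter are left out.

```python
from html.parser import HTMLParser

# Assumption: these tags count as "interactive elements".
INTERACTIVE = {"input", "select", "textarea", "button"}
# HTML void elements never contain children, so the builder must not descend into them.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class Node:
    """One element of a simplified DOM tree."""

    def __init__(self, tag, attrs=None, parent=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


class TreeBuilder(HTMLParser):
    """Builds a Node tree from HTML; text content is ignored."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        if tag not in VOID:
            self.current = node

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <input/>: add the node, do not descend.
        Node(tag, attrs, self.current)

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent


def subtree_stats(node):
    """Return (interactive_count, total_count) for the subtree rooted at node."""
    interactive = 1 if node.tag in INTERACTIVE else 0
    total = 1
    for child in node.children:
        child_interactive, child_total = subtree_stats(child)
        interactive += child_interactive
        total += child_total
    return interactive, total


def densest_region(root, min_interactive=2):
    """Return (node, density) for the subtree with the highest assumed density.

    Subtrees with fewer than min_interactive interactive elements are skipped,
    so a lone <input> does not win with a density of 1.0.
    """
    best, best_density = None, 0.0
    stack = [root]
    while stack:
        node = stack.pop()
        stack.extend(node.children)
        interactive, total = subtree_stats(node)
        if interactive >= min_interactive:
            density = interactive / total
            if density > best_density:
                best, best_density = node, density
    return best, best_density


def find_candidate(html):
    """Parse html and return the densest candidate region and its density."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return densest_region(builder.root)
```

Because the search never looks at the tag name of the enclosing element, a group of controls inside a plain `div` is found just as readily as one inside a `form`, which is the kind of independence from the “form” tag the abstract describes. Recomputing the statistics for every node is quadratic in the worst case; a single bottom-up pass would be the natural refinement.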


Copyright (c) 2016 Sindh University Research Journal - SURJ (Science Series)

 Copyright © University of Sindh, Jamshoro. 2017 All Rights Reserved.
Printing and Publication by: Sindh University Press.