
Studying the Reduction Techniques for Mining Engineering Datasets

Mustafa Ali Abuzaraida


Companies around the world often store huge datasets in databases. The sheer size of these datasets can make data analysis difficult, because the data become more complex in both the number of attributes and the number of cases. One way to overcome this problem is to reduce the dataset to a sufficient number of attributes and cases before mining it. The data mining field offers many techniques for reducing the number of attributes and removing similar cases. In this paper, three reduction techniques, namely Genetic Algorithm (GA), Principal Component Analysis (PCA), and Johnson, are tested in the engineering domain using five datasets obtained from the UCI machine learning archive. The study examines which reduction technique is most suitable for engineering datasets. In addition, the study ranks the three techniques by percentage accuracy and number of selected attributes.
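As an illustration of attribute reduction with one of the three techniques, the following is a minimal PCA sketch in Python using NumPy. It is not the implementation used in the study: the function name `pca_reduce`, the 95% variance threshold, and the synthetic five-attribute dataset are all assumptions made for this example.

```python
import numpy as np

def pca_reduce(X, variance_to_keep=0.95):
    """Project X onto the fewest principal components that together
    retain at least `variance_to_keep` of the total variance.

    Hypothetical helper for illustration; the threshold is an assumption.
    """
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)  # PCA requires mean-centred attributes
    # SVD of the centred data: the rows of vt are the principal directions.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    explained = singular_values ** 2
    cumulative = np.cumsum(explained) / explained.sum()
    # First position where the cumulative variance share reaches the threshold.
    k = int(np.searchsorted(cumulative, variance_to_keep)) + 1
    k = min(k, vt.shape[0])  # guard against floating-point shortfall
    return centered @ vt[:k].T, k

# Synthetic stand-in for a dataset (assumption): 200 cases, 5 attributes,
# where 3 attributes are noisy linear combinations of the first 2.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 2))
derived = base @ rng.normal(size=(2, 3)) + 0.01 * rng.normal(size=(200, 3))
X = np.hstack([base, derived])

reduced, k = pca_reduce(X)
print(f"attributes before: {X.shape[1]}, components kept: {k}")
```

Note that PCA replaces the original attributes with new combined components, whereas selection methods such as GA-based or Johnson-style reducts keep a subset of the original attributes, so "number of selected attributes" is not strictly comparable between the two kinds of technique.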







Copyright (c) 2018 University of Sindh Journal of Information and Communication Technology

ISSN-E: 2523-1235, ISSN-P: 2521-5582

Printing and Publication by: Sindh University Press. 

Journal Office, Institute of Information and Communication Technology, 
University of Sindh, Jamshoro, Sindh, Pakistan. 76080