
The Efficient Way of Detecting Anomalies in Large Scale Streaming Data

Sheeraz Lighari, Dil Muhammad Akbar Hussain


Many companies now deploy big data streams in applications spanning industry, the Internet of Things, and telecommunications. The data streams produced by these applications may contain values that are not normal; such values are called anomalies. Much work has been done on anomaly detection in batch data, but detecting anomalies in streaming data remains a largely open problem. In streaming data, finding anomalies has become increasingly challenging because of dynamic changes in the data, which are produced by the different methods applied in data streaming infrastructures. Anomaly detection first requires a way of establishing the normal behavior of the data; once that is known, dynamic behavior or change in the data is easier to identify. In this context, clustering is a prominent technique. Clustering is commonly applied to static data, but applying it to streaming data is a key problem in data mining. In this paper, we apply a streaming version of the KMeans clustering algorithm to anomaly detection. The algorithm is analyzed in both single-machine and distributed environments. Furthermore, we examine the data stream with respect to accuracy, anomaly detection time, true positive rate, and false positive rate. The data stream used in our analysis is generated from the Kddcup99 dataset, which is widely used in the field of intrusion detection.
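As a rough illustration of the idea described above, the following Python sketch clusters a stream in mini-batches and flags points that lie far from every cluster centre. It is not the authors' implementation. The decayed mini-batch update is an assumption, modelled loosely on the rule documented for Spark MLlib's streaming k-means, and the synthetic two-dimensional data, the distance threshold of 3.0, and all names (`StreamingKMeans`, `rates`, `demo`) are hypothetical stand-ins rather than anything taken from the paper or from Kddcup99.

```python
import math
import random


def nearest(centers, point):
    """Return (index, distance) of the centre closest to point."""
    dists = [math.dist(c, point) for c in centers]
    idx = min(range(len(centers)), key=dists.__getitem__)
    return idx, dists[idx]


class StreamingKMeans:
    """Mini-batch k-means whose centres are updated batch by batch."""

    def __init__(self, initial_centers, decay=1.0):
        self.centers = [list(c) for c in initial_centers]
        self.weights = [0.0] * len(self.centers)  # effective points seen so far
        self.decay = decay  # 1.0 keeps all history, 0.0 forgets it

    def update(self, batch):
        k, dim = len(self.centers), len(self.centers[0])
        sums = [[0.0] * dim for _ in range(k)]
        counts = [0] * k
        # Assign every point in the batch to its nearest current centre.
        for p in batch:
            i, _ = nearest(self.centers, p)
            counts[i] += 1
            for d in range(dim):
                sums[i][d] += p[d]
        # Blend the discounted old centre with the new batch contribution.
        for i in range(k):
            old = self.weights[i] * self.decay
            if counts[i] == 0:
                self.weights[i] = old
                continue
            total = old + counts[i]
            self.centers[i] = [
                (self.centers[i][d] * old + sums[i][d]) / total for d in range(dim)
            ]
            self.weights[i] = total

    def score(self, point):
        """Anomaly score: distance to the nearest centre."""
        return nearest(self.centers, point)[1]


def rates(labels, preds):
    """Return (true positive rate, false positive rate, accuracy)."""
    tp = sum(1 for y, p in zip(labels, preds) if y and p)
    fp = sum(1 for y, p in zip(labels, preds) if not y and p)
    pos = sum(labels)
    neg = len(labels) - pos
    tn = neg - fp
    return tp / pos, fp / neg, (tp + tn) / len(labels)


def demo(seed=0, threshold=3.0):
    rng = random.Random(seed)
    blobs = [(0.0, 0.0), (5.0, 5.0)]  # hypothetical "normal" behaviour

    def normal_point(i):
        cx, cy = blobs[i % 2]
        return (rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))

    # Warm up on normal traffic arriving in mini-batches.
    model = StreamingKMeans(initial_centers=[(1.0, 1.0), (4.0, 4.0)], decay=0.9)
    for _ in range(10):
        model.update([normal_point(i) for i in range(50)])

    # Score a labelled test batch: 0 = normal, 1 = anomaly.
    test = [(normal_point(i), 0) for i in range(200)]
    test += [((rng.gauss(20.0, 0.5), rng.gauss(-20.0, 0.5)), 1) for _ in range(20)]
    rng.shuffle(test)
    labels = [y for _, y in test]
    preds = [int(model.score(p) > threshold) for p, _ in test]
    return rates(labels, preds)


if __name__ == "__main__":
    tpr, fpr, acc = demo()
    print(f"TPR={tpr:.3f} FPR={fpr:.3f} accuracy={acc:.3f}")
```

In this sketch the threshold is fixed by hand. A real deployment would more plausibly derive it from the distribution of distances observed on normal traffic, and would need to decide whether flagged points should be allowed to influence later centre updates.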







Copyright (c) 2018 University of Sindh Journal of Information and Communication Technology

ISSN-E: 2523-1235, ISSN-P: 2521-5582

 Copyright © University of Sindh, Jamshoro. 2017 All Rights Reserved.  
Printing and Publication by: Sindh University Press. 

Journal Office, Institute of Information and Communication Technology, 
University of Sindh, Jamshoro, Sindh, Pakistan. 76080