A Survey of Data Mining Techniques for Crime Detection

Shamaila Qayyum, Hafsa Dar


Data mining is one of the most powerful approaches to extracting knowledge from large datasets: it uses machine learning and artificial intelligence techniques to detect underlying relationships among data. Crime detection is an active topic in data mining, in which different patterns of criminal activity are identified. It involves a variety of steps, from characterizing crimes to detecting crime patterns. Various crime detection techniques have been discussed in the literature for this purpose. In this paper, we select widely adopted data mining techniques that are specifically used for crime detection. We present an analytical study that summarizes the strengths and weaknesses of each technique. Each technique suits a particular use; for example, to identify the social ties and roles of criminals in a network, Social Network Analysis techniques are best suited because they draw on the degree, density, and centrality of nodes. This survey can serve as a guide for researchers to state-of-the-art crime detection techniques in data mining, along with their pros and cons.
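As a rough illustration of the node measures named in the abstract, the following minimal sketch computes degree, density, and degree centrality for a small undirected network. The edge list, the node labels, and all function names are hypothetical and are not taken from the paper; the formulas are the standard textbook definitions for a simple undirected graph without self-loops.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from (node, node) pairs.

    Assumes a simple graph: no self-loops; duplicate edges collapse.
    """
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return dict(adj)


def degree(adj, node):
    """Number of direct ties a node has."""
    return len(adj[node])


def density(adj):
    """Share of possible ties that are present: 2m / (n(n - 1))."""
    n = len(adj)
    if n < 2:
        return 0.0
    m = sum(len(nbrs) for nbrs in adj.values()) / 2  # each edge counted twice
    return 2 * m / (n * (n - 1))


def degree_centrality(adj):
    """Degree of each node normalised by the maximum possible degree (n - 1)."""
    n = len(adj)
    if n < 2:
        return {node: 0.0 for node in adj}
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}


# Hypothetical contact network: each pair is an observed tie between two people.
EDGES = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]

graph = build_graph(EDGES)
central = degree_centrality(graph)
most_connected = max(central, key=central.get)

print("degree of A:", degree(graph, "A"))
print("network density:", density(graph))
print("most connected node:", most_connected)
```

In this toy network, the node with the highest degree centrality is the one an analyst might examine first as a possible coordinator, although a real study would combine this with other centrality measures and domain knowledge.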

Data citation

The data were produced solely by the authors.



Copyright (c) 2018 University of Sindh Journal of Information and Communication Technology

ISSN-E: 2523-1235, ISSN-P: 2521-5582
