IMPROVING ETL/SQL EXECUTION THROUGH ANOMALY DETECTION

Authors

  • Prof. (Dr.) R. Kamatchi, Amity University, Mumbai, India
  • Kunal Suri, K.J. Somaiya Institute of Management Studies and Research, VidyaVihar, Mumbai-77, India

Keywords:

Data Warehousing, Anomaly Detection, Clustering, Log Files, Query tuning, Tuning

Abstract

A data warehouse is a business analyst's dream: a platform where data from multiple sources are collected. Different types of analysis are performed on the data obtained from the data warehouse. The process of loading data into a data warehouse is known as ETL (Extract, Transform, Load). It is a complex process that comprises executing thousands of SQL queries. These queries lead to the creation of an ETL execution trace. These data sets have not been used to their full potential. Anomalies are one feature of these data sets that has not been completely utilized. Studying and identifying anomalies in the execution of SQL queries and in the ETL execution trace can help us increase the efficiency of ETL processes by removing them. We tackle this problem in two stages. In the first stage, we apply anomaly detection techniques to a rich collection of production queries. In the second stage, we apply the anomaly detection technique to the execution logs. By following this process, we greatly reduce the domain of our detailed analysis. We also distinguish the clusters that are of genuine concern from the clusters that were created by a huge data store.

Published

2017-02-28

Section

Articles