Data mining is a complex discipline that requires knowledge of statistics and computer science. A common misconception is that data mining is all about retrieving data from very large datasets. This is not exactly true: the main idea behind data mining techniques is the discovery of new patterns and behaviors, and information retrieval is a secondary consideration. In this article, I’m going to give you a brief explanation of what data mining is, where it is used and why it is such a hot topic right now.
Why do we need data mining?
Over the years, both businesses and research centers have accumulated incredible amounts of data. Sadly, most of that data just sits there without any specific purpose. Some time ago, businesses realized that data discovery can lead to major breakthroughs in product creation, marketing, safety and many other areas. This is exactly why data analysts are in such high demand these days – organizations need people who are able to make sense of all the data which they have accumulated over the years, as well as come up with new, creative ways to collect, monetize and control it.
For example, by using data mining techniques, companies and other organizations can extract a lot of additional information from simple things like product usage statistics and short survey results. The findings often surface in dashboards, data reports or analyses. Without data mining, they would still have all the hard data, but they wouldn’t be able to use it to come up with new ideas that may not be obvious at first glance. Many larger companies use these kinds of tools every day and get huge benefits from them. This is especially important when dealing with millions or even billions of data points, where even the simplest conclusions might not be obvious.
How does it work?
While the inner workings of the algorithms and techniques used for data mining are incredibly complex, understanding the basic ideas behind them isn’t very difficult. In essence, data mining aims to simplify the processing of very large amounts of data (sometimes called big data) in order to discover new patterns or relationships within various data sets. Here are a few examples:
- Anomaly detection – large data sets normally have one or more common patterns. We can use data mining techniques to detect anomalies within these data sets. These anomalies can then be investigated in more detail.
- Cluster detection – used to detect categories and sub-categories in large data sets. It’s often impossible to do this manually.
- Association learning – often used in advertising and sales to maximize conversion rates or improve user experience. Major websites, such as Amazon and Netflix, use these techniques to recommend products to visitors.
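To make two of the ideas above a little more concrete, here is a minimal Python sketch of anomaly detection and association learning. It is only an illustration under simple assumptions, not how any particular product or website works: the function names (`find_anomalies`, `pair_confidence`), the sample numbers and the z-score threshold of 2 are all made up for this example.

```python
from collections import Counter
from itertools import combinations
from statistics import mean, pstdev


def find_anomalies(values, threshold=2.0):
    """Return the values that lie more than `threshold` standard
    deviations away from the mean (a simple z-score test)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values are identical, so nothing stands out.
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]


def pair_confidence(baskets):
    """For each ordered pair (a, b), estimate how often b appears
    in a basket given that a appears in it."""
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # ignore duplicates within a basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    confidence = {}
    for (a, b), together in pair_counts.items():
        confidence[(a, b)] = together / item_counts[a]
        confidence[(b, a)] = together / item_counts[b]
    return confidence


# Hypothetical daily order counts with one unusual day.
daily_orders = [102, 98, 105, 99, 101, 97, 100, 250, 103, 96]
print(find_anomalies(daily_orders))

# Hypothetical shopping baskets.
baskets = [
    ["bread", "butter"],
    ["bread", "butter", "jam"],
    ["bread", "milk"],
    ["butter", "jam"],
]
print(pair_confidence(baskets)[("jam", "butter")])
```

With this sample data, the first call flags the day with 250 orders, and the second shows that every basket containing jam also contains butter. Real systems work on far larger data sets and use more robust methods, but the underlying questions are the same.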
This is, of course, just a basic overview of what data mining is. It is used in many state-of-the-art ERP, business intelligence and decision support systems that companies rely on today.