Challenges with Data Mining
The challenges of big data are widespread and affect every field that collects, stores, and analyzes data.
- Big Data:
Big data is characterized by four major challenges: volume, variety, velocity, and veracity.
- Volume
describes the challenge of storing and processing the enormous quantity of data collected by organizations.
This enormous amount of data presents two major challenges: first, it is more difficult to find the correct data,
and second, it slows down the processing speed of data mining tools.
- Variety
encompasses the many different types of data collected and stored. Data mining tools must be equipped to
process a wide array of data formats simultaneously. An analysis that does not cover both structured and
unstructured data limits the value that data mining can add.
- Velocity
details the increasing speed at which new data is created, collected, and stored. While volume refers to
increasing storage requirements and variety refers to the growing number of data types, velocity is the challenge
associated with the rapidly increasing rate of data generation.
- Veracity
acknowledges that not all data is equally accurate. Data can be messy, incomplete, improperly collected, and even biased.
As with anything, the quicker data is collected, the more errors will manifest within the data. The challenge of veracity is
to balance the quantity of data with its quality.
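
The point that data can be messy or incomplete can be made concrete with a small quality check run before any mining step. The following is only a minimal sketch in plain Python: the function name `quality_report`, the field names, and the sample records are all hypothetical and invented for illustration, and a real pipeline would likely check much more (value ranges, formats, sampling bias).

```python
from collections import Counter


def quality_report(records, required_fields):
    """Count missing required values and exact duplicate rows in a list of dicts."""
    missing = Counter()
    seen = set()
    duplicates = 0
    for record in records:
        # A field counts as missing if it is absent, None, or an empty string.
        for field in required_fields:
            value = record.get(field)
            if value is None or value == "":
                missing[field] += 1
        # Build a hashable fingerprint of the row to detect exact repeats.
        key = tuple(sorted((k, str(v)) for k, v in record.items()))
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return {
        "rows": len(records),
        "missing": dict(missing),
        "duplicates": duplicates,
    }


# Hypothetical sample records, invented for illustration only.
sample = [
    {"id": 1, "age": 34, "city": "Lyon"},
    {"id": 2, "age": None, "city": "Oslo"},
    {"id": 2, "age": None, "city": "Oslo"},
    {"id": 3, "age": 51, "city": ""},
]

report = quality_report(sample, ["id", "age", "city"])
print(report)
```

A report like this gives a rough sense of how much of a dataset can be trusted before its size is taken as a sign of its value.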