This is the second article in the series written by Suhas Marathe on EBN. Suhas writes:
What many companies are beginning to realize is that technology alone cannot deliver data science-driven insights. From the perspective of IT infrastructure, big data appears to be a problem that can be reduced to four variables, often referred to as the “4 V’s”:
- Volume – the gross quantity of data being managed
- Velocity – the speed and efficiency with which data can be collected
- Variety – the many different forms & formats of data
- Veracity – the fundamental integrity of the data
Yet this definition is actually quite shallow. To discover true value in Big Data using data science best practices, a more complete and accurate definition of the 4 V’s is needed. While these attributes capture the key characteristics of “big” data, each also poses a problem whose magnitude must be overcome. Many businesses are, in fact, mired in the vastness of the problems presented by these 4 V’s. A fuller elaboration of each of the V’s would be as follows:
- Volume – the gross quantity of data being managed, as well as identifying the “particular” or “right” data which is relevant to a given task. An abundance of data does not lead to answers unless there is a “right” focus. Knowing what data to use is critical.
- Velocity – the speed and efficiency with which the “right” data can be both collected and made available for consumption by a client. Uninhibited, continuous capture of unfiltered data yields no meaningful correlations unless the “right” relationships are established.
- Variety – the many different forms & formats of data as compared to the consumption needs of the client. Variability in format, structure, and medium can convey dissimilarities that result in misinterpretation. Knowing the “right” congruent meaning of dissimilar data forms is important.
- Veracity – the fundamental integrity of the data and its relevance to the problem at hand.
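The Variety point above can be made concrete with a small sketch. Assume, hypothetically, that three source systems report the same “order date” fact in three dissimilar forms; establishing one congruent meaning is what prevents misinterpretation downstream. The record shapes and the `normalize_date` helper here are illustrative assumptions, not any particular product’s API:

```python
from datetime import datetime, timezone

# Hypothetical records: the same "order date" fact arrives in three
# dissimilar forms from three source systems (the Variety problem).
records = [
    {"source": "crm", "order_date": "2015-03-01"},   # ISO date string
    {"source": "erp", "order_date": "03/01/2015"},   # US-style date string
    {"source": "web", "order_date": 1425168000},     # Unix timestamp (UTC)
]

def normalize_date(value):
    """Map each known source format onto one congruent representation."""
    if isinstance(value, (int, float)):              # Unix timestamp
        return datetime.fromtimestamp(value, tz=timezone.utc).date()
    for fmt in ("%Y-%m-%d", "%m/%d/%Y"):             # known string formats
        try:
            return datetime.strptime(value, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {value!r}")

normalized = [normalize_date(r["order_date"]) for r in records]
# All three records now carry the same congruent meaning: 2015-03-01.
```

The design choice is deliberate: dissimilar inputs are reconciled at the point of ingestion, so every consumer sees one meaning rather than three formats.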
Read more on EBN online here.