Big Data: The big impact, from myths to reality
I would like to start this write-up by congratulating “You” for being Time magazine’s 2006 Person of the Year. Yes, you read that right: “You” were chosen as the historically important person according to Time. People like you and me, and the millions of others who anonymously create content on social media platforms like Facebook, Twitter, YouTube, and Wikipedia, were chosen as the Person of the Year because we now drive the information age. From 2006 to 2020 we have had a wonderful journey of technological advancement, with the booming internet and smart devices connecting the minds of the world in a single space, and there is no stopping that in the near future.
All of this activity generates gazillions of data points per day, which received little attention until their value was realized. Once data started to show its impact, it did not take long for it to become ‘the new oil’ of our generation. But this overwhelming amount of data soon became a serious challenge to handle properly. That is where Big Data came into the picture, and it quickly became the trending buzzword of the industry. But what is big data? How big does data really have to be to be “big”? And why is big data a big problem?
Let’s try to dig into these questions, starting with Gartner’s definition: “Big data is high-volume, high-velocity or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation.” Phew... that’s difficult information to digest, right? Let’s break the complicated parts down to start our journey into the world of big data smoothly.
By now you probably have a fair idea that the overwhelming amount of data generated from various sources is called big data. But there is a slight misconception here: big data is not only about size. Size is obviously involved, but a large amount of data is not automatically “big data”. Rather than calling big data a problem, I would describe it as a paradigm for solving a problem. To put it simply: Big Data is a paradigm for solving problems that cannot be solved by conventional computing systems, meaning the general-purpose computers and laptops we use on a daily basis.
Ever wondered how Google Maps predicts the optimal route from source to destination in real time, with all the traffic information factored in, or how Twitter, YouTube, and other platforms tell you which hashtags are trending right now? Have you ever been surprised by how e-commerce sites like Amazon and Flipkart suggest items you may want based on what you have already bought? These are all applications of the big data paradigm, solving day-to-day business problems by analyzing big data.
But still, the question remains the same: what is big data? How do you decide whether a problem is a big data problem or a trivial computational one? A problem can be classified as a big data problem if it satisfies any of three core V’s, often extended with two additional V’s, as follows.
If the data is too big for a conventional computing system to store and process, then it is a big data problem in terms of size, or ‘volume’. In this era of technological enrichment, cheap devices capable of storing terabytes of data are easily available, so it is safe to say that it is exabytes of data, not terabytes, that pose a real challenge for conventional computing systems. But there is no fixed boundary of size that makes something a big data problem: whenever you cannot store and process a particular amount of data with conventional means, you start solving it with a big data approach in which volume plays the vital role.
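The core idea of handling volume, processing data that will not fit in memory all at once, can be pictured with a small sketch. Instead of loading a huge file in one go, we read and aggregate it in fixed-size chunks. This is only a toy illustration of the principle; the function name and chunk size are my own choices, not part of any big data framework.

```python
def count_lines_in_chunks(path, chunk_size=1024 * 1024):
    """Count lines in a file without ever loading it fully into memory.

    A toy illustration of chunk-wise processing: memory use stays
    bounded by chunk_size no matter how large the file grows.
    """
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)  # read at most chunk_size bytes
            if not chunk:               # empty read means end of file
                break
            total += chunk.count(b"\n")
    return total
```

The same pattern (read a bounded piece, update a running aggregate, discard the piece) is what distributed systems apply at scale, just spread across many machines.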
Data is not a moving object, right? So why does velocity matter for data? Technically, there are lots of sensors, monitoring devices, and smartphones generating data, perhaps in small amounts, but at very short intervals, and you need to store all of it without fail. Can you afford to drop some of the data coming out of a critical patient-monitoring system? What happens if you suddenly change driving directions and Google Maps fails to show you an optimized route to your destination? These are the problems big data has to handle properly in terms of ‘velocity’.
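A toy way to picture the velocity problem: readings arrive faster than they can be persisted one at a time, so they are buffered and flushed in batches so that none are silently dropped. The generator below is only a sketch of that buffering idea; the batch size and names are illustrative assumptions, not a real ingestion API.

```python
from collections import deque

def batch_stream(readings, batch_size=100):
    """Buffer a fast stream of readings and yield them in batches.

    Every reading ends up in exactly one batch, including a final
    partial batch, so nothing from the stream is lost.
    """
    buffer = deque()
    for reading in readings:
        buffer.append(reading)
        if len(buffer) >= batch_size:
            yield list(buffer)   # hand a full batch to the writer
            buffer.clear()
    if buffer:                   # flush whatever is left at the end
        yield list(buffer)
```

Real streaming systems add durability and back-pressure on top of this, but the contract is the same: absorb bursts without dropping data.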
Think of one of the most popular social media websites, Facebook. Facebook users upload images, post videos, tag locations, attach documents, write content, and release audio. The images can be in .jpg, .jpeg, .png, or any other format; the videos in .mp4, .avi, and so on; the documents in .pdf, .docx, .xls, or any other format. The data can be structured or unstructured. Storing all of this varied data without fail involves a lot of challenges, and this leads to the big data problem of ‘variety’.
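One tiny facet of the variety problem is simply routing each incoming file to storage or a handler that understands its format. Here is a minimal sketch of that routing step; the extension-to-category mapping is an illustrative assumption covering only the formats mentioned above, not a complete list.

```python
import os

# Hypothetical mapping from file extension to a storage category,
# based on the formats mentioned in the text above.
FORMAT_CATEGORIES = {
    ".jpg": "image", ".jpeg": "image", ".png": "image",
    ".mp4": "video", ".avi": "video",
    ".pdf": "document", ".docx": "document", ".xls": "document",
}

def categorize(filename):
    """Return the storage category for a file, or 'unknown'."""
    _, ext = os.path.splitext(filename.lower())
    return FORMAT_CATEGORIES.get(ext, "unknown")
```

In practice variety goes far beyond extensions (schemas, encodings, free text), but even this trivial dispatch shows why a one-size-fits-all store struggles with heterogeneous data.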
The various sources of data may be mixed with noise or biases that lead to data abnormality, or to problems of accuracy, which have a big impact on big data analysis. Several sarcastic reviews of a product may distort its rating, or an organization may make a wrong decision based on data analyzed with statistical bias. These issues concern the quality and trustworthiness of data, which is referred to as ‘veracity’.
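As a toy illustration of the veracity concern, a simple statistical filter can drop readings that deviate wildly from the rest before they bias an average. This is a crude sketch of one possible cleaning step, not a real data-quality pipeline; the threshold of two standard deviations is an arbitrary choice for the example.

```python
import statistics

def drop_outliers(values, max_deviation=2.0):
    """Keep only values within max_deviation standard deviations
    of the mean -- a crude first pass at removing abnormal readings."""
    if len(values) < 2:
        return list(values)      # too few points to judge
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    if stdev == 0:
        return list(values)      # all values identical, nothing to drop
    return [v for v in values
            if abs(v - mean) <= max_deviation * stdev]
```

Note the limits of such a filter: it assumes roughly well-behaved data, and genuine bias (like coordinated sarcastic reviews) needs far more context-aware treatment than a numeric threshold.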
Suppose you have successfully stored a big chunk of data, handled high velocity and a rich variety, and properly removed abnormalities and biases, but you fail to extract any new insight that adds value to your business. What, then, was the point of all that effort and juggling with data? The challenge of applying the right algorithmic and statistical approaches to extract the right value for the business is referred to as the ‘value’ of big data.
These 5 V’s are the dimensions along which to judge a problem and choose the right approach to solve it for greater business value under the hood of big data. As a concluding remark, I can say the paradigm of big data is so remarkable that it can change, or even revolutionize, every field of industry, extracting greater value for the business.