Zooming into Data Black Holes

If you are an astronomer or physicist you will be very familiar with the concepts of visible matter (basically the stars, planets, etc.) and dark matter, which, much like a black hole, cannot be observed directly. Dark matter seems to make up most of the matter in the universe, but it’s difficult to work out how to see or measure it. In 2019, a team of researchers from the Event Horizon Telescope (EHT) published the first-ever image of a black hole. Black holes also took this year’s Nobel Prize in Physics: half went to Roger Penrose “for the discovery that black hole formation is a robust prediction of the general theory of relativity”, and the other half went jointly to Reinhard Genzel and Andrea Ghez for the discovery of a supermassive compact object at the centre of our galaxy.

This is analogous to the “Enterprise Data” spectrum. Most of what you see through OLTP, EDW, CRM or mobile app systems helps generate insights that lead to business actions. But, just like dark matter, there is a lot of hidden data in an enterprise that does not lend itself easily to analysis. The volume of this dark data massively exceeds the volume of operational data.

This dark data is found in logs, metadata, text fields and documents, video, audio and pictures. While operational data can be easily analyzed in databases, dark data has to be extracted and given some structure before it can be analyzed.

Essentially, dark data is any digital information that a company holds but does not use. Gartner describes dark data as “information assets that an organization collects, processes and stores in the course of its regular business activity, but generally fails to use for other purposes.” Often an organization ignores dark data for practical reasons: the data may be corrupted, and by the time it can be cleansed the information may be too old to be useful. According to a study by IBM, over 80% of all data is dark and unstructured, and this will rise to 93% in 2021.

The notion of dark data isn’t new. In fact, it has existed in our IT systems for many years. But with the increased adoption of scalable big data technologies, more and more of the dark data is coming out into the open. The ability of Hadoop clusters and NoSQL databases to process large volumes of data makes it feasible to use such long-neglected dark data for big data analytics applications and unlock its business value. Examples include server log files that could provide clues to website visitor behavior, customer call records that include unstructured consumer sentiment data, and mobile geolocation data that could reveal traffic patterns that help with business planning.
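To make the server log example a little more concrete, here is a minimal Python sketch of how raw log lines might be turned into visitor-behavior counts. It assumes the logs follow the Common Log Format; the sample lines, the `summarize_visits` helper and the choice to count only 2xx responses are illustrative assumptions, not something taken from a particular system.

```python
import re
from collections import Counter

# Assumption: lines follow the Common Log Format. Real server logs may differ.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def summarize_visits(lines):
    """Count successful hits per path and collect distinct visitor IPs."""
    hits = Counter()
    visitors = set()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed format
        if match["status"].startswith("2"):  # only successful (2xx) requests
            hits[match["path"]] += 1
            visitors.add(match["ip"])
    return hits, visitors


# Hypothetical sample data, using documentation-reserved IP addresses.
sample_log = [
    '203.0.113.5 - - [10/Oct/2020:13:55:36 +0000] "GET /pricing HTTP/1.1" 200 2326',
    '203.0.113.5 - - [10/Oct/2020:13:56:01 +0000] "GET /signup HTTP/1.1" 200 512',
    '198.51.100.7 - - [10/Oct/2020:14:02:11 +0000] "GET /pricing HTTP/1.1" 200 2326',
    '198.51.100.7 - - [10/Oct/2020:14:02:15 +0000] "GET /missing HTTP/1.1" 404 0',
    'corrupted line',
]

hits, visitors = summarize_visits(sample_log)
print(hits.most_common())  # pages ranked by successful hits
print(len(visitors))       # number of distinct visitor IPs
```

At real volumes the same logic would more likely run as a distributed job on the kind of big data platform mentioned above, but the idea is the same: give the raw text a structure first, then count.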

Convergence of IoT and Dark Data:

The emergence of IoT is unlocking new possibilities and will add to the already existing pile of data. According to one study, IoT devices may add up to 269 times more data than is already available. Of this, 80% will be dark data, which no organization can afford to ignore.

Benefits of extracting Dark Data

Although organizations incur an expense and spend considerable engineering effort to extract dark data, there are multiple benefits to doing this.

  • Dark data is valuable because it often holds information that is not available in any other format. Therefore, organizations continue to collect and store dark data in the hope of exploiting it in the future.
  • With access to more, and more in-depth, information, the quality of analytics improves drastically, resulting in faster and better data-driven decision making, which in turn leads to business success.
  • Deriving insights from dark data makes organizations less exposed to risks. Organizations can also delete unnecessary data, thereby reducing the storage expenses.

Dark Data extraction Technology is valuable:

Apart from dark data itself, dark data extraction technologies are extremely valuable.

  • Recently a tech giant purchased Artificial Intelligence (AI) company Lattice Data for $200 million. Lattice Data applied an AI-enabled inference engine to extract dark data.
  • Similarly, a well-known philanthropic organization founded by Facebook’s co-founder bought Meta for an undisclosed amount. Meta is an AI-powered research search engine startup, and the plan is to make its tool freely available.

Open Source Data-Extraction Tools

  • DeepDive: Developed by Stanford University, this open source tool was commercially supported by Lattice Data. However, it is no longer actively developed following the acquisition of Lattice Data.
  • Snorkel: This was another tool developed by Stanford University. Snorkel accelerates dark data extraction by developing tools, datasets and algorithms.
  • Dark Vision: This tool uses IBM Watson Supercomputer services to extract dark data from videos, a classic example of dark data extraction.

Circling back to the black hole at the start of this article, and to get an appreciation of the size involved: the subject of the image was a supermassive black hole in the center of the galaxy known as Messier 87 (M87). Producing this image required approximately 4.5 petabytes of astronomy data, or half a ton of hard drives to store it. Fortunately, the data extraction and crunching methodologies and tools needed for this are available today.

Conclusion:

Almost every bit of dark data collected can offer some insights for the business. For example, customers’ geolocation data can reveal new insights into shopping patterns, and information about how long a caller spends on a call with a customer support agent can show how efficient the agent is and where the bottlenecks in the system are. Companies can analyze dark data to develop greater understanding and unveil trends, patterns and relationships that escape us during normal business intelligence and analytics activities.
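As a rough illustration of the call-duration example, the Python sketch below averages call time per support agent and flags the agent with the longest average. The records, agent names and the `average_call_duration` helper are all hypothetical; in practice the durations would first have to be extracted from call logs.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical call records: (agent, call duration in seconds).
call_records = [
    ("agent_a", 240),
    ("agent_a", 360),
    ("agent_b", 900),
    ("agent_b", 1100),
    ("agent_c", 300),
]


def average_call_duration(records):
    """Return the mean call duration in seconds for each agent."""
    durations = defaultdict(list)
    for agent, seconds in records:
        durations[agent].append(seconds)
    return {agent: mean(values) for agent, values in durations.items()}


averages = average_call_duration(call_records)
# The agent with the longest average call is a candidate bottleneck to review.
slowest_agent = max(averages, key=averages.get)
print(averages)
print(slowest_agent)
```

A long average does not by itself mean an agent is inefficient (the calls may simply be harder), so a number like this is best treated as a pointer to where to look, not as a verdict.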

Yes, dark data is finally being seen for what it is: a potential gold mine that we have only just started to appreciate!

#DarkData #SmartData #BigData #DustyData #DigitalInformation #BI #BusinessIntelligence #PredictiveAnalytics #IoT #Algorithms #Databases #DataSets #Patterns #DataExtraction #OpenSource #SearchEngine #AI #ML #DataDrivenDecision #Mobile #Geolocation

#Strategy #Management #Consulting #Transformation #Technology #Outsourcing #CreativeDisruptions #EternalQuest #FindingTruth
