Illuminate the dark data to make the big digital leap

With the understanding of structured and unstructured data, now the discussion moves to the next level: what are the IT and business challenges associated with dark data? 

Sanjay Agrawal Mar 25th 2019

Gartner has defined dark data as “the information assets which organizations collect, process, and store during regular business activities, but generally fail to use for other purposes.” How does dark data differ from unstructured data? Unstructured data refers to content that does not conform to a specific, pre-defined data model, while structured data sits in an organized repository, readily available for more effective processing and analysis.

Comparing the above definitions, we can see that dark data can appear in both structured and unstructured repositories.

The majority of data in the unstructured category is dark, while a small fraction of dark data can come from structured assets as well. With this understanding, the discussion now moves to the next level: what are the IT and business challenges associated with dark data?

Dark data, most of which is unstructured and grows many times faster than structured business data, is retained by enterprises on huge storage, backup and management infrastructure that consumes a large share of the IT budget. Meanwhile, IT struggles to know what data it actually holds.

On the business front, because the insights dark data could provide aren’t being leveraged, it adds no value to the business.

Addressing this challenge requires finding ways to extract value from this clutter. Traditional methods of data analytics no longer address the three dimensions of big data: volume, variety and velocity. The diverse mix of content from disparate sources such as audio, video, PDFs, social feeds, IVRs and emails needs to be curated in a secure repository that can be accessed by multiple users, applications and workloads, on premises or in the cloud.

How object storage and analytics software shed light on dark data

Object storage enables enterprises to deal with the drastic growth of data while improving ease of use, providing flexibility to scale capacity and performance independently to address provisioning management issues, and meet a variety of workloads. Treating an object storage solution as a big data reservoir or scalable and centralized data hub enables analytics-based applications to blend structured and unstructured data together for business intelligence and visualization workloads. The custom metadata that object storage solutions attach to files as a form of detailed enrichment gives unstructured data more context and makes it easier to discover and search. Blending unstructured and structured data together improves the enterprise’s ability to gain more relevant insights from a more complete set of data.
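As a rough illustration of how custom metadata makes unstructured content discoverable, the sketch below uses a toy in-memory store. It is not the API of any particular object storage product: the `ObjectStore` class, its `put` and `search` methods, and the metadata fields (`source`, `customer_id`, `sentiment`) are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    key: str
    data: bytes
    metadata: dict = field(default_factory=dict)


class ObjectStore:
    """Toy in-memory stand-in for an object store (hypothetical, not a vendor API)."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data, **metadata):
        # Store the raw content together with its descriptive metadata.
        self._objects[key] = StoredObject(key, data, dict(metadata))

    def search(self, **criteria):
        # Return the keys whose metadata matches every given field/value pair.
        return sorted(
            obj.key
            for obj in self._objects.values()
            if all(obj.metadata.get(name) == value for name, value in criteria.items())
        )


store = ObjectStore()
store.put("calls/0001.wav", b"...", source="ivr", customer_id="C100", sentiment="negative")
store.put("mail/0002.eml", b"...", source="email", customer_id="C100", sentiment="neutral")
store.put("calls/0003.wav", b"...", source="ivr", customer_id="C200", sentiment="positive")

# The audio and email payloads are opaque, but the metadata is searchable.
ivr_keys = store.search(source="ivr")
unhappy = store.search(customer_id="C100", sentiment="negative")
```

The point of the sketch is that the search never inspects the file contents; the metadata attached at ingestion time is what makes otherwise dark content findable.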

For example, banks used to create their customers’ profiles by looking at all the business transactions across their product lines and delivery channels. Today banks are embarking on a journey in which customer profiles are built not only from the business that customers do with the bank, but also from their interactions, sentiments, preferences and online behavior. This richer profile helps banks make better-informed decisions in areas such as customer retention and offers.

Hitachi Vantara recently partnered with IDC to survey 1392 IT professionals and executives in India. The survey revealed low awareness of object storage: 39 percent of the enterprises surveyed were unaware of the technology. It is therefore important to raise awareness of object storage technology among Indian enterprises.

Enterprises should focus on building a distributed object storage system that can evolve and scale according to their future requirements, giving IT an optimized infrastructure. With multiple storage tiers and configurable attributes, the object storage system can create virtual content platforms that can be subdivided for better organization of content, policies and access. The system should also promise high levels of elasticity, secure file sharing, collaboration and synchronization.

Metadata associated with object storage helps bring the desired quality to unstructured data, making it fit for discovery and analysis.

Blending this data with structured data sources or enterprise data warehouses, and analyzing the combined set, starts to deliver much higher value to the business.
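A minimal sketch of such blending, using the banking scenario as a backdrop, might look like the following. Everything here is an assumption made for illustration: the field names, the sample values and the `build_profiles` helper are hypothetical, and a real deployment would read from a data warehouse and an object store rather than from in-memory lists.

```python
from collections import Counter, defaultdict

# Structured side: rows as they might come from a warehouse table (made-up values).
transactions = [
    {"customer_id": "C100", "product": "credit_card", "amount": 1200.0},
    {"customer_id": "C100", "product": "savings", "amount": 300.0},
    {"customer_id": "C200", "product": "loan", "amount": 5000.0},
]

# Unstructured side: only the metadata attached to each stored object.
object_metadata = [
    {"key": "calls/0001.wav", "customer_id": "C100", "sentiment": "negative"},
    {"key": "mail/0002.eml", "customer_id": "C100", "sentiment": "neutral"},
    {"key": "calls/0003.wav", "customer_id": "C200", "sentiment": "positive"},
]


def build_profiles(transactions, object_metadata):
    """Combine transaction rows and object metadata into one profile per customer."""
    profiles = defaultdict(
        lambda: {"total_amount": 0.0, "products": set(), "sentiments": Counter()}
    )
    for row in transactions:
        profile = profiles[row["customer_id"]]
        profile["total_amount"] += row["amount"]
        profile["products"].add(row["product"])
    for meta in object_metadata:
        # Count how often each sentiment label appears for the customer.
        profiles[meta["customer_id"]]["sentiments"][meta["sentiment"]] += 1
    return dict(profiles)


profiles = build_profiles(transactions, object_metadata)
```

Joining on a shared identifier such as `customer_id` is the simplest design choice; it assumes that identifier was captured as metadata when the unstructured content was stored.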

As industries brace for the next round of digital revolution driven by AI and IoT, data analytics will emerge as the key competitive differentiator. Undoubtedly, customer communication data will hold the key to new business opportunities, and those who make intelligent moves will gain the competitive edge. IDC estimates that organizations that analyze all relevant data and deliver actionable information will achieve an extra USD 430 billion in productivity gains over their less analytically oriented peers by 2020.

Sanjay Agrawal is the Technology Head at Hitachi Vantara.

Disclaimer: This article is published as part of the IDG Contributor Network. The views expressed in this article are solely those of the contributing authors and not of IDG Media and its editor(s).