In the last two posts, we started a conversation about big data. The first post, titled “Two Sides of ‘Big?’ Data,” described big data as a challenge with two facets: “storage” and “retrieval.” The following post, “The storage side of ‘Big?’ data,” discussed the opportunities in the storage space. These posts attempted to contextualize the big data opportunity without regard to technical implementations or constraints. The goal is to piece together a flexible solution approach that meets critical business needs using big data.
This post discusses the retrieval (consumption or usage) side of big data. Traditionally, retrieval is seen as a single function performing three different actions: data access, retrieval, and presentation. There are many data retrieval tools, and the list keeps expanding. The challenge is that these capabilities lag behind the rapid growth of data and user demands. As a result, several historical challenges persist:
- Lack of data awareness
Historically, enterprise users have suffered from a lack of awareness about what data is available and how to use it. Much time and investment have gone into efforts to fix the meaning, quality, and timeliness of data and to educate users about it. These initiatives fell short because they could not keep pace with the growth of data. The result has been redundancy, duplication, and silos, which create barriers to data sharing and ease of use.
- Uber data collection
Demand for data continues to grow, as the new mantra is to capture it all, no matter when, where, or how. We are adding new channels, devices, and IoT sources to the mix. The variety, velocity, and volume of data are certain to scale new peaks.
We face a steep learning curve as we deploy big data. Users are demanding improved intelligence at the operational, tactical, and strategic levels. The pursuit of new paradigms such as predictive analytics, data pattern discovery, and visualization makes the curve even steeper.
The learning challenge is multifold: it involves knowing the data, understanding new analytics paradigms, and establishing the right context. While you absorb all of this, the business environment and the technology landscape keep evolving.
- Variety of data use or purpose
Everyone is a data user, without exception. Today, users come from every stakeholder class across the enterprise, and they want data for a wide variety of purposes. This is the most disruptive aspect of big data. With so many users, the requirements become a riddle. Then there is the lure of expected competitive gains: the hope of unearthing the unknowns and the thrill of prediction have created a buzz.
This list of challenges will keep growing as you look for answers to questions such as: How do we satisfy diverse stakeholder groups and their demands? How do we publish data through disparate channels? How and when do we engage consumers across a variety of devices? The answers may present new opportunities to transform how businesses, people, and processes manage, execute, and engage within and beyond enterprise boundaries.
How can we solve the data retrieval challenge?
We started by suggesting that “storage” and “retrieval” be separated into independent activities with no interdependencies. This separation becomes possible by creating an enterprise data backdrop populated with “as is” data, as discussed in the last post. Such a backdrop will, by design, hold a much deeper and wider pool of data without boundaries. That raises the retrieval challenge: how do we gather the most relevant data to support rich, sophisticated analytics?
This brings us to “context pods,” which identify the data needed to support rich analytics processes. A context represents a holistic view of the analytics subject matter in terms of time period, subject mix, and the business rules and algorithms used to derive an outcome. Such a definition can lead to the shortest, fastest navigation paths for efficient retrieval of context data. A few guiding principles for defining context pods are:
- Context design is iterative and never stops
- Context is not data and cannot be created without data
- Context can only be as good as the available data
- Context design should start early and involve subject matter experts
Context pods are intended to deliver actionable outcomes that support sophisticated analytics, visualization, and customer engagement capabilities. Defining rich, malleable contexts that span diverse data topics and sources is important for supporting diverse analytics portfolios. A few key success drivers for setting up quality context pods are:
- Trust in the available data, in terms of its meaning and purpose
- Data that is free of transformation riddles, available in “as is” form
- On-demand availability of contextual data to support analytics
- Data sourced from a variety of sources, systems, and databases
- Different contexts that leverage the same data, with no duplication of data
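To make the idea of a context pod a little more concrete, here is a minimal Python sketch built on stated assumptions. The `ContextPod` class, its fields, and the record keys (`subject`, `day`, `region`, `amount`) are hypothetical names chosen for illustration; they are not part of any product or an implementation of the approach described in these posts. The sketch models a pod as a definition (a time period, a subject mix, and business rules expressed as predicates) rather than as a copy of data, so that different pods retrieve references to the same “as is” backdrop records.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Iterable, Iterator

# Hypothetical shape of an "as is" record in the enterprise data backdrop.
Record = dict


@dataclass(frozen=True)
class ContextPod:
    """Hypothetical context pod: a definition of context, not a copy of data."""
    name: str
    start: date                     # start of the time period (inclusive)
    end: date                       # end of the time period (inclusive)
    subjects: frozenset             # subject mix, e.g. {"orders", "returns"}
    rules: tuple = ()               # business rules as predicates on a record

    def retrieve(self, backdrop: Iterable[Record]) -> Iterator[Record]:
        """Yield references to matching backdrop records; nothing is copied."""
        for rec in backdrop:
            if rec["subject"] not in self.subjects:
                continue
            if not (self.start <= rec["day"] <= self.end):
                continue
            if all(rule(rec) for rule in self.rules):
                yield rec


# A tiny illustrative backdrop (made-up sample records).
backdrop = [
    {"subject": "orders", "day": date(2024, 1, 5), "amount": 120.0, "region": "EU"},
    {"subject": "orders", "day": date(2024, 2, 9), "amount": 40.0, "region": "US"},
    {"subject": "returns", "day": date(2024, 1, 20), "amount": 15.0, "region": "EU"},
    {"subject": "web_clicks", "day": date(2024, 1, 7), "amount": 0.0, "region": "EU"},
]

# Two different contexts defined over the same backdrop.
q1_sales = ContextPod("q1_sales", date(2024, 1, 1), date(2024, 3, 31),
                      frozenset({"orders"}))
eu_january = ContextPod("eu_january", date(2024, 1, 1), date(2024, 1, 31),
                        frozenset({"orders", "returns"}),
                        rules=(lambda r: r["region"] == "EU",))

sales = list(q1_sales.retrieve(backdrop))
eu = list(eu_january.retrieve(backdrop))
print(len(sales), len(eu))   # prints: 2 2
# The same backdrop record appears in both contexts by reference, not as a copy.
print(sales[0] is eu[0])     # prints: True
```

The design choice worth noting is that `retrieve` yields the backdrop records themselves, so overlapping contexts share a single copy of the data. That is one possible reading of the “no duplication” driver; a real deployment would express the same idea through views or query definitions over its own storage layer.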
The enterprise backdrop plays a critical role in fulfilling several of the success drivers above. Guided retrieval of context data from an expansive data pool delivers speed and efficiency; the goal is to maximize available capacity for complex, intensive data processing. Context pods provide navigation paths that guide data access configurations. They can pre-empt navigation uncertainty through a proactive understanding of expected outcomes.
This concludes our overview of the two sides of big data, as we consider how to fulfill its promise while leveraging existing investments.