Architecting advanced data pipelines using real-time streaming and batch processing technologies
Synopsis
Today, data is one of a company's most important resources, second only to its people, so it must be handled appropriately. A growing number of organizations are advancing their data strategies beyond simple data warehousing so that they can analyze ever-larger volumes of data that are produced closer to real time, in a distributed manner, and in increasing variety. Over the past decade, a range of technologies has emerged, each promising to answer the challenge of big and fast data. Organizations use these technologies in different ways and in different combinations (Dean & Ghemawat, 2008; Chauhan & Saxena, 2022; Arora & Talwar, 2023).
Traditional techniques move data into a data warehouse (DW), transform it according to users’ business rules, and build a DW model for easy business reporting. These techniques have proven to be slow and to lack the capabilities needed to support advanced data analysis. Organizations have therefore been attempting to augment the DW, or to replace it altogether, with more flexible approaches. In doing so, they have learned that the front-end report access process is a critical part of a data architecture but not its only component. In addition to the DW (often augmented with big data capabilities), organizations have created separate big data environments to support custom application development, business intelligence, and advanced analytics. Data from both environments is often served to business reports to provide true enterprise reporting. The two types of environments have different requirements and play different roles. We believe that the new data challenges call for a new data architecture that combines both types of environments, with the technologies in each chosen to suit its particular strengths and weaknesses (Elgendy & Elragal, 2021; Malik, 2023).