Why ETL Tools Might Hobble Your Real-Time Reporting & Analytics

Companies report large investments in their data warehousing (DW) and business intelligence (BI) systems, with a large portion of the software budget spent on extract, transform and load (ETL) tools. These tools make it easy to populate the warehouse and to manipulate data so it maps onto the new schemas and data structures.

Companies do need DW or BI for analytics on sales, forecasting, stock management and much more, and they certainly wouldn’t want to run those additional analytics workloads on top of already taxed core systems, so keeping them isolated is a solid architectural decision. Consider, however, the extended infrastructure and support costs of managing a new ETL layer: it’s not just building the jobs that demands effort. There’s the ongoing scheduling, the on-call support when jobs fail, the challenge of an ever-shrinking batch window in which those jobs can run without affecting other production workloads, and other such considerations that make the initial warehouse implementation expense look like penny candy.

So not only do such systems come at extravagant cost, but to make matters worse, the vast majority of ETL jobs run overnight. In most cases, the best possible scenario is that you’re looking at yesterday’s data, not today’s. All your expensive reports and analytics are at least a day behind what you require. What happened to the near-realtime information you were promised?

Contrast the typical BI/DW architecture with a better option: building out your analytics and report processing using realtime routing and transformation with tools such as IBM MQ, DataPower and Integration Bus. Most of the application stacks that process this data in realtime already hold all the related keys in memory (customer numbers or IDs, order numbers, account details and so on) and use them to create or update the core systems. Why duplicate all of that again in your BI/DW ETL layer? If you do, you depend on ETL jobs going back into the core systems to find out what happened during that period and extracting all that data again to put it somewhere else.
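
To make that concrete, here is a minimal sketch of such an event object in Java. The class and field names are illustrative assumptions, not part of any product API; the point is simply that everything the warehouse needs is already sitting in the application’s memory at the moment the core systems are updated.

    // Minimal sketch of the keys an order-processing application already
    // holds in memory when it writes to the core systems. All names here
    // are illustrative assumptions.
    import com.fasterxml.jackson.databind.ObjectMapper;

    public class OrderEvent {
        public String customerId;
        public String orderNumber;
        public String accountId;
        public java.math.BigDecimal orderTotal;
        public java.time.Instant createdAt;

        // Serialize to JSON so the event can travel as a message payload.
        public String toJson() throws Exception {
            ObjectMapper mapper = new ObjectMapper();
            mapper.findAndRegisterModules(); // picks up the java.time module if present
            return mapper.writeValueAsString(this);
        }
    }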

Alongside this, most organizations already run application messaging and notifications between applications. If you have all the data keys in memory, use a DW object, method, function or macro to drop the data as an application message into your messaging layer. The message can then be routed to your DW or BI environment for Transformation and Loading there; no Extraction is needed, and you can get rid of your ETL tools.
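
As a rough illustration, here is what dropping the data as an application message might look like with the IBM MQ JMS 2.0 classes. The host, port, queue manager, channel and queue names are placeholders you would replace with your own; treat this as a sketch of the pattern, not a drop-in implementation.

    import javax.jms.JMSContext;
    import javax.jms.Queue;
    import com.ibm.mq.jms.MQConnectionFactory;
    import com.ibm.msg.client.wmq.WMQConstants;

    public class DwEventPublisher {
        public static void send(String jsonPayload) throws Exception {
            // Connection details are placeholders; use your queue manager's values.
            MQConnectionFactory cf = new MQConnectionFactory();
            cf.setHostName("mqhost.example.com");
            cf.setPort(1414);
            cf.setQueueManager("QM1");
            cf.setChannel("DEV.APP.SVRCONN");
            cf.setTransportType(WMQConstants.WMQ_CM_CLIENT);

            // One send call alongside the core-system update: the "E" in ETL
            // disappears because the data never has to be re-extracted.
            try (JMSContext ctx = cf.createContext()) {
                Queue queue = ctx.createQueue("queue:///DW.EVENTS");
                ctx.createProducer().send(queue, jsonPayload);
            }
        }
    }

From there, Integration Bus (or any other consumer) can pick the message off the queue, transform it, and load it straight into the warehouse.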

Simplify your environments and lower the cost of operation. If you have multiple DW or BI environments, use the Pub/Sub capabilities of IBM MQ to distribute the message. You trade a nominal increase in CPU for the elimination of problems, headaches and costs across your DW or BI estate.
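
A rough sketch of that Pub/Sub variant, again using the JMS 2.0 API with placeholder names: the producer publishes once to a topic, and each DW or BI environment registers its own durable subscriber, so every environment receives its own copy of the message.

    import javax.jms.JMSConsumer;
    import javax.jms.JMSContext;
    import javax.jms.Message;
    import javax.jms.Topic;

    public class DwTopicExample {
        // Publisher: one publish reaches every subscribed environment.
        public static void publish(javax.jms.ConnectionFactory cf, String payload) {
            try (JMSContext ctx = cf.createContext()) {
                Topic topic = ctx.createTopic("topic://dw/order/events");
                ctx.createProducer().send(topic, payload);
            }
        }

        // Each DW or BI environment runs a durable subscriber under its own
        // client ID and subscription name (placeholders shown here).
        public static void consumeOne(javax.jms.ConnectionFactory cf) throws Exception {
            try (JMSContext ctx = cf.createContext()) {
                ctx.setClientID("bi-env-1"); // unique per environment
                Topic topic = ctx.createTopic("topic://dw/order/events");
                try (JMSConsumer consumer = ctx.createDurableConsumer(topic, "bi-env-1-sub")) {
                    Message msg = consumer.receive(5000); // wait up to 5 seconds
                    if (msg != null) {
                        String body = msg.getBody(String.class);
                        // Transform and Load 'body' into this environment's warehouse.
                    }
                }
            }
        }
    }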

Rethinking your strategy in terms of enterprise application integration (EAI), while removing the whole ETL process and its overhead, may indeed bring your business analytics to the near-realtime reporting and analytics you expected all along. Consider that your strategic payoff. Best regards in your architecture endeavors!
Image by Mark Morgan.