ELT vs. ETL data integration; What is the difference?

0

This is part of Solutions Review’s Premium Content Series, a collection of columns written by industry experts in mature software categories. In this post, Matillion Manager of Developer Relations Ian Funnell provides a contrast to various data integration technologies; ELT vs. ETL.

SR Premium ContentCloud data warehouses are one of the most popular PaaS solutions out there – 54 percent of companies are using these services, and another 15 percent plan to do so in the near future. Since widespread adoption began around 2013, cloud data warehouses have provided almost infinitely scalable computing power. With massive and ever-expanding amounts of data, it is necessary to make sense of everything through an integration process known as Extract, Transform, Load (ETL).

While ETL has been around for many years and is commonly associated with on-premises technology, the accelerated shift to the cloud has made another approach more prevalent – Extract, Load, Transform (ELT). Despite the increasing popularity of ELT in the cloud-driven world, many people across the technology industry still use the common acronym ETL when referring to ELT, despite fundamental differences between the two. While the letters in each acronym represent the same words, order is important and mixing the two could prove very costly to the organization. Both ETL and ELT offer unique advantages – and organizations need to understand how to differentiate between the two when deciding which to apply to individual use cases.

ELT vs. ETL; What is the difference?

ETL is the traditional method and involves extracting data from various sources, transforming the data on an ETL server, and then loading the transformed and integrated data into a data warehouse, ready for analysis and visualization. Usually running on a local server, ETL is a very computationally intensive process that requires a lot of data transfers, analysis, transformations, and searches.

The newer, cloud-based approach – ELT – involves the same activities, but instead of transforming the data in a separate ETL engine, the power of the cloud warehouse itself is used to transform the raw data loaded. ELT is also very well compute intensive, but the work takes place within the cloud data warehouse, an environment optimized for these workloads.

Although both serve similar purposes, the location where the transformation takes place is an important differentiator between the two approaches. While there’s no clear-cut answer as to what’s “better,” there are various scenarios where one would make more sense — and cost-effectively — than the other.

The cost of ETL and ELT

Performing sophisticated transformation and integration tasks on a large amount of data is an intensive process that can get expensive. ETL required two powerful architectural components: an ETL server and a cloud data warehouse. Because the environments running ETL software aren’t as scalable as cloud data warehouses are today, these traditional environments tend to reach their limits as data volumes grow and workloads become more complex. This can create bottlenecks in the data supply chain and adversely affect reporting and analysis. As a result, companies could miss opportunities by making business decisions with outdated, incomplete, or inaccurate data.

On the ELT front, transformations are performed in the cloud database, meaning that only one powerful architectural component required – the cloud data warehouse itself. ELT requires the same computing power as ETL, but the data is less copied from place to place. Providing the right amount of disk space and performance can be expensive, and without it, performance and queries will suffer. Cloud data platforms are less expensive than on-premises architectures, but these are still significant costs that decision makers need to keep in mind. ETL can also be more time-consuming, requiring additional writes at each step of the process, costing data teams valuable time.

How to determine which process to use

Understanding the differences between ETL and ELT is critical to ensure an organization is using the right approach to meet their needs. Ideally, the choice between ETL and ELT should be decided on a project-by-project basis. Below are some scenarios where one would be a better option over the other:

ETL:

  • When the data in question is predictable, comes from few sources, and transformations are minimal, ETL can be a more effective strategy.
  • ETL is better adapted to work with traditional databases than modern MPP platforms.
  • Due to the specialization of ETL computing platforms, ETL is more easily able to process unstructured source data.

ELT:

  • If the specific use case of the data is not fully understood, ELT leaves the freedom to transform it at a later date once the use case becomes clearer.
  • ELT is better tuned to work with modern MPP platforms.
  • ELT provides the ability to simultaneously present the same data in multiple different ways, such as Data Vault, 3rd Normal Form, and Star Schema. This can be useful to enable data sharing and self-service.

It is clear that both ETL and ELT can be beneficial to an organization. Organizations need to be curious about the differences between the two and understand the best use cases for each, especially in a climate of increasingly tight IT budgets and the pursuit of more efficient and effective workflows. As we look to the future of data, analytics and the cloud, it is important to consider these differences and the associated costs to ensure business success.

Ian Funell
Recent posts by Ian Funnell (See everything)
Share.

Comments are closed.