The ETL process (extract, transform, load) is a core part of the broader data integration process and the driving force behind much of today’s analytics and business intelligence technology. Think of ETL as a framework for moving data from one point to another, manipulating it along the way so that it can be used effectively. An ETL architecture is essentially a data pipeline, but one organized around the three specific ETL steps.
To understand what software can do within an ETL framework, let’s break down its three steps:
- Extract. Data extraction in the ETL process pulls data from an internal or external source. Any data source, from an Excel spreadsheet on a single computer to a massive company-wide database, can be the starting point for the ETL process, as long as it holds valuable data.
- Transform. ETL software takes the extracted data and repackages it into a form suitable for the target. This can be as small a matter as cleaning up unused records in a database entry or as large a process as converting the format of a data file entirely. Re-sorting or recontextualizing the data is also often done during the transformation step.
- Load. The target of the ETL process is called the data warehouse: whatever system the newly transformed data is loaded into. From there, the data can be analyzed or used as needed.
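The three steps above can be sketched as a minimal pipeline. This is an illustrative example only, not a production ETL tool: the CSV source, the field names (`trade_id`, `amount`, `currency`), and the use of an in-memory SQLite database as a stand-in "warehouse" are all assumptions made for the sketch.

```python
import csv
import io
import sqlite3

# Hypothetical source: a messy CSV export with stray whitespace,
# inconsistent currency codes, and a duplicate record.
RAW_CSV = """trade_id,amount,currency
1, 100.50 ,usd
2,250.00,EUR
1, 100.50 ,usd
"""

def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: clean whitespace, normalize currency codes,
    convert amounts to numbers, and drop duplicate records."""
    seen, clean = set(), []
    for row in rows:
        record = (
            int(row["trade_id"]),
            float(row["amount"].strip()),
            row["currency"].strip().upper(),
        )
        if record not in seen:  # eliminate redundancy before loading
            seen.add(record)
            clean.append(record)
    return clean

def load(records):
    """Load: insert the transformed records into the warehouse
    (an in-memory SQLite database stands in for one here)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE trades (trade_id INT, amount REAL, currency TEXT)")
    db.executemany("INSERT INTO trades VALUES (?, ?, ?)", records)
    return db

db = load(transform(extract(RAW_CSV)))
```

Note how each step hands a cleaner dataset to the next: three raw rows go in, but only two normalized records reach the warehouse.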
Building an ETL Architecture
Keep these principles in mind when designing a new ETL process from the ground up. Target the data to be extracted intelligently so that the transform step (arguably the most complex step in the average ETL process) does no more work than necessary. Eliminating redundancy and noise during extraction makes the transformation more efficient and consistent and, in turn, makes the load easier.
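One way to apply that principle is to push filters and column selection into the extraction query itself, rather than pulling everything and discarding it later. The sketch below assumes a hypothetical `orders` table in a SQLite source database; the table, columns, and region filter are illustrative, not from the original text.

```python
import sqlite3

# Hypothetical operational database with more data than the report needs.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE orders (id INT, region TEXT, total REAL, notes TEXT)")
src.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "EU", 120.0, "gift wrap"),
        (2, "US", 80.0, ""),
        (3, "EU", 45.0, "expedite"),
    ],
)

# Naive extraction pulls every column and row, leaving the transform
# step to discard data it never needed.
everything = src.execute("SELECT * FROM orders").fetchall()

# Targeted extraction pushes the filter and the column selection into
# the source query, so the transform step receives only relevant rows.
eu_totals = src.execute(
    "SELECT id, total FROM orders WHERE region = 'EU'"
).fetchall()
```

Here the targeted query hands the transform step two narrow rows instead of three wide ones; on a real company-wide database, that difference is what keeps the transform manageable.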
The blogs below highlight some common questions readers have about this process and provide insight from the experts at BroadPeak.