Data Warehouse ETL is a series of processes that:
- Extract data from across an organization
- Transform data to facilitate consumption
- Load data a relational or non-relational data store
- All for the purpose of facilitating business intelligence, reporting, analysis, data science, and other activities.
Let’s Unpack This:
It’s common business problem. Important data is siloed. As a result, it’s very hard to make meaningful business decisions. One is never certain that decisions are made on all the facts. For example, sales units are using multiple software systems. There are old legacy systems. Some employees are using Excel to keep important data. Data Warehouse ETL is all about getting this data and cleaning it up.
How do we get all the data ?
Adaptors “extract data.” For example, in K3, one adaptor is a “CSV Adaptor.” This adaptor points K3 to a folder such that any CSV dropped into that folder will automatically be ingested into K3. Modern Data Warehouse ETL applications use a wide variety of adaptors to accommodate nearly every data scenario.
How is Data Transformed in a Data Warehouse ?
But, getting all the data together is only part of the problem. All the data looks different. For example, one software system calls customers “Cust.” And another system calls customers “CS.” If we wanted to run a report of our customers we have to align these to be the same. Thus, the transformation part of ETL might take both of these and “harmonize” them under “Customer.” This is but one teeny tiny sliver of an enormous data transformation challenge. Data analysts spend 80% of their time wrangling data. In K3, we have abstracted this into a low-code environment (we put it in a really good user interface).
How is data loaded ?
Once the data is harmonized, canonical, prettied and just generally cleaned up the data is loaded into a SQL database or NO-SQL database. SQL is like Oracle, Postgres, MSSQL, etc. NO-SQL is like Snowflake, Redshift, Mongo, etc. There are trade-offs… It’s a long discussion. Modern ETL apps support a wide variety of load adaptors.
But if we’ve done our job right, Data Warehouse ETL makes decision support more meaningful.