Reverse ETL – Definition & analysis of this new category of tools

ETL (or ELT) solutions allow you to extract data from different applications and put it into a data warehouse. Reverse ETL processes, on the other hand, allow you to extract data from the data warehouse to feed all sorts of applications: CRM, advertising tools, customer service, etc.

The potential is enormous; reverse ETLs allow you to have a single source of truth for most business applications, which means no more recurring problems reconciling data from tool A to tool B, or managing flows between different applications.

Why is this type of solution emerging now if the potential is so significant?

Historically, the data warehouse is only the foundation of BI (Business Intelligence). It is used to build reports and large ad hoc queries that are not critical.

If you asked a CIO in the 2000s, it would be an aberration to supply a CRM, a critical application that uses hot data, from a data warehouse.

The new generation of cloud data warehouses (Snowflake, Google BigQuery, AWS Redshift, ..), and the ecosystem that goes around it, change the rules of the game.

The modern cloud data warehouse can become a complete operational repository because it’s much more powerful, easier to maintain, and adapted for all types of queries.

But reverse ETLs are the missing link to make it all happen.

This comprehensive guide will explain everything you need to know about this new element of the modern data stack.

What is reverse ETL? [Definition]

Reverse ETL genealogy: in the beginning was ETL

Reverse ETL is a new family of software that already plays a key role in the modern data stack. But to understand what a reverse ETL is, you first need to understand what’s an ETL because reverse ETL is a result of ETL.

The concept of ETL emerged in the 1970s.

google trends etl world
Source : Google Trends.

ETL stands for Extract, Transform & Load. Before assigning a family of tools, ETL designates a process that tools of the same name can accomplish.

ETL is the process of Extracting data from the organization’s different data sources, Transforming it, and finally Loading it into a Data Warehouse.

ETL tools are used to build the data pipeline between the data sources and the database in which the data is centralized and unified.

The data sources can be:

  • events from your applications,
  • data from your SaaS tools,
  • your various databases,
  • and even from your data lake.

ETL tools develop connectors with the primary data sources to build the data pipeline.

fivetran connecteurs
Fivetran offers over 150 connectors to data sources.

In the past, ETLs were heavy On-Premise solutions, running on heavy Data Warehouses installed on the company’s servers.

But with the advent of Cloud Data Warehouses (in 2012, with Amazon Redshift), a new category of ETL software has emerged: Cloud ETLs.

The cloudification of data warehouses, ushered in by Amazon, has led to a cloudification of ETL tools.

The two emblematic examples of Cloud ETL tools are Fivetran and Stitch Data.

Besides loading data into the data warehouse (or DWH – the destination), ETLs are also used to transform it before integrating it into the database. So it’s not just a data pipeline, but also a laboratory.

We can now understand what’s reverse ETL.

Reverse ETL is a solution for synchronizing DWH data with your business applications

In short, the ETL tool allows you to bring data from your different sources into the DWH to centralize and unify the company’s data. This data is then used to perform data analysis: Business Intelligence.

Reverse ETL has an inverted function of ETL, it is the technological solution that allows you to transfer centralized data from the data warehouse to business applications.

Reverse ETL finally solves a nagging problem encountered by companies.

They normally manage to centralize data in the data warehouse smoothly due to Cloud ETL, but once this data is in the DWH, it’s difficult to get it out of the database and use it in business tools.

Though ETLs are used for BI, they’re rarely used to feed business applications in the absence of simple synchronization solutions; this is where Reverse ETL comes into play.

 

Reverse ETL is a flexible data integration solution for synchronizing DWH data with applications used by marketing, sales, digital team, and customer service, to name a few.

Like Cloud ETL tools, flexibility and ease of use characterize reverse ETLs. Data is processed, transformed, mapped, and synchronized in the business applications using connectors and modulo SQL work.

Without using SQL queries, reverse ETLs allow you to edit the data flows from a visual interface. You choose the database column or table you want to use and create the mapping from the visual interface to specify where you want the data to appear in Salesforce, Zendesk, and so on. No more scripts or APIs.

Once the flow is set up, the data is synchronized in the applications, not in real-time, but in very short batches of about a minute.

Reverse ETLs, such as Octolis, are based on an approach known as “tabular data streaming” instead of the “event streaming” approach. What reverse ETL does is copy and paste tables from the source system (the DWH) into the target system (the business application) at very regular intervals.

Like ETL tools, reverse ETLs are not only data pipelines. They allow you to transform the DWH data and prepare it – i.e., clean the data, create segments, audiences, scorings, and build a unique customer repository.

So why are Reverse ETL solutions so popular today?

Now that we know what Reverse ETL is and how it works schematically, let’s dive deeper into the “why.”

Why do we want to get data out of the DWH?

It took years for companies to centralize and unify their data in a master base: the Data Warehouse Cloud.

Yet many companies are not there and still don’t have a single repository.

But why would you want to get the data that you have carefully centralized in the data warehouse out?

First of all, it is essential to remember that the data remains in the data warehouse in any case. Reverse ETL synchronizes data sets in business applications without moving them. Synchronizing does not mean migrating.

What reverse ETL does is to put this centralized DWH data at the service of business applications.

It is well known that medicine is both a cure and poison. Until now, DWH has been used as a remedy for data silos. But in many companies today, data is siloed in the data warehouse.

Without reverse ETL, the data stored in the DWH is slightly used or not used at all by the business applications.

What is the data used for? As we mentioned above, to do BI and dashboarding.

Thanks to all the work done with SQL, DWH leads to the creation of interesting definitions and aggregates of data: lifetime value, marketing qualified lead, product qualified lead, heat score, ARR, etc. But this business-relevant data is not used directly by the business teams and the tools they use.

With reverse ETL, you can use these definitions, and the associated columns in the DWH, to create customer profiles and audience segments.

stack data moderne
Modern Stack Data with the Cloud Data Warehouse at the core of the system.

With reverse ETL, the data warehouse is no longer just used to feed BI; it feeds business applications directly.

The reverse ETL was the missing piece of the data stack; the piece that prevented this data stack from being truly modern.

What are the use cases of a reverse ETL?

Let’s look closely at the use cases that the reverse ETL tool makes possible.

There are essentially three types of use cases:

#1 Operational Analytics

This new term refers to a new way of looking at analytics.

In the Operational Analytics approach, data is not only used to create reports and analyses but is also smartly distributed to business tools. It is the art to make data operational for business teams by integrating it into their daily tools.

If you think about it, this approach allows you and your teams to become data-driven in all decisions and actions. It’s smooth, easy, headache-free, and doesn’t involve reading indigestible BI reports.

How do you deploy this “Operational Analytics” approach? And how to become data-driven? By using reverse ETL, of course!

Reverse ETL allows you to transform data into the analysis (segments, aggregates) and analysis into action.

Imagine a salesperson who wants to know the key accounts, those on which to focus his efforts?

In the classic, old-fashioned approach, we call on a data analyst who will use SQL to identify high-value leads in the DWH and then display it all in a nice BI table…that no one will read or use.

You can train salespeople to read dashboards and reports, but in practice, it’s always complicated, which holds back many organizations from becoming data-driven. This difficulty in making data and analysis available to business teams prevents the full exploitation of the data available to the company.

With the Operational Analytics approach, there’s no need to train salespeople to use BI reports; the data analyst directly integrates the corresponding data from the data warehouse into a Salesforce custom field.

Reverse ETL allows a data analyst to deploy Operational Analytics as easily as creating a report.

#2 Data flow automation

Reverse ETL allows you to quickly and automatically provide business teams with the data they need at a specific time. Not only does it provide business teams with the data they need in their tools, but it also facilitates the work of data analysts and other data engineers.

For example, if your sales team asks IT which customers are at high risk of churn, reverse ETL makes it easy to answer without spending excessive time extracting data from the DWH.

We could also take the examples:

  • A salesperson who wants to visualize the customers with a lifetime value higher than X€ in Salesforce
  • A customer advisor who wants to see the accounts that have opted for premium support in Zendesk
  • A product manager who wants to access Slack feedback from users who have deployed a particular feature
  • An accountant who wants to synchronize customer attributes in his accounting software.
  • And so on.
Reverse ETL’s use cases.

Reverse ETL allows you to easily and automatically manage these everyday business requests that used to be the hell for the IT team.

In this sense, it addresses a recurring concern in organizations: communication, or rather miscommunication, between IT and business teams.

Harmony between IT and the business is restored without designing APIs.

#3 Reverse ETL, a solution to the increasing number of data sources

One of the modern data stack challenges is to manage the proliferation of data sources. But reverse ETL allows you to take advantage of this formidable data gold mine to create a memorable customer experience.

It serves both purposes:

  • For the customer: Offer them a richer and more relevant experience thanks to more personalized actions (targeted content, distribution channel, and production time). It generates more customer satisfaction.
  • For the company: Increase customer retention and revenue per customer.

Reverse ETL enables the transformation of customer knowledge produced by DWH and BI into an enriched experience for the customer.

Two alternatives to reverse ETL software: Customer Data Platform & iPaaS

There are alternatives to reverse ETL software; our article would not be complete without mentioning them.

Reverse ETL vs. CDP

Customer Data Platforms have been gaining momentum since the mid-2010s. A CDP is an off-the-shelf platform that allows you to build a single customer repository by connecting all of the organization’s data sources. As such, CDP is an alternative to the data warehouse.

The advantage over the data warehouse is that CDP is not just a database for BI purposes; the CDP offers advanced functionalities to:

  • Prepare data for business use cases: segmentation, creation of aggregates, scores, etc.
  • Redistribute it, via native or tailor-made connectors, to business applications.

In short, CDP plays the same role as DWH and reverse ETL. In fact, you don’t necessarily have to choose between CDP and DWH. The same company can indeed combine:

  • A Data Warehouse that will be used for BI.
  • A Customer Data Platform that will enable customer data to be activated and made available to business teams.

Compared to the Data Warehouse – reverse ETL combination, the Customer Data Platform is characterized by:

  • Greater rigidity. The CDP imposes its data models and limits the creation of customized models.
  • CDP is a costly solution, inaccessible to most small and medium-sized businesses.
  • CDP does not promote communication between IT and business teams. CDP is designed for business teams and especially for marketing.

The publishers’ objective is to make the business teams autonomous from IT. But in our opinion, the challenge is to make communication between IT and the business more fluid; not to destroy it.

To deploy complex data use cases, IT has a role to play.

That’s why we prefer the approach of combining the data warehouse with a reverse ETL tool. It offers more flexibility. In short, reverse ETL transforms your data warehouse into a Customer Data Platform.

Reverse ETL vs. iPaaS

An iPaaS is an integration solution in SaaS mode: Integration Platform as a Service. Integromat is probably the most iconic iPaaS solution on the market today.

iPaaS solutions generally offer easy-to-use, visual interfaces that connect applications and data sources.

The way it works is similar to reverse ETL: You select a source, select a destination tool, and edit the mapping to define where the data from the source will fit into the destination tool (the location and the “how”).

The example below shows the design of a mapping between emails and Google Spreadsheet:

Integromat Email Integration GSheets
Integromat – Email Integration – GSheets.

There is no need for APIs, scripts, or SQL, so iPaaS solutions are popular with non-technical people.

An iPaaS allows you to create 1:1 data flows directly between the sources and the destination without going through the data warehouse.

For this reason, iPaaS can be used by companies with limited data integration needs. But it’s not the preferred option for companies that want to build an IT infrastructure around a database that acts as a hub.

Conclusion

The most advanced companies in terms of data already use reverse ETL. And it’s destined to become the norm in companies that wish to exploit their data better. It is a solution that allows you to better use the data stored in the data warehouse.
We will come back in more detail on the issues surrounding this essential data brick.