Focus on Segment’s limitations and the best alternatives

While Segment is a powerful and relevant DMP and/or CDP solution, it is not the most appropriate for all business models.

The reason? Prices climb quickly, especially for B2C players, and the lack of an independent database combined with the rigidity of the data model limits your ability to strengthen your business intelligence.

Why are alternatives on the rise? The emergence of the modern data stack, with the crucial “single source of truth” role that your cloud data warehouse now plays, is an excellent opportunity to evolve toward an infrastructure for managing your customers’ data that is leaner, more flexible thanks to an independent database, and significantly less expensive.

Hesitating over Segment? Looking for alternatives? We have prepared a resource for you reviewing the alternatives to Segment’s main modules: Connections, Personas, and Protocols.


Access our comparison of the best alternatives to Segment

To directly access the comparison of the best alternatives to Segment, we invite you to click on the button above.

What is Segment?

From a web-tracking tool to a market-leading CDP

Founded in 2011, Segment was initially a SaaS web tracking tool that allowed companies to track all the events occurring on their website, link them to a user ID, and store all weblogs in a data warehouse. With a B2B, mid-market positioning (SMEs and mid-sized companies), Segment was one of the first tools to democratize the extraction and storage of weblogs for BI purposes and customer experience personalization.

Gradually, Segment broadened its functional spectrum. The platform developed its integration capabilities with the company’s other data sources and tools. From a web tracking tool, Segment became a platform for managing CRM, marketing, sales, and customer service data… In short, Segment became a Customer Data Platform capable of connecting, unifying, and activating all of a company’s (essentially first-party) customer data.

Let’s go even further: Segment is one of the leading players in the CDP market. In 2020, Segment generated $144 million in revenue and was acquired by Twilio for a whopping $3.2 billion. The start-up has become a giant and has more than 20,000 clients, including IBM, GAP, Atlassian, and Time magazine.

Source: Get Latka

Discovering the functional scope of Segment

Segment essentially allows you to (1) connect the company’s different customer data sources, (2) build a single customer view and audiences, and (3) monitor the quality and integrity of data. These are the three main modules offered by the platform: Connections, Personas, and Protocols.

#1 Connecting data [Connections]

“Connecting” a data source to a Customer Data Platform such as Segment involves generating events related to the behavior of visitors to the website or web application. Segment turns web behaviors into events, and events into actionable data.

To set up connections, Segment offers a library of APIs but also, and this is its strength, a vast library of native connectors.

Segment offers an extensive library of native connectors (300+).

In addition to the impressive library of sources and destinations available, Segment handles very well:

  • The transformation of events. Some data types need to be transformed before being injected into other destination tools. The “Functions” module helps process basic event transformations before sending them to external applications with “only ten lines of JavaScript”. Segment also offers a no-code data transformation and enrichment feature, only available as part of its Business offering.
  • Data synchronization in the data warehouse. Segment supports the leading data warehouse solutions: Redshift, BigQuery, Postgres, Snowflake, and IBM DB2. However, sync frequency is limited to one or two syncs per day on the Free and Team plans. The interval can be much shorter, but you will have to upgrade to the Business plan, which is much more expensive. (See the sketch after this list for what synced data looks like in the warehouse.)
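
To make this concrete, here is a minimal sketch of the kind of query you can run once Segment has synced events into the warehouse. It assumes Segment’s documented warehouse layout, where each source gets its own schema (here the hypothetical my_site) containing a tracks table with one row per track() call; the date arithmetic is Redshift/Postgres-flavored, so check your own warehouse’s dialect.

-- Count each identified user's events over the last 30 days,
-- using the "tracks" table created by Segment's warehouse sync.
SELECT
    user_id,
    event,
    COUNT(*) AS event_count
FROM my_site.tracks                          -- schema named after your Segment source (hypothetical)
WHERE received_at >= CURRENT_DATE - INTERVAL '30 days'
  AND user_id IS NOT NULL                    -- ignore anonymous traffic
GROUP BY user_id, event
ORDER BY event_count DESC;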

Connecting to data sources is the most technical step in Segment. It requires the Tech/Data team’s involvement.

#2 The 360 Customer View and Building Segments [Personas]

Once connected, data can be unified around a single customer ID. Segment offers a module (called “Personas”) that allows you to view all the data related to a particular customer and to access the famous “single customer view” or “360 customer view”. Customer data can then be used to build segments, i.e., lists of contacts sharing defined criteria (socio-demographic, behavioral, etc.). The audience segments can then be activated in the destination tools: MarTech and AdTech.

Segment’s “Personas” module is user-friendly and can be used by business teams in complete autonomy.


Good to know

As with the vast majority of Segment’s advanced features, Personas is only available in the Business plan.

#3 Data Quality Management [Protocols]

The third key module of the Segment platform is called “Protocols” and is used to monitor data quality and integrity. It should be noted that many “Best of Breed” technological solutions, such as Metaplane or Telm.ai, offer advanced Data Quality functionalities. In other tools, like Octolis, Data Quality functions are native, which means you do not need to invest in a third-party solution or module to manage the quality of your data.

Discover the alternatives to Segment

You now know the three main modules of Segment. For each of these modules, we offer you the best alternatives.


Access our comparison of the best alternatives to Segment

The main disadvantages of Segment

We have presented Segment, its history, and its features. Unquestionably, Segment is a good tool; it would be absurd to question that. But Segment has several limitations to which we would like to draw your attention in this second part.

There are two main limitations: rapidly rising prices and a lack of data control.

Limitation #1 – Segment’s prices are increasing rapidly

Segment’s pricing is based on the number of visitors tracked per month (MTUs: monthly tracked users) across the different sources (website, mobile application, etc.). This pricing model suits companies that generate significant revenue per user and have very active users. Beyond an average of 250 events per user per month, you must switch to Segment’s “Business” plan, with personalized pricing (on quote).

If you plan to use Segment as your Customer Data Platform, you will quickly reach a budget of $100,000 per year, especially if you are a B2C company. In B2C, the number of events, segments, and properties is always higher than in B2B.

Segment has not been able to adapt its offer to suit the needs and constraints of companies wishing to use the platform to deploy CDP use cases.

The three plans offered by Segment

Let’s take two examples:

  • You have a website that totals 100,000 unique visitors with three page views per month on average per visitor. The monthly subscription for 100,000 tracked visitors is around $1000 per month.
  • Let’s imagine your site generates around 8,000 MTUs with an average of 200 events per MTU. In this case, Segment will cost you around $120 per month, because you stay under the Team plan’s 10,000 MTU limit.

Limitation #2 – Segment does not give you complete control over your data

All logs are stored on Segment’s servers. You can send all the logs to your data warehouse if you have one, but you must pay a supplement. In our opinion, this is one of the main disadvantages of a solution like Segment.

Because of, or thanks to, the tightening of personal data protection law (the GDPR in particular), first-party data should be stored by the company in its own data warehouse, not in the various software and SaaS services. This is the best way to keep complete control over your data.

The fact that the logs are stored in Segment also poses another problem: you are forced to comply with a data model that is not necessarily adapted to your company. Segment offers a data model limited to two objects: users and accounts, and in most cases, a user can belong to only one account.

In which cases can Segment remain a good choice?

In some instances, Segment can remain a relevant choice despite the limitations we have just outlined. To simplify, companies that meet the following criteria may find it worthwhile:

  • You are a B2B company with few users/customers.
  • You have a small IT/Data team.
  • The volume of events is low or medium.
  • Money is not a problem for your business.
  • You want to deploy standard use cases.

Beyond a certain level of maturity and development of your use cases, you will have more advanced needs in terms of tracking and aggregates. This means you will have to activate the “Personas” module presented above. Be aware that this additional module is charged extra… and is very expensive. At that point, you will face an alternative: stay on Segment and be ready to pay €100k per year… or change architecture and opt for a modern data stack.

Modern Data Stack offers more and more alternatives to Segment

Let’s repeat once again that Segment is undoubtedly a very good tool; that is not the issue. Nonetheless, we believe it belongs to a family of tools (off-the-shelf CDPs) that is already outdated.

The limits of off-the-shelf CDPs

Off-the-shelf Customer Data Platforms had their heyday in the late 2010s. For some time now, new approaches to collecting, unifying, and transforming customer data have emerged. We’ll walk you through the modern approach in a moment, but first, here are the main limitations of the off-the-shelf Customer Data Platforms of which Segment is a part:

#1 CDPs are no longer the single source of truth

As we have seen, the course of history is clear: data is increasingly stored and unified in cloud data warehouses like BigQuery, Snowflake, or Redshift. The data warehouse (DWH) centralizes ALL the data used for reporting and BI, unlike Customer Data Platforms, which only contain the data generated via connected sources: essentially customer data in the broad sense.

#2 CDPs tend to generate data silos

This happens for two main reasons. First, CDPs are designed first and foremost for marketing teams. Vendors highlight this as a feature… except that it doesn’t offer only advantages. Why? Because it leads the marketing teams on one side and the data teams on the other to each work in their own corner, on different tools. We end up with two sources of truth:

  • The Customer Data Platform for the marketing team.
  • The data warehouse or data lake for the IT team.

A CDP makes the marketing team independent from IT, but it encourages the compartmentalization of the two functions and their misalignment.

On the contrary, we are convinced that the marketing and IT/Data teams must work hand in hand.

#3 Standard CDPs have limited data preparation & transformation capabilities

Conventional Customer Data Platforms have limited data transformation capabilities. This problem echoes the issue of data models: transformations are only possible within the framework of imposed data models.

The lack of flexibility in the data models offered (or imposed…) by the CDP leads to organizing data in a way that does not always make sense from a business point of view.

#4 Lack of data control

We have already highlighted this problem. Storing all the data in your CDP poses privacy and security issues. It has become more and more essential to store data outside the software, in an autonomous database managed by the company itself. This brings us to the next point.

What is the purpose of data control?
Data control is not “nice to have”; it’s a must-have. Find out why it’s essential to stay in control of your data.

The Rise of Cloud Data Warehouses

A lot has changed in a decade in collecting, extracting, circulating, storing, preparing, transforming, redistributing, and activating data. The most significant development is that modern cloud data warehouses now play a central role. The DWH becomes the pivot of the information system, the center of the IT architecture around which all the other tools gravitate.

Amazon played a decisive role in this revolution with the launch of Redshift in 2012. The collapse of storage costs and the exponential increase in machines’ computing power changed the game and democratized data warehouses. Today, a small business with limited needs can use Redshift for a few hundred dollars a month. For reference, the annual license for a classic “On-Premise” data warehouse easily reaches €100k…

Typical diagram of a modern data stack, with the cloud data warehouse as the pivot.

Cloud data warehouses have become the new standard for most organizations. They are used to store all company data, customer data included: everything can be centralized and organized there.

Understanding the role of Reverse ETL

Cloud data warehouse solutions have experienced significant growth since 2012. The GAFAMs have almost all entered this market: Google developed BigQuery, Microsoft launched Azure, etc. We have also seen the emergence of pure players like Snowflake, which is experiencing spectacular growth.

Source: Get Latka

But one functional building block was missing: a way to synchronize warehouse data into activation software, so that it would not be used only for reporting. A new family of tools appeared at the end of the 2010s to fulfill this function: Reverse ETLs.

A Reverse ETL synchronizes data from the DWH into the operational tools: Ads, CRM, support, Marketing Automation… It therefore does the opposite of an ETL, which is used to send data up into the data warehouse. Hence the name “Reverse ETL”. With a Reverse ETL:

  • You control your data because it remains in your data warehouse: the Reverse ETL is a synchronization tool. Your data never leaves the DWH.
  • You can create custom data models, far from being limited to the two objects offered by Segment (users and accounts); see the sketch after this list.
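
To illustrate, here is a minimal sketch, with hypothetical table names, of a model that Segment’s two-object schema cannot express: users belonging to several accounts through a junction table, defined in plain SQL in your warehouse.

-- A custom model your warehouse can hold but Segment's
-- users/accounts model cannot: many-to-many memberships.
CREATE TABLE users (
    user_id    BIGINT PRIMARY KEY,
    email      VARCHAR(255) NOT NULL
);

CREATE TABLE accounts (
    account_id BIGINT PRIMARY KEY,
    name       VARCHAR(255) NOT NULL
);

CREATE TABLE user_accounts (
    user_id    BIGINT REFERENCES users (user_id),
    account_id BIGINT REFERENCES accounts (account_id),
    role       VARCHAR(50),                -- e.g. 'admin' or 'member'
    PRIMARY KEY (user_id, account_id)
);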

Modern data warehouses and Reverse ETLs draw a new architecture: the modern data stack. With these two technologies combined, your data warehouse becomes your CDP. This architecture makes it possible to implement the “Operational Analytics” approach which, in a nutshell, consists of putting data at the service of business operations and no longer solely at the service of analytics.

Discovering the Modern Data Stack

The modern data stack is an architecture that makes the data warehouse the single source of truth of the information system and uses a Reverse ETL to activate DWH data in operational software. Check out our complete guide to the Modern Data Stack.

Access our comparison of the best alternatives to Segment

To access the resource, simply click on the button below.
Once in our dedicated space, you will discover other in-depth resources; the most complete ones require a quick registration, but they are all free! With a bit of luck, you will come across other resources that will be useful to you 😊

👉 Go directly to the resource

Definition and analysis of the Modern Data Stack

A Data Engineer cryopreserved in 2010 and woken up today would no longer understand much about the modern data stack.

Remember, in only a few years, the way of collecting, extracting, transporting, storing, preparing, transforming, redistributing, and activating data has completely changed.

The world has changed, and the opportunities to generate business through data have never been greater.

What does the modern data stack look like?

We can start with a macro diagram.


The most striking development is the centralized place occupied (gradually) by the Cloud Data Warehouse, which has become the pivotal system of the IT infrastructure. From this flow all other notable transformations:

  • The exponential increase in computing power and the collapse of storage costs.
  • The replacement of traditional ETL tools by Cloud EL(T) solutions.
  • The development of “self-service” Cloud BI solutions.
  • The recent emergence of Reverse ETLs, which send data from the Cloud Data Warehouse down into business tools, finally putting the data stack at the service of the marketing stack.
Source: Snowplow Analytics.

Let’s get to the heart of the matter.

We’ll introduce you to the outlines of the modern data stack from two angles:

  • The historical angle: what led to the emergence of the modern data stack?
  • The geographic/topographic angle: we’ll review the different building blocks that make up this modern data stack.

🌱 The changes behind the modern data stack

The modern data stack designates the set of tools and databases used to manage the data that feeds business applications.

The data stack’s architecture has undergone profound transformations in recent years, marked by:

  • The rise of Cloud Data Warehouses (DWH), which are gradually becoming the main source of data. The DWH is destined to become the pivot of the data stack, and we’ll have the opportunity to talk about it at length in our blog posts. If you still believe in the off-the-shelf Customer Data Platform, abandon all hope.
  • The switch from ETL (Extract-Transform-Load) to EL(T): Extract – Load – (Transform). “ETL” designates both a process and the tools that run it (ETL software). In a modern data stack, data is loaded into the master database before being transformed, via cloud EL(T) solutions that are lighter than traditional ETL tools.
  • The growing use of self-service analytics solutions (like Tableau) to do BI, generate reports, and other data visualizations.

The rise of Cloud Data Warehouses (DWH)

Data Warehouse technology is as old as the world, or almost. In any case, it’s not a new word. And yet, we’ve seen a dramatic transformation of the Data Warehousing landscape over the past decade.

Traditional DWH solutions are gradually giving way to cloud solutions: the Cloud Data Warehouse. We can date this evolution precisely: October 2012, when Amazon brought its Cloud DWH solution, Redshift, to market. There is a clear before and after, even if Redshift is losing ground today.

The main impetus that birthed the modern data stack came from Amazon, with Redshift. All the other solutions on the market that followed owe a debt to the American giant: Google BigQuery, Snowflake, and a few others. This development is linked to the difference between MPP (Massively parallel processing) or OLAP systems like Redshift and OLTP systems like PostgreSQL. But this discussion deserves a whole article that we’ll probably produce one day.

In short, Redshift can process SQL queries and perform joins on huge data volumes 10 to 10,000 times faster than OLTP databases.

But note that Redshift is not the first MPP database. The first ones appeared a decade earlier. Redshift, on the other hand, is:

  • The first cloud-based MPP database solution.
  • The first MPP database solution that’s financially accessible to all companies. A small business with limited needs can use Redshift for a few hundred euros per month, whereas classic On-Premise solutions easily cost close to €100k per year in licenses.

In recent years, BigQuery and especially Snowflake have risen in power. These two solutions now have the best offers on the market, both in terms of price and computing power. Special mention for Snowflake, which offers a very interesting pricing model since storage billing is independent of computing billing.

But let’s give Caesar his due (Caesar being Redshift here) and remember these few figures:

  • Redshift was launched in 2012.
  • BigQuery, Google’s Cloud DWH solution, only integrated the SQL standard in 2016.
  • Snowflake only became mature in 2017-2018.

What changes with Cloud Data Warehouse?

The advent of Redshift, and of the other Cloud Data Warehouse solutions that followed, has brought improvements on several levels:

  • Speed. This is what we have just seen. A Cloud DWH can significantly reduce the processing time of SQL queries. The slowness of calculations was the main obstacle to the massive exploitation of data. Redshift broke many barriers.
  • Connectivity. The Cloud makes it much easier to connect data sources to the Data Warehouse. More generally, a Cloud DWH manages far more formats & data sources than a traditional data warehouse installed on company servers (On-Premise).
  • User access. In a classic, “heavy” Data Warehouse installed on the company’s servers, the number of users is deliberately limited to reduce the number of requests and save server resources. This classic technological option, therefore, has repercussions at the organizational level:
    • DWH On-Premise: Managed by a central team. Restricted/indirect access for end-users.
    • Cloud DWH: Accessible and usable by all target users. Virtual servers allow the launching of simultaneous SQL queries on the same database.
  • Flexibility & Scalability. Cloud Data Warehouse solutions are much more affordable than traditional On-Premise solutions (such as Informatica or Oracle). They are also and above all far more flexible, with pricing models based on the volume of data stored and/or the computing resources consumed. In this sense, the advent of Cloud Data Warehouses has made it possible to democratize access to this type of solution. While classic DWHs were cumbersome solutions accessible only to large companies, Cloud DWHs are lightweight, flexible solutions accessible to a very small business/startup.

The transition from ETL solutions to EL(T)

Extract-Transform-Load = ETL while Extract-Load-(Transform) = EL(T).

Listing these acronyms makes the difference quite easy to understand:

  • When using an ETL process (and the ETL tools that allow this process to operate), the data is transformed before loading into the target database: the Data Warehouse.
  • When you use an EL(T) process, you start by loading all the structured or semi-structured data into the master database (DWH) before considering the transformations.

What is at stake in such an inversion? It’s quite simple.

Transformations consist of:

  • adapting the format of data to the target database,
  • cleaning,
  • deduplicating,
  • carrying out some processing on source data to adapt it to the design of the Data Warehouse and avoid cluttering it too much.

That is the challenge.

Transforming before loading helps filter out part of the data and therefore avoids overloading the master database.

That is why all traditional Data Warehouse solutions worked with heavy ETL solutions: with limited storage capacities, it was vital to sort before loading into the DWH.

With Cloud Data Warehouses, storage cost has become a commodity, and computing power has increased dramatically.

Result: No need to transform before loading.

The DWH On-Premise – ETL On-Premise combination gradually gives way to the modern Cloud DWH – EL(T) Cloud combination.

Loading data into the Data Warehouse before any transformation means you no longer have to settle strategic and business questions at the very moment you extract and integrate the data.

The cost of pipeline management is considerably reduced; we can afford to load everything into the DWH without overthinking it, and thus we do not deprive ourselves of future use cases for the data.
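
As a minimal sketch of what this looks like in practice, assume a hypothetical raw.orders table loaded as-is by the EL pipeline, duplicates included; the cleaning then happens inside the warehouse, in plain SQL:

-- EL(T) in practice: the pipeline loads raw.orders untouched,
-- then the warehouse itself produces a clean, typed table.
CREATE TABLE analytics.orders AS
SELECT
    order_id,
    customer_id,
    LOWER(TRIM(customer_email)) AS customer_email,   -- cleaning
    CAST(amount AS DECIMAL(12, 2)) AS amount,        -- typing
    ordered_at
FROM (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY order_id                 -- deduplication:
               ORDER BY loaded_at DESC               -- keep the latest copy
           ) AS rn
    FROM raw.orders
) AS deduplicated
WHERE rn = 1;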

The trend toward self-service Analytics

We have talked about the Cloud Data Warehouse, which is becoming the backbone of the modern data stack. Upstream, we have the EL(T) tools that connect the multiple data systems to the data warehouse. Cloud Data Warehouse data is then used for BI, data analysis, dashboarding, and reporting.

The advent of Cloud DWH has contributed to “cloudify” not only integration solutions (ETL/ELT) but also BI solutions.

Today, dozens of cloud-based BI solution vendors on the market are affordable and designed for business users. These easy-to-use solutions offer native connectors with the main Cloud Data Warehouses on the market.

Power BI, Looker, or Tableau are reference Cloud BI solutions:

Source: Medium.

  • A solution like Tableau allows you to connect all data sources in a few clicks and create tailor-made reports based on simplified data models.
  • A BI solution allows overall performance management based on omnichannel attribution models, unlike the reporting modules offered by business applications or web analytics solutions (Google Analytics, etc.).
  • A tool like Looker, connected to the Data Warehouse, disrupts data analysis. BI is one of the main use cases of a Data Warehouse. With the Cloud DWH’s advent, the development of SaaS BI solutions was inevitable. And it happened.

Cloud Data Warehouse, EL(T), and “self-service” analytics solutions: closely linked, these three families of tools are the cornerstones of a modern data stack.

🔎 A closer look at the modern data stack’s building blocks

We will now review in more detail the main building blocks that make up the modern data stack, starting from the diagram presented in the introduction.

Typical diagram of a modern data stack

We appreciate this modern data stack diagram proposed by a16z.

Source: a16z.

From left to right, we find:

  • The data sources: all systems, bases, and tools providing data, whether internal or external (enrichment solutions, etc.). With the development of digital channels, we see an explosion not only of data volumes but also of data sources, and therefore of formats and data structures. This effervescence represents both an enormous potential and a great challenge.
  • Data ingestion and/or transformation solutions. Here we find all the technologies for carrying out the Extract – Load and possibly Transform process: EL(T). In other words, the solutions that route data (with or without transformations) from the sources into the master database(s).
  • The master database(s) for storing data. There are two families of solutions here: Cloud Data Warehouses and Data Lakes. The DWH stores structured or semi-structured data, while the Data Lake can store any data type.

The Data Lake is a bathtub into which all data is poured in bulk, in its raw state, without any transformation or processing. It is a Data Scientists’ tool, used for very advanced data use cases such as Machine Learning.

The Data Warehouse remains a “warehouse” organizing data in a structured way, even if its capacity to integrate semi-structured data is clearly increasing. The development of these capabilities, moreover, makes the DWH increasingly pivotal, unlike the “pure” Data Lake, which increasingly plays a secondary role. We shall return to this.

  • Data preparation and processing tools. We have seen that the Cloud Data Warehouse tends to become the reference tool for transforming data via SQL. Many solutions can support the DWH in this data transformation process for BI or business uses. Preparation and transformation tools make up the widest and most heterogeneous family of data solutions.
  • BI tools and activation tools, which are the destination tools for Cloud Data Warehouse data. The DWH, originally used for BI, is increasingly used to feed business applications in near real time. This is where Reverse ETLs like Octolis come in. We will introduce the role of Reverse ETLs in the modern data stack in a few moments.

Let’s now review each of these building blocks of the modern data stack.

The Cloud Data Warehouse

The Cloud DWH is the foundation of the modern data stack, the pivotal solution around which all other tools revolve.

It stores the company’s structured and semi-structured data, and it is not just a database. It’s also a data laboratory: a place for preparing and transforming data via one main tool, SQL, even if Python is increasingly used (but that is another subject).

Caption: Medium, May 2020. Redshift is plateauing, BigQuery is rising, Snowflake is exploding.

The Cloud Data Warehouse is sometimes built downstream of a Data Lake, which serves as a catch-all: a tub of data stored in its raw state.

You can perfectly well use both a Data Lake and a Cloud Data Warehouse; you don’t necessarily have to choose between the two technologies. To be honest, they fulfill different roles and can be complementary… even if it’s a safe bet that the two technologies are called to merge.

Note also that some actors, such as Snowflake, offer integrated Cloud Data Warehouse and Data Lake solutions. Here is a possible article title: Are the Data Lake and the Cloud Data Warehouse destined to merge? Though it’s not the subject of this article, it is a debate that stirs many expert heads!

In any case, the entire modern data stack is organized around the Cloud Data Warehouse, connected or merged with the Data Lake.

EL(T) solutions

As seen in the first section of the article, EL(T) solutions are prevailing over traditional ETL tools. This reflects a significant shift in the data integration process and in how the data pipeline is built.

Source: AWS. ETL vs. ELT.

A question you may have asked yourself: why put the “T” in parentheses?

For one simple reason: the tool used to build the data pipeline between the source systems and the Cloud Data Warehouse no longer needs to transform the data.

Cloud EL(T) solutions (Portable, Fivetran, Stitch Data, etc.) are primarily used to organize the plumbing. This is their main role. It is now the Cloud Data Warehouse solutions and third-party tools that handle the transformation phases.

A Cloud DWH solution helps transform data tables with a few lines of SQL.

We’ll have to talk about this evolution again: Most Data Preparation and Data Transformation operations can now be performed in the Cloud Data Warehouse itself, using SQL.

Data Engineers (working in Java, Python, Scala, and the like) are increasingly leaving data transformations to Data Analysts and business teams using SQL. This raises a real question: what role will the Data Engineer play tomorrow? Their role in organizing and maintaining the modern data stack is not guaranteed.

The goal of a modern data stack is to empower the end-users of the data. It’s increasingly at the service of the business teams and the marketing stack they handle.

The modern data stack breaks down barriers between data and marketing; it is the sine qua non of efficient Data-Marketing, the condition for becoming truly “data-driven.”

Preparation/transformation solutions

In a modern data stack, data preparation and transformation take place:

  • Either in the Cloud Data Warehouse itself, as we have seen.
  • Or downstream of the Cloud Data Warehouse, via ETL tools.
  • Or, in the most frequent case, in the DWH, reinforced by third-party tools.

Data preparation or transformation is the art of making data usable. This phase consists of answering a simple question: how do you turn raw data into a data set the business can use?

An example of a raw data preparation solution: Dataform.

Data transformation is a multifaceted process involving different types of processing (a sketch follows the list):

  • data cleaning,
  • deduplication,
  • setting up tailor-made data update rules,
  • data enrichment,
  • creation of aggregates,
  • dynamic segments, etc.
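
As an illustration, here is a minimal sketch (hypothetical tables, standard SQL) combining two of these operations, an aggregate (lifetime value) and a dynamic segment, expressed as a view that is recomputed at every query:

-- A dynamic segment built on an aggregate: recently active
-- customers with a lifetime value of at least 500.
CREATE VIEW analytics.active_high_value_customers AS
SELECT
    c.customer_id,
    c.email,
    SUM(o.amount)     AS lifetime_value,
    MAX(o.ordered_at) AS last_order_at
FROM analytics.customers c
JOIN analytics.orders o ON o.customer_id = c.customer_id
GROUP BY c.customer_id, c.email
HAVING SUM(o.amount) >= 500
   AND MAX(o.ordered_at) >= CURRENT_DATE - INTERVAL '90 days';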

Preparation and transformation tools are also used to maintain Data Quality. Because data “transformation” refers to operations of a different nature, it’s not surprising that the modern data stack hosts several tools belonging to this large multifaceted family.

Data management solutions (Data governance)

Cloud Data Warehouse’s accessibility and usability by a large number of users is, of course, a positive thing. The potential problem is the chaos these expanded accesses can cause in terms of Data Management.

To avoid falling into this trap, the company must absolutely:

  • Integrate one or more Data Management tools into the data stack
  • Document and implement data governance rules.

Issues around Data Governance are more topical than ever

The first reason, which we have just mentioned, is the opening up of access to (and editing rights on) the data stack’s solutions.

The second reason is the explosion of data volumes, which requires the implementation of strict governance rules.

The third reason is the strengthening of the rules governing the use of personal data, the famous GDPR in particular.

Data governance is a sensitive subject that organizations generally handle inadequately. It’s not the sexiest subject, but it clearly needs to be integrated into the roadmap.

Reverse ETL solutions

Let’s end with a whole new family of data solutions, much sexier and with a promising future: the Reverse ETL. We will soon publish a complete article on the Reverse ETL, its role, and its place in the modern data stack. Let’s summarize here in a few words the issues and functionalities offered by these new kinds of solutions.

The challenge is straightforward: data from all kinds of sources flows up into the Cloud Data Warehouse easily, but it still has a lot of trouble flowing back down into the activation tools: CRM, Marketing Automation, ticketing solutions, e-commerce, etc.

Reverse ETL is the solution that organizes and facilitates the transfer of DWH data into the tools used by operational teams.
With a Reverse ETL, data from the Cloud Data Warehouse is no longer just used to feed the BI solution; it is also used to benefit the business teams in their daily tools.

This is why we speak of “Reverse ETL”.

Where the ETL (or ELT) moves the data up in the DWH, the Reverse ETL does the opposite. It pushes the data down from the DWH into the tools.

The Reverse ETL is the solution that connects the data stack and the marketing stack in a broad sense. It’s at the interface of the two.

An example? With a Reverse ETL, you can feed web activity data (stored in the DWH) into the CRM software to help sales reps improve their prospect/customer relationships. But that is one use case among many others… and the use cases are likely to multiply in the coming months and years.

Census, HighTouch, and of course Octolis are three examples of Reverse ETL.

🏁 Conclusion

Infrastructures, technologies, practices, and even the professions of Data and Marketing in the broadest sense have evolved at incredible speed. We’ve seen the central place the modern data stack gives to the Cloud Data Warehouse: everything revolves around this center of gravity.

Certain recent developments, particularly the fashion for off-the-shelf Customer Data Platforms, somewhat distort the understanding of what’s really going on.

But make no mistake, the arrow of the future is clearly pointing toward Data Warehouses (which no longer have anything to do with their On-Premise ancestors).

Toward the Cloud DWH… and the whole ecosystem of tools revolving around it: EL(T), Cloud BI solutions… and of course Reverse ETL.

Reverse ETL – Definition & analysis of this new category of tools

ETL (or ELT) solutions allow you to extract data from different applications and put it into a data warehouse. Reverse ETL processes, on the other hand, allow you to extract data from the data warehouse to feed all sorts of applications: CRM, advertising tools, customer service, etc.

The potential is enormous: reverse ETLs give you a single source of truth for most business applications, which means no more recurring problems reconciling data between tool A and tool B, or managing flows between different applications.

Why is this type of solution emerging now if the potential is so significant?

Historically, the data warehouse was only the foundation of BI (Business Intelligence). It was used to build reports and run large ad hoc queries that were not critical.

If you had asked a CIO in the 2000s, feeding a CRM, a critical application that uses hot data, from a data warehouse would have been an aberration.

The new generation of cloud data warehouses (Snowflake, Google BigQuery, AWS Redshift, etc.), and the ecosystem growing around them, change the rules of the game.

The modern cloud data warehouse can become a complete operational repository because it’s much more powerful, easier to maintain, and adapted for all types of queries.

But reverse ETLs are the missing link to make it all happen.

This comprehensive guide will explain everything you need to know about this new element of the modern data stack.

What is reverse ETL? [Definition]

Reverse ETL genealogy: in the beginning was ETL

Reverse ETL is a new family of software that already plays a key role in the modern data stack. But to understand what a reverse ETL is, you first need to understand what an ETL is, because reverse ETL grew out of ETL.

The concept of ETL emerged in the 1970s.

Source: Google Trends.

ETL stands for Extract, Transform & Load. Before designating a family of tools, ETL designates a process, which the tools of the same name are built to accomplish.

ETL is the process of Extracting data from the organization’s different data sources, Transforming it, and finally Loading it into a Data Warehouse.

ETL tools are used to build the data pipeline between the data sources and the database in which the data is centralized and unified.

The data sources can be:

  • events from your applications,
  • data from your SaaS tools,
  • your various databases,
  • and even from your data lake.

ETL tools develop connectors with the primary data sources to build the data pipeline.

Fivetran offers over 150 connectors to data sources.

In the past, ETLs were heavy On-Premise solutions, feeding equally heavy Data Warehouses installed on the company’s servers.

But with the advent of Cloud Data Warehouses (in 2012, with Amazon Redshift), a new category of ETL software has emerged: Cloud ETLs.

The cloudification of data warehouses, ushered in by Amazon, has led to a cloudification of ETL tools.

The two emblematic examples of Cloud ETL tools are Fivetran and Stitch Data.

Besides loading data into the data warehouse (the DWH, i.e., the destination), ETLs are also used to transform it before integrating it into the database. So an ETL is not just a data pipeline, but also a laboratory.

We can now understand what a reverse ETL is.

Reverse ETL is a solution for synchronizing DWH data with your business applications

In short, the ETL tool allows you to bring data from your different sources into the DWH to centralize and unify the company’s data. This data is then used to perform data analysis: Business Intelligence.

Reverse ETL performs the inverse function of ETL: it is the technological solution that transfers centralized data from the data warehouse to business applications.

Reverse ETL finally solves a nagging problem encountered by companies.

Thanks to Cloud ETL, they usually manage to centralize data in the data warehouse smoothly; but once this data is in the DWH, it’s difficult to get it out of the database and use it in business tools.

Though data warehouses are widely used for BI, they’re rarely used to feed business applications, for lack of simple synchronization solutions; this is where Reverse ETL comes into play.

Reverse ETL is a flexible data integration solution for synchronizing DWH data with the applications used by marketing, sales, digital, and customer service teams, to name a few.

Like Cloud ETL tools, reverse ETLs are characterized by flexibility and ease of use. Data is processed, transformed, mapped, and synchronized into the business applications using connectors, with a bit of SQL work where needed.

Without using SQL queries, reverse ETLs allow you to edit the data flows from a visual interface. You choose the database column or table you want to use and create the mapping from the visual interface to specify where you want the data to appear in Salesforce, Zendesk, and so on. No more scripts or APIs.

Once the flow is set up, the data is synchronized in the applications, not in real-time, but in very short batches of about a minute.

Reverse ETLs, such as Octolis, are based on an approach known as “tabular data streaming” instead of the “event streaming” approach. What reverse ETL does is copy and paste tables from the source system (the DWH) into the target system (the business application) at very regular intervals.

Like ETL tools, reverse ETLs are not only data pipelines. They allow you to transform the DWH data and prepare it – i.e., clean the data, create segments, audiences, scorings, and build a unique customer repository.
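
Concretely, a sync often boils down to a query like the sketch below: each row becomes a record in the destination, and each column is mapped to a destination field. The Salesforce field names in the comments are purely illustrative.

-- The model a reverse ETL reads at each sync interval
-- (reusing the segment view sketched earlier).
SELECT
    email,              -- upsert key, matched on Contact.Email
    lifetime_value,     -- mapped to a custom field such as Lifetime_Value__c
    last_order_at       -- mapped to a custom field such as Last_Order_Date__c
FROM analytics.active_high_value_customers;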

So why are Reverse ETL solutions so popular today?

Now that we know what Reverse ETL is and how it works schematically, let’s dive deeper into the “why.”

Why do we want to get data out of the DWH?

It took years for companies to centralize and unify their data in a master base: the Cloud Data Warehouse.

Many companies are not there yet and still don’t have a single repository.

But why would you want to get the data that you have carefully centralized in the data warehouse out?

First of all, it is essential to remember that the data remains in the data warehouse in any case. Reverse ETL synchronizes data sets in business applications without moving them. Synchronizing does not mean migrating.

What reverse ETL does is to put this centralized DWH data at the service of business applications.

It is well known that medicine is both a cure and a poison. Until now, the DWH has been used as a remedy for data silos. But in many companies today, data is siloed in the data warehouse itself.

Without reverse ETL, the data stored in the DWH is barely used, or not used at all, by the business applications.

What is the data used for? As we mentioned above, to do BI and dashboarding.

Thanks to all the work done with SQL, the DWH leads to the creation of interesting definitions and data aggregates: lifetime value, marketing qualified lead, product qualified lead, heat score, ARR, etc. But this business-relevant data is not used directly by the business teams and the tools they use.

With reverse ETL, you can use these definitions, and the associated columns in the DWH, to create customer profiles and audience segments.

The modern data stack with the Cloud Data Warehouse at the core of the system.

With reverse ETL, the data warehouse is no longer just used to feed BI; it feeds business applications directly.

The reverse ETL was the missing piece of the data stack: the piece whose absence prevented the data stack from being truly modern.

What are the use cases of a reverse ETL?

Let’s look closely at the use cases that the reverse ETL tool makes possible.

There are essentially three types of use cases:

#1 Operational Analytics

This new term refers to a new way of looking at analytics.

In the Operational Analytics approach, data is not only used to create reports and analyses; it is also intelligently distributed to business tools. It is the art of making data operational for business teams by integrating it into their daily tools.

If you think about it, this approach allows you and your teams to become data-driven in all decisions and actions. It’s smooth, easy, headache-free, and doesn’t involve reading indigestible BI reports.

How do you deploy this “Operational Analytics” approach? And how to become data-driven? By using reverse ETL, of course!

Reverse ETL allows you to turn data into analysis (segments, aggregates) and analysis into action.

Imagine a salesperson who wants to know which key accounts to focus their efforts on.

In the classic, old-fashioned approach, we call on a data analyst, who will use SQL to identify high-value leads in the DWH and then display it all in a nice BI table… that no one will read or use.

You can train salespeople to read dashboards and reports, but in practice, it’s always complicated, which holds back many organizations from becoming data-driven. This difficulty in making data and analysis available to business teams prevents the full exploitation of the data available to the company.

With the Operational Analytics approach, there’s no need to train salespeople to use BI reports; the data analyst directly integrates the corresponding data from the data warehouse into a Salesforce custom field.

Reverse ETL allows a data analyst to deploy Operational Analytics as easily as creating a report.

#2 Data flow automation

Reverse ETL allows you to quickly and automatically provide business teams with the data they need at a specific time. Not only does it provide business teams with the data they need in their tools, but it also facilitates the work of data analysts and other data engineers.

For example, if your sales team asks IT which customers are at high risk of churn, reverse ETL makes it easy to answer without spending excessive time extracting data from the DWH.

We could also mention the following examples:

  • A salesperson who wants to see, in Salesforce, the customers with a lifetime value higher than €X.
  • A customer advisor who wants to see, in Zendesk, the accounts that have opted for premium support.
  • A product manager who wants to receive, in Slack, feedback from users who have deployed a particular feature.
  • An accountant who wants to synchronize customer attributes into their accounting software.
  • And so on.
Reverse ETL’s use cases.
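
For the churn example above, the “answer” can be as simple as a rule-based flag computed in the warehouse, which the reverse ETL then pushes into a CRM field; the tables and thresholds here are purely illustrative:

-- A simple churn-risk flag based on purchase recency.
SELECT
    customer_id,
    CASE
        WHEN MAX(ordered_at) < CURRENT_DATE - INTERVAL '180 days' THEN 'high'
        WHEN MAX(ordered_at) < CURRENT_DATE - INTERVAL '90 days'  THEN 'medium'
        ELSE 'low'
    END AS churn_risk
FROM analytics.orders
GROUP BY customer_id;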

Reverse ETL allows you to easily and automatically handle these everyday business requests, which used to be hell for the IT team.

In this sense, it addresses a recurring concern in organizations: communication, or rather miscommunication, between IT and business teams.

Harmony between IT and the business is restored without designing APIs.

#3 Reverse ETL, a solution to the increasing number of data sources

One of the modern data stack’s challenges is managing the proliferation of data sources. Reverse ETL allows you to take advantage of this formidable gold mine of data to create a memorable customer experience.

It serves both purposes:

  • For the customer: offer them a richer and more relevant experience thanks to more personalized actions (targeted content, distribution channel, timing). This generates more customer satisfaction.
  • For the company: Increase customer retention and revenue per customer.

Reverse ETL enables the transformation of customer knowledge produced by DWH and BI into an enriched experience for the customer.

Two alternatives to reverse ETL software: Customer Data Platform & iPaaS

There are alternatives to reverse ETL software; our article would not be complete without mentioning them.

Reverse ETL vs. CDP

Customer Data Platforms have been gaining momentum since the mid-2010s. A CDP is an off-the-shelf platform that allows you to build a single customer repository by connecting all of the organization’s data sources. As such, CDP is an alternative to the data warehouse.

The advantage over the data warehouse is that CDP is not just a database for BI purposes; the CDP offers advanced functionalities to:

  • Prepare data for business use cases: segmentation, creation of aggregates, scores, etc.
  • Redistribute it, via native or tailor-made connectors, to business applications.

In short, the CDP plays the same role as the DWH and reverse ETL combined. In fact, you don’t necessarily have to choose between a CDP and a DWH. The same company can indeed combine:

  • A Data Warehouse that will be used for BI.
  • A Customer Data Platform that will enable customer data to be activated and made available to business teams.

Compared to the Data Warehouse + reverse ETL combination, the Customer Data Platform is characterized by:

  • Greater rigidity. The CDP imposes its data models and limits the creation of customized models.
  • A higher cost. The CDP is a costly solution, inaccessible to most small and medium-sized businesses.
  • More silos. The CDP does not promote communication between IT and business teams: it is designed for business teams, and especially for marketing.

The vendors’ objective is to make business teams autonomous from IT. But in our opinion, the challenge is to make communication between IT and the business more fluid, not to do away with it.

To deploy complex data use cases, IT has a role to play.

That’s why we prefer the approach of combining the data warehouse with a reverse ETL tool. It offers more flexibility. In short, reverse ETL transforms your data warehouse into a Customer Data Platform.

Reverse ETL vs. iPaaS

An iPaaS is an integration solution in SaaS mode: Integration Platform as a Service. Integromat is probably the most iconic iPaaS solution on the market today.

iPaaS solutions generally offer easy-to-use, visual interfaces that connect applications and data sources.

The way it works is similar to reverse ETL: You select a source, select a destination tool, and edit the mapping to define where the data from the source will fit into the destination tool (the location and the “how”).

The example below shows the design of a mapping between emails and Google Spreadsheet:

Integromat – Email Integration – GSheets.

There is no need for APIs, scripts, or SQL, so iPaaS solutions are popular with non-technical people.

An iPaaS allows you to create 1:1 data flows directly between the sources and the destination without going through the data warehouse.

For this reason, iPaaS can be used by companies with limited data integration needs. But it’s not the preferred option for companies that want to build an IT infrastructure around a database that acts as a hub.

Conclusion

The most advanced companies in terms of data already use reverse ETL, and it’s destined to become the norm in companies that wish to make better use of their data. It is a solution that allows you to better exploit the data stored in the data warehouse.
We will come back in more detail to the issues surrounding this essential building block of the data stack.

Why did we launch Octolis?

First user test after months of development

We are delighted to announce the official launch of Octolis in January 2022.

To avoid the disappointment experienced by the creator of that labyrinth, we have based the development of Octolis on our clients’ feedback.

We’ve had customers using the product’s first version for almost a year now, including major brands like KFC and Le Coq Sportif. We’ve been quiet while working hard with a few customers for months to improve our product, over and over again.
And now, the time has come! Octolis is now available to all companies who want it!

We have a lot to say about why we launched Octolis. But if you don’t have time to read it all, here’s what you can take away in a nutshell:

  • We believe that the growth of modern cloud data warehouses will profoundly transform organizations. When all the company’s data is stored in a warehouse, you can use this warehouse to manage all your teams and sync all your tools. Octolis acts as a sort of data logistician.
  • We will enable small businesses (SMBs) to become genuinely “data-driven”. Not to create reports that are barely used, not to create yet another machine learning POC that will never be put into practice, but to improve everyday operations.
  • We have developed the data management solution we wish we had in our previous experiences. It’s a simple enough solution to be used by marketers and flexible enough for tech/data teams.

The classic problem of data silos

Clément and I met at Cartelis, where we worked as data consultants for years. We had the chance to work for companies of various sizes and levels of digital maturity, from great start-ups like Openclassrooms, Blablacar, or Sendinblue to more traditional companies like RATP, Burger King, or Randstad.

In almost all the companies we worked for, there were significant challenges around customer data reconciliation.

The problem is simple: all teams would like to have as much information about their customers as possible, within the tools they use daily.

For example, sales teams want to see in their CRM software if the customer has used the product recently so they can trigger a follow-up at the right time. Marketing teams want to set up fully automated messages after a customer has complained to customer service or visited a specific page on the website. And customer service wants to prioritize client tickets based on the potential risk of losing a customer, just to name a few.

The tools that allow you to interact with your prospects/customers are more and more powerful, but they are under-exploited because it is difficult to sync them with all the data you need. The main reason is that we have valuable data everywhere. Interactions between the company and its customers happen on several channels and tools (e.g., mobile application, automated chat, marketing automation, advertising retargeting, customer service, etc.). These sources generate a phenomenal amount of data that businesses can use to personalize customer relationships.

To address this challenge, most companies start by trying to connect all of their tools; new connections are established with apparently simple tools like Zapier or Integromat, but shortcomings become evident when trying to manage them all at once or trying to scale.

Then comes the moment when we judge that it is time to centralize all customer data in one place. We list the many advantages (customer knowledge, project acceleration, etc.) to justify the potential ROI, set a specific budget, and finally decide to launch a complex “Unified Customer Repository” or “360° customer database” project, which can be pretty daunting and intimidating, to say the least.

The big question is, what format will this customer repository take? The main options considered most of the time are:

  • An already existing solution: CRM or ERP
  • A tailor-made database (usually supported by an in-house team)
  • A software solution dedicated to this objective: a “Customer Data Platform.”

However, this can be easily and cost-effectively solved with the new generation of data warehouses.

Historically, a data warehouse was a database that supported analysis, not operational uses. Solutions were built to support large one-off queries, with data updated at most once a day. Now, modern data warehouses can support all types of queries, in near real time, at a more competitive price, and with no maintenance effort. This changes everything.

The modern data stack creates a new paradigm

In the last few years, the major shift has been the emergence of a new generation of cloud data warehouses like Snowflake, Google BigQuery, or Firebolt. Snowflake’s historic IPO in 2020, with a valuation that keeps increasing, is the financial reflection of this breakthrough. And yet Oracle, IBM, and Microsoft have been offering data warehousing solutions for years. So what has changed?

The new generation of cloud data warehouses provides three significant advantages:

  • Speed/power: Phenomenal computing power compared to 2010 standards can be achieved in a few clicks.
  • Price: Decoupling storage and data processing has significantly reduced storage costs. Depending on the queries you make, you pay per use, but storing large volumes of data costs almost nothing.
  • Accessibility: Implementation and maintenance are more straightforward. It’s no longer necessary to have a team of network engineers to manage a data warehouse.

If you want to know more about data warehouses, here is an excellent article about it written by our friends at Castor.

Thanks to these innovations, cloud data warehouse adoption is booming, and a whole new ecosystem is emerging around it, including:

  • Extract Load (Transform) tools like Airbyte or Fivetran to sync the data warehouse with data from all internal applications.
  • Tools like dbt to transform data directly in the data warehouse.
  • Tools like Dataiku to perform data science projects directly in your data warehouse.
  • Reporting tools like Metabase or Qlik.
  • And now software activation tools (or reverse ETL) like Octolis to enrich operational tools with data from the data warehouse.

You can learn more about the modern data stack in this article.

The modern data warehouse becomes a foundation for analysis and operations

It is now possible to use the data warehouse as an operational repository because it’s easy to build a Customer Data Platform equivalent inside it. Some experts call this the Headless CDP approach. It is a growing trend in mature enterprises, and it will significantly impact the entire SaaS value chain.

In this article, David Bessis, the founder of Tinyclues, insists that this shift will limit the dependency on full-featured software solutions offered by Adobe, Salesforce, or Oracle. This may explain why Salesforce has invested significantly in Snowflake…

There are many advantages to using the data warehouse as the foundation for operational tools:

  • Limit data integration/processing work: you import the data in one place, transform it once, and use it everywhere afterwards.
  • Keep control of the data, and facilitate the transition from one software solution to another.
  • Align analysis and action: the same data is used to report and to populate the tools. When an analyst calculates a purchase frequency, it can also be used in CRM or emailing tools (a sketch follows this list).
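
To take the purchase frequency example, here is a minimal sketch (hypothetical tables, Postgres-flavored syntax) of an aggregate computed once in the warehouse and reusable in BI, CRM, and emailing tools alike:

-- Average days between orders, per customer. Single-order
-- customers get NULL thanks to NULLIF.
SELECT
    customer_id,
    COUNT(*) AS orders_count,
    (MAX(ordered_at)::date - MIN(ordered_at)::date)
        / NULLIF(COUNT(*) - 1, 0) AS avg_days_between_orders
FROM analytics.orders
GROUP BY customer_id;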

This allows companies to speed up many previously complex projects. For instance, the classic use cases of a “Customer Data Platform”:

  • 360-degree view of each prospect/customer including all the interactions associated with each individual.
  • Advanced segmentation/scorings that can be used in marketing tools.
  • Use first-party data in acquisition campaigns to create and target lookalike audiences of your best customers, follow up after no response, or use LTV as an indicator of campaign success.

Other examples include use cases that are less focused on customer data, such as:

  • Enriching a product recommendation engine with available product stock or margin per product.
  • Creating “web events” from phone calls or offline purchases to have a complete view of customer cycles in web analytics tools.
  • Generating Slack alerts when an Adword campaign is incorrectly set up or a lead is poorly completed in Salesforce.

Until now, companies that used their data warehouse for operational purposes set up custom connectors to send data to their different business tools. These connectors can be quite complex to implement, because they have to deal with data format incompatibilities, batch or real-time flows, API quotas, and more; and once set up, they also have to be maintained.

A new category of tools is emerging to facilitate data synchronization from the data warehouse to business tools. Even if the term has not yet been agreed upon, the concept of “Reverse ETL” is most often used to refer to these tools.

Octolis allows all SMEs to effortlessly exploit the data from their existing tools

Unlike medium-sized companies, most mature start-ups or large companies with data engineers have already implemented this type of architecture. This will grow at full speed in the next few years.
The ecosystem around the “modern data stack” has matured a lot, and decision-makers are increasingly aware that data maturity is a priority in the coming years.

But the barrier is often human; data engineering skills are rare and expensive.

Octolis wants to become the benchmark solution for small and medium-sized companies that wish to take their data to the next level without having a team of data engineers.

We offer a turnkey solution that allows you to:

  • Centralize data from different tools in a data warehouse.
  • Cross-reference and prepare data efficiently, to build clean reference tables for customers, purchases, products, contracts, stores, etc.
  • Synchronize data with operational tools: CRM, Marketing Automation, Ads, Customer Service, Slack, etc.


At Octolis, we believe companies can give autonomy to marketing teams while leaving a certain level of control to IT teams.

The Octolis software interface is simple enough for a marketer to cross-reference and prepare data and send it wherever needed. But this simplicity does not make it a black box: the data is accessible to IT teams at any time, hosted in each client’s own database or data warehouse, and connected to a reporting tool.

With Octolis, an SME can have a solid base to set up its reporting and accelerate all its marketing/sales projects.

The potential is enormous, and the use cases are innumerable. We get up every morning highly motivated to further improve the product and help our clients fully exploit their data’s potential!