DaaS: Isolating Your Data Assets

Part 2 in the series: “I’m not a data company but I want to monetize my data”

There’s a lot that’s been written on lessons for data startups and the modern, evolving data ecosystem. This series explores a specific niche: how can “non-data” companies (SaaS, marketplaces, etc.) build out commercializable data businesses alongside their core offerings?

Previously: an intro on what we’re even talking about


As complicated as the path to Data Monetization may be, there’s actually a pretty simple framework out there, courtesy of one of the truly foundational data companies, SafeGraph:

This first step, Data Acquisition, is where established companies are going to differentiate themselves vs. pure-play DaaS startups that have to go out and find data to build around (for more on the process of external data acquisition, see the DaaS Bible).

If you’re an established business thinking about monetizing your data assets, it’s likely because you think you’ve already acquired data somehow, even if it wasn’t in pursuit of being a data company. (Also: if you’re an established business, and you’re pretty sure you don’t have any data worth monetizing, AND you still want to be a data company, well, good luck, as that’s a fundamental transformation of your business that probably won’t be solved with Medium articles.)

Instead of “Data Acquisition”, we’re going to reframe this first step as “Data Isolation”.

Data Isolation is the process of clearly identifying a data asset and establishing the systems/environment for cleanly delineating that asset from your other systems so that you can then work with it as a standalone asset.

There are a few forms those assets can take within your existing systems:

  1. Internal operating data: the data / metrics / facts generated by your business operations. Things like sales figures by client, product usage, system uptime. This data is extremely valuable for helping your business run and likely flows into BI tools, spreadsheets, and other systems. But it’s likely not terribly exciting to others.
  2. Input data: data your company either licenses or directly collects in order to make a core product operate. An example: location data for running apps. When you log into a GPS-linked running app and give it permission to use your GPS, it stores your location. It uses that stored location to give you visual feedback about your split times, distance traveled, etc. Depending on the app (or underlying location SDK), you could also be giving that company permission to resell that location data for marketing analytics and targeting.
  3. Derivative data: data generated by customers adopting your solutions that may or may not be used in your internal systems (kind of the opposite of point 2). An example: occupancy data from a property management system. An apartment PMS generates time-series data of the ebbs and flows of renters; a big enough PMS, adopted by a wide enough swathe of complexes, would be able to isolate a data set offering “real-time visibility into American rental behaviors”.
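
One way to make the three categories concrete is to write down a simple inventory of the data flows you already have. The sketch below is illustrative only: the `AssetKind` and `DataAsset` names, the fields, and the sample entries are assumptions made up for this example, not a standard schema.

```python
from dataclasses import dataclass
from enum import Enum


class AssetKind(Enum):
    INTERNAL_OPERATING = "internal operating data"
    INPUT = "input data"
    DERIVATIVE = "derivative data"


@dataclass(frozen=True)
class DataAsset:
    name: str
    kind: AssetKind
    source_system: str  # hypothetical field: where the data lives today


def group_by_kind(assets):
    """Return {AssetKind: [asset names]} so each category can be reviewed on its own."""
    grouped = {kind: [] for kind in AssetKind}
    for asset in assets:
        grouped[asset.kind].append(asset.name)
    return grouped


# Hypothetical inventory, loosely based on the examples in the list above.
inventory = [
    DataAsset("sales figures by client", AssetKind.INTERNAL_OPERATING, "BI tool"),
    DataAsset("runner GPS traces", AssetKind.INPUT, "mobile app"),
    DataAsset("apartment occupancy time series", AssetKind.DERIVATIVE, "PMS"),
]

for kind, names in group_by_kind(inventory).items():
    print(f"{kind.value}: {names}")
```

Even a list this small forces the useful question for each row: is this something we run the business on, something the product consumes, or something the product throws off?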

Recap: unlike a nascent DaaS startup that may be focused on acquiring new data, if you’re an established SaaS business, marketplace, etc. there’s already a ton of data flowing in and out of your systems. Step 1 is to understand those flows, isolate them into packages, and then identify if you think they’re valuable. That last bit is key, and that’s what we’ll cover in the next section.


Data Isolation 201

Some things to consider with your data asset

Do you have the rights to monetize?

  • On the legal end of this spectrum, you may be prohibited from sharing the data with third parties in its current state. Think things like Consumer Privacy, Insider Trading, and Healthcare regulations.
  • Even if the data isn’t regulated, you may create commercial issues by sharing it. For example, if TripAdvisor integrates into Hilton’s PMS to manage listings based on occupancy, selling that occupancy data to Marriott may create some not-so-fun conversations (in this case, this is likely blocked by commercial / API agreements).

Can you truly isolate the data from other systems?

  • One of the biggest pain points in monetizing data is maintaining shares while the underlying asset changes. Take loyalty card data. If a loyalty provider loses a major retailer on the loyalty services side of the house, the data asset (consumer transactions) will be significantly diminished.
  • Also painful: if your input data is extremely similar to what you’re selling in your core system, or your “system” can be easily derived from the input data, you’re not really isolating anything — you’re swapping one product for another.

Value Creation Lever: Data Company Economics

For nascent DaaS startups, the biggest hurdle is often the cost of data acquisition. DaaS businesses typically cannot monetize until they reach a critical mass of data, meaning significant upfront investment for years until break-even (double whammy: data acquisition typically sits in COGS and therefore destroys Gross Margin, making an early stage business look especially bad).
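
The gross margin effect can be shown with simple arithmetic. The figures below are invented purely for illustration and do not come from any real company; the only formula used is the standard one, gross margin = (revenue - COGS) / revenue.

```python
def gross_margin(revenue: float, cogs: float) -> float:
    """Standard gross margin: (revenue - COGS) / revenue."""
    if revenue <= 0:
        raise ValueError("revenue must be positive")
    return (revenue - cogs) / revenue


# Hypothetical numbers, chosen only to illustrate the point.
revenue = 1_000_000
delivery_costs = 100_000          # hosting, support, etc.
data_acquisition_costs = 600_000  # licensing or collecting the data

# DaaS startup: data acquisition sits in COGS.
startup_margin = gross_margin(revenue, delivery_costs + data_acquisition_costs)

# Established business: the data already exists as a by-product of the core
# product, so in this simplified sketch no acquisition cost is charged to
# the data line.
established_margin = gross_margin(revenue, delivery_costs)

# Prints "startup: 30%, established: 90%"
print(f"startup: {startup_margin:.0%}, established: {established_margin:.0%}")
```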

Established businesses get to skip this step and jump straight to the very attractive margin profile of pure data sales:

  • If you’re monetizing input data, chances are you’re already doing fine margin-wise on that data elsewhere. Expanding monetization can even unlock new capitalization, so you’re getting a double benefit to the bottom line.
  • If you’re monetizing derivative data (sometimes called exhaust data), you’re in great shape as you’re effectively double-dipping on revenue generated from systems usage.

Case Study: Isolating Data at an Insights SaaS company

I work at Numerator, a market leader in purchase-based consumer insights. Our Insights platform, a SaaS BI tool, is built on the foundation of receipts directly captured from individual Americans via a mobile app.

As we looked to directly monetize our data, defining and isolating assets was trickier than it looked at first glance:

  • One asset would be the underlying, raw receipt data (individual items from 2B+ receipts). Because this data is the root of a lot of downstream systems, it was already well-defined and well-maintained, living in well-documented and clean Snowflake tables for internal systems to use.
  • Another asset would be the projected shopper metrics, aka the core metrics in the platform. However, these metrics aren’t calculated in advance: they’re generated in real time whenever users run reports, and aren’t stored anywhere.
  • Yet another asset was a set of projected sales metrics, similarly derived from the root receipt data, but transformed through a different methodology and, unlike the shopper metrics, pre-calculated and stored in their own tables.
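
The structural difference between the second and third assets (computed on demand vs. pre-calculated and stored) can be sketched in a few lines. This is a toy illustration, not Numerator's actual schema or methodology: the receipt fields, the `shopper_metric_on_demand` and `build_sales_table` names, and the plain counts and sums standing in for "projection" are all assumptions.

```python
from collections import defaultdict

# Hypothetical raw receipt items (the first asset): one row per item on a receipt.
receipt_items = [
    {"receipt_id": 1, "shopper_id": "a", "brand": "X", "spend": 4.0},
    {"receipt_id": 1, "shopper_id": "a", "brand": "Y", "spend": 2.5},
    {"receipt_id": 2, "shopper_id": "b", "brand": "X", "spend": 6.0},
]


def shopper_metric_on_demand(items, brand):
    """Second-asset style: computed when a report runs; nothing is persisted."""
    shoppers = {row["shopper_id"] for row in items if row["brand"] == brand}
    return len(shoppers)


def build_sales_table(items):
    """Third-asset style: computed once ahead of time and kept as its own table."""
    table = defaultdict(float)
    for row in items:
        table[row["brand"]] += row["spend"]
    return dict(table)


sales_table = build_sales_table(receipt_items)  # stored; exists independently
print(shopper_metric_on_demand(receipt_items, "X"))  # exists only for this call
print(sales_table)
```

In a setup like this, the stored table is already a standalone object that can be handed over, while the on-demand metric only exists as a function call, so someone would first have to decide what to materialize before it could be treated as an asset.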

So when talking to clients who say they “like the platform, but want our underlying data”, how do we steer them? All three of these data sets are related, they all fit within the purview of “consumer insights”, but are vastly different assets in both structure and purpose. During our DaaS journey, clarifying language, isolating individual assets, and establishing new systems and structures became critical (more in Part 4).


Series Contents