Transforming Healthcare Data with Tuva Health

The nitty, gritty, and shitty of working with health data


Tuva Health transforms raw healthcare data, like claims and medical records, into analytics-ready data that data teams can use. Healthcare data is notoriously messy and non-standard and data teams spend tons of resources normalizing and enriching it before they can analyze it.  

Tuva has an open-source project (called the Tuva Project) that gives you a common data model to map all of your data sources to, which makes downstream data analysis easier. They are building a community for healthcare data people to get input on how to best transform and analyze data and to improve the open-source project. Tuva also sells services that do the data pre-processing and mapping for you.

We walk through what all the technical mumbo jumbo means and what goes into making data ready for analysis. We’ll go through the parts I like about Tuva and the things I think they might struggle with as a company (becoming the main standard, difficulty of the business model, etc.).

This is a sponsored post - you can read more about my rules/thoughts on sponsored posts here. If you’re interested in having a sponsored post done, email me. Also, I’m a small investor in Tuva, so double disclosures here.

Company Name - Tuva Health

Tuva Health makes it easier to analyze healthcare data by making the transformation of raw healthcare data easier. No longer will we be raw dogging the raw data.

A common scene for new data engineers

Tuva is named after Tuva, the region in the former Soviet Union that is the subject of the book Tuva or Bust!, as an homage to Richard Feynman. I guess EMRs are the gulags?

The company was started by Aaron and Coco. They were running data science teams at Health Catalyst and Strive and realized their teams were building the same solutions from scratch to transform raw medical records and claims into data they could analyze. They’ve spent enough time with healthcare data to develop Stockholm Syndrome, which is why they started Tuva.

They’ve raised a little over $4m from YC, Box Group, Virtue, and a bunch of awesome and funny health tech angels with a great smile and fun, informative newsletter. 

What pain point does it solve?

Let’s say you’re bringing in data from a bunch of different places. You’re succ-ing up EMR data from a bunch of different practices you’re integrated with, you’re getting claims data feeds from your always-easy-to-work-with payer partners, and you’re getting data in a FHIR format from the health information exchanges after 10 years of waiting. It’s a mf’in data party and everyone got the invite; even imaging showed up (on a CD).

Actually getting this data to all work together in a way to answer questions is extremely tricky. 

  • All these data sources have different formats and terminologies. Data fields might be named differently, and medications and labs may not be mapped to standard terminologies like LOINC and RxNorm.  
  • You might have the same patient in two datasets but no unique ID to join them, so you need a way to figure out how to link the datasets to the same patient. 
  • You’ll see a lot of duplicated, missing, or inaccurate data for a given patient. For example, did patients with end-stage renal disease have dialysis visits in the past month?  If not, they are either dead or (more likely) you’re missing important data. 
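The dialysis example above is basically a completeness check you can automate. Here's a minimal sketch in Python, assuming made-up records and field names (`conditions`, `visit_type`, `visit_date` are all hypothetical, not anything from Tuva or a real feed):

```python
from datetime import date, timedelta

# Hypothetical, simplified records; real feeds have far more fields.
patients = [
    {"patient_id": "p1", "conditions": {"ESRD"}},
    {"patient_id": "p2", "conditions": {"ESRD"}},
    {"patient_id": "p3", "conditions": {"diabetes"}},
]
visits = [
    {"patient_id": "p1", "visit_type": "dialysis", "visit_date": date(2023, 5, 20)},
    {"patient_id": "p2", "visit_type": "dialysis", "visit_date": date(2023, 2, 1)},
]


def esrd_patients_missing_dialysis(patients, visits, as_of, window_days=30):
    """Return ids of ESRD patients with no dialysis visit in the window ending at as_of."""
    cutoff = as_of - timedelta(days=window_days)
    # Patients with at least one dialysis visit inside the window.
    recent = {
        v["patient_id"]
        for v in visits
        if v["visit_type"] == "dialysis" and cutoff <= v["visit_date"] <= as_of
    }
    return sorted(
        p["patient_id"]
        for p in patients
        if "ESRD" in p["conditions"] and p["patient_id"] not in recent
    )


flagged = esrd_patients_missing_dialysis(patients, visits, as_of=date(2023, 6, 1))
print(flagged)
```

Anyone this flags is a prompt to go look for a missing feed, not a clinical conclusion.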

Once you’ve solved these problems, you have an even bigger challenge ahead: how do you convince investors this is proprietary *cough* I mean how do you enrich the data so you can quickly answer important healthcare questions?

Let’s say you wanted to answer the question, “which patients were readmitted in the past 30 days?”. While it seems like a simple question, you need to define what a readmission actually is, which is complex. We have a time window (30 days), but what if the patient comes back for something unrelated? Or what if they get discharged to another hospital for care instead? Or what if they have a certain disease (e.g., cancer) that requires them to come back regularly?
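To make those questions concrete, here's a rough sketch of one possible readmission rule in Python. It is not Tuva's definition or any official measure. The `planned` and `transfer` flags are hypothetical fields I'm assuming someone upstream has already filled in, and it doesn't try to answer the "unrelated reason" question at all:

```python
from datetime import date

# Hypothetical inpatient stays; field names are illustrative only.
stays = [
    {"patient_id": "p1", "admit": date(2023, 1, 1), "discharge": date(2023, 1, 5),
     "planned": False, "transfer": False},
    {"patient_id": "p1", "admit": date(2023, 1, 20), "discharge": date(2023, 1, 22),
     "planned": False, "transfer": False},
    {"patient_id": "p2", "admit": date(2023, 2, 1), "discharge": date(2023, 2, 3),
     "planned": False, "transfer": False},
    # Scheduled return (e.g. a chemo cycle): should not count.
    {"patient_id": "p2", "admit": date(2023, 2, 10), "discharge": date(2023, 2, 11),
     "planned": True, "transfer": False},
    {"patient_id": "p3", "admit": date(2023, 3, 1), "discharge": date(2023, 3, 4),
     "planned": False, "transfer": False},
    # Discharged straight to another hospital: should not count.
    {"patient_id": "p3", "admit": date(2023, 3, 4), "discharge": date(2023, 3, 9),
     "planned": False, "transfer": True},
]


def find_readmissions(stays, window_days=30):
    """Pair each stay with the patient's next stay if it is an unplanned,
    non-transfer admission within window_days of the earlier discharge."""
    by_patient = {}
    for s in sorted(stays, key=lambda s: (s["patient_id"], s["admit"])):
        by_patient.setdefault(s["patient_id"], []).append(s)

    readmissions = []
    for pid, patient_stays in by_patient.items():
        # Compare each stay with the one that follows it chronologically.
        for prev, nxt in zip(patient_stays, patient_stays[1:]):
            if nxt["planned"] or nxt["transfer"]:
                continue
            gap = (nxt["admit"] - prev["discharge"]).days
            if 0 <= gap <= window_days:
                readmissions.append((pid, prev["discharge"], nxt["admit"]))
    return readmissions


print(find_readmissions(stays))
```

Only p1 counts here. Change any one of the exclusions and the "readmission rate" changes with it, which is why the definition matters so much.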

Almost every healthcare company deals with the above problems. This is usually solved by tricking/hiring a data engineering team to:

Extract + Load: Move all these raw healthcare data sources into a data warehouse, a place where the raves are way less fun.

Transform:

  • Write SQL or Python scripts to normalize the raw data sources - which all have different formats and terminologies - into a common data model with standard terminologies.
  • De-duplicate patients and encounters that are common across the raw data sources. 
  • Create definitions for concepts like “readmissions” or “risk scores” which combine different datasets together.  People will have different opinions about what should go into these definitions, which can be solved by whoever yells their opinion the loudest. Queries then need to be written to map the cleaned data to these different concepts.

From Tuva’s excellent presentation. DW = Data Warehouse
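As an illustration of the de-duplication step, here's a small Python sketch that collapses the same encounter showing up in two feeds. The matching key (patient, facility name, start date) and all the field names are assumptions for the example. Real pipelines typically need fuzzier rules than this:

```python
from datetime import date

# Hypothetical encounters for one patient arriving from two feeds.
encounters = [
    {"source": "claims", "patient_id": "p1", "facility": "General Hospital",
     "start": date(2023, 4, 2), "dx": None},
    {"source": "emr", "patient_id": "p1", "facility": "general hospital ",
     "start": date(2023, 4, 2), "dx": "N18.6"},
    {"source": "emr", "patient_id": "p1", "facility": "General Hospital",
     "start": date(2023, 5, 9), "dx": "E11.9"},
]


def dedupe_encounters(encounters):
    """Collapse rows sharing patient, normalized facility, and start date."""
    merged = {}
    for enc in encounters:
        key = (enc["patient_id"], enc["facility"].strip().lower(), enc["start"])
        if key not in merged:
            record = {k: v for k, v in enc.items() if k != "source"}
            record["sources"] = [enc["source"]]
            merged[key] = record
            continue
        kept = merged[key]
        kept["sources"].append(enc["source"])
        # Fill fields missing from the first copy with values from later copies.
        for field, value in enc.items():
            if field != "source" and kept.get(field) is None and value is not None:
                kept[field] = value
    return list(merged.values())


unified = dedupe_encounters(encounters)
print(len(unified), unified[0]["sources"], unified[0]["dx"])
```

The first copy wins on conflicts and later copies only fill in blanks. That's a design choice, and exactly the kind of thing people end up yelling about.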

Often, healthcare companies outsource this transformation work to a vendor they found through one of their homies, who didn’t even say they liked the vendor, just that they used them. If a vendor “does analytics”, 9/10 times they’re basically just doing this transformation process. Sometimes a big chunk of it is outsourced behind the scenes to overseas teams that don’t necessarily have expertise in healthcare data. Many of these vendors don’t explain how the data is actually being transformed; they just give you the end result. This can make it difficult to trust the data you get back, and if something changes, it can impact all of your downstream analyses.

If a company chooses to do this transformation process in-house, it can be really expensive. It usually requires multiple full-time employees focused on it, and if you’re lucky, a handful will have worked with healthcare data before. The company will have to purchase different databases and software that let you do things like map billing codes to diseases or figure out if a patient is the same one across datasets. And that doesn’t include the opportunity cost of the data team answering Slack messages of “hey this dashboard isn’t working, any ideas?” The Tuva team estimated that the fully loaded cost of managing this process exceeds $500k and starts to approach $1M.

There’s no reason every company should be doing their own pre-processing of data and maintaining their own bespoke data model. What if someone made transforming the data easier, more standard, and more transparent?

What does the company do?

Tuva transforms raw healthcare data into analytics-ready data. They preprocess, normalize, augment, unify, and enrich your data. Oh you don’t understand what that means? Are you a little baby that needs it spoonfed to you?

Let’s start with the first part of Tuva’s product, which is the open-source software. In that open-source software you have:

  • A core data model - this is a common data model designed to unify claims and medical records into a common format. Simple example - in patient records “female” might get represented as “F”, “f”, “female”, etc. even though they are referring to the same thing. The data model will have one concept for “female” that all of these will get mapped to.
  • Terminology Sets - This is a library of standard healthcare terminologies everyone doing analytics needs, but for some reason, are scattered all over the internet like horcruxes (e.g. ICD-10 codes).
  • Data Marts - These are smaller, more “opinionated” data models designed to answer specific types of questions (e.g. readmissions, risk, spend, preventable ED visits, chronic conditions, etc.).
Source: Tuva
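The “female” example is the simplest version of what a common data model does. A minimal Python sketch, where the mapping table and the "unknown" fallback are my own assumptions rather than the Tuva Project's actual values:

```python
# Hypothetical mapping; the Tuva Project's actual concept values may differ.
SEX_MAP = {"f": "female", "female": "female", "m": "male", "male": "male"}


def normalize_sex(raw):
    """Map a raw source value to one standard concept, or 'unknown'."""
    if raw is None:
        return "unknown"
    return SEX_MAP.get(raw.strip().lower(), "unknown")


raw_values = ["F", "f", "female", " Female ", "M", "", None]
normalized = [normalize_sex(v) for v in raw_values]
print(normalized)
```

Multiply this by every field in every source and you get a sense of why people would rather share the mapping than rewrite it.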

The cool thing is that anyone can use these three parts for free without asking permission. You can use this data model yourself, or if you find that it’s missing stuff, you can contribute back to it and make fixes and future users will have those additions. If you have different ways you define some of those concepts like risk, spend, etc. you can contribute those and people can see how you did it.

The data model, marts, and terminology are free to use. You could do the whole process of taking your data sources and mapping it yourself. But how good are the mental health benefits at your company?

Tuva has a “Data Factory”, which is a managed service where they take care of all the data transformation for you. The Data Factory consists of:

  • Data Ingestion: Tuva will ingest customers’ raw healthcare data from wherever it is. SFTP, S3 buckets, or a disgustingly disorganized Google Drive that would make a compliance person ask for God’s grace.
  • Preprocessing: Tuva will perform initial transformation of customers’ source data to Tuva’s common data model. I think they should call this “RIPPIN’ SCRIPTS” but they said no.
  • Data Quality Testing: Tuva will constantly test customers’ data to make sure that nothing is breaking and none of the data values are invalid. For example, if your concept of “readmission” is giving you a value of 0, something in the different datasets that lead up to readmission is broken and Tuva has systems to suss out exactly where that break is. Please don’t make your internal data teams do this, they’ll mutiny.
  • Data Curation: Tuva will normalize data to the standard terminology code sets, e.g., map custom lab terms to LOINC, custom medication descriptions to RxNorm, wearable data to the garbage, etc.
  • Data Augmentation: Tuva will augment customers’ data by joining it to 3rd party data sources they’ve partnered with including geo-coding, provider metadata sources, death data, and social determinants of health. Very soon, Tuva is also planning to offer benchmarks to customers around cost, quality, utilization, and risk from national 3rd party data sources that are mapped to Tuva’s data model. If you’re also mapped to Tuva’s model, then you can compare your data against the benchmarks, apples-to-apples.
  • Data Unification: Tuva will create a master patient index across patients and de-duplicate encounters to create a consolidated patient record. Someone actually named John Smith has been making this process really hard.
  • Data Enrichment: Tuva will make those data marts on top of customers’ healthcare data. They have pre-built data marts for concepts like “acute inpatient visit”, “preventable ED visit”, “preventable in-law visit” (jk), different chronic conditions, and more, which you can see here.
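Two of the steps above, data quality testing and data unification, are easy to picture in code. This is a toy Python sketch and not how Tuva's Data Factory works internally. The exact-match key on name and date of birth is an assumption for illustration (real master patient indexes usually use fuzzier, probabilistic matching), and all field names are hypothetical:

```python
from collections import defaultdict

# Hypothetical patient rows from three feeds; none share an identifier.
records = [
    {"source": "emr_a", "first": "John", "last": "Smith", "dob": "1950-03-01", "sex": "male"},
    {"source": "claims", "first": "JOHN ", "last": "smith", "dob": "1950-03-01", "sex": "male"},
    {"source": "emr_b", "first": "John", "last": "Smith", "dob": "1987-11-12", "sex": "M"},
]

VALID_SEX = {"female", "male", "unknown"}


def quality_issues(records, readmission_rate):
    """Return human-readable descriptions of failed checks."""
    issues = []
    for i, r in enumerate(records):
        if r["sex"] not in VALID_SEX:
            issues.append(f"row {i} ({r['source']}): invalid sex value {r['sex']!r}")
    # A metric that is exactly zero usually means something upstream broke.
    if readmission_rate == 0:
        issues.append("readmission rate is 0: an upstream feed or join is probably broken")
    return issues


def build_master_patient_index(records):
    """Assign one master id per (first, last, dob) after trimming and lowercasing."""
    index = {}
    assignments = defaultdict(list)
    for r in records:
        key = (r["first"].strip().lower(), r["last"].strip().lower(), r["dob"])
        master_id = index.setdefault(key, f"mp{len(index) + 1}")
        assignments[master_id].append(r["source"])
    return dict(assignments)


print(quality_issues(records, readmission_rate=0.0))
print(build_master_patient_index(records))
```

The two John Smiths born in 1950 get merged and the 1987 one stays separate. Two different John Smiths with the same birthday would be wrongly merged by this rule, which is the whole John Smith problem.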

A lot of companies end up doing this Data Factory process internally - one Tuva customer was ingesting data from 350+ different feeds that were constantly making changes without announcing it. By making their Data Factory process a core competency, Tuva can stay on top of these things on behalf of that company. 

What is the business model and who is the end user?

Now you may be wondering to yourself, if the open-source is free then how does this company make money? Well, this is healthcare startups! They don’t need to!

Jokes. The core features are free for companies to use, contribute to, and adapt for their own use case. These include the data model, the terminology sets, and the data marts (e.g. “readmission”, “risk”, etc.) that build on top of the data model.

What you pay for is Tuva’s Data Factory.  It’s a mix of software (some open, some closed) and services. This is a $10K/month fee for up to 10 data sources or 10M patients, a number most of you dinky ass startups aren’t even coming close to. It includes constantly testing the data pipelines to make sure they’re working, creating rules that flag issues in the data which need to be corrected, etc., which need to be done on an ongoing basis. 

Tuva is mainly used by data scientists, data analysts, actuaries, PhD candidates in crisis about whether they should stay in academia, health researchers, and epidemiologists. A few examples of companies and use cases:

Providers at-risk - If you’re a healthcare startup that provides care to patients and has risk-based contracts with payers - congrats and don’t mess this up. Typically you’re receiving claims data feeds from those payers weekly or monthly. You need to ingest and QA those claims feeds, map and unify them in a common data model, and build a bunch of data tables on top that enable analytics around patient risk, provider performance, etc.  With Tuva Data Factory, customers get cleaned and QA’d data out of the box to use for more targeted analyses and models.

Vendors selling to healthcare enterprises - Let’s say you sinned in a previous life and now need to do B2B healthcare sales. If you’re selling to payers, providers, or employers, chances are that claims or medical records data plays a role in your product or solution. This means ingesting and QA’ing data feeds and unifying them into a common data model that you can develop data products on top of. Designing and maintaining a common data model and ingestion / QA pipelines is serious work. Tuva Data Factory can do this for you, allowing the data team to build actually differentiated data products you can sell to customers.

Biopharma - Biotech and pharma companies are using lots of real-world data like claims and EMR data to better understand how their drugs are being used and who is prescribing them. A biotech company might want to track melanoma patients and understand how their utilization of healthcare services changes before and after they take a drug so they can make their case to get reimbursed an a$$load.

But that requires stitching together claims, EMR data, prescription data, etc. AND being able to identify things like a readmission, a planned visit, etc. That’s not a core competency for a biotech company, and they should stick to what they know. Like going public without any revenue and questionably funding patient foundations.

Job Openings

Tuva is hiring Healthcare Data Scientists and Data Engineers. You can see more here.


Out-Of-Pocket Take

I like Tuva because I think it’s a good business idea, but also because it aligns with some stuff I believe ideologically.

Open Standards - I like that Tuva promotes the use of open standards because the downstream effect is it’s easier for small businesses to experiment with different data analyses. It also can be a very smart business decision because it creates a natural network effect if it takes off. More contributors means the standards stay up to date as new data sources get mapped to them. More people creating their own concepts by combining different parts of the data model means more companies can take them, improve them, and share their own version.

Beyond that, data standards should be open, transparent, and auditable considering their impact on how care is delivered. If standards and concepts went through an open peer-review process like this, there could be more debate and discussion about whether we’re calculating things properly instead of each company having these debates behind closed doors.

Let’s go back to how to define a readmission. Tuva has a good explainer of how they define readmissions here. The public can now see how readmissions get defined and critique or suggest edits on the definition, which is much harder to do if the standard is in a black box.


Outsourcing high cost expertise - Tuva’s current path to monetization is expensive for them - it’s a high touch, white glove service to get a company's data in order. The hope is that customers will stay with them long enough to recoup the costs, or die tryin’.

Most companies would shy away from this, but I think it’s smart for Tuva because most of their customers do not have the data talent when they’re starting to do this process. This gives Tuva a great opportunity to get customers at their earliest stages, and get the flywheel going as the companies grow and build on top of their standard.

Plus, healthcare data has specific nuances that a team who’s spent a lot of time transforming it will spot way quicker than someone who needs to learn all the weird industry-specific idiosyncrasies. As Aaron said:

“We’re the motherf#$!ing experts at this” - Aaron Neiderhiser, Tuva Health CEO

Micro communities - I’m a fan of micro communities, I mean I run one. Generalized social networks are becoming bad places to have real discussions thanks to algorithmic feeds. Micro interest based communities can be a fantastic way to meet others with similar interests, even if that interest is as sexually repelling as healthcare data.

Tuva actually has a Slack where contributors can ask questions, get help, make adjustments, and more.  You can see if other people are dealing with the same issues as you and if their gif reaction game is crazy or not.  I’ve been lurking in their Slack for a bit to see how this works and I’m happy to report that I have no f***ing idea what I’m reading because I’m a non-technical plebeian. 

But the other members seem to be getting their questions answered! Again, this is a great way to use high touch services at the beginning to gain trust and over time can lower the cost of customer service + improve the standard if other members of the channel are helping each other.

I’m just responding to everyone’s message with “damn that’s crazy” until they kick me out

As with any company, Tuva isn’t a sure shot. There are several questions and issues the company is going to have to grapple with.

Business model and scaling - Tuva’s business model is the Data Factory. You’re effectively outsourcing the work of an expensive data engineer. It’s pretty clear why this would be attractive to small companies that don’t have this expertise in-house, but will larger companies with their own data teams bite?

Tuva has to prove that it’s not worth spending internal resources on this. Their pitch is that companies should be deploying their data teams on more of the downstream analysis questions with their organized data in hand vs. the upstream questions of how to make the data usable.

On top of that, the business model is a combination of services + tech, with the idea that a lot of the upfront work involved in mapping your different data sources to Tuva will be repeatable for other companies and cheaper over time. That repeatability is necessary for the company to scale, and time will tell if it holds up.

Becoming a standard and competition - I mean…you know how many broadly utilized standards there already are? Any time this discussion comes up, you’re morally obligated to post this infamous xkcd comic:


Becoming the de facto standard everyone trusts is extremely hard. Most of the widely used standards in healthcare today were developed when data was first becoming digitized and had an enormous first mover advantage. In other cases, the lack of coalescence around a standard actually pushed the government to mandate certain standards like CPT codes, the USCDI data standard, etc. Tuva essentially has to compete with all of the companies trying to become the “healthcare data standard”, which is a tall order.

Tuva’s idea is that an open-source approach will be more attractive than working with the existing data standards because it’s cheaper, built with the assumption that you’re using a data warehouse, and updated more frequently by an active community.

Will companies contribute? - I think it would be awesome if companies shared things like how they calculate readmissions, what weights they use to assess patient risk, etc. But I think in healthcare sometimes companies feel like how they do that is proprietary to their company and part of their secret sauce. 

Open-source doesn’t work unless people contribute back to it, but hopefully companies realize it’s in their best interest to have that be public so others outside of their companies can improve on it for both parties' benefit.

And trust me, your company is not going to be more successful because you found a secret way to classify patients as “taking their medications”. For your sake, I hope not.

Conclusion and parting thoughts

I’m a big believer that healthcare companies should stick to their core competencies. Right now, companies are forced to build data extraction and manipulation processes from the ground up when they have better things to do. As more data becomes available thanks to the 21st Century Cures Act and interoperability rules, the demand for data cleaning and a flexible data model is only going to increase.

I’m an eensy teensy investor (my entire life savings) in Tuva so obviously I believe in them. But I also just generally believe we should have more open-source projects in healthcare, and this one is exciting. If you’re someone that spends a lot of time cleaning and manipulating data, the Tuva model could be a good fit.

Or just make custom emojis for their Slack, every contribution to open-source helps.

Thinkboi out,

Nikhil aka. “Crying In Data Mart”

Twitter: @nikillinit

IG: @outofpockethealth

If you’re enjoying the newsletter, do me a solid and shoot this over to a friend or healthcare Slack channel and tell them to sign up. The line between unemployment and founder of a startup is traction and whether your parents believe you have a job.
