Ciitizen And The Patient Data Marketplace

The path to our personal health record


I wanted to use this as an opportunity to talk a little bit about patient data access, health data exchanges, and what Ciitizen is doing to help us take control of our healthcare data (with my take at the end).

This is a sponsored post - you can read more about my rules/thoughts on sponsored posts here. If you’re interested in having a sponsored post done, email

Company Name

Ciitizen is a personal health record with the goal of making it easier to choose where you want your health data to go. It decided on its name based on what would permanently mess up my autocorrect going forward.

Ciitizen was founded by Anil Sethi, Deven McGraw, Brian Carlsen, Farid Vij and Peeyush Rai. Initially they thought about forming a basketball team before landing on Ciitizen. Previously, Anil was the founder of Gliimpse, the personal health record company that Apple acquired and that eventually became the underlying technology for Apple Health Records. Deven was previously the Deputy Director for Health Information Privacy at HHS and wrote much of the HIPAA patient access guidance we operate under today. Brian Carlsen authored the NLM/NIH data standards that became SNOMED and other leading bioinformatics underpinnings. Farid and Peeyush bring the tech experience to the party. As you read more, I think you’ll realize this is as close to founder(s)-market-fit as you can possibly get.

The 100+ person company has raised more than $27 million so far from investors including Vijay Pande from a16z, Mike Pellini of Section 32, Verily, and Mubadala Ventures.

What does the company do and what pain points do they solve?

I knew the day would come where I’d have to explain a semi-complicated healthcare data product but I was hoping I’d have at least a few more months to live carefree first. Alas that is not the case. Just kidding - talking about Ciitizen is a great chance to explain how healthcare data moves around the ecosystem all without our knowledge.

Ciitizen gets your health records (think thousands of pages of incomprehensible, repetitive documents in one fat stack of a PDF) from the many parts of the healthcare system. Then it uses its fancy ML pipeline to create research-grade data from an otherwise hellish stack of PDFs and securely stores it under patient control. Simply sign a form that says “Ciitizen is allowed to get my data from hospitals on my behalf.” Finally, Ciitizen hits up hospitals, imaging centers, genetic labs, etc and gets all your records for you.

This is, at its core, what a patient’s HIPAA Right of Access is meant to do. As patients we are allowed total access to our complete health history, but that doesn’t mean it’s easy to get. Sure, you can log into your patient portal if you have one and if you remember the password. But it wouldn’t give you the complete story. If you want all of the records (and are brave enough), you could pester each one of your providers for your data and they might charge you a fee, take a super long time dragging their feet getting the data to you, and likely violate a bunch of HIPAA regulations in the process. Then finally you’ll get a CD-ROM or something and you’ll have to ask your friends if any of them have a CD-ROM drive and they’ll laugh at you. Or they’ll point you to their portal which only has a fragment of your total health record, which sort of defeats the purpose?

Ciitizen prevents you from getting laughed at by your friends. Ciitizen’s technology automatically bugs providers on your behalf and receives the documents, which come in all sorts of formats like faxes, scanned PDFs, mailed boxes of paper (I hate this industry so much), emails, etc. Then Ciitizen takes those documents, structures the clinical narrative into computable data, and gives it to you as a patient so you have your very own complete personal health record. They do this fast (well, healthcare fast), getting all of your records together in a few days vs. weeks or months. Some data even comes in minutes, but the complete record takes time. For people with advanced diseases, days versus weeks makes a huge difference.

Right now they’ve started with cancer and rare neurological conditions and are moving into other areas like autoimmune diseases, eventually serving all patients with all conditions.

[Below are screenshots of what Anil’s late sister Tania would see in her Ciitizen profile. Data and screenshots shared with permission.]

All extracted data is easily source verified to the original document.

Ciitizen’s Bets

There are a lot of companies building applications that retrieve and structure patients’ data, so I wanted to talk about some specific things Ciitizen is betting on as differentiation.

Data ingestion

Ciitizen’s secret sauce begins with the document ingestion and data structuring technology. First, they have some pretty fun automated processes that bother hospitals, clinics and labs at scale until they send over the patient’s health records to Ciitizen. They even made a scorecard on how well each hospital does this. This is my favorite part of the process because I, too, enjoy bothering people at scale.

Ciitizen then gathers the incoming documents, regardless of format, and turns them into readable data using machine learning to understand the different sections of the forms, faxes, PDFs, etc. This is necessary to contextualize the data in each section (e.g. histological subtype from pathology reports, medications from chemo flow sheets, findings from imaging reports, etc.). The ML pipeline automatically parses unstructured text into semantically normalized data informed by Ciitizen’s multidimensional data models. Ciitizen then leverages clinical experts to quality-control the output and make sure all the important fields are correctly placed and coded. This is that research-grade stuff, Heisenberg quality data.
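To make the "extract, then have experts check it" idea a bit more concrete, here is a minimal sketch of what an extracted field with a pointer back to its source document could look like. Everything in it is hypothetical: the field names, the confidence score, and the idea of routing low-confidence fields to a reviewer by threshold are my assumptions for illustration, not a description of Ciitizen’s actual pipeline.

```python
from dataclasses import dataclass


@dataclass
class ExtractedField:
    """One value pulled out of a document, with a pointer back to its source."""
    name: str             # e.g. "histological_subtype" (hypothetical field name)
    value: str
    source_document: str  # file the value was read from
    page: int             # page within that file, so a human can verify it
    confidence: float     # assumed model confidence between 0 and 1


# Assumed cutoff: anything below this goes to a clinical reviewer.
REVIEW_THRESHOLD = 0.90


def split_for_review(fields, threshold=REVIEW_THRESHOLD):
    """Separate fields that can be accepted from those needing expert review."""
    accepted = [f for f in fields if f.confidence >= threshold]
    needs_review = [f for f in fields if f.confidence < threshold]
    return accepted, needs_review


fields = [
    ExtractedField("histological_subtype", "invasive ductal carcinoma",
                   "pathology_report.pdf", 2, 0.97),
    ExtractedField("medication", "paclitaxel", "chemo_flow_sheet.pdf", 5, 0.93),
    ExtractedField("tumor_size_cm", "2.1", "imaging_report.pdf", 1, 0.71),
]

accepted, needs_review = split_for_review(fields)
for f in needs_review:
    print(f"Review {f.name}={f.value!r} in {f.source_document}, page {f.page}")
```

Keeping the document name and page on every field is what makes source verification cheap: a reviewer (or the patient) can jump straight to the spot the value came from.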

Here is a general overview of the process. It has that newsletter-cute handwritten aesthetic. Shout out to Brian C, Ciitizen co-founder and czar of all things bioinformatics who did his best to explain this to my smooth-brained self.

By focusing on getting the source documents, Ciitizen is aiming to get deeper data on (initially) targeted groups of patients. By starting with the source documents and choosing how they can structure it, Ciitizen has mapped the data (they called it “ontologies”) so that clinical concepts that mean the same thing are mapped together (e.g. acetaminophen and tylenol are mapped to the same thing, breast cancer and invasive ductal carcinoma will have a parent-child relationship, etc.). No more extremely janky SQL queries with 50 permutations of cancer in your WHERE clause to make sure you got it all.
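Here is a toy version of that mapping idea, assuming a made-up mini-ontology. The concept IDs, the synonym table, and the function names are all invented for illustration (real systems would lean on vocabularies like SNOMED), but it shows why one hierarchy lookup can replace a pile of string permutations.

```python
# Hypothetical mini-ontology. IDs and terms are illustrative only.
SYNONYMS = {
    "acetaminophen": "drug:acetaminophen",
    "tylenol": "drug:acetaminophen",
    "breast cancer": "dx:breast_cancer",
    "invasive ductal carcinoma": "dx:invasive_ductal_carcinoma",
    "idc": "dx:invasive_ductal_carcinoma",
}

# Child concept -> parent concept.
PARENT = {
    "dx:invasive_ductal_carcinoma": "dx:breast_cancer",
    "dx:breast_cancer": "dx:cancer",
}


def normalize(term):
    """Map a raw term to its concept ID, or None if it is unknown."""
    return SYNONYMS.get(term.strip().lower())


def is_a(concept, ancestor):
    """True if `concept` equals `ancestor` or sits anywhere below it."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = PARENT.get(concept)
    return False


def find_patients(records, ancestor):
    """Return sorted patient IDs whose raw term rolls up to `ancestor`."""
    hits = set()
    for patient_id, raw_term in records:
        concept = normalize(raw_term)
        if concept is not None and is_a(concept, ancestor):
            hits.add(patient_id)
    return sorted(hits)


records = [
    ("p1", "Tylenol"),
    ("p2", "Invasive Ductal Carcinoma"),
    ("p3", "breast cancer"),
    ("p4", "acetaminophen"),
]

print(find_patients(records, "dx:breast_cancer"))
print(normalize("Tylenol") == normalize("acetaminophen"))
```

The query asks for one concept and the parent-child links do the rest, so nobody has to remember every spelling of every subtype.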

One question is whether this technology is defensible, or whether improvements in off-the-shelf machine learning packages will make it easy for another company to copy. There’s a lot of focus on clinical natural language processing, which tons of companies are doing and which typically ends up finding words without much context. Data standardization is really the key differentiator. While the ability to ingest data and identify common text terms will become a commodity eventually, automating the accurate codification of this data to standard concepts based on the specific context of a patient's history is the hard part to replicate. Good biostatisticians make bank for a reason.

Ciitizen’s main bet is that their technology will effectively scale with as few humans as possible as more patients with different diseases use Ciitizen. The company started working with cancer patients, and has already moved into several other therapeutic areas. This is one of the core value-propositions of Ciitizen: it must work for everyone, globally. And if it requires a lot of humans to get the data, that’s going to be very expensive. “Calling an Uber on New Year’s” -expensive. So Ciitizen is going to have to leverage its AI and ML to expand into other therapeutic areas and industry verticals if it plans on doing it faster than its competitors.

The FHIR Extinguisher (Please don’t unsubscribe).

Ciitizen is also taking the interesting approach of not building exclusively on FHIR. You can read more about FHIR here, but the general gist is that FHIR creates a common data standard for healthcare organizations that outlines the types of data, their formats, the field name for that data, etc. so it’s easy to query via APIs. It’s sort of similar to how your web browser can go to any page and load because it knows we use standardized names for elements.
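For the curious, here is roughly what "easy to query via APIs" means in practice. This is a minimal sketch, not a working client: the server URL and patient ID are made up and no network call happens. The resource shape (a `Bundle` of `Observation` resources, each with `code.coding` and `valueQuantity`) and the `patient` and `code` search parameters follow the FHIR R4 spec, and 718-7 is the LOINC code for blood hemoglobin.

```python
import json
from urllib.parse import urlencode

# Hypothetical FHIR server base URL, for illustration only.
BASE_URL = "https://fhir.example.org/r4"


def search_url(resource_type, **params):
    """Build a FHIR search URL of the form [base]/[type]?name=value&..."""
    return f"{BASE_URL}/{resource_type}?{urlencode(params)}"


# A trimmed-down example of what a search response could look like.
SAMPLE_BUNDLE = json.loads("""
{
  "resourceType": "Bundle",
  "type": "searchset",
  "entry": [
    {
      "resource": {
        "resourceType": "Observation",
        "status": "final",
        "code": {
          "coding": [
            {"system": "http://loinc.org", "code": "718-7",
             "display": "Hemoglobin [Mass/volume] in Blood"}
          ]
        },
        "subject": {"reference": "Patient/123"},
        "valueQuantity": {"value": 13.2, "unit": "g/dL"}
      }
    }
  ]
}
""")


def observation_values(bundle):
    """Pull (code, display, value, unit) out of each Observation in a Bundle."""
    rows = []
    for entry in bundle.get("entry", []):
        resource = entry["resource"]
        if resource.get("resourceType") != "Observation":
            continue
        coding = resource["code"]["coding"][0]
        quantity = resource.get("valueQuantity", {})
        rows.append((coding["code"], coding.get("display"),
                     quantity.get("value"), quantity.get("unit")))
    return rows


print(search_url("Observation", patient="123", code="718-7"))
print(observation_values(SAMPLE_BUNDLE))
```

Because every compliant server uses the same field names, the same parsing function works no matter which hospital the data came from. That is the whole pitch.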

FHIR is still a relatively new concept that is slowly being rolled out. There are FHIR APIs that exist today, but historically there hasn’t been much guidance or pressure on the types of data that providers’ EMRs have to make available through them. However, with new interoperability rules from the ONC, there will be a common set of data types that are mandated to be available through FHIR R4 by 2022. Many apps are choosing to pull patient data out of EMRs using FHIR APIs to build personal health records.

Ciitizen has looked at FHIR and said “😬😬”. Their belief is that the data EMRs are currently required to put out through their APIs is missing research-grade data that is relevant for complex diagnoses like cancer, including (but not limited to):

  • CT/MRI images
  • Genetics/genomics
  • Tumor profiling details
  • Pathology and imaging reports
  • ECGs/device readings
  • Clinical notes

Some of these, like clinical notes, will be rolled out in 2022, assuming the date sticks and EMRs are compliant. Several of the other data fields are presumably going to be included in round 2 of interoperability requirements, but it’s TBD how long that will take to happen and roll out. So the timeline is up in the air on getting all the data types necessary for research, and even when it comes through it will be unstructured. FHIR is also not a global standard - and Ciitizen is thinking about what their eventual international expansion will look like (even if it’s early days).

Ciitizen’s CEO Anil Sethi kept repeating this line as I asked him to explain things to me so I feel obligated to put it here: “FHIR is a cocktail straw in the ocean of information limiting how much data you can actually suck out.” Anil estimated that currently FHIR gets about ~10% of data required for a usable personal health record. I am left with questions about liquid volumes via straw sucking.

Ciitizen still ingests the data from FHIR, but has decided to combine that data with the much more laborious route of requesting the source documents and then use their data extraction process to turn it into a computable form instead. These raw documents contain a lot of the information missing in FHIR that is needed by sick patients with complex care across lots of different settings, with different data formats, and important data in the doctor’s notes. Then they actually take the unstructured data, both from FHIR and their own process, to create something queryable. This is much harder than going through FHIR APIs alone which would be faster though also more commoditized.

The Health Information Exchanges

So how do all these documents get transported to Ciitizen? Or let’s say your doctor asks for your profile to be imported into their EMR, how would Ciitizen push that data?

These are actually non-trivial tasks even though they sound simple, because every EMR at every provider is built differently. One way to deal with this is to build an integration adapter with each provider that wants to request data, but this takes a long time because you’re going one-by-one.

Another way to do this is to use middle layers that have already done a bunch of integrations with these providers and act as the “pipes” that make it easy for documents to move in-and-out of EMRs. Said differently, this is how doctors currently exchange rich data with each other. Ciitizen is taking a bet here on the Health Information Exchanges (HIEs) aka. the data pipes.

HIEs are typically non-profits in each state that connect the data from different health systems together. If two providers are on the same HIE, then one might be able to bring your historical data into their EMR while they’re seeing you. There are a lot of different types of data exchange that can happen (directed exchange, query-based exchange, and consumer-mediated exchange) which you can read more about here.

But the gist is that these data pipes between providers already exist, and Ciitizen is betting that HIEs are up to the task of facilitating document sharing. This varies wildly from state to state, and there are multiple HIEs within a given state that might have different providers on each and require different levels of data access from networked providers on a given HIE. However, there are more and more state mandates for providers to share data into state HIEs (Arizona is a great example). Information blocking rules that are coming into effect this year will fuel this fire, and more providers are being urged to increase data accessibility to HIEs to better coordinate the COVID response in a given geography.

Another question is whether HIEs are able to handle this kind of data throughput. Frankly most of the HIEs are non-profits that are 10+ years old and therefore based on older technology. That’s why Ciitizen acquired the HIE-related IP of Stella Technology. This company built the original servers and pipes powering many HIEs today, so the acquisition should make it faster for Ciitizen to connect into the different HIEs.

Ciitizen is creating a future where patients will control where the data goes to and HIEs will serve as the transportation system for most if not all of the data.

What is the business model and who is the end user?

For all patients, Ciitizen is and will always be free and patients will always control who gets access to their health data. Ciitizen intends to be the “App Store” here, enabling the transaction between patients and third parties where Ciitizen AND patients take a cut of revenue.

The first third-party app Ciitizen is focusing on is clinical trial matching. This is traditionally a very difficult area because you’re matching unstructured janky text data from the clinical trial inclusion/exclusion criteria to the unstructured janky text data in your doctor’s notes. They’ve got matching down to under 1 hour for the trials they’re working with.
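Once both sides are structured, matching turns into plain set and range checks. The sketch below is a simplified illustration under my own assumptions: the `Patient` and `Trial` shapes, the concept IDs, and the three criteria (age range, required diagnoses, excluded medications) are hypothetical, and real eligibility criteria are far messier than this.

```python
from dataclasses import dataclass, field


@dataclass
class Patient:
    age: int
    diagnoses: set
    medications: set = field(default_factory=set)


@dataclass
class Trial:
    name: str
    min_age: int
    max_age: int
    required_diagnoses: set
    excluded_medications: set = field(default_factory=set)


def check_eligibility(patient, trial):
    """Return (eligible, reasons); reasons lists every failed criterion."""
    reasons = []
    if not (trial.min_age <= patient.age <= trial.max_age):
        reasons.append("age out of range")
    missing = trial.required_diagnoses - patient.diagnoses
    if missing:
        reasons.append(f"missing diagnosis: {sorted(missing)}")
    conflicts = trial.excluded_medications & patient.medications
    if conflicts:
        reasons.append(f"on excluded medication: {sorted(conflicts)}")
    return (not reasons, reasons)


patient = Patient(age=54,
                  diagnoses={"dx:breast_cancer"},
                  medications={"drug:tamoxifen"})

trial_a = Trial("Hypothetical Trial A", 18, 75,
                required_diagnoses={"dx:breast_cancer"},
                excluded_medications={"drug:warfarin"})
trial_b = Trial("Hypothetical Trial B", 18, 50,
                required_diagnoses={"dx:breast_cancer"},
                excluded_medications={"drug:tamoxifen"})

for trial in (trial_a, trial_b):
    eligible, reasons = check_eligibility(patient, trial)
    print(trial.name, eligible, reasons)
```

Returning the reasons alongside the yes/no matters here: a near-miss on one criterion is useful information for a patient or coordinator, not just a rejection.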

Beyond trial matching, you can imagine a natural expansion into real-world evidence use cases for pharma like natural history studies, single-arm trials, post-marketing surveillance, and other things I’ve explained previously. They are doing some of these already.

Ciitizen is planning to expand to other use cases for providers, payers and retail health. Patients can already use that data to find doctors for second opinions. Providers can use it to get patients’ records from everywhere the patient has been, connect their EMR data to data being created in the home like wearables, or get self-reported data directly from patients via surveys. Governments can use this data at scale for epidemiological surveillance or powering biobanks like All of Us. Patients can donate it to non-profits and research foundations.

Basically, Ciitizen is a platform that will power any use case where you need a patient’s full record but gives patients control of that data. Right now, many healthcare services we use de-identify our data and sell it out the backdoor without our knowledge or compensation. My data is valuable, I’m trying to get paid for selling my body (...or the data related to it).

Ciitizen’s goal is to put its “data refinery” anywhere and everywhere it can be useful with the dreams of becoming the deepest and most comprehensive data repository for FDA and clinical-grade healthcare data in the world. My dreams involve more centaurs, but to each their own.

Job Openings

Ciitizen is hiring for lots of roles as it plans to scale up.

Out-Of-Pocket Take

A big part of Ciitizen’s success rests on its technology. If it really continues to work this well as it scales with minimal humans that feels crazy impressive to me on its own. But Ciitizen wants to create the ultimate data marketplace for patients.

Remember how I talked about how building a personal health record was a fool’s errand? Well Ciitizen is basically printing out that post, shoving it in my face, and saying “you’re an idiot”.

The truth is I do think we’ll have personal health records at some point, it’s more about when and how. Ciitizen is betting that the time is now, and they’re betting very heavily that their data extraction technology is how they become the data platform for the world.

One thing I’m hopeful companies like Ciitizen can do is demonstrate that it’s possible to have a healthy exchange between data and services, using your data as a currency. Even outside of healthcare, current sentiment around data usage is that it’s mined from us and used in ways we don’t fully understand (yeah I’m sure every single person reads their terms & conditions). Ciitizen gives patients the ability to control where their data goes, get compensated for it, and also potentially get better care by giving their data for research, second opinions, etc.

I also wonder about which has more value: the technology to turn the raw documents into a structured format or the two-sided marketplace with lots of patients on it willing to exchange their data with different stakeholders. If the value is in the data marketplace, then Ciitizen has to have more control over the end-user relationship. This might make it difficult if it pursues both a direct-to-consumer strategy and one that lives behind the scenes of other consumer applications while powering their data ingestion/processing needs.

And finally, the core question is whether the average, healthy patient is ready for this or not. This is clearly a good value proposition to people with life altering diseases that are interested in giving their data to research or participating in clinical trials. It’s smart for Ciitizen to start here in “sick” care.

But previous iterations of health records have stumbled getting mass adoption for the core reason that most people just don’t really care about having their health record in one place. It’s possible that today is different because the new interoperability rules make it much easier for patients to get their data and authorize third-parties to use it. It’s also possible that more people than ever care about their personal health record to show their COVID vaccination status/recent test status. I mean even I logged into MyChart twice in the last few weeks to see my COVID test results after almost never logging in in my entire life. But I’m also REALLY bored.

So it’s a lot of bets, but if they all happen then we might see the first true consumer-directed patient platform. IMO one of the reasons we haven’t seen a new gigantic healthcare business is because true platforms with vibrant third-party application ecosystems don’t exist in healthcare. Maybe Ciitizen will be the first, and I’m rooting for them.

Thinkboi out,

Nikhil aka. “third i blind”

Twitter: @nikillinit

IG: @outofpockethealth
