Open Source Healthcare Categories I Like
Looking to hire the best talent in healthcare? Check out the OOP Talent Collective - where vetted candidates are looking for their next gig. Learn more here or check it out yourself.
Featured Jobs
Finance Associate - Spark Advisors
- Spark Advisors helps seniors enroll in Medicare and understand their benefits by monitoring coverage, figuring out the right benefits, and dealing with insurance issues. They're hiring a finance associate.
- firsthand is building technology and services to dramatically change the lives of those with serious mental illness who have fallen through the gaps in the safety net. They are hiring a data engineer to build first of its kind infrastructure to empower their peer-led care team.
- J2 Health brings together best in class data and purpose built software to enable healthcare organizations to optimize provider network performance. They're hiring a data scientist.
Looking for a job in health tech? Check out the other awesome healthcare jobs on the job board + give your preferences to get alerted to new postings.
This episode of Out-Of-Pocket is brought to you by…

MedOS is a next-generation AI-XR-Cobot medical system developed by the Stanford–Princeton AI Coscientist Team. Now featured at NVIDIA GTC 2026 and deployed at Stanford, it acts as a real-time clinical co-pilot — combining multi-agent AI, XR smart glasses, and intelligent robotics to assist doctors in live hospital workflows. With expanded medical coverage and near real-time response, MedOS brings AI from the lab into real-world medicine. Explore the future of clinical AI.
👉Learn more: https://ai4medos.com/
–
Open Source Healthcare On The Come Up
Last week we talked about why I think open source is going to have its moment in healthcare.
Below are some areas in open source healthcare I’m particularly interested in. This isn’t meant to be an exhaustive list of every open source project out there, just the areas that caught my eye.
Please don’t email me asking why your open source project isn’t there, the answer is because I’m not good at my job.

[I’m a non-technical person so give me some grace when explaining these projects. All companies with a * are ones that I’m an investor in]
Open source datasets
Similar to the light in my eyes, healthcare data is very inaccessible. It’s either expensive, siloed, extremely messy (and requires expertise or time to fix), or has insane legal gymnastics around how you use it.
Open datasets let people skip the data acquisition phase and jump straight to building. Not only does this speed up the time to test concepts, it also allows people to find anomalies or issues in the data and figure out if certain analyses are reproducible. I’ll frequently see companies that test their algorithms against these datasets to figure out a proof of concept, and then figure out if there’s something more commercial. But you need a sandbox to start testing.
A few examples:
Academic datasets - MIMIC is the OG of open healthcare datasets. It’s a massive database of de-identified ICU patient records (vitals, labs, medications, procedures, notes) from Beth Israel Deaconess. MIMIC is cool because there’s an entire ecosystem of shared code, derived datasets, and a community building around it.

You’re starting to see other centers doing similar things. Stanford's AIMI center has free repositories of AI-ready annotated medical imaging datasets. DeepLesion (32,000+ annotated lesions on CT images), gnomAD (700,000+ exome sequences for genomics research), and OpenNeuro (500+ neuroimaging datasets) are some other examples.
Synthetic data - It’s pretty hard to get patient data that looks realistic to what you might encounter in the real-world. Synthea is an open source synthetic dataset you can use to prototype with. There’s probably way more “realistic but fake” synthetic datasets that would help a lot of people test ideas out without needing to deal with HIPAA.
Pharma datasets - Some pharma companies are realizing that there’s value to putting their internal datasets out there and letting people build on it. Either to find talented computational biology people to recruit, or find interesting new things for their pipelines.
For example, Recursion has their RxRx datasets, where they open source smaller versions of their microscopy image datasets. Broad’s Cell Painting Gallery datasets are similar. DeepMind open sourced AlphaFold's code and released a database of predicted protein structures. The Open Targets Platform curates a lot of the publicly available datasets (e.g. genomics/transcriptomics) with pharma partners.
Government billing datasets - DOGE decided to do a Friday the 13th drop of Medicaid claims. This led to everything from really interesting writeups to analyses so off the mark they would kill a Chartis consultant on the spot.
But I do think we should open source more government datasets like this. It was very cool to see people building visualizers for the data, asking good public questions about why there are so many intermediaries in claims billing, and pointing out anomalies in the data.

Food testing data - People really care about how the stuff they consume is impacting their health. Meanwhile trust in regulating authorities to do that is at an all-time low. You’re starting to see people use various testing kits and contribute to a database. Nat Friedman's PlasticList is a great example. He spent $500K to test 300 Bay Area food items for 18 plastic chemicals, published all methodologies and results on plasticlist.org, and let people vote on what to test next. OpenLabel is building an open source database for toxin and nutritional data for consumer products (e.g. testing baby formula). At this point, my body is a temple for Exxon byproducts but you might be able to avoid that fate.
Open source data infrastructure and tooling
Healthcare is full of infrastructure problems. Then a poor ops or data person is tasked with building that infrastructure internally, meanwhile 100 other data people have built very similar infrastructure at their own companies. Everyone is duplicating work that isn’t even part of their core business - trust me bro you are not the first to “normalize EHR data to a proprietary ontology”.
Open source infrastructure means companies can stop reinventing the same plumbing and go back to their core business of being services companies that masquerade as tech companies. You see this especially with building data infrastructure from scratch.
There are lots of companies doing some cool work here. A big shoutout to Jennifer Jiang-Kells, who put together a list of healthcare open source projects and analyzed categories they fell into in this post.

Data ontologies - What if we created common languages and schemas so that every company didn't have to spend months translating the same messy healthcare data into something usable?
- The OMOP Common Data Model is the OG in the space, and standardizes the schemas for clinical and claims data so researchers can run the same analyses across completely different databases.
- On the drug data side, SageRx is an open source medication ontology that pulls from public drug data sources like RxNorm, the FDA NDC Directory, and DailyMed, and transforms them into clean, queryable tables.
- Tuva Health*'s core product is an open source data model that standardizes messy healthcare data into analytics-ready tables. For example, the demographic data for "Female" might be coded seventeen different ways. It’s not like each company has a unique way of mapping this, so might as well open source it and let people build on each other.
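
To make the "Female coded seventeen ways" point concrete, here's a minimal Python sketch of the kind of mapping these projects maintain so you don't have to. The raw codes and the `normalize_sex` and `to_omop_concept` helpers are made up for illustration; this is not Tuva's or OMOP's actual code. The two concept IDs are the standard OMOP gender concepts, and 0 is OMOP's "no matching concept" value.

```python
# Hypothetical raw values you might see across different source systems.
RAW_TO_STANDARD = {
    "f": "female",
    "female": "female",
    "fem": "female",
    "2": "female",   # assumption: some feeds code sex as 1/2
    "m": "male",
    "male": "male",
    "1": "male",
}

# OMOP standard gender concept IDs.
OMOP_GENDER_CONCEPT = {"female": 8532, "male": 8507}


def normalize_sex(raw):
    """Map a messy source value to a standard label, or "unknown"."""
    if raw is None:
        return "unknown"
    return RAW_TO_STANDARD.get(str(raw).strip().lower(), "unknown")


def to_omop_concept(raw):
    """Return the OMOP concept ID, or 0 when nothing maps."""
    return OMOP_GENDER_CONCEPT.get(normalize_sex(raw), 0)
```

The point of open sourcing this is that the boring dictionary at the top gets maintained once, by everyone, instead of being rewritten at every company.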

Rails for accessing external data - There’s a lot of data externally that you might want to bring into your org, but you probably don’t want to set up 1-by-1 integrations with every company that has access to that data. Several companies have open sourced the rails to access that data.
- Health Information Exchanges - Metriport built an open source on-ramp to pull data from health information exchanges that you can self-host. We did a free course walking through how this works, but you can look through their docs to see how they convert the different file formats into FHIR, medical code crosswalking, and a whole bunch of stuff your data team says when you’re wondering if they’re having a stroke.
- FHIR servers - A FHIR server sits between the systems that store data and says "okay, everyone agrees to store and share data in THIS format, and you can request it in THIS way." Before you can do anything with FHIR data, someone has to actually run a FHIR server. HAPI FHIR is the most widely used open source FHIR server, and it makes sure systems are talking the same language.
- Wearable data ingestion - More companies are trying to figure out how to include data coming from wearables into their medical workflows. The problem is that every wearable brand has its own API, its own data format, and its own authentication requirements. Open Wearables is an open source, self-hosted platform that unifies wearable device data through a single API.
- Claims data - Blue Button is an open source API built by CMS that lets Medicare beneficiaries authorize third-party apps to pull in their claims data. It uses the same kind of OAuth login flow you'd see with "Sign in with Google," and delivers the data in a FHIR format to use.
- MCP connectors - This seems to be an interesting new area. Model Context Protocol is a standard that essentially allows AI tools and agents to interact with applications. Isn’t it great to have yet another standard to learn in healthcare? I’m sure everyone will use it the same and correctly!
Any application with an MCP connector can allow agents a pathway to working in their system. This is particularly useful for getting data out of something and making it analyzable by AI. For example, this repo from Josh Mandel lets patients get their data out of a given EHR, store it in a lightweight database, and then allow AI agents to do some reasoning and analysis over that data.
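
For the FHIR bullets above, here's a rough Python sketch of what "request it in THIS way" and "THIS format" look like in practice: a search is a URL of the form `[base]/ResourceType?param=value`, and the answer comes back as a Bundle with the resources under `entry`. The base URL, the helper names, and the sample Bundle are all placeholders I made up; nothing here talks to a real server.

```python
from urllib.parse import urlencode

# Hypothetical base URL; swap in your own FHIR server.
FHIR_BASE = "https://fhir.example.org/baseR4"


def build_search_url(resource_type, **params):
    """Build a FHIR search URL like [base]/Patient?family=Smith."""
    url = f"{FHIR_BASE}/{resource_type}"
    query = urlencode(sorted(params.items()))
    return f"{url}?{query}" if query else url


def resources_from_bundle(bundle):
    """Pull the resources out of a FHIR searchset Bundle."""
    return [
        entry["resource"]
        for entry in bundle.get("entry", [])
        if "resource" in entry
    ]


# A hand-written, fake Bundle in the shape a server would return.
sample_bundle = {
    "resourceType": "Bundle",
    "type": "searchset",
    "entry": [
        {"resource": {"resourceType": "Patient", "id": "123", "gender": "female"}},
        {"resource": {"resourceType": "Patient", "id": "456", "gender": "male"}},
    ],
}

patient_ids = [r["id"] for r in resources_from_bundle(sample_bundle)]
```

Because every FHIR server agrees on this shape, the same two helpers would work whether the data came from a self-hosted server, an HIE on-ramp, or a claims API.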
EHRs - In healthcare, people have to choose between ripping out their EHR or ripping out their eyeballs. Making your own EHR can sound crazy, but for a long time the Veterans Affairs hospitals ran on an open source implementation of VistA. However, implementation can be a pain and requires a ton of technical people to get it up, running, and maintained.
Today there are new open source EHRs designed for companies that have engineering talent. Medplum is an open source headless EHR platform. You can self-host this, or pay them to use their cloud with other features out-of-the-box (integrations, HIPAA compliance, etc.). Ottehr has a similar approach, but with different feature sets available in the open source version vs. hosted versions. OpenCoreEMR and OpenEMR are other flavors with different tradeoffs.
Prepackaged scripts for data analytics - Another flavor of open source infrastructure is the "just give me a script that does the thing" category. People are basically writing common scripts that everyone needs to extract data, transform it in different ways, and run analyses on it. PhysioToolkit is an open source software library with tools for processing physiological signals. Mimilabs ingests thousands of public healthcare datasets from CMS, CDC, FDA, and other government sources, and makes them queryable. Their data engineering scripts are all open source. A lot of data consultants/founders/unemployed open source many of their own scripts to use on top of common datasets.
Clinical NLP pipelines - A huge amount of valuable healthcare data is locked in unstructured clinical notes. Several open source natural language tools have become essential plumbing for extracting meaning from them. cTAKES, medspaCy, and scispaCy are used for clinical and biomedical text processing. John Snow Labs uses the open-core move where the base NLP engine is open source, but the healthcare-specific models and pipelines are paid. The first hit is free.
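
To give a feel for why clinical notes need their own tooling, here's a toy Python sketch of one classic problem: "denies chest pain" mentions chest pain but means the opposite. This is a hand-rolled illustration using only the standard library, not the API of cTAKES, medspaCy, or scispaCy; the word lists and the `extract_findings` function are assumptions I made up for the example.

```python
import re

# Toy vocab; real pipelines use much larger curated lists.
TARGETS = ["chest pain", "diabetes", "fever"]
NEGATION_CUES = ["no", "denies", "without", "negative for"]


def extract_findings(note, window=3):
    """Find target terms and flag ones preceded by a negation cue.

    `window` is how many words before the term we scan for a cue.
    """
    text = note.lower()
    findings = []
    for target in TARGETS:
        pattern = r"\b" + re.escape(target) + r"\b"
        for match in re.finditer(pattern, text):
            # Look only at the few words right before the term.
            preceding = " ".join(text[:match.start()].split()[-window:])
            negated = any(
                re.search(r"\b" + re.escape(cue) + r"\b", preceding)
                for cue in NEGATION_CUES
            )
            findings.append({"term": target, "negated": negated})
    return findings
```

Running it on "Patient denies chest pain. History of diabetes." flags chest pain as negated and diabetes as not. It also ignores sentence boundaries, so it will get plenty of real notes wrong, which is exactly why people reach for the maintained libraries instead.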
Open Source AI Models and Algorithms
There seems to be a lot of interest in open source AI models in healthcare. My hunch is that there are a few reasons:
- Companies are tired of depending on platforms to build point solutions for them or managing across these point solutions. Using a fine-tuned/customized open source model allows them to build their own bespoke tools and workflows. This also avoids vendor lock-in.
- By understanding the weights and decisions of a model, it feels like there’s more auditability and understanding of how outputs came to be. Though let’s be honest, even knowing all of the innards can’t explain half of it.
- Data privacy becomes easier (and assuages internal concerns) if you’re not sending data out to third-parties and instead running it in your own secure cloud.
And we’re starting to see more developments on the open source foundation model side.
- Google's MedGemma is an open weight model. It processes both medical text and images (X-rays, pathology slides, dermatology photos, ophthalmology images). Though it's governed by a different license than traditional open source.
- Meditron is a juiced-up LLM built on Llama and pretrained on PubMed that shows you how they trained it and what kind of data it was trained on.
- MONAI is open source and focused on deep learning for imaging analysis.
- Sophont is a newer company building open source foundation models focused on medical use cases (e.g. a pathology model).
- BioMistral is built on Mistral 7B and fine-tuned specifically on PubMed Central for biomedical text tasks.

You can test some of these out yourself on your own computer. Just be prepared to download a FAT file and for your computer to run very hot. Dual purpose radiator in the winter time.
Open Source Disease Measurement
Healthcare relies heavily on measurement tools. Measuring diseases, measuring values of things, measuring egos and metaphorical phalluses, etc. Many of the scales that are currently in use today are very old, were validated in a different time, and frequently rely on the subjectivity of the person administering the test.
Could open source make existing measurements better and less static?
- Brooklyn Health's* OpenWillis is an open source Python library for digital phenotyping. It has software packages that can quantify things like facial expressivity, voice characteristics, and motor functioning as objective markers of mental health. This feeds into their commercial eCOA platform for pharma companies running trials that want to add these digital measurements into their trials.
- Kintsugi built an AI that could detect depression and anxiety from 20 seconds of voice. They recently announced they were shutting down, and open sourced their models and methodology.
- Pfizer's Scikit Digital Health is an open source Python package for processing wearable sensor data. If you strap an accelerometer on a patient in a clinical trial, the raw data you get back is basically meaningless noise until you process it. Scikit Digital Health has algorithms that turn that raw data into actual clinical metrics. Things like gait speed, physical activity levels, sleep patterns, sit-to-stand transitions, etc. which might be digital endpoints you’d want to capture during a trial.
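
As a rough illustration of "raw data into actual metrics," here's a small Python sketch of one common summary measure for accelerometer data, ENMO (Euclidean Norm Minus One): take the size of the acceleration vector, subtract the 1 g of gravity, and floor it at zero. This is not Scikit Digital Health's API; the function names and the sample readings are made up, and real pipelines also handle calibration, sampling rates, and non-wear time.

```python
import math


def enmo(sample):
    """Euclidean Norm Minus One for a single (x, y, z) sample in g."""
    x, y, z = sample
    return max(math.sqrt(x * x + y * y + z * z) - 1.0, 0.0)


def mean_enmo(samples):
    """Average ENMO across a window of samples."""
    if not samples:
        return 0.0
    return sum(enmo(s) for s in samples) / len(samples)


# Made-up readings: a device sitting still only feels gravity (1 g).
still = [(0.0, 0.0, 1.0)] * 4
moving = [(0.0, 0.0, 1.0), (0.0, 0.0, 2.0), (3.0, 4.0, 0.0), (0.0, 0.0, 0.5)]
```

A still device averages out to zero and a moving one doesn't, which is the first step toward things like activity levels. The harder, validated part is turning that into gait speed or sleep, and that's what the open source packages are for.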
The obvious question is what validation looks like for a new measure and who maintains that validation over time. A traditional scale gets validated once and sits in a PDF forever. An open source digital measurement tool needs ongoing maintenance. If nobody's maintaining the repo or things keep getting added, how do we know it’s still a good measure? Will people continue to use it if they aren’t sure it’s still good?
The other question is…when is pharma going to finally give a shit??? We’ve been talking about using wearables in trials since I was a wee lad, get on with it.

Open source hardware
As regular newsletter readers know, I’m betting hardware x healthcare is in the takeoff stage. That’s why we’re doing this hardware hackathon IN APRIL!!! If I’m wrong I’ll catch you all in the permanent underclass.
One development I’ve been tracking is the open source movement in hardware. Three areas in particular:
Open source software that makes medical devices do more things - Some patients want to use software to modify medical devices for a purpose they weren’t intended for. For example, Loop and OpenAPS have open source software packages that connect continuous glucose monitors + insulin delivery pumps to create an artificial pancreas. Patients are taking on the risk themselves here.
As more devices become tech-enabled in some way, I assume you’ll see more of this kind of jailbreaking. But I want to highlight again, if things go wrong the patient is at risk.
Open source prosthetics - I was at an OSHWA conference recently (very cool org) and there were quite a few presentations on open source prosthetics. This included everything from the CAD files to 3D print them to the software needed to operate them. Component pieces like Raspberry Pis give them more functionality. People wanted to make add-ons and improvements to existing equipment they use.
Many presenters talked about how the durable medical equipment they’d get in their country wasn’t great, so they’d create adjustments. Or people who wanted to make add-ons that were quality of life improvements but not things insurance deemed as necessary enough to pay more for. Or they were in low-resource areas where they needed help creating something when it didn’t exist at all.
For example, here are files for 3D printed add-ons for crutches. E-Nable is a network of people with 3D printers who download CAD files for 3D printed prosthetic hands and give them to people who need them. There’s Open Source Leg, which is self explanatory unless you need an Open Source Brain. OpenBionics has a prosthetic hand with the software + hardware you’ll need to do it.
If you can’t pay an arm and a leg, well…now there’s 3D printing 🙃.

Conclusion and parting thoughts
A few assorted thoughts that didn’t fit neatly above or need their own section.
- I purposefully did not include open source AI evals or benchmarks in this post. This is an area I admittedly don’t know a ton about, but most people I talk to seem skeptical of this entire category because the evals keep changing or aren’t representative of real-world workflows. If you disagree I’d love to hear.
- A few people told me that they’ve built their own custom agents to integrate with different systems that specifically DO NOT want to be integrated with. For example, agents that mimic a human and force an integration with an EHR. They want to open source these, but there’s also a lot of open lawsuits around this that they’re waiting to resolve.
- Turquoise’s* PATIENTS Framework is open source payment rails. They basically consolidate all the services, materials, and fees for a healthcare procedure into a single easy to understand code. The idea is to simplify the payment rails with prenegotiated rates and a coding system to match it instead of using many CPT codes, extra fees, and X12 messages through clearinghouses. Idk didn’t fit nicely but the idea is cool!
- IMO the government angle to this is particularly interesting. We talked about open government datasets, but I think there’s more opportunity to continue the ethos of the Blue Button initiative and create more open source projects for areas like data transformation, access, etc.
- Whoever makes an open source provider directory that’s halfway decent will receive the blessing of the healthcare gods and be granted entrance to Valhalla. And we all need to contribute back to it, or I’ll f*** you up.
If you have your own projects or thoughts you think I should take a look at, let me know. In part 3 we will discuss the current issues with open source in healthcare and how we might try to fix them.
Thinkboi out,
Nikhil aka. “All your code are belong to us” aka. “hopin’ source”
Thanks to Colin Durant, Uzair Khan, and Juhan Sonin for reading drafts of this
Twitter: @nikillinit
IG: @outofpockethealth
Other posts: outofpocket.health/posts
If you’re enjoying the newsletter, do me a solid and shoot this over to a friend or healthcare slack channel and tell them to sign up. The line between unemployment and founder of a startup is traction and whether your parents believe you have a job.
Quick interlude - course ends soon! Happy hour!
Healthcare 101 course signups END NEXT WEEK!!! I’ll teach you everything you need to know about how US healthcare works. And an added bonus for this round only is we’ll teach you some basics of how to use Claude for healthcare stuff. Learn more and sign up here.

We’re hosting a happy hour/RCM trivia night with Nirvana and Joyful Health on 3/26 in NY. You should come if you:
- Are involved in revenue cycle at all at your current company
- Are senior at your company (everyone's title is made up, so whatever your equivalent of Director and up is)
- Will laugh if I come up to you and say “haven’t I seen UB-04?”
More details here - we have limited space so sign up sooner than later


