To me, data science is the combination of better decision-making with technology that enables us to actually execute those better decisions in real time. Understanding the science behind better decision-making is not sufficient on its own. Technology is required to gather the relevant information, present it to people, and enable them to act upon it in real time. I learned this first-hand five years ago when I was working at the global health NGO Partners In Health (PIH).
In rural Lesotho, PIH developed an app for feature phones that enabled community health workers to verify, via SMS, whether patients had actually taken their antiretroviral (ARV) medications. The text messages were uploaded directly into PIH’s electronic medical record (EMR) system, which allowed the medical staff to monitor ARV adherence rates in their catchment areas in real time. By using a simple feature phone app, PIH leapfrogged over the serious infrastructure challenges present in Lesotho, a landlocked country with a lot of mountains and very few roads.
Seeing PIH pioneer this solution inspired me to take a more technical turn in my career. When it came time to do the final project for my master’s degree, I reached out to PIH to see if we could find an opportunity to put data science to work for them. We identified the following problem:
When patients arrive at PIH clinics, someone has to document their visit in the EMR. This person may not have much medical expertise, and they might not be very familiar with the ins and outs of the EMR system. Still, the way the system is set up, that person has to record a presumed diagnosis. To deal with the problem currently, PIH allows entry technicians to enter whatever they want into the system as a presumed diagnosis. If the system doesn’t recognize the user input, the input is stored as a placeholder value that someone has to go back and manually code at some point in the future. That manual coding must be done by someone with greater medical and system knowledge, perhaps even a doctor. Looking at the entire process, we concluded that these more highly trained workers would be more effective seeing patients than dealing with data entry.
To fix the problem, we built Diagknowzit, a recommendation engine that sits on top of Open-MRS, the open source EMR used by PIH. Diagknowzit works similarly to Google’s “Did you mean…?” feature. The difference is that where “Did you mean…?” gets the user to spell things correctly, Diagknowzit matches the medical condition the user meant to the EMR system’s official representation of that condition.
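To make the “Did you mean…?” analogy concrete, here is a minimal sketch of the idea using plain string similarity from Python’s standard library. This is not how Diagknowzit itself works (our engine used a trained model); the `CONCEPTS` list and the `suggest` helper are made up purely for illustration.

```python
import difflib

# Hypothetical concept names, made up for illustration. A real system
# would read these from the EMR's concept dictionary.
CONCEPTS = ["Malaria", "Tuberculosis", "Hypertension", "Measles"]


def suggest(entry, concepts=CONCEPTS, limit=3, cutoff=0.6):
    """Return up to `limit` concept names that look similar to `entry`."""
    # Compare in lowercase, but hand back the official spelling.
    lookup = {name.lower(): name for name in concepts}
    hits = difflib.get_close_matches(
        entry.lower().strip(), list(lookup), n=limit, cutoff=cutoff
    )
    return [lookup[hit] for hit in hits]


print(suggest("malria"))
```

String similarity like this only catches typos. It has no way of knowing that an abbreviation or a local-language term refers to the same condition, which is where learning from previously coded entries comes in.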
The challenges in building Diagknowzit could fill many blog posts, but here I’ll only mention a few. For one thing, the core codebase of Open-MRS (built with Java and Spring/Hibernate) dates from the mid-2000s and is a bit dated at this point. My partners and I had to spend a significant amount of time porting our knowledge of more Pythonic frameworks back in time to be able to build with the requisite toolset.
Perhaps the biggest challenge we faced was that we only received the data we needed to train our recommendation engine two weeks before the final project was due. Sharing data across organizations is never easy, and when that data contains sensitive medical information, the risks are particularly high. It took a long time for our request to filter through all the necessary layers of oversight at PIH, but we’re very happy and grateful that it worked out in the end.
Because of the compressed timeline, we didn’t have quite as much time to explore and experiment with the data as I would have liked. Still, even with a relatively simple machine learning model, we were able to achieve pretty decent performance. Using multinomial logistic regression, our engine guessed correctly 71% of the time, which is in the ballpark of similar projects we found during our literature review.
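For readers who haven’t met multinomial logistic regression before, here is a rough, pure-Python sketch of the technique applied to free-text entries. It is not our project code: the character-trigram features, the tiny training set, and every name in it are assumptions made up for illustration, and it says nothing about the 71% figure above.

```python
import math
from collections import defaultdict


def features(text, n=3):
    """Count character n-grams in an entry (an assumed featurization)."""
    padded = f" {text.lower().strip()} "
    counts = defaultdict(float)
    for i in range(len(padded) - n + 1):
        counts[padded[i:i + n]] += 1.0
    return counts


def softmax(scores):
    """Turn per-label scores into probabilities that sum to 1."""
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {label: math.exp(s - peak) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


class SoftmaxClassifier:
    """Multinomial logistic regression trained by stochastic gradient ascent."""

    def __init__(self, labels):
        self.labels = sorted(set(labels))
        self.weights = {label: defaultdict(float) for label in self.labels}

    def predict_proba(self, text):
        feats = features(text)
        scores = {
            label: sum(self.weights[label].get(g, 0.0) * v for g, v in feats.items())
            for label in self.labels
        }
        return softmax(scores)

    def predict(self, text):
        probs = self.predict_proba(text)
        return max(probs, key=probs.get)

    def fit(self, texts, labels, epochs=50, lr=0.1):
        for _ in range(epochs):
            for text, gold in zip(texts, labels):
                feats = features(text)
                probs = self.predict_proba(text)
                for label in self.labels:
                    # Gradient of the log-likelihood: (indicator - probability).
                    error = (1.0 if label == gold else 0.0) - probs[label]
                    for g, v in feats.items():
                        self.weights[label][g] += lr * error * v
        return self


def accuracy(model, texts, labels):
    """Fraction of entries whose top guess matches the coded diagnosis."""
    correct = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return correct / len(texts)


# Made-up entries and labels, for illustration only.
ENTRIES = ["malaria", "malria", "paludisme", "tb", "tuberculosis",
           "pulm tb", "htn", "hypertension", "high blood pressure"]
CODED = ["Malaria"] * 3 + ["Tuberculosis"] * 3 + ["Hypertension"] * 3

model = SoftmaxClassifier(CODED).fit(ENTRIES, CODED)
print(model.predict("malara"))
```

Unlike plain string matching, a model like this can learn that “paludisme” and “malaria” belong to the same concept, as long as it has seen coded examples of both. The accuracy on a toy set like this one means nothing, of course; an honest number has to be measured on held-out data.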
Ultimately, our goal in doing this project was more than just to build something useful for PIH. I mentioned before how PIH’s EMR system, Open-MRS, is open source; it was actually developed by PIH and others as a response to the growing need for global health NGOs to manage their information effectively. The Open-MRS development community is thriving, but little attention has been paid thus far to the potential of data science-based tools to improve clinical decision support. We hope that our promising results inspire more development in this direction. In fact, if you are a member of the Open-MRS community and are interested in Diagknowzit or more tools like it, please don’t hesitate to get in touch 🙂
Sounds wonderful. Would like to learn more. Did you use the CIEL dictionary?