Since 2007, the Social Security Administration has collected information on medication use among applicants for Social Security Disability Insurance (SSDI), but the size and unstructured nature of the data posed problems for analyzing trends. Recently, Mathematica used supervised machine learning to do what would have been too costly and too time intensive for human coders: comb through medication data reported by nearly six million SSDI applicants to estimate the prevalence of opioid use in the population.
On this episode of On the Evidence, we discuss the project with April Yanyuan Wu, a researcher at Mathematica. This episode is part of a series produced by Mathematica in support of a fall research conference hosted by the Association for Public Policy Analysis & Management in November. Wu presented her findings at the conference.
Click here to listen to the full interview with Wu. You can also read edited excerpts of the conversation in the following transcript.
Why are policymakers interested in opioid use among SSDI applicants?
Several factors make us believe that the use of opioids may be common among this population. First, opioid use is prevalent in the United States, and since the 1990s, there has been a marked increase in opioid use, opioid addiction, and opioid overdose deaths. Second, there is a suspected link between opioid use and declining work, [which might affect applications for a safety net program like SSDI that serves people who are currently unable to work because of a medical condition]. And third, a large share of new SSDI awardees have conditions associated with opioid use.
What are some of those conditions?
For example, a musculoskeletal condition [such as osteoarthritis or pain in the neck or back]. Also, recent research shows that the wider availability of opioids is associated with increasing the application rate for the SSDI program. Although applicants cannot qualify for the disability insurance program solely based on a drug addiction, using opioids may exacerbate the effects of other conditions that meet the SSDI qualifications. Therefore, understanding opioid use among Social Security Disability Insurance applicants is important for the program. It can help the Social Security Administration understand the health care needs and use among this group and also identify ways to help them.
Okay, so that’s why policymakers would be interested in this topic and this population. What about you? Why are you personally interested in this research topic?
Not only because this topic is, in itself, exciting, but also because there is a big knowledge gap. Since 2007, SSDI applications have been collected and stored electronically. At the time of application, applicants are asked, what kind of medication are you taking, and can you report the medication name? Applicants are provided a drop-down list with 630 prefilled drug names and asked to choose. If you don’t choose from the prefilled list, then you enter the drug name yourself. We call the manually entered responses unstructured data, and that data is normally very difficult to use.
When we look at the unstructured data, there are a lot of misspellings. Even for hydrocodone, I see over 1,000 variations in how people report it. There are also lengthy sentences, such as, “I don’t even know the drug name because Michael did not tell me,” or “I don’t have my drug bottle with me.” That type of information is very difficult to handle. Previously, we would have had to train human coders. We would have given them a manual on how to categorize medication information as opioids or non-opioids, and it would have been very labor intensive, time consuming, prone to error, and also cost-prohibitive given how big the program is. It’s almost impossible to get information just by manual coding. This is why there’s a huge knowledge gap. When we became interested in understanding this population, we found that there were no existing statistics for them, and we started to research what we could do with this data.
I want to talk in a moment about the novel approach that you took to estimate opioid use among SSDI applicants, but I don’t want to bury the lede too much. Do you mind sharing some of your main findings from the research?
Of course. We found that opioid use is common among SSDI applicants. From 2007 to 2017, over 30 percent of those applicants reported using opioids. This is high compared with the general population. In 2016, about 19 percent of the U.S. population filled opioid prescriptions. We also found substantial variation by subgroup. For example, women are more likely to report opioid use than men. Those age 40 to 49 are the most likely age group to report opioid use. Those with some college education are the most likely education group to report opioid use.
One point of clarification: We’re saying opioid use. Do we have information about what SSDI applicants do with the opioids after they’re prescribed?
We don’t know that information. What we know is the reported medication they are currently using, so it’s really use of opioids [reported by the applicant] at the time of application.
What’s one major policy implication of the findings?
We think the SSDI application provides a potential opportunity for us to identify opioid users and to connect them to support services. During our analysis period, we estimate from our data that there are nearly six million opioid users applying to the program. We think we can use the application as a contact point for interventions that might be helpful to curtail opioid addiction or opioid overdoses.
How would that work? Would the Social Security Administration say, “You meet a certain profile for being at higher risk of opioid misuse?”
We don’t really know the dosage information. We only know if you use opioids or not. We think that if we identify the user, the Social Security Administration can potentially connect [opioid users] to treatment or another type of help, such as an education program. There are people who definitely need to use opioids for pain management but could still benefit from education on how not to overuse or misuse them and risk an overdose.
What was different about your methodological approach, and what did you learn about the benefits and limitations of that approach?
For this project, we used the medication information from the Social Security Administration. That information, as I mentioned earlier, is collected at the time of application and is stored in the Structured Data Repository of the Social Security Administration. We used a random sample of 30 percent of applicants from 2007 to 2017. In our data, we had about six million applicants. On average, each applicant in our sample reported close to five medications they were using at the time of application. Then we developed a machine-learning approach to categorize the unstructured medication data into opioids versus non-opioids, and we produced statistics based on what we found.
In terms of the benefits of this approach, the algorithm is much more time efficient. Depending on how big the data set is, we can handle millions of observations in days, which is quite quick compared to human coders. It’s also a very cost-efficient approach. The natural language processing literature indicates that the cost reduction is over 65 percent compared to a manual coding approach, and when your data set is big, [saving 65 percent of costs] could be huge. Our algorithm was also 99 percent accurate, which means it reduces the probability of a human making a mistake. (In the industry standard for human coders, 70 percent to 80 percent accuracy is accepted as normal.)
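Editor's illustration: The interview does not specify which model or features Mathematica used, so the following is only a rough sketch of how supervised learning can sort free-text medication entries into opioids and non-opioids. It assumes a small set of hand-labeled examples and uses a character-trigram naive Bayes classifier, a technique chosen here because overlapping letter sequences tend to survive misspellings. The class name, function names, labels, and training entries are all hypothetical.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return overlapping character n-grams of a lowercased, space-padded string."""
    padded = f" {text.lower().strip()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramNaiveBayes:
    """Toy multinomial naive Bayes over character trigrams (illustration only)."""

    def fit(self, texts, labels):
        self.gram_counts = {}          # label -> Counter of trigram counts
        self.doc_counts = Counter(labels)
        self.vocab = set()
        for text, label in zip(texts, labels):
            grams = char_ngrams(text)
            self.gram_counts.setdefault(label, Counter()).update(grams)
            self.vocab.update(grams)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.gram_counts.items():
            # Log prior plus log likelihood with add-one (Laplace) smoothing.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for gram in char_ngrams(text):
                score += math.log((counts[gram] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical hand-labeled training entries, not real SSA data.
train_texts = [
    "hydrocodone", "oxycodone", "morphine",
    "tramadol", "hydrocodon", "oxycontin",
    "lisinopril", "metformin", "ibuprofen",
    "atorvastatin", "omeprazole", "i dont have my drug bottle with me",
]
train_labels = ["opioid"] * 6 + ["non_opioid"] * 6

model = NgramNaiveBayes().fit(train_texts, train_labels)

# Misspelled entries the model has never seen.
print(model.predict("hydrocodne"))
print(model.predict("metformn"))
```

A real application would need a far larger labeled sample, a held-out test set to measure accuracy, and handling for entries that name several drugs at once.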
And at least in this case, you’re providing insights from data that the Social Security Administration couldn’t use before.
Yes, they did not have the capacity to use it because of the nature of the data, but they were interested in the research questions [about opioid use]. So we were able to use our approach to help them use the data they have and understand their applicants better.
What questions do you hope future studies explore in this area?
The findings from this project only represent the tip of the iceberg of knowledge that we can obtain by using this methodology to mine rich data from the Social Security Administration. The success of this methodology makes us think that we can apply it to other unstructured medication data stored in the same data set. For example, Social Security Disability Insurance applicants also report using medications for mental health. We could help the Social Security Administration identify the mental health medication use among this group.
Based on what we found with the high prevalence of opioid use among SSDI applicants, we also wanted to further understand the association between opioid use at the time of application and the application decision. In other words, given your use of opioids at the time of application, were you more likely to get on Social Security Disability Insurance, or were you more likely to get rejected? And how are these related to your later employment outcomes? Because there has been some debate about the effects of opioid use; when you take opioids, it’s managing the pain, so people can stay at work longer, but it might also make your condition worse. You also might not be able to come back [to work] if you become addicted.
So we would like to see, of those who report using opioids at the time of application, do they come back to work? Luckily, the Social Security Administration has continued to fund our project, and we have the opportunity to keep exploring these exciting questions.