Machine learning (ML) models can be used to make predictions on unseen datasets based on patterns they have learned from data they have seen before. These models are increasingly being used to facilitate human decision making, allocate resources, and provide recommendations for a variety of programs and services. As a result, it’s particularly important to ensure that ML algorithms are based on unbiased inputs.
While a biased decision by an individual can result in negative outcomes for that person's family and community, a biased ML algorithm can produce results that harm much larger groups of people. In addition, ML algorithms are deployed across so many sectors (for example, health care, transportation, and education) that determining how many people they affect is challenging. If algorithmic bias is not considered in policy development, these ML models could harm the very people the policies and programs are designed to help.
Building fair machine learning models is critical to equitable policy research. At Mathematica, we have identified and implemented several approaches for building fair ML models. Along with our partners and clients, we’ve learned that creating equitable models requires being proactive about how you (1) select and prepare data, (2) choose models, (3) build in transparency and explainability, and (4) monitor the model results and their application over time.
Select the right data and prepare it properly. Training ML models on collected or synthetically generated data that reflect historical biases can perpetuate systemic inequities; unless steps are taken to adjust for those biases, the resulting model will reproduce them in its predictions. Furthermore, if a model is trained on data that do not fully represent a population, it may produce inaccurate predictions for the groups that are underrepresented.
In the CMS AI Challenge, Mathematica used a sample of Medicare claims data to train ML models that predict unplanned hospital admissions and mortality. As we explored and analyzed the data, we noticed an imbalance when comparing the claims sample to the overall population of Medicare fee-for-service (FFS) beneficiaries by age, gender, and race: the Hispanic population was underrepresented, and the White population was overrepresented. To account for this imbalance, Mathematica weighted observations in the sample so that it better represented the Medicare FFS population. Had we not addressed the imbalance early on, the model could have produced biased predictions that negatively affected how resources were allocated to a population. Perfectly balanced data are difficult to obtain, but addressing imbalances helps produce fairer predictions.
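Below is a minimal sketch of this kind of post-stratification reweighting, assuming hypothetical demographic categories, placeholder population shares, and made-up sample data rather than the actual CMS claims or Mathematica's production code:

```python
import pandas as pd

# Illustrative population shares by race/ethnicity.
# These numbers are placeholders, not actual Medicare FFS figures.
population_shares = {"White": 0.75, "Black": 0.10, "Hispanic": 0.09, "Other": 0.06}

# Hypothetical claims sample with a 'race_ethnicity' column.
sample = pd.DataFrame({
    "race_ethnicity": ["White"] * 85 + ["Black"] * 8 + ["Hispanic"] * 3 + ["Other"] * 4,
    "unplanned_admission": [0, 1] * 50,
})

# Compare the sample distribution to the population distribution.
sample_shares = sample["race_ethnicity"].value_counts(normalize=True)

# Post-stratification weight: population share divided by sample share,
# so underrepresented groups (here, Hispanic beneficiaries) count for more.
weights = sample["race_ethnicity"].map(
    lambda g: population_shares[g] / sample_shares[g]
)

# The weights can then be passed to most estimators, for example:
# model.fit(X, y, sample_weight=weights)
```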
Choose the right model to help address bias. Typically, model choices are made based on the nature of the data and the type of problem to be solved. Simulation models can capture individual interactions and adjust for inequities at the individual level. We used an agent-based model (ABM), a type of simulation, to evaluate the cost-effectiveness and health equity impact of the Ryan White HIV/AIDS Program. Well-specified ABMs are a good choice for understanding equity-related issues because they can faithfully represent the complex socioecological structures that drive inequities. Further, ABMs can readily assess impacts on different groups and provide insight into differential effects. The complexity of human behavior demands a model that captures interactions between individuals and groups and can represent bidirectional causality.
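As a toy illustration of how an ABM can surface group-level differences (this is not the Ryan White model, and every parameter below is made up for illustration), the sketch simulates agents from two hypothetical groups whose chance of entering care depends on group-level access to services:

```python
import random

random.seed(0)

class Agent:
    def __init__(self, group):
        self.group = group
        self.in_care = False

# Hypothetical per-step probabilities of entering care, differing by group
# to reflect unequal access; these values are illustrative only.
ACCESS = {"A": 0.10, "B": 0.05}

def step(agents, intervention=False):
    for agent in agents:
        p = ACCESS[agent.group]
        if intervention and agent.group == "B":
            p *= 1.5  # hypothetical program targeting the underserved group
        if not agent.in_care and random.random() < p:
            agent.in_care = True

def run(intervention, n_steps=24):
    agents = [Agent("A") for _ in range(500)] + [Agent("B") for _ in range(500)]
    for _ in range(n_steps):
        step(agents, intervention)
    # Report the share of each group that ended up in care.
    return {
        g: sum(a.in_care for a in agents if a.group == g) / 500
        for g in ("A", "B")
    }

print("baseline:    ", run(intervention=False))
print("intervention:", run(intervention=True))
```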
Be transparent about the model development process and ensure that predictions from ML models are explainable. Even when model teams address sources of bias during development, models may still produce biased results because of the way they are implemented and used. To help prevent this, developers must be transparent about the model development process, ensure that users can understand the variables that influenced a prediction, and clarify how the prediction should be used. Transparency and explainability benefit not only the people applying the model but also the people affected by the decisions it informs. For example, SHAP (SHapley Additive exPlanations), an explainability method, quantifies the impact that each feature (for example, race, age, or sex) has on a model's prediction. In addition, interactive dashboards that provide users with information about the data and methodology used to produce a prediction, like Mathematica's 19 and Me, increase transparency and build trust with users.
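A minimal sketch of how SHAP values might be computed for a tree-based classifier, using the shap library with hypothetical features and simulated data (the column names and model are placeholders, not those of any Mathematica model):

```python
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import GradientBoostingClassifier

# Hypothetical feature matrix; column names are placeholders.
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "age": rng.integers(65, 95, size=500),
    "prior_admissions": rng.poisson(1.0, size=500),
    "chronic_conditions": rng.integers(0, 8, size=500),
})
y = (X["prior_admissions"] + rng.normal(size=500) > 1.5).astype(int)

model = GradientBoostingClassifier().fit(X, y)

# TreeExplainer computes SHAP values for tree-based models: one value per
# feature per prediction, showing how much each feature pushed the
# prediction up or down relative to the baseline.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# The summary plot ranks features by their overall contribution across the sample.
shap.summary_plot(shap_values, X)
```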
Adopt monitoring tools to ensure bias doesn't creep in over time. Monitoring enables us to track the performance of an ML model over time, shining a light on potential bias due to model degradation. Performance can suffer when models are not retrained or updated after changes to the input data or to relevant policies. For example, if a new health care policy affecting patients' readmission risk is enacted, a new feature may need to be added to, or an old one removed from, the model's input data. If the model is not retrained or updated, its performance could decline because it cannot account for the new policy. One Mathematica project that incorporates monitoring aims to replace some manual coding of survey responses with autocoding, using both rule-based and machine learning approaches. For this task, new data are obtained each year, and the manual coding guidelines are subject to change. To account for model drift, the machine learning module of the autocoding strategy is tested on each new wave of data and may be retrained or adjusted if its accuracy has decreased significantly since the previous year. This example shows how monitoring gives us more confidence in the ongoing performance and longevity of our algorithms.
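A minimal sketch of that kind of check, assuming a hypothetical accuracy threshold and a model and data supplied by the project's own pipeline:

```python
from sklearn.metrics import accuracy_score

# Hypothetical tolerance: flag the model for retraining if accuracy on the
# new wave of labeled data drops more than 5 points below last year's.
ACCURACY_DROP_THRESHOLD = 0.05

def check_for_drift(model, X_new, y_new, baseline_accuracy):
    """Score the existing model on newly labeled data and decide whether
    it should be retrained before the next production run."""
    current_accuracy = accuracy_score(y_new, model.predict(X_new))
    needs_retraining = (baseline_accuracy - current_accuracy) > ACCURACY_DROP_THRESHOLD
    return current_accuracy, needs_retraining

# Example usage (model, X_new, y_new, and last year's accuracy come from
# the project's own pipeline):
# acc, retrain = check_for_drift(model, X_new, y_new, baseline_accuracy=0.91)
# if retrain:
#     model.fit(X_train_updated, y_train_updated)
```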
There is no overnight solution to incorporating equity in data analytics. Rather, building equitable and fair ML models is a continuous process. Given the potential for these models to negatively affect outcomes, particularly for large, underserved populations, model developers and users should evaluate and refine the models' predictive performance both before they are implemented and while they are in use. As Mathematica continues to improve our processes for developing fair and equitable models, we are committed to sharing our approaches and to learning from others to ensure that we develop and deploy ML models in socially responsible ways.