Time Series Forecasting with Kinsa Health and Expero

Using recurrent neural networks, we predict the spread of illness in the United States.


Graham Ganssle, Ph.D., P.G.

February 4, 2019

By Sarah Pilewski: quantitative analytics cat herder and Graham Ganssle: time-series machine learning nerd

How widespread is the flu right now? In my state, are people generally getting more or less sick this week? What about in my county? What about next month?

These are all questions the CDC has been attempting to answer for twenty years, and they’re doing a pretty good job at it. They regularly poll hospitals for data to answer these questions, and the results are reasonably accurate, though they lag by a month or so. But if you’re trying to decide when to get your kid a flu shot, you don’t want to know what happened a month ago; you want to know what’s happening now, and what’s likely to happen in the next days, weeks, and months.

Figure 1: The Kinsa Health web app. Note that the illness data varies nonlinearly both spatially and temporally.

That’s where Kinsa Health comes in. They aggregate illness data from all over the country in real time to provide accurate measurements of illness levels, from the national level all the way down to the county level. Log into their app and you’ll see up-to-the-minute population illness percentages. Want to see the future? Now Kinsa does that, too. Kinsa worked with Expero to build an illness forecast so their customers can see how the flu is going to spread up to a year into the future.

Seems Simple Enough, Right?

Well, it’s not. Expero’s data science team has worked on many time series forecasting problems, and the spread of illness is a doozy. In fact, it’s a doozy in both the temporal and spatial domains!

Illness spread is a highly underdetermined, ill-constrained spatiotemporal system characterized by nonlinear propagation pathways. Modeling this system is tough. Really tough. Like, the CDC has a contest to do this, tough. Like, Google tried to do this in 2016 and failed, tough. Now it’s 2019, and working together with Expero, the experts at Kinsa were able to crack this nut.

Figure 2: The Kinsa app, a digital doctor in your pocket, gives you advice in real time about what to do next to get better faster or stay healthy longer.

For decades, people have tried applying techniques like seasonal decomposition and ARIMA to build models of disease spread. If your desired level of accuracy is low enough, these models perform satisfactorily and have the added benefit of being simple enough that most analysts know how to build and maintain them. If, however, you require a more accurate model that adapts to perturbations in your data patterns (read: illness spreads unexpectedly), you need machine learning.

Deep learning, in particular, does an excellent job of adapting to nonlinearities and inconsistencies in real-world data. By using a geospatially linked mesh of recurrent neural networks, the Kinsa and Expero data science teams were able to build a model that forecasts the spread of illness in the United States to a surprisingly high level of accuracy. How do we know the system is accurate? We rolled back our input data ten times, once for every year in the 2008-2018 range, and built forward-looking illness forecasts. Since we had the data about what actually occurred in those flu seasons, we were able to calculate how far each forecast deviated from the actual season.

Figure 3: A validation run of the Kinsa long-term forecast. Each panel represents one training-plus-validation run of the model; we simply chop off the data after year n-1, train, and forecast for year n. The black signal is the true illness percentage, and the colored signals are multiple forecasts run with different start dates. The closer the colored lines are to the black line, the more accurate the forecasts.
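The chop, train, and forecast loop described in the caption can be sketched roughly as follows. This is an illustration under stated assumptions, not the production pipeline: `repeat_last_season` is a hypothetical stand-in for the real recurrent model, mean absolute error is our assumed error measure, the data is synthetic, and the sketch runs one forecast per held-out year rather than several with different start dates.

```python
from typing import Callable, Dict

import numpy as np

# A forecaster takes the training series and a horizon, and returns predictions.
Forecaster = Callable[[np.ndarray, int], np.ndarray]


def repeat_last_season(train: np.ndarray, horizon: int) -> np.ndarray:
    """Placeholder model (assumption): next season repeats the last observed one."""
    return train[-horizon:].copy()


def rolling_origin_backtest(
    series: np.ndarray,
    period: int,
    first_test_season: int,
    forecaster: Forecaster,
) -> Dict[int, float]:
    """Hold out each season n in turn and return its mean absolute error."""
    n_seasons = len(series) // period
    errors: Dict[int, float] = {}
    for n in range(first_test_season, n_seasons):
        train = series[: n * period]                    # data through season n-1
        actual = series[n * period:(n + 1) * period]    # held-out season n
        predicted = forecaster(train, period)
        errors[n] = float(np.mean(np.abs(predicted - actual)))
    return errors


# Synthetic data (hypothetical): four 4-week seasons, the last one point higher.
series = np.array([1.0, 3.0, 5.0, 2.0] * 3 + [2.0, 4.0, 6.0, 3.0])
errors = rolling_origin_backtest(
    series, period=4, first_test_season=1, forecaster=repeat_last_season
)
```

Because the forecaster is passed in as a function, the same harness can score any model against the same held-out seasons, which keeps comparisons between a simple baseline and a neural network fair.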

Now individuals have the ability to see when illness is on the rise in their area, and how fast it’s likely to increase or decrease. Schools and community centers can see when illness will peak, to determine when it’s time to start putting out the hand sanitizer! Moreover, clinics have the ability to look ahead to plan how much vaccine to purchase, enabling them to never run short. So whether you’re buying tissues, disinfectants, or vaccines, Kinsa’s got you covered with their illness forecast.

Illness forecasts are currently in beta, so get yourself a Kinsa profile and smart thermometer today!

Have questions about this blog? Email us at info@experoinc.com.
