Science-Watching: Forecasting New Diseases in Low-Data Settings Using Transfer Learning

[from London Mathematical Laboratory]

by Kirstin Roster, Colm Connaughton & Francisco A. Rodrigues


Recent infectious disease outbreaks, such as the COVID-19 pandemic and the Zika epidemic in Brazil, have demonstrated both the importance and difficulty of accurately forecasting novel infectious diseases. When new diseases first emerge, we have little knowledge of the transmission process, the level and duration of immunity to reinfection, or other parameters required to build realistic epidemiological models. Time series forecasts and machine learning, while less reliant on assumptions about the disease, require large amounts of data that are also not available in early stages of an outbreak. In this study, we examine how knowledge of related diseases can help make predictions of new diseases in data-scarce environments using transfer learning. We implement both an empirical and a synthetic approach. Using data from Brazil, we compare how well different machine learning models transfer knowledge between two different dataset pairs: case counts of (i) dengue and Zika, and (ii) influenza and COVID-19. In the synthetic analysis, we generate data with an SIR model using different transmission and recovery rates, and then compare the effectiveness of different transfer learning methods. We find that transfer learning offers the potential to improve predictions, even beyond a model based on data from the target disease, though the appropriate source disease must be chosen carefully. While imperfect, these models offer an additional input for decision makers for pandemic response.


Epidemic models can be divided into two broad categories: data-driven models aim to fit an epidemic curve to past data in order to make predictions about the future; mechanistic models simulate scenarios based on different underlying assumptions, such as varying contact rates or vaccine effectiveness. Both model types aid in the public health response: forecasts serve as an early warning system of an outbreak in the near future, while mechanistic models help us better understand the causes of spread and potential remedial interventions to prevent further infections. Many different data-driven and mechanistic models were proposed during the early stages of the COVID-19 pandemic and informed decision-making with varying levels of success. This range of predictive performance underscores both the difficulty and importance of epidemic forecasting, especially early in an outbreak. Yet the COVID-19 pandemic also led to unprecedented levels of data-sharing and collaboration across disciplines, so that several novel approaches to epidemic forecasting continue to be explored, including models that incorporate machine learning and real-time big data streams. In addition to the COVID-19 pandemic, recent infectious disease outbreaks include Zika virus in Brazil in 2015, Ebola virus in West Africa in 2014–16, Middle East respiratory syndrome (MERS) in 2012, and coronavirus associated with severe acute respiratory syndrome (SARS-CoV) in 2003. This trajectory suggests that further improvements to epidemic forecasting will be important for global public health. Exploring the value of new methodologies can help broaden the modeler’s toolkit to prepare for the next outbreak. In this study, we consider the role of transfer learning for pandemic response.

Transfer learning refers to a collection of techniques that apply knowledge from one prediction problem to solve another, often using machine learning and with many recent applications in domains such as computer vision and natural language processing. Transfer learning leverages a model trained to execute a particular task in a particular domain, in order to perform a different task or extrapolate to a different domain. This allows the model to learn the new task with less data than would normally be required, and is therefore well-suited to data-scarce prediction problems. The underlying idea is that skills developed in one task, for example the features that are relevant to recognize human faces in images, may be useful in other situations, such as classification of emotions from facial expressions. Similarly, there may be shared features in the patterns of observed cases among similar diseases.

The value of transfer learning for the study of infectious diseases is relatively under-explored. The majority of existing studies on diseases remain in the domain of computer vision and leverage pre-trained neural networks to make diagnoses from medical images, such as retinal diseases, dental diseases, or COVID-19. Coelho and colleagues (2020) explore the potential of transfer learning for disease forecasts. They train a Long Short-Term Memory (LSTM) neural network on dengue fever time series and make forecasts directly for two other mosquito-borne diseases, Zika and Chikungunya, in two Brazilian cities. Even without any data on the two target diseases, their model achieves high prediction accuracy four weeks ahead. Gautam (2021) uses COVID-19 data from Italy and the USA to build an LSTM transfer model that predicts COVID-19 cases in countries that experienced a later pandemic onset.

These studies provide empirical evidence that transfer learning may be a valuable tool for epidemic forecasting in low-data situations, though research is still limited. In this study, we aim to contribute to this empirical literature not only by comparing different types of knowledge transfer and forecasting algorithms, but also by considering two different pairs of endemic and novel diseases observed in Brazilian cities, specifically (i) dengue and Zika, and (ii) influenza and COVID-19. With an additional analysis on simulated time series, we hope to provide theoretical guidance on the selection of appropriate disease pairs, by better understanding how different characteristics of the source and target diseases affect the viability of transfer learning.
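The simulated time series come from an SIR model run with different transmission and recovery rates. The following is a minimal sketch of such a generator, assuming a simple Euler discretisation of the standard SIR equations. The function name `simulate_sir`, the population size, the time step, and the specific rate values are illustrative assumptions and are not taken from the paper.

```python
def simulate_sir(beta, gamma, population=10_000, initial_infected=10,
                 days=120, dt=0.1):
    """Return daily new infections from a deterministic SIR model.

    beta  : transmission rate per day (assumed value, for illustration)
    gamma : recovery rate per day (assumed value, for illustration)
    """
    s = float(population - initial_infected)  # susceptible
    i = float(initial_infected)               # infected
    r = 0.0                                   # recovered
    steps_per_day = round(1 / dt)
    daily_new_cases = []
    for _ in range(days):
        new_today = 0.0
        for _ in range(steps_per_day):
            new_infections = beta * s * i / population * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
            new_today += new_infections
        daily_new_cases.append(new_today)
    return daily_new_cases


def peak_day(cases):
    """Index of the day with the most new infections."""
    return max(range(len(cases)), key=cases.__getitem__)


# Two hypothetical diseases that differ only in their transmission rate.
source_cases = simulate_sir(beta=0.3, gamma=0.1)
target_cases = simulate_sir(beta=0.5, gamma=0.1)

print("source peak day:", peak_day(source_cases))
print("target peak day:", peak_day(target_cases))
```

Varying `beta` and `gamma` across runs gives source and target curves that differ in peak timing and final size, which is the kind of controlled mismatch a synthetic comparison of transfer methods can use.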

Zika and COVID-19 are two recent examples of novel emerging diseases. Brazil experienced a Zika epidemic in 2015–16 and the WHO declared a public health emergency of international concern in February 2016. Zika is caused by an arbovirus spread primarily by mosquitoes, though other transmission routes, including congenital and sexual transmission, have also been observed. Zika belongs to the family of viral hemorrhagic fevers and symptoms of infection share some commonalities with other mosquito-borne arboviruses, such as yellow fever, dengue fever, or chikungunya. Illness tends to be asymptomatic or mild but can lead to complications, including microcephaly and other brain defects in the case of congenital transmission.

Given the similarity of the pathogen and primary transmission route, dengue fever is an appropriate choice of source disease for Zika forecasting. Not only does the shared mosquito vector result in similar seasonal patterns of annual outbreaks, but consistent, geographically and temporally granular data on dengue cases is available publicly via the open data initiative of the Brazilian government.

COVID-19 is an acute respiratory infection caused by the novel coronavirus SARS-CoV-2, which was first detected in Wuhan, China, in 2019. It is transmitted directly between humans via airborne respiratory droplets and particles. Symptoms range from mild to severe and may affect the respiratory tract and central nervous system. Several variants of the virus have emerged, which differ in their severity, transmissibility, and level of immune evasion.

Influenza is also a contagious respiratory disease that is spread primarily via respiratory droplets. Infection with the influenza virus also follows patterns of human contact and seasonality. There are two types of influenza (A and B) and new strains of each type emerge regularly. Given the similarity in transmission routes and to a lesser extent in clinical manifestations, influenza is chosen as the source disease for knowledge transfer to model COVID-19.

For each of these disease pairs, we collect time series data from Brazilian cities. Data on the target disease from half the cities is retained for testing. To ensure comparability, the test set is the same for all models. Using this empirical data, as well as the simulated time series, we implement the following transfer models to make predictions.

  • Random forest: First, we implement a random forest model which was recently found to capture well the time series characteristics of dengue in Brazil. We use this model to make predictions for Zika without re-training. We also train a random forest model on influenza data to make predictions for COVID-19. This is a direct transfer method, where models are trained only on data from the source disease.
  • Random forest with TrAdaBoost: We then incorporate data from the target disease (i.e., Zika and COVID-19) using the TrAdaBoost algorithm together with the random forest model. This is an instance-based transfer learning method, which selects relevant examples from the source disease to improve predictions on the target disease.
  • Neural network: The second machine learning algorithm we deploy is a feed-forward neural network, which is first trained on data of the endemic disease (dengue/influenza) and applied directly to forecast the new disease.
  • Neural network with re-training and fine-tuning: We then retrain only the last layer of the neural network using data from the new disease and make predictions on the test set. Finally, we fine-tune all the layers’ parameters using a small learning rate and low number of epochs. These models are examples of parameter-based transfer methods, since they leverage the weights generated by the source disease model to accelerate and improve learning in the target disease model.
  • Aspirational baseline: We compare these transfer methods to a model trained only on the target disease (Zika/COVID-19) without any data on the source disease. Specifically, we use half the cities in the target dataset for training and the other half for testing. This gives a benchmark of the performance in a large-data scenario, which would occur after a longer period of disease surveillance.
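The three neural-network variants in the list above (direct transfer, re-training the last layer, and fine-tuning all layers) can be sketched in plain Python. This is an illustrative toy under stated assumptions, not the authors' implementation: the class `TinyNet`, the helpers `make_windows` and `seasonal_curve`, the network size, the learning rates, and the sine-wave data standing in for scaled case counts are all hypothetical.

```python
import copy
import math
import random


def make_windows(series, lags=3):
    """Pair each run of `lags` values with the value that follows it."""
    return [(series[t:t + lags], series[t + lags])
            for t in range(len(series) - lags)]


class TinyNet:
    """Feed-forward net: one tanh hidden layer, linear output (hypothetical)."""

    def __init__(self, n_in=3, n_hidden=4, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def hidden(self, x):
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.w1, self.b1)]

    def predict(self, x):
        return sum(w * h for w, h in zip(self.w2, self.hidden(x))) + self.b2

    def fit(self, data, lr, epochs, train_hidden=True):
        """Plain SGD on squared error; train_hidden=False freezes layer 1."""
        for _ in range(epochs):
            for x, y in data:
                h = self.hidden(x)
                err = sum(w * hj for w, hj in zip(self.w2, h)) + self.b2 - y
                for j, hj in enumerate(h):
                    # Gradient w.r.t. the hidden pre-activation (uses old w2).
                    grad_pre = err * self.w2[j] * (1.0 - hj * hj)
                    self.w2[j] -= lr * err * hj
                    if train_hidden:
                        for i, xi in enumerate(x):
                            self.w1[j][i] -= lr * grad_pre * xi
                        self.b1[j] -= lr * grad_pre
                self.b2 -= lr * err
        return self


def mse(model, data):
    return sum((model.predict(x) - y) ** 2 for x, y in data) / len(data)


def seasonal_curve(n, period):
    """Toy stand-in for scaled case counts (not real surveillance data)."""
    return [0.5 + 0.4 * math.sin(2 * math.pi * t / period) for t in range(n)]


source_data = make_windows(seasonal_curve(200, period=26))
target_series = seasonal_curve(60, period=20)
target_train = make_windows(target_series[:15])  # scarce early data
target_test = make_windows(target_series[15:])

# 1. Direct transfer: train on the source only, apply to the target as-is.
direct = TinyNet().fit(source_data, lr=0.05, epochs=200)

# 2. Re-train only the last layer on the target data (hidden layer frozen).
last_layer = copy.deepcopy(direct).fit(
    target_train, lr=0.05, epochs=50, train_hidden=False)

# 3. Fine-tune all layers with a small learning rate and few epochs.
fine_tuned = copy.deepcopy(last_layer).fit(target_train, lr=0.005, epochs=10)

for name, model in [("direct", direct), ("last layer", last_layer),
                    ("fine-tuned", fine_tuned)]:
    print(f"{name:>10}: test MSE = {mse(model, target_test):.4f}")
```

The sketch only shows how the three variants differ mechanically: which weights are updated, with what data, and at what learning rate. Which variant performs best on real data depends on the disease pair and is the question the paper investigates.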

The remainder of this paper is organized as follows. The models are described in more technical detail in Section 2. Section 3 shows the results of the synthetic and empirical predictions. Finally, Section 4 discusses practical implications of the analyses.

Access the full paper [via institutional access or paid download].

Essay 89: Physics AI Predicts That Earth Goes Around the Sun

from Nature Briefing:

Hello Nature readers,

Today we learn that a computer Copernicus has rediscovered that Earth orbits the Sun, ponder the size of the proton and see a scientific glassblower at work.

Physicists have designed artificial intelligence that thinks like the astronomer Nicolaus Copernicus by realizing the Sun must be at the center of the Solar System. (NASA/JPL/SPL)

AI ‘Discovers’ That Earth Orbits the Sun [PDF]

A neural network that teaches itself the laws of physics could help to solve some of physics’ deepest questions. But first it has to start with the basics, just like the rest of us. The algorithm has worked out that it should place the Sun at the centre of the Solar System, based on how movements of the Sun and Mars appear from Earth.

The machine-learning system differs from others because it’s not a black box that spits out a result based on reasoning that’s almost impossible to unpick. Instead, researchers designed a kind of ‘lobotomized’ neural network that is split into two halves and joined by just a handful of connections. That forces the learning half to simplify its findings before handing them over to the half that makes and tests new predictions.

Next FDA Chief Will Face Ongoing Challenges

U.S. President Donald Trump has nominated radiation oncologist Stephen Hahn to lead the Food and Drug Administration (FDA). If the Senate confirms Hahn, who is the chief medical executive of the University of Texas MD Anderson Cancer Center, he’ll be leading the agency at the centre of a national debate over e-cigarettes, prompted by a mysterious vaping-related illness [archived PDF] that has made more than 2,000 people sick. A former FDA chief says Hahn’s biggest challenge will be navigating a regulatory agency under the Trump administration, which has pledged to roll back regulations.

Do We Know How Big a Proton Is?

A long-awaited experimental result has found the proton to be about 5% smaller than the previously accepted value. The finding seems to spell the end of the ‘proton radius puzzle’: measurements disagreed depending on whether the proton was probed with ordinary hydrogen or with exotic hydrogen built out of muons instead of electrons. But solving the mystery will be bittersweet: some scientists had hoped the difference might have indicated exciting new physics behind how electrons and muons behave.

Contingency Plans for Research After Brexit

The United Kingdom should boost funding for basic research and create an equivalent of the prestigious European Research Council (ERC) if it doesn’t remain part of the European Union’s flagship Horizon Europe research-funding program [archived PDF]. That’s the conclusion of an independent review of how UK science could adapt and collaborate internationally after Brexit — now scheduled for January 31, 2020.

Nature’s 150th anniversary

A Century and a Half of Research and Discovery

This week is a special one for all of us at Nature: it’s 150 years since our first issue, published in November 1869. We’ve been working for well over a year on the delights of our anniversary issue, which you can explore in full online.

10 Extraordinary Nature Papers

A series of in-depth articles from specialists in the relevant fields assesses the importance and lasting impact of 10 key papers from Nature’s archive. Among them are the structure of DNA, the discovery of the hole in the ozone layer above Antarctica, our first meeting with Australopithecus, and this year’s Nobel-winning work detecting an exoplanet around a Sun-like star.

A Network of Science

The multidisciplinary scope of Nature is revealed by an analysis of more than 88,000 papers Nature has published since 1900, and their co-citations in other articles. Take a journey through a 3D network of Nature’s archive in an interactive graphic. Or, let us fly you through it in this spectacular 5-minute video.

Then dig deeper into what scientists learnt from analyzing tens of millions of scientific articles for this project.

150 Years of Nature, in Graphics

An analysis of the Nature archive reveals the rise of multi-author papers, the boom in biochemistry and cell biology, and the ebb and flow of physical chemistry since the journal’s first issue in 1869. The evolution in science is mirrored in the top keywords used in titles and abstracts: they were ‘aurora’, ‘Sun’, ‘meteor’, ‘water’ and ‘Earth’ in the 1870s, and ‘cell’, ‘quantum’, ‘DNA’, ‘protein’ and ‘receptor’ in the 2010s.

Evidence in Pursuit of Truth

A century and a half has seen momentous changes in science, and Nature has changed along with it in many ways, says an Editorial in the anniversary edition. But in other respects, Nature now is just the same as it was at the start: it will continue in its mission to stand up for research, serve the global research community and communicate the results of science around the world.

Features & Opinion

Nature covers: from paste-up to Photoshop

Nature creative director Kelly Krause takes you on a tour of the archive to enjoy some of the journal’s most iconic covers, each of which speaks to how science itself has evolved. Plus, she touches on those that didn’t quite hit the mark, such as an occasion of “Photoshop malfeasance” that led to Dolly the sheep sporting the wrong leg.

Podcast: Nature bigwigs spill the tea

In this anniversary edition of Backchat, Nature editor-in-chief Magdalena Skipper, chief magazine editor Helen Pearson and editorial vice president Ritu Dhand take a look back at how the journal has evolved over 150 years, and discuss the part that Nature can play in today’s society. The panel also pick a few of their favorite research papers that Nature has published, and think about where science might be headed in the next 150 years.

Where I Work

Scientific glassblower Terri Adams uses fire and heavy machinery to hand-craft delicate scientific glass apparatus. “My workbench hosts an array of tools for working with glass, many of which were custom-made for specific jobs,” says Adams. “Each tool reminds me of what I first used it for and makes me consider how I might use it again.” (Leonora Saunders for Nature)

Quote of the Day

“At the very least … we should probably consider no longer naming *new* species after awful humans.”

Scientists should stop naming animals after terrible people — and consider renaming the ones that already are, argues marine conservation biologist and science writer David Shiffman. (Scientific American)

Yesterday was Marie Skłodowska Curie’s birthday, and for the occasion, digital colorist Marina Amaral breathed new life into a photo of Curie in her laboratory.

Flora Graham, senior editor, Nature Briefing