Understanding how local tweets about symptoms correlate with case numbers

In this study, we aim to model how tweets about self-reported COVID-19 symptoms can help predict upcoming pandemic waves and, more generally, the rise and fall of the disease. To that end, we crawled public tweets from the Paris region, filtered them by symptom keywords, and plotted them over time. However, this filtering is very crude: people tweet about symptoms not only when they are currently falling sick, but also when recalling that one time a year ago when they fell sick, or when discussing the general news.
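The keyword filtering described above can be sketched in a few lines of Python. This is only a minimal illustration, not the code used in the study: the keyword list, the function names, and the sample tweets are all hypothetical, and tweets are assumed to be available as (date, text) pairs.

```python
from collections import Counter
from datetime import date

# Hypothetical keyword list, for illustration only; the keywords actually
# used in the study are not listed in this text.
SYMPTOM_KEYWORDS = ("fièvre", "toux", "fever", "cough")


def mentions_symptom(text, keywords=SYMPTOM_KEYWORDS):
    """Return True if the tweet text contains any keyword (case-insensitive)."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in keywords)


def daily_counts(tweets, keywords=SYMPTOM_KEYWORDS):
    """Count keyword-matching tweets per day.

    `tweets` is an iterable of (date, text) pairs. The result is a dict
    mapping each day to its number of matching tweets, sorted by day.
    """
    counts = Counter()
    for day, text in tweets:
        if mentions_symptom(text, keywords):
            counts[day] += 1
    return dict(sorted(counts.items()))


# Made-up sample tweets, not real data.
sample = [
    (date(2020, 3, 1), "J'ai de la fièvre depuis ce matin"),
    (date(2020, 3, 1), "Beau temps à Paris"),
    (date(2020, 3, 2), "I had a bad cough a year ago"),
    (date(2020, 3, 2), "Toux et fièvre, je reste chez moi"),
]
counts = daily_counts(sample)
```

Note that the third sample tweet is counted even though it describes an illness from a year earlier, which is exactly the kind of false positive that makes plain keyword filtering crude.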
We provide a replicable, data-driven approach to investigate a potential second wave of the COVID-19 pandemic. In the spirit of open science, we are particularly interested in how citizens can take part in research projects at all levels, from crowdsourcing to machine learning algorithms. This is why we have made the outputs of this study as open as possible:
- The input data is available for download
- All the crowdsourced annotations are available for download
- Our analyses are shared as Jupyter Notebooks
- All notebooks can be edited and run directly through MyBinder
The crowdsourcing platform is available at https://covid-twitter.thecommons.science/