On February 27, a teenager in the Seattle area was diagnosed with Covid-19. Shortly after, researchers at the Seattle Flu Study shared genomic data about his strain of the virus with other researchers on an “open science” site. Armed with that data, researchers involved with a second open science project determined that the teenager’s strain was a direct descendant of a strain of Covid-19 found in an unrelated patient in the Seattle area on January 20. The discovery was a key link in concluding that the virus had been spreading in the Seattle area for weeks.

The way researchers connected those dots highlights the role of open science projects in tracking the evolution of Covid-19 and other diseases. Sharing data and working collaboratively across the web, scientists are quickly analyzing genetic samples, helping to shape the public response. But the rush to interpret the data also creates new risks.

Viruses like the one that causes Covid-19 spread by making copies of themselves. Each time they replicate, there’s a chance that an error will be made, making the latest copy slightly different from the previous one. Emma Hodcroft, a postdoctoral quantitative genetics researcher at the University of Basel in Switzerland, likens these errors, known as mutations, to typos in the virus’s genetic code.

Most of these mutations are trivial, and don’t change how the virus affects the body. But scientists can use mutations to track the spread of a virus. If two people in different places are infected with a version of the virus with particular mutations, it’s a safe bet those two cases are related, even if the two people never met each other.
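To make the idea concrete, here is a minimal sketch in Python of how shared mutations can hint that two cases are related. It is an illustration only, not Nextstrain's actual method: the sequences are short, made-up strings rather than real viral genomes, the function names are hypothetical, and the samples are assumed to be already aligned to a reference of the same length.

```python
def find_mutations(reference, sample):
    """Return the set of (position, reference_base, sample_base) differences.

    Positions are 1-based. Both sequences are assumed to be aligned
    and of equal length (a simplification made for this sketch).
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return {
        (position, ref_base, sample_base)
        for position, (ref_base, sample_base)
        in enumerate(zip(reference, sample), start=1)
        if ref_base != sample_base
    }


def shared_mutations(reference, sample_a, sample_b):
    """Return the mutations (relative to the reference) found in both samples."""
    return find_mutations(reference, sample_a) & find_mutations(reference, sample_b)


# Toy, invented sequences; they do not come from any real patient or virus.
REFERENCE = "ACGUACGUACGU"
PATIENT_A = "ACGUACCUACGU"  # one typo at position 7
PATIENT_B = "ACGUACCUACGA"  # the same typo at position 7, plus a new one at 12
PATIENT_C = "AGGUACGUACGU"  # an unrelated typo at position 2

related = shared_mutations(REFERENCE, PATIENT_A, PATIENT_B)
unrelated = shared_mutations(REFERENCE, PATIENT_A, PATIENT_C)
```

In this toy example, patient B carries patient A's typo plus one more, which is the pattern you would expect if B's strain descended from A's. Patient C shares nothing with A. Real analyses work with genomes tens of thousands of letters long and use statistical models to build family trees, so a shared mutation is evidence of a link rather than proof.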

In the case of the Seattle area teenager, genetic data about his strain of Covid-19 was uploaded to Gisaid, a platform for sharing genomic data. Then researchers at Nextstrain made the connection with the earlier patient.

Nextstrain is an open source application that tracks the evolution of viruses and bacteria, including Covid-19, Ebola, and lesser-known outbreaks such as Enterovirus D68, using data sourced largely from Gisaid. Hodcroft and other researchers involved with the project analyze the data shared on Gisaid for mutations and visualize the results. That’s how the team was able to spot the connection between the two Covid-19 cases in Washington.

Nextstrain’s work is enabled by the widespread sharing of data by scientists and health professionals. Duncan MacCannell, the chief science officer for the Centers for Disease Control and Prevention’s Office of Advanced Molecular Detection, says public health authorities, universities, and clinical laboratories are releasing genomic data from Covid-19 specimens at unprecedented speed, often within 48 hours of a specimen arriving at a sequencing laboratory.

“Nextstrain can be used to give a quick snapshot of how the virus has spread across regions and how local outbreaks are connected,” says Kristian G. Andersen, a computational biologist at Scripps Research.

Because the underlying code used by the Nextstrain team is open source, other researchers could build their own versions of the Nextstrain site or use Nextstrain’s code as the foundation for new projects. More importantly, it also lets other scientists evaluate the scientific validity of the team’s work, says contributor James Hadfield.

The sort of genetic analysis that Nextstrain does isn’t new, in and of itself. Researchers traditionally publish their work primarily through academic journals. But the explosion of genomic data available on Gisaid, and the speed with which it’s uploaded, creates new opportunities to bridge the gap between public health and academia, and to enable novice users to explore the data as well.

Skipping the traditional peer review phase has disadvantages. On March 3, Nextstrain cofounder Trevor Bedford, a researcher at the Fred Hutchinson Cancer Research Center in Seattle, wrote on Twitter that a strain circulating in Lombardy, Italy, was related to one found in Munich, Germany, that public health officials had said was contained.

Other scientists disagreed with Bedford’s analysis, as noted by Science magazine. For example, Christian Drosten, the virologist at the Charité University Hospital in Berlin who sequenced the Munich strain, spotted the similarities between the German and Italian strains last month and wrote on Twitter that it was “not sufficient to claim a link between Munich and Italy.” It’s possible that the strain arrived in both Munich and Italy from the same outside source, Drosten noted.