A new open dataset holding nearly 30,000 scientific articles was released to the public last week. The database, called CORD-19, is a step towards helping artificial intelligence combat the spread of COVID-19.
The dataset was released to encourage AI experts to “develop new techniques for mining data and text that could help answer some of the most pressing questions about the novel coronavirus and the disease it causes.” Out of the vast collection, around 13,000 articles contain usable, machine-readable data.
Several tech giants collaborated with researchers to create CORD-19. Microsoft contributed its literature tools to transform the articles into machine-readable content, while the National Institutes of Health’s National Library of Medicine made its literature content accessible. The Chan Zuckerberg Initiative, the philanthropic organization founded by Facebook CEO Mark Zuckerberg and Priscilla Chan, also stepped in to provide access to articles posted on pre-print servers.
With the information readily available, the organizations that introduced CORD-19 now hope that the machine learning community can apply recent advances in natural language processing to answer important questions about COVID-19.
The database can be downloaded from Georgetown University’s Center for Security and Emerging Technology website.