Apply by Aug. 20
Event Dates and Time
Sept. 27-Oct. 1, 1-5 p.m. EDT
Solicitation
The National Institutes of Health (NIH) Office of Data Science Strategy, the National Center for Biotechnology Information at the National Library of Medicine, and the Department of Energy's (DOE) Office of Biological and Environmental Research invite you to apply to the virtual Petabyte-Scale Sequence Search: Metagenomics Benchmarking Codeathon. NIH’s Sequence Read Archive now makes more than 14 petabytes of data available in the cloud. To make full use of this data, the scientific community needs high-performance search tools that can work efficiently up to the petabyte scale.
The focus of the codeathon is creating publicly available resources that make it easy for scientists to compare sequence search methods across a standardized set of benchmarks and datasets. This event is part of a series bringing together a diverse group of collaborators (biologists, bioinformaticians, statisticians, mathematicians, computer scientists, and engineers) to develop and test new approaches for sequence search. Codeathon projects will lay the groundwork for future events focused on methods development.
Throughout the online, interactive codeathon, participants will:
- Have access to Zoom meetings and team breakout rooms for collaboration, as well as to experts from NIH and DOE.
- Participate in a dedicated Slack channel to seek assistance and collaboration.
- Hear from topic experts about Petabyte-Scale Sequence Search.
- Work together to solve biological problems and create computational tools.
At the end of the codeathon event, teams will have authorship over their projects, which can be submitted for publication, presented at conferences, or hosted online for public access.
If you are interested in attending the codeathon, please apply by Aug. 20.
Contacts
If you have questions about the Petabyte-Scale Sequence Search Codeathon, please contact [email protected].
Frequently Asked Questions
What are the topics for codeathon projects?
Each codeathon team will focus on developing benchmarks for a sequence search problem in metagenomics, suggested by participants in the Emerging Solutions in Petabyte Scale Sequence Search Workshop:
- Retrieve metagenomic samples with user-provided short queries, such as k-mers, genes, or sequences that are ≤ 5 kb.
- Retrieve metagenomic samples with user-provided long queries, such as genomes for viruses, bacteria, or fungi.
- Retrieve contigs and/or reads using a read, contig, or genome as the search query.
- Generate known-species catalogs for a large number of metagenomes.
- Identify reads within metagenomic samples that match DNA or protein sequences.
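As a rough illustration of the first topic above (retrieving samples with short queries), the following minimal Python sketch ranks toy samples by k-mer containment, that is, the fraction of the query's k-mers found in each sample. The function names, the toy data, the value of k, and the threshold are all assumptions made for illustration. They are not tools or datasets from the codeathon, and a petabyte-scale method would need a far more compact index than in-memory sets.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq (uppercased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def containment(query, sample_kmers, k):
    """Fraction of the query's distinct k-mers that also occur in the sample."""
    query_kmers = kmers(query, k)
    if not query_kmers:
        return 0.0  # query shorter than k: nothing to match
    return len(query_kmers & sample_kmers) / len(query_kmers)


def search(query, samples, k=4, threshold=0.8):
    """Return (sample_id, score) pairs scoring at or above threshold, best first.

    `samples` maps a sample ID to a list of read sequences (hypothetical layout).
    """
    hits = []
    for sample_id, reads in samples.items():
        # Build the sample's k-mer set from all of its reads.
        sample_kmers = set().union(*(kmers(read, k) for read in reads))
        score = containment(query, sample_kmers, k)
        if score >= threshold:
            hits.append((sample_id, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Toy data, invented for this example only.
toy_samples = {
    "sample_A": ["ACGTACGTGG", "TTGACCA"],
    "sample_B": ["GGGGCCCC", "ATATATAT"],
}

if __name__ == "__main__":
    print(search("ACGTACG", toy_samples, k=4, threshold=0.8))
```

Exact set intersection keeps the sketch easy to verify by hand. Real tools typically replace the sets with sketches or compressed indexes to trade a little accuracy for large savings in memory.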
What are the objectives for codeathon teams?
Each team will work together to:
- Identify a well-curated, gold-standard dataset appropriate for their problem.
- Define performance benchmarks for methods.
- Create an automated pipeline to access data and benchmark methods.
We will provide cloud computing resources before, during, and after the event to facilitate data access and benchmarking.
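To make the objectives above more concrete, here is a minimal Python sketch of what a benchmark harness could look like: it scores each search method against a gold-standard set of expected hits and records wall-clock time. The function names, the metric choices (precision, recall, F1), and the data layout are assumptions for illustration only; each team would define its own benchmarks and datasets.

```python
import time


def score(retrieved, relevant):
    """Precision, recall, and F1 of retrieved IDs against gold-standard IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def benchmark(methods, gold):
    """Run every method on every gold-standard query and average the results.

    `methods` maps a method name to a callable taking a query and returning
    an iterable of sample IDs; `gold` maps each query to its expected IDs.
    Both layouts are hypothetical.
    """
    if not gold:
        raise ValueError("gold standard must contain at least one query")
    results = {}
    for name, method in methods.items():
        per_query = []
        start = time.perf_counter()
        for query, expected in gold.items():
            per_query.append(score(method(query), expected))
        elapsed = time.perf_counter() - start
        # Average each metric over all queries.
        summary = {
            metric: sum(s[metric] for s in per_query) / len(per_query)
            for metric in ("precision", "recall", "f1")
        }
        summary["seconds"] = elapsed
        results[name] = summary
    return results
```

A harness like this could be called as `benchmark({"my_tool": my_search_function}, gold)`, where `my_search_function` and `gold` are placeholders for a team's own method and curated dataset.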
What is expected of codeathon participants?
All team members are expected to be collegial and available throughout the codeathon and will work with their team to plan, delegate, and execute tasks related to the project. Participants with strong writing, communication, data visualization, programming, and/or scientific knowledge are valuable to this event.
What are the expected outcomes of the codeathon?
- Products: Ideally, teams will create some or all of the following:
  - A curated dataset, available on the cloud
  - A set of performance benchmarks for the scientific task
  - An image with all relevant software installed
  - An automated workflow for benchmarking new software
  These products can be hosted on a website for public use and be incorporated in a post-event publication.
- Networking: Participants will have the opportunity to network within their codeathon team and with all attendees during meetings, lightning stand-ups, and the final presentation.
- Education: Codeathons are unique educational opportunities for participants at all stages, and everyone is expected to share and acquire knowledge throughout the event.