ASBCB Omics Codeathon

Friday, September 9, 2022

Omics Codeathon is a biannual event where life scientists work on research projects. Olaitan I. Awe, the current training officer for the African Society for Bioinformatics and Computational Biology (ASBCB), leads the Codeathon.

Codeathons have been organized through a collaboration between ASBCB and the National Institutes of Health (NIH) Office of Data Science Strategy (ODSS), with support from the National Center for Biotechnology Information (NCBI).

Codeathon applicants and participants come from all over the world, including South Africa, Nigeria, Kenya, Zimbabwe, Morocco, Tunisia, Egypt, Senegal, Mali, Ghana, Brazil, Uganda, Poland, Tanzania, Mozambique, Bangladesh, the United States, Algeria, Mexico, Burkina Faso, Sweden, and China.

The April 2022 Codeathon was held virtually, with projects categorized into Bulk Transcriptomics, Metagenomics, Pathogen Genomics, Text Mining, Population Genomics, Epigenomics, Clinical Applications, Oncology, and Plant Genomics. Additional omics categories such as ATAC-seq, ChIP-seq, and single-cell omics are planned.

Projects
Title | Team | Project Description
ReGo: A Mobile Application for Reading Research Papers on the Go

Wilson Mudaki, Temiloluwa Dele-Alimi, Julius Mwakosya and Olaitan I. Awe

The availability of biomedical literature has increased over the last decade. PubMed holds about 30 million papers, with roughly one million more added annually. However, the sheer volume of material makes it hard to find relevant documents in databases, and without the right skill set, traditional databases can be difficult to use. We developed ReGo, a mobile application that delivers research papers to users in real time. Its graphical user interface is built with React Native, its functionality is provided through Node.js and MongoDB, and it uses GraphQL as the middleware. Data is retrieved using the E-Utilities API. ReGo runs queries to find publications based on the user’s keywords, lets the user save favorite papers, displays abstracts, and provides the full text of open-access articles. ReGo also provides a more personalized user experience.
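The keyword lookup described above can be illustrated with a minimal sketch against the public NCBI E-utilities ESearch endpoint. ReGo itself is built on Node.js and GraphQL, so this Python version is only an illustration and not the team's code. The function names, the `AND` joining of keywords, and the sample response are assumptions made for the example.

```python
import json
from urllib.parse import urlencode

# Public base URL of the NCBI E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(keywords, max_results=20):
    """Build an ESearch URL that looks up PubMed IDs matching all keywords."""
    params = {
        "db": "pubmed",                  # search the PubMed database
        "term": " AND ".join(keywords),  # assumption: require every keyword
        "retmode": "json",               # ask for a JSON response
        "retmax": max_results,           # cap the number of returned IDs
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


def parse_pubmed_ids(response_text):
    """Extract the list of PubMed IDs from an ESearch JSON response body."""
    payload = json.loads(response_text)
    return payload.get("esearchresult", {}).get("idlist", [])


# Hypothetical response body, trimmed to the fields used here.
sample_response = '{"esearchresult": {"count": "2", "idlist": ["111", "222"]}}'

url = build_esearch_url(["malaria", "lncRNA"], max_results=5)
pubmed_ids = parse_pubmed_ids(sample_response)
```

In a real client, the URL would be fetched over HTTPS and the returned IDs passed to a follow-up request for abstracts. The network call is left out here so the sketch stays self-contained.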
Exploring Genomic Newborn Screening Using Public Next Generation Sequencing Data

Olaitan I. Awe, Rissy M. Wesonga, and Fatima Z. Annassiri

Inborn errors of metabolism are debilitating heritable disorders commonly manifesting in infancy and early childhood. Traditional newborn screening favors the diagnosis of certain diseases over others while genomic newborn screening (gNBS) can diagnose multiple disorders at scale. We explored gNBS by using public neonatal datasets obtained from targeted sequencing experiments and deposited in the Sequence Read Archive. We then analyzed the datasets and annotated them using standard bioinformatics techniques. Finally, we did variant calling on neonatal sequences to validate our curated gene panel. We observed that certain pathogenic variants are associated with congenital conditions and can be used for newborn screening. Our goal is to have a well-curated gene panel with potential utility in clinical practice. Our data exploration highlights the role of gNBS in the identification of known and unknown congenital disorders.
Transcriptomic and Epigenomic Changes in Autoimmune Demyelinating Diseases: A Bioinformatics Analysis

Hiba Ben Aribi, Farah Ayadi, Careen Naitore, and Souheila Guerbouj

The identification of differentially expressed genes (DEGs) in human pathologies is crucial to understanding the biological differences between healthy and diseased states because DEGs are potential biomarkers and therapeutic targets for treatment. Epigenomic changes play a pivotal role in pathogenesis. The study of methylated CpG sites in promoter regions and their interfering miRNAs can help to explain the differential expression of a gene.


The geographic areas where a disease is prevalent can influence the genomic, epigenomic, and transcriptomic profiles associated with it, but it is difficult to find a single tool that filters these three omic data types from multiple kinds of studies or datasets in one workflow. We investigated population-specific transcriptomic and epigenomic changes in two autoimmune demyelinating diseases: Multiple Sclerosis (MS) and Systemic Lupus Erythematosus (SLE). The study also compared MS and SLE to determine their common and disease-specific epigenetic factors and biological processes.


The developed pipeline can be used to find potential biomarkers of any targeted pathology.

Comparative Study Between Molecular and Genetic Evolutionary Analysis Tools Using African SARS-CoV-2 Variants

Olaitan I. Awe, Nouhaila En najih, and Latifah Benta Mukanga

Most phylogenetic analysis tools are complex and require considerable expertise to use. Our study compares four phylogenetic analysis tools (MEGA, Galaxy, Geneious, and SeaView) and describes their features to help users choose the tool that fits their research and resources, especially newcomers to bioinformatics in Africa. By guiding the choice of phylogenetic analysis tools, the study aims to save biologists time and resources when analyzing biological data. We discovered a gap in the use of African SARS-CoV-2 genomes for research, and this informed our choice of datasets.
Enhanced Deep Convolutional Neural Network for SARS-CoV-2 Variants Classification

Mike Mwanga, Hesborn Omwandho, and Evans Mudibo

High-throughput sequencing provides an unbiased identification of viruses present in samples. However, it requires large-scale reference sequence databases to compare against, resulting in considerable computing requirements. Machine learning methods are employed as an alternative in sequence analysis because they can extract important features for classification in a computationally efficient manner. We applied a convolutional neural network (CNN) model to classify SARS-CoV-2 variants. Spike protein sequences were extracted from publicly available SARS-CoV-2 genomes. Aligned sequences were then split into 4 k-mers, converted to binary form, and fed into the CNN model for feature extraction, model training, fitting, and validation. Ultimately, this will enable effective monitoring and tracking of SARS-CoV-2 variants and contribute to the control of current and future pandemics.
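The k-mer splitting and binary conversion step might look like the sketch below. The abstract does not specify the exact scheme, so several choices here are assumptions: overlapping k-mers of length 4, a one-hot encoding over the 20 standard amino acids, and all-zero rows for gaps or ambiguous residues. The function names are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def split_kmers(sequence, k=4):
    """Return all overlapping k-mers of the sequence (assumption: stride 1)."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def one_hot_kmer(kmer):
    """Encode a k-mer as a flat binary vector of length k * 20."""
    vector = []
    for residue in kmer:
        row = [0] * len(AMINO_ACIDS)
        # Gaps ("-") and ambiguous residues stay all-zero (assumption).
        if residue in INDEX:
            row[INDEX[residue]] = 1
        vector.extend(row)
    return vector


def encode_sequence(sequence, k=4):
    """Encode a sequence as a list of binary k-mer vectors."""
    return [one_hot_kmer(kmer) for kmer in split_kmers(sequence, k)]


# Toy input: a short peptide string used only to show the output shapes.
encoded = encode_sequence("MFVFLV")
```

Each row of `encoded` is one binary vector per k-mer, a shape that could be stacked into an input array for a CNN. The network itself is out of scope for this sketch.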
Expression Level Analysis of ACE2 Receptor Gene in African American and Non-African CoVID-19 Patients

Marion Nyaboke, Kauthar M. Omar, Ayorinde F. Fayehun, Oumaima Dachi, and Billiah Kemunto Bwana

The incidence and mortality rates of CoVID-19, caused by SARS-CoV-2, have been reported to be lower in African populations, where malaria is endemic. ACE2 receptors are required for SARS-CoV-2 entry into host cells, whereas downregulation of the ACE2 gene leads to increased levels of angiotensin II, which impairs Plasmodium development. Low ACE2 expression may be responsible for the observed low incidence of CoVID-19 in African populations, as well as protection against malaria. Little is known about ACE2 expression in African CoVID-19 patients compared to non-African patients. Our hypothesis is that there is a potential correlation between high malaria incidence in African populations and the reported low CoVID-19 incidence. We analyzed RNA-seq data of African and non-African CoVID-19 patients for ACE2 gene expression and conducted differential gene expression and gene enrichment analysis for the two populations.
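The core quantity behind a two-group expression comparison is the log2 fold change, sketched below for a single gene. The abstract does not name the tool the team used, and a real differential expression analysis adds normalization and statistical testing that are not shown here. The function name and the pseudocount of 1 are assumptions.

```python
import math
from statistics import mean


def log2_fold_change(group_a, group_b, pseudocount=1.0):
    """Log2 ratio of mean expression in group A over group B.

    The pseudocount (assumption: 1.0) avoids division by zero
    when a gene has no reads in one of the groups.
    """
    numerator = mean(group_a) + pseudocount
    denominator = mean(group_b) + pseudocount
    return math.log2(numerator / denominator)
```

A positive value means higher mean expression in the first group, a negative value means lower, and zero means no difference. For example, `log2_fold_change([3, 3], [1, 1])` compares means of 3 and 1, which become 4 and 2 after the pseudocount, giving a twofold difference.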
Microbiome Data Mining of CoVID-19

Sofia Sehli, Nihal Habib, and Adijat O. Jimoh

Antibiotic treatment is the primary therapeutic method utilized to treat CoVID-19. Given that such a method quickly produces antibiotic-resistant strains of opportunistic microorganisms, improved antibiotic therapy is essential to effectively control long-term symptoms and future pandemics, notably in patients infected with SARS-CoV-2.

The gut microbiome plays a key role in precision medicine and is an important microbial environment to study when developing effective adjunct treatments, such as targeted probiotics or fecal microbiota transplantation (FMT). Such treatments could be of great use for CoVID-19 or long CoVID, for example by helping to relieve symptoms such as diarrhea.

We analyzed 732 stool/nasal/saliva microbiota samples from eight regions (United States, Italy, Germany, France, China, India, Japan, and North America) that were publicly available in the Sequence Read Archive. We performed 16S rRNA amplicon analysis using the DADA2 pipeline. We then clustered the resulting microorganisms that were most abundant in or common between each region’s samples.
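Once an amplicon pipeline has produced per-taxon read counts, finding the most abundant and the shared taxa across regions can be sketched as below. This is not the team's workflow (DADA2 itself is an R package). The function names, the top-n cutoff, and the placeholder taxon labels in the usage note are assumptions.

```python
def relative_abundance(counts):
    """Convert raw per-taxon read counts into fractions of the total."""
    total = sum(counts.values())
    if total == 0:
        return {taxon: 0.0 for taxon in counts}
    return {taxon: n / total for taxon, n in counts.items()}


def top_taxa(counts, n=2):
    """Return the n most abundant taxa, breaking ties alphabetically."""
    abundance = relative_abundance(counts)
    return sorted(abundance, key=lambda t: (-abundance[t], t))[:n]


def shared_taxa(top_by_region):
    """Return the taxa that appear in every region's top list."""
    sets = [set(taxa) for taxa in top_by_region.values()]
    return set.intersection(*sets) if sets else set()
```

With placeholder counts such as `{"taxon_a": 3, "taxon_b": 1, "taxon_c": 6}`, `top_taxa` returns `["taxon_c", "taxon_a"]`. Passing each region's top list to `shared_taxa` then yields the taxa common to all regions.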

Multi-omics Data Analytics Integration in Prostate Cancer

Zedias Chikwambi, Marie Hidjo, Lawrence Afolabi, Vincent Aketch, Pageneck Chikondowa, and David Enoma

Prostate cancer (PCa) is one of the most common malignancies and the second leading cause of tumor-related death among males worldwide. Studies have shown that African men have a poorer PCa prognosis than their Caucasian counterparts, and this is suspected to be caused by the genetic diversity of African populations. High-throughput omics technologies have shed light on the mechanisms of prostate cancer, but a systems biology approach is needed for a holistic molecular perspective. In this study, we applied a multi-omics approach to data analysis, using three publicly available PCa omics datasets from genomics, transcriptomics, and metabolomics experiments to explain the PCa mechanism, and we provided a simplified workflow for its implementation.
Comparative Analysis of Plant and Animal Multiomic Data for the Detection of Genomic Features

Abdellah Idrissi Azami, Douae El Ghoubali, Zainab El Ouafi, and Mustapha Lemsyah

Eukaryotic cells are known to have several genomes, including nuclear and organelle genomes. The latter are thought to be of prokaryotic origin, acquired via endosymbiosis, which makes them behave more like prokaryotic genomes than eukaryotic ones. Genome communication via conjugation, transformation, and transduction is one of the recognized prokaryotic behaviors. In this study, we aimed to uncover the communication between various eukaryotic genomes and discover how it manifests itself in diverse living groups. To achieve our goal, we mined diverse data from the NCBI genome database using several genome parameters (size, number of genes, and GC count) and examined the frequency of horizontal gene transfer (HGT) for each species.
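One of the genome parameters mentioned above can be computed directly from sequence. The sketch below reads "GC count" as GC content, meaning the fraction of G and C bases among unambiguous bases. That reading, the function name, and the handling of ambiguous bases such as `N` are assumptions.

```python
def gc_content(sequence):
    """Fraction of G and C among the A, C, G, and T bases of a sequence.

    Ambiguous bases such as N are ignored (assumption); an empty or
    fully ambiguous sequence returns 0.0.
    """
    seq = sequence.upper()
    unambiguous = sum(seq.count(base) for base in "ACGT")
    if unambiguous == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / unambiguous
```

Because organelle and nuclear genomes often differ in base composition, a value like this can be tabulated per genome alongside size and gene count before comparing groups.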
Identifying LncRNA Biomarkers in Adult Febrile Patients of Malaria and CoVID-19 Using RNA-seq

Nzungize Lambert, Jonas A. Kengne-Ouafo, Brenda Kiage Nyarang'o, Umuhoza Diane, Brenda Muthoni, Kivumbi Mark Tefero, and Margaret Wanjiku

Distinguishing malaria from CoVID-19 in febrile patients at a health care facility is challenging because the symptoms of the two diseases overlap. This creates a risk of misdiagnosis and, in turn, inappropriate treatment or untimely, preventable death. Non-coding RNAs (ncRNAs), including long non-coding RNAs (lncRNAs), have been shown to play a crucial role in gene regulation. RNA-seq makes it possible to explore the distinctiveness of each etiological presentation at the transcriptome level. However, there are few studies on the role of lncRNAs in the host immune response to either malaria or CoVID-19. Thus, we hypothesized that identifying lncRNA patterns involved in host immune responses across malaria and CoVID-19 patients may provide insights into novel biomarkers.

Transcriptomic profiling of host response to CoVID-19 and malaria infection could help in the design of a combined molecular diagnostic tool.

Human Gut Microbiome Investigation within Colorectal Cancer Patients Using Shotgun Sequencing Approach

Soumaya Jbara, Kasambula Arthur Shem, Bright Opoku Asante, Walid Baba, Meryem Jafari, and Sara Fadel

Colorectal cancer (CRC) incidence has been increasing worldwide during the last decade. It is the third most diagnosed cancer and the second leading cause of cancer-related deaths. In China, CRC rates have increased largely due to lifestyle and environmental factors, which severely affect the CRC-associated gut microbiota. Bacteria are often difficult to culture in the laboratory, which can complicate the profiling of taxa and the understanding of how communities function. Shotgun metagenomics therefore offers a means to study unculturable microorganisms that are otherwise difficult or impossible to investigate. Although 16S rRNA gene sequencing is the most widely used method to describe intestinal composition, it allows only taxonomic identification of bacterial communities. To understand gut microbiome function and the specific roles of the different bacterial populations, shotgun sequencing techniques make taxonomic investigation possible and permit exploration of the metabolic potential of the intestinal microbiota.

This page last reviewed on April 6, 2023