Thu, 05 Apr 2018
Long Read Metagenomics

Time: 10.30 AM - 11.30 AM

Venue: Cysteine, Level 7 (30 Biopolis Street, Matrix)

Speaker: Prof. Daniel Huson, Algorithms in Bioinformatics, University of Tuebingen and Visiting Professor, Life Sciences Institute, NUS


There is increasing interest in using long read sequencing techniques, as provided by Oxford Nanopore and PacBio, in the context of environmental sequencing. With read lengths ranging up to hundreds of kilobases, the assembly of complete genomes from metagenomic sequencing data will be a much easier problem.

In long read datasets from mixed communities, read coverage of rare taxa will often be low and many reads will be singletons that do not overlap with any other reads. Singleton reads cannot be corrected in the usual way by comparison to overlapping reads or shared k-mers. Thus, coding sequence prediction will perform poorly on uncorrected long reads when there is a high rate of indel sequencing errors.
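As a toy illustration of why indel errors hurt coding sequence prediction, the sketch below translates a short made-up DNA sequence before and after a single-base deletion. The sequence, the `translate` helper and the deletion position are invented for this example and are not taken from the talk; only the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, frame=0):
    """Translate dna in the given forward frame (0, 1 or 2); '*' marks a stop."""
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)


# Hypothetical error-free read covering a tiny coding sequence.
read = "ATGGCCAAGCTGACCGAATAA"

# The same read with one base deleted (index 6), mimicking an indel error.
mutated = read[:6] + read[7:]

print(translate(read))        # intact protein
print(translate(mutated))     # frame 0 now hits a premature stop
print(translate(mutated, 2))  # the downstream residues reappear in frame 2
```

After the deletion, the remainder of the protein is only readable in a different frame. A frame-shift aware aligner can follow the protein across such a frame change, whereas a gene predictor that assumes one continuous frame cannot.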

One way to overcome these problems is to analyze long reads by performing frame-shift aware DNA-to-protein alignment against a database of protein reference sequences (for example, NCBI-nr). Such alignments can be computed by LAST (Kiełbasa et al, 2011) and, as of its latest release, also by DIAMOND (Buchfink et al, 2015).

We will present a computationally efficient and easy-to-use pipeline for the analysis of long reads from mixed communities that is based on DIAMOND and MEGAN-LR (MEGAN Long Read, Huson et al, under review). We will illustrate its use on a number of Nanopore sequencing datasets obtained from bioreactors at SCELSE and on a published PacBio mock community dataset (Singer et al, 2016).

This is joint work with Rohan Williams at SCELSE-NUS.

About The Speaker
Daniel Huson holds the Chair of Algorithms in Bioinformatics at the University of Tuebingen and is currently a Visiting Professor at the Life Sciences Institute of the National University of Singapore. He studied mathematics at Bielefeld University, obtaining a PhD in 1990 and Habilitation in 1997. As a postdoc, he worked with Tandy Warnow at UPenn and Princeton University from 1997 to 1999. He joined Celera Genomics Corp. in 1999 to work as a senior staff scientist in Gene Myers' group and is one of the authors of Celera's human genome paper. He moved to the University of Tuebingen in 2002.

His work is focused on the design and implementation of algorithms to address questions in phylogenetics, genomics and microbiome analysis. He is the author of a number of popular tools, including SplitsTree, Dendroscope, MALT and MEGAN.

Dr. Sebastian Maurer-Stroh, Senior Principal Investigator / Programme Director Human Infectious Diseases.

