Analytics of Biological Sequence Data

Wong Wing Cheong
Principal Investigator

Tantoso Erwin
Senior Bioinformatics Specialist

Tay Wei Hong
Research Manager (Software)

Lee Chee Lam Joanne
Research Officer

Richard Hamming’s philosophy of scientific computation, 'The purpose of computing is insight, not numbers', is very apt in the field of bioinformatics. Though bioinformatics is not a replacement for life-science experiments, it nevertheless plays an important role in targeting and cordoning off a manageable solution space for biologists to focus on.

The Analytics of Biological Sequence Data (ABSD) group started in August 2014 and focuses on the development of computational and statistical methodology that aims to improve the biological interpretation of sequence-based data, working in collaboration with clinicians and biologists in domains ranging from immunology to cancer. Since October 2017, our team has been officially deployed at the SingHealth Duke-NUS Institute of Precision Medicine (PRISM) under joint appointments to foster strong collaborations between SingHealth and A*STAR under the MOU established in December 2017.

Analytics of Multi-OMICs Data Towards Unraveling Molecular Mechanisms

Over the past decade, the OMICs frenzy, from arrays to sequencing, has swamped genomic research with voluminous amounts of data and produced lists of candidate genes/proteins. Yet many of these genes/proteins remain poorly understood in relation to the observed phenotype.

Function annotation: The major gap in our understanding stems from our incomplete knowledge of gene/protein functions, which in turn impedes researchers from assembling the sets of biomolecular mechanisms that can sufficiently explain the observed phenotype in these OMICs experiments. Knowledge of gene/protein functions is a necessary premise for delivering on the big promises of personalized medicine. Even so, experimental characterization of gene/protein function still receives insufficient attention nowadays, as attested by the dwindling number of newly characterized genes/proteins reported over the past decade. Sadly, one can expect this number to continue to grow only slowly. On this basis, the only viable approach is to computationally transfer the function annotation of well-studied gene/protein sequences to the less studied or novel ones for functional hints. To this end, our group has developed novel concepts and methods in sequence annotation, such as TMSOC, dissectHMMER and xHMMER3x2, over the years.

Platform bias correction: Meanwhile, bias in gene profiling with current OMICs technology is often ignored during the actual analysis. In particular, earlier work on modelling transcript abundance in organisms from vertebrates to lower eukaryotes has specifically singled out Zipf's law. But the observed distributions often deviate from a single power-law slope. If transcript abundance is truly power-law distributed, the varying exponent signifies changing mathematical moments (e.g., mean, variance) and creates heteroskedasticity, which compromises statistical rigor in the analysis.
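To illustrate why a shifting exponent matters, here is a minimal Python sketch. It is not the group's method: it draws synthetic values from a continuous power law, recovers the exponent with the standard maximum-likelihood estimator, and prints the theoretical mean for two exponents. The function names, the sample size and the exponents 2.5 and 3.5 are all hypothetical choices made for illustration.

```python
import math
import random


def sample_power_law(alpha, x_min, n, rng):
    """Draw n synthetic values from p(x) ~ x^(-alpha) for x >= x_min (alpha > 1)."""
    # Inverse-transform sampling: x = x_min * (1 - u)^(-1 / (alpha - 1)).
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def estimate_exponent(values, x_min):
    """Maximum-likelihood estimate of alpha from the values at or above x_min."""
    tail = [v for v in values if v >= x_min]
    log_sum = sum(math.log(v / x_min) for v in tail)
    return 1.0 + len(tail) / log_sum


def theoretical_mean(alpha, x_min):
    """Mean of the continuous power law; it is finite only when alpha > 2."""
    if alpha <= 2.0:
        return math.inf
    return x_min * (alpha - 1.0) / (alpha - 2.0)


# Hypothetical exponents: the same x_min gives very different means.
rng = random.Random(0)
for alpha in (2.5, 3.5):
    sample = sample_power_law(alpha, 1.0, 20000, rng)
    print(alpha, round(estimate_exponent(sample, 1.0), 2), theoretical_mean(alpha, 1.0))
```

Under these assumptions the mean depends directly on the exponent, and the variance is finite only for exponents above 3. Samples whose abundance distributions have different slopes therefore cannot be treated as having a common mean or variance.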

Clinical Translational Informatics

Since 2015, the team has directed its main effort toward the software/pipeline development of an OMICs management system called the Translational Informatics Management System (TIMS). TIMS is built on open-source software such as Java, PostgreSQL and R, and is hosted on BII’s own ISO 27001-certified network. The main purpose of TIMS is to help our clinician partners preprocess, store and secure clinical data. In the long run, TIMS is positioned for data sharing and cross-cohort analysis within large-scale research consortia. The pilot launch of TIMS successfully supported a Bayer/NCC/CTRAD collaboration, which was covered in a feature story from the Bayer Innovation Center Singapore. Then, under a joint partnership in 2016, TIMS was deployed at the SingHealth Duke-NUS Institute of Precision Medicine (PRISM) to focus on bioinformatics activities in the areas of data management and data-integrity improvement for clinical entities. This joint partnership resulted in one of the themes, namely Big Data and Precision Medicine (see Figure 1), covered under the SingHealth-A*STAR MOU agreement in December 2017.

Figure 1

Analytics of Biological Sequence Data Members

Dr. WONG Wing Cheong, Principal Investigator
Mr. TANTOSO Erwin, Bioinformatics Specialist
Ms. LEE Chee Lam, Joanne, Research Officer
Mr. TAY Wei Hong, Software Architect

This section is still a work in progress.