Research Details
 
Protein Sequence Analysis Group
 
Although the sequencing of the human and other genomes was completed almost a decade ago, we find ourselves in the exciting situation that only a subset of the identified genes is functionally well characterized. Our group has embarked on the journey to map the functions of the many remaining uncharacterized genes and of the proteins they encode. Furthermore, it is not sufficient to look at the individual components of complex biological systems in isolation; their interactions with each other must also be considered to better understand how our cells, organs and bodies work.

We predict the various aspects of molecular and cellular function (enzymatic activities, posttranslational modifications, cleavage, translocation signals, 3D structures, pathway relationships, etc.) based primarily on protein sequence analysis and the analysis of other sequence-associated data. Our work is strongly facilitated by the ANNOTATOR group, which provides a software service for efficient sequence-analytic workflows. Finally, the resulting hypotheses about protein function are either followed up by experimental collaborators or validated in the Division's own protein biochemical laboratory.
 
Our team focuses on the following areas of research:
 
  1. Protein sequence analysis and studies of genome-associated data
 
  Embedded in the life science cluster Biopolis, we are open to requests from local and international collaborators. Interaction with a bioinformatics group like ours can have a dramatic impact on the research and performance of experimental labs. Our analyses and predictions often complement the experimental work, trigger new experiments, or speed up findings of biological and medical importance, which can be critical in competitive fields. Typical projects can be categorized as:
 
  1.1  Function prediction for selected targets (single uncharacterized genes/proteins)
 
  Example scenario: A collaborating lab works on a protein that produces a certain phenotype but the molecular function and mechanism leading to the phenotype are not known. They give the sequence and associated information to us.

First, we run a set of established bioinformatics tools that detect low-complexity, coiled-coil and transmembrane regions, hits to known domains, short motifs for posttranslational modifications, translocation signals, etc. This standard task can be executed, and its results visualized, through a unified interface provided by the in-house developed ANNOTATOR software. Next, we search for evolutionary relationships and distant homologues through intelligent ANNOTATOR-based workflows, resulting in a multiple sequence alignment and a phylogenetic tree (see Figure 1). We also try to predict the 3D structure using consensus methods and estimate the importance of individual residues from their conservation (see the colored surface residues on the predicted structure in Figure 1).
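
The idea of estimating residue importance from conservation can be illustrated with a minimal Python sketch. It scores each column of a multiple sequence alignment by its normalized Shannon entropy. This is only one common, simple conservation measure and is not necessarily the one used in the ANNOTATOR workflows; the function name `column_conservation` and the toy alignment are assumptions made for illustration.

```python
import math
from collections import Counter


def column_conservation(alignment):
    """Return one conservation score per alignment column, in [0, 1].

    Score = 1 - H / log2(20), where H is the Shannon entropy of the
    residues observed in the column. Gap characters ('-') are ignored.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    max_entropy = math.log2(20)  # 20 standard amino acids
    scores = []
    for i in range(length):
        residues = [seq[i] for seq in alignment if seq[i] != "-"]
        if not residues:
            scores.append(0.0)  # all-gap column: no evidence of conservation
            continue
        total = len(residues)
        counts = Counter(residues)
        entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
        scores.append(1.0 - entropy / max_entropy)
    return scores


# Hypothetical toy alignment (not real data): columns 1 and 3 are invariant.
toy_alignment = [
    "MKC-L",
    "MRCAL",
    "MKCGL",
    "MHC-I",
]
scores = column_conservation(toy_alignment)
```

Invariant columns score 1.0 and more variable columns score lower. In practice such scores would be mapped onto the residues of a predicted structure, and a production measure would typically also weight sequences by similarity and account for amino acid substitution patterns.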
 
 
Figure 1
 
  Finally, the most creative task for our team is to interpret all these results, scour the literature and put the puzzle pieces together to formulate a hypothesis about the function of the protein (or at least parts of it) and to discuss it in the greater context of pathways, biological mechanisms and phenotypes. If no straightforward prediction can be made, we may have to create new tools and workflows or modify existing ones to suit our needs.
 
  Classical example of such a successful project:
Rea S, Eisenhaber F, O'Carroll D, Strahl BD, Sun ZW, Schmid M, Opravil S, Mechtler K, Ponting CP, Allis CD, Jenuwein T.
Regulation of chromatin structure by site-specific histone H3 methyltransferases.
Nature. 2000 Aug 10;406(6796):593-9.
PMID: 10949293
 
  1.2  Function prediction for target sets (multiple genes/proteins)
 
  Example scenario: A collaborating lab launches a large-scale screen (expression profiling, proteomic mass spectrometry, mutation screens, etc.) and needs support in the analysis. They give their sets of genes/proteins and associated data to us.

We can help to evaluate statistically significant over- and underrepresentation across samples/experiments, identify set-specific biomarkers, map the target lists to biological pathways and gene ontology annotations, and analyze domain and other feature compositions.
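
As an illustration of how overrepresentation of an annotation in a target set can be evaluated, the following Python sketch computes a one-sided hypergeometric tail probability. This is a standard choice for such enrichment tests, but the text above does not state which test the group actually uses. The function name and the example counts are assumptions made for illustration.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, hits):
    """Probability of seeing at least `hits` annotated genes in the sample.

    population: number of genes in the background set
    annotated:  background genes carrying the annotation (e.g. a GO term)
    sample:     size of the target list drawn from the background
    hits:       target-list genes carrying the annotation
    """
    if not (0 <= annotated <= population and 0 <= sample <= population):
        raise ValueError("inconsistent counts")
    lower = max(hits, 0)
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the hypergeometric probabilities over the upper tail k >= hits.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(lower, upper + 1)
    )
    return tail / total


# Hypothetical counts: 10 background genes, 5 annotated, target list of 2,
# both of which carry the annotation.
p_value = hypergeom_enrichment_p(population=10, annotated=5, sample=2, hits=2)
```

Underrepresentation would be assessed analogously with the lower tail. When many annotations are tested at once, the resulting p-values need a multiple-testing correction before being interpreted.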
 
  Example of such a successful project:
Hackl H, Burkard TR, Sturn A, Rubio R, Schleiffer A, Tian S, Quackenbush J, Eisenhaber F, Trajanoski Z.
Molecular processes during fat cell development revealed by gene expression profiling and functional annotation.
Genome Biol. 2005;6(13):R108. Epub 2005 Dec 19.
PMID: 16420668
 
  1.3  Finding candidate genes/proteins for known biochemical activities or cell biological phenotypes
 
  Example scenario: A collaborating lab is working on a biochemical activity or observes a special phenotype without knowing the genes/proteins responsible at the molecular level. They give all associated information to us.

We start with the whole proteome of the organism in question and remove unlikely candidates step by step through rational filters, which can include criteria such as similarity to a certain functional protein family, specific subcellular localization, expression in certain tissues or cell types, etc. Ideally, the resulting list of candidates is small enough to be analyzed in detail by the requesting experimental lab.
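
The stepwise filtering described above can be sketched in Python as a cascade of predicates applied to a proteome, with the number of surviving candidates logged after each step. The `Protein` record, its fields, the filter criteria and the toy entries are all hypothetical and serve only to illustrate the principle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Protein:
    """Hypothetical record holding the annotations used for filtering."""
    accession: str
    families: frozenset
    localization: str
    tissues: frozenset


def apply_filters(proteome, filters):
    """Apply (name, predicate) filters in order.

    Returns the surviving candidates and a log of how many proteins
    remained after each step.
    """
    candidates = list(proteome)
    log = [("start", len(candidates))]
    for name, keep in filters:
        candidates = [p for p in candidates if keep(p)]
        log.append((name, len(candidates)))
    return candidates, log


# Toy proteome with invented annotations (not real data).
toy_proteome = [
    Protein("P1", frozenset({"hydrolase"}), "cytoplasm", frozenset({"adipose"})),
    Protein("P2", frozenset({"hydrolase"}), "nucleus", frozenset({"adipose"})),
    Protein("P3", frozenset({"kinase"}), "cytoplasm", frozenset({"adipose"})),
    Protein("P4", frozenset({"hydrolase"}), "cytoplasm", frozenset({"brain"})),
]
filters = [
    ("family: hydrolase", lambda p: "hydrolase" in p.families),
    ("localization: cytoplasm", lambda p: p.localization == "cytoplasm"),
    ("expressed in adipose", lambda p: "adipose" in p.tissues),
]
candidates, log = apply_filters(toy_proteome, filters)
```

Keeping the per-step counts makes it easy to see which criterion removes the most candidates and to relax a filter if the final list turns out to be empty.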
 
  Example of such a successful project:
Zimmermann R, Strauss JG, Haemmerle G, Schoiswohl G, Birner-Gruenberger R, Riederer M, Lass A, Neuberger G, Eisenhaber F, Hermetter A, Zechner R.
Fat mobilization in adipose tissue is promoted by adipose triglyceride lipase.
Science. 2004 Nov 19;306(5700):1383-6.
PMID: 15550674
 
  2.  Prediction of functional motifs in non-globular regions
 
  Given the huge number of otherwise uncharacterized protein sequences, computer-aided prediction of posttranslational modifications (PTMs) and translocation signals from the amino acid sequence becomes a necessity. We have contributed to this multi-faceted, worldwide effort by developing predictors for GPI lipid anchor sites, for N-terminal N-myristoylation sites, for farnesyl and geranylgeranyl anchor attachment, and for the PTS1 peroxisomal signal. Although the substrate sequence signals for the various PTMs and translocation systems differ dramatically, we found that their principal architecture is similar in all cases studied (see Figure 2).
 
 
Figure 2
 
  Typically, a small stretch of amino acid residues is buried in the catalytic cleft of the protein-modifying enzyme (or the binding site of the transporter). This stretch interacts most intensely with the enzyme, and its sequence variability is the most restricted. It is surrounded by linker segments that connect the enzyme-bound part with the rest of the substrate protein. The linker residues tend to be small and polar, with a flexible backbone. Because of the mechanistic requirements of binding to the enzyme, we suggest that most PTM sites are necessarily embedded in intrinsically disordered regions (except for autocatalytic PTMs, PTMs executed in the unfolded state, or non-enzymatic PTMs), and this issue requires consideration in structural studies of proteins with complex architecture. Surprisingly, some proteins carry sequence signals for posttranslational modification or translocation that remain hidden in the normal biological context but can become fully functional under certain conditions.

We continue to develop predictors for short sequence motifs based on sequence, physical property and, if available, structural information.
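
The two-part architecture described above (a restricted core motif flanked by a linker of small, polar residues) can be illustrated with a deliberately simplified Python sketch. It looks for a textbook C-terminal CaaX prenylation pattern (cysteine, two aliphatic residues, any residue) and reports how linker-like the preceding residues are. This is not the group's predictor: the regular expression, the set of residues treated as linker-like, the linker length and the toy sequences are all assumptions made for illustration.

```python
import re

# Simplified CaaX core: C, two aliphatic residues (here A/V/I/L), any residue,
# anchored at the C-terminus. Real motifs are more permissive.
CAAX_PATTERN = re.compile(r"C[AVIL]{2}[A-Z]$")

# Assumed set of small and/or polar residues typical of flexible linkers.
LINKER_LIKE = set("GASTNQDEKRHP")


def scan_c_terminal_motif(sequence, linker_length=10):
    """Return core motif and upstream linker composition, or None if no core."""
    seq = sequence.upper()
    match = CAAX_PATTERN.search(seq)
    if match is None:
        return None
    core_start = match.start()
    linker = seq[max(0, core_start - linker_length):core_start]
    if linker:
        fraction = sum(residue in LINKER_LIKE for residue in linker) / len(linker)
    else:
        fraction = 0.0
    return {"core": match.group(), "linker": linker, "linker_fraction": fraction}


# Invented toy sequences (not real proteins).
result = scan_c_terminal_motif("MAAAGSGKKSDGSTKCVLS")
no_hit = scan_c_terminal_motif("MKTAYIAKQR")
```

A pattern match alone produces many false positives; a realistic predictor would score the core and linker positions with position-specific and physical-property terms and calibrate the score against known substrates.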
 
  Example of such a successful project:
Maurer-Stroh S, Eisenhaber F.
Refinement and prediction of protein prenylation motifs.
Genome Biol. 2005;6(6):R55. Epub 2005 May 27.
PMID: 15960807
 