Protein Bioinformatics - M. Michael Gromiha

Protein Bioinformatics (eBook)

From Sequence to Function

M. Michael Gromiha (Author)

eBook Download: PDF | EPUB

2011 | 1st edition
339 pages
Elsevier Science (Publisher)
978-0-12-388424-4 (ISBN)

One of the most pressing tasks in biotechnology today is to unlock the function of each of the thousands of new genes identified every day. Scientists do this by analyzing and interpreting proteins, which are considered the task force of a gene. This single source reference covers all aspects of proteins, explaining fundamentals, synthesizing the latest literature, and demonstrating the most important bioinformatics tools available today for protein analysis, interpretation and prediction. Students and researchers of biotechnology, bioinformatics, proteomics, protein engineering, biophysics, computational biology, molecular modeling, and drug design will find this a ready reference for staying current and productive in this fast evolving interdisciplinary field.
- Explains all aspects of proteins including sequence and structure analysis, prediction of protein structures, protein folding, protein stability, and protein interactions
- Presents a cohesive and accessible overview of the field, using illustrations to explain key concepts and detailed exercises for students.

Michael is a frequent invited speaker to local conferences and universities in India and to international conferences focused on bioinformatics, computational biology and molecular biology. He maintains close connections with research and teaching colleagues in India and contributes to international publications including handbooks, encyclopedias and journals. He began his research on Computational Molecular Biophysics in 1989, earning the PhD in BioPhysics from Bharathidasan University, India. He gained his first Post Doctoral experience on DNA bending and protein-DNA interactions at International Center for Genetic Engineering and Biotechnology (ICGEB), Italy. He developed databases for proteins and computer simulation of protein-DNA interactions during his subsequent postdoc at The Institute of Physical and Chemical Research (RIKEN), Japan. At AIST he continues to focus on various aspects of protein bioinformatics.

Front cover
Protein Bioinformatics: From Sequence to Function
Copyright page
Contents
Foreword
Preface
Acknowledgments
Chapter 1: Proteins
1.1 Building blocks
1.2 Hierarchical representation of proteins
1.3 Structural classification of proteins
1.4 Databases for protein sequences
1.5 Protein structure databases
1.6 Literature databases
1.7 Exercises
References
Chapter 2: Protein Sequence Analysis
2.1 Sequence alignment
2.2 Programs for aligning sequences
2.3 Amino acid properties
2.4 Amphipathic character of α-helices and β-strands
2.5 Amino acid properties for sequence analysis
2.6 Exercises
References
Chapter 3: Protein Structure Analysis
3.1 Assignment of secondary structures
3.2 Computation of solvent accessibility
3.3 Representation of solvent accessibility
3.4 Residue–residue contacts
3.5 Amino acid clusters in protein structures
3.6 Contact potentials
3.7 Cation-π interactions in protein structures
3.8 Noncanonical interactions
3.9 Free energy calculations
3.10 Amino acid properties derived from protein structural data
3.11 Parameters for proteins
3.12 Protein structure comparison
3.13 Exercises
References
Chapter 4: Protein Folding Kinetics
4.1 Φ-value analysis
4.2 Folding nuclei and Φ-values
4.3 Relationship between amino acid properties and Φ-values
4.4 Φ-value analysis with hydrophobic clusters and long-range contact networks
4.5 Kinetic database for proteins
4.6 Prediction of protein folding rates
4.7 Relationship between Φ-values and folding rates
4.8 Exercises
References
Chapter 5: Protein Structure Prediction
5.1 Protein structural class
5.2 Secondary structure content
5.3 Secondary structural regions
5.4 Discrimination of transmembrane helical proteins and predicting their membrane-spanning segments
5.5 Discrimination of transmembrane strand proteins
5.6 Identification of membrane-spanning β-strand segments
5.7 Discrimination of disordered proteins and domains
5.8 Solvent accessibility
5.9 Inter-residue contact prediction
5.10 Protein tertiary structure prediction
5.11 Exercises
References
Chapter 6: Protein Stability
6.1 Determination of protein stability
6.2 Thermodynamic database for proteins and mutants
6.3 Relative contribution of noncovalent interactions to protein stability
6.4 Stability of thermophilic proteins
6.5 Analysis and prediction of protein mutant stability
6.6 Exercises
References
Chapter 7: Protein Interactions
7.1 Protein–protein interactions
7.2 Protein–DNA interactions
7.3 Protein–RNA interactions
7.4 Protein–ligand interactions
7.5 Quantitative structure activity relationship in protein–ligand interactions
7.6 Exercises
References
Appendix A
List of protein databases
List of protein Web servers
Index

Chapter 2

Protein Sequence Analysis

Publisher Summary

This chapter discusses protein sequence analysis. The analysis of protein sequences provides information about the preference of amino acid residues and their distribution along the sequences, which helps in understanding the secondary and tertiary structures of proteins and their functions. The identification of similar motifs in protein sequences helps to predict structurally or functionally important regions. The profiles obtained with single amino acid properties based on the amino acid sequence reveal the clustering of amino acids with similar properties. Amino acid sequences carry a lot of hidden information, which can be used for developing sequence-based prediction methods. The comparison of different amino acid sequences using alignment methods enhances our knowledge about the availability of similar sequences, and these sequences can be used as templates for protein three-dimensional structure prediction. The comparison of two proteins is mainly carried out by aligning the sequences or structures. In this method, a one-to-one correspondence is set up between the residues of the two proteins. The simplest case is the global alignment of two sequences, in which the two proteins have maintained a correspondence over the entire length. An alternative is the local alignment, in which the alignment is made only with the most similar part of the proteins.

The analysis of protein sequences provides information about the preference of amino acid residues and their distribution along the sequences, which helps in understanding the secondary and tertiary structures of proteins and their functions. The identification of similar motifs in protein sequences helps to predict structurally or functionally important regions. The profiles obtained with single amino acid properties based on the amino acid sequence reveal the clustering of amino acids with similar properties. Furthermore, the comparison of different amino acid sequences using alignment methods enhances our knowledge about the availability of similar sequences, and these sequences can be used as templates for protein three-dimensional structure prediction.

2.1 Sequence alignment

The comparison of two proteins is mainly carried out by aligning the sequences or structures. In this method, a one-to-one correspondence is set up between the residues of the two proteins. The simplest observation is the global alignment of two sequences, in which the two proteins have maintained a correspondence over the entire length. An alternative is the local alignment in which the alignment is made only with the most similar part of the proteins.

An alignment of two sequences A and B must obey the following conditions: (i) all residues should be used in the alignment and all should be in the same order, (ii) a residue from A can be aligned with a residue from B, (iii) a residue can be aligned with a blank (-), and (iv) two blanks cannot be aligned. The different ways of aligning two sequences, VEITGEIST and PRETERIT, are shown in Figure 2.1. From these alignments, one can estimate the score for each aligned position and hence the total score. The scoring scheme is as follows: (i) score = 1 if the residues at the same position of sequences A and B are the same (e.g., in Alignment 1 [Figure 2.1], both sequences A and B have E at position 3, and hence the position scores 1), (ii) score = 0 if the residues are different (e.g., at position 1, the residues are V and P in sequences A and B, respectively), and (iii) score = −1 if there is a gap in the alignment (e.g., positions 2 and 4 in Alignment 1). The sum of the scores over all positions gives the net score for the aligned sequences. In alignments, the positioning of residues with similar properties (e.g., Val and Ile are hydrophobic, Glu and Asp are negatively charged, etc.) is used to find similar sequences (Eidhammer et al. 2004).
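
The scoring scheme described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the book: the function name `alignment_score` is hypothetical, and the example rows were constructed to agree with the positions mentioned in the text (a mismatch at position 1, gaps at positions 2 and 4, a match at position 3) rather than copied from Figure 2.1.

```python
def alignment_score(row_a, row_b, match=1, mismatch=0, gap=-1):
    """Sum the per-position scores of two already-aligned rows.

    A blank is written as '-'. Defaults follow the scheme in the text:
    identical residues score 1, different residues 0, a gap -1.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    total = 0
    for res_a, res_b in zip(row_a, row_b):
        if res_a == "-" and res_b == "-":
            # Condition (iv): two blanks cannot be aligned.
            raise ValueError("two blanks cannot be aligned")
        if res_a == "-" or res_b == "-":
            total += gap
        elif res_a == res_b:
            total += match
        else:
            total += mismatch
    return total


# Illustrative alignment of VEITGEIST and PRETERIT (an assumption, see above).
row_a = "V-EITGEIST"
row_b = "PRE-TERIT-"
print(alignment_score(row_a, row_b))
```

For these rows there are three matches, three gaps, and four mismatches, so the net score is 3 − 3 + 0 = 0.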

Figure 2.1 Sequence alignment and scoring schemes for two typical sequences: score = 1 for the same residue (shown in boxes), score = 0 for different residues, and score = −1 for a gap.
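
Rather than scoring candidate alignments one at a time, the best global score under this scheme can be found by dynamic programming. The sketch below uses the standard Needleman-Wunsch recurrence, which this excerpt does not name, so treat it as one possible approach rather than the book's method. The function name `global_alignment_score` is hypothetical, and it uses a linear gap penalty of −1 per blank.

```python
def global_alignment_score(seq_a, seq_b, match=1, mismatch=0, gap=-1):
    """Best global alignment score of two sequences (Needleman-Wunsch).

    Only the previous row of the dynamic-programming table is kept,
    so memory use grows with len(seq_b) rather than with the product.
    """
    # Row 0: a prefix of seq_b aligned entirely against blanks.
    prev = [j * gap for j in range(len(seq_b) + 1)]
    for i, res_a in enumerate(seq_a, start=1):
        # Column 0: a prefix of seq_a aligned entirely against blanks.
        curr = [i * gap]
        for j, res_b in enumerate(seq_b, start=1):
            diagonal = prev[j - 1] + (match if res_a == res_b else mismatch)
            blank_in_b = prev[j] + gap      # res_a aligned with a blank
            blank_in_a = curr[j - 1] + gap  # res_b aligned with a blank
            curr.append(max(diagonal, blank_in_b, blank_in_a))
        prev = curr
    return prev[-1]


print(global_alignment_score("VEITGEIST", "PRETERIT"))
```

The function returns only the score. Recovering the alignment itself would require keeping the full table and tracing back from the last cell.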

2.2 Programs for aligning sequences

Several computer programs have been developed for estimating the similarity score of two sequences and for finding similar sequences from available databases using pairwise and multiple alignments.

2.2.1 Basic Local Alignment Search Tool (BLAST)

Altschul et al. (1990) developed an approach for rapid sequence comparison, the basic local alignment search tool (BLAST), which directly approximates alignments that optimize a measure of local similarity, the maximal segment pair score. This algorithm has been applied in a variety of contexts, including straightforward DNA and protein sequence database searches, motif searches, gene identification searches, and the analysis of multiple regions of similarity in long DNA sequences. In this method, the query protein sequence can be searched against several databases, including the nonredundant structures available in PDB, protein sequences at SWISS-PROT, etc. Furthermore, BLAST has several features such as (i) identifying protein sequences similar to the query, (ii) finding members of a protein family or building a custom position-specific scoring matrix, (iii) finding proteins similar to the query around a given pattern, (iv) finding conserved domains in the query, and (v) searching for peptide motifs. BLAST is available at http://www.ncbi.nlm.nih.gov/BLAST/. An example of identifying protein sequences similar to the query is shown in Figure 2.2. BLAST has several options for querying a sequence:

Figure 2.2 Retrieval of similar sequences using BLAST: (a) the input page showing the query sequence and other options, (b) the sequences that are showing high sequence identity with the query sequence, and (c) the sequence alignment of the two homologous sequences.

(i) Accepts the sequence as an accession number, as a gi identifier, or in FASTA format. The input data can be given by copying and pasting the details directly on the Web or by uploading a file from a local computer. The accession number is the number allotted in UniProt for each sequence (e.g., P61626); gi is a bar-separated NCBI sequence identifier (e.g., gi|48428995). A sequence in FASTA format begins with a single-line description, followed by lines of sequence data. The description line is distinguished from the sequence data by a greater-than (“>”) symbol at the beginning. An example sequence in FASTA format is given below:
>gi|48428995|sp|P61626.1|LYSC_HUMAN RecName: Full=Lysozyme C
MKALIVLGLVLLSVTVQGKVFERCELARTLKRLGMDGYRGISLANWMCLAKWESGYNTRATNYNAGDRSTDYGIFQINSRYWCNDGKTPGAVNACHLSCSALLQDNIADAVACAKRVVRDPQGIRAWVAWRNRCQNRDVRQYVQGCGV
The complete amino acid sequence in FASTA format has been provided in Figure 2.2a. It is also possible to specify a fragment of the sequence by providing a sub-range of the query sequence.

(ii) Allows selecting from a database to search against the input sequence. The nonredundant protein sequences (nr) have been selected as the database in Figure 2.2a.

(iii) The algorithm of the program can be selected and, for finding similar sequences, BLASTP is used.

(iv) Several parameters can be adjusted, including the maximum number of aligned sequences to display, the expect threshold, and the word size. The expect threshold (E-value) is the expected number of chance matches in a random model, and it is set at 10 as the default value. Word size is the length of the seed that initiates an alignment. In addition, scoring parameters can be selected for the matrix, gap cost, and compositional adjustments. The substitution matrix, which assigns a score for aligning any possible pair of residues, is a key element in evaluating the quality of a pairwise sequence alignment. Generally BLOSUM62 is used as the substitution matrix; it is a 20 × 20 matrix covering all possible substitutions among the 20 amino acid residues (Table 2.1). It is based on a likelihood method that estimates the occurrence of each possible pairwise substitution, and its scores reflect the biochemical character of amino acid residues (aliphatic, aromatic, positively charged, negatively charged, polar, sulfur containing, etc.; see Figure 1.2). The development of BLOSUM62 has been described in Eddy (2004). The gap cost is the cost of creating and extending a gap in an alignment. Furthermore, options are available to filter low-complexity regions and to mask the query and lowercase letters in the sequence.
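
Some of the query options listed above can be illustrated with a short Python sketch: reading a FASTA record, taking a sub-range of the query, and listing the overlapping words of a given word size. The helper names (`parse_fasta`, `sub_range`, `seed_words`) are hypothetical. The sub-range is assumed to be 1-based and inclusive, and the word size of 3 is chosen only for illustration. The lysozyme sequence is truncated to its first 30 residues for brevity. How BLAST itself selects and extends seeds is not shown.

```python
def parse_fasta(text):
    """Return a list of (description, sequence) tuples from FASTA text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new description line closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def sub_range(sequence, start, end):
    """Fragment of the query, assuming 1-based inclusive coordinates."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("sub-range lies outside the sequence")
    return sequence[start - 1:end]


def seed_words(sequence, word_size=3):
    """All overlapping words of length word_size, in order of position."""
    return [sequence[i:i + word_size]
            for i in range(len(sequence) - word_size + 1)]


# First 30 residues of the lysozyme C example, split over two lines.
QUERY = """>gi|48428995|sp|P61626.1|LYSC_HUMAN RecName: Full=Lysozyme C
MKALIVLGLVLLSVT
VQGKVFERCELARTL
"""

description, sequence = parse_fasta(QUERY)[0]
fragment = sub_range(sequence, 1, 5)
print(description)
print(fragment, seed_words(fragment))
```

For the five-residue fragment MKALI and a word size of 3, the words are MKA, KAL, and ALI. A sequence of length n yields n − w + 1 overlapping words of size w.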

Table 2.1

Blosum62...

Publication date (per publisher)	21 April 2011
Language	English
Subject areas	Mathematics / Computer Science ► Computer Science ► Theory / Studies
	Medicine / Pharmacy ► General / Reference Works
	Natural Sciences ► Biology ► Biochemistry
	Natural Sciences ► Biology ► Genetics / Molecular Biology
	Technology ► Medical Technology
	Technology ► Environmental Engineering / Biotechnology
ISBN-10	0-12-388424-1 / 0123884241
ISBN-13	978-0-12-388424-4 / 9780123884244
