Abstract
SDAP (Structural Database of Allergenic Proteins) is a web server that provides rapid, cross-referenced access to the sequences, structures and IgE epitopes of allergenic proteins. The SDAP core is a series of CGI scripts that process the user queries, interrogate the database, perform various computations related to protein allergenic determinants and prepare the output HTML pages. The database component of SDAP contains information about the allergen name, source, sequence, structure, IgE epitopes and literature references and easy links to the major protein (PDB, SWISS-PROT/TrEMBL, PIR-ALN, NCBI Taxonomy Browser) and literature (PubMed, MEDLINE) on-line servers. The computational component in SDAP uses an original algorithm based on conserved properties of amino acid side chains to identify regions of known allergens similar to user-supplied peptides or selected from the SDAP database of IgE epitopes. This and other bioinformatics tools can be used to rapidly determine potential cross-reactivities between allergens and to screen novel proteins for the presence of IgE epitopes they may share with known allergens. SDAP is available via the World Wide Web at http://fermi.utmb.edu/SDAP/.
INTRODUCTION
Allergic diseases, including allergic rhinitis, asthma and atopic dermatitis, are among the most common chronic health problems (1). As recombinant proteins, products of the genomic revolution are introduced into foods, medications and other products of our daily life, distinguishing allergens from other proteins becomes a pressing issue (2). This need was recently illustrated by the concern over possible allergenic effects of Starlink corn (3). The sequences and structures of many allergenic proteins have been determined. Most of these proteins can be grouped into relatively few families (4,5), suggesting that they share characteristics that contribute to their ability to bind IgE and trigger an allergic response (6,7). The web server SDAP (Structural Database of Allergenic Proteins) was created to aid in identifying these sequence commonalities. Computational and statistical tools to analyse sequences and structures, assessable within SDAP, have been designed to develop correlations that can be used to predict allergenicity of novel proteins and cross-reactivity between allergens.
The computational approach is briefly depicted in Figure 1. The sequences, 3D structures and IgE epitopes of known allergens, collected from databases and literature, are included in lists in the server. For allergens where no experimental structure has been determined, models will be computed with our MASIA/EXDIS/DIAMOD/FANTOM suite of programs (8–17).
Figure 1.
SDAP combines information from many sources and computational tools to rapidly determine potential cross-reactions among allergens and the allergenicity of novel proteins.
Using this database and the computational tools from SDAP we will develop predictive models for IgE epitopes. This will allow a user to compare not just sequence or property similarity of a possible epitope homologue, but also factors such as local structure and relative surface exposure. SDAP is available at http://fermi.utmb.edu/sdap/.
DATABASE STRUCTURE
SDAP is designed as a web server (Fig. 2) controlled by a set of CGI scripts. These scripts mediate interaction with the user, the database and the computational tools. The database component of SDAP is implemented with MySQL under Linux. The information is collected in tables according to: allergen type; species; systematic name; brief description; sequence accession numbers from SWISS-PROT/TrEMBL, PIR-ALN, NCBI Taxonomy Browser and, where available, PDB. Sequences and IgE epitopes are collected into text files. The current lists of allergens were assembled from literature and from major sequence [SWISS-PROT/TrEMBL (18), PIR-ALN (19) and NCBI Taxonomy Browser (20)] and structure [PDB (21)] databases, guided by the list of allergen names from the IUIS website, http://www.allergen.org. As there is no public database summarizing information on known epitopes of allergenic proteins, the IgE epitope list in SDAP is based solely on primary literature sources. In its present version, SDAP allows searches restricted to the following fields: allergen name (according to the IUIS website listing), scientific and common name for the species and general source of the allergens. SDAP is the first allergen database that allows a user to retrieve IgE epitopes and identify similar regions in other allergenic proteins.
Figure 2.
Structural blocks and main functions of SDAP.
COMPUTATIONAL TOOLS
Special programs have been incorporated in SDAP to compare the sequences and structures in the database. In the present release, users can compare a given peptide sequence to all the sequences in SDAP, using either an exact match search or a similarity search based on physicochemical properties (22). The SDAP peptide exact match function is useful to identify allergens closely related to a peptide, for example when an IgE epitope may be responsible for clinically-defined cross-sensitivities to several allergens.
One use of SDAP is to quickly identify cross-reactivities between known allergens. For example, an exact match search using the Pen i 1 IgE epitope MQQLENDLDQVQESLLK from shrimp topomyosin identified the same sequence in the allergens Met e 1 (from another shrimp species) and Pan s 1 (lobster), consistent with the clinically observed cross-reactivity among these crustaceans (see Table 1). However, sequence identity, even among know cross-reactive allergens, is rare. To identify more distantly related sequences, the user can access a tool that uses the amino acids descriptors E1–E5 (22) to locate sequences with similar chemical properties. The vectors were derived by multidimensional scaling of 237 physical–chemical properties for all 20 naturally occurring amino acids. The mathematical procedure used ensures that the main variations of all 237 properties for the 20 amino acids are reflected by the five descriptors E1–E5. Using the E1–E5 descriptors, the similarity between two sequences A and B, each one consisting of N residues, is computed with the property distance function PD (23):
Table 1. SDAP search results for allergens that contain regions similar to the Pen i 1 IgE epitope MQQLENDLDQVQESLLK.
NCBI Taxonomy Browser Accession | Allergen | Species | Seq. Len. | Position | Sequence | PD |
---|---|---|---|---|---|---|
TPM1_METEN | Met e 1 | Metapenaeus ensis, shrimp | 274 | 40–56 | MQQLENDLDQVQESLLK | 0 |
TPM1_PANST | Pan s 1 | Panulirus stimpsoni, lobster | 274 | 40–56 | MQQLENDLDQVQESLLK | 0 |
TPM_LEPDS | Lep d 10 | Lepidoglyphus destructor, storage mite | 284 | 50–66 | IQQIENELDQVQESLTQ | 3.345 |
TPM1_HOMAM | Hom a 1 | Homarus americanus, American lobster | 284 | 50–66 | MQQVENELDQVQEQLSL | 4.731 |
TPM_PERAM | Per a 7 | Periplaneta Americana, American cockroach | 284 | 50–66 | IQQIENDLDQTMEQLMQ | 5.062 |
BAA04557 | Der f 10 | Dermatophagoides farinae, American house dust mite | 299 | 65–81 | IQQIENELDQVQEQLSA | 5.074 |
TPM_DERPT | Der p 10 | Dermatophagoides pteronyssinus, European house dust mite | 284 | 50–66 | IQQIENELDQVQEQLSA | 5.074 |
TPM1_CHAFE | Cha f 1 | Charybdis feriatus, crab | 264 | 50–66 | MQQVENELDQAQEQLSA | 5.389 |
AF216518_1 | Hal d 1 | Haliotis diversicolor, abalone | 284 | 50–66 | CANLENDFDNVNEQLQE | 6.490 |
TPMM_ANISI | Ani s 3 | Anisakis simplex | 284 | 50–66 | MMQTENDLDKAQEDLST | 6.661 |
JE0229 | Tur c 1 | Batillus cornutus | 146 | 1–17 | AANLENDFDNVNEQLQD | 6.710 |
where λj is the eigenvalue of the jth E component, Ej(Ai) is the Ej value for the amino acid in the ith position from sequence A, and Ej(B)i is the Ej value for the amino acid in the ith position from sequence B.
The SDAP tool calculates the PD similarity index between the query sequence and each sequence-window with the same length from all allergens collected in the SDAP protein database. The search result is a list of similar sequences identified in allergenic proteins, presented in decreasing order of similarity (increasing PD) with the query sequence. Besides epitope identification, this tool can be used to find conserved regions in the allergens from the SDAP protein database.
The results of a sequence similarity search for the above Pen i 1 IgE epitope are shown in Table 1. Besides the two exact matches with Met e 1 and Pan s 1, similar regions were found in allergenic tropomyosins from insects (storage mite, American cockroach, American house dust mite and European house dust mite), lobster, crab, abalone and snail. Clinical tests with sera from sensitive patients indicate that the cross-reactivity between crustacean, mollusk and insect allergens is mediated by tropomyosin (24,25). Sensitization to house dust mites had been linked to oral allergic response to snails (7) and allergen immunotherapy with European house dust mite (Dermatophagoides pteronyssinus) may trigger severe reactions to mollusks and crustacea (26). These results demonstrate the utility of SDAP in identifying potential cross-reactivity among allergens.
The significance of any sequence match can be determined from a histogram of the distribution of PD values for all the sequences in SDAP to the shrimp epitope (Fig. 3). The sequences in Table 1 clearly have a lower PD value than the bulk of entries in SDAP. The histogram suggests that a significance cut-off value between 7.5 and 9 would be most appropriate for determining peptides with similar properties.
Figure 3.
Histogram of PD values for an IgE epitope from shrimp for all SDAP entries. Only the most similar sequence region (lowest PD value) was recorded for each SDAP entry.
DATA SUBMISSION
Researchers in the allergy field are welcome to submit their published data by email to oiivanci@utmb.edu. Comments, corrections and suggestions for new computational tools for allergenic determinants should be sent to the same address.
FUTURE DEVELOPMENTS
The SDAP server will be maintained on a regular basis. The database sequence, structure and IgE epitopes lists will be updated with information as it becomes available in other databases and the literature. The next major addition to SDAP will be data and software to compare the structures of epitopes. The 3D structure is known for only about 10% of the sequences in SDAP. For the sequences without structures, homology models will be prepared with our self-correcting distance geometry based EXDIS/DIAMOD/FANTOM suite. To determine whether suitable templates were available, the sequences of 180 allergens in SDAP with unknown 3D structure were submitted to the fold recognition server 3D-PSSM (27) (http://www.sbg.bio.ac.uk/~3dpssm) and a histogram of the distribution of the log(E) values for the best template found in the PDB (Fig. 4). For 150 sequences, a good template [log(E)<−1] was identified. For the remaining allergens, eight have a log(E) between −1 and 0, meaning that the modeling may require combined information from other fold recognition servers. For these, we plan to use our MASIA program (8) (http://www.scsb.utmb.edu/masia/masia.html) to identify conserved motifs, which can be used to identify possible templates and improve alignments with distantly related proteins. Only 22 sequences, with log(E)>0, will require an alternative approach. Here, we will use secondary structure prediction methods to model the proteins abinitio.
Figure 4.
Distribution of log(E) values for PDB templates identified by 3D-PSSM for 180 allergen sequences. All structural alignments with log(E)<−1 can be modeled with good precision.
Acknowledgments
ACKNOWLEDGEMENT
This work was supported by a Research Development Grant (#2535-01) from the John Sealy Memorial Endowment Fund for Biomedical Research.
REFERENCES
- 1.Malone D.C., Lawson,K.A., Smith,D.H., Arrighi,H.M. and Battista,C. (1997) A cost of illness study of allergic rhinitis in the United States. J. Allergy Clin. Immunol., 99, 22–27. [DOI] [PubMed] [Google Scholar]
- 2.Gendel S.M. (1998) Sequence databases for assessing the potential allergenicity of proteins used in transgenic foods. Adv. Food Nutrition Res., 42, 63–92. [DOI] [PubMed] [Google Scholar]
- 3.FIFRA (2000) A set of scientific issues being considered by the environmental protection agency regarding: assessment of scientific information concerning StarLink Corn. Rep. No. SAP Report No. 2000-06. FIFRA.
- 4.Aalberse R.C. (2000) Structural biology of allergens. J. Allergy Clin. Immunol., 106, 228–238. [DOI] [PubMed] [Google Scholar]
- 5.Breiteneder H. and Ebner,C. (2000) Molecular and biochemical classification of plant-derived food allergens. J. Allergy Clin. Immunol., 106, 27–36. [DOI] [PubMed] [Google Scholar]
- 6.Ipsen H. and Lowenstein,H. (1997) Basic features of crossreactivity in tree and grass pollen allergy. Clin. Rev. Allergy Immunol., 15, 389–396. [DOI] [PubMed] [Google Scholar]
- 7.Sicherer S.H. (2001) Clinical implications of cross-reactive food allergens. J. Allergy Clin. Immunol., 108, 881–890. [DOI] [PubMed] [Google Scholar]
- 8.Zhu H., Schein,C.H. and Braun,W. (2000) MASIA: recognition of common patterns and properties in multiple aligned protein sequences. Bioinformatics, 16, 950–951. [DOI] [PubMed] [Google Scholar]
- 9.Schaumann T., Braun,W. and Wuthrich,K. (1990) The program FANTOM for energy refinement of polypeptides and proteins using a Newton–Raphson minimizer in torsion angle space. Biopolymers, 29, 679–694. [Google Scholar]
- 10.Mumenthaler C. and Braun,W. (1995) Predicting the helix packing of globular proteins by self-correcting distance geometry. Protein Sci., 4, 863–871. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Fraczkiewicz R. and Braun,W. (1998) Exact and efficient analytical calculation of the accessible surface areas and their gradients for macromolecules. J. Comput. Chem., 19, 319–333. [Google Scholar]
- 12.Zhu H. and Braun,W. (1999) Sequence specificity, statistical potentials and 3D structure prediction with self-correcting distance geometry calculations of beta-sheet formation in proteins. Protein Sci., 8, 326–342. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Zhu H., Schein,C.H. and Braun,W. (1999) Homology modeling and molecular dynamics simulations of PBCV-1 glycosylase complexed with UV-damaged DNA. J. Mol. Model., 5, 302–316. [Google Scholar]
- 14.Soman K., Midoro-Horiuti,T., Ferreon,J., Goldblum,R., Brooks,E., Kurosky,A., Braun,W. and Schein,C.H. (2000) Homology modeling and characterization of IgE epitopes of mountain cedar allergen Jun a 3. Biophys. J., 79, 1601–1609. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Soman K.V., Schein,C.H., Zhu,H. and Braun,W. (2001) Homology modeling and simulations of nuclease structures. In Schein,C.H.(ed.), Nuclease Methods and Protocols, Methods in Molecular Biology. Humana Press, Totowa, N.J., Vol. 160, pp. 263–286. [DOI] [PubMed]
- 16.Schein C.H., Nagle,G.T., Page,J.S., Sweedler,J.V., Xu,Y., Painter,S.D. and Braun,W. (2001) Aplysia attractin: Biophysical characterization and modeling of a water-borne pheromone. Biophys. J., 81, 463–472. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Murtazina D., Puchkaev,A.V., Schein,C.H., Oezguen,N., Braun,W., Nanavati,A. and Pikuleva,I.A. (2002) Membrane protein interactions contribute to efficient 27-hydroxylation of cholesterol by mitochondrial cytochrome P450-27a1. J. Biol. Chem., 277, 37582–37589. [DOI] [PubMed] [Google Scholar]
- 18.Bairoch A. and Apweiler,R. (2000) The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res., 28, 45–48. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Wu C.H., Huang,H.Z., Arminski,L., Castro-Alvear,J., Chen,Y.X., Hu,Z.Z., Ledley,R.S., Lewis,K.C., Mewes,H.W., Orcutt,B.C., Suzek,B.E., Tsugita,A., Vinayaka,C.R., Yeh,L.S.L., Zhang,J. and Barker,W.C. (2002) The Protein Information Resource: an integrated public resource of functional annotation of proteins. Nucleic Acids Res., 30, 35–37. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Wheeler D.L., Church,D.M., Lash,A.E., Leipe,D.D., Madden,T.L., Pontius,J.U., Schuler,G.D., Schriml,L.M., Tatusova,T.A., Wagner,L. and Rapp,B.A. (2001) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res., 29, 11–16. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Westbrook J., Feng,Z.K., Jain,S., Bhat,T.N., Thanki,N., Ravichandran,V., Gilliland,G.L., Bluhm,W., Weissig,H., Greer,D.S., Bourne,P.E. and Berman,H.M. (2002) The Protein Data Bank: unifying the archive. Nucleic Acids Res., 30, 245–248. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Venkatarajan M.S. and Braun,W. (2001) New quantitative descriptors of amino acids based on multidimensional scaling of a large number of physical–chemical properties. J. Mol. Model., 7, 445–453. [Google Scholar]
- 23.Ivanciuc O., Schein,C.H. and Braun,W. (2002) Data mining of sequences and 3D structures of allergenic proteins. Bioinformatics, 18, 1358–1364. [DOI] [PubMed] [Google Scholar]
- 24.Leung P.S.C. and Chu,K.H. (2001) cDNA cloning and molecular identification of the major oyster allergen from the Pacific oyster Crassostrea gigas. Clin. Exp. Allergy, 31, 1287–1294. [DOI] [PubMed] [Google Scholar]
- 25.Santos A.B.R., Chapman,M.D., Aalberse,R.C., Vailes,L.D., Ferriani,V.P.L., Oliver,C., Rizzo,M.C., Naspitz,C.K. and Arruda,L.K. (1999) Cockroach allergens and asthma in Brazil: Identification of tropomyosin as a major allergen with potential cross-reactivity with mite and shrimp allergens. J. Allergy Clin. Immunol., 104, 329–337. [DOI] [PubMed] [Google Scholar]
- 26.vanRee R., Antonicelli,L., Akkerdaas,J.H., Garritani,M.S., Aalberse,R.C. and Bonifazi,F. (1996) Possible induction of food allergy during mite immunotherapy. Allergy, 51, 108–113. [PubMed] [Google Scholar]
- 27.Kelley L.A., MacCallum,R.M. and Sternberg,M.J.E. (2000) Enhanced genome annotation using structural profiles in the program 3D-PSSM. J. Mol. Biol., 299, 499–520. [DOI] [PubMed] [Google Scholar]