Victor (VIrtual Construction TOolkit for pRoteins)
--------------------------------------------------------------
(c) Silvio C.E. Tosatto, 1999 - 2006.
E-mail:
--------------------------------------------------------------

README File Version 1.1

ALIGN - Alignment generation and analysis
--------------------------------------------------------------

1. Introduction:

This file describes the basic functionality of the Align package, a
C++ library for the generation of pairwise alignments of protein
sequences and their analysis. The package comes in the form of C++
source code with several sample executables that can be compiled and
used. The necessary data files (e.g. substitution matrices) are
provided. Instructions on how to install the software are provided
below (cf. 2.).

The most important feature of the package is its modular,
object-oriented design, which should allow a moderately experienced
C++ programmer to rapidly implement and test new features for sequence
alignment. The basic concepts of the design are described below
(cf. 3.) and a reference manual is available online from the URL:

http://protein.cribi.unipd.it/align/

Some standard examples are also provided below (cf. 4.) for the
impatient and should be sufficient to understand the functionality of
the software.

Please direct any enquiries regarding the software to Silvio Tosatto,
but please be aware that working in academia means we cannot guarantee
to fix all bugs immediately.

The Align package is released under a non-commercial license, as
described in the separate LICENSE file. If you are pursuing academic
research, you are especially encouraged to use the software, but
please cite the reference below (cf. 5.) in any publication.

--------------------------------------------------------------

2. Installation:

In order to function properly, the Victor/Align package requires the
environment variable "VICTOR_ROOT" to be set in the shell.
In addition, you should add Victor's bin directory to your "PATH"
variable. If you are using the BASH shell, add the following lines to
your ".bashrc" file:

"""
> export VICTOR_ROOT=
> export PATH=$PATH:
"""

e.g.:

"""
> export VICTOR_ROOT=/home/silvio/Victor/
> export PATH=$PATH:/home/silvio/Victor/bin/
"""

Once the variables are set, you may proceed to create the executables.
From the main Victor directory type the following:

"""
> make depend     # builds library dependencies
> make install    # compiles libraries & programs;
"""

(NB: Please ignore any error messages that may be reported during
"make install".)

The executables will be automatically copied to "VICTOR_ROOT/bin/".
The source code will be stored in the "VICTOR_ROOT/Align" directory.
Data files (e.g. substitution matrices) are stored in
"VICTOR_ROOT/data/".

If, for any reason, you wish to restart the compilation from scratch,
use:

"""
> make clean      # removes all compiled data
"""

NB: The code was tested with both GNU C++ v 2.95 and v 3.3 compilers.
By default, the Makefile uses the "g++" command, wherever that is
mapped to. In case you wish to force usage of either v 2.95 or v 3.3,
use the "gcc2=1" or "gcc3=1" switch, respectively, for the
"make install" command. Please refer to "Makefile.global" in the
"tools" directory for the technical details.

--------------------------------------------------------------

3. Basic Concept:

The Align library was designed to be modular and easy to expand.
Four basic components are needed to use the alignment methods. A
minimal implementation can be seen in the "alitest.cc" program. The
four main components are:

* Blosum
    The substitution matrix.

* AlignmentData
    Stores information on sequence ("SequenceData") and, where needed,
    secondary structure ("SecSequenceData").

* ScoringScheme
    Stores information on how a single position shall be scored in the
    alignment, e.g. sequence-to-sequence ("ScoringS2S"),
    profile-to-sequence ("ScoringP2S") or profile-to-profile
    ("ScoringP2P") scoring, etc.
    Requires both an "AlignmentData" and a "Blosum" object.

* Align
    The alignment algorithm. This can be either local (Smith-Waterman,
    "SWAlign"), global (Needleman-Wunsch, "NWAlign") or glocal/overlap
    (Free-Shift, "FSAlign").
    Requires both an "AlignmentData" and a "ScoringScheme" object.

If P2S or P2P scoring is used, the class "Profile" stores the
necessary information to generate the profile from a multiple sequence
alignment (read in either FASTA or BLAST M6 format).

Two advanced options, which may be useful in certain circumstances,
are supported by the software:

1) ReverseScoring.
   This allows the statistical significance of the raw alignment score
   to be estimated, in the form of a Z-score, by testing it against an
   ensemble of alignments based on the reversed sequence.

2) Suboptimal alignments.
   Rather than generating a single solution, the user may request a
   number of different, alternative, suboptimal alignments to be
   generated.

The simplest possible C++ code fragment to generate a global alignment
is:

"""
Blosum sub(matrixFile);          // substitution matrix
SequenceData ad(2, seq1, seq2);  // the two sequences to align
ScoringS2S sc(&sub, &ad);        // sequence-to-sequence scoring
NWAlign nwAlign(&sc, &ad, gapPenalty, gapExtension);  // global alignment
"""

The executables provided with the source code have various functions
to exemplify the potential of the C++ library. In detail these are:

* alitest
    Baseline pairwise alignment generator.

* subali
    Complex alignment generator using the full functionality provided
    by the software.

* blast2fasta
    A BLAST M6 to FASTA file converter capable of removing redundant
    data according to several criteria.

* checkprofile
    A profile checking program, loading either FASTA or BLAST M6 data
    and generating a profile.

* alignsec
    Implementation of the SSEA (secondary structure element alignment)
    algorithm used by Fontana et al., Bioinformatics, 21(3): 393-395,
    2005.
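
For readers who want to see what a global (Needleman-Wunsch) aligner
of the kind wrapped by "NWAlign" computes, the following is a minimal,
self-contained sketch. It does NOT use the Align classes and is not
the library's implementation: the names "pairScore", "GlobalAlignment"
and "needlemanWunsch" are hypothetical, a flat match/mismatch score
stands in for a Blosum matrix, and a single linear gap cost replaces
the separate gap penalty and gap extension parameters.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a substitution matrix lookup (assumption:
// flat +1/-1 scoring, not a real Blosum matrix).
inline int pairScore(char a, char b, int match = 1, int mismatch = -1) {
    return a == b ? match : mismatch;
}

// Result of the sketch: total score plus the two gapped sequences.
struct GlobalAlignment {
    int score;
    std::string top;
    std::string bottom;
};

// Needleman-Wunsch global alignment with a linear gap cost:
// every gap column subtracts `gap` from the score.
GlobalAlignment needlemanWunsch(const std::string& s, const std::string& t,
                                int gap = 2) {
    assert(gap >= 0);
    const int n = static_cast<int>(s.size());
    const int m = static_cast<int>(t.size());

    // f[i][j] = best score for aligning s[0..i) with t[0..j).
    std::vector<std::vector<int>> f(n + 1, std::vector<int>(m + 1, 0));
    for (int i = 1; i <= n; ++i) f[i][0] = -gap * i;
    for (int j = 1; j <= m; ++j) f[0][j] = -gap * j;

    for (int i = 1; i <= n; ++i) {
        for (int j = 1; j <= m; ++j) {
            const int diag = f[i - 1][j - 1] + pairScore(s[i - 1], t[j - 1]);
            const int up   = f[i - 1][j] - gap;  // gap in t
            const int left = f[i][j - 1] - gap;  // gap in s
            f[i][j] = std::max(diag, std::max(up, left));
        }
    }

    // Trace back from the bottom-right corner to recover one optimal path.
    std::string a, b;
    int i = n, j = m;
    while (i > 0 || j > 0) {
        if (i > 0 && j > 0 &&
            f[i][j] == f[i - 1][j - 1] + pairScore(s[i - 1], t[j - 1])) {
            a += s[i - 1]; b += t[j - 1]; --i; --j;
        } else if (i > 0 && f[i][j] == f[i - 1][j] - gap) {
            a += s[i - 1]; b += '-'; --i;
        } else {
            a += '-'; b += t[j - 1]; --j;
        }
    }
    std::reverse(a.begin(), a.end());
    std::reverse(b.begin(), b.end());
    return {f[n][m], a, b};
}
```

Calling needlemanWunsch("ACGT", "AGT") under these assumptions yields a
score of 1 with "ACGT" aligned over "A-GT". In the Align design the
scoring function and the recursion are separate objects, so swapping
the flat score above for a matrix or profile lookup would not touch
the alignment code itself.
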

A reference manual is available online from the URL:

http://protein.cribi.unipd.it/align/

--------------------------------------------------------------

4. Examples:

Below are a number of examples to test the functionality of the C++
classes. The relevant example files can be found in the "examples"
subdirectory. All executables print help output when invoked with the
"-h" option.

Two sets of sample sequences are included:

* A pairwise sequence alignment between an old CASP-4 comparative
  modelling target (t0111) and its template (1pdz). This includes
  PSI-BLAST profiles and PSIPRED predicted secondary structures.

* A set of two secondary structures for SSEA alignment.

Examples:
---------

* Simple sequence to sequence alignment:

"""
> alitest --fasta t0111-1pdz.fasta
-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-
Read matrix from file: /kim/victor-develop/Victor/data/blosum50.dat
Sequence 1:
IVKIIGREIIDSRGNPTVEAEVHLEGGFVGMAAAPSGASTGSREALELRDGDKSRFLGKGVTKAVAAVNGPIAQALIGKDAKDQAGIDKIMIDLDGTENKSKFGANAILAVSLANAKAAAAAKGMPLYEHIAELNGTPGKYSMPVPMMNIINGGEHADNNVDIQEFMIQPVGAKTVKEAIRMGSEVFHHLAKVLKAKGMNTAVGDEGGYAPNLGSNAEALAVIAEAVKAAGYELGKDITLAMDCAASEFYKDGKYVLAGEGNKAFTSEEFTHFLEELTKQYPIVSIEDGLDESDWDGFAYQTKVLGDKIQLVGDDLFVTNTKILKEGIEKGIANSILIKFNQIGSLTETLAAIKMAKDAGYTAVISHRSGETEDATIADLAVGTAAGQIKTGSMSRSDRVAKYNQLIRIEEALGEKAPYNGRKEIKG
Sequence 2:
ITKVFARTIFDSRGNPTVEVDLYTSKGLFRAAVPSGASTGVHEALEMRDGDKSKYHGKSVFNAVKNVNDVIVPEIIKSGLKVTQQKECDEFMCKLDGTENKSSLGANAILGVSLAICKAGAAELGIPLYRHIANLANYDEVILPVPAFNVINGGSHAGNKLAMQEFMILPTGATSFTEAMRMGTEVYHHLKAVIKARFGLDATAVGDEGGFAPNILNNKDALDLIQEAIKKAGYTGKIEIGMDVAASEFYKQNNIYDLDFKTANNDGSQKISGDQLRDMYMEFCKDFPIVSIEDPFDQDDWETWSKMTSGTTIQIVGDDLTVTNPKRITTAVEKKACKCLLLKVNQIGSVTESIDAHLLAKKNGWGTMVSHRSGETEDCFIADLVVGLCTGQIKTGAPCRSERLAKYNQILRIEEELGSGAKFAGKNFRAP
gap penalty: 11
-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-
A Needleman-Wunsch optimal alignment:
Score = 1202
An optimal alignment:
IVKIIGREIIDSRGNPTVEAEVHLEGGFVGMAAAPSGASTGSREALELRDGDKSRFLGKGVTKAVAAVNGPIAQALI--GKDAKDQAGIDKIMIDLDGTENKSKFGANAILAVSLANAKAAAAAKGMPLYEHIAELNGTPGKYSMPVPMMNIINGGEHADNNVDIQEFMIQPVGAKTVKEAIRMGSEVFHHLAKVLKAK-GMN-TAVGDEGGYAPNLGSNAEALAVIAEAVKAAGYELGKDITLAMDCAASEFYK-----DGKYVLA-GEGNKAFTSEEFTHFLEELTKQYPIVSIEDGLDESDWDGFAYQTKVLGDKIQLVGDDLFVTNTKILKEGIEKGIANSILIKFNQIGSLTETLAAIKMAKDAGYTAVISHRSGETEDATIADLAVGTAAGQIKTGSMSRSDRVAKYNQLIRIEEALGEKAPYNGRKEIKG ITKVFARTIFDSRGNPTVEVDLYTSKGLFR-AAVPSGASTGVHEALEMRDGDKSKYHGKSVFNAVKNVNDVIVPEIIKSGLKVTQQKECDEFMCKLDGTENKSSLGANAILGVSLAICKAGAAELGIPLYRHIANL-ANYDEVILPVPAFNVINGGSHAGNKLAMQEFMILPTGATSFTEAMRMGTEVYHHLKAVIKARFGLDATAVGDEGGFAPNILNNKDALDLIQEAIKKAGY-TGK-IEIGMDVAASEFYKQNNIYDLDFKTANNDGSQKISGDQLRDMYMEFCKDFPIVSIEDPFDQDDWETWSKMTS--GTTIQIVGDDLTVTNPKRITTAVEKKACKCLLLKVNQIGSVTESIDAHLLAKKNGWGTMVSHRSGETEDCFIADLVVGLCTGQIKTGAPCRSERLAKYNQILRIEEELGSGAKFAGKNFRAP -x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x- A Smith-Waterman alignment: Score = 1212 An optimal alignment: IVKIIGREIIDSRGNPTVEAEVHLEGGFVGMAAAPSGASTGSREALELRDGDKSRFLGKGVTKAVAAVNGPIAQALI--GKDAKDQAGIDKIMIDLDGTENKSKFGANAILAVSLANAKAAAAAKGMPLYEHIAELNGTPGKYSMPVPMMNIINGGEHADNNVDIQEFMIQPVGAKTVKEAIRMGSEVFHHLAKVLKAK-GMN-TAVGDEGGYAPNLGSNAEALAVIAEAVKAAGYELGKDITLAMDCAASEFYK-----DGKYVLA-GEGNKAFTSEEFTHFLEELTKQYPIVSIEDGLDESDWDGFAYQTKVLGDKIQLVGDDLFVTNTKILKEGIEKGIANSILIKFNQIGSLTETLAAIKMAKDAGYTAVISHRSGETEDATIADLAVGTAAGQIKTGSMSRSDRVAKYNQLIRIEEALGEKAPYNGR ITKVFARTIFDSRGNPTVEVDLYTSKGLFR-AAVPSGASTGVHEALEMRDGDKSKYHGKSVFNAVKNVNDVIVPEIIKSGLKVTQQKECDEFMCKLDGTENKSSLGANAILGVSLAICKAGAAELGIPLYRHIANL-ANYDEVILPVPAFNVINGGSHAGNKLAMQEFMILPTGATSFTEAMRMGTEVYHHLKAVIKARFGLDATAVGDEGGFAPNILNNKDALDLIQEAIKKAGY-TGK-IEIGMDVAASEFYKQNNIYDLDFKTANNDGSQKISGDQLRDMYMEFCKDFPIVSIEDPFDQDDWETWSKMTS--GTTIQIVGDDLTVTNPKRITTAVEKKACKCLLLKVNQIGSVTESIDAHLLAKKNGWGTMVSHRSGETEDCFIADLVVGLCTGQIKTGAPCRSERLAKYNQILRIEEELGSGAKFAGK -x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x- """ * A more complex profile to sequence alignment with 
secondary structure: """ > subali -i t0111-1pdz.fasta --prof1 t0111.blast --sec t0111-1pdz.sec -n 1 -x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x- Suboptimal Free-shift alignment: Saving FASTA file to screen: -x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x- >t111 IVKIIGREIIDSRGNPTVEAEVHLEGGFVGMAAAPSGASTGSREALELRDGDKSRFLGKG VTKAVAAVNGPIAQALI--GKDAKDQAGIDKIMIDLDGTENKSKFGANAILAVSLANAKA AAAAKGMPLYEHIAELNGTPGKYSMPVPMMNIINGGEHADNNVDIQEFMIQPVGAKTVKE AIRMGSEVFHHLAKVLKAK-GMN-TAVGDEGGYAPNLGSNAEALAVIAEAVKAAGYELGK DITLAMDCAASEFYK-----DGKYVLA-GEGNKAFTSEEFTHFLEELTKQYPIVSIEDGL DESDWDGFAYQTKVLGDKIQLVGDDLFVTNTKILKEGIEKGIANSILIKFNQIGSLTETL AAIKMAKDAGYTAVISHRSGETEDATIADLAVGTAAGQIKTGSMSRSDRVAKYNQLIRIE EALGEKAPYNGRKEIKG >1PDZ // 943 ITKVFARTIFDSRGNPTVEVDLYTSKGLFR-AAVPSGASTGVHEALEMRDGDKSKYHGKS VFNAVKNVNDVIVPEIIKSGLKVTQQKECDEFMCKLDGTENKSSLGANAILGVSLAICKA GAAELGIPLYRHIANL-ANYDEVILPVPAFNVINGGSHAGNKLAMQEFMILPTGATSFTE AMRMGTEVYHHLKAVIKARFGLDATAVGDEGGFAPNILNNKDALDLIQEAIKKAGY-TGK -IEIGMDVAASEFYKQNNIYDLDFKTANNDGSQKISGDQLRDMYMEFCKDFPIVSIEDPF DQDDWETWSKMTS--GTTIQIVGDDLTVTNPKRITTAVEKKACKCLLLKVNQIGSVTESI DAHLLAKKNGWGTMVSHRSGETEDCFIADLVVGLCTGQIKTGAPCRSERLAKYNQILRIE EELGSGAKFAG-KNFRA -x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x- """ * Profile to profile global alignment with secondary structure, output of the top 5 suboptimal alignments and reverse scoring based on 25 samples: """ > subali -i t0111-1pdz.fasta --prof1 t0111.blast --prof2 1pdz.blast --sec t0111-1pdz.sec -n 5 -r 25 -x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x- Suboptimal Free-shift alignment: Saving FASTA file to screen: -x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x- >t111 IVKIIGREIIDSRGNPTVEAEVHLEGGFVGMAAAPSGASTGSREALELRDGDKSRFLGKG VTKAVAAVNGPIAQALI------GKDAK----DQAGIDKIMIDLDGTENKSKFGANAILA VSLANAKAAAAAKGMPLYEHIAELNGTPGKYSMPVPMMNIINGGEHADNNVDIQEFMIQP VGAKTVKEAIRMGSEVFHHLAKVLKAK------GMN----TAVGDEGGYAPNLGSNAEAL 
AVIAEAVKAAGYELGKDITLAMDCAASEFY-K----------DGK------YVL------ ---A--G-E-GNKAFTSEEFTHFLEELTKQYPIVSIEDGLDESDWDGFAYQTKVLGDKIQ LVGDDLFVTNTKILKEGIEKGIANSILIKFNQIGSLTETLAAIKMAKDAGYTAVISHRSG ETEDATIADLAVGTAAGQIKTGSMSRSDRVAKYNQLIRIEEALGEKAPYNGRKEIKG >1PDZ // 943 ITKVFARTIFDSRGNPTVEVDLYTSKGLFR-AAVPSGASTGVHEALEMRDGDKSKYHGKS VFNAVKNVNDVIVPEII----KSGLKVT----QQKECDEFMCKLDGTENKSSLGANAILG VSLAICKAGAAELGIPLYRHIANL-ANYDEVILPVPAFNVINGGSHAGNKLAMQEFMILP TGATSFTEAMRMGTEVYHHLKAVIKAR-----FGLD---ATAVGDEGGFAPNILNNKDAL DLIQEAIKKAGY-TGK-IEIGMDVAASEFY-K-----QNNIYDLD------FKT------ ---A-NN-D-GSQKISGDQLRDMYMEFCKDFPIVSIEDPFDQDDWETWSKMTS--GTTIQ IVGDDLTVTNPKRITTAVEKKACKCLLLKVNQIGSVTESIDAHLLAKKNGWGTMVSHRSG ETEDCFIADLVVGLCTGQIKTGAPCRSERLAKYNQILRIEEELGSGAKFAG-KNFRA >1PDZ // 942 ITKVFARTIFDSRGNPTVEVDLYTSKGLFR-AAVPSGASTGVHEALEMRDGDKSKYHGKS VFNAVKNVNDVIVPEII----KSGLKVT----QQKECDEFMCKLDGTENKSSLGANAILG VSLAICKAGAAELGIPLYRHIANL-ANYDEVILPVPAFNVINGGSHAGNKLAMQEFMILP TGATSFTEAMRMGTEVYHHLKAVIKAR-----FGLD---ATAVGDEGGFAPNILNNKDAL DLIQEAIKKAGYT-GK-IEIGMDVAASEFY-K-----QNNIYDLD------FKT------ ---A--NND-GSQKISGDQLRDMYMEFCKDFPIVSIEDPFDQDDWETWSKMTS--GTTIQ IVGDDLTVTNPKRITTAVEKKACKCLLLKVNQIGSVTESIDAHLLAKKNGWGTMVSHRSG ETEDCFIADLVVGLCTGQIKTGAPCRSERLAKYNQILRIEEELGSGAKFAG-KNFRA >1PDZ // 941 ITKVFARTIFDSRGNPTVEVDLYTSKG-LFRAAVPSGASTGVHEALEMRDGDKSKYHGKS VFNAVKNVNDVIVPEII------KSGLK--VTQQKECDEFMCKLDGTENKSSLGANAILG VSLAICKAGAAELGIPLYRHIANLANYD-EVILPVPAFNVINGGSHAGNKLAMQEFMILP TGATSFTEAMRMGTEVYHHLKAVIKAR-----FGLD---ATAVGDEGGFAPNILNNKDAL DLIQEAIKKAGY-TGK-IEIGMDVAASEFY-K----------QNN-----IYDL-----D FKTA-NN-D-GSQKISGDQLRDMYMEFCKDFPIVSIEDPFDQDDWETWSKMT--SGTTIQ IVGDDLTVTNPKRITTAVEKKACKCLLLKVNQIGSVTESIDAHLLAKKNGWGTMVSHRSG ETEDCFIADLVVGLCTGQIKTGAPCRSERLAKYNQILRIEEELGSGAKFAG-KNFRA >1PDZ // 940 ITKVFARTIFDSRGNPTVEVDLYTSKGLFR-AAVPSGASTGVHEALEMRDGDKSKYHGKS VFNAVKNVNDVIVPEII----KSGLKVT----QQKECDEFMCKLDGTENKSSLGANAILG VSLAICKAGAAELGIPLYRHIANLANYDEVIL-PVPAFNVINGGSHAGNKLAMQEFMILP TGATSFTEAMRMGTEVYHHLKAVIKAR----FGLDA----TAVGDEGGFAPNILNNKDAL 
DLIQEAIKKAGYT-GK-IEIGMDVAASEFY-K----------QNN-IYDLDFKT------ ---A--N-NDGSQKISGDQLRDMYMEFCKDFPIVSIEDPFDQDDWETWSKMTS--GTTIQ IVGDDLTVTNPKRITTAVEKKACKCLLLKVNQIGSVTESIDAHLLAKKNGWGTMVSHRSG ETEDCFIADLVVGLCTGQIKTGAPCRSERLAKYNQILRIEEELGSGAKFAG-KNFRA >1PDZ // 940 ITKVFARTIFDSRGNPTVEVDLYTSKG-LFRAAVPSGASTGVHEALEMRDGDKSKYHGKS VFNAVKNVNDVIVPEII------KSGLK--VTQQKECDEFMCKLDGTENKSSLGANAILG VSLAICKAGAAELGIPLYRHIANL-ANYDEVILPVPAFNVINGGSHAGNKLAMQEFMILP TGATSFTEAMRMGTEVYHHLKAVIKAR-----FGLD---ATAVGDEGGFAPNILNNKDAL DLIQEAIKKAGY-TGK-IEIGMDVAASEFYKQ----------NNI------YDL----DF KTAN--N-D-GSQKISGDQLRDMYMEFCKDFPIVSIEDPFDQDDW--ETWSKMTSGTTIQ IVGDDLTVTNPKRITTAVEKKACKCLLLKVNQIGSVTESIDAHLLAKKNGWGTMVSHRSG ETEDCFIADLVVGLCTGQIKTGAPCRSERLAKYNQILRIEEELGSGAKFAGKNFRAP -x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x- Reverse sequence based Z-score = 265.511 forward score = 943 reverse score = 13.24 -x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x- """ * Secondary structure element alignment: (pairwise) """ > alignsec -a ssea1.fasta -b ssea2.fasta Sequence a: CCEEEEEEEECCCCCCCCCCCCCCCCCCCCCHHHHHHHCCCCCCCCEEEEEEEEEECCCCCCHHHHHHHHHHHHHHCCCCCEEEEECCCCCHHHHHHHHHHHCCCCCCEEEECCCCCCCCCCCCHHHHHHHHHHHHCCCCCCCCCCEEEECCEEEECCCEEECCCCCCCCEEECCCCCCEEEECCEEEECCCCCCCCCCCCCCCCCCCCCCCCEEEEECCCCCCCHHHHHHHHCCCCEEEEEECCCCCCCHHHHHHHHHHHHCCCEEEEEECCCCCCCCCCCCCCHHHHCCEECCCCCHHHHHHHHHHHCCCCCCHHHHHHHHCCC Sequence b: CCCEEEECCEECCCCCCCEEEEEECCCCCCCCCCCEEEEECCCCCCCEEEEEECCCCCCEECCCCCCCCCEEEECCCCEEEEECCCCCCCCCCEECEEECCCCCCCCCCCEEEECCC Computing global alignment using secondary structure types and lengths. 
Normalized score: 46.5011 Block alignment: CECHCECHCECHCECHCECECECECECECECHCECHCECHCECHCHC ----CE--CE--CEC----E--CEC--ECE--CECECEC-----E-C Experimental residue alignment: 1 CCEEEEEEEECCCCCCCCCCCCCCCCCCCCCHHHHHHHCCCCCCCCEEEEEEEEEECCCCCCHHHHHHHHHHHHHHCCCCCEEEEECCCCCHHHHHHHHHHHCCCCCC-EEEE--CCCCCCCCCCCCHHHHHHHHHHHHCCCCCCCCCCEEEECCEEEE-CCCEEECCCCCCCCEEE---CCCCCCEEEECCEEEECCCCCCCCCCCCCCCCCCCCCCCCEEEEECCCCCCCHHHHHHHHCCCCEEEEEECCCCCCC---HHHHHHHHHHHHCCCEEEEEECCCCCCCCCCCCCCHHHHCCEECCCCCHHHHHHHHHHHCCCCCCHHHHHHHHCCC 326 1 --------------------------------------CCC-----EEEE--------------------------CC---EE-------------------CCCCCCCEEEEEECCCCCCCCCCC-----------------------------EEEEE------CCCCCCC-EEEEEECCCCCC------EE--CCCCCCCCC---------------EEEE----------------CCCCEEEEE-CCCCCCCCCCEE----------C--EEE---CCCCCCCCCCC---------------------------EEEE----------CCC 117 """ * Secondary structure element alignment: (1 vs. many) """ > alignsec -a ssea1.fasta -b ssea3.fasta --mult -x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x QUERY Sequence: CCEEEEEEEECCCCCCCCCCCCCCCCCCCCCHHHHHHHCCCCCCCCEEEEEEEEEECCCCCCHHHHHHHHHHHHHHCCCCCEEEEECCCCCHHHHHHHHHHHCCCCCCEEEECCCCCCCCCCCCHHHHHHHHHHHHCCCCCCCCCCEEEECCEEEECCCEEECCCCCCCCEEECCCCCCEEEECCEEEECCCCCCCCCCCCCCCCCCCCCCCCEEEEECCCCCCCHHHHHHHHCCCCEEEEEECCCCCCCHHHHHHHHHHHHCCCEEEEEECCCCCCCCCCCCCCHHHHCCEECCCCCHHHHHHHHHHHCCCCCCHHHHHHHHCCC SUBJECT Sequence: 3ecaa CCEEEEEEEECCCCCCCCCCCCCCCCCCCCCHHHHHHHCCCCCCCCEEEEEEEEEECCCCCCHHHHHHHHHHHHHHCCCCCEEEEECCCCCHHHHHHHHHHHCCCCCCEEEECCCCCCCCCCCCHHHHHHHHHHHHCCCCCCCCCCEEEECCEEEECCCEEECCCCCCCCEEECCCCCCEEEECCEEEECCCCCCCCCCCCCCCCCCCCCCCCEEEEECCCCCCCHHHHHHHHCCCCEEEEEECCCCCCCHHHHHHHHHHHHCCCEEEEEECCCCCCCCCCCCCCHHHHCCEECCCCCHHHHHHHHHHHCCCCCCHHHHHHHHCCC Computing global alignment using secondary structure types and lengths. 
Normalized score: 100 Block alignment: CECHCECHCECHCECHCECECECECECECECHCECHCECHCECHCHC CECHCECHCECHCECHCECECECECECECECHCECHCECHCECHCHC Experimental residue alignment: 1 CCEEEEEEEECCCCCCCCCCCCCCCCCCCCCHHHHHHHCCCCCCCCEEEEEEEEEECCCCCCHHHHHHHHHHHHHHCCCCCEEEEECCCCCHHHHHHHHHHHCCCCCCEEEECCCCCCCCCCCCHHHHHHHHHHHHCCCCCCCCCCEEEECCEEEECCCEEECCCCCCCCEEECCCCCCEEEECCEEEECCCCCCCCCCCCCCCCCCCCCCCCEEEEECCCCCCCHHHHHHHHCCCCEEEEEECCCCCCCHHHHHHHHHHHHCCCEEEEEECCCCCCCCCCCCCCHHHHCCEECCCCCHHHHHHHHHHHCCCCCCHHHHHHHHCCC 326 1 CCEEEEEEEECCCCCCCCCCCCCCCCCCCCCHHHHHHHCCCCCCCCEEEEEEEEEECCCCCCHHHHHHHHHHHHHHCCCCCEEEEECCCCCHHHHHHHHHHHCCCCCCEEEECCCCCCCCCCCCHHHHHHHHHHHHCCCCCCCCCCEEEECCEEEECCCEEECCCCCCCCEEECCCCCCEEEECCEEEECCCCCCCCCCCCCCCCCCCCCCCCEEEEECCCCCCCHHHHHHHHCCCCEEEEEECCCCCCCHHHHHHHHHHHHCCCEEEEEECCCCCCCCCCCCCCHHHHCCEECCCCCHHHHHHHHHHHCCCCCCHHHHHHHHCCC 326 -x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x QUERY Sequence: CCEEEEEEEECCCCCCCCCCCCCCCCCCCCCHHHHHHHCCCCCCCCEEEEEEEEEECCCCCCHHHHHHHHHHHHHHCCCCCEEEEECCCCCHHHHHHHHHHHCCCCCCEEEECCCCCCCCCCCCHHHHHHHHHHHHCCCCCCCCCCEEEECCEEEECCCEEECCCCCCCCEEECCCCCCEEEECCEEEECCCCCCCCCCCCCCCCCCCCCCCCEEEEECCCCCCCHHHHHHHHCCCCEEEEEECCCCCCCHHHHHHHHHHHHCCCEEEEEECCCCCCCCCCCCCCHHHHCCEECCCCCHHHHHHHHHHHCCCCCCHHHHHHHHCCC SUBJECT Sequence: d1mcoh1 CCCEEEECCEECCCCCCCEEEEEECCCCCCCCCCCEEEEECCCCCCCEEEEEECCCCCCEECCCCCCCCCEEEECCCCEEEEECCCCCCCCCCEECEEECCCCCCCCCCCEEEECCC Computing global alignment using secondary structure types and lengths. 
Normalized score: 46.5011 Block alignment: CECHCECHCECHCECHCECECECECECECECHCECHCECHCECHCHC ----CE--CE--CEC----E--CEC--ECE--CECECEC-----E-C Experimental residue alignment: 1 CCEEEEEEEECCCCCCCCCCCCCCCCCCCCCHHHHHHHCCCCCCCCEEEEEEEEEECCCCCCHHHHHHHHHHHHHHCCCCCEEEEECCCCCHHHHHHHHHHHCCCCCC-EEEE--CCCCCCCCCCCCHHHHHHHHHHHHCCCCCCCCCCEEEECCEEEE-CCCEEECCCCCCCCEEE---CCCCCCEEEECCEEEECCCCCCCCCCCCCCCCCCCCCCCCEEEEECCCCCCCHHHHHHHHCCCCEEEEEECCCCCCC---HHHHHHHHHHHHCCCEEEEEECCCCCCCCCCCCCCHHHHCCEECCCCCHHHHHHHHHHHCCCCCCHHHHHHHHCCC 326 1 --------------------------------------CCC-----EEEE--------------------------CC---EE-------------------CCCCCCCEEEEEECCCCCCCCCCC-----------------------------EEEEE------CCCCCCC-EEEEEECCCCCC------EE--CCCCCCCCC---------------EEEE----------------CCCCEEEEE-CCCCCCCCCCEE----------C--EEE---CCCCCCCCCCC---------------------------EEEE----------CCC 117 """ -------------------------------------------------------------- 5. References: Silvio C.E. Tosatto, Alessandro Albiero, Alessandra Mantovan, Carlo Ferrari, Eckart Bindewald and Stefano Toppo. "Align: a C++ class library for rapid sequence alignment prototyping." Current Drug Discovery Technologies, in press. (2006) --------------------------------------------------------------