Difference between revisions of "Features"


Revision as of 09:55, 31 July 2014

All the applications are in the bin folder, where you will find a set of ready-to-use programs. Each of them accepts the -h option, which lists the available options. The following sections explain every application and give at least one usage example for each.

Biopool library

The Biopool class implementation follows the composite design pattern; for a complete description of the class hierarchy see the [Doxygen documentation]. Without going into implementation details, a Protein object is just a container for vectors representing chains. Each vector has 2 elements: the Spacer and the LigandSet. The Spacer is the container for AminoAcid objects, whereas the LigandSet is a container for all other molecules and ions, including DNA/RNA chains. Ultimately all molecules, both in the Spacer and in the LigandSet, are collections of Atom objects. The main feature of Biopool is that each AminoAcid object in the Spacer is connected to its neighbours by means of one rotational vector plus one translational vector. This makes it easy to modify the protein structure, and many functions are implemented to modify, perturb or transform the relative position of residues efficiently. Rotation and translation vectors:


The object representation looks like this:

Image: SchemeProteinclass.jpg


Victor includes different packages: Biopool, Lobo and Energy. Every package is identified by a directory, starting with a capital letter, in the main Victor path. Inside each package you will find the Source folder containing the class code and the APPS directory containing useful utilities. In the main path you will find the data folder, which contains symbolic links to the data files used by the individual packages. In the main Victor path you should also find the bin directory, which contains the most important programs, simply copied from the APPS folders.


Parsing a PDB file (PdbLoader)

Biopool uses the PdbLoader class to load PDB files. By default it loads all standard residues and hetero atoms, excluding nucleotides and water molecules. When possible it also tries to place hydrogen atoms on every amino acid included in the Spacer and determines the secondary structure with the DSSP algorithm. The simplest way to load a PDB into a Protein object is:

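  #include <PdbLoader.h>
  #include <Protein.h>
  #include <fstream>
  #include <iostream>
  #include <string>

  using namespace std;
  // Depending on the Victor release, the Biopool classes may live in a
  // namespace (for example Victor::Biopool) that must also be imported.

  int main( int argc, char* argv[] ) {
     string inputFile = "MyPdbFile.pdb";
     ifstream inFile( inputFile.c_str() );
     if ( !inFile ) {             // stop early if the file cannot be opened
        cerr << "Cannot open " << inputFile << endl;
        return 1;
     }
     PdbLoader pl( inFile );      // creates the PdbLoader object

     Protein prot;
     prot.load( pl );             // creates the Protein object
     return 0;
  }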

Modify the structure

Add hydrogen atoms

Get the secondary structure

There are 3 different ways to get the secondary structure in Victor. The first (inaccurate) simply parses the HELIX and SHEET fields in the PDB file. The second infers the secondary structure from torsional angles. The last uses an implementation of the DSSP algorithm; its results can show small (negligible) differences compared to the original algorithm, but it is the most accurate way to calculate the secondary structure.

Energy

Remember that the environment variables should be set before trying any of the following applications:

export VICTOR_ROOT=/<your_folder>/victor2.0/  
export PATH=$PATH:/<your_folder>/victor2.0/bin/ 

How to obtain the solvation potential

pdb2solv is an application that creates a file containing the frequencies of occurrence of each residue a with burial r, which are needed to derive the solvation potentials for all the amino acids in the given PDB. The solvation potential for an amino acid residue a is defined as:

Solvation potential=R*T*ln(fa(r)/f(r)) 

where r is the degree of residue burial, fa(r) is the frequency of occurrence of residue a with burial r, and f(r) is the frequency of occurrence of all residues with burial r.
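The formula above can be sketched as a small function. This is only an illustration, not the code used by pdb2solv: the function name, the value of the gas constant (kcal/(mol*K)) and the default temperature are assumptions, and the sign follows the formula exactly as written in the text.

```cpp
#include <cassert>
#include <cmath>

// Gas constant in kcal/(mol*K); the unit system is an assumption.
const double GAS_CONSTANT = 0.0019872;

// Solvation potential R*T*ln(fa(r)/f(r)) as written in the text.
// freqResidue = fa(r), freqAll = f(r); both must be positive.
double solvationPotential(double freqResidue, double freqAll,
                          double temperature = 298.15) {
    assert(freqResidue > 0.0 && freqAll > 0.0);
    return GAS_CONSTANT * temperature * std::log(freqResidue / freqAll);
}
```

When a residue occurs at a given burial exactly as often as all residues do, the ratio is 1 and the potential is 0.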

The degree of burial of a residue is defined as the number of other Cβ atoms located within 10 Å (non polar) / 7 Å (polar) of the residue’s Cβ atom.
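The burial definition can be illustrated with a short counting routine. This is a minimal sketch under stated assumptions: the Point3 type and the burialCount function are hypothetical names that are not part of Victor, and the Cβ coordinates are assumed to be already extracted from the structure.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Point3 { double x, y, z; };

// Counts the C-beta atoms, other than the one at index `self`, that lie
// within `cutoff` angstroms of the C-beta atom at index `self`.
int burialCount(const std::vector<Point3>& cbAtoms, std::size_t self,
                double cutoff) {
    assert(self < cbAtoms.size());
    const double cutoff2 = cutoff * cutoff;  // compare squared distances
    int count = 0;
    for (std::size_t i = 0; i < cbAtoms.size(); ++i) {
        if (i == self) continue;
        const double dx = cbAtoms[i].x - cbAtoms[self].x;
        const double dy = cbAtoms[i].y - cbAtoms[self].y;
        const double dz = cbAtoms[i].z - cbAtoms[self].z;
        if (dx * dx + dy * dy + dz * dz <= cutoff2) ++count;
    }
    return count;
}
```

Calling it with a cutoff of 10.0 or 7.0 corresponds to the non polar and polar definitions given above.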

A PDB file is required as input.


The output depends on the given options and considers a maximum of 30 bins (by default it is written to test.out; use the -o option to set a different name).


Non polar output file format

--------------------------------------------------------------------------------------------------------------- 
total quantity of residues evaluated | AA type(3L) | frequencies 
--------------------------------------------------------------------------------------------------------------- 


Polar output file format

--------------------------------------------------------------------------------------------------------------- 
P |  total quantity of residues evaluated | AA type(3L) | frequencies | Polar frequency | 
--------------------------------------------------------------------------------------------------------------- 

To obtain the solv.par file used by the pdb2energy, frst and other applications, run the following line for all the PDBs in the TOP500H database.

./pdb2solv -i ../samples/119L.pdb 


For more references see:

"Victor/FRST function for model quality estimation"
David T. Jones, "GenTHREADER: an efficient and reliable protein fold recognition method for genomic sequences"

The TOP500H database was used to create the solv.par file.

For a detailed example see pdb2solv example.

How to obtain the torsion angles from the PDB residues

The application pdb2tor obtains the set of angles for each residue. As input it takes a PDB file and the corresponding chain, or a file listing PDB ids, each optionally followed by a chain; if no chain is given, the application uses the first chain found.

Structure of the PDB file list when the first chain of each PDB is used:

PDBID 
PDBID 
PDBID 

To use a specific chain for each PDB, use the --complete option:

PDBID(complete name of the corresponding file) chain 
PDBID chain 
PDBID chain 

If several chains from the same PDB are needed, repeat the PDB id with a different chain.
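The two file list layouts can be read with the same routine. The sketch below is not the parser used by pdb2tor; FileListEntry and parseFileListLine are hypothetical names, and the code assumes that the id and the chain are separated by whitespace.

```cpp
#include <cassert>
#include <sstream>
#include <string>

struct FileListEntry {
    std::string pdbId;  // PDB id (or complete file name with --complete)
    std::string chain;  // empty when no chain is given
};

// Splits one file-list line into the PDB id and an optional chain.
FileListEntry parseFileListLine(const std::string& line) {
    FileListEntry entry;
    std::istringstream in(line);
    in >> entry.pdbId;   // first whitespace-separated token
    in >> entry.chain;   // second token, left empty if absent
    return entry;
}
```

An empty chain field then stands for "use the first chain found".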

This application can also be used to generate the tor.par file used by the TAP application. To generate it, run the following line with the TOP500H database.

./pdb2tor -I ../samples/filelist2_ --complete 


Output format (-A option, gives the per-residue phi, psi, omega, chi, pre-phi and pre-psi angles)


AA Type(one letter format) | Number | pre-phi | pre-psi | phi | psi | omega | chi1 | chi2


!Total file analyzed: Number of files analyzed


Output format (using -r option)


Number of lines in the file


phi | psi | AA type | pre phi | pre psi | omega | #carbons | chi1 | chi2


For a detailed example see pdb2tor example.

How to obtain normalized energy from a PDB

The application pdb2torenergy calculates a pseudo-energy that evaluates the quality of a given protein structural model, expressed as a single (real) number. This program allows you to obtain the normalized energy mentioned in the TAP paper.

Input data by default

tor.par, created by pdb2tor using the TOP500H database.

To calculate the normalized energy, multiple PDBs and PDB chain(s) can be used.

Output

Depending on the options, the energy can be calculated for the whole chain or for each residue.

Per residue (one energy value for each residue in the PDB):

./pdb2torenergy -i ../samples/119L.pdb --allchains -p 

Per PDB (one energy value):

./pdb2torenergy -i ../samples/119L.pdb --allchains 

For chain A in each model (as many energy values as there are models in the PDB file):

./pdb2torenergy -i ../samples/1IHQ.pdb -c A 

For a detailed example see pdb2torenergy example.

How to obtain FRST value from a PDB

The application frst calculates the FRST value using the solvation potential, torsion angles and RAPDF. This application needs some input files. All of these files can be generated using other Energy/Lobo applications, or you can use the ones already generated and saved in the victor2.0/data folder.

Default Input files

tor.par, created by pdb2tor using TOP500H database
solv.par created by pdb2solv using TOP500H database
ram.par 


Output format

The application prints the FRST value for the given PDB. With the -v option it also prints the values of the RAPDF energy, solvation energy, main chain hydrogen bonds and torsion energy.

To calculate the average over a chain in an NMR ensemble:

./frst -i ../samples/16PK.pdb  

To calculate the average over many pdb files

./frst -I ../samples/filelist

For a detailed example see frst example.

How to obtain TAP value from a PDB

The pdb2tap application evaluates the quality of a model using the TAP method. It is used to evaluate the quality of protein models determined by X-ray crystallography. The method is based on a relative pseudo-energy calculated from the side chain and backbone torsion angle propensities, which are then normalized against the global minimum and maximum for the protein sequence under consideration.

Methods

Torsion angle potential (based on frst) 
Pseudo Energy i, maximum and minimum 
TAP = (E-Emin)/(Emax-Emin) 

This value, known as the normalized torsion angle propensity, gives an indication of the degree of nativeness of the protein model.
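The normalization step can be written as a one-line function. This is a sketch of the formula above only; the function name is an assumption, and the calculation of E, Emin and Emax from the torsion angle potential is not shown.

```cpp
#include <cassert>

// Normalizes a pseudo-energy as (E - Emin) / (Emax - Emin).
double normalizedTap(double energy, double eMin, double eMax) {
    assert(eMax > eMin);  // the range must be non-empty
    return (energy - eMin) / (eMax - eMin);
}
```

For any energy between Emin and Emax the result lies between 0 and 1.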


Initial default data (can be created with pdb2tor)

The file tor.par is used to calculate the TAP value. It can be created with the pdb2tor application and by default is created using the TOP500H database.
tor.par: file containing all torsion angles available from the TOP500H database.

For more references see:

For the database
TOP500H is the list of 500 proteins used for the Ramachandran plot distributions, with file ID (PDB code + chain ID (if not the full PDB file) + H (to signify H's added)), structure factor deposition status, resolution, and protein name.
500 high-resolution X-ray structures resolved to 1.8 Å or better, with less than 60% sequence identity. 609 NMR structures (9578 models).
  http://kinemage.biochem.duke.edu/databases/top500.php
For the method
Fine-grained statistical torsion angle potentials are effective in discriminating native protein structures. PMID: 16712465


Output format

A plain text file containing:

Number of lines in the file


phi | psi | AA type | pre phi | pre psi | omega | #carbons | chi1 | chi2



Output interpretation:

Value close to 1 for a native structure 
Value close to 0 for a largely incompatible sequence. 


Input data

The application can be used with one or many PDBs and PDB chains.

Single X-ray structure using one chain:

./pdb2tap -i ../samples/102M.pdb -c A   

Output: Prints the tap value, as shown in http://www.biomedcentral.com/content/supplementary/1471-2105-8-155-s1.txt


Single X-ray structure using all chains (all chains in the PDB):

./pdb2tap -i ../samples/1A3W.pdb -P sal --allchains

Output: Prints the average TAP value for all chains, as shown in http://www.biomedcentral.com/content/supplementary/1471-2105-8-155-s1.txt


Multiple NMR models using one chain:

./pdb2tap -i ../samples/1IHQ.pdb -P sal -c A --nmr 

Output: Prints the TAP value of the selected chain for each model, the average TAP value for all models, the standard deviation, and the minimum and maximum TAP values.

Multiple NMR models using all chains (all chains in the PDB):

./pdb2tap -i ../samples/1IHQ.pdb -P sal --allchains --nmr 

Output: Prints the average TAP value over all chains for each model, the average TAP value for all models, the standard deviation, and the minimum and maximum TAP values.
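The summary reported for an NMR ensemble can be reproduced from the per-model values as sketched below. The Summary type and the summarize function are hypothetical, and the use of the population standard deviation (division by the number of models) is an assumption, since the text does not say which definition pdb2tap uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

struct Summary { double mean, stdDev, min, max; };

// Mean, population standard deviation, minimum and maximum of the values.
Summary summarize(const std::vector<double>& values) {
    assert(!values.empty());
    double sum = 0.0;
    for (double v : values) sum += v;
    const double mean = sum / values.size();
    double squares = 0.0;
    for (double v : values) squares += (v - mean) * (v - mean);
    Summary s;
    s.mean = mean;
    s.stdDev = std::sqrt(squares / values.size());
    s.min = *std::min_element(values.begin(), values.end());
    s.max = *std::max_element(values.begin(), values.end());
    return s;
}
```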

For a detailed example see pdb2tap example.

Lobo

Lobo is a Loop Modeling software that uses pre-calculated Look-Up Tables (LUTs) that represent loop fragments of various sizes to speed up calculation. LUTs can be generated once and stored, only requiring loading during loop modeling.

Conformations are produced by recursively dividing the segment until the backbone coordinates can be derived analytically.



Remember that before trying any of the following applications the environment variables should be set. Be careful to add the final "/" to the path.

export VICTOR_ROOT=/<your_folder>/victor/  
export PATH=$PATH:/<your_folder>/victor/bin/

How to create a LUT

The construction of the LUTs is separate from modelling and has to be executed only once. LoboLUT is the program that creates a look-up table of a specific length. To create a LUT for modelling loops of length N, it is first necessary to create LUTs from size 2 to N/2. In every case the application creates a binary file containing the corresponding values for the selected length.

Create a first LUT of length 2:

loboLUT -A 1 -B 1 -O aa2.lt --table <destination path>/ -R data/tor.par

Add 1 residue:

loboLUT -A aa2.lt -B 1 -O aa3.lt --table <destination path>/ -R data/tor.par

Create a table of length 4 combining two smaller LUTs.

loboLUT -A aa2.lt -B aa2.lt -O aa4.lt --table <destination path>/ -R data/tor.par

To avoid creating all the LUTs by hand, you can use LoboLUT_all, which does the task automatically.

N.B. Remember to set the VICTOR_ROOT path and to select a convenient destination path.

How to create LUTs for a fragment of size N

LoboLUT_all is a Perl script that automatically generates all the LUTs necessary for modelling a fragment of length N. For example, to create LUTs for a fragment of length 5 you can run the following command:

loboLUT_all -c 5 

This will create LUTs for fragments of length 2, 3 and 5. For more details see also loboLUT_all example.
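One splitting scheme that reproduces the length-5 example (lengths 2, 3 and 5) is to halve the fragment repeatedly until single residues are reached. The sketch below illustrates that idea only; it is an assumption about the decomposition, not the logic of the loboLUT_all script, and the function names are hypothetical.

```cpp
#include <cassert>
#include <set>

// Collects the LUT lengths needed for a fragment of the given length by
// repeatedly splitting it into two halves; length 1 needs no table.
void collectLutLengths(int length, std::set<int>& lengths) {
    if (length < 2) return;
    if (!lengths.insert(length).second) return;       // already handled
    collectLutLengths(length / 2, lengths);           // smaller half
    collectLutLengths(length - length / 2, lengths);  // larger half
}

std::set<int> lutLengths(int length) {
    std::set<int> lengths;
    collectLutLengths(length, lengths);
    return lengths;
}
```

With this scheme a fragment of length 4 needs the tables of length 2 and 4, which matches the manual example that combines two LUTs of length 2.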

How to identify loops in PDBs

CreateLoopTestset is a program that allows you to model a single loop. It gives the user full flexibility in setting the parameters for ranking and modelling. It finds the starting and ending positions of loops in a single or multiple PDB files. Its output can be used to model the loop with the LoopModelTest application. To obtain the list of starting and ending points:

createLoopTestset -o listLoops -i samples/filelist 

Where the content in filelist is for example:

samples/173D
samples/2MKPC
samples/4JDG
samples/173L    
samples/3A0R

The output will be:

index1 (-s): 9 index2 (-e) 13
index1 (-s): 44 index2 (-e) 49
index1 (-s): 52 index2 (-e) 57
index1 (-s): 62 index2 (-e) 70
..........

Where (-s) and (-e) are the starting and ending positions, respectively.
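To pass these positions to another program, each output line can be parsed as sketched below. The LoopRange type and the parseLoopLine function are hypothetical helpers, and the code assumes that the lines have exactly the layout shown in the output above.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

struct LoopRange { int start, end; };

// Parses a line such as "index1 (-s): 9 index2 (-e) 13".
// Returns true and fills `range` when both positions are found.
bool parseLoopLine(const std::string& line, LoopRange& range) {
    return std::sscanf(line.c_str(), "index1 (-s): %d index2 (-e) %d",
                       &range.start, &range.end) == 2;
}
```

Lines that do not match the layout, such as the trailing dots, are reported as not parsed.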

Convert a binary LUT into text

LUTs are generally saved in binary format, both for performance and for space efficiency. LoopTablePlot converts LUTs into a human-readable text format. For example, to generate the corresponding plot for the LUT aa5.lt (created previously):

LoopTablePlot -i aa5.lt  -o <plot output file> -s l 

The -s option defines the numerical precision (small=s, medium=m, large=l), which strongly affects the storage size. For a detailed example see LoopTablePlot example.

How to model a loop

LoopModelTest generates possible loop conformations and creates a PDB file for each solution:

LoopModelTest -i samples/<pdb_file.pdb> -c A -s X -e Y

Where X and Y are the start and end positions obtained with CreateLoopTestset, and -c A tells the program to work on chain A of a specific PDB file. Using the information obtained with the CreateLoopTestset application:

LoopModelTest -i samples/119L.pdb -c A -s 7 -e 14 

Remember to create the LUT table for a fragment of length 7 with loboLUT_all.

The new PDB files are created in the working path. The output columns correspond to the global RMS, end RMS, bond length, bond angle and torsion angle:

Results:  						    1.35     121     180 
 0   global RMS=  0.416   ( 0.366)	end-RMS=  0.234	    1.17     126     175 
 1   global RMS=  0.356   ( 0.295)	end-RMS= 0.0822	    1.38     121    -176 
......

How to obtain torsion angles of a PDB

Loop2torsion obtains the phi and psi angles of all amino acids in a selected chain.

loop2torsion -i samples/2R8O.pdb -c A  

The output contains the list of the angles and the B-factor of 1.

-72.1     157    1.0 
 -165     142    1.0 
  122    -172    1.0 
 -126    98.1    1.0 
....
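For reference, a torsion angle is the dihedral angle defined by four consecutive atoms: phi uses C(i-1), N, CA, C and psi uses N, CA, C, N(i+1). The sketch below shows the standard dihedral calculation from coordinates; it is not taken from the loop2torsion sources, and the Vec3 type and helper names are assumptions.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };

Vec3 sub(const Vec3& a, const Vec3& b) {
    return {a.x - b.x, a.y - b.y, a.z - b.z};
}
double dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}
Vec3 cross(const Vec3& a, const Vec3& b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z,
            a.x * b.y - a.y * b.x};
}

// Dihedral angle in degrees defined by four points, computed with atan2
// so that the sign of the angle is preserved.
double dihedral(const Vec3& p0, const Vec3& p1, const Vec3& p2,
                const Vec3& p3) {
    const Vec3 b1 = sub(p1, p0), b2 = sub(p2, p1), b3 = sub(p3, p2);
    const Vec3 n1 = cross(b1, b2), n2 = cross(b2, b3);  // plane normals
    const double b2len = std::sqrt(dot(b2, b2));
    assert(b2len > 0.0);  // the two central points must differ
    const double y = dot(cross(n1, n2), b2) / b2len;
    const double x = dot(n1, n2);
    return std::atan2(y, x) * 180.0 / std::acos(-1.0);
}
```

A planar cis arrangement gives 0 degrees and a planar trans arrangement gives 180 degrees.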

How to cluster angle data

ClusterRama clusters a Ramachandran distribution. The input file can be, for example, the tor.par file generated previously with the Energy module (see the Energy section). To obtain the clustered data using a cutoff value of 100:

ClusterRama -i data/tor.par -o outRama -c 100.0 

The output contains the number of values in the input file, the angles and the corresponding residue name:

12 
 -55.07    -44.61   GLY 
  76.11    -172.4   GLY 
 -139.2       129   GLY 
 ...
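The general idea of cutoff-based clustering of phi/psi pairs can be sketched as follows. This is a generic greedy scheme shown for illustration only: the algorithm used by ClusterRama and the units of its cutoff are not described here, so the distance measure (Euclidean distance on periodically wrapped angles, in degrees) and all names are assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct PhiPsi { double phi, psi; };

// Smallest absolute difference between two angles in degrees (0..180).
double angleDiff(double a, double b) {
    const double d = std::fabs(std::fmod(a - b, 360.0));
    return d > 180.0 ? 360.0 - d : d;
}

// Greedy clustering: a point joins the first representative closer than
// `cutoff`, otherwise it becomes the representative of a new cluster.
std::vector<PhiPsi> clusterAngles(const std::vector<PhiPsi>& points,
                                  double cutoff) {
    assert(cutoff > 0.0);
    std::vector<PhiPsi> representatives;
    for (const PhiPsi& p : points) {
        bool assigned = false;
        for (const PhiPsi& r : representatives) {
            const double dPhi = angleDiff(p.phi, r.phi);
            const double dPsi = angleDiff(p.psi, r.psi);
            if (std::sqrt(dPhi * dPhi + dPsi * dPsi) < cutoff) {
                assigned = true;
                break;
            }
        }
        if (!assigned) representatives.push_back(p);
    }
    return representatives;
}
```

The periodic wrap matters near the edges of the Ramachandran plot, where 179 and -179 degrees are only 2 degrees apart.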

How to generate clustered lookup tables (REMOVE)

LoopTableTest generates tables of protein entries for the Lobo algorithm.

LoopTableTest -A 1 -B 1 -O output.lt -R outRama -S s 

The "output.lt" file created is not a plain text file; use the LoopTablePlot application to output the corresponding angle values:

 Min: 
  EP: -4.126	 ED: -1.281	 N: -0.9997	 MP: -1.582	 MD: -0.4919	 MN: -0.9949 
  EP:   2.6	 ED: -1.332	 N:    -1	 MP: 1.521	 MD: 0.4671	 MN: -0.8217 
  EP: -3.966	 ED: -1.289	 N: -0.9836	 MP: -1.598	 MD: -0.7378	 MN: -0.5885 
 Max: 
  EP: 3.437	 ED: 1.022	 N: 0.6597	 MP: 0.9131	 MD: 0.5203	 MN: 0.8068 
  EP: 4.856	 ED: 0.1761	 N: 0.6105	 MP: 2.486	 MD: 0.9987	 MN: 0.6888 
  EP: 3.592	 ED:  1.27	 N: 0.9813	 MP: 1.307	 MD: 0.8342	 MN: 0.7185 
 ---------------------------- 
 Entry    0	 EP: -2.737	 ED: -0.01248	 N: -0.02252	 MP: -0.8014	 MD: 0.2146	 MN: 0.6219 
		 EP: 2.699	 ED: -1.172	 N: 0.5104	 MP: 1.879	 MD: 0.921	 MN: -0.3856 
		 EP: 1.984	 ED: -0.6955	 N: -0.8596	 MP: 1.022	 MD: 0.3252	 MN: 0.6816

To create the Ramachandran input file that contains the clustered data, use the ClusterRama application.

How to generate LUTs using Ramachandran clustered data

The ClusterLoopTable program allows you to create a new clustered LUT from LUTs already created with LoboLUT or loboLUT_all, given a cutoff value. In this example a cutoff of 10 is set and a LUT of length 5 is used.

ClusterLoopTable -I data/aa5.lt -O data/aa5clustered.lt -C 10.0 

The created output is not a plain text file; to see its content use the LoopTablePlot application.

How to analyze the backbone geometry of a PDB

BackboneAnalyzer is an application that analyzes a PDB file in terms of bond lengths and bond angles. As input it takes the PDB file and the chain to evaluate:

backboneAnalyzer -i samples/2R8O.pdb -c A 

The printed output includes the minimum, maximum and average bond lengths and angles, and the corresponding standard deviations.

-------------------------------------------------------------
            Bond Lengths                Bond Angles
Num     N->CA   CA->C'  C'->N       N->CA   CA->C'  C'->N
-------------------------------------------------------------
Min:    1.4450  1.5019  1.3206      116.87  104.83  112.55
Max:    1.4804  1.5479  4.0701      158.03  118.34  158.56
-------------------------------------------------------------
Avg:    1.4636  1.5272  1.3505      121.58  111.71  116.73
SD:     0.0054  0.0067  0.2074        2.45    2.16    1.98
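The quantities in this table are derived from atom coordinates. The sketch below shows how a single bond length and bond angle can be computed; it is an illustration under stated assumptions, not the backboneAnalyzer code, and the Atom3 type and function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

struct Atom3 { double x, y, z; };

// Distance between two atoms, in the units of the coordinates.
double bondLength(const Atom3& a, const Atom3& b) {
    const double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// Angle a-b-c in degrees, with atom b at the vertex.
double bondAngle(const Atom3& a, const Atom3& b, const Atom3& c) {
    const double ux = a.x - b.x, uy = a.y - b.y, uz = a.z - b.z;
    const double vx = c.x - b.x, vy = c.y - b.y, vz = c.z - b.z;
    const double lengths = bondLength(a, b) * bondLength(c, b);
    assert(lengths > 0.0);  // the three atoms must be distinct from b
    double cosine = (ux * vx + uy * vy + uz * vz) / lengths;
    if (cosine > 1.0) cosine = 1.0;    // guard against rounding errors
    if (cosine < -1.0) cosine = -1.0;
    return std::acos(cosine) * 180.0 / std::acos(-1.0);
}
```

The minimum, maximum, average and standard deviation are then taken over all residues of the chain.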