Difference between revisions of "Tutorial"


Revision as of 17:15, 27 June 2014

Biopool

The Biopool class implementation follows the composite design pattern; for a complete description of the class hierarchy we recommend the [Doxygen documentation]. Without going into implementation details, a Protein object is just a container for vectors representing chains. Each vector has 2 elements: the Spacer and the LigandSet. The Spacer is the container for AminoAcid objects, whereas the LigandSet is a container for all other molecules and ions, including DNA/RNA chains. Ultimately all molecules, both in the Spacer and in the LigandSet, are collections of Atom objects. The main feature of Biopool is that each AminoAcid object in the Spacer is connected to its neighbours by means of one rotation vector plus one translation vector. This implementation makes it easy to modify the protein structure, and many functions were implemented to modify, perturb or transform the relative position of residues efficiently.

Rotation and Translation vectors:


The object representation looks like this:

Image: SchemeProteinclass.jpg


Victor includes different packages: Biopool, Lobo and Energy. Every package is identified by a directory, starting with a capital letter, in the main Victor path. Inside each package you will find the Source folder, containing the class code, and the APPS directory, containing useful utilities. In the main Victor path you will find the bin directory, which contains the most important programs, simply copied from the APPS folders. In the main path you should also find the data folder, which contains symbolic links to the data files used by the individual packages.


Parsing a PDB file (PdbLoader)

Biopool uses the PdbLoader class to load PDB files. By default it loads all standard residues and hetero atoms, excluding nucleotides and water molecules. When possible it also tries to place hydrogen atoms on every amino acid in the Spacer and determines the secondary structure with the DSSP algorithm. The simplest way to load a PDB file into a Protein object is:

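  #include <PdbLoader.h>
  #include <Protein.h>
  #include <fstream>
  #include <iostream>
  #include <string>

  using namespace std;
  // Depending on the Victor version, the Biopool classes may live in a
  // namespace (for example Victor::Biopool) and need a matching using-directive.

  int main( int argc, char* argv[] ) {

     string inputFile = "MyPdbFile.pdb";
     ifstream inFile( inputFile.c_str() );
     if ( !inFile ) {         // stop if the file cannot be opened
        cerr << "Cannot open " << inputFile << endl;
        return 1;
     }
     PdbLoader pl(inFile);    // creates the PdbLoader object

     Protein prot;
     prot.load( pl );         // loads the structure into the Protein object
     return 0;
  }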

Modify the structure

Add hydrogen atoms

Get the secondary structure

There are 3 different ways in Victor to get the secondary structure. The first (inaccurate) is to parse the HELIX and SHEET records in the PDB file. The second is to infer the secondary structure from torsion angles. The third is to use Victor's implementation of the DSSP algorithm; its results can differ slightly (negligibly) from the original algorithm, but it is the most accurate way to calculate the secondary structure.
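
As an illustration of the second method, the sketch below assigns a rough class to a single residue from its phi and psi angles. The function name `classifyByTorsion` and the angle windows are assumptions chosen for illustration (approximate textbook Ramachandran regions); they are not the thresholds Victor uses.

```cpp
#include <cassert>

// Rough per-residue class from backbone torsion angles, in degrees.
// 'H' = helix-like, 'E' = strand-like, 'C' = anything else.
// The windows below are illustrative assumptions, not Victor's values.
char classifyByTorsion(double phi, double psi) {
    assert(phi >= -180.0 && phi <= 180.0);
    assert(psi >= -180.0 && psi <= 180.0);

    // Right-handed alpha-helix region, centred near (-60, -45).
    if (phi >= -100.0 && phi <= -30.0 && psi >= -80.0 && psi <= -5.0)
        return 'H';
    // Beta-strand region, centred near (-120, 130).
    if (phi >= -180.0 && phi <= -45.0 && psi >= 90.0 && psi <= 180.0)
        return 'E';
    return 'C';
}
```

A per-residue rule like this ignores hydrogen bonding and the state of neighbouring residues, which is one reason a DSSP-style assignment is more accurate.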

Lobo

How Do I …? Loop Modeling


Loop Modeling

 Based on the Lobo algorithm and the lookup tables. All the samples used are in the samples folder; all the generated data is in the data folder.

How to create a lookup table
How to create a lookup table for a fragment of length n
How to see the content of a lookup table
How to find the starting and ending position of a loop
How to model a loop
How to obtain a PDB's torsion angles
How to cluster data
How to generate clustered lookup tables
How to generate lookup tables using Ramachandran's clustered data
How to analyze the backbone geometry of a PDB





How to create a lookup table

loboLUT is a Perl script that creates a lookup table for a specific fragment length. The table contains the possible angles for loop creation. It is generated only once and is then used every time a loop is modeled.

To create a lookup table for a fragment of length N:

N=2

       ./loboLUT -A 1 -B 1 -O aa2.lt --table <destination path> 

N=3

       ./loboLUT -A aa2.lt -B 1 -O aa3.lt --table <destination path> 

N=4

       ./loboLUT -A ../data/aa2.lt -B ../data/aa2.lt -O aa4.lt --table <destination path> 

It is recommended to use the data folder inside the Victor library as the destination path.


How to create a lookup table for a fragment of length n

loboLUT_all is a Perl script that creates all the lookup tables needed to model a fragment of length N. Each table is built from previously created, shorter tables: the script repeatedly halves the length until it reaches length 2 or 3, which are the base tables.

To create the lookup tables for a fragment of length 5, you can use the following line:

       ./loboLUT_all -c 5

This creates the lookup tables for lengths 2, 3 and 5, which are the ones needed. Remember that the Lobo algorithm divides the length of the fragment in two, which is why aa2.lt and aa3.lt must also be created.
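
The halving rule can be sketched in a few lines. This is a minimal illustration, not the script's actual code: it assumes that a length n is split into floor(n/2) and ceil(n/2), and that length 1 needs no table because it is passed directly as `-A 1` / `-B 1`. The function names are hypothetical.

```cpp
#include <cassert>
#include <set>

// Collect every table length needed to build a table of length n,
// assuming n is split into floor(n/2) and ceil(n/2) at each step.
void collectTableLengths(int n, std::set<int>& lengths) {
    assert(n >= 1);
    if (n < 2) return;                      // length 1 needs no table (assumption)
    if (!lengths.insert(n).second) return;  // this length is already covered
    int lower = n / 2;
    int upper = n - lower;
    collectTableLengths(lower, lengths);
    collectTableLengths(upper, lengths);
}

// Convenience wrapper returning the sorted set of required lengths.
std::set<int> requiredTableLengths(int n) {
    std::set<int> lengths;
    collectTableLengths(n, lengths);
    return lengths;
}
```

For a fragment of length 5 this yields 2, 3 and 5, which matches the tables listed above.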

You can also create a lookup table for a single specific length:

N=2

       ./loboLUT -A 1 -B 1 -O aa2.lt --table <destination path> 

N=3

       ./loboLUT -A aa2.lt -B 1 -O aa3.lt --table <destination path> 

N=4

       ./loboLUT -A ../data/aa2.lt -B ../data/aa2.lt -O aa4.lt --table <destination path> 

It is recommended to use the data folder inside the Victor library as the destination path.


How to see the content of a lookup table

The lookup table is not stored as a plain text file, so a separate application is needed to see its content. LoopTablePlot is a C++ program that does this. Remember that the table file is created by loboLUT_all / loboLUT.

To see the lookup table aa5.lt (created previously), run the line below; in this example the -s option defines the size of the output.

   ./LoopTablePlot -i ../data/aa5.lt  -o Plotoutput -s l

The output written to the Plotoutput file contains the list of possible loop angles.


How to find the starting and ending position of a loop

createLoopTestset is a C++ program that finds the starting and ending positions of loops in one or many PDB files. Its output can be used to model the loops with the LoopModelTest application.

To obtain the list of starting and ending points for the files 119L and 16PK:

   ./createLoopTestset -o listLoops -i ../samples/filelist

Content of the filelist file:

 ../samples/119L
 ../samples/16PK

The output is printed and looks like this:

 index1 (-s): 7 index2 (-e) 14
 index1 (-s): 48 index2 (-e) 52
 index1 (-s): 86 index2 (-e) 89
 index1 (-s): 99 index2 (-e) 104
 ….........

where -s is the starting position and -e is the ending position. If many PDB files are evaluated, the application shows all the loops for the first PDB file listed and then all the loops for the following ones.
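
Because each output line already names the -s and -e values, it can be turned into a LoopModelTest call mechanically. The sketch below assumes the line layout shown above; `LoopRange`, `parseLoopLine` and `loopModelCommand` are hypothetical helper names, not part of Victor.

```cpp
#include <cassert>
#include <cstdio>
#include <sstream>
#include <string>

struct LoopRange {
    int start;  // value to pass as -s
    int end;    // value to pass as -e
};

// Parse one line such as "index1 (-s): 7 index2 (-e) 14".
// Returns false if the line does not follow that layout.
bool parseLoopLine(const std::string& line, LoopRange& out) {
    int s = 0, e = 0;
    if (std::sscanf(line.c_str(), " index1 (-s): %d index2 (-e) %d", &s, &e) != 2)
        return false;
    out.start = s;
    out.end = e;
    return true;
}

// Build the LoopModelTest command line for one loop of the given chain.
std::string loopModelCommand(const std::string& pdbPath, char chain,
                             const LoopRange& range) {
    assert(range.start <= range.end);
    std::ostringstream cmd;
    cmd << "./LoopModelTest -i " << pdbPath << " -c " << chain
        << " -s " << range.start << " -e " << range.end;
    return cmd.str();
}
```

Lines that do not match, such as the trailing dots, are rejected by `parseLoopLine` and can simply be skipped.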


How to model a loop

LoopModelTest is a C++ program that builds multiple candidate loops and writes a PDB file for each of them.

As input it needs the PDB file and the start and end positions of the loop.

To create the loop from a start position X to an end position Y of chain A of a specific PDB file:

	./LoopModelTest -i ../samples/ZZZZ.pdb -c A -s X -e Y

Using the information obtained with the createLoopTestset application:

	./LoopModelTest -i ../samples/119L.pdb -c A -s 7 -e 14

Remember to create the lookup table for a fragment of length 7 first:

	./loboLUT_all -c 7

The new PDB files will be created in the working path, and the printed output shows the global RMS, end RMS, bond length, bond angle and torsion angle.

Printed output:

 Results:      1.35    121    180
 0   global RMS=  0.416   ( 0.366)	end-RMS=  0.234	    1.17     126     175 
 1   global RMS=  0.356   ( 0.295)	end-RMS= 0.0822	    1.38     121    -176 
…....
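
To compare the generated loops, the per-model lines can be parsed and ranked. This is a minimal sketch that assumes the line layout shown above and that the last three columns are bond length, bond angle and torsion angle, in the order the text lists them. The meaning of the value in parentheses is not documented here, so it is stored under a neutral name. All type and function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

struct ModelScore {
    int index;
    double globalRms;
    double rmsInParens;   // value printed in parentheses (meaning assumed unknown)
    double endRms;
    double bondLength;    // column order is an assumption
    double bondAngle;
    double torsionAngle;
};

// Parse one per-model line; returns false for any other line (e.g. "Results:").
bool parseModelLine(const std::string& line, ModelScore& m) {
    return std::sscanf(line.c_str(),
                       " %d global RMS= %lf ( %lf ) end-RMS= %lf %lf %lf %lf",
                       &m.index, &m.globalRms, &m.rmsInParens, &m.endRms,
                       &m.bondLength, &m.bondAngle, &m.torsionAngle) == 7;
}

// Position in `models` of the entry with the lowest global RMS.
std::size_t bestByGlobalRms(const std::vector<ModelScore>& models) {
    assert(!models.empty());
    std::size_t best = 0;
    for (std::size_t i = 1; i < models.size(); ++i)
        if (models[i].globalRms < models[best].globalRms) best = i;
    return best;
}
```

With the two sample lines above, the second model (index 1, global RMS 0.356) would be selected.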

How to obtain a PDB's torsion angles

loop2torsion is a C++ program that outputs the phi and psi angles of all the amino acids in a selected PDB chain.

It needs a PDB file as input, and the chain must be specified:

   ./loop2torsion -i ../samples/2R8O.pdb -c A

The printed output is the list of the angles, each followed by a B-factor of 1.

  -72.1     157    1.0 
   -165     142    1.0 
    122    -172    1.0 
   -126    98.1    1.0 
  …....
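
For further processing, these rows can be read back into a simple structure. The sketch below assumes three whitespace-separated numbers per row, in the order phi, psi, B-factor; the column order and the names `TorsionRow` and `readTorsionRows` are assumptions for illustration.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct TorsionRow {
    double phi;      // first column (assumed to be phi)
    double psi;      // second column (assumed to be psi)
    double bFactor;  // third column
};

// Read every row made of three numbers; any other line is skipped.
std::vector<TorsionRow> readTorsionRows(std::istream& in) {
    std::vector<TorsionRow> rows;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        TorsionRow row;
        if (fields >> row.phi >> row.psi >> row.bFactor)
            rows.push_back(row);
    }
    return rows;
}
```

The function takes any input stream, so it works equally on a file saved from the program's output or on a string in a test.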


How to cluster data

ClusterRama is a C++ program that clusters the data contained in a Ramachandran distribution file, using the tor file created in the Energy package.

To obtain the clustered data using a cutoff value of 100:

   ./ClusterRama -i ../data/tor.par -o outRama -c 100.0

The output contains the number of values found, followed by the angle values and the corresponding residue:

12

-55.07    -44.61   GLY 
 76.11    -172.4   GLY 
-139.2       129   GLY 
…...

How to generate clustered lookup tables

Based on the clustered data, the LoopTableTest C++ program generates tables of protein entries for the Lobo algorithm.

   ./LoopTableTest -A 1 -B 1 -O output.lt -R outRama -S s

To create the Ramachandran input file that contains the clustered data, use the ClusterRama application.

The output created is not a plain text file; use the LoopTablePlot application to see its content. The printed output includes the corresponding angle values (see figure):

 Min:
 EP: -4.126 ED: -1.281 N: -0.9997 MP: -1.582 MD: -0.4919 MN: -0.9949
 EP:  2.6 ED: -1.332 N:    -1 MP: 1.521 MD: 0.4671 MN: -0.8217
 EP: -3.966 ED: -1.289 N: -0.9836 MP: -1.598 MD: -0.7378 MN: -0.5885
 Max:
 EP: 3.437 ED: 1.022 N: 0.6597 MP: 0.9131 MD: 0.5203 MN: 0.8068
 EP: 4.856 ED: 0.1761 N: 0.6105 MP: 2.486 MD: 0.9987 MN: 0.6888
 EP: 3.592 ED:  1.27 N: 0.9813 MP: 1.307 MD: 0.8342 MN: 0.7185
 ----------------------------
 Entry    0 EP: -2.737 ED: -0.01248 N: -0.02252 MP: -0.8014 MD: 0.2146 MN: 0.6219
 EP: 2.699 ED: -1.172 N: 0.5104 MP: 1.879 MD: 0.921 MN: -0.3856
 EP: 1.984 ED: -0.6955 N: -0.8596 MP: 1.022 MD: 0.3252 MN: 0.6816


How to generate lookup tables using Ramachandran's clustered data

The ClusterLoopTable program creates a new clustered lookup table from a lookup table already created with loboLUT / loboLUT_all and a given cutoff value.

In this example, a cutoff of 10 is set and the lookup table for length 5 is used:

   ./ClusterLoopTable -I ../data/aa5.lt -O ../data/aa5clustered.lt -C 10.0

The output created is not a plain text file; to see its content, use the LoopTablePlot application.


How to analyze the backbone geometry of a PDB

backboneAnalyzer is an application that analyzes a PDB file in terms of bond lengths and bond angles.

As input it takes the PDB file and the chain to evaluate:

      ./backboneAnalyzer -i ../samples/2R8O.pdb -c A

The printed output includes the minimum, maximum and average bond lengths and angles, and the corresponding standard deviations.

----------------------------------------------------------
            Bond Lengths               Bond Angles
Num     N->CA   CA->C'    C'->N    N->CA   CA->C'    C'->N
----------------------------------------------------------
Min:   1.4450   1.5019   1.3206   116.87   104.83   112.55
Max:   1.4804   1.5479   4.0701   158.03   118.34   158.56
----------------------------------------------------------
Avg:   1.4636   1.5272   1.3505   121.58   111.71   116.73
SD:    0.0054   0.0067   0.2074     2.45     2.16     1.98
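
Each column of this summary is a set of basic statistics over one series of measurements. The sketch below shows how such a row set (Min, Max, Avg, SD) can be computed for one series. It is not backboneAnalyzer's code: the names are hypothetical, and the population form of the standard deviation (dividing by the number of values) is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

struct Summary {
    double min;
    double max;
    double avg;
    double sd;
};

// Minimum, maximum, mean and population standard deviation of a series.
Summary summarize(const std::vector<double>& values) {
    assert(!values.empty());
    Summary s;
    s.min = *std::min_element(values.begin(), values.end());
    s.max = *std::max_element(values.begin(), values.end());

    double sum = 0.0;
    for (double v : values) sum += v;
    s.avg = sum / values.size();

    double squared = 0.0;
    for (double v : values) squared += (v - s.avg) * (v - s.avg);
    s.sd = std::sqrt(squared / values.size());  // population SD (assumption)
    return s;
}
```

In the sample output, the C'->N maximum of 4.0701 is far from the average of 1.3505; an outlier like this is exactly what such a summary makes visible, and it possibly points to a gap in the chain.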

Energy