Difference between revisions of "Tutorial"

From victor
 
There are three ways in Victor to get the secondary structure. The first (inaccurate) simply parses the '''HELIX''' and '''SHEET''' fields of the PDB file. The second infers the secondary structure from '''torsional angles'''. The third uses an implementation of the '''DSSP algorithm'''; its results can differ slightly (negligibly) from those of the original algorithm, but it is the most accurate way to calculate the secondary structure.
 
[[Energy]]
  
 
[[Lobo]]
 
 
= Energy =
 
 
 
==How to obtain the solvation potential==
 
pdb2solv is an application that creates a file containing the frequencies of occurrence of each residue a with burial r. These frequencies are needed to derive the solvation potentials for all the amino acids in the given PDB.
 
A solvation potential for an amino acid residue ''a'' is defined as:

<math>\Delta E_{solv}^{a}(r) = -RT \ln\left(\frac{f^{a}(r)}{f(r)}\right)</math>

where ''r'' is the degree of residue burial, <math>f^{a}(r)</math> is the frequency of occurrence of residue ''a'' with burial ''r'', and <math>f(r)</math> is the frequency of occurrence of all residues with burial ''r''.
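As a rough illustration of the definition above, the following C++ sketch turns raw occurrence counts into a potential per burial level. It is not the pdb2solv implementation: the function name, the constants for R and T, the leading minus sign (the usual inverse-Boltzmann convention) and the choice of returning 0 for burial levels without observations are all assumptions made for the example.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Gas constant in kcal/(mol*K) and temperature in K.
// Illustrative assumptions, not values taken from Victor.
const double kGasConstant = 0.0019872;
const double kTemperature = 298.0;

// Solvation potential of one residue type for every burial level r.
// countsForResidue[r]: occurrences of residue a with burial r.
// countsAll[r]:        occurrences of any residue with burial r.
// Burial levels without observations keep a potential of 0 (assumption).
std::vector<double> solvationPotential(const std::vector<double>& countsForResidue,
                                       const std::vector<double>& countsAll) {
    double totalResidue = 0.0;
    double totalAll = 0.0;
    for (double c : countsForResidue) totalResidue += c;
    for (double c : countsAll) totalAll += c;

    std::vector<double> potential(countsAll.size(), 0.0);
    for (std::size_t r = 0; r < countsAll.size(); ++r) {
        if (r >= countsForResidue.size()) break;
        if (countsForResidue[r] <= 0.0 || countsAll[r] <= 0.0) continue;
        const double fa = countsForResidue[r] / totalResidue;  // f^a(r)
        const double f = countsAll[r] / totalAll;               // f(r)
        potential[r] = -kGasConstant * kTemperature * std::log(fa / f);
    }
    return potential;
}
```

With this sign convention, a burial level at which the residue is over-represented relative to all residues gets a negative (favourable) value, and an under-represented level gets a positive one.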
 
 
The degree of burial for a residue is defined as the number of other Cβ atoms located within 10 Å (non-polar) or 7 Å (polar) of the residue's Cβ atom.
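The burial definition can be sketched as a simple neighbour count. This is a minimal example, not Victor code: the Point3 struct and the burialDegree function are hypothetical names, the coordinates are assumed to be Cβ positions in Å, and "within" is taken to include atoms exactly at the cutoff distance.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper type: Cartesian coordinates of one C-beta atom, in angstroms.
struct Point3 {
    double x;
    double y;
    double z;
};

// Counts the other C-beta atoms lying within `cutoff` angstroms of cbAtoms[index].
// Pass 10.0 for the non-polar definition and 7.0 for the polar one.
int burialDegree(const std::vector<Point3>& cbAtoms, std::size_t index, double cutoff) {
    const Point3& center = cbAtoms[index];
    int count = 0;
    for (std::size_t i = 0; i < cbAtoms.size(); ++i) {
        if (i == index) continue;  // the residue itself is not counted
        const double dx = cbAtoms[i].x - center.x;
        const double dy = cbAtoms[i].y - center.y;
        const double dz = cbAtoms[i].z - center.z;
        // Compare squared distances to avoid a square root per pair.
        if (dx * dx + dy * dy + dz * dz <= cutoff * cutoff) ++count;
    }
    return count;
}
```

The loop is quadratic when applied to every residue, which is normally acceptable for a single protein chain.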
 
 
A PDB file is needed as input.


The output depends on the given options.
 
Output, considering a maximum of 30 bins (written to test.out by default; use the -o option to set a name)
 
 
Non polar option
 
---------------------------------------------------------------------------------------------------------------
 
Total number of residues evaluated | AA type (three-letter code) | frequencies
 
---------------------------------------------------------------------------------------------------------------
 
 
Polar option
 
---------------------------------------------------------------------------------------------------------------
 
P | Total number of residues evaluated | AA type (three-letter code) | frequencies | polar frequency |
 
---------------------------------------------------------------------------------------------------------------
 
 
 
To obtain the solv.par file used by pdb2energy, frst and other applications, run the following command on all the PDB files in the TOP500H database.
 
./pdb2solv -i ../samples/119L.pdb
 
 
 
For further reference, see "Victor/FRST function for model quality estimation" and "GenTHREADER: an efficient and reliable protein fold recognition method for genomic sequences" by David T. Jones.

The TOP500H database was used to create the solv.par file.
 
 
==How to obtain the torsion angles from the PDB residues==
 
The application pdb2tor obtains the set of angles for each residue. As input it takes either a PDB file and the corresponding chain, or a file listing PDB IDs, each optionally followed by a chain. If no chain is given, the application uses the first chain found.
 
 
Structure of the PDB filelist

To use the first chain of each PDB:

PDBID

PDBID

PDBID


To use a specific chain for each PDB, add the --complete option:

PDBID (complete name of the corresponding file) chain

PDBID chain

PDBID chain

To process several chains from the same PDB, repeat the PDB ID on a new line with a different chain.
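To make the filelist layout concrete, here is a small C++ sketch that reads such a list. It is only an illustration of the format described above, not the parser used by pdb2tor: FileListEntry and readFileList are hypothetical names, and a blank chain character is used here to mean "use the first chain found".

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// One line of the filelist (hypothetical type, not part of Victor).
struct FileListEntry {
    std::string pdbId;
    char chain;  // ' ' means "no chain given, use the first chain found"
};

// Reads "PDBID" or "PDBID chain" lines; blank lines are skipped.
std::vector<FileListEntry> readFileList(std::istream& in) {
    std::vector<FileListEntry> entries;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        FileListEntry entry;
        entry.chain = ' ';
        if (!(fields >> entry.pdbId)) continue;  // empty line
        std::string chain;
        if (fields >> chain) entry.chain = chain[0];
        entries.push_back(entry);
    }
    return entries;
}
```

A PDB that should be processed for two chains simply appears twice in the returned vector, once per chain, mirroring the repeated lines in the file.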
 
 
This application can also be used to generate the tor.par file used by the TAP application. To generate it, run the following command on the TOP500H database.
 
./pdb2tor -i ../samples/filelist -A -r
 
 
 
OUTPUT FORMAT (default option)
 
---------------------------------------------------------------------------------------------------------------
 
AA Type(one letter format)  | Number | pre-phi | pre-psi | phi | psi | omega |  chi1  | chi2
 
---------------------------------------------------------------------------------------------------------------
 
!Total file analized: Number of files analized
 
 
 
OUTPUT FORMAT (using -r option)
 
---------------------------------------------------------------------------------------------------------------
 
Number of lines in the file
 
---------------------------------------------------------------------------------------------------------------
 
phi | psi | AA type | pre phi | pre psi | omega | #carbons | chi1 | chi2
 
---------------------------------------------------------------------------------------------------------------
 
 
 
==How to obtain normalized energy from a PDB==
 
The application pdb2torenergy calculates a pseudo-energy that expresses the quality of a given protein structural model as a single (real) number. This program allows you to obtain the normalized energy mentioned in the TAP paper.
 
 
INPUT DEFAULT DATA
 
tor.par, created by pdb2tor using the TOP500H database


Multiple PDB files and PDB chains can be used to calculate the normalized energy.
 
 
Output
 
Depending on the options, the energy is calculated either for the chain as a whole or for each of its residues.

Per residue (one energy value for each residue in the PDB)
 
./pdb2torenergy -i ../samples/119L.pdb --allchains -p
 
 
Per PDB (one energy value)
 
./pdb2torenergy -i ../samples/119L.pdb --allchains
 
 
For chain A in each model (as many energy values as there are models in the PDB file)
 
./pdb2torenergy -i ../samples/1IHQ.pdb -c A
 
 
 
 
==How to obtain FRST value from a PDB==
 
The frst application calculates the FRST value using the solvation potential, torsion angles and RAPDF. It needs some input files, all of which can be generated with other Energy/Lobo applications; alternatively, you can use the pre-generated ones stored in the victor2.0/data folder.
 
 
Default Input files
 
tor.par, created by pdb2tor using the TOP500H database

solv.par, created by pdb2solv using the TOP500H database
 
ram.par
 
 
 
Output
 
The application prints the FRST value for the given PDB.

With the -v option it also prints the values of the RAPDF energy, solvation energy, main-chain hydrogen bonds and torsion energy.
 
 
To calculate the average over a chain in an NMR ensemble
 
./frst -i ../samples/16PK.pdb 
 
To calculate the average over many pdb files
 
./frst -I ../samples/filelist
 
 
==How to obtain TAP value from a PDB==
 
The pdb2tap application evaluates the quality of a model using the TAP method. It is intended for evaluating the quality of protein models determined by X-ray crystallography. The method is based on a relative pseudo-energy calculated from the backbone and side-chain torsion angle propensities, which is then normalized against the global minimum and maximum for the protein sequence under consideration.
 
 
Methods
 
Torsion angle potential (based on frst)
 
Pseudo Energy i, maximum and minimum
 
 
TAP = (E-Emin)/(Emax-Emin)
 
Known as the normalized torsion angle propensity, it gives an indication of the degree of nativeness of the protein model.
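The normalization can be written as a one-line function. The sketch below follows the formula exactly as given above; the function name and the decision to reject a degenerate range (Emax not greater than Emin) are assumptions for the example, and computing E, Emin and Emax themselves is left to pdb2tap.

```cpp
#include <stdexcept>

// Normalizes a pseudo-energy to the range spanned by its global minimum and maximum:
// TAP = (E - Emin) / (Emax - Emin).
double normalizedTap(double energy, double energyMin, double energyMax) {
    if (energyMax <= energyMin) {
        throw std::invalid_argument("energyMax must be greater than energyMin");
    }
    return (energy - energyMin) / (energyMax - energyMin);
}
```

For example, normalizedTap(5.0, 0.0, 10.0) yields 0.5, and any energy between the two bounds maps to a value between 0 and 1.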
 
 
INITIAL DEFAULT DATA (can be created with pdb2tor)

The file tor.par is used to calculate the TAP value. This file can be created with the pdb2tor application and by default is created using the TOP500H database.

tor.par: file containing all torsion angles available from the TOP500H database.
 
For more reference see:
 
For the database
 
TOP500H is the list of 500 proteins used for the Ramachandran-plot distributions. Each entry gives the file ID (PDB code + chain ID (if not the full PDB file) + H (to signify that hydrogens were added)), structure factor deposition status, resolution, and protein name. It contains 500 high-resolution X-ray structures resolved to 1.8 Å or better with less than 60% sequence identity. 609 NMR structures (9578 models). http://kinemage.biochem.duke.edu/databases/top500.php
 
For method
 
Fine-grained statistical torsion angle potentials are effective in discriminating native protein structures.
 
PMID: 16712465    [PubMed - indexed for MEDLINE]
 
 
Output format
 
A plain text file containing:
 
Number of lines in the file
 
---------------------------------------------------------------------------------------------------------------
 
phi | psi | AA type | pre phi | pre psi | omega | #carbons | chi1 | chi2
 
---------------------------------------------------------------------------------------------------------------
 
 
 
Output interpretation
 
Value close to 1 for a native structure
 
Value close to 0 for a largely incompatible sequence.
 
 
 
Input data
 
The application can be used with one or many PDBs and PDB chains.

Single X-ray structure using one chain
 
./pdb2tap -i ../samples/102M.pdb -c A 
 
as shown in http://www.biomedcentral.com/content/supplementary/1471-2105-8-155-s1.txt
 
 
Output: Prints the tap value
 
Single X-ray structure using all chains in the PDB

./pdb2tap -i ../samples/1A3W.pdb -P sal  --allchains

as shown in http://www.biomedcentral.com/content/supplementary/1471-2105-8-155-s1.txt


Output: prints the average TAP value over all chains
 
 
Multiple NMR models using one chain

./pdb2tap -i ../samples/1IHQ.pdb -P sal -c A --nmr

Output: prints the TAP value of the selected chain for each model, followed by the average, standard deviation, minimum and maximum TAP value over all models
 
 
Multiple NMR models using all chains in the PDB

./pdb2tap -i ../samples/1IHQ.pdb -P sal --allchains --nmr

Output: prints the average TAP value over all chains for each model, followed by the average, standard deviation, minimum and maximum TAP value over all models
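The summary reported for NMR ensembles (average, standard deviation, minimum and maximum over the per-model values) can be reproduced with a few lines of C++. This is a sketch under assumptions, not the pdb2tap code: TapSummary and summarizeTap are hypothetical names, and the standard deviation is computed here as the population form (dividing by N), which may differ from what pdb2tap uses.

```cpp
#include <algorithm>
#include <cmath>
#include <stdexcept>
#include <vector>

// Summary statistics over per-model TAP values (hypothetical type).
struct TapSummary {
    double mean;
    double stdDev;
    double min;
    double max;
};

// Computes mean, population standard deviation, minimum and maximum.
TapSummary summarizeTap(const std::vector<double>& values) {
    if (values.empty()) throw std::invalid_argument("no TAP values given");
    double sum = 0.0;
    for (double v : values) sum += v;
    const double mean = sum / values.size();

    double squared = 0.0;
    for (double v : values) squared += (v - mean) * (v - mean);

    TapSummary summary;
    summary.mean = mean;
    summary.stdDev = std::sqrt(squared / values.size());  // divides by N (assumption)
    summary.min = *std::min_element(values.begin(), values.end());
    summary.max = *std::max_element(values.begin(), values.end());
    return summary;
}
```

Feeding it one TAP value per model gives the four numbers in a single pass over the data plus one extra pass for the variance.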
 

Revision as of 10:57, 3 July 2014

Biopool

The Biopool class implementation follows the composite design pattern; for a complete description of the class hierarchy we recommend the [Doxygen documentation]. Without going into implementation details, a Protein object is just a container for vectors representing chains. Each vector has two elements: the Spacer and the LigandSet. The Spacer is the container for AminoAcid objects, whereas the LigandSet is a container for all other molecules and ions, including DNA/RNA chains. Ultimately all molecules, both in the Spacer and in the LigandSet, are collections of Atom objects. The main feature of Biopool is that each AminoAcid object in the Spacer is connected to its neighbours by means of one rotational vector plus one translational vector. This makes it easy to modify the protein structure, and many functions were implemented to modify, perturb or transform the relative position of residues efficiently. Rotation and Translation vectors:


The object representation looks like this:

Image: SchemeProteinclass.jpg
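For readers unfamiliar with the composite pattern mentioned above, the following C++ sketch shows the general idea: containers and leaves share one interface, so an operation such as counting atoms works the same way on a residue, a Spacer-like group or a whole protein. The names Component, AtomLeaf and Group are hypothetical and chosen for this example only; they are not Victor's classes, whose real hierarchy is described in the Doxygen documentation.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Common interface for leaves and containers (hypothetical, not Victor's API).
struct Component {
    virtual ~Component() {}
    virtual std::size_t atomCount() const = 0;
};

// Leaf: a single atom.
struct AtomLeaf : Component {
    std::size_t atomCount() const override { return 1; }
};

// Container: stands in for a residue, a Spacer, a LigandSet or a Protein.
struct Group : Component {
    std::vector<std::unique_ptr<Component>> children;

    void add(std::unique_ptr<Component> child) {
        children.push_back(std::move(child));
    }

    // The count is delegated recursively to the children.
    std::size_t atomCount() const override {
        std::size_t total = 0;
        for (const auto& child : children) total += child->atomCount();
        return total;
    }
};
```

Because a Group can hold other Group objects, a protein built from a spacer group and a ligand-set group reports its total atom count through the same atomCount() call used on a single residue.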


Victor includes different packages: Biopool, Lobo and Energy. Every package is identified by a directory, starting with a capital letter, in the main Victor path. Inside each package you will find the Source folder containing the class code and the APPS directory containing useful utilities. In the main Victor path you will find the bin directory, which contains the most important programs, simply copied from the APPS folders. In the main path you should also find the data folder, which contains symbolic links to the data files used by the individual packages.


Parsing a PDB file (PdbLoader)

Biopool uses the PdbLoader class to load PDB files. By default it loads all standard residues and hetero atoms, excluding nucleotides and water molecules. When possible it also tries to place hydrogen atoms on every amino acid included in the spacer and to determine the secondary structure with the DSSP algorithm. The simplest way to load a PDB into a Protein object is:

  #include <PdbLoader.h>
  #include <Protein.h>
  #include <fstream>
  #include <iostream>
  #include <string>

  using namespace std;

  int main( int argc, char* argv[] ) {

     string inputFile = "MyPdbFile.pdb";
     ifstream inFile( inputFile.c_str() );
     if ( !inFile ) {         // stop if the file cannot be opened
        cerr << "Cannot open " << inputFile << endl;
        return 1;
     }
     PdbLoader pl(inFile);    // creates the PdbLoader object

     Protein prot;
     prot.load( pl );         // fills the Protein object from the PDB file
     return 0;
  }

Modify the structure

Add hydrogen atoms

Get the secondary structure

There are three ways in Victor to get the secondary structure. The first (inaccurate) simply parses the HELIX and SHEET fields of the PDB file. The second infers the secondary structure from torsional angles. The third uses an implementation of the DSSP algorithm; its results can differ slightly (negligibly) from those of the original algorithm, but it is the most accurate way to calculate the secondary structure.
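To give an idea of what inferring the secondary structure from torsional angles means, the sketch below assigns a state to each residue from its phi/psi pair. It is deliberately crude and is not the method implemented in Victor: the function names and the rectangular angle regions are assumptions chosen for illustration, loosely centred on typical helix and strand values.

```cpp
#include <string>
#include <utility>
#include <vector>

// Classifies one residue from its backbone angles (in degrees).
// Returns 'H' (helix), 'E' (strand) or 'C' (anything else).
// The angle windows are illustrative assumptions, not Victor's thresholds.
char classifyByTorsion(double phi, double psi) {
    const bool helix = phi >= -100.0 && phi <= -30.0 && psi >= -80.0 && psi <= -10.0;
    const bool strand = phi >= -180.0 && phi <= -45.0 && psi >= 90.0 && psi <= 180.0;
    if (helix) return 'H';
    if (strand) return 'E';
    return 'C';
}

// Applies the per-residue rule to a whole chain of (phi, psi) pairs.
std::string classifyChain(const std::vector<std::pair<double, double>>& angles) {
    std::string states;
    for (const auto& angle : angles) {
        states += classifyByTorsion(angle.first, angle.second);
    }
    return states;
}
```

A per-residue rule like this ignores hydrogen bonding and neighbouring residues, which is one reason a DSSP-style assignment is more accurate.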


Energy

Lobo