# Difference between revisions of "Introduction"

m (Damiano moved page Introduction to Victor package to Introduction) |

## Revision as of 17:57, 24 July 2014

The Victor2.0 library (Virtual Construction Toolkit for Proteins) is an open-source project dedicated to providing a C++ implementation of tools for analyzing and manipulating protein structures. Victor is composed of three main modules:

- Biopool (**BIOP**olymer **O**bject **O**riented **L**ibrary) - The core library that generates the protein object and provides useful methods to manipulate the structure.

- Energy - A library to calculate statistical potentials from protein structures.

- Lobo (**LO**op **B**uild-up and **O**ptimization) - Ab initio prediction of missing loop conformations in protein models.

## Biopolymer Object Oriented Library (Biopool)

The **Biopool** class implementation follows the composite design pattern; for a complete description of the class hierarchy, see the [Doxygen documentation]. Without going into implementation details, a **Protein** object is a container for vectors representing **chains**. Each vector has two elements: the **Spacer** and the **LigandSet**. The Spacer is the container for **AminoAcid** objects, whereas the LigandSet is a container for all other **molecules** and **ions**, including DNA/RNA chains. Ultimately all molecules, both in the Spacer and in the LigandSet, are collections of **Atom** objects. The main feature of Biopool is that each AminoAcid object in the Spacer is connected to its neighbours by means of one rotational vector plus one translational vector.
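The containment described above can be pictured with a minimal sketch. Every type below is a hypothetical, simplified stand-in written for illustration (plain aggregates rather than the composite pattern the real classes follow); none of it is the actual Biopool API, for which the Doxygen documentation remains the reference.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins; the real Biopool classes are much richer.
struct Atom {
    std::string name;
    double x, y, z;
};

struct AminoAcid {
    std::string type;         // e.g. "GLY"
    std::vector<Atom> atoms;
};

struct Molecule {             // ligand, ion, or nucleotide
    std::string name;
    std::vector<Atom> atoms;
};

struct Spacer { std::vector<AminoAcid> residues; };
struct LigandSet { std::vector<Molecule> molecules; };

// One chain: the two-element unit (Spacer + LigandSet) held by a Protein.
struct Chain {
    char id;
    Spacer spacer;
    LigandSet ligands;
};

struct Protein { std::vector<Chain> chains; };

// Append a residue; an amino acid without atoms is treated as a bug here.
void addResidue(Chain& chain, const AminoAcid& aa) {
    assert(!aa.atoms.empty());
    chain.spacer.residues.push_back(aa);
}

// Every level of the hierarchy ultimately resolves to Atom objects.
std::size_t countAtoms(const Protein& p) {
    std::size_t n = 0;
    for (const Chain& c : p.chains) {
        for (const AminoAcid& aa : c.spacer.residues) n += aa.atoms.size();
        for (const Molecule& m : c.ligands.molecules) n += m.atoms.size();
    }
    return n;
}
```

Walking the structure from `Protein` down to `Atom`, as `countAtoms` does, is the kind of traversal the container layout is meant to make straightforward.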

This representation makes it easy to modify the protein structure, and many functions were implemented to modify, perturb, or transform the relative position of residues efficiently by means of the **rotation and translation vectors**.
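To illustrate why a per-residue rotation plus translation makes such edits cheap, here is a minimal sketch. It assumes each link is stored as a 3x3 rotation matrix and a translation vector relative to the previous residue; the names `Link`, `rotZ`, and `residueOrigins` are invented for this example and do not reflect how Biopool stores or applies its vectors.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<Vec3, 3>;

// Rigid transform relating a residue's local frame to its predecessor's.
struct Link {
    Mat3 rot;
    Vec3 trans;
};

Vec3 apply(const Mat3& m, const Vec3& v) {
    Vec3 r{};
    for (std::size_t i = 0; i < 3; ++i)
        for (std::size_t j = 0; j < 3; ++j) r[i] += m[i][j] * v[j];
    return r;
}

Mat3 multiply(const Mat3& a, const Mat3& b) {
    Mat3 r{};
    for (std::size_t i = 0; i < 3; ++i)
        for (std::size_t j = 0; j < 3; ++j)
            for (std::size_t k = 0; k < 3; ++k) r[i][j] += a[i][k] * b[k][j];
    return r;
}

// Rotation by `angle` radians about the z axis (illustrative helper).
Mat3 rotZ(double angle) {
    assert(std::isfinite(angle));
    const double c = std::cos(angle), s = std::sin(angle);
    Mat3 m{};
    m[0] = {c, -s, 0.0};
    m[1] = {s, c, 0.0};
    m[2] = {0.0, 0.0, 1.0};
    return m;
}

// Origin of each residue in the global frame, obtained by chaining links.
std::vector<Vec3> residueOrigins(const std::vector<Link>& links) {
    Mat3 rot = rotZ(0.0);  // identity orientation at the chain start
    Vec3 pos{0.0, 0.0, 0.0};
    std::vector<Vec3> out;
    for (const Link& l : links) {
        const Vec3 step = apply(rot, l.trans);
        for (std::size_t i = 0; i < 3; ++i) pos[i] += step[i];
        rot = multiply(rot, l.rot);
        out.push_back(pos);
    }
    return out;
}
```

Changing the rotation of a single link leaves all upstream residues untouched and repositions every downstream residue the next time the chain is traversed, without editing their stored data.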

For more details on how to use Biopool, see Biopool.

## Energy functions implementation

**Energy** functions are used in a variety of roles in **protein modelling**. An energy function precise enough to always discriminate the native protein structure from all possible decoys would not only simplify the protein structure prediction problem considerably, it would also increase our understanding of the **protein folding process** itself.
Ideally, one would calculate the energy of a protein with quantum mechanical models, which are the most detailed representation. In theory this can be done by solving the **Schrödinger** equation. This equation can be solved exactly for the hydrogen atom, but is no longer trivial for three or more particles. In recent years it has become possible to solve the Schrödinger equation approximately for systems of up to a hundred atoms with the **Hartree-Fock** or self-consistent field approximations. Their main idea is that the many-body interactions are reduced to several two-body interactions.

Energy functions are important to all aspects of protein structure prediction, as they give a **measure of confidence for optimization**. An ideal energy function would also explain the process of protein folding. The most detailed way to calculate energies is with **quantum mechanical** methods. These are, to date, still too time-consuming to be practical. Two alternative classes of functions have been developed: **force fields** and **knowledge-based potentials**.

Force fields (e.g. AMBER) are empirical models approximating the energy of a protein with **bonded and non-bonded interactions**, attempting to describe all contributions to the total energy. They tend to be very detailed and are prone to yield many erroneous local minima.
An alternative is knowledge-based potentials (e.g. [78]), where the “energy” is derived from the probability of a structure being similar to interaction patterns found in the database of known structures. This approach is very popular for **fold recognition**, as it produces a smoother “global” energy surface, allowing the detection of a general trend. Abstraction levels for knowledge-based potentials vary greatly, and several functional forms have been proposed.
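One commonly used functional form for a knowledge-based potential is the inverse Boltzmann relation, E = -kT ln(f_obs / f_ref), which turns observed frequencies into a pseudo-energy. The sketch below shows that form only as an example; the text does not say which functional form the Energy module uses, and the function name, the binning, and the pseudocount are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Pseudo-energy per bin (e.g. a distance bin) from database counts:
// E(bin) = -kT * ln(f_obs(bin) / f_ref(bin)).
std::vector<double> inverseBoltzmann(const std::vector<double>& observed,
                                     const std::vector<double>& reference,
                                     double kT = 1.0) {
    assert(observed.size() == reference.size());
    const std::size_t n = observed.size();
    double sumObs = 0.0, sumRef = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        sumObs += observed[i];
        sumRef += reference[i];
    }
    std::vector<double> energy(n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        // A pseudocount of 1 avoids log(0) for empty bins (illustrative choice).
        const double fObs = (observed[i] + 1.0) / (sumObs + n);
        const double fRef = (reference[i] + 1.0) / (sumRef + n);
        energy[i] = -kT * std::log(fObs / fRef);
    }
    return energy;
}
```

Bins that occur more often in known structures than in the reference state receive a negative (favourable) value, and under-represented bins receive a positive one.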


The **energy functions** provided in the package are used to guide optimization procedures. Their main feature is that they work directly with the **protein** classes implemented in the package: it should be possible to invoke the energy calculation on any structure from any program. At the same time, the parameters of the energy models had to be stored externally to allow their rapid modification. With these considerations in mind, the Energy package was designed to collect the classes and programs dealing with energy calculation. The main design decision was to use the “strategy” design pattern from Gamma et al. The abstract class Potential was defined to provide a common interface for energy calculation. It contains the methods needed to load the energy parameters when an object is initialized. The energy can be computed for objects of the **Atom** and **Spacer** classes, as well as for a combination of both.
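The strategy pattern described above can be sketched as follows. Only the class name `Potential` comes from the text; its method signatures, the simplified `Atom` and `Spacer` types, and the two toy strategies are assumptions for illustration, and parameters are passed to the constructor here instead of being loaded from an external file.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Atom { double x, y, z; };
struct Spacer { std::vector<Atom> atoms; };  // simplified stand-in

double atomDistance(const Atom& a, const Atom& b) {
    return std::sqrt((a.x - b.x) * (a.x - b.x) + (a.y - b.y) * (a.y - b.y) +
                     (a.z - b.z) * (a.z - b.z));
}

// Strategy interface: callers depend only on this abstract class.
class Potential {
public:
    virtual ~Potential() = default;
    virtual double energy(const Atom& a, const Atom& b) const = 0;

    // Default: sum the pairwise term over all atom pairs in the spacer.
    virtual double energy(const Spacer& s) const {
        double total = 0.0;
        for (std::size_t i = 0; i < s.atoms.size(); ++i)
            for (std::size_t j = i + 1; j < s.atoms.size(); ++j)
                total += energy(s.atoms[i], s.atoms[j]);
        return total;
    }
};

// Toy strategy 1: fixed penalty for atom pairs closer than a cutoff.
class ContactPotential : public Potential {
public:
    ContactPotential(double cutoff, double penalty)
        : cutoff_(cutoff), penalty_(penalty) {
        assert(cutoff > 0.0);
    }
    using Potential::energy;  // keep the Spacer overload visible
    double energy(const Atom& a, const Atom& b) const override {
        return atomDistance(a, b) < cutoff_ ? penalty_ : 0.0;
    }

private:
    double cutoff_, penalty_;
};

// Toy strategy 2: the pair distance itself.
class DistancePotential : public Potential {
public:
    using Potential::energy;
    double energy(const Atom& a, const Atom& b) const override {
        return atomDistance(a, b);
    }
};

// Client code is written against the interface, so strategies are swappable.
double score(const Spacer& s, const Potential& p) { return p.energy(s); }
```

Because `score` sees only the abstract interface, a new energy model can be added as one more subclass without touching the programs that call it.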

For more details on how to use Energy, see Energy.

## LOop Build-up and Optimization (Lobo)

Current database methods that rely solely on experimentally determined loop fragments do not cover all possible **loop conformations**, especially for longer fragments. On the other hand, a combinatorial search of all possible **torsion angle** combinations is not feasible.
For an **algorithm** to be efficient, a compromise has to be found. One improvement in **ab initio** loop modelling is the use of **look-up tables** (LUTs) to avoid the repetitive calculation of loop fragments. **LUTs** can be generated once and stored, so that they only need to be loaded during loop modelling. Using a set of LUTs reduces the computational time significantly.
The next problem is how best to explore the **conformational space**. Especially for longer loops, it is useful to generate a set of different candidate loops and to exclude improbable ones by ranking. The method should therefore be able to select different loops by **global exploration** of the conformational space, independently of starting conditions. Methods that build the loop stepwise from one anchor residue to the other bias the solutions, depending on the choices made for the conformation of the first few residues. Rather, a **global approach** to the **optimization** is required.
This criterion is fulfilled by the **divide & conquer algorithm**, which is described recursively by the following steps:

1. if start = end, compute the result;
2. else, with mid = (start + end)/2, use the algorithm for:
   - (a) start to mid
   - (b) mid to end
3. combine the partial solutions into the full result.
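The recursion above can be sketched generically. This is an illustration of the divide & conquer scheme itself, not Lobo code: `divideAndConquer` and `joinLabels` are hypothetical names, and the sketch uses a half-open range [start, end), so the base case is a range holding a single item.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Generic divide & conquer over the half-open range [start, end).
// `combine` merges the results of the left and right halves.
template <typename T, typename Combine>
T divideAndConquer(const std::vector<T>& items, std::size_t start,
                   std::size_t end, Combine combine) {
    assert(start < end && end <= items.size());
    if (end - start == 1) return items[start];          // step 1: base case
    const std::size_t mid = start + (end - start) / 2;   // split point
    T left = divideAndConquer(items, start, mid, combine);   // step 2(a)
    T right = divideAndConquer(items, mid, end, combine);    // step 2(b)
    return combine(left, right);                         // step 3
}

// Example use: order-preserving concatenation of per-residue labels.
// The input must not be empty.
std::string joinLabels(const std::vector<std::string>& labels) {
    return divideAndConquer(
        labels, 0, labels.size(),
        [](const std::string& a, const std::string& b) { return a + b; });
}
```

Any associative `combine` operation yields the same result regardless of where the range is split, which is what allows the central position to be chosen freely.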

Applied to loop modelling, the basic idea of a divide & conquer approach is to divide the loop into two segments of half the original length choosing a good central position, as shown:

The segments can be **recursively divided and transformed**, until the problem is small enough to be solved analytically (conquered). The positions of **main-chain atoms** for segments of a single amino acid can be calculated analytically, using the vector representation. Longer loop segments can be stored in **LUTs** and their coordinates extracted by geometrically transforming the **coordinates** for single amino acids back into the context of the initial problem. To this end we need to define an unambiguous way to represent the conformation of any given residue along the chain and a set of operations to concatenate and decompose loop segments.
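As a rough picture of how concatenation and look-up tables fit together, the sketch below works in the plane rather than in 3D and stores a single conformation per segment length. Both are simplifying assumptions for illustration (a real table would hold many conformations per length), and `Segment`, `concatenate`, and `buildTable` are hypothetical names, not Lobo's representation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Planar stand-in for a loop segment: the rigid transform from its first
// anchor to its last, plus the number of residues it spans.
struct Segment {
    double angle;        // accumulated rotation in radians
    double dx, dy;       // end position expressed in the start frame
    std::size_t length;  // number of residues
};

// Express segment b in the frame reached at the end of segment a.
Segment concatenate(const Segment& a, const Segment& b) {
    const double c = std::cos(a.angle), s = std::sin(a.angle);
    return Segment{a.angle + b.angle,
                   a.dx + c * b.dx - s * b.dy,
                   a.dy + s * b.dx + c * b.dy,
                   a.length + b.length};
}

// Build a table indexed by segment length. Each entry is assembled from two
// shorter, already stored entries instead of being recomputed from scratch.
std::vector<Segment> buildTable(const Segment& unit, std::size_t maxLength) {
    assert(unit.length == 1 && maxLength >= 1);
    std::vector<Segment> table(maxLength + 1, Segment{0.0, 0.0, 0.0, 0});
    table[1] = unit;
    for (std::size_t len = 2; len <= maxLength; ++len) {
        const std::size_t half = len / 2;
        table[len] = concatenate(table[half], table[len - half]);
    }
    return table;
}
```

With a unit segment that advances one length unit and turns by 90 degrees, the entry for length 4 closes back on the starting point, which gives a quick sanity check of the concatenation rule.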

For more details on how to use Lobo, see Lobo.