The Victor2.0 library (Virtual Construction Toolkit for Proteins) is an open-source project dedicated to providing a C++ implementation of tools for analyzing and manipulating protein structures.
Proteins are particularly suited to machine learning (ML) because of the wealth of available sequence and structural information. This availability results from recent advances in next-generation sequencing technologies and in the experimental determination of structures deposited in the Protein Data Bank (PDB). Moreover, nature conserves the same structures and functions, which allows pattern-matching ML approaches to be applied. However, representing proteins is tricky, and the complexity of protein representations makes extracting the relevant data difficult. As a result, the first stage of any protein ML approach requires engineering software for data extraction (e.g. extracting residue-residue contacts from a PDB structure). Our lab has recently developed VICTOR, an easy-to-use C++ library for extracting relevant protein features. We show that, with a simple wrapper, VICTOR can easily be incorporated into the ML package WEKA, opening a rich set of ML algorithms to the world of proteins. We also present an example that clusters the Torsion Angle Potentials (TAPs) of 40,000 protein structures.
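As an illustration of the kind of data-extraction step mentioned above, the following is a minimal, standalone sketch of extracting residue-residue contacts from a PDB file. It does not use VICTOR's API: the names `Residue`, `parseCalpha` and `findContacts` are hypothetical. It reads C-alpha atoms from the fixed-column PDB `ATOM` records and reports pairs closer than a cutoff. The 8 Å cutoff and the minimum separation of 6 positions are assumptions chosen for illustration. A common convention uses C-beta atoms (C-alpha for glycine), but this sketch uses C-alpha for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record: one C-alpha atom per residue, read from a PDB ATOM line.
struct Residue {
    char chain;       // chain identifier (PDB column 22)
    int seq;          // residue sequence number (PDB columns 23-26)
    double x, y, z;   // orthogonal coordinates in Angstroms (columns 31-54)
};

// Hypothetical helper: reads C-alpha atoms from fixed-column PDB ATOM records.
// Assumptions: only the first model is used (parsing stops at ENDMDL), and only
// the blank or 'A' alternate location is kept to avoid duplicate residues.
std::vector<Residue> parseCalpha(std::istream& in) {
    std::vector<Residue> residues;
    std::string line;
    while (std::getline(in, line)) {
        if (line.compare(0, 6, "ENDMDL") == 0) break;
        if (line.size() < 54 || line.compare(0, 6, "ATOM  ") != 0) continue;
        if (line.substr(12, 4) != " CA ") continue;        // atom name, columns 13-16
        if (line[16] != ' ' && line[16] != 'A') continue;  // alternate location, column 17
        Residue r;
        r.chain = line[21];
        r.seq = std::stoi(line.substr(22, 4));
        r.x = std::stod(line.substr(30, 8));
        r.y = std::stod(line.substr(38, 8));
        r.z = std::stod(line.substr(46, 8));
        residues.push_back(r);
    }
    return residues;
}

// Hypothetical helper: returns index pairs (i, j), i < j, whose C-alpha atoms are
// closer than `cutoff` Angstroms and at least `minSeqSep` positions apart in the list.
// Note: separation is counted by list index, so it is only meaningful within one chain.
std::vector<std::pair<std::size_t, std::size_t>>
findContacts(const std::vector<Residue>& res, double cutoff = 8.0, std::size_t minSeqSep = 6) {
    assert(cutoff > 0.0 && minSeqSep >= 1);
    std::vector<std::pair<std::size_t, std::size_t>> contacts;
    const double cutoff2 = cutoff * cutoff;  // compare squared distances, avoiding sqrt
    for (std::size_t i = 0; i < res.size(); ++i) {
        for (std::size_t j = i + minSeqSep; j < res.size(); ++j) {
            const double dx = res[i].x - res[j].x;
            const double dy = res[i].y - res[j].y;
            const double dz = res[i].z - res[j].z;
            if (dx * dx + dy * dy + dz * dz < cutoff2) contacts.emplace_back(i, j);
        }
    }
    return contacts;
}
```

Usage: open a PDB file with `std::ifstream`, pass it to `parseCalpha`, then call `findContacts` on the result. The resulting pairs could serve as features for an ML tool such as WEKA, but that wrapping step is not shown here.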
Victor is composed of three main modules:
- Biopool - Biopolymer Object Oriented Library. The core library: it builds the protein object and provides methods to manipulate the structure.
- Energy - Implementation of energy functions.
- Lobo - LOop Build-up and Optimization.
This wiki shows how to use the Victor package through an example-driven approach, which we believe is the easiest way to become confident with the Victor library. For a detailed description of all classes and methods, see the Doxygen documentation (Victor2.0 complete guide).