BioComputing  

    Test Set Generator

Version 1.2



Last update: 21/DEC/2009, using CATH v3.3.0

TESE is a web server for deriving curated sets of protein structures for a variety of purposes. The most typical use is constructing representative, non-redundant sets of protein sequences and/or structures for benchmarking novel methods. For a more detailed description, see the Quick Help page. The server has been designed to make the selection process as easy as possible. It currently offers three search modes to initiate the data collection process:

Search Mode:
  Query | PDB Ensemble | Key word

The Query mode lets the user select structural and quality filters to generate a test set at any given level of residual sequence and structural similarity, using the CATH structural classification scheme. CATH defines several levels of structural similarity as well as common sequence similarity thresholds (e.g. less than 35% sequence identity). Additionally, the server implements several quality checks for the structures to be included, e.g. X-ray resolution and R-free cutoffs. The resulting data can either be visualized interactively (for smaller sets) or simply downloaded.
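As an illustration of the quality checks mentioned above, the following minimal Python sketch filters a list of domains by experimental method, resolution and R-free. It is not the server's implementation: the `Domain` record, the `passes_quality` function, the domain identifiers and the cutoff values (2.5 Angstrom, R-free 0.30) are all hypothetical and chosen only for demonstration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Domain:
    """Hypothetical record for one protein domain (not a TESE data type)."""
    domain_id: str
    method: str                  # e.g. "X-RAY" or "NMR"
    resolution: Optional[float]  # in Angstrom; None if not reported
    r_free: Optional[float]      # None if not reported


def passes_quality(d: Domain, max_resolution: float = 2.5,
                   max_r_free: float = 0.30) -> bool:
    """Keep only X-ray structures at or below both cutoffs.

    The default cutoffs are arbitrary illustration values.
    """
    if d.method != "X-RAY":
        return False
    if d.resolution is None or d.r_free is None:
        return False  # missing quality data: reject conservatively
    return d.resolution <= max_resolution and d.r_free <= max_r_free


# Invented example entries, not real structures.
domains = [
    Domain("dom_a", "X-RAY", 1.8, 0.22),
    Domain("dom_b", "X-RAY", 3.1, 0.28),   # resolution too poor
    Domain("dom_c", "NMR", None, None),    # not an X-ray structure
    Domain("dom_d", "X-RAY", 2.0, None),   # R-free not reported
]
kept = [d.domain_id for d in domains if passes_quality(d)]
print(kept)
```

Rejecting entries with missing values is one possible design choice; a more permissive filter could keep them and flag them instead.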

The PDB Ensemble mode can be used to seed the structural search from a limited number of PDB codes of proteins sharing the desired structural and/or sequence features. The server then presents a list of CATH codes from which to choose the desired set, as in the Query mode. This can be useful if the intention is to extend a previously published test set.

The Key word mode initiates a structural search from key words contained in the header and compound records of all PDB structures. A list of matching proteins is then presented and can be manipulated as in the Query mode. This can be useful if the user does not know the PDB codes of relevant proteins or their structural classification.


SEARCH BY QUERY
QUERY
WIZARD



    

VISUALIZATION


Interactive view with images of protein domains, for visual validation.


Interactive view without images of protein domains.


Non-interactive version: download the raw list, FASTA and PDB files of the protein domains. Recommended for large datasets.


SELECT BY CATH HIERARCHY CODE


INCLUDE BY CATH DEFINITION
 

EXCLUDE BY CATH DEFINITION
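A CATH hierarchy code is a dot-separated sequence of numbers (Class, Architecture, Topology, Homologous superfamily), so including or excluding by CATH definition can be thought of as matching on code prefixes. The Python sketch below shows one way this might work. The function names and the rule that exclusions take precedence over inclusions are assumptions, not documented server behaviour.

```python
from typing import Iterable, List


def under_node(code: str, node: str) -> bool:
    """True if `code` lies at or below `node` in the CATH hierarchy.

    Components are compared one by one, so node "3.4" does not
    accidentally match code "3.40.50.300".
    """
    code_parts = code.split(".")
    node_parts = node.split(".")
    return code_parts[:len(node_parts)] == node_parts


def select_codes(codes: Iterable[str], include: List[str],
                 exclude: List[str]) -> List[str]:
    """Keep codes under any include node and under no exclude node
    (assumed precedence: exclusion wins)."""
    return [
        c for c in codes
        if any(under_node(c, i) for i in include)
        and not any(under_node(c, e) for e in exclude)
    ]


# Example codes used purely for illustration.
codes = ["3.40.50.300", "3.40.50.720", "3.30.70.100", "1.10.10.10"]
selected = select_codes(codes, include=["3"], exclude=["3.40.50.300"])
print(selected)
```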
 



REDUNDANCY REDUCTION

 Random selection





QUALITY CONTROL


OTHER PARAMETERS (size, exp. type, ...)





SEARCH FROM A PDB ENSEMBLE
Structurally correlated proteins
Explanation



    
Insert one valid PDB code (or CATH domain code) per line.
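To check a list before submitting it, the one-code-per-line format can be validated locally. The Python sketch below assumes the classic four-character PDB code (a digit followed by three alphanumeric characters) and the seven-character CATH domain name (PDB code, chain character, two-digit domain number). The function names and the placeholder codes are hypothetical, and the server's own validation may differ.

```python
import re
from typing import List, Tuple

# Assumed formats: classic 4-character PDB code; CATH domain name made of
# PDB code + chain character + two-digit domain number.
PDB_CODE = re.compile(r"[0-9][A-Za-z0-9]{3}")
CATH_DOMAIN = re.compile(r"[0-9][A-Za-z0-9]{3}[A-Za-z0-9][0-9]{2}")


def classify_token(token: str) -> str:
    """Return "pdb", "cath_domain" or "invalid" for a single token."""
    if PDB_CODE.fullmatch(token):
        return "pdb"
    if CATH_DOMAIN.fullmatch(token):
        return "cath_domain"
    return "invalid"


def parse_ensemble(text: str) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Split input into (valid entries with their kind, rejected lines).

    Blank lines are skipped; surrounding whitespace is ignored.
    """
    valid, rejected = [], []
    for line in text.splitlines():
        token = line.strip()
        if not token:
            continue
        kind = classify_token(token)
        if kind == "invalid":
            rejected.append(token)
        else:
            valid.append((token, kind))
    return valid, rejected


# Placeholder identifiers, not chosen to refer to specific structures.
valid, rejected = parse_ensemble("1abc\n 1abcA01 \n\nnot-a-code\n")
print(valid, rejected)
```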



SEARCH BY KEY WORD
Key Words
Explanation





    
Key words on the same line are joined with AND.
Different lines are read as boolean OR.
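The AND/OR rule above amounts to a query in disjunctive normal form: each line is a group of key words that must all match, and a record is accepted if any group matches. A minimal Python sketch follows, assuming case-insensitive substring matching against the record text. The matching rule and the function names are assumptions, since the page does not describe how the server compares key words.

```python
from typing import List


def parse_query(text: str) -> List[List[str]]:
    """Turn the text box content into a list of AND-groups (one per line)."""
    groups = []
    for line in text.splitlines():
        words = [w.lower() for w in line.split()]
        if words:  # ignore blank lines
            groups.append(words)
    return groups


def record_matches(record_text: str, groups: List[List[str]]) -> bool:
    """True if all words of at least one group occur in the record text.

    An empty query matches nothing.
    """
    haystack = record_text.lower()
    return any(all(w in haystack for w in group) for group in groups)


# Two lines: (kinase AND human) OR (lysozyme)
groups = parse_query("kinase human\nlysozyme\n")
print(record_matches("HUMAN PROTEIN KINASE", groups))    # True
print(record_matches("HEN EGG-WHITE LYSOZYME", groups))  # True
print(record_matches("MOUSE PROTEIN KINASE", groups))    # False
```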

The filter level serves to limit the initial choice by CATH structural similarity.




(c) Silvio Tosatto & Francesco G. Sirocco for Biocomputing UP, 03/2008