Quick Help and References

Description
Prediction of Amyloid STructure Aggregation 2.0 (PASTA 2.0) is a web server for the analysis of amino acid sequences. It predicts which portions of a given input sequence are more likely to stabilize the cross-beta core of fibrillar aggregates. It introduces many new features compared with the previous version of PASTA.

E-Mail address
An e-mail address is not required, but it is useful for reporting errors if they occur. For large or long-running jobs, e-mail notification is also a useful signal that the job has completed.

Name of sequence
An optional title for your submission, which will appear in the header of the output. We suggest providing one, especially when sending multiple queries, as they may be completed in a different order.

Sequence
This is the amino acid sequence to analyze. Use only letters that represent valid amino acid codes.
  • The sequence can simply be entered as plain text:
    DAEFRHDSGYEVHHQKLVFFAEDVGSNKGAIIGLMVGGVV
    
  • Fasta format is preferable:
     
    >human_amyloid
    DAEFRHDSGYEVHHQKLVFFAEDVGSNKGAIIGLMVGGVV
    
  • Multiple sequences are supported but must be in fasta format in order to separate them:
     
    >human_amyloid
    DAEFRHDSGYEVHHQKLVFFAEDVGSNKGAIIGLMVGGVV
    >HETs_prion
    MKIDAIVGRNSAKDIRTEERARVQLGNVVTAAALHGGIRISDQTTNSVETVVGKGESRVLIGNEYGGKGFWDNHHHHHH
    >hexa_amyloid_beta
    KLVFFA
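
The input formats above can be checked locally before submission. The following is a minimal sketch, not the server's own parser: the function names, the default name `sequence_1` for a bare sequence, and the restriction to the 20 standard one-letter codes are all assumptions made for illustration.

```python
# Assumption: only the 20 standard one-letter amino acid codes are accepted.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def parse_sequences(text):
    """Return {name: sequence} from a bare sequence or FASTA-formatted text."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A FASTA header starts a new record.
            name = line[1:].strip()
            records[name] = ""
        else:
            if name is None:
                # Hypothetical default name for a sequence without a header.
                name = "sequence_1"
                records[name] = ""
            records[name] += line
    return records


def invalid_residues(sequence):
    """Return the sorted letters in the sequence that are not valid codes."""
    return sorted(set(sequence.upper()) - VALID_RESIDUES)
```

Calling `invalid_residues` on each parsed sequence gives an empty list when every letter is a standard code, which makes it easy to reject a batch before it is sent.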
    
Mutation mode
If the Mutate tab is selected, exactly one sequence must be entered. The mutation must also be specified by clicking the Mutate residue button. This can be done in two ways:
  1. By providing an uppercase character at the position to mutate. For example, suppose we want to mutate aspartic acid (d) at position 23 to lysine (K):

    An uppercase K should be entered at that position as follows:

    Multiple point mutations can be provided in this way. The wildcard operator * can also be used to mutate a position to each of the 19 other amino acids.

  2. By entering mutations in the form XiY, where X is the wild-type residue, i is the position in the sequence and Y is the mutant residue. This method is easier when there are many mutations. For example:
     
    A2W
    V18A
    D23K
    M35P
    
    Multiple point mutations can be provided. The wildcard operator Xi* can also be used to mutate a position to each of the 19 other amino acids.
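
The XiY notation can be illustrated with a short sketch. It is not the server's implementation: the function name is hypothetical, positions are assumed to be 1-based (consistent with D23 in the example sequence), and each entry is assumed to produce its own single-point mutant.

```python
import re

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes
MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z*])$")


def expand_mutation(spec, sequence):
    """Return [(label, mutant_sequence), ...] for one XiY or Xi* entry."""
    match = MUTATION_RE.match(spec.strip())
    if match is None:
        raise ValueError(f"not in XiY form: {spec!r}")
    wild, pos, target = match.group(1), int(match.group(2)), match.group(3)
    # Assumption: positions are 1-based and must match the wild-type residue.
    if not 1 <= pos <= len(sequence) or sequence[pos - 1] != wild:
        raise ValueError(f"{spec}: residue {pos} is not {wild}")
    if target == "*":
        # Wildcard: every amino acid except the wild type (19 mutants).
        targets = [aa for aa in AMINO_ACIDS if aa != wild]
    elif target in AMINO_ACIDS:
        targets = [target]
    else:
        raise ValueError(f"{spec}: {target} is not a standard amino acid")
    return [
        (f"{wild}{pos}{aa}", sequence[:pos - 1] + aa + sequence[pos:])
        for aa in targets
    ]
```

Checking the wild-type residue against the sequence catches off-by-one errors in the position before a job is submitted.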
Protein-protein
If this tab is clicked the user can select one-against-all or all-against-all pairings of the input sequences. This is useful for analyzing aggregation in hetero-dimers.
all-against-all
For example, if the following sequences are entered:
 
>human_amyloid
DAEFRHDSGYEVHHQKLVFFAEDVGSNKGAIIGLMVGGVV
>HETs_prion
MKIDAIVGRNSAKDIRTEERARVQLGNVVTAAALHGGIRISDQTTNSVETVVGKGESRVLIGNEYGGKGFWDNHHHHHH
>hexa_amyloid_beta
KLVFFA
on an all-against-all basis the co-aggregation will be calculated for the following protein-protein pairs:
human_amyloid		HETs_prion
hexa_amyloid_beta	human_amyloid
hexa_amyloid_beta	HETs_prion
one-against-all
If one-against-all is selected, the user must enter the main fasta ID to pair against. For example, suppose the following sequences are entered and the Main fasta ID is HETs_prion:
 
>human_amyloid
DAEFRHDSGYEVHHQKLVFFAEDVGSNKGAIIGLMVGGVV
>HETs_prion
MKIDAIVGRNSAKDIRTEERARVQLGNVVTAAALHGGIRISDQTTNSVETVVGKGESRVLIGNEYGGKGFWDNHHHHHH
>hexa_amyloid_beta
KLVFFA


Main fasta ID: HETs_prion
The following protein-protein pairs will be calculated:
HETs_prion	hexa_amyloid_beta
HETs_prion	human_amyloid

Protein-protein mode requires at least two input sequences.
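
The two pairing schemes can be sketched as follows. This is an illustration only: the function names are hypothetical, and pairs are treated as unordered, so the order of the two IDs within a pair may differ from the server's listing.

```python
from itertools import combinations


def all_against_all(ids):
    """Every unordered pair of distinct sequences: n * (n - 1) / 2 pairs."""
    return list(combinations(ids, 2))


def one_against_all(ids, main_id):
    """Pair the main sequence with every other sequence: n - 1 pairs."""
    if main_id not in ids:
        raise ValueError(f"main fasta ID {main_id!r} is not in the input")
    return [(main_id, other) for other in ids if other != main_id]
```

With the three example sequences, all-against-all gives 3 pairs and one-against-all gives 2, which is why both modes need at least two input sequences.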


Options

Picking an energy cut-off and the top pairings
Two options allow the user to tune the specificity and sensitivity of the predictions: Top pairing energies and Energy threshold.
We have defined 4 modes for setting these two parameters, referred to below as top and energy.
  • Custom allows the user to select their preferred top and energy threshold.

  • Peptides: In this case the nature of peptide detection forces top=1, and the energy parameter was optimized to -5.0 PEU at 95% specificity. At this specificity PASTA2 achieves 43.6% sensitivity. The figure below shows the benchmarking:

    Note: the custom mode also uses the above figure to guide the user towards a measured sensitivity and specificity.

  • Regions (90% specificity): parameters were benchmarked on the Reg33 set and fixed at 90% specificity. The parameters producing 90% specificity are top=22 and energy <2.8 PEU. At this specificity PASTA2 achieves 30.24% sensitivity.

  • Regions (85% specificity): parameters were benchmarked on the Reg33 set and fixed at 85% specificity. The parameters producing 85% specificity are top=44 and energy <1.4 PEU. At this specificity PASTA2 achieves 40.87% sensitivity.

  • Note: if the user changes any parameter in the three optimized modes above, the prediction mode reverts to Custom.
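
How the two parameters interact can be sketched in a few lines. This is an assumption about the selection logic, not the server's code: the function name and data layout are hypothetical, lower (more negative) energies are assumed to be more favorable, and a pairing is kept only if it is among the `top` best and its energy is below the threshold.

```python
def select_pairings(pairings, top, energy_threshold):
    """Keep the best `top` pairings whose energy (PEU) is below the threshold.

    pairings: list of (label, energy_in_peu) tuples (hypothetical layout).
    """
    # Rank from most favorable (lowest energy) to least favorable.
    ranked = sorted(pairings, key=lambda pairing: pairing[1])
    return [pairing for pairing in ranked[:top] if pairing[1] < energy_threshold]
```

Lowering `top` or the energy threshold keeps fewer pairings, trading sensitivity for specificity; raising either does the opposite.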
Large-scale
If you have many proteins to analyze, it is recommended to turn on this option. The limits are explained in a grey box under each input option, and the server will not allow you to go beyond them. The reason for this option is two-fold. (i) Much of the output is in the form of PDF and PNG graphs, which take a little time to generate; this option stops the generation of graphics and reorganizes the output accordingly. (ii) Other factors, such as parallelization, speed up the execution of jobs.



Output


Main page
Self aggregation

The user is first presented with the main PASTA summary page:

Self-aggregating sequences can be accessed via the links in the section titled Self aggregation. The following global features are presented:
  1. length: Protein length
  2. #amyloids: The number of amyloid fibril regions predicted by PASTA
  3. best energy: the energy of the top-ranked amyloid region. The table is sorted by this column so that the most amyloidogenic sequences can be seen quickly.
  4. % disorder: the percentage of residues predicted to be in a disordered state
  5. % α-helix: the percentage of residues predicted to be in a helix state
  6. % β-strand: the percentage of residues predicted to be in a strand state
  7. % coil: the percentage of residues predicted to be in a coil state

Mutations

In addition, if Mutate was selected in the input, the wild type protein will retain its name, while mutant names will be appended with "_XiY", where X is the wild type, i the position in the sequence and Y is the mutant. For example, a wild type sequence with fasta header "human_amyloid" can be mutated at H6P, D23K and M35V to produce the following output:

Notice the increase in energy for mutant M35V.

Co-aggregation

If Protein-protein is clicked in the input page, pairing links will be presented in the Co-aggregation section as follows:

Note: only 5 sequences were entered in the above example, which resulted in 10 unique pairings: (5x5 all pairings - 5 self pairings) / 2 (each symmetric pair counted once) = 10.


All predictions can be downloaded in an archive. Each archive has its own README file describing each file. The output for each protein, which can be accessed individually through the links, is described next.


Self-aggregating individual pages
More detail can be accessed by clicking the links on the main output page. Each page is organized as follows:
Each [open] can be clicked to reveal further information, and each [close] to hide it.

The following briefly describes each:
  1. Available files for further processing. It is important that the user can access the raw data for their own manipulation. The following data is available from PASTA2:
      Sequence profile files
    • Your Input parameters: a basic summary of the user-selected server options.
    • Fasta sequences: the protein sequence annotated with amino acid, aggregation, intrinsic disorder and secondary structure, in that order.
    • Aggregation profile: the probability of fibril formation at each residue.
    • Aggregation free energy: the free energy of fibril formation at each residue, measured in PASTA Energy Units (1 PEU = 1.192 kcal/mol).
    • Disorder prediction: the probability that a residue is disordered.
    • Secondary structure: the probability that a residue is in each state: helix, strand or coil.
      Pairing files
    • Probability matrix: the probability of aggregation formation between residues i and j.
    • Probability free energy: the free energy of aggregation formation between residues i and j, measured in PASTA Energy Units (1 PEU = 1.192 kcal/mol).
    • Best pairings: a list of the top X ranked aggregation-forming regions, where X is chosen in the input (see options).


  2. Residue assignment. Each residue is assigned to a particular state as follows:

    In addition, the best X pairing residues are ranked and colored according to the user selected energy threshold and number of pairings (see options). In the example above the top 10 pairing energies were all above the energy threshold.

  3. Probability graphs. The probability of aggregation formation is an important scale. We utilize this information in three graphical forms:
    • The probability of residue k pairing with residue m:
    • The aggregation probability at residue k compared to disorder probability at residue k (only in Regular self-aggregation mode):
    • The aggregation probability at residue k compared to secondary structure probability at residue k (only in Regular self-aggregation mode):

      There are three secondary structure graphs, one each for helix, strand and coil.

  4. Free energy graphs. There are two graphs here showing the aggregation potential free energy (1 PASTA Energy Unit = 1.192 kcal/mol). Although these are related to probability, they give a more biochemical meaning to the potential. In addition, the scale is larger and can be compared to the energy threshold selected by the user. Two examples are shown below:

    • The free energy potential of aggregation formation, the green line corresponds to the user selected energy threshold:
      Because of the wider range of values for the free energy it is used for comparing mutations.

    • The free energy of residue k pairing with residue m:
It is important to note that all the data for the graphs are available in the Available files section or beside the graphs themselves.
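
For users who post-process the downloaded energy files, the unit conversion stated above (1 PEU = 1.192 kcal/mol) can be applied as in this small sketch. The function and constant names are hypothetical and are not part of the PASTA2 output.

```python
KCAL_PER_MOL_PER_PEU = 1.192  # conversion factor stated in this help page


def peu_to_kcal_per_mol(energy_peu):
    """Convert an energy in PASTA Energy Units to kcal/mol."""
    return energy_peu * KCAL_PER_MOL_PER_PEU


def kcal_per_mol_to_peu(energy_kcal):
    """Convert an energy in kcal/mol back to PASTA Energy Units."""
    return energy_kcal / KCAL_PER_MOL_PER_PEU
```

Keeping the factor in one named constant avoids mixing units when energies from different files are compared against the selected threshold.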




Co-aggregating individual pages
The layout here is identical to that of the self-aggregating output page. However, in many ways it is more complex because it has to report graphics and files for two sequences rather than one. It should be easy to grasp once the user understands the self-aggregating output page explained above.


References

If you use the server in work leading to publications, please cite:
    Ian Walsh, Flavio Seno, Silvio C.E. Tosatto and Antonio Trovato.
    PASTA2: An improved server for protein aggregation prediction
    Nucleic Acids Research, accepted. (2014)