CODEHOP:
COnsensus-DEgenerate Hybrid Oligonucleotide Primers


CODEHOP program

The CODEHOP program designs a pool of primers containing all possible 11- or 12-mers for the 3' degenerate core region and having the most probable nucleotide predicted for each position in the 5' non-degenerate clamp region.

The program consists of the following steps (see the scheme below):
1) A set of blocks is input, where a block is an aligned array of amino acid 
sequence segments without gaps that represents a highly conserved region of 
homologous proteins. A weight is provided for each sequence segment, which can be 
increased to favor the contribution of selected sequences in designing the primer. 
A codon usage table is chosen for the target genome.

2) An amino acid position-specific scoring matrix (PSSM) is computed for each block using the odds ratio method.

3) A consensus amino acid residue is selected for each position of the block as the highest scoring amino acid in the matrix.

4) For each position of the block, the most common codon for the amino acid chosen in step 3 is selected from the user-selected codon usage table. This selection is used for the default 5' consensus clamp in step 8.

5) A DNA PSSM is calculated from the amino acid matrix (step 2), genetic code table and codon usage table. The DNA matrix has three positions for each position of the amino acid matrix. The score for each amino acid is divided among its codons in proportion to their relative weights from the codon usage table, and the scores for each of the four different nucleotides are combined in each DNA matrix position. Nucleotide positions are treated independently when the scores are combined. As an option, the highest scoring nucleotide residue from each position can replace the most common codons from step 4 that are used in the consensus clamp.

6) The degeneracy is determined at each position of the DNA matrix based on the number of bases found there. As an option, a weight threshold can be specified such that bases that contribute less than a minimum weight are ignored in determining degeneracy.

7) Possible degenerate core regions are identified by scanning the DNA matrix in the 3' to 5' direction. A core region must start on an invariant 3' nucleotide position, have a length of 11 or 12 positions ending on a codon boundary, and have a maximum degeneracy of 128 (current default). The degeneracy of a region is the product of the number of possible bases in each position.

8) Candidate degenerate core regions are extended by addition of a 5' consensus clamp from step 4 or 5. The length of the clamp is controlled by a melting temperature calculation (current default = 60 °C) and is usually about 20 nucleotides.

9) Steps 7 and 8 are repeated on the reverse complement of the DNA matrix from step 5 for primers corresponding to the opposite DNA strand.
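
Steps 5 to 7 can be illustrated with a minimal Python sketch. This is not the CODEHOP source code. The codon usage weights are made-up toy values, and the function names (dna_pssm, degeneracies, core_regions) are hypothetical. Two readings of the text are assumed: the optional weight threshold of step 6 is taken as a fraction of the column total, and "ending on a codon boundary" is taken to mean that the 5' end of the core falls on the first base of a codon.

```python
from math import prod

BASES = "ACGT"

# Toy codon usage table (hypothetical weights, not from a real genome):
# amino acid -> {codon: relative weight}; weights per amino acid sum to 1.
CODON_USAGE = {
    "M": {"ATG": 1.0},
    "W": {"TGG": 1.0},
    "D": {"GAT": 0.4, "GAC": 0.6},
    "K": {"AAA": 0.3, "AAG": 0.7},
}

def dna_pssm(aa_pssm):
    """Step 5: turn an amino acid PSSM (list of {aa: score}) into a DNA PSSM
    (list of {base: score}) with three DNA columns per amino acid column."""
    columns = []
    for aa_scores in aa_pssm:
        triplet = [dict.fromkeys(BASES, 0.0) for _ in range(3)]
        for aa, score in aa_scores.items():
            # Divide the score among the codons by their relative weights.
            for codon, weight in CODON_USAGE[aa].items():
                # Each nucleotide position is treated independently.
                for i, base in enumerate(codon):
                    triplet[i][base] += score * weight
        columns.extend(triplet)
    return columns

def degeneracies(dna, min_weight=0.0):
    """Step 6: count the bases in each column whose score exceeds
    min_weight times the column total (assumed threshold definition)."""
    result = []
    for col in dna:
        total = sum(col.values())
        result.append(sum(1 for s in col.values() if s > min_weight * total))
    return result

def core_regions(degs, lengths=(11, 12), max_degeneracy=128):
    """Step 7: scan 3' to 5' and return (start, end, degeneracy) tuples,
    with end exclusive, for every candidate core region."""
    cores = []
    for end in range(len(degs), 0, -1):
        if degs[end - 1] != 1:           # 3' position must be invariant
            continue
        for length in lengths:
            start = end - length
            if start < 0 or start % 3 != 0:  # 5' end on a codon boundary
                continue
            d = prod(degs[start:end])    # product of per-position counts
            if d <= max_degeneracy:
                cores.append((start, end, d))
    return cores

# Toy amino acid PSSM with one fully conserved residue per column.
aa_pssm = [{"M": 1.0}, {"D": 1.0}, {"K": 1.0}, {"W": 1.0}]
dna = dna_pssm(aa_pssm)
degs = degeneracies(dna)
print(degs)                # [1, 1, 1, 1, 1, 2, 1, 1, 2, 1, 1, 1]
print(core_regions(degs))  # [(0, 12, 4), (0, 11, 4)]
```

In this toy block only the third bases of the Asp and Lys codons vary, so both the 12-mer and the 11-mer core have a degeneracy of 2 x 2 = 4, well under the default maximum of 128.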

           CODEHOP program scheme

1) input   -  -  -  -  -  -  -  -  -  -  -  -  -  -  -   seq 1  Protein sequence block
           -  -  -  -  -  -  -  -  -  -  -  -  -  -  -   seq 2
           -  -  -  -  -  -  -  -  -  -  -  -  -  -  -   seq 3
           -  -  -  -  -  -  -  -  -  -  -  -  -  -  -   seq 4
           -  -  -  -  -  -  -  -  -  -  -  -  -  -  -   seq 5
           -  -  -  -  -  -  -  -  -  -  -  -  -  -  -   etc.

  |
  | 2) transformation to AA PSSM
  V
           |  |  |  |  |  |  |  |  |  |  |  |  |  |  |   Ala    AA PSSM
           |  |  |  |  |  |  |  |  |  |  |  |  |  |  |   Cys
           |  |  |  |  |  |  |  |  |  |  |  |  |  |  |   Asp
           |  |  |  |  |  |  |  |  |  |  |  |  |  |  |   Glu
           |  |  |  |  |  |  |  |  |  |  |  |  |  |  |   Phe
           |  |  |  |  |  |  |  |  |  |  |  |  |  |  |   Gly
           |  |  |  |  |  |  |  |  |  |  |  |  |  |  |   His
           |  |  |  |  |  |  |  |  |  |  |  |  |  |  |   Ile
           |  |  |  |  |  |  |  |  |  |  |  |  |  |  |   Lys
           |  |  |  |  |  |  |  |  |  |  |  |  |  |  |   etc.

     |  |
     |  | 3) calculation of AA consensus sequence
     |  V
     |     -  -  -  -  -  -  -  -  -  -  -  -  -  -  -          AA consensus sequence
     |
     |  |
     |  | 4) transformation to DNA consensus sequence
     |  V
     |     -------------------------------------------          DNA consensus sequence
 |   |
 |   | 5) back-translation to DNA PSSM
 |   V
 |         |||||||||||||||||||||||||||||||||||||||||||   A      DNA PSSM
 |         |||||||||||||||||||||||||||||||||||||||||||   C
 |         |||||||||||||||||||||||||||||||||||||||||||   G
 |         |||||||||||||||||||||||||||||||||||||||||||   T
 | | |
 | | | 6) calculation of degeneracies
 | | V
 | |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~          position degeneracy values
 | |
 | | 7) identify degenerate regions ("===")
 | |
 | 8) identify consensus regions for degenerate regions ("---")
 | |
 V V
                   5'  -------====           3'                 CODEHOP primers
   output          3'         ====---------  5'
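
The clamp extension of step 8 and the strand switch of step 9 can be sketched in Python as follows. The page does not say which melting temperature formula the program uses, so the Wallace rule (2 °C per A or T, 4 °C per G or C) stands in here as an assumption, applied to the clamp alone. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Step 9: sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

def wallace_tm(seq):
    """Wallace rule estimate in deg C (an assumed stand-in formula)."""
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc

def clamp(consensus, core_start, target_tm=60):
    """Step 8: extend 5' from the core, one base at a time, until the
    clamp reaches target_tm or the consensus sequence runs out."""
    start = core_start
    while start > 0 and wallace_tm(consensus[start:core_start]) < target_tm:
        start -= 1
    return consensus[start:core_start]

# Toy consensus with 50% GC content; the core would begin at index 40.
consensus = "ACGT" * 10
print(len(clamp(consensus, 40)))   # 20
print(reverse_complement("ATGC"))  # GCAT
```

With 50% GC content the Wallace rule reaches 60 °C at 20 bases, which agrees with the typical clamp length quoted in step 8. A GC-rich consensus gives a shorter clamp and an AT-rich one a longer clamp.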



Page last modified Feb 2001