wwpdb
PDB FORMAT Version 2.3
DBREF
SEQADV
SEQRES
MODRES

Primary Structure Section

The primary structure section of a PDB file contains the sequence of residues in each chain of the macromolecule. Embedded in these records are chain identifiers and sequence numbers that allow other records to link into the sequence.


DBREF

Overview

The DBREF record provides cross-reference links between PDB sequences and the corresponding database entry or entries.

Record Format

COLUMNS       DATA TYPE          FIELD          DEFINITION
----------------------------------------------------------------
 1 - 6        Record name        "DBREF "
 8 - 11       IDcode             idCode         ID code of this entry.
13            Character          chainID        Chain identifier.
15 - 18       Integer            seqBegin       Initial sequence number 
                                                of the PDB sequence segment.
19            AChar              insertBegin    Initial insertion code 
                                                of the PDB sequence segment.
21 - 24       Integer            seqEnd         Ending sequence number 
                                                of the PDB sequence segment.
25            AChar              insertEnd      Ending insertion code 
                                                of the PDB sequence segment.
27 - 32       LString            database       Sequence database name. 
34 - 41       LString            dbAccession    Sequence database accession code.
43 - 54       LString            dbIdCode       Sequence database 
                                                 identification code.
56 - 60       Integer            dbseqBegin     Initial sequence number of the
                                                 database segment.
61            AChar              idbnsBeg       Insertion code of initial residue
                                                 of the segment, if PDB is the
                                                 reference.
63 - 67       Integer            dbseqEnd       Ending sequence number of the
                                                 database segment.
68            AChar              dbinsEnd       Insertion code of the ending
                                                 residue of the segment, if PDB is
                                                 the reference.
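The fixed column ranges in the table above map directly onto string slices. The following minimal Python sketch reads one DBREF line. It is an illustration, not an official parser: the names `parse_dbref`, `DbRef`, and `DATABASE_NAMES` are hypothetical, and the code-to-name lookup is copied from the database abbreviation table in the Details list. Blank fields, such as an unused insertion code, come back as empty strings.

```python
from dataclasses import dataclass

# Hypothetical lookup: database codes (columns 27-32) and the names they stand for,
# copied from the abbreviation table in the DBREF Details list.
DATABASE_NAMES = {
    "GB": "GenBank",
    "PDB": "Protein Data Bank",
    "PIR": "Protein Identification Resource",
    "SWS": "SWISS-PROT",
    "TREMBL": "TREMBL",
    "UNP": "UNIPROT",
}


def _cols(line: str, start: int, end: int) -> str:
    """Return 1-based, inclusive columns start..end with blanks stripped.

    Slicing past the end of a short line simply yields an empty string.
    """
    return line[start - 1:end].strip()


@dataclass
class DbRef:
    id_code: str
    chain_id: str          # empty string for a blank chain identifier
    seq_begin: int
    insert_begin: str
    seq_end: int
    insert_end: str
    database: str
    db_accession: str
    db_id_code: str
    db_seq_begin: int
    idbns_beg: str
    db_seq_end: int
    dbins_end: str

    @property
    def database_name(self) -> str:
        # Fall back to the raw code for databases missing from the table.
        return DATABASE_NAMES.get(self.database, self.database)


def parse_dbref(line: str) -> DbRef:
    """Parse one fixed-column DBREF record laid out as in the table above."""
    if not line.startswith("DBREF "):
        raise ValueError("not a DBREF record")
    return DbRef(
        id_code=_cols(line, 8, 11),
        chain_id=_cols(line, 13, 13),
        seq_begin=int(_cols(line, 15, 18)),
        insert_begin=_cols(line, 19, 19),
        seq_end=int(_cols(line, 21, 24)),
        insert_end=_cols(line, 25, 25),
        database=_cols(line, 27, 32),
        db_accession=_cols(line, 34, 41),
        db_id_code=_cols(line, 43, 54),
        db_seq_begin=int(_cols(line, 56, 60)),
        idbns_beg=_cols(line, 61, 61),
        db_seq_end=int(_cols(line, 63, 67)),
        dbins_end=_cols(line, 68, 68),
    )
```

Numeric fields are right-justified within their columns, so stripping blanks before `int()` is enough, and Python slicing tolerates lines whose trailing blanks have been trimmed.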

Details

  • PDB entries contain multi-chain molecules with sequences that may be wild type, variant, or synthetic. Sequences may also have been modified through site-directed mutagenesis experiments (engineered). A number of PDB entries report structures of domains cleaved from larger molecules.
  • The DBREF record was designed to account for these differences by providing explicit correlations between sequences as given in the SEQRES records and the sequence database entry. Several cases are easily represented by means of pointers between the databases using DBREF. PDB entries containing heteropolymers are linked to different sequence database entries.
  • Database names and their abbreviations as used on DBREF records:

        Database name                         database 
                                         (code in columns 27 - 32)
        ----------------------------------------------------------
        GenBank                               GB
        Protein Data Bank                     PDB
        Protein Identification Resource       PIR
        SWISS-PROT                            SWS
        TREMBL                                TREMBL
        UNIPROT                               UNP

  • DBREF records present sequence correlations between PDB SEQRES records and the corresponding entries in PIR, GenBank, SWISS-PROT, or the other sequence databases listed above.
  • PDB does not guarantee that all possible references to the listed databases will be provided. In most cases, only one reference to a sequence database will be provided.
  • If no reference is found in the sequence databases, then the PDB entry itself can be given as the reference.
  • Selection of the appropriate sequence database entry or entries to be linked to a PDB entry is done on the basis of the sequence and its biological source. Questions on entry assignment that may arise are resolved by consultation with database staff.

    Verification/Validation/Value Authority Control

    The sequence database entry found during PDB's search is compared with the sequence provided by the depositor, and any differences are resolved or annotated.

    Relationships to Other Record Types

    DBREF represents the sequence as found in SEQRES records.

    Example

             1         2         3         4         5         6         7
    1234567890123456789012345678901234567890123456789012345678901234567890
    DBREF  2J83 A   61   322  UNP    Q8TL28   Q8TL28_METAC    61    322
    DBREF  2J83 B   61   322  UNP    Q8TL28   Q8TL28_METAC    61    322
    DBREF  1ABC B    1B   36  PDB    1ABC     1ABC             1B    36
    DBREF  3AKY      3   220  SWS    P07170   KAD1_YEAST       5    222 
    
    DBREF  1HAN      2   288  GB     397884   X66122           1    287 
    
    DBREF  3HSV A    1    92  SWS    P22121   HSF_KLULA      193    284
    DBREF  3HSV B    1    92  SWS    P22121   HSF_KLULA      193    284
    
    DBREF  1ARL      1   307  SWS    P00730   CBPA_BOVIN     111    417  
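The examples pair each PDB residue range with a database residue range, for instance residues 1-92 of 3HSV chain A with residues 193-284 of HSF_KLULA. Assuming a segment contains no insertions, deletions, or insertion codes (such differences would also need SEQADV records), the correspondence reduces to a single constant offset. A hypothetical Python sketch of that assumption, with illustrative function names:

```python
def pdb_to_db_number(seq_num: int, seq_begin: int, seq_end: int, db_seq_begin: int) -> int:
    """Translate a PDB residue number into the sequence database numbering.

    Assumes the DBREF segment is a gap-free, one-to-one alignment, so the single
    offset (db_seq_begin - seq_begin) applies to every residue in the segment.
    """
    if not seq_begin <= seq_num <= seq_end:
        raise ValueError(f"residue {seq_num} lies outside {seq_begin}-{seq_end}")
    return seq_num + (db_seq_begin - seq_begin)


def lengths_match(seq_begin: int, seq_end: int, db_seq_begin: int, db_seq_end: int) -> bool:
    """True when both ranges cover the same number of residues, as the offset model requires."""
    return seq_end - seq_begin == db_seq_end - db_seq_begin


# 3HSV chain A: PDB residues 1-92 correspond to HSF_KLULA residues 193-284.
print(pdb_to_db_number(1, 1, 92, 193))    # -> 193
print(lengths_match(1, 92, 193, 284))     # -> True
```

The other examples (3AKY, 1HAN, 1ARL) also cover equal-length ranges, so the same offset model fits them; it does not hold once SEQADV reports insertions or deletions inside a segment.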
    


    SEQADV

    Overview

    The SEQADV record identifies conflicts between sequence information in the SEQRES records of the PDB entry and the sequence database entry given on DBREF. Please note that these records were designed to identify differences and not errors. No assumption is made as to which database contains the correct data. PDB may include REMARK records in the entry that reflect the depositor's view of which database has the correct sequence.

    Record Format

    COLUMNS       DATA TYPE       FIELD      DEFINITION
    -----------------------------------------------------------------
     1 -  6       Record name     "SEQADV"
     8 - 11       IDcode          idCode    ID code of this entry.
    13 - 15       Residue name    resName   Name of the PDB residue in conflict.
    17            Character       chainID   PDB chain identifier.
    19 - 22       Integer         seqNum    PDB sequence number.
    23            AChar           iCode     PDB insertion code.
    25 - 28       LString         database  Sequence database name.
    30 - 38       LString         dbIdCode  Sequence database accession number.
    40 - 42       Residue name    dbRes     Sequence database residue name.
    44 - 48       Integer         dbSeq     Sequence database sequence number.
    50 - 70       LString         conflict  Conflict comment.
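A SEQADV line can be read by column position in the same way. In this minimal Python sketch the names `parse_seqadv` and `SeqAdv` are hypothetical. As a design choice of the sketch, blank dbRes and dbSeq fields, which occur when a PDB residue has no counterpart in the database entry (for example a cloning artifact), are returned as `None` rather than as empty strings.

```python
from dataclasses import dataclass
from typing import Optional


def _cols(line: str, start: int, end: int) -> str:
    """Return 1-based, inclusive columns start..end with blanks stripped."""
    return line[start - 1:end].strip()


@dataclass
class SeqAdv:
    id_code: str
    res_name: str
    chain_id: str
    seq_num: int
    i_code: str
    database: str
    db_id_code: str
    db_res: Optional[str]   # None when the residue has no database counterpart
    db_seq: Optional[int]
    conflict: str


def parse_seqadv(line: str) -> SeqAdv:
    """Parse one fixed-column SEQADV record laid out as in the table above."""
    if not line.startswith("SEQADV"):
        raise ValueError("not a SEQADV record")
    db_res = _cols(line, 40, 42)
    db_seq = _cols(line, 44, 48)
    return SeqAdv(
        id_code=_cols(line, 8, 11),
        res_name=_cols(line, 13, 15),
        chain_id=_cols(line, 17, 17),
        seq_num=int(_cols(line, 19, 22)),   # may be negative, e.g. -1
        i_code=_cols(line, 23, 23),
        database=_cols(line, 25, 28),
        db_id_code=_cols(line, 30, 38),
        db_res=db_res or None,
        db_seq=int(db_seq) if db_seq else None,
        conflict=_cols(line, 50, 70),
    )
```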
    

    Details

  • In a number of cases, conflicts between the sequences found in PDB entries and in PIR or SWISS-PROT entries have been noted. There are several possible reasons for these conflicts, including natural variants or engineered sequences (mutants), polymorphic sequences, or ambiguous or conflicting experimental results. These discrepancies, which were previously described in REMARK records, are now reported in SEQADV records.
  • SEQADV describes conflicts between residue sequences given by SEQRES records and those in the appropriate sequence database entry.
  • Some of the possible conflict comments:

           Cloning artifact
           Conflict
           Engineered
           Disordered
           Variant
           Insertion
           Deletion
           Microheterogeneity
           D-configuration

  • When a conflict cannot be classified by these terms, a reference to a published paper, a PDB entry, or a REMARK within the entry is given.
  • Finally, the comment "SEE REMARK 999" is included when the explanation for the conflict is too long to fit in the SEQADV record.
  • Microheterogeneity is represented as a variant: one of the possible residues at the site is selected (arbitrarily) as the primary residue, and a SEQADV record must be provided for the alternate residue.

    Verification/Validation/Value Authority Control

    SEQADV records are automatically generated by the PDB.

    Relationships to Other Record Types

    SEQADV refers to the sequence as found in the SEQRES records, and to the sequence database reference found on DBREF.

    REMARK 999 contains text that explains discrepancies when the explanation is too lengthy to fit in SEQADV.

    Example

             1         2         3         4         5         6         7
    1234567890123456789012345678901234567890123456789012345678901234567890
    SEQADV 2J83 ALA A  269  UNP  Q8TL28    CYS   269 ENGINEERED MUTATION
    SEQADV 2J83 ALA B  269  UNP  Q8TL28    CYS   269 ENGINEERED MUTATION 
    
    SEQADV 3ABC MET A   -1  SWS  P10725              CLONING ARTIFACT
    SEQADV 3ABC GLY A   50  SWS  P10725    VAL    50 ENGINEERED
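Because the conflict field is a free-text comment, selecting records by it is plain string matching. This hypothetical helper collects engineered substitutions from SEQADV lines such as the ones above as (chainID, dbRes, seqNum, resName) tuples. It assumes the comment starts with the word ENGINEERED, skips records without a database residue, and reports seqNum, the PDB residue number, rather than dbSeq.

```python
from typing import List, Tuple


def engineered_substitutions(lines: List[str]) -> List[Tuple[str, str, int, str]]:
    """Return (chainID, dbRes, seqNum, resName) for SEQADV records marked ENGINEERED."""
    found = []
    for line in lines:
        if not line.startswith("SEQADV"):
            continue
        conflict = line[49:70].strip()    # columns 50-70
        db_res = line[39:42].strip()      # columns 40-42
        if not conflict.startswith("ENGINEERED") or not db_res:
            continue
        found.append((
            line[16:17].strip(),          # chainID, column 17
            db_res,
            int(line[18:22]),             # seqNum, columns 19-22
            line[12:15].strip(),          # resName, columns 13-15
        ))
    return found
```

For the example above, the two 2J83 lines and the 3ABC GLY line would be reported, while the cloning artifact would be skipped.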
    


    SEQRES

    Overview

    SEQRES records contain the amino acid or nucleic acid sequence of residues in each chain of the macromolecule that was studied.

    Record Format

    COLUMNS      DATA TYPE       FIELD          DEFINITION
    -------------------------------------------------------------------
     1 -  6      Record name     "SEQRES"
     9 - 10      Integer         serNum         Serial number of the SEQRES record
                                                    for the current chain. Starts at 1
                                                    and increments by one each line.
                                                    Reset to 1 for each chain.
    12           Character       chainID        Chain identifier. This may be any
                                                    single legal character, including a
                                                    blank which is used if there is
                                                    only one chain.
    14 - 17      Integer         numRes         Number of residues in the chain.
                                                    This value is repeated on every
                                                    record.
    20 - 22      Residue name    resName        Residue name.
    24 - 26      Residue name    resName        Residue name.
    28 - 30      Residue name    resName        Residue name.
    32 - 34      Residue name    resName        Residue name.
    36 - 38      Residue name    resName        Residue name.
    40 - 42      Residue name    resName        Residue name.
    44 - 46      Residue name    resName        Residue name.
    48 - 50      Residue name    resName        Residue name.
    52 - 54      Residue name    resName        Residue name.
    56 - 58      Residue name    resName        Residue name.
    60 - 62      Residue name    resName        Residue name.
    64 - 66      Residue name    resName        Residue name.
    68 - 70      Residue name    resName        Residue name.
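The thirteen residue-name fields start at column 20 and repeat every four columns, so a reader can step through them with a fixed stride. The following hypothetical Python sketch groups SEQRES residue names by chain and applies the rules stated in the table: serNum starts at 1 and increments by one within a chain, and numRes, repeated on every record, must equal the number of residues read. It assumes the input holds all of an entry's SEQRES records; a blank chain identifier is kept as the key `" "`.

```python
from typing import Dict, Iterable, List


def parse_seqres(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group SEQRES residue names by chain identifier, validating serNum and numRes."""
    residues: Dict[str, List[str]] = {}
    num_res: Dict[str, int] = {}
    last_ser: Dict[str, int] = {}
    for line in lines:
        if not line.startswith("SEQRES"):
            continue
        ser_num = int(line[8:10])         # columns 9-10
        chain_id = line[11:12]            # column 12; may be a blank
        count = int(line[13:17])          # columns 14-17
        expected = last_ser.get(chain_id, 0) + 1
        if ser_num != expected:
            raise ValueError(f"chain {chain_id!r}: serNum {ser_num}, expected {expected}")
        last_ser[chain_id] = ser_num
        if num_res.setdefault(chain_id, count) != count:
            raise ValueError(f"chain {chain_id!r}: numRes changes between records")
        chain = residues.setdefault(chain_id, [])
        # Residue names occupy columns 20-22, 24-26, ..., 68-70
        # (0-based slice starts 19, 23, ..., 67).
        for start in range(19, 70, 4):
            name = line[start:start + 3].strip()
            if name:
                chain.append(name)
    for chain_id, chain in residues.items():
        if len(chain) != num_res[chain_id]:
            raise ValueError(
                f"chain {chain_id!r}: read {len(chain)} residues, numRes is {num_res[chain_id]}"
            )
    return residues
```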
    

    Verification/Validation/Value Authority Control

    The residues presented on the SEQRES records must agree with those found in the ATOM records.

    The SEQRES records are checked by PDB using the sequence databases and information provided by the depositor.

    SEQRES is compared to the ATOM records during processing, and both are checked against the sequence database. All discrepancies are either resolved or annotated in the entry.

    Example

             1         2         3         4         5         6         7
    1234567890123456789012345678901234567890123456789012345678901234567890
    SEQRES   1 A   21  GLY ILE VAL GLU GLN CYS CYS THR SER ILE CYS SER LEU
    SEQRES   2 A   21  TYR GLN LEU GLU ASN TYR CYS ASN                    
    SEQRES   1 B   30  PHE VAL ASN GLN HIS LEU CYS GLY SER HIS LEU VAL GLU
    SEQRES   2 B   30  ALA LEU TYR LEU VAL CYS GLY GLU ARG GLY PHE PHE TYR
    SEQRES   3 B   30  THR PRO LYS ALA                                    
    SEQRES   1 C   21  GLY ILE VAL GLU GLN CYS CYS THR SER ILE CYS SER LEU
    SEQRES   2 C   21  TYR GLN LEU GLU ASN TYR CYS ASN                    
    SEQRES   1 D   30  PHE VAL ASN GLN HIS LEU CYS GLY SER HIS LEU VAL GLU
    SEQRES   2 D   30  ALA LEU TYR LEU VAL CYS GLY GLU ARG GLY PHE PHE TYR
    SEQRES   3 D   30  THR PRO LYS ALA                                    
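Going the other way, a writer fills thirteen names per record and repeats numRes on every line. This is a minimal sketch under stated assumptions: `format_seqres` is a hypothetical name, trailing blanks are not padded out, and because serNum occupies only columns 9-10 in the table above, the sketch refuses to write more than 99 records (1287 residues) for one chain.

```python
from typing import List, Sequence


def format_seqres(chain_id: str, residues: Sequence[str]) -> List[str]:
    """Write SEQRES records for one chain, 13 residue names per record."""
    if len(chain_id) > 1:
        raise ValueError("chainID is a single character")
    count = len(residues)
    lines = []
    for ser_num, start in enumerate(range(0, count, 13), start=1):
        if ser_num > 99:
            raise ValueError("serNum (columns 9-10) cannot exceed 99")
        # Each name is right-justified in a 3-column field; fields are separated by one blank.
        names = " ".join(f"{name:>3}" for name in residues[start:start + 13])
        # Columns: 1-6 record name, 9-10 serNum, 12 chainID, 14-17 numRes, 20 onward residues.
        lines.append(f"SEQRES {ser_num:>3} {chain_id:1} {count:>4}  {names}")
    return lines
```

Called with the 21 residue names of chain A from the example, it should reproduce the first two SEQRES lines above without their trailing blanks.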
    

    Known Problems

    Polysaccharides do not lend themselves to being represented in SEQRES.

    There is no mechanism provided to describe sequence runs when the exact ordering of the sequence is not known.

    For cyclic peptides, PDB arbitrarily assigns a residue as the N-terminus.

    No distinction is made between ribo- and deoxyribonucleotides in the SEQRES records. These residues are identified with the same residue name (i.e., A, C, G, T, U).


    MODRES

    Overview

    The MODRES record provides descriptions of modifications (e.g., chemical or post-translational) to protein and nucleic acid residues. It also provides a mapping between the residue names used in a PDB entry and the corresponding standard residue names.

    Record Format

    COLUMNS    DATA TYPE        FIELD         DEFINITION
    ----------------------------------------------------
     1 - 6     Record name      "MODRES"
     8 - 11    IDcode           idCode     ID code of this entry.
    13 - 15    Residue name     resName    Residue name used in this entry.
    17         Character        chainID    Chain identifier.
    19 - 22    Integer          seqNum     Sequence number.
    23         AChar            iCode      Insertion code.
    25 - 27    Residue name     stdRes     Standard residue name.
    30 - 70    String           comment    Description of the residue
                                           modification
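MODRES follows the same fixed-column pattern. The minimal Python sketch below is illustrative only, and the names `parse_modres` and `ModRes` are hypothetical. For a modified site described with a standard residue name, `res_name` and `std_res` come out equal, as stated under Details.

```python
from typing import NamedTuple


class ModRes(NamedTuple):
    id_code: str
    res_name: str
    chain_id: str
    seq_num: int
    i_code: str
    std_res: str
    comment: str


def parse_modres(line: str) -> ModRes:
    """Parse one fixed-column MODRES record laid out as in the table above."""
    if not line.startswith("MODRES"):
        raise ValueError("not a MODRES record")

    def cols(start: int, end: int) -> str:
        # 1-based, inclusive column range with blanks stripped
        return line[start - 1:end].strip()

    return ModRes(
        id_code=cols(8, 11),
        res_name=cols(13, 15),
        chain_id=cols(17, 17),
        seq_num=int(cols(19, 22)),
        i_code=cols(23, 23),
        std_res=cols(25, 27),
        comment=cols(30, 70),
    )
```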
    

    Details

  • Residues modified post-translationally, enzymatically, or by design are described in MODRES records. In those cases where PDB has opted to use a non-standard residue name for the residue, MODRES also provides a mapping to the precursor standard residue name.
  • MODRES is mandatory when modified standard residues exist in the entry.
  • Examples of some modification descriptions:

           Glycosylation site
           Post-translational modification
           Designed chemical modification
           Phosphorylation site
           Blocked N-terminus
           Aminated C-terminus
           D-configuration
           Reduced peptide bond

  • MODRES is not required if coordinate records are not provided for the modified residue.
  • D-amino acids are given their own resName, e.g., DAL for D-alanine. This resName appears in the SEQRES records and has associated SEQADV, MODRES, HET, and FORMUL records. The coordinates are given as HETATM records interspersed with the ATOM records, in the correct order within the chain. This ordering is an exception to the stated Order of Records.
  • When a standard residue name is used to describe a modified site, resName (columns 13-15) and stdRes (columns 25-27) contain the same value.

    Verification/Validation/Value Authority Control

    MODRES is generated by the PDB.

    Relationships to Other Record Types

    MODRES maps ATOM and HETATM records to the standard residue names. SEQADV, HET, and FORMUL records may also be present for the modified residue.

    Example

             1         2         3         4         5         6         7
    1234567890123456789012345678901234567890123456789012345678901234567890
    MODRES 1ABC ASN A   22A ASN  GLYCOSYLATION SITE
    MODRES 2ABC TTQ A   50A TRP  POST-TRANSLATIONAL MODIFICATION
    MODRES 3ABC DAL A   32  ALA  POST-TRANSLATIONAL MODIFICATION,D-ALANINE
    MODRES 3ABC DAL B   32  ALA  POST-TRANSLATIONAL MODIFICATION,D-ALANINE
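Because MODRES maps each non-standard residue name to its precursor standard residue, a reader can use it to rewrite a SEQRES residue list in standard names, for example before comparing it with a sequence database entry. The sketch below is hypothetical: it assumes that a given resName always maps to the same stdRes within an entry (and raises an error otherwise), and it leaves names without a MODRES record unchanged.

```python
from typing import Dict, Iterable, List


def modres_map(lines: Iterable[str]) -> Dict[str, str]:
    """Map resName (columns 13-15) to stdRes (columns 25-27) from MODRES records."""
    mapping: Dict[str, str] = {}
    for line in lines:
        if not line.startswith("MODRES"):
            continue
        res_name = line[12:15].strip()
        std_res = line[24:27].strip()
        previous = mapping.setdefault(res_name, std_res)
        if previous != std_res:
            raise ValueError(f"{res_name} maps to both {previous} and {std_res}")
    return mapping


def to_standard(residues: Iterable[str], mapping: Dict[str, str]) -> List[str]:
    """Replace modified residue names by their standard names; keep all others."""
    return [mapping.get(name, name) for name in residues]
```

With the two 3ABC lines above, DAL maps to ALA, so a chain containing DAL would be rewritten with ALA at that position.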
    


    © 2007 wwPDB