Primary Structure Section

The primary structure section of a PDB formatted file contains the sequence of residues in each chain of the macromolecule(s). Embedded in these records are chain identifiers and sequence numbers that allow other records to link into the sequence.


DBREF (standard format)

The DBREF record provides cross-reference links between PDB sequences (as they appear in the SEQRES records) and the corresponding database sequence.

Record Format

COLUMNS       DATA TYPE     FIELD              DEFINITION
-----------------------------------------------------------------------------------
 1 -  6       Record name   "DBREF "
 8 - 11       IDcode        idCode             ID code of this entry.
13            Character     chainID            Chain identifier.
15 - 18       Integer       seqBegin           Initial sequence number of the
                                               PDB sequence segment.
19            AChar         insertBegin        Initial insertion code of the
                                               PDB sequence segment.
21 - 24       Integer       seqEnd             Ending sequence number of the
                                               PDB sequence segment.
25            AChar         insertEnd          Ending insertion code of the
                                               PDB sequence segment.
27 - 32       LString       database           Sequence database name.
34 - 41       LString       dbAccession        Sequence database accession code.
43 - 54       LString       dbIdCode           Sequence database identification code.
56 - 60       Integer       dbseqBegin         Initial sequence number of the
                                               database segment.
61            AChar         idbnsBeg           Insertion code of initial residue of the
                                               segment, if PDB is the reference.
63 - 67       Integer       dbseqEnd           Ending sequence number of the
                                               database segment.
68            AChar         dbinsEnd           Insertion code of the ending residue of
                                               the segment, if PDB is the reference.

Note: This format is used by default whenever the information fits in these fields. For sequence databases that use longer accession codes or longer sequence numbering, the DBREF1/DBREF2 format is used instead.
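The fit condition in this note follows from the column widths in the table above. The sketch below is one way to express it; the function name `fits_standard_dbref` and its argument names are illustrative assumptions, not part of the format or of any PDB library.

```python
def fits_standard_dbref(db_accession, db_id_code, db_seq_begin, db_seq_end):
    """Return True if the database fields fit the standard DBREF columns.

    Widths are taken from the DBREF table: dbAccession is columns 34-41
    (8 characters), dbIdCode is columns 43-54 (12 characters), and the two
    database sequence numbers are columns 56-60 and 63-67 (5 characters each).
    """
    return (
        len(db_accession) <= 8
        and len(db_id_code) <= 12
        and len(str(db_seq_begin)) <= 5
        and len(str(db_seq_end)) <= 5
    )


# A UniProt reference fits; long GenBank numbering does not.
print(fits_standard_dbref("P07170", "KAD1_YEAST", 3, 221))
print(fits_standard_dbref("46197919", "AE017221", 1534489, 1537377))
```

When the function returns False, the reference would need the two-line DBREF1/DBREF2 form.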

Details

Verification/Validation/Value Authority Control

The sequence database entry found during PDB's search is compared to that provided by the depositor and any differences are resolved or annotated.

All polymers in the entry will be assigned a DBREF record.

Relationships to Other Record Types

DBREF represents the sequence as found in SEQRES records.

DBREF1/DBREF2 replace DBREF when the accession code or sequence numbering does not fit the DBREF format.

Examples

         1         2         3         4         5         6         7         8
12345678901234567890123456789012345678901234567890123456789012345678901234567890
DBREF  2JHQ A    1   226  UNP    Q9KPK8   UNG_VIBCH        1    226 
          
DBREF  3AKY A    1   219  UNP    P07170   KAD1_YEAST       3    221    

DBREF  1HAN A    2   298  UNP    P47228   BPHC_BURCE       1    297

DBREF  3D3I A    0   760  UNP    P42592   YGJK_ECOLI      23    783            
DBREF  3D3I B    0   760  UNP    P42592   YGJK_ECOLI      23    783       

DBREF  3C2J A    1     8  PDB    3C2J     3C2J             1      8            
DBREF  3C2J B  101   108  PDB    3C2J     3C2J           101    108            

DBREF  1FFK 0    2  2923  GB     3377779  AF034620      2597   5518            
DBREF  1FFK 9    1   122  GB     3377779  AF034620      5658   5779      
DBREF  1UNJ X    6    11  NOR    NOR00228 NOR00228         6     11
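Because DBREF is a fixed-column record, it can be read by slicing each line at the column positions given in the Record Format table. The following is a minimal sketch under that assumption; `parse_dbref` is a hypothetical helper name, and the dictionary keys simply reuse the field names from the table.

```python
def parse_dbref(line):
    """Slice one standard DBREF record into its fixed-width fields.

    Column N in the table corresponds to Python index N - 1.
    """
    line = line.ljust(80)  # tolerate records with trailing blanks stripped
    if line[0:6] != "DBREF ":
        raise ValueError("not a standard DBREF record")
    return {
        "idCode": line[7:11],
        "chainID": line[12],
        "seqBegin": int(line[14:18]),
        "insertBegin": line[18].strip(),
        "seqEnd": int(line[20:24]),
        "insertEnd": line[24].strip(),
        "database": line[26:32].strip(),
        "dbAccession": line[33:41].strip(),
        "dbIdCode": line[42:54].strip(),
        "dbseqBegin": int(line[55:60]),
        "idbnsBeg": line[60].strip(),
        "dbseqEnd": int(line[62:67]),
        "dbinsEnd": line[67].strip(),
    }


record = parse_dbref(
    "DBREF  3AKY A    1   219  UNP    P07170   KAD1_YEAST       3    221"
)
print(record["database"], record["dbAccession"], record["dbseqBegin"])
```

Splitting on whitespace would be simpler but is not safe here, since insertion codes and blank fields can make the token count vary between records.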


DBREF1 / DBREF2 (added)

Details

This updated two-line format is used when the accession code or sequence numbering does not fit the space allotted in the standard DBREF format. This includes some GenBank sequence numbering (greater than 5 characters) and UNIMES accession numbers (greater than 12 characters). 

Record Format

DBREF1

COLUMNS        DATA  TYPE    FIELD         DEFINITION
-----------------------------------------------------------------------------------
 1 -  6        Record name   "DBREF1"
 8 - 11        IDcode        idCode        ID code of this entry.
13             Character     chainID       Chain identifier.
15 - 18        Integer       seqBegin      Initial sequence number of the
                                           PDB sequence segment, right justified.
19             AChar         insertBegin   Initial insertion code of the
                                           PDB sequence segment.
21 - 24        Integer       seqEnd        Ending sequence number of the
                                           PDB sequence segment, right justified.
25             AChar         insertEnd     Ending insertion code of the
                                           PDB sequence segment.
27 - 32        LString       database      Sequence database name.
48 - 67        LString       dbIdCode      Sequence database identification code,
                                           left justified.

DBREF2

COLUMNS       DATA  TYPE    FIELD         DEFINITION
-----------------------------------------------------------------------------------
 1 -  6       Record name   "DBREF2"
 8 - 11       IDcode        idCode        ID code of this entry.
13            Character     chainID       Chain identifier.
19 - 40       LString       dbAccession   Sequence database accession code,
                                          left justified.
46 - 55       Integer       seqBegin      Initial sequence number of the
                                          database segment, right justified.
58 - 67       Integer       seqEnd        Ending sequence number of the
                                          database segment, right justified.

Details

Verification/Validation/Value Authority Control

The sequence database entry found by wwPDB staff is compared to answers provided by the depositor; any differences are resolved or annotated appropriately.

Relationships to Other Record Types

DBREF1/DBREF2 represents the sequence as found in SEQRES records.

Template

         1         2         3         4         5         6         7         8
12345678901234567890123456789012345678901234567890123456789012345678901234567890
DBREF1 2J83 A   61   322  XXXXXX               YYYYYYYYYYYYYYYYYYYY                    
DBREF2 2J83 A     ZZZZZZZZZZZZZZZZZZZZZZ     nnnnnnnnnn  mmmmmmmmmm

Examples

         1         2         3         4         5         6         7         8
12345678901234567890123456789012345678901234567890123456789012345678901234567890
DBREF1 2J83 A   61   322  UNIMES               UPI000148A153
DBREF2 2J83 A     MES00005880000                     61         322

         1         2         3         4         5         6         7         8
12345678901234567890123456789012345678901234567890123456789012345678901234567890
DBREF1 2J83 A   61   322  GB                   AE017221                   
DBREF2 2J83 A     46197919                      1534489     1537377
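A DBREF1 line and the DBREF2 line that follows it together carry the same information as one standard DBREF record. The sketch below combines such a pair using the column positions from the two tables above. The name `parse_dbref_pair` is a hypothetical helper, and the keys `dbseqBegin` and `dbseqEnd` are borrowed from the standard DBREF table to avoid clashing with the PDB-side `seqBegin` and `seqEnd`; that renaming is an assumption made for this example.

```python
def parse_dbref_pair(line1, line2):
    """Combine a DBREF1 line and its DBREF2 line into one dictionary."""
    line1, line2 = line1.ljust(80), line2.ljust(80)
    if not (line1.startswith("DBREF1") and line2.startswith("DBREF2")):
        raise ValueError("expected a DBREF1 line followed by a DBREF2 line")
    # Columns 8-13 (idCode and chainID) must agree on both lines.
    if line1[7:13] != line2[7:13]:
        raise ValueError("DBREF1 and DBREF2 refer to different chains")
    return {
        "idCode": line1[7:11],
        "chainID": line1[12],
        "seqBegin": int(line1[14:18]),
        "insertBegin": line1[18].strip(),
        "seqEnd": int(line1[20:24]),
        "insertEnd": line1[24].strip(),
        "database": line1[26:32].strip(),
        "dbIdCode": line1[47:67].strip(),
        "dbAccession": line2[18:40].strip(),
        "dbseqBegin": int(line2[45:55]),
        "dbseqEnd": int(line2[57:67]),
    }


pair = parse_dbref_pair(
    "DBREF1 2J83 A   61   322  GB                   AE017221",
    "DBREF2 2J83 A     46197919                      1534489     1537377",
)
print(pair["dbAccession"], pair["dbseqBegin"], pair["dbseqEnd"])
```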


SEQADV

Overview

The SEQADV record identifies differences between sequence information in the SEQRES records of the PDB entry and the sequence database entry given in DBREF.  Please note that these records were designed to identify differences and not errors.  No assumption is made as to which database contains the correct data.  A comment explaining any engineered differences in the sequence between the PDB and the sequence database may also be included here.

Record Format

COLUMNS        DATA TYPE     FIELD         DEFINITION
-----------------------------------------------------------------
 1 -  6        Record name   "SEQADV"
 8 - 11        IDcode        idCode        ID code of this entry.
13 - 15        Residue name  resName       Name of the PDB residue in conflict.
17             Character     chainID       PDB chain identifier.
19 - 22        Integer       seqNum        PDB sequence number.
23             AChar         iCode         PDB insertion code.
25 - 28        LString       database      Sequence database name.
30 - 38        LString       dbIdCode      Sequence database accession number.
40 - 42        Residue name  dbRes         Sequence database residue name.
44 - 48        Integer       dbSeq         Sequence database sequence number.
50 - 70        LString       conflict      Conflict comment.

Details

Verification/Validation/Value Authority Control

SEQADV records are automatically generated.

Relationships to Other Record Types

SEQADV refers to the sequence as found in the SEQRES records, and to the sequence database reference found on DBREF.

REMARK 999 contains text that explains discrepancies when the explanation is too lengthy to fit in SEQADV.

Examples

         1         2         3         4         5         6         7         8
12345678901234567890123456789012345678901234567890123456789012345678901234567890
SEQADV 3ABC MET A   -1  UNP  P10725              EXPRESSION TAG
SEQADV 3ABC GLY A   50  UNP  P10725    VAL    50 ENGINEERED
SEQADV 2QLE CRO A   66  UNP  P42212    SER    65 CHROMOPHORE
SEQADV 2OKW LEU A   64  UNP  P42212    PHE    64 SEE REMARK 999 
SEQADV 2OKW LEU A   64  NOR  NOR00669  PHE    14 SEE REMARK 999
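The examples show that the database residue name and sequence number can be blank, for instance for an expression tag that has no counterpart in the database sequence. A reader of SEQADV therefore has to treat those fields as optional. The following is a minimal sketch; `parse_seqadv` is a hypothetical helper name and the use of `None` for a blank `dbSeq` is a choice made for this example.

```python
def parse_seqadv(line):
    """Slice one SEQADV record into fields; blank dbSeq becomes None."""
    line = line.ljust(80)
    if not line.startswith("SEQADV"):
        raise ValueError("not a SEQADV record")
    db_seq = line[43:48].strip()
    return {
        "idCode": line[7:11],
        "resName": line[12:15].strip(),
        "chainID": line[16],
        "seqNum": int(line[18:22]),
        "iCode": line[22].strip(),
        "database": line[24:28].strip(),
        "dbIdCode": line[29:38].strip(),
        "dbRes": line[39:42].strip(),
        "dbSeq": int(db_seq) if db_seq else None,
        "conflict": line[49:70].strip(),
    }


tag = parse_seqadv(
    "SEQADV 3ABC MET A   -1  UNP  P10725              EXPRESSION TAG"
)
mutation = parse_seqadv(
    "SEQADV 3ABC GLY A   50  UNP  P10725    VAL    50 ENGINEERED"
)
print(tag["seqNum"], tag["dbSeq"], tag["conflict"])
print(mutation["resName"], mutation["dbRes"], mutation["dbSeq"])
```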


SEQRES (updated)

Overview

SEQRES records contain a listing of the consecutive chemical components covalently linked in a linear fashion to form a polymer. The chemical components included in this listing may be standard or modified amino acid and nucleic acid residues. The listing may also include other residues that are linked to the standard backbone of the polymer. Chemical components or groups covalently linked to side-chains (in peptides) or sugars and/or bases (in nucleic acid polymers) will not be listed here.

Record Format

COLUMNS        DATA TYPE      FIELD        DEFINITION
-------------------------------------------------------------------------------------
 1 -  6        Record name    "SEQRES"
 8 - 10        Integer        serNum       Serial number of the SEQRES record for the
                                           current chain. Starts at 1 and increments
                                           by one each line. Reset to 1 for each chain.
12             Character      chainID      Chain identifier. This may be any single
                                           legal character, including a blank, which
                                           is used if there is only one chain.
14 - 17        Integer        numRes       Number of residues in the chain.
                                           This value is repeated on every record.
20 - 22        Residue name   resName      Residue name.
24 - 26        Residue name   resName      Residue name.
28 - 30        Residue name   resName      Residue name.
32 - 34        Residue name   resName      Residue name.
36 - 38        Residue name   resName      Residue name.
40 - 42        Residue name   resName      Residue name.
44 - 46        Residue name   resName      Residue name.
48 - 50        Residue name   resName      Residue name.
52 - 54        Residue name   resName      Residue name.
56 - 58        Residue name   resName      Residue name.
60 - 62        Residue name   resName      Residue name.
64 - 66        Residue name   resName      Residue name.
68 - 70        Residue name   resName      Residue name.

Verification/Validation/Value Authority Control

The residues presented in the ATOM records must agree with those on the SEQRES records.

The SEQRES records are checked using sequence databases and information provided by the depositor.

SEQRES is compared to the ATOM records during processing, and both are checked against the sequence databases. All discrepancies are either resolved or annotated appropriately in the entry.

The ribo- and deoxyribonucleotides in the SEQRES records are distinguished.  The ribo- forms of these residues are identified with the residue names A, C, G, U and I. The deoxy- forms of these residues are identified with the residue names DA, DC, DG, DT and DI. Modified nucleotides in the sequence are identified by separate 3-letter residue codes.  The plus character prefix to label modified nucleotides (e.g. +A, +C, +T) is no longer used.
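The naming rule above can be applied mechanically to a residue name read from a SEQRES record. The sketch below is illustrative only: the function name `nucleotide_kind` and the returned labels are assumptions for this example, and any other name (amino acids, modified nucleotides with their own 3-letter codes, or the retired plus-prefixed names) is simply reported as `None`.

```python
RIBO = frozenset({"A", "C", "G", "U", "I"})
DEOXY = frozenset({"DA", "DC", "DG", "DT", "DI"})


def nucleotide_kind(res_name):
    """Classify a residue name as 'ribo', 'deoxy', or None."""
    name = res_name.strip()  # residue names are right-justified in 3 columns
    if name in RIBO:
        return "ribo"
    if name in DEOXY:
        return "deoxy"
    return None


for name in ("  U", " DT", "1MG"):
    print(repr(name), nucleotide_kind(name))
```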

Example

         1         2         3         4         5         6         7         8
12345678901234567890123456789012345678901234567890123456789012345678901234567890
SEQRES   1 A   21  GLY ILE VAL GLU GLN CYS CYS THR SER ILE CYS SER LEU         
SEQRES   2 A   21  TYR GLN LEU GLU ASN TYR CYS ASN                              
SEQRES   1 B   30  PHE VAL ASN GLN HIS LEU CYS GLY SER HIS LEU VAL GLU         
SEQRES   2 B   30  ALA LEU TYR LEU VAL CYS GLY GLU ARG GLY PHE PHE TYR         
SEQRES   3 B   30  THR PRO LYS ALA                                              
SEQRES   1 C   21  GLY ILE VAL GLU GLN CYS CYS THR SER ILE CYS SER LEU         
SEQRES   2 C   21  TYR GLN LEU GLU ASN TYR CYS ASN                               
SEQRES   1 D   30  PHE VAL ASN GLN HIS LEU CYS GLY SER HIS LEU VAL GLU         
SEQRES   2 D   30  ALA LEU TYR LEU VAL CYS GLY GLU ARG GLY PHE PHE TYR         
SEQRES   3 D   30  THR PRO LYS ALA

SEQRES   1 A    8   DA  DA  DC  DC  DG  DG  DT  DT                             
SEQRES   1 B    8   DA  DA  DC  DC  DG  DG  DT  DT

SEQRES   1 X   39    U   C   C   C   C   C   G   U   G   C   C   C   A         
SEQRES   2 X   39    U   A   G   C   G   G   C   G   U   G   G   A   A           
SEQRES   3 X   39    C   C   A   C   C   C   G   U   U   C   C   C   A       
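Since each SEQRES line holds up to 13 residue names and repeats numRes, a chain's sequence can be rebuilt by concatenating its lines and checking the total against the declared count. The following sketch assumes all lines come from a single entry; `read_seqres` is a hypothetical helper name, not an existing library function.

```python
from collections import defaultdict


def read_seqres(lines):
    """Return {chainID: [residue names]} and check each chain against numRes."""
    residues = defaultdict(list)
    declared = {}
    for line in lines:
        if not line.startswith("SEQRES"):
            continue
        line = line.ljust(80)
        chain_id = line[11]
        num_res = int(line[13:17])
        # numRes is repeated on every record of a chain and must not change.
        if declared.setdefault(chain_id, num_res) != num_res:
            raise ValueError(f"inconsistent numRes for chain {chain_id!r}")
        # 13 residue slots: columns 20-22, 24-26, ..., 68-70.
        for start in range(19, 70, 4):
            name = line[start:start + 3].strip()
            if name:
                residues[chain_id].append(name)
    for chain_id, names in residues.items():
        if len(names) != declared[chain_id]:
            raise ValueError(
                f"chain {chain_id!r}: found {len(names)} residues, "
                f"numRes says {declared[chain_id]}"
            )
    return dict(residues)


chains = read_seqres([
    "SEQRES   1 A   21  GLY ILE VAL GLU GLN CYS CYS THR SER ILE CYS SER LEU",
    "SEQRES   2 A   21  TYR GLN LEU GLU ASN TYR CYS ASN",
])
print(len(chains["A"]), chains["A"][0], chains["A"][-1])
```

The length check is a cheap way to catch truncated input, because a missing continuation line leaves the chain shorter than numRes.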

Known Problems

Polysaccharides do not lend themselves to being represented in SEQRES.

There is no mechanism for describing the order of sequence segments whose starting positions are unknown.

For cyclic peptides, a residue is arbitrarily assigned as the N-terminus.


MODRES (updated)

Overview

The MODRES record provides descriptions of modifications (e.g., chemical or post-translational) to protein and nucleic acid residues. Included are correlations between residue names given in a PDB entry and standard residues.

Record Format

COLUMNS        DATA TYPE     FIELD       DEFINITION
--------------------------------------------------------------------------------
 1 -  6        Record name   "MODRES"
 8 - 11        IDcode        idCode      ID code of this entry.
13 - 15        Residue name  resName     Residue name used in this entry.
17             Character     chainID     Chain identifier.
19 - 22        Integer       seqNum      Sequence number.
23             AChar         iCode       Insertion code.
25 - 27        Residue name  stdRes      Standard residue name.
30 - 70        String        comment     Description of the residue modification.

Details

Verification/Validation/Value Authority Control

MODRES is generated by the wwPDB.

Relationships to Other Record Types

MODRES maps ATOM and HETATM records to the standard residue names. HET and FORMUL records may also appear.

Example

         1         2         3         4         5         6         7         8
12345678901234567890123456789012345678901234567890123456789012345678901234567890
MODRES 2R0L ASN A   74  ASN  GLYCOSYLATION SITE  
MODRES 1IL2 1MG D 1937    G  1N-METHYLGUANOSINE-5'-MONOPHOSPHATE 
MODRES 4ABC MSE B   32  MET  SELENOMETHIONINE
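One common use of MODRES is to translate a modified residue back to its standard parent when reading coordinates. The sketch below builds such a lookup keyed by chain, sequence number and insertion code. The helper names `parse_modres` and `standard_name_lookup` are hypothetical, and the choice of key is an assumption made for this example.

```python
def parse_modres(line):
    """Slice one MODRES record into its fixed-width fields."""
    line = line.ljust(80)
    if not line.startswith("MODRES"):
        raise ValueError("not a MODRES record")
    return {
        "idCode": line[7:11],
        "resName": line[12:15].strip(),
        "chainID": line[16],
        "seqNum": int(line[18:22]),
        "iCode": line[22].strip(),
        "stdRes": line[24:27].strip(),
        "comment": line[29:70].strip(),
    }


def standard_name_lookup(lines):
    """Map (chainID, seqNum, iCode) to the standard residue name."""
    lookup = {}
    for line in lines:
        if line.startswith("MODRES"):
            rec = parse_modres(line)
            key = (rec["chainID"], rec["seqNum"], rec["iCode"])
            lookup[key] = rec["stdRes"]
    return lookup


lookup = standard_name_lookup([
    "MODRES 4ABC MSE B   32  MET  SELENOMETHIONINE",
    "MODRES 1IL2 1MG D 1937    G  1N-METHYLGUANOSINE-5'-MONOPHOSPHATE",
])
print(lookup[("B", 32, "")], lookup[("D", 1937, "")])
```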


© 2010 wwPDB