The primary structure section of a PDB formatted file contains the sequence of residues in each chain of the macromolecule(s). Embedded in these records are chain identifiers and sequence numbers that allow other records to link into the sequence.
The DBREF record provides cross-reference links between PDB sequences (as they appear in the SEQRES records) and a corresponding database sequence.
Record Format
COLUMNS       DATA TYPE     FIELD              DEFINITION
-----------------------------------------------------------------------------------
 1 -  6       Record name   "DBREF "
 8 - 11       IDcode        idCode             ID code of this entry.
13            Character     chainID            Chain identifier.
15 - 18       Integer       seqBegin           Initial sequence number of the
                                               PDB sequence segment.
19            AChar         insertBegin        Initial insertion code of the
                                               PDB sequence segment.
21 - 24       Integer       seqEnd             Ending sequence number of the
                                               PDB sequence segment.
25            AChar         insertEnd          Ending insertion code of the
                                               PDB sequence segment.
27 - 32       LString       database           Sequence database name.
34 - 41       LString       dbAccession        Sequence database accession code.
43 - 54       LString       dbIdCode           Sequence database identification code.
56 - 60       Integer       dbseqBegin         Initial sequence number of the
                                               database segment.
61            AChar         dbinsBeg           Insertion code of initial residue of the
                                               segment, if PDB is the reference.
63 - 67       Integer       dbseqEnd           Ending sequence number of the
                                               database segment.
68            AChar         dbinsEnd           Insertion code of the ending residue of
                                               the segment, if PDB is the reference.
Note: This format is used by default as long as the information entered into these fields fits. For sequence databases that use longer accession codes or longer sequence numbering, the newer DBREF1/DBREF2 format can be used.
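Because every DBREF field occupies fixed columns, a record can be read by slicing the line. The following is a minimal Python sketch under stated assumptions, not part of the format definition: the names `Dbref`, `parse_dbref`, and `fits_standard_dbref` are hypothetical, and the width limits in the last function are simply the column spans from the table (8, 12, 5, and 5 characters).

```python
from typing import NamedTuple


class Dbref(NamedTuple):
    """Fields of one standard DBREF record (names are this sketch's own)."""
    id_code: str
    chain_id: str
    seq_begin: int
    insert_begin: str
    seq_end: int
    insert_end: str
    database: str
    db_accession: str
    db_id_code: str
    db_seq_begin: int
    db_ins_begin: str
    db_seq_end: int
    db_ins_end: str


def parse_dbref(line: str) -> Dbref:
    """Slice one standard DBREF line using the 1-based column ranges."""
    if not line.startswith("DBREF "):
        raise ValueError("not a standard DBREF record")
    padded = line.rstrip("\r\n").ljust(80)  # pad so every slice is in range

    def col(first: int, last: int) -> str:
        # 1-based inclusive columns -> 0-based half-open Python slice.
        return padded[first - 1:last].strip()

    return Dbref(
        id_code=col(8, 11),
        chain_id=col(13, 13),
        seq_begin=int(col(15, 18)),
        insert_begin=col(19, 19),
        seq_end=int(col(21, 24)),
        insert_end=col(25, 25),
        database=col(27, 32),
        db_accession=col(34, 41),
        db_id_code=col(43, 54),
        db_seq_begin=int(col(56, 60)),
        db_ins_begin=col(61, 61),
        db_seq_end=int(col(63, 67)),
        db_ins_end=col(68, 68),
    )


def fits_standard_dbref(db_accession: str, db_id_code: str,
                        db_seq_begin: int, db_seq_end: int) -> bool:
    """Return True if the database fields fit the standard DBREF widths."""
    return (
        len(db_accession) <= 8            # columns 34 - 41
        and len(db_id_code) <= 12         # columns 43 - 54
        and len(str(db_seq_begin)) <= 5   # columns 56 - 60
        and len(str(db_seq_end)) <= 5     # columns 63 - 67
    )
```

When `fits_standard_dbref` returns False, the note above implies the two-line DBREF1/DBREF2 form would be needed instead.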
Details
The DBREF records present sequence correlations between PDB SEQRES records and corresponding GenBank (for nucleic acids) or UNIPROT/Norine (for proteins) entries. PDB entries containing heteropolymers are linked to different sequence database entries.
                                      Database abbreviations
Database name                         (columns 27 – 32)
----------------------------------------------------------------------
GenBank                               GB
Protein Data Bank                     PDB
UNIPROT                               UNP
Norine                                NORINE
Verification/Validation/Value Authority Control
The sequence database entry found during PDB's search is compared to that provided by the depositor and any differences are resolved or annotated.
All polymers in the entry will be assigned a DBREF record.
Relationships to Other Record Types
DBREF represents the sequence as found in SEQRES records.
DBREF1/DBREF2 replace DBREF when the accession codes or sequence numbering do not fit the DBREF format.
Examples
         1         2         3         4         5         6         7         8
12345678901234567890123456789012345678901234567890123456789012345678901234567890
DBREF  2JHQ A    1   226  UNP    Q9KPK8   UNG_VIBCH        1    226
DBREF  3AKY A    1   219  UNP    P07170   KAD1_YEAST       3    221
DBREF  1HAN A    2   298  UNP    P47228   BPHC_BURCE       1    297
DBREF  3D3I A    0   760  UNP    P42592   YGJK_ECOLI      23    783
DBREF  3D3I B    0   760  UNP    P42592   YGJK_ECOLI      23    783
DBREF  3C2J A    1     8  PDB    3C2J     3C2J             1      8
DBREF  3C2J B  101   108  PDB    3C2J     3C2J           101    108
DBREF  1FFK 0    2  2923  GB     3377779  AF034620      2597   5518
DBREF  1FFK 9    1   122  GB     3377779  AF034620      5658   5779
DBREF  1UNJ X    6    11  NOR    NOR00228 NOR00228         6     11
Details
This updated two-line format is used when the accession code or sequence numbering does not fit the space allotted in the standard DBREF format. This includes some GenBank sequence numbering (greater than 5 characters) and UNIMES accession numbers (greater than 12 characters).
Record Format
DBREF1
COLUMNS       DATA TYPE     FIELD              DEFINITION
-----------------------------------------------------------------------------------
 1 -  6       Record name   "DBREF1"
 8 - 11       IDcode        idCode             ID code of this entry.
13            Character     chainID            Chain identifier.
15 - 18       Integer       seqBegin           Initial sequence number of the
                                               PDB sequence segment, right justified.
19            AChar         insertBegin        Initial insertion code of the
                                               PDB sequence segment.
21 - 24       Integer       seqEnd             Ending sequence number of the
                                               PDB sequence segment, right justified.
25            AChar         insertEnd          Ending insertion code of the
                                               PDB sequence segment.
27 - 32       LString       database           Sequence database name.
48 - 67       LString       dbIdCode           Sequence database identification code,
                                               left justified.
DBREF2
COLUMNS       DATA TYPE     FIELD              DEFINITION
-----------------------------------------------------------------------------------
 1 -  6       Record name   "DBREF2"
 8 - 11       IDcode        idCode             ID code of this entry.
13            Character     chainID            Chain identifier.
19 - 40       LString       dbAccession        Sequence database accession code,
                                               left justified.
46 - 55       Integer       seqBegin           Initial sequence number of the
                                               Database segment, right justified.
58 - 67       Integer       seqEnd             Ending sequence number of the
                                               Database segment, right justified.
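A DBREF1 line and its DBREF2 line together carry the information of one cross-reference, so a reader typically merges them. The sketch below shows one way to do that in Python by slicing the columns listed in the two tables. The function name `parse_dbref_pair` and the dictionary keys are hypothetical, and the check that both lines share the same ID code and chain is an assumption of this sketch rather than a stated rule.

```python
def parse_dbref_pair(line1: str, line2: str) -> dict:
    """Combine one DBREF1 line and its DBREF2 line into a single mapping."""
    if not (line1.startswith("DBREF1") and line2.startswith("DBREF2")):
        raise ValueError("expected a DBREF1 line followed by a DBREF2 line")
    first = line1.rstrip("\r\n").ljust(80)   # pad so every slice is in range
    second = line2.rstrip("\r\n").ljust(80)

    def col(text: str, start: int, end: int) -> str:
        # 1-based inclusive columns -> 0-based half-open Python slice.
        return text[start - 1:end].strip()

    # Assumption of this sketch: both lines must name the same entry and chain.
    key1 = (col(first, 8, 11), col(first, 13, 13))
    key2 = (col(second, 8, 11), col(second, 13, 13))
    if key1 != key2:
        raise ValueError("DBREF1 and DBREF2 refer to different entries or chains")

    return {
        # Fields taken from DBREF1
        "id_code": col(first, 8, 11),
        "chain_id": col(first, 13, 13),
        "seq_begin": int(col(first, 15, 18)),
        "insert_begin": col(first, 19, 19),
        "seq_end": int(col(first, 21, 24)),
        "insert_end": col(first, 25, 25),
        "database": col(first, 27, 32),
        "db_id_code": col(first, 48, 67),
        # Fields taken from DBREF2
        "db_accession": col(second, 19, 40),
        "db_seq_begin": int(col(second, 46, 55)),
        "db_seq_end": int(col(second, 58, 67)),
    }
```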
Details
                                      Database abbreviations
Database name                         (columns 27 – 32)
----------------------------------------------------------------------
GenBank                               GB
UNIMES                                UNIMES
Verification/Validation/Value Authority Control
The sequence database entry found by wwPDB staff is compared to answers provided by the depositor; any differences are resolved or annotated appropriately.
Relationships to Other Record Types
DBREF1/DBREF2 represents the sequence as found in SEQRES records.
Template
         1         2         3         4         5         6         7         8
12345678901234567890123456789012345678901234567890123456789012345678901234567890
DBREF1 2J83 A   61   322  XXXXXX               YYYYYYYYYYYYYYYYYYYY
DBREF2 2J83 A     ZZZZZZZZZZZZZZZZZZZZZZ     nnnnnnnnnn  mmmmmmmmmm
Examples
         1         2         3         4         5         6         7         8
12345678901234567890123456789012345678901234567890123456789012345678901234567890
DBREF1 2J83 A   61   322  UNIMES               UPI000148A153
DBREF2 2J83 A     MES00005880000                     61         322

         1         2         3         4         5         6         7         8
12345678901234567890123456789012345678901234567890123456789012345678901234567890
DBREF1 2J83 A   61   322  GB                   AE017221
DBREF2 2J83 A     46197919                      1534489     1537377
Overview
The SEQADV record identifies differences between sequence information in the SEQRES records of the PDB entry and the sequence database entry given in DBREF. Please note that these records were designed to identify differences and not errors. No assumption is made as to which database contains the correct data. A comment explaining any engineered differences in the sequence between the PDB and the sequence database may also be included here.
Record Format
COLUMNS        DATA TYPE     FIELD         DEFINITION
-----------------------------------------------------------------
 1 -  6        Record name   "SEQADV"
 8 - 11        IDcode        idCode        ID code of this entry.
13 - 15        Residue name  resName       Name of the PDB residue in conflict.
17             Character     chainID       PDB chain identifier.
19 - 22        Integer       seqNum        PDB sequence number.
23             AChar         iCode         PDB insertion code.
25 - 28        LString       database      Sequence database name.
30 - 38        LString       dbIdCode      Sequence database accession number.
40 - 42        Residue name  dbRes         Sequence database residue name.
44 - 48        Integer       dbSeq         Sequence database sequence number.
50 - 70        LString       conflict      Conflict comment.
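A SEQADV line can be read with the same column-slicing approach. The sketch below is illustrative only: `Seqadv` and `parse_seqadv` are hypothetical names, and treating blank `dbRes` and `dbSeq` fields as missing values is an assumption made here to handle records such as expression tags, which have no counterpart in the database sequence.

```python
from typing import NamedTuple, Optional


class Seqadv(NamedTuple):
    """Fields of one SEQADV record (names are this sketch's own)."""
    id_code: str
    res_name: str
    chain_id: str
    seq_num: int
    i_code: str
    database: str
    db_id_code: str
    db_res: str
    db_seq: Optional[int]
    conflict: str


def parse_seqadv(line: str) -> Seqadv:
    """Slice one SEQADV line using the 1-based column ranges."""
    if not line.startswith("SEQADV"):
        raise ValueError("not a SEQADV record")
    padded = line.rstrip("\r\n").ljust(80)  # pad so every slice is in range

    def col(first: int, last: int) -> str:
        # 1-based inclusive columns -> 0-based half-open Python slice.
        return padded[first - 1:last].strip()

    db_seq_text = col(44, 48)
    return Seqadv(
        id_code=col(8, 11),
        res_name=col(13, 15),
        chain_id=col(17, 17),
        seq_num=int(col(19, 22)),        # may be negative, e.g. -1
        i_code=col(23, 23),
        database=col(25, 28),
        db_id_code=col(30, 38),
        db_res=col(40, 42),              # empty when no database residue
        db_seq=int(db_seq_text) if db_seq_text else None,
        conflict=col(50, 70),
    )
```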
Details
Possible conflict comments include:

- Cloning artifact
- Expression tag
- Conflict
- Engineered
- Variant
- Insertion
- Deletion
- Microheterogeneity
- Chromophore
Verification/Validation/Value Authority Control
SEQADV records are automatically generated.
Relationships to Other Record Types
SEQADV refers to the sequence as found in the SEQRES records, and to the sequence database reference found on DBREF.
REMARK 999 contains text that explains discrepancies when the explanation is too lengthy to fit in SEQADV.
Examples
         1         2         3         4         5         6         7         8
12345678901234567890123456789012345678901234567890123456789012345678901234567890
SEQADV 3ABC MET A   -1  UNP  P10725              EXPRESSION TAG
SEQADV 3ABC GLY A   50  UNP  P10725    VAL    50 ENGINEERED
SEQADV 2QLE CRO A   66  UNP  P42212    SER    65 CHROMOPHORE
SEQADV 2OKW LEU A   64  UNP  P42212    PHE    64 SEE REMARK 999
SEQADV 2OKW LEU A   64  NOR  NOR00669  PHE    14 SEE REMARK 999
Overview
SEQRES records contain a listing of the consecutive chemical components covalently linked in a linear fashion to form a polymer. The chemical components included in this listing may be standard or modified amino acid and nucleic acid residues. It may also include other residues that are linked to the standard backbone in the polymer. Chemical components or groups covalently linked to side-chains (in peptides) or sugars and/or bases (in nucleic acid polymers) will not be listed here.
Record Format
COLUMNS        DATA TYPE      FIELD        DEFINITION
-------------------------------------------------------------------------------------
 1 -  6        Record name    "SEQRES"
 8 - 10        Integer        serNum       Serial number of the SEQRES record for the
                                           current chain. Starts at 1 and increments
                                           by one each line. Reset to 1 for each chain.
12             Character      chainID      Chain identifier. This may be any single
                                           legal character, including a blank, which
                                           is used if there is only one chain.
14 - 17        Integer        numRes       Number of residues in the chain. This
                                           value is repeated on every record.
20 - 22        Residue name   resName      Residue name.
24 - 26        Residue name   resName      Residue name.
28 - 30        Residue name   resName      Residue name.
32 - 34        Residue name   resName      Residue name.
36 - 38        Residue name   resName      Residue name.
40 - 42        Residue name   resName      Residue name.
44 - 46        Residue name   resName      Residue name.
48 - 50        Residue name   resName      Residue name.
52 - 54        Residue name   resName      Residue name.
56 - 58        Residue name   resName      Residue name.
60 - 62        Residue name   resName      Residue name.
64 - 66        Residue name   resName      Residue name.
68 - 70        Residue name   resName      Residue name.
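Since a chain's sequence is spread over several SEQRES lines of up to thirteen residues each, a reader has to concatenate them per chain. The following Python sketch shows one possible approach; `read_seqres` is a hypothetical name, and the consistency checks (serial numbers counting up from 1, a constant `numRes`, and a final residue count equal to `numRes`) are derived from the field definitions in the table rather than from any reference implementation.

```python
from typing import Dict, Iterable, List


def read_seqres(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Collect residue names per chain from SEQRES lines, with basic checks."""
    sequences: Dict[str, List[str]] = {}
    declared: Dict[str, int] = {}
    last_serial: Dict[str, int] = {}

    for raw in lines:
        if not raw.startswith("SEQRES"):
            continue  # ignore other record types
        line = raw.rstrip("\r\n").ljust(80)
        serial = int(line[7:10])         # columns 8 - 10
        chain_id = line[11]              # column 12 (may be a blank)
        num_res = int(line[13:17])       # columns 14 - 17
        residues = line[19:70].split()   # columns 20 - 70, up to 13 names

        # serNum starts at 1 and increments by one within each chain.
        if serial != last_serial.get(chain_id, 0) + 1:
            raise ValueError(f"unexpected serNum {serial} for chain {chain_id!r}")
        last_serial[chain_id] = serial

        # numRes is repeated on every record of the chain.
        if declared.setdefault(chain_id, num_res) != num_res:
            raise ValueError(f"numRes changes within chain {chain_id!r}")

        sequences.setdefault(chain_id, []).extend(residues)

    for chain_id, residues in sequences.items():
        if len(residues) != declared[chain_id]:
            raise ValueError(
                f"chain {chain_id!r}: {len(residues)} residues listed, "
                f"{declared[chain_id]} declared"
            )
    return sequences
```

Splitting columns 20 to 70 on whitespace works here because residue names never contain blanks; a stricter reader could slice each three-column field individually.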
Verification/Validation/Value Authority Control
The residues presented in the ATOM records must agree with those on the SEQRES records.
The SEQRES records are checked using sequence databases and information provided by the depositor.
SEQRES is compared to the ATOM records during processing, and both are checked against the sequence databases. All discrepancies are either resolved or annotated appropriately in the entry.
The ribo- and deoxyribonucleotides in the SEQRES records are distinguished. The ribo- forms of these residues are identified with the residue names A, C, G, U and I. The deoxy- forms of these residues are identified with the residue names DA, DC, DG, DT and DI. Modified nucleotides in the sequence are identified by separate 3-letter residue codes. The plus character prefix to label modified nucleotides (e.g. +A, +C, +T) is no longer used.
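The naming rule above can be applied mechanically to tell ribo- from deoxyribonucleotides in a SEQRES listing. The sketch below is a small illustration in Python; the names `nucleotide_kind` and `count_kinds` are hypothetical, and the "other" category is an assumption of this sketch that lumps together everything not in the two lists (amino acids and modified nucleotides with their own 3-letter codes).

```python
from collections import Counter
from typing import Iterable

# Residue names taken from the paragraph above.
RIBO_NAMES = frozenset({"A", "C", "G", "U", "I"})
DEOXY_NAMES = frozenset({"DA", "DC", "DG", "DT", "DI"})


def nucleotide_kind(res_name: str) -> str:
    """Classify a residue name as 'ribo', 'deoxy', or 'other'."""
    name = res_name.strip()  # names may be right-justified in their field
    if name in RIBO_NAMES:
        return "ribo"
    if name in DEOXY_NAMES:
        return "deoxy"
    return "other"


def count_kinds(residues: Iterable[str]) -> Counter:
    """Count how many residues of each kind a sequence contains."""
    return Counter(nucleotide_kind(name) for name in residues)
```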
Example
         1         2         3         4         5         6         7         8
12345678901234567890123456789012345678901234567890123456789012345678901234567890
SEQRES   1 A   21  GLY ILE VAL GLU GLN CYS CYS THR SER ILE CYS SER LEU
SEQRES   2 A   21  TYR GLN LEU GLU ASN TYR CYS ASN
SEQRES   1 B   30  PHE VAL ASN GLN HIS LEU CYS GLY SER HIS LEU VAL GLU
SEQRES   2 B   30  ALA LEU TYR LEU VAL CYS GLY GLU ARG GLY PHE PHE TYR
SEQRES   3 B   30  THR PRO LYS ALA
SEQRES   1 C   21  GLY ILE VAL GLU GLN CYS CYS THR SER ILE CYS SER LEU
SEQRES   2 C   21  TYR GLN LEU GLU ASN TYR CYS ASN
SEQRES   1 D   30  PHE VAL ASN GLN HIS LEU CYS GLY SER HIS LEU VAL GLU
SEQRES   2 D   30  ALA LEU TYR LEU VAL CYS GLY GLU ARG GLY PHE PHE TYR
SEQRES   3 D   30  THR PRO LYS ALA

SEQRES   1 A    8   DA  DA  DC  DC  DG  DG  DT  DT
SEQRES   1 B    8   DA  DA  DC  DC  DG  DG  DT  DT

SEQRES   1 X   39    U   C   C   C   C   C   G   U   G   C   C   C   A
SEQRES   2 X   39    U   A   G   C   G   G   C   G   U   G   G   A   A
SEQRES   3 X   39    C   C   A   C   C   C   G   U   U   C   C   C   A
Known Problems
Polysaccharides do not lend themselves to being represented in SEQRES.
There is no mechanism provided to describe the sequence order if the starting position is unknown.
For cyclic peptides, a residue is arbitrarily assigned as the N-terminus.
Overview
The MODRES record provides descriptions of modifications (e.g., chemical or post-translational) to protein and nucleic acid residues. Included are correlations between residue names given in a PDB entry and standard residues.
Record Format
COLUMNS        DATA TYPE     FIELD       DEFINITION
--------------------------------------------------------------------------------
 1 -  6        Record name   "MODRES"
 8 - 11        IDcode        idCode      ID code of this entry.
13 - 15        Residue name  resName     Residue name used in this entry.
17             Character     chainID     Chain identifier.
19 - 22        Integer       seqNum      Sequence number.
23             AChar         iCode       Insertion code.
25 - 27        Residue name  stdRes      Standard residue name.
30 - 70        String        comment     Description of the residue modification.
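One common use of MODRES is to translate a modified residue name back to its standard parent. The Python sketch below illustrates that idea under stated assumptions: `Modres`, `parse_modres`, and `standard_name_map` are hypothetical names, and the map is keyed by residue name alone, which assumes a given modified name always maps to the same standard residue within an entry.

```python
from typing import Dict, Iterable, NamedTuple


class Modres(NamedTuple):
    """Fields of one MODRES record (names are this sketch's own)."""
    id_code: str
    res_name: str
    chain_id: str
    seq_num: int
    i_code: str
    std_res: str
    comment: str


def parse_modres(line: str) -> Modres:
    """Slice one MODRES line using the 1-based column ranges."""
    if not line.startswith("MODRES"):
        raise ValueError("not a MODRES record")
    padded = line.rstrip("\r\n").ljust(80)  # pad so every slice is in range

    def col(first: int, last: int) -> str:
        # 1-based inclusive columns -> 0-based half-open Python slice.
        return padded[first - 1:last].strip()

    return Modres(
        id_code=col(8, 11),
        res_name=col(13, 15),
        chain_id=col(17, 17),
        seq_num=int(col(19, 22)),
        i_code=col(23, 23),
        std_res=col(25, 27),
        comment=col(30, 70),
    )


def standard_name_map(lines: Iterable[str]) -> Dict[str, str]:
    """Map each residue name used in the entry to its standard residue name."""
    mapping: Dict[str, str] = {}
    for line in lines:
        if line.startswith("MODRES"):
            record = parse_modres(line)
            mapping[record.res_name] = record.std_res
    return mapping
```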
Details
Possible comments include:

- Glycosylation site
- Post-translational modification
- Designed chemical modification
- Phosphorylation site
- D-configuration
Verification/Validation/Value Authority Control
MODRES is generated by the wwPDB.
Relationships to Other Record Types
MODRES maps ATOM and HETATM records to the standard residue names. HET and FORMUL records may also appear.
Example
         1         2         3         4         5         6         7         8
12345678901234567890123456789012345678901234567890123456789012345678901234567890
MODRES 2R0L ASN A   74  ASN  GLYCOSYLATION SITE
MODRES 1IL2 1MG D 1937    G  1N-METHYLGUANOSINE-5'-MONOPHOSPHATE
MODRES 4ABC MSE B   32  MET  SELENOMETHIONINE