wwpdb
PDB FORMAT Version 2.3

Title Section

This section contains records used to describe the experiment and the biological macromolecules present in the entry: HEADER, OBSLTE, TITLE, CAVEAT, COMPND, SOURCE, KEYWDS, EXPDTA, AUTHOR, REVDAT, SPRSDE, JRNL, and REMARK records.


HEADER

Overview

The HEADER record uniquely identifies a PDB entry through the idCode field. This record also provides a classification for the entry. Finally, it contains the date the coordinates were deposited at the PDB.

Record Format

COLUMNS      DATA TYPE      FIELD             DEFINITION
---------------------------------------------------------------------------
 1 -  6      Record name    "HEADER"
11 - 50      String(40)     classification    Classifies the molecule(s)
51 - 59      Date           depDate           Deposition date. 
                                              This is the date the coordinates were 
                                              received by the PDB
63 - 66      IDcode         idCode            This identifier is unique within the PDB

Details

  • The classification string is left-justified and exactly matches one of a collection of strings. See the class list available from the WWW site. In the case of macromolecular complexes, the classification field must present a class for each macromolecule present. Due to the limited length of the classification field, strings must sometimes be abbreviated. In these cases, the full terms are given in KEYWDS.
  • Classification may be based on function, metabolic role, molecule type, cellular location, etc. In the case of a molecule having a dual function, both may be presented here.
    Verification/Validation/Value Authority Control

    The verification program checks that the deposition date is a legitimate date and that the ID code is well-formed.

    PDB coordinate entry ID codes do not begin with 0, because a leading 0 identifies the NOC ("no coordinates") files, which are bibliographic only and not structural entries.

    Relationships to Other Record Types

    The classification found in HEADER also appears in KEYWDS, unabbreviated and in no strict order.

    Example

             1         2         3         4         5         6         7
    1234567890123456789012345678901234567890123456789012345678901234567890
    HEADER    MUSCLE PROTEIN                          02-JUN-93   1MYS
    
    HEADER    HYDROLASE (CARBOXYLIC ESTER)            08-APR-93   2PHI
    
    HEADER    COMPLEX (LECTIN/TRANSFERRIN)            07-JAN-94   1LGB
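
The fixed-column layout in the Record Format table can be read with plain string slicing. The following is a minimal Python sketch, not the PDB's verification program: the function names are hypothetical, the date check assumes English month abbreviations, and the ID code check encodes only what is stated above (four characters, not beginning with 0) plus the assumption that all four characters are alphanumeric.

```python
from datetime import datetime


def parse_header(line: str) -> dict:
    """Slice a HEADER record at the 1-based columns from the Record Format table."""
    line = line.ljust(66)  # pad short lines so every slice is defined
    if line[0:6] != "HEADER":
        raise ValueError("not a HEADER record")
    return {
        "classification": line[10:50].strip(),  # columns 11-50
        "depDate": line[50:59],                 # columns 51-59
        "idCode": line[62:66],                  # columns 63-66
    }


def is_legitimate_date(text: str) -> bool:
    """True if text parses as DD-MMM-YY (English month names assumed)."""
    try:
        datetime.strptime(text, "%d-%b-%y")
    except ValueError:
        return False
    return True


def is_coordinate_id(code: str) -> bool:
    """Assumed rule: four alphanumeric characters, not starting with 0."""
    return len(code) == 4 and code.isalnum() and not code.startswith("0")


# Build the first example field by field so the column positions are explicit.
example = "HEADER".ljust(10) + "MUSCLE PROTEIN".ljust(40) + "02-JUN-93" + "   " + "1MYS"
record = parse_header(example)
print(record)
```

Slicing by column rather than splitting on whitespace matters here because the classification field may itself contain spaces.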
    


    OBSLTE

    Overview

    OBSLTE appears in entries that have been withdrawn from distribution.

    This record acts as a flag in an entry that has been withdrawn from the PDB's full release. It indicates which, if any, new entries have replaced the withdrawn entry. The format allows for the case of multiple new entries replacing one existing entry.

    Record Format

    COLUMNS    DATA TYPE          FIELD               DEFINITION
    -----------------------------------------------------------------------------
     1 -  6    Record name      "OBSLTE"
     9 - 10    Continuation     continuation  Allows concatenation of multiple records
    12 - 20    Date             repDate       Date that this entry was replaced.
    22 - 25    IDcode           idCode        ID code of this entry.
    32 - 35    IDcode           rIdCode       ID code of entry that replaced this one.
    37 - 40    IDcode           rIdCode       ID code of entry that replaced this one.
    42 - 45    IDcode           rIdCode       ID code of entry that replaced this one.
    47 - 50    IDcode           rIdCode       ID code of entry that replaced this one.
    52 - 55    IDcode           rIdCode       ID code of entry that replaced this one.
    57 - 60    IDcode           rIdCode       ID code of entry that replaced this one.
    62 - 65    IDcode           rIdCode       ID code of entry that replaced this one.
    67 - 70    IDcode           rIdCode       ID code of entry that replaced this one.
    

    Details

    It is PDB policy that only the primary author who submitted an entry has the authority to obsolete it. All OBSLTE entries are available from the PDB archive.

    Verification/Validation/Value Authority Control

    PDB staff adds this record at the time an entry is removed from release.

    Relationships to Other Record Types

    None.

    Example

             1         2         3         4         5         6         7
    1234567890123456789012345678901234567890123456789012345678901234567890
    OBSLTE     31-JAN-94 1MBP      2MBP
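
As an illustration of the eight rIdCode slots, the sketch below collects replacement ID codes from one or more OBSLTE records. The function name is hypothetical, and the code assumes that repDate and idCode are read from the first record only (the record whose continuation field is blank); the text above does not say whether they are repeated on continuation lines.

```python
def parse_obslte(lines):
    """Collect the replaced entry and its replacements from OBSLTE records."""
    rep_date = None
    id_code = None
    replacements = []
    for raw in lines:
        line = raw.ljust(70)
        if line[0:6] != "OBSLTE":
            continue
        if line[8:10].strip() == "":      # columns 9-10 blank: first record
            rep_date = line[11:20]        # columns 12-20
            id_code = line[21:25]         # columns 22-25
        # rIdCode slots start at columns 32, 37, ..., 67 (0-based 31, 36, ..., 66)
        for start in range(31, 70, 5):
            code = line[start:start + 4].strip()
            if code:
                replacements.append(code)
    return {"repDate": rep_date, "idCode": id_code, "rIdCodes": replacements}


first = "OBSLTE".ljust(11) + "31-JAN-94 1MBP".ljust(20) + "2MBP"
# Hypothetical continuation record, added only to exercise the loop.
second = ("OBSLTE".ljust(8) + " 2").ljust(31) + "3MBP"
result = parse_obslte([first])
print(result)
```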
    


    TITLE

    Overview

    The TITLE record contains a title for the experiment or analysis that is represented in the entry. It should identify an entry in the PDB in the same way that a title identifies a paper.

    Record Format

    COLUMNS    DATA TYPE        FIELD            DEFINITION
    ----------------------------------------------------------------------------
     1 -  6    Record name      "TITLE "
     9 - 10    Continuation     continuation     Allows concatenation of multiple records.
    11 - 70    String           title            Title of the experiment.
    

    Details

  • The title of the entry is free text and should describe the contents of the entry and any procedures or conditions that distinguish this entry from similar entries. It presents an opportunity for the depositor to emphasize the underlying purpose of this particular experiment.
  • Some items that may be included in TITLE are:
     - Experiment type.
     - Description of the mutation.
     - The fact that only alpha carbon coordinates have been provided in the entry.
    

    Verification/Validation/Value Authority Control

    This record is free text so no verification of format is required. The title is supplied by the depositor, but PDB staff may exercise editorial judgment in consultation with depositors in assigning the title.

    Relationships to Other Record Types

    COMPND, SOURCE, EXPDTA, and REMARKs provide information that may also be found in TITLE. You may think of the title as describing the experiment, and the compound record as describing the molecule(s).

    Example

             1         2         3         4         5         6         7
    1234567890123456789012345678901234567890123456789012345678901234567890
    TITLE     RHIZOPUSPEPSIN COMPLEXED WITH REDUCED PEPTIDE INHIBITOR
    
    TITLE     BETA-GLUCOSYLTRANSFERASE, ALPHA CARBON COORDINATES ONLY
    
    TITLE     NMR STUDY OF OXIDIZED THIOREDOXIN MUTANT (C62A,C69A,C73A)
    TITLE    2 MINIMIZED AVERAGE STRUCTURE
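
The continuation field can be handled as in the following sketch, which joins the title text of consecutive TITLE records and checks that continuation numbers run 2, 3, and so on, as in the example above. The function name is hypothetical, and joining the pieces with a single space is an assumption made for readability, not a rule stated in this section.

```python
def join_title(lines):
    """Concatenate TITLE records, checking the continuation numbering."""
    parts = []
    for raw in lines:
        line = raw.ljust(70)
        if line[0:6] != "TITLE ":
            continue
        continuation = line[8:10].strip()                 # columns 9-10
        expected = "" if not parts else str(len(parts) + 1)
        if continuation != expected:
            raise ValueError(f"expected continuation {expected!r}, got {continuation!r}")
        parts.append(line[10:70].strip())                 # columns 11-70
    return " ".join(parts)                                # single-space join is an assumption


title_lines = [
    "TITLE     NMR STUDY OF OXIDIZED THIOREDOXIN MUTANT (C62A,C69A,C73A)",
    "TITLE    2 MINIMIZED AVERAGE STRUCTURE",
]
title = join_title(title_lines)
print(title)
```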
    


    CAVEAT

    Overview

    CAVEAT warns of chirality errors in an entry.

    Record Format

    COLUMNS      DATA TYPE        FIELD               DEFINITION
    ----------------------------------------------------------------------
     1 - 6       Record name      "CAVEAT"
     9 - 10      Continuation     continuation      Allows concatenation of multiple records.
    12 - 15      IDcode           idCode            PDB ID code of this entry.
    20 - 70      String           comment           Free text giving the reason for the CAVEAT.
    

    Details

  • Please note the CAVEAT will also be included in cases where PDB is unable to verify the transformation back to the crystallographic cell. In these cases, the molecular structure may still be correct.
    Verification/Validation/Value Authority Control

    CAVEAT will be added by the PDB to entries known to be incorrect.


    COMPND

    Overview

    The COMPND record describes the macromolecular contents of an entry. Each macromolecule found in the entry is described by a set of token: value pairs, and is referred to as a COMPND record component. Since the concept of a molecule is difficult to specify exactly, PDB staff may exercise editorial judgment in consultation with depositors in assigning these names.

    Record Format

    COLUMNS        DATA TYPE         FIELD          DEFINITION                        
    ----------------------------------------------------------------------------------
     1 -  6        Record name       "COMPND"                                         
     9 - 10        Continuation      continuation   Allows concatenation of multiple records.                          
    11 - 70        Specification     compound       Description of the molecular      
                   list                             components.                  
    

    Details

  • The compound record is a Specification list. The specifications, or tokens, that may be used are listed below:
    TOKEN         VALUE DEFINITION
    ---------------------------------------------------------------------------------
    MOL_ID        Numbers each component; also used in SOURCE to associate the information.
    MOLECULE      Name of the macromolecule.
    CHAIN         Comma-separated list of chain identifier(s). 
    FRAGMENT      Specifies a domain or region of the molecule.
    SYNONYM       Comma-separated list of synonyms for the MOLECULE.
    EC            The Enzyme Commission number associated with the
                  molecule. If there is more than one EC number, they
                  are presented as a comma-separated list.
    ENGINEERED    Indicates that the molecule was produced using
                  recombinant technology or by purely chemical synthesis.
    MUTATION      Indicates if there is a mutation.
    OTHER_DETAILS Additional comments.
    

  • In the general case the PDB tends to reflect the biological/functional view of the molecule. For example, the hetero-tetramer hemoglobin molecule is treated as a discrete component in COMPND.
  • In the case of synthetic molecules, e.g., hybrids, the depositor will provide the description.
  • No specific rules apply to the ordering of the tokens, except that the occurrence of MOL_ID or FRAGMENT indicates that the subsequent tokens are related to that specific molecule or fragment of the molecule.
  • Asterisks in nucleic acid names (in MOLECULE) are for ease of reading.
  • When insertion codes are given as part of the residue name, they must be given within square brackets, e.g., H57[A]N. This might occur when listing residues in FRAGMENT or OTHER_DETAILS.
  • For multi-chain molecules, e.g., the hemoglobin tetramer, a comma-separated list of CHAIN identifiers is used.
  • When non-blank chain identifiers occur in the entry, they must be specified.
    Verification/Validation/Value Authority Control

    CHAIN must match the chain identifier(s) of the molecule(s). EC numbers are also checked.

    Relationships to Other Record Types

    In the case of mutations, the SEQADV records will present differences from the reference molecule. REMARK records may further describe the contents of the entry. Also see verification above.

    Example

             1         2         3         4         5         6         7
    1234567890123456789012345678901234567890123456789012345678901234567890
    COMPND    MOL_ID: 1;
    COMPND   2 MOLECULE: HEMOGLOBIN;
    COMPND   3 CHAIN: A, B, C, D;
    COMPND   4 ENGINEERED: YES;
    COMPND   5 MUTATION: YES;
    COMPND   6 OTHER_DETAILS: DEOXY FORM
    
    COMPND    MOL_ID: 1;
    COMPND   2 MOLECULE: COWPEA CHLOROTIC MOTTLE VIRUS;
    COMPND   3 CHAIN: A, B, C;
    COMPND   4 SYNONYM: CCMV;
    COMPND   5 MOL_ID: 2;
    COMPND   6 MOLECULE: RNA (5'-(*AP*UP*AP*U)-3');
    COMPND   7 CHAIN: D, F;
    COMPND   8 ENGINEERED: YES;
    COMPND   9 MOL_ID: 3;
    COMPND  10 MOLECULE: RNA (5'-(*AP*U)-3');
    COMPND  11 CHAIN: E;
    COMPND  12 ENGINEERED: YES
    
    COMPND    MOL_ID: 1;                                            
    COMPND   2 MOLECULE: HEVAMINE A;                                
    COMPND   3 CHAIN: A;                                         
    COMPND   4 EC: 3.2.1.14, 3.2.1.17;                              
    COMPND   5 OTHER_DETAILS: PLANT ENDOCHITINASE/LYSOZYME          
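
The token: value layout shown in these examples can be parsed roughly as follows. This is a sketch under stated assumptions: specifications are taken to be separated by semicolons (as in the examples), continuation text is joined with a single space, and FRAGMENT grouping is ignored. The function name and the choice to split CHAIN and EC into lists are illustrative only.

```python
def parse_compnd(lines):
    """Group COMPND token: value pairs by MOL_ID (FRAGMENT grouping not handled)."""
    text = " ".join(
        raw.ljust(70)[10:70].strip()          # columns 11-70
        for raw in lines
        if raw[0:6] == "COMPND"
    )
    molecules = []
    for spec in text.split(";"):              # assumed separator between specifications
        token, _, value = spec.partition(":")
        token, value = token.strip(), value.strip()
        if not token:
            continue
        if token == "MOL_ID":
            molecules.append({"MOL_ID": value})   # start a new component
        elif molecules:
            if token in ("CHAIN", "EC"):          # comma-separated lists
                value = [item.strip() for item in value.split(",")]
            molecules[-1][token] = value
    return molecules


compnd_lines = [
    "COMPND    MOL_ID: 1;",
    "COMPND   2 MOLECULE: HEVAMINE A;",
    "COMPND   3 CHAIN: A;",
    "COMPND   4 EC: 3.2.1.14, 3.2.1.17;",
    "COMPND   5 OTHER_DETAILS: PLANT ENDOCHITINASE/LYSOZYME",
]
molecules = parse_compnd(compnd_lines)
print(molecules)
```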
    


    SOURCE

    Overview

    The SOURCE record specifies the biological and/or chemical source of each biological molecule in the entry. Sources are described by both the common name and the scientific name, e.g., genus and species. Strain and/or cell-line for immortalized cells are given when they help to uniquely identify the biological entity studied.

    Record Format

    COLUMNS   DATA TYPE         FIELD          DEFINITION                        
    -------------------------------------------------------------------------------
     1 -  6   Record name       "SOURCE"                                         
     9 - 10   Continuation      continuation   Allows concatenation of multiple records.                         
    11 - 70   Specification     srcName        Identifies the source of the macromolecule in 
               list                            a token: value format.                        
    

    Details

    TOKEN                                VALUE DEFINITION                        
    ---------------------------------------------------------------------------------
    MOL_ID                               Numbers each molecule.  Same as appears in COMPND
    SYNTHETIC                            Indicates a chemically-synthesized source.  
    FRAGMENT                             A domain or fragment of the molecule may be specified
    ORGANISM_SCIENTIFIC                  Scientific name of the organism.            
    ORGANISM_COMMON                      Common name of the organism.                
    STRAIN                               Identifies the strain.                      
    VARIANT                              Identifies the variant.                     
    CELL_LINE                            The specific line of cells used in the experiment
    ATCC                                 American Type Culture Collection tissue culture number
    ORGAN                                Organized group of tissues that carries on a specialized function
    TISSUE                               Organized group of cells with a common function and structure
    CELL                                 Identifies the particular cell type
    ORGANELLE                            Organized structure within a cell
    SECRETION                            Identifies the secretion, such as saliva, 
                                         urine, or venom, from which the molecule was isolated
    CELLULAR_LOCATION                    Identifies the location inside (or outside) the cell.
    PLASMID                              Identifies the plasmid containing the gene. 
    GENE                                 Identifies the gene.                        
    EXPRESSION_SYSTEM                    System used to express recombinant macromolecules
    EXPRESSION_SYSTEM_STRAIN             Strain of the organism in which the molecule was expressed
    EXPRESSION_SYSTEM_VARIANT            Variant of the organism used as the expression system
    EXPRESSION_SYSTEM_CELL_LINE          The specific line of cells used as the expression system
    EXPRESSION_SYSTEM_ATCC_NUMBER        Identifies the ATCC number of the expression system
    EXPRESSION_SYSTEM_ORGAN              Specific organ which expressed the molecule.
    EXPRESSION_SYSTEM_TISSUE             Specific tissue which expressed the molecule.
    EXPRESSION_SYSTEM_CELL               Specific cell type which expressed the molecule
    EXPRESSION_SYSTEM_ORGANELLE          Specific organelle which expressed the molecule
    EXPRESSION_SYSTEM_CELLULAR_LOCATION  Identifies the location inside or outside 
                                         the cell which expressed the molecule.
    EXPRESSION_SYSTEM_VECTOR_TYPE        Identifies the type of vector used, i.e., 
                                         plasmid, virus, or cosmid.
    EXPRESSION_SYSTEM_VECTOR             Identifies the vector used.
    EXPRESSION_SYSTEM_PLASMID            Plasmid used in the recombinant experiment. 
    EXPRESSION_SYSTEM_GENE               Name of the gene used in recombinant experiment.                                
    OTHER_DETAILS                        Used to present information on the source 
                                         which is not given elsewhere.              
    

  • The srcName is a list of token: value pairs describing each biological component of the entry.
  • As in COMPND, the order is not specified except that MOL_ID or FRAGMENT indicates subsequent specifications are related to that molecule or fragment of the molecule.
  • Physical layout of these items may be altered by PDB staff to improve human readability of the SOURCE record.
  • Only the relevant tokens need to appear in an entry.
  • Molecules prepared by purely chemical synthetic methods are described by the specification SYNTHETIC followed by "YES" or an optional value, such as NON-BIOLOGICAL SOURCE or BASED ON THE NATURAL SEQUENCE. ENGINEERED must appear in the COMPND record.
  • In the case of a chemically synthesized molecule using a biologically functional sequence (nucleic or amino acid), SOURCE reflects the biological origin of the sequence and COMPND reflects its synthetic nature by inclusion of the token ENGINEERED. The token SYNTHETIC appears in SOURCE.
  • If made from a synthetic gene, ENGINEERED appears in COMPND and the expression system is described in SOURCE (SYNTHETIC does NOT appear in SOURCE).
  • If the molecule was made using recombinant techniques, ENGINEERED appears in COMPND and the system is described in SOURCE.
  • When multiple macromolecules appear in the entry, each MOL_ID, as given in the COMPND record, must be repeated in the SOURCE record along with the source information for the corresponding molecule.
  • Hybrid molecules prepared by fusion of genes are treated as multi-molecular systems for the purpose of specifying the source. The token FRAGMENT is used to associate the source with its corresponding fragment.
     - When necessary to fully describe hybrid molecules, tokens may appear
       more than once for a given MOL_ID.
     - All relevant token: value pairs that taken together fully describe 
       each fragment are grouped following the appropriate FRAGMENT.
     - Descriptors relative to the full system appear before the FRAGMENT 
       (see Example 3 below).
    

  • ORGANISM_SCIENTIFIC provides the Latin genus and species. Virus names are listed as the scientific name.
  • Cellular origin is described by giving cellular compartment, organelle, cell, tissue, organ, or body part from which the molecule was isolated.
  • CELLULAR_LOCATION may be used to indicate where in the organism the compound was found. Examples are: extracellular, periplasmic, cytosol.
  • Entries containing molecules prepared by recombinant techniques are described as follows:
     - The expression system is described.
     - The organism and cell location given are for the source of 
       the gene used in the cloning experiment.
     - Transgenic organisms, such as mouse producing human proteins, 
       are treated as expression systems.
    

  • For a theoretical modeling experiment, SOURCE describes the modelled compound just as though it were an experimental study.
  • New tokens may be added by the PDB.
    Verification/Validation/Value Authority Control

    The biological source is compared to that found in the sequence databases.

    Relationships to Other Record Types

    Each macromolecule listed in COMPND must have a corresponding source.

    Example

             1         2         3         4         5         6         7
    1234567890123456789012345678901234567890123456789012345678901234567890
    SOURCE    MOL_ID: 1;
    SOURCE   2 ORGANISM_SCIENTIFIC: AVIAN SARCOMA VIRUS;
    SOURCE   3 STRAIN: SCHMIDT-RUPPIN B;
    SOURCE   4 EXPRESSION_SYSTEM: ESCHERICHIA COLI;
    SOURCE   5 EXPRESSION_SYSTEM_PLASMID: PRC23IN
    
    
    SOURCE    MOL_ID: 1;
    SOURCE   2 ORGANISM_SCIENTIFIC: GALLUS GALLUS;
    SOURCE   3 ORGANISM_COMMON: CHICKEN;
    SOURCE   4 ORGAN: HEART;
    SOURCE   5 TISSUE: MUSCLE
    
    
    SOURCE    MOL_ID: 1;
    SOURCE   2 EXPRESSION_SYSTEM: ESCHERICHIA COLI;
    SOURCE   3 EXPRESSION_SYSTEM_STRAIN: BE167;
    SOURCE   4 FRAGMENT: RESIDUES 1-16;
    SOURCE   5 ORGANISM_SCIENTIFIC: BACILLUS AMYLOLIQUEFACIENS;
    SOURCE   6 EXPRESSION_SYSTEM: ESCHERICHIA COLI;
    SOURCE   7 FRAGMENT: RESIDUES 17-214;
    SOURCE   8 ORGANISM_SCIENTIFIC: BACILLUS MACERANS
    


    KEYWDS

    Overview

    The KEYWDS record contains a set of terms relevant to the entry. Terms in the KEYWDS record provide a simple means of categorizing entries and may be used to generate index files. This record addresses some of the limitations found in the classification field of the HEADER record. It provides the opportunity to add further annotation to the entry in a concise and computer-searchable fashion.

    Record Format

    COLUMNS        DATA TYPE       FIELD          DEFINITION                         
    ---------------------------------------------------------------------------------
     1 -  6        Record name     "KEYWDS"                                          
     9 - 10        Continuation    continuation   Allows concatenation of records if necessary
    11 - 70        List            keywds         Comma-separated list of keywords   
                                                  relevant to the entry.            
    

    Details

  • The KEYWDS record contains a list of terms relevant to the entry, similar to that found in journal articles. A phrase may be used if it presents a single concept (e.g., reaction center). Terms provided in this record may include those that describe the following:
      - Functional classification. 
      - Metabolic role. 
      - Known biological or chemical activity. 
      - Structural classification. 
    

  • Other classifying terms may be used. No ordering is required for these terms. A number of PDB entries contain complexes of macromolecules. In these cases, all terms applicable to each molecule should be provided.
  • Note that the terms in the KEYWDS record duplicate those found in the classification field of the HEADER record. Terms abbreviated in the HEADER record are unabbreviated in KEYWDS, and the parentheses used in HEADER are optional in KEYWDS.
    Verification/Validation/Value Authority Control

    Terms used in the KEYWDS record are subject to scientific and editorial review. A list of terms, definitions, and synonyms will be maintained at the PDB. Every attempt will be made to provide some level of consistency with keywords used in other biological databases.

    Relationships to Other Record Types

    HEADER records contain a classification term which must also appear in KEYWDS. Scientific judgment will dictate when terms used in one entry to describe a molecule should be included in other entries with the same or similar molecules.

    Example

             1         2         3         4         5         6         7
    1234567890123456789012345678901234567890123456789012345678901234567890
    KEYWDS    LYASE, TRICARBOXYLIC ACID CYCLE, MITOCHONDRION, OXIDATIVE
    KEYWDS   2 METABOLISM
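
Because a keyword phrase may wrap across records, as OXIDATIVE METABOLISM does above, the keyword list is best split only after the continuation lines are joined. A minimal sketch, assuming that wrapped text is rejoined with a single space; the function name is hypothetical.

```python
def parse_keywds(lines):
    """Join KEYWDS records, then split the comma-separated list."""
    text = " ".join(
        raw.ljust(70)[10:70].strip()          # columns 11-70
        for raw in lines
        if raw[0:6] == "KEYWDS"
    )
    return [term.strip() for term in text.split(",") if term.strip()]


keywds_lines = [
    "KEYWDS    LYASE, TRICARBOXYLIC ACID CYCLE, MITOCHONDRION, OXIDATIVE",
    "KEYWDS   2 METABOLISM",
]
keywords = parse_keywds(keywds_lines)
print(keywords)
```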
    


    EXPDTA

    Overview

    The EXPDTA record presents information about the experiment.

    The EXPDTA record identifies the experimental technique used. This may refer to the type of radiation and sample, or include the spectroscopic or modeling technique. Permitted values include:

           ELECTRON DIFFRACTION
           ELECTRON MICROSCOPY
           CRYO-ELECTRON MICROSCOPY
           SOLUTION SCATTERING, THEORETICAL MODEL
           FIBER DIFFRACTION
           FLUORESCENCE TRANSFER
           NEUTRON DIFFRACTION
           NMR (may have a qualifier, e.g., number of models; see examples below)
           SOLUTION SCATTERING
           THEORETICAL MODEL*
           X-RAY DIFFRACTION
    

    * Note: As of July 1, 2002, models are available from a directory separate from the main archive at https://ftp.rcsb.org/pub/pdb/data/structures/models/current/. As of October 15, 2006, theoretical models are no longer accepted for deposition.

    Record Format

    COLUMNS       DATA TYPE      FIELD         DEFINITION                          
    -------------------------------------------------------
     1 -  6       Record name    "EXPDTA"                                          
     9 - 10       Continuation   continuation  Allows concatenation 
                                               of multiple records
    11 - 70       SList          technique     The experimental technique(s) 
                                               with optional comment describing 
                                               the sample or experiment. 
    

    Details

  • EXPDTA is mandatory and appears in all entries.
  • The technique must match one of the permitted values. See above.
  • If more than one model appears in the entry, the number of models included must be stated.
  • If only one model appears in the entry, its significance must be stated, such as it being a minimized average or regularized mean structure.
  • If more than one technique was used for the structure determination and is being represented in the entry, EXPDTA presents the techniques as a semicolon-separated list. Each technique may have a comment, which appears before the semicolon.

    Verification/Validation/Value Authority Control

    The verification program checks that the EXPDTA record appears in the entry and that the technique matches one of the allowed values. It also checks that the relevant standard REMARK is added in the case of NMR, fiber, or theoretical modeling studies, and that the correct CRYST1 and SCALE are used in these cases. If an entry contains multiple models, the verification program checks for the correct number of matching MODEL/ENDMDL records.

    Relationships to Other Record Types

    If the experiment is an NMR, fiber, or theoretical modeling study, this may be stated in the TITLE, and the appropriate EXPDTA and REMARK records should appear. Specific details of the data collection and experiment appear in the REMARKs.

    In the case of a polycrystalline fiber diffraction study, CRYST1 and SCALE contain the normal unit cell data.

    Example

             1         2         3         4         5         6         7
    1234567890123456789012345678901234567890123456789012345678901234567890
    EXPDTA    X-RAY DIFFRACTION
    
    EXPDTA    NEUTRON DIFFRACTION; X-RAY DIFFRACTION
    
    EXPDTA    NMR, 32 STRUCTURES
    
    EXPDTA    NMR, REGULARIZED MEAN STRUCTURE
    
    EXPDTA    FIBER DIFFRACTION
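
The technique field can be checked against the permitted values along the following lines. This sketch takes the already-extracted field text (columns 11 - 70) as input. It assumes, from the examples above, that an optional comment follows the technique after a comma; because one permitted value itself contains a comma, the whole item is tested first. The names are hypothetical.

```python
PERMITTED = {
    "ELECTRON DIFFRACTION",
    "ELECTRON MICROSCOPY",
    "CRYO-ELECTRON MICROSCOPY",
    "SOLUTION SCATTERING, THEORETICAL MODEL",
    "FIBER DIFFRACTION",
    "FLUORESCENCE TRANSFER",
    "NEUTRON DIFFRACTION",
    "NMR",
    "SOLUTION SCATTERING",
    "THEORETICAL MODEL",
    "X-RAY DIFFRACTION",
}


def parse_expdta(field: str):
    """Return (technique, comment) pairs from a semicolon-separated field."""
    results = []
    for item in field.split(";"):
        item = item.strip()
        if not item:
            continue
        if item in PERMITTED:                         # whole item is a technique
            results.append((item, ""))
            continue
        technique, _, comment = item.partition(",")   # assumed comment delimiter
        technique = technique.strip()
        if technique not in PERMITTED:
            raise ValueError(f"technique not permitted: {technique!r}")
        results.append((technique, comment.strip()))
    return results


print(parse_expdta("NEUTRON DIFFRACTION; X-RAY DIFFRACTION"))
print(parse_expdta("NMR, 32 STRUCTURES"))
```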
    


    AUTHOR

    Overview

    The AUTHOR record contains the names of the people responsible for the contents of the entry.

    Record Format

    COLUMNS       DATA TYPE      FIELD         DEFINITION                             
    -------------------------------------------------------
     1 -  6       Record name    "AUTHOR"                                             
     9 - 10       Continuation   continuation  Allows concatenation 
                                               of multiple records
    11 - 70       List           authorList    List of the author names, 
                                               separated by commas.
    

    Details

  • The authorList field lists author names separated by commas with no subsequent spaces.
  • Representation of personal names:
     - First and middle names are indicated by initials, each followed by a period, and precede the surname.
     - Only the surname (family or last name) of the author is given in full.
     - Hyphens can be used if they are part of the author's name.
     - Apostrophes are allowed in surnames.
     - Umlauts and other character modifiers are not given.
    

  • Structure of personal names:
      - There is no space after any initial and its following period.
      - Blank spaces are used in a name only if properly part of 
        the surname (e.g., J.VAN DORN), or between surname and 
        Junior, II, or III.
      - Abbreviations that are part of a surname, such as St. or Ste., 
        are followed by a period and a space before the next part 
        of the surname.
    
  • Representation of corporate names:
      - Group names used for one or all of the authors should 
        be spelled out in full.
      - The name of the larger group comes before the name of 
        a subdivision, e.g., University of Somewhere Department
        of Chemistry.
    
  • Structure of list:
      - Line breaks between multiple lines in the authorList
        occur only after a comma.
      - Personal names are not split across two lines.
    
  • Special cases:
      - Names are given in English if there is an accepted
        English version; otherwise in the native language,
        transliterated if necessary.
      - "ET AL." may be used when all authors are not individually listed.
    

    Verification/Validation/Value Authority Control

    The verification program checks that the authorList field is correctly formatted. It does not perform any spelling checks or name verification.

    Relationships to Other Record Types

    The format of the names in the AUTHOR record is the same as in JRNL and REMARK 1 references.

    Example

             1         2         3         4         5         6         7
    1234567890123456789012345678901234567890123456789012345678901234567890
    AUTHOR    M.B.BERRY,B.MEADOR,T.BILDERBACK,P.LIANG,M.GLASER,
    AUTHOR   2 G.N.PHILLIPS JUNIOR,T.L.ST. STEVENS
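
Since line breaks occur only after a comma and names are never split across lines, the authorList can be rebuilt by concatenating the records directly and splitting on commas. The sketch below does this and adds one simple format check (no space after a comma). The function names are hypothetical, and this is not the PDB's verification program.

```python
def parse_authors(lines):
    """Concatenate AUTHOR records and split the comma-separated name list."""
    text = "".join(
        raw.ljust(70)[10:70].strip()          # columns 11-70
        for raw in lines
        if raw[0:6] == "AUTHOR"
    )
    return [name for name in text.split(",") if name]


def has_space_after_comma(lines):
    """True if any authorList line contains a comma followed by a space."""
    return any(", " in raw[10:70] for raw in lines if raw[0:6] == "AUTHOR")


author_lines = [
    "AUTHOR    M.B.BERRY,B.MEADOR,T.BILDERBACK,P.LIANG,M.GLASER,",
    "AUTHOR   2 G.N.PHILLIPS JUNIOR,T.L.ST. STEVENS",
]
authors = parse_authors(author_lines)
print(authors)
```

Spaces inside a name, as in G.N.PHILLIPS JUNIOR or T.L.ST. STEVENS, are preserved because only commas are treated as separators.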
    


    REVDAT

    Overview

    REVDAT records contain a history of the modifications made to an entry since its release.

    Record Format

    COLUMNS    DATA TYPE      FIELD         DEFINITION                             
    --------------------------------------------------------
     1 -  6    Record name    "REVDAT"                                             
     8 - 10    Integer        modNum        Modification number.                   
    11 - 12    Continuation   continuation  Allows concatenation of multiple records
    14 - 22    Date           modDate       Date of modification (or release for   
                                            new entries).  This is not repeated    
                                            on continuation lines.                 
    24 - 28    String(5)      modId         Identifies this particular             
                                            modification.  It links to the         
                                            archive used internally by PDB.        
                                            This is not repeated on continuation lines
    32         Integer        modType       An integer identifying the type of     
                                            modification.  In case of revisions    
                                            with more than one possible modType,   
                                            the highest value applicable will be assigned
    40 - 45    LString(6)     record        Name of the modified record.           
    47 - 52    LString(6)     record        Name of the modified record.           
    54 - 59    LString(6)     record        Name of the modified record.           
    61 - 66    LString(6)     record        Name of the modified record.
    

    Details

  • Each time revisions are made to the entry, a modification number is assigned in increasing (by 1) numerical order. REVDAT records appear in descending order (most recent modification appears first). New entries have a REVDAT record with modNum equal to 1 and modType equal to 0. Allowed modTypes are:
        0         Initial released entry.
        1         Miscellaneous - mostly typographical.
        2         Modification of a CONECT record.
        3         Modification to coordinates or transformations.
    
  • Each revision may have more than one REVDAT record, and each revision has a separate continuation field.
    Verification/Validation/Value Authority Control

    The modType must be one of the defined types, and the given record type must be valid. If modType is 0, the modId must match the entry's ID code in the HEADER record.
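
    The checks described above can be sketched in Python as follows. This is a minimal illustration and not the PDB verification program; the function name check_revdat, the tuple layout (modNum, modType, modId), and the wording of the messages are assumptions made for the example.

```python
# Allowed modType values, taken from the Details list for REVDAT.
ALLOWED_MOD_TYPES = {0, 1, 2, 3}


def check_revdat(revisions, header_id_code):
    """Return a list of problems found in REVDAT data.

    revisions: list of (modNum, modType, modId) tuples in file order
               (hypothetical layout chosen for this sketch).
    header_id_code: the idCode taken from the HEADER record.
    """
    problems = []
    for mod_num, mod_type, mod_id in revisions:
        if mod_type not in ALLOWED_MOD_TYPES:
            problems.append(f"REVDAT {mod_num}: unknown modType {mod_type}")
        # For the initial release, modId must equal the entry's ID code.
        if mod_type == 0 and mod_id != header_id_code:
            problems.append(f"REVDAT {mod_num}: modId {mod_id} differs from HEADER idCode")
    # Records are expected in descending modNum order (most recent first).
    nums = [rev[0] for rev in revisions]
    if nums != sorted(nums, reverse=True):
        problems.append("REVDAT records are not in descending modNum order")
    return problems
```

    An empty list means that none of these particular checks failed; checking that the listed record names are valid is not covered here.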

    Relationships to Other Record Types

    REMARK 860 presents the correction or change that is made to an entry. Also, see verification above.

    Example

             1         2         3         4         5         6         7
    1234567890123456789012345678901234567890123456789012345678901234567890
    REVDAT   3   15-OCT-89 1PRC    1       REMARK
    REVDAT   2   19-APR-89 1PRC    2       CONECT
    REVDAT   1   09-JAN-89 1PRC    0
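
    The column layout in the Record Format table can be applied with plain string slicing. The following is a minimal Python sketch, not part of the format definition; the function name parse_revdat and the dictionary keys are illustrative assumptions, and merging of continuation lines is left out.

```python
def parse_revdat(line):
    """Slice one REVDAT line using the 1-based columns of the Record Format table."""
    line = line.rstrip("\n").ljust(66)  # pad so short lines can be sliced safely
    mod_type = line[31].strip()         # column 32
    record_starts = (39, 46, 53, 60)    # columns 40, 47, 54, 61
    return {
        "modNum": int(line[7:10]),               # columns 8-10
        "continuation": line[10:12].strip(),     # columns 11-12
        "modDate": line[13:22].strip(),          # columns 14-22
        "modId": line[23:28].strip(),            # columns 24-28
        "modType": int(mod_type) if mod_type else None,
        "records": [
            name
            for name in (line[s:s + 6].strip() for s in record_starts)
            if name
        ],
    }
```

    Applied to the first line of the example above, this sketch would give modNum 3, modDate 15-OCT-89, modId 1PRC, modType 1, and a single modified record, REMARK.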
    


    SPRSDE

    Overview

    The SPRSDE records contain a list of the ID codes of entries that were made obsolete by the given coordinate entry and withdrawn from the PDB release set. One entry may replace many.

    It is PDB policy that only the principal investigator of a structure has the authority to withdraw it.

    Record Format

    COLUMNS       DATA TYPE      FIELD         DEFINITION                             
    ----------------------------------------------------------------------------------
     1 -  6       Record name    "SPRSDE"                                             
     9 - 10       Continuation   continuation  Allows for multiple ID codes.          
    12 - 20       Date           sprsdeDate    Date this entry superseded the         
                                               listed entries. This field is not      
                                               copied on continuations.               
    22 - 25       IDcode         idCode        ID code of this entry.  This field     
                                               is not copied on continuations.        
    32 - 35       IDcode         sIdCode       ID code of a superseded entry.         
    37 - 40       IDcode         sIdCode       ID code of a superseded entry.         
    42 - 45       IDcode         sIdCode       ID code of a superseded entry.         
    47 - 50       IDcode         sIdCode       ID code of a superseded entry.         
    52 - 55       IDcode         sIdCode       ID code of a superseded entry.         
    57 - 60       IDcode         sIdCode       ID code of a superseded entry.         
    62 - 65       IDcode         sIdCode       ID code of a superseded entry.         
    67 - 70       IDcode         sIdCode       ID code of a superseded entry.         
    

    Details

  • The ID code list is terminated by the first blank sIdCode field.

    Verification/Validation/Value Authority Control

    PDB checks that the superseded entries have actually been withdrawn from release.

    Relationships to Other Record Types

    The sprsdeDate is usually the date the entry is released, and therefore matches the date in the REVDAT 1 record. The ID code found in the idCode field must be the same as the one found in the idCode field of the HEADER record.

    Example

             1         2         3         4         5         6         7
    1234567890123456789012345678901234567890123456789012345678901234567890
    SPRSDE     17-JUL-84 4HHB      1HHB
    
    SPRSDE     27-FEB-95 1GDJ      1LH4 2LH4
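
    Reading the superseded ID codes, including the rule that the list ends at the first blank sIdCode field, can be sketched as below. This is an illustrative Python example only; the name parse_sprsde and the dictionary keys are assumptions, and joining of continuation lines is not shown.

```python
def parse_sprsde(line):
    """Slice one SPRSDE line using the 1-based columns of the Record Format table."""
    line = line.rstrip("\n").ljust(70)  # pad so every sIdCode slot can be sliced
    superseded = []
    # sIdCode fields start at columns 32, 37, ..., 67 (0-based 31, 36, ..., 66).
    for start in range(31, 70, 5):
        code = line[start:start + 4].strip()
        if not code:
            break  # the list is terminated by the first blank field
        superseded.append(code)
    return {
        "continuation": line[8:10].strip(),   # columns 9-10
        "sprsdeDate": line[11:20].strip(),    # columns 12-20
        "idCode": line[21:25].strip(),        # columns 22-25
        "sIdCodes": superseded,
    }
```

    For the second example line above, this sketch would report idCode 1GDJ and the superseded entries 1LH4 and 2LH4.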
    


    JRNL

    Overview

    The JRNL record contains the primary literature citation that describes the experiment which resulted in the deposited coordinate set. There is at most one JRNL reference per entry. If there is no primary reference, then there is no JRNL reference. Other references are given in REMARK 1.

    Record Format

    COLUMNS    DATA TYPE      FIELD     DEFINITION                                   
    -----------------------------------------------
     1 -  6    Record name    "JRNL  "                                               
    13 - 70    LString        text      See Details below.                            
    

    Details

  • The following tables are used to describe the sub-record types of the JRNL record.
  • The AUTH sub-record is mandatory in JRNL. It is followed by the TITL, EDIT, REF, PUBL, and REFN sub-record types. REF and REFN are also mandatory in JRNL. EDIT and PUBL may appear only if the reference is to a non-journal.

    1. AUTH

  • AUTH contains the list of authors associated with the cited article or contribution to a larger work (i.e., AUTH is not used for the editor of a book).
  • The author list is formatted similarly to the AUTHOR record. It is a comma-separated list of names. Spaces at the end of a sub-record are not significant; all other spaces are significant. See the AUTHOR record for full details.
  • The authorList field of continuation sub-records in JRNL differs from that in AUTHOR by leaving no leading blank in column 20 of any continuation lines.
  • One author's name, consisting of the initials and family name, cannot be split across two lines. If there are continuation sub-records, then all but the last sub-record must end in a comma.
    COLUMNS        DATA TYPE       FIELD          DEFINITION
    -------------------------------------------------------------------------------
     1 -  6        Record name     "JRNL  "
    13 - 16        LString(4)      "AUTH"         Appears on all continuation records
    17 - 18        Continuation    continuation   Allows a long list of authors.
    20 - 70        List            authorList     List of the authors.
    

    2. TITL

  • TITL specifies the title of the reference. This is used for the title of a journal article, chapter, or part of a book. The TITL line is omitted if the author(s) listed in authorList wrote the entire book (or other work) listed in REF and no section of the book is being cited.
  • If an article is in a language other than English and is printed with an alternate title in English, the English language title is given, followed by a space and then the name of the language (in its English form, in square brackets) in which the article is written.
  • If the title of an article is in a non-Roman alphabet the title is transliterated.
  • The actual title cited is reconstructed in a manner identical to other continued records, i.e., trailing blanks are discarded and the continuation line is concatenated with a space inserted.
  • A line cannot end with a hyphen. A compound term (two elements connected by a hyphen) or chemical names which include a hyphen must appear on a single line, unless they are too long to fit on one line, in which case the split is made at a normally-occurring hyphen. An individual word cannot be hyphenated at the end of a line and put on two lines. An exception is when there is a repeating compound term where the second element is omitted, e.g., "DOUBLE- AND TRIPLE-RESONANCE". In such a case the non-completed word "DOUBLE-" could end a line and not alter reconstruction of the title.
    COLUMNS        DATA TYPE       FIELD          DEFINITION
    -------------------------------------------------------------------------------
     1 -  6        Record name     "JRNL  "
    13 - 16        LString(4)      "TITL"         Appears on all continuation records
    17 - 18        Continuation    continuation   Permits long titles.
    20 - 70        LString         title          Title of the article.
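
    The title reconstruction rule (discard trailing blanks, then join continuation lines with a single space) can be sketched as follows. This is a hedged Python illustration for JRNL TITL lines laid out as in the Example for this record; the function name reconstruct_title is an assumption.

```python
def reconstruct_title(lines):
    """Rebuild the cited title from JRNL TITL lines given in file order."""
    parts = []
    for line in lines:
        if line[12:16] != "TITL":    # columns 13-16 hold the sub-record name
            continue
        text = line[19:70].rstrip()  # columns 20-70, trailing blanks discarded
        if text:
            parts.append(text)
    # Continuation lines are concatenated with a single space inserted.
    return " ".join(parts)
```

    Because a space is always inserted, a line that ends in an uncompleted compound such as "DOUBLE-" still reconstructs as intended ("DOUBLE- AND TRIPLE-RESONANCE").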
    

    3. EDIT

  • EDIT appears if editors are associated with a non-journal reference. The editor list is formatted and concatenated in the same way that author lists are.
    COLUMNS        DATA TYPE       FIELD          DEFINITION
    -------------------------------------------------------------------------------
     1 -  6        Record name     "JRNL  "
    13 - 16        LString(4)      "EDIT"         Appears on all continuation records
    17 - 18        Continuation    continuation   Allows a long list of editors.
    20 - 70        List            editorList     List of the editors.
    

    4. REF

  • REF is a group of fields that contain either the publication status or the name of the publication (and any supplement and/or report information), volume, page, and year. There are two forms of this sub-record group, depending upon the citation's publication status.

    4a. If the reference has not been published yet, the sub-record type group has the form:

    COLUMNS         DATA TYPE            FIELD                   DEFINITION
    -------------------------------------------------------------------------
     1 -  6         Record name          "JRNL  "
    13 - 16         LString(3)           "REF"
    20 - 34         LString(15)          "TO BE PUBLISHED"
    

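    4b. If the reference has been published, the sub-record type group gives the publication name, volume, page, and year in the fields described below.

    Publication name (first item in pubName field):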

  • If the publication is a serial (i.e., a journal, an annual, or other non-book or non-monographic item issued in parts and intended to be continued indefinitely), use the abbreviated name of the publication as listed in PubMed and with periods.
  • If the publication is a book, monograph, or other non-serial item, use its full name according to the Anglo-American Cataloging Rules, 2nd Ed., 1988 revision (AACR2R). (Non-serial items include theses, videos, computer programs, and anything that is complete in one or a finite number of parts.) If there is a sub-title, and the item is verified in an online catalog, it will be included using the same punctuation as in the source of verification. Preference will be given to verification using cataloging of the Library of Congress, the National Library of Medicine, and the British Library, in that order.
  • If a book is part of a monographic series: the full name of the book (according to AACR2R) is listed first, followed by the name of the series in which it was published. The series information is given within parentheses and the series name is preceded by "IN:" and a space. If the series has an A.C.S. abbreviation, that abbreviation should be used; otherwise the series name should be listed in full. If applicable, the series name should be followed, after a comma and a space, by a volume (V.) and/or number (NO.) and/or part (PT.) indicator and the relevant characters to indicate its number and/or letter in the series.
    Supplement (follows publication name in pubName field):

  • If a reference is in a supplement to the volume listed, or if information about a "part" is needed to distinguish multiple parts with the same page numbering, such information should be put in the REF sub-record.
  • A supplement indication should follow the name of the publication and should be preceded by a comma and a space. Supplement should be abbreviated as "SUPPL." If there is a supplement number or letter, it should follow "SUPPL." without an intervening space. A part indication should also follow the name of the publication and be preceded by a comma and a space. A part should be abbreviated as "PT.", and the number or letter should follow without an intervening space.
  • If there is both a supplement and a part, their order should reflect the order printed on the work itself.
    Report (follows publication name and any supplement or part information in pubName field):

  • If a book has a report designation, the report information should follow the title and precede series information. The name and number of the report is given in parentheses, and the name is preceded by "REPORT:" and a space.
    Reconstruction of publication name:

  • The name of the publication is reconstructed by removing any trailing blanks in the pubName field and concatenating the pubName fields from all continuation lines with an intervening space. There are two conditions under which no intervening space is added between lines. The first is when the pubName field on a line ends with a hyphen (-). The second is when it ends with a period (.) and there are two or more periods throughout the pubName field, excluding any periods after the designations "SUPPL", "V", "NO", or "PT". If the line ends with a period that is the only period in the entire pubName field, a space is added.
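
    One possible reading of the publication-name rules is sketched below in Python. It is an illustration under stated assumptions rather than a reference implementation: the function name reconstruct_pub_name is hypothetical, "the entire pubName field" is taken to mean the pubName text of all lines together, and periods directly after SUPPL, V, NO, or PT are ignored when counting.

```python
import re

# Periods that follow these designations are not counted (see the rule above).
_EXCLUDED_PERIODS = re.compile(r"\b(?:SUPPL|V|NO|PT)\.")


def reconstruct_pub_name(fields):
    """Join the pubName fields (columns 20-47) of successive REF lines.

    fields: list of raw pubName strings, first sub-record first.
    """
    parts = [field.rstrip() for field in fields]  # drop trailing blanks
    parts = [part for part in parts if part]
    # Assumption: count periods over the whole name, not per line.
    periods = sum(_EXCLUDED_PERIODS.sub("", part).count(".") for part in parts)
    name = ""
    for index, part in enumerate(parts):
        name += part
        if index == len(parts) - 1:
            break
        if part.endswith("-"):
            continue  # line ends with a hyphen: no intervening space
        if part.endswith(".") and periods >= 2:
            continue  # abbreviated name with several periods: no space
        name += " "
    return name
```

    Under these assumptions, an abbreviated serial name split after a period is rejoined without a space, while ordinary words split across lines are joined with one.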
    Volume, page, and year (volume, page, year fields respectively):

  • The REF sub-record type group also contains information about volume, page, and year when applicable.
  • In the case of a monograph with multiple volumes which is also in a numbered series, the number in the volume field represents the number of the book, not the series. (The volume number of the series is in parentheses with the name of the series, as described above under publication name.)
    COLUMNS        DATA TYPE       FIELD          DEFINITION
    --------------------------------------------------------------------------------
     1 -  6        Record name     "JRNL  "
    13 - 16        LString(3)      "REF"
    17 - 18        Continuation    continuation   Allows long publication names.
    20 - 47        LString         pubName        Name of the publication including
                                                  section or series designation. This is
                                                  the only field of this sub-record which
                                                  may be continued on successive
                                                  sub-records.
    50 - 51        LString(2)      "V."           Appears in the first sub-record only,
                                                  and only if column 55 is non-blank.
    52 - 55        String          volume         Right-justified blank-filled volume
                                                  information; appears in the first
                                                  sub-record only.
    57 - 61        String          page           First page of the article; appears in the
                                                  first sub-record only.
    63 - 66        Integer         year           Year of publication; first sub-record
                                                  only.
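
    The fields of the first REF sub-record can be pulled out by column, as in the following minimal Python sketch. The function name parse_ref_first and the dictionary keys are assumptions for illustration, and continuation lines (which carry only pubName) are not handled here.

```python
def parse_ref_first(line):
    """Slice the first JRNL REF line using the 1-based columns given above."""
    line = line.rstrip("\n").ljust(70)  # pad so short lines can be sliced safely
    year = line[62:66].strip()          # columns 63-66
    return {
        # Form 4a: columns 20-34 hold the literal "TO BE PUBLISHED".
        "toBePublished": line[19:34] == "TO BE PUBLISHED",
        "pubName": line[19:47].rstrip(),        # columns 20-47
        "volume": line[51:55].strip() or None,  # columns 52-55, right-justified
        "page": line[56:61].strip() or None,    # columns 57-61
        "year": int(year) if year else None,
    }
```

    For the REF line in the Example for this record, this sketch would give pubName J.MOL.BIOL., volume 175, page 159, and year 1984.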
    

    5. PUBL

  • PUBL contains the name of the publisher and place of publication if the reference is to a book or other non-journal publication. If the non-journal has not yet been published or released, this sub-record is absent.
  • The place of publication is listed first, followed by a space, a colon, another space, and then the name of the publisher/issuer. This arrangement is based on the ISBD(M) International Standard Bibliographic Description for Monographic Publications (Rev.Ed., 1987) and AACR2R and is used in public online catalogs in libraries. Details on the contents of PUBL are given below.
    Place of publication:

  • Give the place of publication. If the name of the country, state, province, etc. is considered necessary to distinguish the place of publication from others of the same name, or for identification, then follow the city with a comma, a space, and the name of the larger geographic area.
  • If there is more than one place of publication, only the first listed will be used. If an online catalog record is used to verify the item, the first place listed there will be used, omitting any brackets. Preference will be given to the cataloging done by the Library of Congress, the National Library of Medicine, and the British Library, in that order.
    Publisher's name (or name of other issuing entity):

  • Give the name of the publisher in the shortest form in which it can be understood and identified internationally, according to AACR2R rule 1.4D.
  • If there is more than one publisher listed in the publication, only the first will be used in the PDB file. If an online catalog record is used to verify the item, the first publisher listed there will be used. Preference will be given to the cataloging of the Library of Congress, the National Library of Medicine, and the British Library, in that order.

    Ph.D. and other theses:

  • Theses are presented in the PUBL record if the degree has been granted and the thesis made available for public consultation by the degree-granting institution.
  • The name of the degree-granting institution (the issuing agency) is followed by a space and "(THESIS)".
    Reconstruction of place and publisher:

  • The PUBL sub-record type can be reconstructed by removing all trailing blanks in the pub field and concatenating all of the pub fields from the continuation lines with an intervening space.
  • Continued lines do not begin with a space.

    COLUMNS        DATA TYPE       FIELD          DEFINITION
    -------------------------------------------------------------------------------
     1 -  6        Record name     "JRNL  "
    13 - 16        LString(4)      "PUBL"
    17 - 18        Continuation    continuation   Allows long publisher and place names.
    20 - 70        LString         pub            City of publication and name of the
                                                  publisher/institution.
    

    6. REFN

  • REFN is a group of fields that contain encoded references to the citation. No continuation lines are possible. Each piece of coded information has a designated field.
  • The country field is blank if the reference was published in more than one country.
  • If more than one ISBN is known, select one that matches the individual volume cited (if it happens to be in a set that also has an ISBN for the set). If the reason for multiple ISBNs is that the publication is issued in more than one country, use the ISBN for the country of the first listed place of publication. If there are hardcover and paperback ISBN numbers, use the ISBN for the hardbound version.
  • There are two forms of this sub-record type group, depending upon the publication status.
    6a. This form of the REFN sub-record type group is used if the citation has not been published.

    COLUMNS     DATA TYPE            FIELD
    -------------------------------------------
     1 -  6     Record name          "JRNL  "
    13 - 16     LString(4)           "REFN"
    

    6b. This form of the REFN sub-record type group is used if the citation has been published.

    COLUMNS        DATA TYPE           FIELD               DEFINITION
    -----------------------------------------------------------------
     1 -  6        Record name         "JRNL  "
    13 - 16        LString(4)          "REFN"
    20 - 23        LString(4)          "ASTM"
    25 - 30        LString(6)          astm          ASTM devised coden.
    33 - 34        LString(2)          country       Country of publication code as defined
                                                     in the OCLC/MARC cataloging format
                                                     (optional).
    36 - 39        LString(4)          "ISBN"        International Standard Book Number or
                                       "ISSN" or     International Standard Serial Number.
                                       "ESSN"
    41 - 65        LString             isbn          ISSN or ISBN number (the final character
                                                     may be a letter, and the number may
                                                     contain one or more dashes).
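
    Both REFN forms can be distinguished and read by column, as in this minimal Python sketch. The function name parse_refn and the dictionary keys are illustrative assumptions; no check of the ASTM coden or of the ISBN/ISSN number itself is attempted.

```python
def parse_refn(line):
    """Slice a JRNL REFN line; return None for the unpublished form (6a)."""
    line = line.rstrip("\n").ljust(70)  # pad so short lines can be sliced safely
    if not line[19:].strip():
        return None  # form 6a: nothing follows the "REFN" label
    return {
        "astm": line[24:30].strip(),        # columns 25-30, ASTM coden
        "country": line[32:34].strip(),     # columns 33-34, may be blank
        "numberType": line[35:39].strip(),  # columns 36-39: ISBN, ISSN or ESSN
        "number": line[40:65].strip(),      # columns 41-65
    }
```

    For the REFN line in the Example for this record, this sketch would report the coden JMOBAK, country UK, and ISSN 0022-2836.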
    

    Verification/Validation/Value Authority Control

    PDB verifies that this record is correctly formatted.

    Citations appearing in JRNL may not also appear in REMARK 1.

    Relationships to Other Record Types

    The publication cited as the JRNL record may not be repeated in REMARK 1.

    Example

             1         2         3         4         5         6         7
    1234567890123456789012345678901234567890123456789012345678901234567890
    JRNL        AUTH   G.FERMI,M.F.PERUTZ,B.SHAANAN,R.FOURME              
    JRNL        TITL   THE CRYSTAL STRUCTURE OF HUMAN DEOXYHAEMOGLOBIN AT 
    JRNL        TITL 2 1.74 A RESOLUTION                                  
    JRNL        REF    J.MOL.BIOL.                   V. 175   159 1984    
    JRNL        REFN   ASTM JMOBAK  UK ISSN 0022-2836                     
    

    Known Problems

  • Interchange of bibliographic information and linking with other databases is hampered by the lack of labels or specific locations for certain types of information or by more than one type of information being in a particular location. This is most likely to occur with books, series, and reports. Some of the points below provide details about the variations and/or blending of information.
  • Titles of the publications that require more than 28 characters on the REF line must be continued on subsequent lines. There is some awkwardness due to volume, page, and year appearing on the first REF line, thereby splitting up the title.
  • Information about a supplement and its number/letter is presented in the publication's title field (on the REF lines in columns 20 - 47).
  • When series information for a book is presented, it is added to the REF line. The number of REF lines can become large in some cases because of the 28-column limit for title information in REF.
  • There is often an ISBN for a book title and a separate ISSN for the series in which it was published. There is no way to present more than one of these.
  • Books that are issued in more than one series are not accommodated.
  • Many books are issued in more than one country. The publisher has a separate ISBN number in each country. There is no place to put any additional applicable ISBN numbers.
  • The country code prefix of the ISBN may not match the country of the place of publication that is listed on the PUBL line when a book is published in more than one country.
  • Pagination is limited to the beginning page.
  • There is no place for listing a reference's accession number in another database.

    © 2007 wwPDB