PDB Versioned Archive
Since October 2017, the wwPDB has versioned PDB entries. Both the latest and the prior versions of each entry are distributed through a versioned FTP archive at ftp-versioned.wwpdb.org and its mirrors in the USA, UK and Japan.
PDB Versioned Repository
The PDB Versioned Repositories are updated every Wednesday at 00:00 UTC.
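If you schedule local syncs against these repositories, you may want to compute when the next weekly update lands. The following is a minimal standard-library sketch. It assumes the Wednesday 00:00 UTC schedule stated above and ignores any server-side delay. The function name `next_update` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Weekly update slot stated above: Wednesday 00:00 UTC.
# datetime.weekday() counts Monday as 0, so Wednesday is 2.
UPDATE_WEEKDAY = 2


def next_update(now: datetime) -> datetime:
    """Return the first Wednesday 00:00 UTC strictly after `now`.

    `now` should be timezone-aware; a naive datetime is interpreted
    as local time by astimezone().
    """
    now = now.astimezone(timezone.utc)
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    days_ahead = (UPDATE_WEEKDAY - midnight.weekday()) % 7
    candidate = midnight + timedelta(days=days_ahead)
    if candidate <= now:
        # Today is Wednesday and the update time has already passed.
        candidate += timedelta(days=7)
    return candidate


# Tuesday 2 January 2024, 15:00 UTC -> Wednesday 3 January 2024, 00:00 UTC
print(next_update(datetime(2024, 1, 2, 15, 0, tzinfo=timezone.utc)))
```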
wwPDB: files-versioned.wwpdb.org, rsync://rsync-versioned.wwpdb.org
RCSB PDB (USA): files-versioned.rcsb.org, rsync://rsync-versioned.rcsb.org
PDBe (UK): ftp.ebi.ac.uk/pub/databases/pdb_versioned/
PDBj (Japan): ftp://ftp-versioned.pdbj.org, https://files-versioned.pdbj.org, rsync://rsync-versioned.pdbj.org
What is PDB Entry Versioning
Changes made to a PDB entry after its initial release are classified as either “major” or “minor”. Updates to atomic coordinates, polymer sequence, or chemical description in the coordinate file trigger a major version increment; the entry keeps its originally issued PDB accession code. All other changes to the metadata in the coordinate file are considered minor. Currently, no changes are permitted to the experimental data from which the coordinates are derived. To track the changes between versions, a set of new revision categories was defined in the PDBx/mmCIF dictionary (http://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v50.dic/Groups/audit_group.html). The revision trail is included in the PDBx/mmCIF-formatted coordinate files.
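Because the revision trail travels inside the coordinate file, you can read an entry's current version directly from the mmCIF text. The following is a minimal sketch, not a full mmCIF parser. It assumes that the trail is recorded in the `pdbx_audit_revision_history` category (one of the audit categories in the dictionary linked above) with `major_revision`, `minor_revision` and `data_content_type` items. It also assumes the category is written as a `loop_` with one row per line. Check the dictionary for the authoritative item list. The function names and the sample excerpt are made up for illustration.

```python
import shlex


def revision_history(cif_text: str) -> list[dict[str, str]]:
    """Return the rows of the pdbx_audit_revision_history loop as dicts.

    Minimal sketch: assumes the category is written as a loop_ with one
    data row per line and no multi-line (semicolon-delimited) values.
    """
    prefix = "_pdbx_audit_revision_history."
    lines = cif_text.splitlines()
    for i, line in enumerate(lines):
        if line.strip() != "loop_":
            continue
        # Item names directly follow the loop_ keyword.
        names, j = [], i + 1
        while j < len(lines) and lines[j].strip().startswith(prefix):
            names.append(lines[j].strip()[len(prefix):])
            j += 1
        if not names:
            continue  # a loop belonging to some other category
        rows = []
        # Data rows continue until a blank line, comment, or new item/loop/block.
        while j < len(lines):
            row = lines[j].strip()
            if not row or row.startswith(("#", "_", "loop_", "data_")):
                break
            # shlex honours the single quotes around values like 'Structure model'.
            rows.append(dict(zip(names, shlex.split(row))))
            j += 1
        return rows
    return []


def latest_version(rows: list[dict[str, str]]) -> tuple[int, int]:
    """Highest (major, minor) pair among the 'Structure model' revisions."""
    return max(
        (int(r["major_revision"]), int(r["minor_revision"]))
        for r in rows
        if r.get("data_content_type", "Structure model") == "Structure model"
    )


# Made-up excerpt for illustration only; not taken from a real entry.
SAMPLE = """\
loop_
_pdbx_audit_revision_history.ordinal
_pdbx_audit_revision_history.data_content_type
_pdbx_audit_revision_history.major_revision
_pdbx_audit_revision_history.minor_revision
1 'Structure model' 1 0
2 'Structure model' 1 1
3 'Structure model' 2 0
#
"""

print(latest_version(revision_history(SAMPLE)))  # (2, 0)
```

For production use, a dedicated mmCIF reader that handles multi-line values and quoting rules in full is a safer choice than this line-based scan.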
Apart from the inclusion of the new revision audit categories, the conventional PDB archive (ftp.wwpdb.org) is unaffected: it continues to use the familiar naming style and contains only the latest version of each entry.
The versioned FTP archive contains all major versions of a PDB structure.
Extended PDB accession codes
To plan for future growth of the PDB archive and to conform more closely to the "Findability" principle of FAIR data management, we have extended PDB accession codes from the familiar four-character style to eight characters prefixed with “pdb_”. For example, the PDB accession code 1abc becomes pdb_00001abc. This new format of PDB accession codes will be included in the coordinate files at a later date. The versioned FTP tree uses the extended PDB accession codes in its file names.
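The mapping in the 1abc → pdb_00001abc example can be sketched as a small conversion function. This is a sketch under stated assumptions: it assumes lowercase output and left-padding with zeros to eight characters, as the single example suggests. It also assumes a classic ID is a digit followed by three alphanumerics. The function name `extend_pdb_id` is hypothetical.

```python
import re


def extend_pdb_id(code: str) -> str:
    """Map a four-character PDB ID such as '1abc' to 'pdb_00001abc'.

    Assumes lowercase output and zero padding to eight characters, as in
    the 1abc -> pdb_00001abc example; codes already in the extended form
    are returned unchanged (lowercased).
    """
    code = code.strip().lower()
    if re.fullmatch(r"pdb_[0-9a-z]{8}", code):
        return code
    if not re.fullmatch(r"[0-9][0-9a-z]{3}", code):
        raise ValueError(f"not a four-character PDB ID: {code!r}")
    return "pdb_" + code.rjust(8, "0")


print(extend_pdb_id("1ABC"))  # pdb_00001abc
```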
Directory structure of the versioned PDB FTP tree
Unlike the conventional FTP tree, the versioned tree stores all files for a particular entry in a single directory (e.g., "pdb_00001abc"). These directories are grouped under a two-character hash taken from the two characters immediately before the last character of the accession code (for the "pdb_00001abc" example, the hash is "ab"):
../pdb_versioned/data/entries/<two-letter-hash>/<pdb_accession_code>/<entry_data_file_names>
Thus, all files for entry pdb_00001abc would be stored in the following directory:
../pdb_versioned/data/entries/ab/pdb_00001abc/
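The directory rule above can be expressed as a short path-building helper. In this minimal sketch, `root` stands in for wherever `pdb_versioned` lives on your mirror or local copy; the function name and the default prefix are hypothetical.

```python
from pathlib import PurePosixPath


def entry_dir(accession: str, root: str = "pdb_versioned") -> PurePosixPath:
    """Directory that holds every file of an entry in the versioned tree."""
    accession = accession.lower()
    if not accession.startswith("pdb_") or len(accession) != 12:
        raise ValueError(f"expected an extended accession code, got {accession!r}")
    # The hash is the two characters just before the last one: 'ab' in 'pdb_00001abc'.
    two_letter_hash = accession[-3:-1]
    return PurePosixPath(root, "data", "entries", two_letter_hash, accession)


print(entry_dir("pdb_00001abc"))  # pdb_versioned/data/entries/ab/pdb_00001abc
```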
File names in the versioned PDB FTP tree
File names in the versioned FTP archive conform to a new naming scheme, which allows users to easily see the major and minor version numbers:
<PDB_ID>_<content_type>_v<major_version>-<minor_version>.<file_format_type>.<file_compression_type>
For example, the initial release of PDB entry 1abc would have the following name under the new file-naming scheme:
pdb_00001abc_xyz_v1-0.cif.gz
where "xyz" denotes coordinate content, "cif" indicates the file format (PDBx/mmCIF), and "gz" indicates gzip compression.
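Because every field sits at a fixed position in the pattern, a file name can be parsed into its parts and rebuilt. This minimal sketch assumes the names are lowercase and follow the template above exactly, with one format suffix and one compression suffix. Content types other than "xyz" are not documented here. The regular expression therefore only assumes they consist of lowercase letters, digits, underscores or hyphens. `VersionedName`, `parse_name` and `build_name` are hypothetical names.

```python
import re
from typing import NamedTuple


class VersionedName(NamedTuple):
    accession: str
    content_type: str
    major: int
    minor: int
    file_format: str
    compression: str


# <PDB_ID>_<content_type>_v<major>-<minor>.<file_format>.<compression>
NAME_RE = re.compile(
    r"(?P<accession>pdb_[0-9a-z]{8})_(?P<content_type>[0-9a-z_-]+)"
    r"_v(?P<major>\d+)-(?P<minor>\d+)"
    r"\.(?P<file_format>[0-9a-z]+)\.(?P<compression>[0-9a-z]+)"
)


def parse_name(name: str) -> VersionedName:
    """Split a versioned file name into its fields."""
    m = NAME_RE.fullmatch(name)
    if m is None:
        raise ValueError(f"not a versioned file name: {name!r}")
    g = m.groupdict()
    return VersionedName(
        g["accession"], g["content_type"], int(g["major"]), int(g["minor"]),
        g["file_format"], g["compression"],
    )


def build_name(v: VersionedName) -> str:
    """Inverse of parse_name."""
    return (f"{v.accession}_{v.content_type}_v{v.major}-{v.minor}"
            f".{v.file_format}.{v.compression}")


parsed = parse_name("pdb_00001abc_xyz_v1-0.cif.gz")
print(parsed.major, parsed.minor)  # 1 0
```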
The first minor revision (e.g., update to the citation metadata) of PDB entry 1abc would then have the following name:
pdb_00001abc_xyz_v1-1.cif.gz
If PDB entry 1abc then had a major update (e.g., re-refinement by the authors to better represent the ligand), it would have the following name:
pdb_00001abc_xyz_v2-0.cif.gz (N.B.: The minor update number will be reset to zero every time a new major update is made.)
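The numbering rule in these three examples can be replayed with a tiny function. This minimal sketch takes the change type ("major" or "minor") as given rather than deciding it from the file contents. The function name `next_version` is hypothetical.

```python
def next_version(major: int, minor: int, change: str) -> tuple[int, int]:
    """Apply the numbering rule: a major update resets the minor number to 0."""
    if change == "major":
        return major + 1, 0
    if change == "minor":
        return major, minor + 1
    raise ValueError(f"change must be 'major' or 'minor', not {change!r}")


# Replay the three examples above: initial release, a minor update, a major update.
versions = [(1, 0)]
for change in ("minor", "major"):
    versions.append(next_version(*versions[-1], change))

for major, minor in versions:
    print(f"pdb_00001abc_xyz_v{major}-{minor}.cif.gz")
```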
Multiple Views of the Versioned Repository
For the convenience of repository users, the repository also provides views organized by content type and file format.
To access the latest version of an entry's mmCIF coordinate file, omit the version numbers from the file name altogether. For example, for entry pdb_00001abc:
../pdb_versioned/views/latest/coordinates/mmcif/ab/pdb_00001abc/pdb_00001abc_xyz.cif.gz (N.B. the absence of version numbers in the file name)
→ ../pdb_versioned/data/entries/ab/pdb_00001abc/pdb_00001abc_xyz_v2-0.cif.gz (if version 2-0 were the latest)
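Because the hash and directory rules are the same in the views tree, the version-free path can be built mechanically. This minimal sketch covers only the `views/latest/coordinates/mmcif` view shown above. Directory names for other content types and formats are not given here, so it does not attempt them. The function name and the `root` default are hypothetical.

```python
from pathlib import PurePosixPath


def latest_view_path(accession: str, root: str = "pdb_versioned") -> PurePosixPath:
    """Version-free path of the latest mmCIF coordinate file for an entry."""
    accession = accession.lower()
    return PurePosixPath(
        root, "views", "latest", "coordinates", "mmcif",
        accession[-3:-1], accession, f"{accession}_xyz.cif.gz",
    )


print(latest_view_path("pdb_00001abc"))
```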
To access a specific major version of an entry's coordinate file, omit the minor version number from the file name; the file then resolves to the most recent minor revision of that major version. For example, for entry pdb_00001abc:
../pdb_versioned/views/all/coordinates/mmcif/ab/pdb_00001abc/pdb_00001abc_xyz_v1.cif.gz (N.B. the absence of the minor version number)
→ ../pdb_versioned/data/entries/ab/pdb_00001abc/pdb_00001abc_xyz_v1-2.cif.gz (if version 1 had two minor updates since it was released)
../pdb_versioned/views/all/coordinates/mmcif/ab/pdb_00001abc/pdb_00001abc_xyz_v2.cif.gz
→ ../pdb_versioned/data/entries/ab/pdb_00001abc/pdb_00001abc_xyz_v2-0.cif.gz (if version 2 had no minor updates since it was released)
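The resolution shown in these examples can be mimicked locally, for instance against a mirrored copy of the data tree. For a requested major version, the sketch builds the view path and picks the file with the highest minor number among the entry's files. This minimal sketch covers only mmCIF coordinate files ("xyz", ".cif.gz"). The directory listing is a hypothetical example, and `major_view_path` and `resolve_major` are illustrative names, not part of any wwPDB tool.

```python
import re
from pathlib import PurePosixPath


def major_view_path(accession: str, major: int, root: str = "pdb_versioned") -> PurePosixPath:
    """Path in the 'all' view that names a major version but no minor version."""
    accession = accession.lower()
    return PurePosixPath(
        root, "views", "all", "coordinates", "mmcif",
        accession[-3:-1], accession, f"{accession}_xyz_v{major}.cif.gz",
    )


def resolve_major(file_names: list[str], accession: str, major: int) -> str:
    """Pick the data file for `major` with the highest minor version number."""
    # The literal '-' after the major number keeps v1 from matching v10.
    pattern = re.compile(rf"{re.escape(accession)}_xyz_v{major}-(\d+)\.cif\.gz")
    candidates = [
        (int(m.group(1)), name)
        for name in file_names
        if (m := pattern.fullmatch(name))
    ]
    if not candidates:
        raise LookupError(f"no version {major} coordinate file for {accession}")
    return max(candidates)[1]


# Hypothetical listing of an entry directory in the data tree.
listing = ["pdb_00001abc_xyz_v1-2.cif.gz", "pdb_00001abc_xyz_v2-0.cif.gz"]
print(major_view_path("pdb_00001abc", 1))
print(resolve_major(listing, "pdb_00001abc", 1))  # pdb_00001abc_xyz_v1-2.cif.gz
```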