Tip This FAQ last updated: 23 August 2016

1. Type of reports

1.1. What are the different types of validation reports?

our different types of validation reports are produced at different stages of the preparation, deposition, annotation and public release of a macromolecular structure.

Validation Server Preliminary Report

This kind of report is produced by the Validation Server whenever you want to validate your structure/data, at any time prior to deposition. See also FAQ: What is a "Preliminary" Validation Report?

On Deposition Preliminary Report

This report is produced during the initial deposition process using the OneDep deposition system http://deposit.wwpdb.org/deposition/. See also FAQ: What is a "Preliminary" Validation Report?

Confidential Report

This report is produced once at the annotation stage after a PDB ID has been issued for the structure. Its title page shows the PDB ID, Title and the date of deposition date. It includes a pink diagonal watermark CONFIDENTIAL VALIDATION REPORT on every page. This report The confidential validation reports are sent only to the depositors (who may choose to submit them with their manuscripts).

Report for a publicly released PDB Entry

Validation reports are currently available for all released PDB entries determined by X-ray crystallography, NMR or EM. See wwPDB Validation Reports help top page for details on how to obtain the reports. Reports for all PDB entries will be updated annually.

For each of the 4 types of reports two different lengths of report are available:

Summary

This is the short form where if outliers are found only the worst 5 of each category are listed. These are normally the worst 5 outliers.

Full

This is the longer form where every outlier found is listed.

Note That if there are no or fewer than 5 outliers found in every category checked the Summary and Full reports will be identical

1.2. What is a 'Preliminary' Validation Report?

Preliminary reports are produced by the Validation Server and when a structure is initially deposited using the OneDep deposition system http://deposit.wwpdb.org/deposition/.

Preliminary reports include a pink diagonal watermark PRELIMINARY VALIDATION REPORT on every page. The preliminary report is not proof of deposition and should not be be submitted to journals.

The reports are described as "preliminary" because there are currently limitations in the checks made: * The sequence of the structure is taken from the input coordinate file and no check is made that this matches the sequence of the macromolecule from the depositor and/or external databases. * Ligands are not matched against the chemical components definition (PDB CCD) for the ligand even when it is present in the PDB CCD. Because of this Mogul checks of the ligand geometry can sometimes give erroneous results because the chemistry of the ligand is derived solely from the input coordinates in the coordinate file. This is unreliable. We intend to improve treatment shortly.

1.3. The validation report only lists up to 5 outliers - what if I want to see more?

If the report you are looking at truncates the list of outliers then you are looking at a summary report. To see all the outliers found, then good back and download full report instead.

1.4. The validation report is too long with very long tables of outliers - how can I see a shorter version?

You are looking a full report that lists all outliers. If you look instead at the summary report only the worst five outliers will be listed for each category.

2. Validation reports and the deposition process

2.1. What information will be made available about a deposited structure before the coordinates and experimental data are released?

Similar to a typical manuscript submission process (where information is confidential until a paper is published), only limited information is made publicly available prior to release of a PDB entry (see also next FAQ about timings).

The confidential validation reports are sent only to the depositors (who may choose to submit them with their manuscripts). They include the results of geometry checks, structure factor validation, and ligand validation. Coordinates and experimental data are not included in the validation report.

Mandatory submission of wwPDB validation reports has not had an impact on the number of submissions to IUCr journals.

2.2. What is the timing on making information about a deposited structure publicly available? What is the timing for releasing the coordinates, etc.?

The validation reports are generated as part of the current wwPDB curation pipelines, and do not affect any timings. Please see the Release of PDB Entries section of the wwPDB Processing Procedures and Policies Document for more information about timings.

2.3. How much control do authors have over the release of information?

During deposition, validation reports are only provided to the depositors, and are not provided by the wwPDB to journals or other third parties. Once a structure is released a validation report will be made available for it like other PDB entries.

2.4. I have deposited a structure with the PDB but not received a report. Why not?

The likely reason is the Confidential Validation Report is only produced after the structure has been annotated by a biocurator and a PDB id has been issued. The annotation process takes time.

It should also be noted that validation reports are only provided for new depositions of X-ray crystal, NMR or 3DEM structures and subject to the successful completion of the underlying calculations. For instance, no reports are currently created for structures determined by neutron diffraction and occasionally a software problem may preclude generation of a report for a structure (we endeavour to fix any such problems as soon as possible, where needed in collaboration with the authors of the software).

3. The Validation Server

3.1. Can I get a validation report for my own intermediate or unpublished structure?

Validation reports can now be generated on demand by using the wwPDB Validation Server https://validate.wwpdb.org. This now works for structure produced for X-ray structures, 3D electron microscopy and NMR methods.

3.2. Where can I get more information on the wwPDB Validation Server?

3.3. Is the validation package available for me to assess my structures?

The wwPDB Validation Server https://validate.wwpdb.org allows the production of reports for your structures.

The validation package software has not been made available for distribution as it includes a lot of PDB-specific code that makes difficult to use elsewhere. Instead a publically accessible standalone validation server is provided to enable a user to produce validation reports prior to deposition. Most of the tools used by the validation server are available separately.

4. Validation reports for existing PDB structures

4.1. Can I get a validation report for an existing structure in the PDB?

In 2014, validation reports for all existing X-ray crystal structures in the PDB archive were made publicly available through the wwPDB ftp sites. Validation reports for a particular PDB entry are also available from the entry’s page at RCSB PDB, PDBe or PDBj. Validation reports for all Nuclear Magnetic Resonance (NMR) and 3D Cryo Electron Microscopy (3DEM) structures already represented in the global PDB archive were made publicly available in May 2016 (announcement). The wwPDB plans to recalculate statistical distributions as well as validation reports for all entries once a year. All these data will be made publicly available to encourage downstream use by software developers, bioinformaticians and other PDB users. In addition, wwPDB expects to reconvene Validation Task Forces once every 5 years or so to assess the validation pipelines, reports and protocols and to recommend any changes, updates or additions.

4.2. Do all entries have a validation report?

There are a few structures determined by X-ray crystallography, 3D Cryo Electron Microscopy or NMR methods for which the validation report PDF or multipercentile slider assessments are missing. In addition, there are also cases where one of the components of the validation report has not successfully run and so is not available in the report and sliders. Periodically as part of the development process each failure is examined and improvements made to resolve as many as possible.

4.3. The percentile ranks for an entry have changed since last year - why?

As the PDB archive continues to grow, we recalculate the statistics underlying the percentile ranks once a year. This process causes small changes in the percentile ranks of existing as new, normally “better” structures are added to the archive.

5. Validation report contents

5.1. What to do if you think a validation report for a structure gets things wrong.

Please let us know by emailing validation@mail.wwpdb.org providing as much detail as possible of the problem.

5.2. Why do REMARK 500 and the validation report differ?

Because different programs are used in preparing REMARK 500 and the validation report. In some cases the validation reports metrics are more up to date, for instance the Molprobity Ramachandran analysis is based on more data than the REMARK 500 analysis that uses the older Kleywegt and Jones (1996) study.

5.3. Why are there clashes reported between hydrogen atoms that are not present in the deposited model?

The MolProbity clashscore works by adding hydrogen atoms to the structure and then analyzing whether there are clashes. This is particularly useful in finding regions could be improved in structures refined without explicit or riding hydrogen atoms (and less so in structures where hydrogen atoms are considered in refinement).

5.4. What to do about reported clashes?

If the structure has a poor clash score then this could indicate that:

  • it is poorly built overall or

  • that there are regions that are poorly built or

  • that the refinement has not been allowed converge or

  • the relative weighting of the geometry versus X-ray term in refinement has gone wrong.

The validation report listing of clashes is not that useful. The Coot program has a useful feature "Validate" "Probe clashes" that uses (and requires) the MolProbity reduce and probe programs. This allows visualization of where in the structure the clashes arises and might point out where some rebuilding could be indicated (see http://kinemage.biochem.duke.edu/teaching/workshop/MolProbity/coot.html).

5.5. How can I view validation results in a molecular graphics program like Coot?

The most recent versions of the Coot program already has a facility to load the XML file produced as part of the validation process and provide a GUI that provides a more interactive way to click and be shown where outliers are found in a structure. Currently this facility is only available for released PDB entries loaded using the Coot menu option: File, Fetch PDB using Accession Code. We are working on a plugin to allow the procedure to work with validation XML files downloaded from the Validation Server or the OneDep deposition system.

6. Validation report contents: ligand geometry

6.1. How does PDB validate the geometry of ligand molecules?

The wwPDB validation report uses the CCDC Mogul program to validate the geometry of ligands against the Cambridge Structural Database (CSD) of small molecule organic structures. For each bond length, bond angle, torsion angle or ring in the ligands, Mogul identifies CSD structures that contain that feature and builds a distribution for the observed values. Engh & Huber showed how the CSD provides a good source for geometrical information for natural amino acids. Mogul facilitates a similar approach to be taken for ligands. For each bond, bond angle and torsion within a ligand Mogul identifies CSD small structures with a similar chemical environment and finds the distribution for the measure.

6.2. Why are there so many geometry outliers for my ligand?

There are a number of possible reasons. In some cases a poor method has been used to generate geometrical restraints for a ligand (see next FAQ ). An erroneous fit to electron density or refinement with problematic X-ray weights can also lead to outliers. Outliers can also be caused by the Validation Pipeline Software providing an incorrect description of the chemistry of the molecule to the Mogul tool. This sometimes happens in preliminary validation reports but is rare for final validation reports where the PDB chemical components definition for the ligand is used as a basis for the chemical description.

6.3. What ligand geometry standards do PDB use and why are they different from REFMAC/PHENIX/BUSTER?

The wwPDB validation report uses Mogul for ligand geometry reports (see above). Good ways to derive ligand restraints including information from small molecule structures are:

  • The ACEDRG program from CCP4 that uses an alternative small molecule structure open database COD.

  • Phenix Elbow (particularly if you have installed licensed Mogul program).

  • Global Phasing Grade program (provided you have installed licensed Mogul program).

  • The Global Phasing Grade Web Server http://grade.globalphasing.org/. This uses Mogul but does not require installation/licenses (but it should be used only for non-confidential ligands).

  • The CCP4 Pyrogen program (provided you have installed licensed Mogul program).

The use of any these programs should ensure that in the (new) validation report ligand geometry outliers should not arise from restraint issues.

6.4. Why are there inconsistencies in the ligand report between preliminary report generated at anonymous validation/deposition server and final report sent by PDB staff post annotation?

E.g., Outliers for ligands are not picked up at anonymous validation or deposition server, but were picked up during PDB processing.

In order for Mogul to produce reliable results it is essential for the program to be provided with the correct bond configuration of the ligand. For the final report the PDB chemical components dictionary is used in which bond orders are defined.

Currently the preliminary validation report uses the CCDC program Gold_utils to find the connectivity and assign connectivity and bond orders from the user provided coordinates alone. This procedure normally works well but can go wrong particularly when provided with distorted ligand geometry. We are currently working on a number of improvements:

  • To work with crystallographic software developers to include ligand chemical and restraint information in the mmCIF coordinate file used for deposition and validation. This will mean that reliable information will be available.

  • To provide user feedback as to the chemistry provided to Mogul so that assignment problems can be quickly diagnosed. This would ideally be provided as 2D chemical diagrams in the report.

  • If users provide a ligand with explicit hydrogen atoms the validator pipeline should use these in setting ligand bond orders. This should provide a potential work around solution that should work practically all the time.

6.5. Why are Mogul statistics not made available for me to analyze?

XML file produced with the validation report includes additional Mogul information on bond and angle outliers. It would be necessary to obtain CCDC permission of the release of the full Mogul output for a ligand (because of data mining issues).

7. Validation report contents: X-ray specific

7.1. How does PDB calculate RSRZ?

RSRZ are calculated by the EDS (Electron-Density Server) component of the validation pipeline which is a re-implementation of the software used by the Uppsala EDS server (Kleywegt et al., 2004). The process is done by:

  • Using the REFMAC program to calculate electron density maps based on the uploaded model and structure factors.

  • The fit between the model and the 2Fo-Fc electron density map is found by calculating the real-space R-value (RSR). RSR is a measure of the quality of fit between a part of an atomic model (in this case, one residue) and the data in real space (Jones et al., 1991). RSR is calculated using USF MAPMAN software tools (for a description see Tickle (2012) ).

  • The RSR Z-score (RSRZ) is a normalisation of RSR specific to a residue type and a resolution bin (Kleywegt et al., 2004). This means that RSRZ provides a comparison to the typical fit of a particular residue type for PDB structures at that resolution. RSRZ is calculated only for standard amino acids and nucleotides in protein, DNA and RNA chains.

7.2. Why does the validation report list RSRZ outliers when my examination of the refined electron density map shows that these residues have a good fit to density?

Currently the validation report assesses the fit to electron density using a map calculated using REFMAC program as part of the EDS procedure (see previous FAQ). If the validation report lists residues as RSRZ outliers that in your examination of electron density maps have a good fit then this most likely indicates that the REFMAC did not calculate maps correctly. This happens occasionally and please accept our apologies for this. The procedure currently is known to have shown problems with:

  • Lowish resolution structures that have been refined using Phenix where hydrogen atoms are involved.

  • Low resolution anisotropic structures.

  • Structures refined with Phenix twin option.

We are working on alternative procedure to improve the reliability of the fit to map validation.

7.3. Why is EDS server not made available for me to re-calculate RSR?

The wwPDB Validation Service http://validate.wwpdb.org is a standalone server where the Validation Pipeline software can be run for any model or structures the user wants to examine. The same software is used as that used on deposition and this can be used to find RSR and RSRZ.

7.4. Why is LLDF so high while RSR value is reasonably low for my ligands?

LLDF compares the RSR value of the ligand with that of surrounding protein/nucleic acid residues. The comparison uses the standard deviation of surrounding residues:

LLDF = (RSR_ligand -<RSR_site>)/σ(RSR_site)

High LLDF is an indication that the fit to electron density for a ligand is significantly worse than that of the protein and nucleic acid residues. Poorly placed ligands normally have a high value of LLDF but the reverse is not necessarily true.

PDB entry 3n86 provides an example where well placed ligand can have high LLDF value. 3n86 is a well built 1.9Å resolution structure that has two homo 12-mer assemblies of the enzyme. This means there are 24 very similar copies of the RJP ligand bound in the enzyme active site. The electron density for each of the 24 copies of the ligand is convincing. Both RSR and RSCC values are indicative of a good fit and are similar for all the copies. However, LLDF values vary from 0.1 to 3.6 depending on where the ligand density is noisier than the surrounding residues or vice versa.

We are working better produces to validate ligands within structures, including better metrics for fit to electron density, guided by the First wwPDB/CCDC/D3R Ligand Validation Workshop (http://dx.doi.org/10.1016/j.str.2016.02.017).

7.5. Why are calculated R factors different from what author reported, especially for BUSTER?

The RCSB utility DCC is used to re-calculate R factors for an entry using the uploaded coordinates and structure factors. DCC uses either REFMAC, phenix-refine or CNS but currently does not support BUSTER. BUSTER R factors are not directly comparable to the other programs as it uses a different approach to the X-ray maximum likelihood calculation.