wwPDB: EM validation report user guide

last updated: 07 October 2020

The wwPDB/EMDataBank electron microscopy (EM)/electron crystallography (EC) validation reports are prepared according to the recommendations of the EM Validation Task Force (EM VTF; Henderson et al., 2012) and reuse common elements from X-ray reports (Read et al., 2011; Gore et al., 2012; Gore et al., 2017). The EM/EC reports summarise the quality of the structure, highlight specific concerns about the atomic model and report statistics about the reconstruction.

The title page shows some information about the entry deposition as well as the names and version numbers of the software tools and reference information used to produce the report. The title page also shows the type of the report (whether it is preliminary, confidential or for a publicly released PDB/EMDB entry) and its length. For more details, see the FAQ on report types.

1. Overall quality at a glance

This section provides a succinct "executive" summary of key quality indicators. If there should be serious issues with a structure, this would usually be evident from this summary.

The metrics shown in the "slider" graphic (see example below) compare several important global quality indicators for this structure with those of previously deposited PDB entries. The comparison is carried out by calculation of the percentile rank, i.e. the percentage of entries that are equal to or poorer than this structure in terms of a quality indicator. The global percentile ranks (black vertical boxes) are calculated with respect to all structures available in the PDB archive up to 27 December 2017. The EM model-specific percentile ranks (white vertical boxes) are calculated with respect to all EM and EC (combined) model entries in the PDB. In general, one would of course like all sliders to lie to the far right in the blue areas (especially for recently determined structures, and in particular the EM/EC model-specific sliders).
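As a minimal sketch of the percentile-rank idea described above (not the wwPDB implementation), the following Python function counts the archive entries whose value is equal to or poorer than that of the structure. The function name, the `lower_is_better` flag and the example values are hypothetical and only serve the illustration.

```python
def percentile_rank(value, archive_values, lower_is_better=True):
    """Percentage of archive entries that are equal to or poorer than `value`.

    `lower_is_better` is an assumption of this sketch: for metrics such as
    clashscore or outlier percentages, a larger number means a poorer entry.
    """
    if not archive_values:
        raise ValueError("archive_values must not be empty")
    if lower_is_better:
        equal_or_poorer = sum(1 for v in archive_values if v >= value)
    else:
        equal_or_poorer = sum(1 for v in archive_values if v <= value)
    return 100.0 * equal_or_poorer / len(archive_values)


# Hypothetical clashscores for a tiny "archive" of five entries.
archive_clashscores = [2.0, 5.0, 8.0, 12.0, 30.0]
rank = percentile_rank(8.0, archive_clashscores)
print(f"percentile rank: {rank:.1f}")
```

A higher percentile rank places the slider further to the right, because more of the archive is equal to or poorer than the entry.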

(image of sliders for EM structure)

Note that if you are not an expert you neither need to know what the various quality criteria measure nor whether the values for an entry are unusual or not. However, for increased understanding, below is a brief description of these key global quality indicators:

Clashscore	This score is derived from the number of pairs of atoms in the model that are unusually close to each other. It is calculated by MolProbity (Chen et al., 2010) and expressed as the number of such clashes per thousand atoms. Further information can be found in the Close contacts section of the report, as described below.
Ramachandran outliers	A residue is considered to be a Ramachandran plot outlier if the combination of its φ and ψ torsion angles is unusual, as assessed by MolProbity (Chen et al., 2010). The Ramachandran outlier score for an entry is calculated as the percentage of Ramachandran outliers with respect to the total number of residues in the entry for which the outlier assessment is available. Further information can be found in the Torsion angles, Protein backbone section of the report, as described below.
Sidechain outliers	Protein sidechains mostly adopt certain (combinations of) preferred torsion angle values (called rotamers or rotameric conformers), much like their backbone torsion angles (as assessed in the Ramachandran analysis). MolProbity considers the sidechain conformation of a residue to be an outlier if its set of torsion angles is not similar to any preferred combination. The sidechain outlier score is calculated as the percentage of residues with an unusual sidechain conformation with respect to the total number of residues for which the assessment is available.
RNA backbone	Like the protein backbone and sidechains, the RNA backbone also adopts certain sets of preferred torsion angle values. Based on statistical analysis of RNA chains in the PDB, MolProbity (Chen et al., 2010) assigns a score per nucleotide for the quality of its backbone. This metric is calculated as the average score of all nucleotides in the entry.

For more information about validation metrics, see Henderson et al. (2012) and the review by Kleywegt (2000).

The slider graph is followed by a table that shows the number of entries upon which the percentile rank calculations are based:

(image of metrics table for EM)

The next table provides a graphical summary of the quality of all polymeric chains:

(image of quality of chain table, EM)

There may be green, yellow, orange and red portions in the bar for each chain, indicating the fraction of residues that contain outliers for 0, 1, 2, ≥3 model-only validation criteria, respectively. A grey segment indicates residues present in the sample but not modelled in the final structure. The numeric value for each fraction is shown below the corresponding segment. Values <5% are indicated with a dot. If residue inclusion outliers are present, there is an additional red bar above the lower bar, indicating the fraction of residues that are residue inclusion outliers. The numeric value for the fraction is indicated above this bar.
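The colour assignment described above can be sketched as follows. This is an illustration only: the function name is hypothetical, and the sketch assumes that fractions are taken over all residues in the chain, including unmodelled ones.

```python
from collections import Counter


def chain_bar_segments(outlier_counts, n_unmodelled=0):
    """Return {colour: (percentage, label)} for one chain.

    `outlier_counts` holds, for each modelled residue, the number of
    model-only validation criteria with an outlier. Assumption of this
    sketch: percentages are relative to modelled plus unmodelled residues.
    """
    colours = ["green", "yellow", "orange", "red"]
    total = len(outlier_counts) + n_unmodelled
    if total == 0:
        raise ValueError("the chain has no residues")
    # 0, 1, 2 criteria map to green, yellow, orange; 3 or more map to red.
    tally = Counter(colours[min(n, 3)] for n in outlier_counts)
    tally["grey"] = n_unmodelled
    segments = {}
    for colour in colours + ["grey"]:
        pct = 100.0 * tally[colour] / total
        if pct == 0:
            continue
        # Values below 5% are shown as a dot rather than a number.
        label = "." if pct < 5 else f"{pct:.0f}%"
        segments[colour] = (pct, label)
    return segments


# Hypothetical chain: 97 modelled residues and 3 unmodelled ones.
counts = [0] * 80 + [1] * 10 + [2] * 4 + [3] * 2 + [5] * 1
for colour, (pct, label) in chain_bar_segments(counts, n_unmodelled=3).items():
    print(colour, label)
```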

The Quality of chain chart shows the fraction of residues in each chain that are flagged as unusual according to the validation criteria used rather than where in the sequence this occurs (the plots are a kind of horizontal pie chart). The following section Residue-property plots provides a graphic showing where in the sequence the issues occur.

2. Entry composition

This section summarises the number of unique molecules that are present in the entry, and how they have been modelled. Each unique molecule and its instances (chain id) are described in a table:

with the following columns:

Mol	The identifier of the molecule (for experts: this is the same as the "entity id" in the mmCIF file of the entry).
Chain	The instance identifier. If there is more than one model present in the entry, the chain is prefixed with a model number.
Residues	The number of residues in the molecule.
Atoms	This tabulates the counts of various element types in the molecule.
Trace	The number of residues in the molecule that have been modelled with a reduced set of atoms. Protein or nucleic acid chains may be modelled with only one or two atoms (e.g. Cα, Cβ, P, an atom in a sugar ring or nucleobase, etc.). Typically, such cases are observed when the experimental data is insufficient to confidently model all atoms.

In addition, each unique oligosaccharide molecule, if present, is represented with a 2D SNFG image (Tsuchiya et al., 2017).

A Ligand Of Interest (LOI) is a ligand that is a subject of the authors' research. Ligands that are flagged by the authors during deposition are labelled as LOI.

The Mol and Chain identifiers are also used in other tables in the report.

3. Residue-property plots

This section shows summary plots of quality information for protein, RNA and DNA molecules on a per-residue basis.

There are two graphics shown for each molecule. The first graphic is the same as that shown in section 1: the green, yellow, orange and red segments indicate the fraction of residues with 0, 1, 2 and 3 or more types of model-only quality criteria with outliers, respectively. The additional red segment above the summary graphic (if present) indicates the fraction of residues that have an unusual fit to the map (residue inclusion outliers).

The second graphic shows the sequence annotated by these criteria with outliers in model quality and unusual fit to the map (see example graphic below). The colour-coding described above is used here too. A red lozenge above a residue indicates a poor fit to the map (i.e., a residue inclusion outlier). Consecutive stretches of residues for which no outliers were detected at all are not shown individually, but indicated by a green connector. Residues absent from the final model are shown in grey.

(example of EM residue-property plot)

In general, the less red, orange, yellow and grey these plots contain, the better. It is important to realise that residues that are outliers on one or more model-validation criteria could be either errors in the model, or reflect genuine features of the structure. Careful analysis of the experimental data is typically required to make the distinction. Outlier residues that are important for structure or function (e.g., enzymatic residues, interface residues, ligand-binding residues) should be inspected extra carefully (and addressed in a manuscript describing the structure).

The types of model-only quality criteria included in this analysis, and the software used for their calculation are:

bond length and angle outliers (MolProbity, Chen et al., 2010)
chirality outliers (Validation-pack, Feng et al.)
planarity outliers (Validation-pack, Feng et al.)
too-close contacts (MolProbity, Chen et al., 2010 and Validation-pack, Feng et al.)
protein backbone (Ramachandran) outliers (MolProbity, Chen et al., 2010)
protein sidechain torsion angle outliers (MolProbity, Chen et al., 2010)
RNA backbone torsion angle outliers (MolProbity, Chen et al., 2010)
RNA sugar pucker outliers (MolProbity, Chen et al., 2010)

Details of the outliers found for a residue can be found further down the report, in the Model quality section.

4. Experimental information

This section reports experimental details about the structure determination.

(image of EM experimental details table)

5. Model quality

Quality statistics in this section are calculated using standard compilations of covalent geometry parameters (Engh & Huber, 2001; Parkinson et al., 1996), tools in MolProbity (Chen et al., 2010), Validation-pack (Feng et al.) and the wwPDB chemical component dictionary (CCD).

5.1. Standard geometry

This section describes the quality of the covalent geometry for protein, DNA and RNA molecules in terms of bond lengths, bond angles, chirality and planarity. There are two tables providing a per-molecule summary and four tables that provide information on (some of) the outliers for each criterion (if any; otherwise the table is omitted).

Summary table for bond lengths and angles

Expected bond length and bond angle values (and standard deviations) for standard amino acids and nucleotides are available in a wwPDB compilation (wwPDB, 2012). The MolProbity Dangle program calculates Z-scores of bond length and bond angle values for each residue in the molecule relative to the expected values. (A Z-score is generally defined as the difference between an observed value and an expected or average value, divided by the standard deviation of the latter.)
The root-mean-square value of the Z-scores (RMSZ) of bond lengths (or angles) is calculated for individual residues and then averaged for each chain and over the whole molecule. RMSZ scores are expected to lie between 0 and 1. Individual bond lengths or angles with a Z-score greater than 5 or less than -5 merit inspection.
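
The Z-score and RMSZ definitions above can be illustrated with a short Python sketch. It is not the Dangle implementation: the function names and the bond values are hypothetical, and the RMSZ here is taken over one flat list of Z-scores rather than per residue and then averaged.

```python
import math


def z_score(observed, expected, sigma):
    """(observed - expected) / standard deviation of the expected value."""
    return (observed - expected) / sigma


def rmsz(z_scores):
    """Root-mean-square of a list of Z-scores."""
    return math.sqrt(sum(z * z for z in z_scores) / len(z_scores))


# Hypothetical (observed, expected, sigma) bond lengths in angstroms.
bonds = [(1.53, 1.525, 0.021), (1.46, 1.458, 0.019), (1.40, 1.231, 0.020)]
zs = [z_score(obs, exp, sd) for obs, exp, sd in bonds]

print(f"RMSZ = {rmsz(zs):.2f}")
# Bonds with |Z| > 5 are the ones that merit inspection.
print("outliers:", [round(z, 1) for z in zs if abs(z) > 5])
```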

The bond/angle summary table:

(image of summary table for bond lengths and angles)

has the following columns:

Mol	The molecule identifier
Chain	The instance identifier.
Bond lengths	The RMSZ sub-column gives the root-mean-square Z-score of all bond lengths analyzed. The #|Z| > 5 sub-column provides the number of bond lengths that have a Z-score > 5 or < -5 in comparison to the total number of bonds analyzed (†).
Bond angles	The RMSZ sub-column gives the root-mean-square Z-score of all bond angles analyzed. The #|Z| > 5 sub-column provides the number of bond angles that have a Z-score > 5 or < -5 in comparison to the total number of angles analyzed (†).

† The percentage of outliers is listed in parentheses.

Summary table for chirality and planarity

Deviations from expected chirality and planarity in the model are calculated by Validation-pack (Feng et al.).
Chiral centres for all compounds occurring in the PDB are described in the chemical component dictionary. Chirality can be assessed in a number of ways, including calculation of the chiral volume, e.g. for the Cα of amino acids this is 2.6 or -2.6 Å³ for L or D configurations, respectively. If the sign of the computed volume is incorrect, the handedness is wrong. If the absolute volume is less than 0.7 Å³, the chiral centre has been modelled as a planar moiety, which is very likely to be erroneous. Chirality deviations are summarised per chain.
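
A chiral volume can be computed as a scalar triple product of the vectors from the chiral centre to three of its neighbours. The sketch below illustrates this under stated assumptions and is not the Validation-pack code: the function names are hypothetical, the neighbour ordering (which fixes the sign convention) is left to the caller, and the planar test is applied before the sign test because the sign of a near-zero volume is unreliable.

```python
def chiral_volume(centre, a, b, c):
    """Signed volume (a - centre) . ((b - centre) x (c - centre)), in Å³."""
    u = [a[i] - centre[i] for i in range(3)]
    v = [b[i] - centre[i] for i in range(3)]
    w = [c[i] - centre[i] for i in range(3)]
    cross = (
        v[1] * w[2] - v[2] * w[1],
        v[2] * w[0] - v[0] * w[2],
        v[0] * w[1] - v[1] * w[0],
    )
    return sum(u[i] * cross[i] for i in range(3))


def classify_chirality(volume, expected_sign, planar_cutoff=0.7):
    """Apply the two checks from the text: near-planar, then handedness."""
    if abs(volume) < planar_cutoff:
        return "planar"
    if (volume > 0) != (expected_sign > 0):
        return "wrong handedness"
    return "ok"


# Hypothetical coordinates (Å) for a centre and three neighbours.
centre = (0.0, 0.0, 0.0)
a, b, c = (1.5, 0.0, 0.0), (0.0, 1.5, 0.0), (0.0, 0.0, 1.5)
print(classify_chirality(chiral_volume(centre, a, b, c), expected_sign=+1))
# Swapping two neighbours inverts the sign of the volume.
print(classify_chirality(chiral_volume(centre, a, c, b), expected_sign=+1))
```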
Three kinds of potential planarity deviations are assessed:
- Sidechain: Certain groups of atoms in protein sidechains and nucleotide bases are expected to be in the same plane. An atom's deviation from planarity is calculated by fitting a plane through these atoms and then calculating the distance of each individual atom from the plane. Expected values of such distances have been pre-calculated from data analysis (wwPDB, 2012). If an atom is modelled more than six times farther from the plane than the pre-calculated value, the residue is flagged as having a sidechain planarity deviation.
- Peptide: A deviation is flagged if the omega torsion angle of a peptide group differs by more than 30° from the values expected for a proper cis or trans conformation (0° and 180°, respectively).
- Main chain: The N atom of an amino acid residue is expected to be in the same plane as the Cα, C, and O atoms of the previous residue. If it is out of plane by more than 10°, this is flagged as a planarity deviation.
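
The peptide check in the list above amounts to measuring how far omega lies from the nearer of 0° and 180°, taking the periodicity of angles into account. A minimal sketch (the function names are hypothetical):

```python
def angle_difference(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def peptide_planarity_deviation(omega, tolerance=30.0):
    """True if omega is more than `tolerance` degrees from both cis (0°) and trans (180°)."""
    deviation = min(angle_difference(omega, 0.0), angle_difference(omega, 180.0))
    return deviation > tolerance


# Hypothetical omega values in degrees.
for omega in (175.0, -178.0, 10.0, 140.0):
    print(omega, peptide_planarity_deviation(omega))
```

Working with the wrapped difference means that values such as -178° are correctly treated as being 2° away from trans rather than 358° away.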

Outlier listing detailed tables

Where outliers exist, up to five for each category are listed in a table in the Summary Report, whereas the Full Report will list all the outliers found. For bond lengths and bond angles, the worst outliers are reported.

All the different outlier tables have the following columns in common:

Mol	The molecule identifier.
Chain	The instance identifier
Res	The residue number. Where applicable, an insertion code and alternative conformation identifier are specified as well.
Type	The residue name.

The following columns are specific to the bond length and bond angle outlier tables:

Atoms	The names of the atoms involved in the bond or angle.
Z	The Z-score of the bond length or angle.
Observed	The observed value of the bond length or angle.
Ideal	The ideal value of the bond length or angle.

The following column is specific to the chirality outliers table:

Atom	The name of the atom that is assessed to have an unusual chirality (see above for details of chirality assessment).

The following column is specific to the planarity outliers table:

Group	The planarity deviation type, i.e. sidechain, main chain or peptide as described above.

5.2. Too-close contacts

This section provides details about too-close contacts between pairs of non-bonded atoms, where there is an unfavorable steric overlap of van der Waals shells (clashes).

All-atom contacts are calculated by the Reduce and Probe programs within MolProbity (Word et al., 1999; Chen et al., 2010). This method was developed to quantify the detailed non-covalent fit of atomic interactions within or between molecules (H-bonds, favorable van der Waals, and steric clashes). Since most such interactions involve H atoms on one or both sides, all hydrogens must be present or added (Reduce optimizes rotation of OH, SH, NH3, etc. within H-bond networks, but methyls stay staggered). At present, in order to ensure comparable scores between NMR and X-ray, hydrogen atoms are removed from the analysed structure, and replaced by a different set placed by Reduce in idealised and optimized nuclear-H positions. All-atom unfavorable overlaps ≥0.4Å are then identified as clashes, using van der Waals radii tuned for the nuclear H positions suitable for NMR (rather than the electron-cloud H positions suitable for X-ray). MolProbity then calculates an all-atom clashscore, which is defined as the number of clashes per 1000 atoms (including hydrogens). Percentile scores of the clashscore are also computed, to allow assessment of how the structure compares to the rest of the archive.
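
The clashscore definition above reduces to simple arithmetic once the overlaps are known. The following sketch is not MolProbity code; the function name and the example overlaps are hypothetical.

```python
CLASH_OVERLAP_CUTOFF = 0.4  # Å, the threshold stated in the text above


def clashscore(overlaps, n_atoms):
    """Number of clashes (overlap >= 0.4 Å) per 1000 atoms, hydrogens included."""
    n_clashes = sum(1 for overlap in overlaps if overlap >= CLASH_OVERLAP_CUTOFF)
    return 1000.0 * n_clashes / n_atoms


# Hypothetical overlaps (Å) for atom pairs in a model of 2000 atoms.
overlaps = [0.1, 0.4, 0.55, 0.39, 0.8]
print(clashscore(overlaps, n_atoms=2000))
```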

Clashes are summarised in a table, for example:

(image EM clash summary table)

The columns are labelled:

Mol	The molecule identifier
Chain	The instance identifier
Non-H	The number of non-hydrogen atoms modelled.
H(model)	The number of hydrogen atoms modelled.
H(added)	The number of hydrogen atoms added by MolProbity.
Clashes	The number of clashes in which the atoms in this instance of the molecule are involved.
Symm-clashes	Lists the symmetry-related clashes in a crystal structure. The values should always be zero for EM structures.

If there are clashes, a table with details is then given:

(image em table showing individual clashes)

The table has the following columns:

Atom-1	The molecule identifier, instance identifier, residue number, residue name and atom name for the first atom. Where applicable, the chain identifier is prefixed with the model number and an alternative conformation identifier is shown as a suffix to the atom name.
Atom-2	Identifies the second atom in the clash.
Interatomic distance	The distance between Atom-1 and Atom-2 in Å.
Clash overlap	The "magnitude" of the clash, as assessed by MolProbity. It is defined as the difference between the observed interatomic distance and the sum of the van der Waals radii of the atoms involved (Chen et al., 2010). The radii used are tuned for use with nuclear H positions suited for NMR (rather than the electron-cloud H positions used for X-ray).
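
As an illustration of the Clash overlap column, the sketch below computes the overlap for a few atom pairs and lists the worst ones first. The radii and distances are hypothetical placeholders, not the values used by MolProbity, and the sign convention (a positive number means the shells overlap) is an assumption of this sketch.

```python
def clash_overlap(distance, radius_1, radius_2):
    """Sum of the two van der Waals radii minus the interatomic distance (Å)."""
    return (radius_1 + radius_2) - distance


def worst_clashes(clashes, limit=5):
    """Sort (label, overlap) pairs by overlap, largest first, and keep `limit`."""
    return sorted(clashes, key=lambda item: item[1], reverse=True)[:limit]


# Hypothetical atom pairs: (label, distance, radius of atom 1, radius of atom 2).
pairs = [
    ("A:10:CA / A:45:O", 2.60, 1.70, 1.40),
    ("A:12:HB2 / B:7:HG", 1.50, 1.10, 1.10),
    ("B:3:N / B:90:CD", 2.85, 1.55, 1.70),
]
clashes = [(label, clash_overlap(d, r1, r2)) for label, d, r1, r2 in pairs]
for label, overlap in worst_clashes(clashes):
    print(f"{label}: {overlap:.2f} Å")
```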

In a Summary Report up to five of the worst clashes are listed in the table, whereas in a Full Report all the clashes are listed.

Please see the FAQs on: "Why are there clashes reported between hydrogen atoms that are not present in the deposited model?" and "What to do about reported clashes?"

5.3. Torsion angles

5.3.1. Protein backbone

This section is populated if there are protein molecules present in the entry. The conformation of a protein backbone can be described by a pair of torsion angles (phi, psi) per residue (the remaining torsion angle, omega, is usually 180°). Ramachandran plots show the combinations of phi-psi values in a structure and typically compare these to a distribution of commonly observed values in high-resolution crystal structures. MolProbity's Ramachandran plots are residue-type specific, derived from a high-quality subset of protein X-ray structures and divided into favoured, allowed and outlier regions. Favoured and allowed regions are defined to be the regions that include 98% and 99.95%, respectively, of the residues in the high-quality data (see Chen et al., 2010, for more details).

This section contains a summary of the analysis of the backbone torsion angles phi and psi by MolProbity.

The summary table contains the following columns:

Mol	The molecule identifier
Chain	The instance identifier
Analysed	The first number here is the number of residues in the chain for which MolProbity output is available. The second number is the total number of residues in the chain. Phi and psi angles cannot be analysed for terminal residues, non-standard residues or for residues with incompletely modelled main chain.
Favoured, Allowed, Outliers	The number (and percentage) of residues in the favoured, allowed and outlier regions respectively, of the residue-specific phi-psi plots.
Percentiles	The percentile score based on the percentage of Ramachandran outliers in the chain. These are given relative to the whole archive (first value) and relative to all EM and EC model entries (second value). The colours around the percentile values correspond to the slider positions in the Overall quality section of the report, as described above.

Where Ramachandran outliers exist, up to five randomly chosen outlier residues are listed in a table in the Summary Report, whereas the Full Report will list all the outliers found. It has the following columns:

Mol	The molecule identifier
Chain	The instance identifier
Res	The residue number
Type	The residue name

5.3.2. Protein sidechains

Protein sidechain conformation can be described by the chi torsion angles. Depending on residue type, these angles adopt certain preferred sets of values (also termed rotamers or rotameric conformers). Based on analysis of high-quality X-ray entries in the PDB, MolProbity assesses whether a sidechain is similar to one of the preferred sets of torsion angles, or is an outlier (see Chen et al., 2010, for more details). This section is based on MolProbity analysis of sidechains.

The summary table summarises the sidechain outliers and has the following columns:

Mol	The molecule identifier
Chain	The instance identifier
Analysed	The first number here is the number of residues in the chain which were analysed by MolProbity. The second number is the total number of residues in the chain. Chi torsion angles cannot be analysed for non-standard residues or for residues with incompletely modelled sidechains.
Rotameric, Outliers	The number (and percentage) of residues with favoured, and unusual chi torsion angles respectively.
Percentiles	The absolute and relative percentile scores based on the percentage of sidechain outliers in the chain. These are given relative to the whole archive (first value) and relative to all EM and EC model entries (second value). The colours around the percentile values correspond to the slider positions in the Overall quality section of the report, as described above.

Where outliers exist, up to five randomly chosen outliers are listed in a table in the Summary Report, whereas the Full Report will list all the outliers found.

It has the following columns:

Mol	The molecule identifier
Chain	The instance identifier
Res	The residue number
Type	The residue name

5.3.3. RNA

This section describes the quality of RNA chains using MolProbity’s analysis of ribose sugar puckers and rotameric nature of "suites" of backbone torsion angles (see Richardson et al., 2008, and Chen et al., 2010 for details). A suite consists of the torsion angles between the sugars in two RNA nucleotides and is identified by the 3' nucleotide.

The summary table summarises the geometrical quality of an RNA chain using the following columns:

Mol	The molecule identifier
Chain	The instance identifier
Analysed	The first number here is the number of backbone suites for which analysis was carried out, and the latter number is the total number of nucleotides. The former is a smaller number because a suite is not defined at 5'-end, or a suite might be incompletely modelled.
Backbone outliers	The percentage of nucleotide suites in the chain that MolProbity identified as outliers.
Pucker outliers	The percentage of nucleotides in the chain that MolProbity identified as sugar pucker outliers. These are nucleotides where the strong correlation between sugar pucker and distance between the glycosidic bond vector and the following phosphate is violated.
Suiteness	The overall suiteness parameter as defined by MolProbity.

Where backbone or pucker outliers exist, up to five randomly chosen outliers of each kind are listed in a table in the Summary Report, whereas the Full Report will list all the outliers found.

Both tables have the following columns:

Mol	The molecule identifier
Chain	The instance identifier
Res	The residue number.
Type	The residue name.

5.4 to 5.7. Non-standard residues in protein, DNA, RNA chains; Carbohydrates; Ligand geometry; Other polymers

These sections analyse the geometry of:

Non-standard amino acids within proteins and non-standard nucleotides within DNA or RNA
Carbohydrates
Ligands
Other polymers

Bond lengths, bond angles, acyclic torsions and isolated rings are assessed using the Mogul program (Bruno et al., 2004) by comparison with preferred molecular geometries derived from high-quality, small-molecule structures in the Cambridge Structural Database (CSD). Chirality is assessed by Validation-pack (Feng et al.).

There are two summary tables providing a per-molecule overview and detailed tables that provide information on (some of) the outliers for each criterion (if any; otherwise the table is omitted).

Summary table for bond lengths and angles

A Z-score is calculated for each bond length and bond angle in the molecule. (A Z-score is generally defined as the difference between an observed value and an expected or average value, divided by the standard deviation of the latter.) Individual bond lengths or angles with a Z-score less than -2 or greater than 2 merit inspection.

The root-mean-square value of the Z-scores (RMSZ) of bond lengths (or angles) is calculated for the whole molecule. RMSZ scores are expected to lie between 0 and 1. For low-resolution structures, geometry should be tightly restrained and small values are expected. For very high-resolution structures, values approaching 1 may be attained. Values greater than 1 indicate over-fitting of the data.

At least 20 examples were required for each bond length and bond angle to be assessed.

(image of Mogul summary table for a REA ligand)

The bond/angle summary table has the following columns:

Mol	The molecule identifier.
Type	The residue name.
Chain	The instance identifier.
Res	The residue number.
Link	The identifier(s) of the molecule(s) to which the residue is linked, e.g. by a covalent bond, salt bridge etc.
Bond lengths (or angles)	This column is subdivided into three sub-columns:
- Counts: three values are given: the number of bonds (or angles) analysed, the number of bonds (or angles) modelled in the residue and the number of bonds (or angles) defined in the PDB chemical component dictionary. The number analysed may be smaller than the number modelled owing to the absence of comparable fragments in the Cambridge Structural Database.
- RMSZ: the root-mean-square value of the Z-scores (RMSZ) of all bond lengths (or angles).
- #|Z| > 2: the number of bond lengths (or angles) that have a Z-score less than -2 or greater than 2, compared to the total number of bonds (or angles) that have sufficient matches in the CSD. The percentage of outliers within the molecule is listed in parentheses.

Summary table for chirality, torsions and rings

For acyclic torsion angles, Mogul provides the local density measure. This is the ratio of the number of incidences in the Cambridge Structural Database within 10 degrees of the torsion angle in question to the total number of incidences of the torsion angle in the Cambridge Structural Database. If this figure is less than 5%, the torsion angle is considered an outlier.
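
The local density measure can be sketched in a few lines of Python. This is an illustration of the definition above, not Mogul's implementation: the function names and reference torsions are hypothetical, and the minimum of 15 reference examples follows the requirement stated in this section of the guide.

```python
def local_density(torsion, reference_torsions, window=10.0):
    """Percentage of reference torsions within `window` degrees of `torsion`."""
    def angle_difference(a, b):
        d = abs(a - b) % 360.0
        return min(d, 360.0 - d)

    near = sum(1 for t in reference_torsions if angle_difference(torsion, t) <= window)
    return 100.0 * near / len(reference_torsions)


def torsion_is_outlier(torsion, reference_torsions, min_examples=15, cutoff_percent=5.0):
    """True/False for outlier status, or None if there are too few examples."""
    if len(reference_torsions) < min_examples:
        return None
    return local_density(torsion, reference_torsions) < cutoff_percent


# Hypothetical distribution of a torsion angle in small-molecule structures.
reference = [60.0] * 50 + [180.0] * 48 + [-60.0] * 2
print(local_density(-55.0, reference), torsion_is_outlier(-55.0, reference))
print(local_density(65.0, reference), torsion_is_outlier(65.0, reference))
```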

For isolated rings, Mogul compares the given ring with comparable rings in small-molecule structures in the Cambridge Structural Database and calculates an RMSD value based on corresponding constituent torsion angles for each comparable ring. The mean and minimum of these RMSDs both have to be above 60° for the ring to be flagged as an outlier.
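
The ring criterion can be illustrated as follows. The sketch assumes that the ring torsions of the query and of each comparable ring are already listed in corresponding order, which in practice is the hard part and is handled by Mogul; the function names and torsion values are hypothetical.

```python
import math


def torsion_rmsd(torsions_a, torsions_b):
    """RMSD (degrees) between two equally long lists of corresponding torsions."""
    if len(torsions_a) != len(torsions_b):
        raise ValueError("torsion lists must have the same length")
    total = 0.0
    for a, b in zip(torsions_a, torsions_b):
        d = abs(a - b) % 360.0
        d = min(d, 360.0 - d)  # wrap the difference into [0, 180]
        total += d * d
    return math.sqrt(total / len(torsions_a))


def ring_is_outlier(query, comparable_rings, cutoff=60.0):
    """Flag the ring only if both the mean and the minimum RMSD exceed the cutoff."""
    rmsds = [torsion_rmsd(query, ring) for ring in comparable_rings]
    mean_rmsd = sum(rmsds) / len(rmsds)
    return mean_rmsd > cutoff and min(rmsds) > cutoff


# Hypothetical six-membered ring torsions (degrees).
chair = [60.0, -60.0, 60.0, -60.0, 60.0, -60.0]
inverted = [-60.0, 60.0, -60.0, 60.0, -60.0, 60.0]
print(ring_is_outlier(chair, [inverted] * 15))
print(ring_is_outlier(chair, [inverted] * 14 + [chair]))
```

Requiring the minimum as well as the mean to exceed the cutoff means that a single close match among the comparable rings is enough to avoid a flag, which is consistent with the guide's description of the criteria as conservative.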

At least 15 examples were required for each torsion angle and ring to be assessed.

Note that the criteria used to flag a ring or torsion angle as an outlier are under development. The current criteria are very conservative. They will be refined following analysis of a large test set of ligands.

(image of Mogul chirality, torsions and rings summary table for a K21 ligand)

The chirality, torsion angles and rings summary table contains the following columns:

Mol	The molecule identifier.
Type	The residue name.
Chain	The instance identifier.
Res	The residue number.
Link	One or more molecule identifiers to which the residue is linked, e.g. by a covalent bond, salt bridge etc.
Chirals	This column lists: the number of chiral outliers in the chain, the number of chiral centers analysed, the number of these observed in coordinates and the number defined in the PDB chemical component dictionary.
Torsion	This column lists: the number of torsion angle outliers in the chain, the number of torsions analysed, the number of these observed in coordinates and the number defined in the PDB chemical component dictionary.
Rings	This column lists: the number of ring outliers in the chain, the number of rings analysed, the number of these observed in coordinates and the number defined in the PDB chemical component dictionary.

Information tables for bond length, bond angle, chirality, torsion angle and ring outliers

Where outliers exist, up to five for each category are listed in a table in the Summary report, while the Full report lists all of them. Bond length and bond angle outliers are sorted by the Z-score (worst first). Other outliers are selected randomly in the Summary report.

The outlier tables have the following columns in common:

Mol	The molecule identifier.
Type	The residue name.
Atom(s)	The names of the atoms involved in the bond, angle, torsion angle or ring, or the name of the chiral atom with the unusual deviation.
Chain	The instance identifier.
Res	The residue number.

The following columns are specific to the bond length and bond angle outliers tables:

Z	The difference between observed and ideal values in terms of standard deviations.
Observed	The observed value of the bond length or angle.
Ideal	The ideal value of the bond length or angle.

Two-dimensional graphical depictions (Smart and Bricogne, 2015) of the Mogul quality analysis of bond lengths, bond angles, torsion angles and ring geometry are provided for ligands that have been designated as a ligand of interest (LOI) by the depositor, regardless of the validation assessment, and for any ligands with a molecular weight greater than 250 Daltons that have outliers flagged in validation.

The color scheme reflects the validation result: green indicates commonly observed values, magenta indicates unusual values, and gray indicates that there was insufficient data to derive a validation score. Unusual values include model quality and electron density fit. For model quality, the following are considered unusual and colored in magenta: individual bond lengths or angles with a Z-score less than -2 or greater than 2, torsion angles with a Mogul local density measure of less than 5%, and rings with an RMSD above 60 degrees.

5.8. Polymer linkage issues

Any chain breaks are identified in this section.

6. Map visualisation

The map visualisation section contains visualisations of the map. These are intended to permit inspection of the internal detail of the map and identification of artifacts. If half maps were provided a raw map will be generated from them and displayed below the primary map. The items in this section are generated using the EMDB Validation-Analysis (To be published) which makes use of TEMPy (Farabella et al., 2015) for map analysis.

These artifacts can include, but are not limited to:

Streaking	May indicate insufficient representation in particular orientations
Mask artifacts	Can indicate whether masks were used and the types of masks applied during processing

6.1. Orthogonal projections

The images show the map projected in three orthogonal directions, in greyscale.

If half maps are provided, the resulting raw map will be shown projected in three orthogonal directions below.

(image of orthogonal projections)

6.2. Central slices

The images show the central slice of the map in three orthogonal projections.

If half maps are provided, the central slice of the resulting raw map will be shown in three orthogonal directions below.

(image of central slices)

6.3. Largest variance slices

The images show the largest variance slices of the map in three orthogonal directions. The index of the slice in the relevant axis is given below the image.

If half maps are provided, the largest variance slices of the resulting raw map will be shown in three orthogonal directions below. The index of the slice in the relevant axis is given below the image.

(image of largest variance slices)
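
One plausible reading of "largest variance slice" is the slice, along a given axis, whose voxel values have the greatest variance. The sketch below follows that reading and is not the EMDB Validation-Analysis code. It uses nested Python lists indexed as `volume[z][y][x]` to stay self-contained; real maps would normally be handled with an array library. The function names and the tiny volume are hypothetical.

```python
from statistics import pvariance


def slices_along(volume, axis):
    """Return the flattened voxel values of every slice along `axis` (0=z, 1=y, 2=x)."""
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    if axis == 0:
        return [[v for row in volume[z] for v in row] for z in range(nz)]
    if axis == 1:
        return [[volume[z][y][x] for z in range(nz) for x in range(nx)] for y in range(ny)]
    if axis == 2:
        return [[volume[z][y][x] for z in range(nz) for y in range(ny)] for x in range(nx)]
    raise ValueError("axis must be 0, 1 or 2")


def largest_variance_slice(volume, axis):
    """Index of the slice with the largest population variance along `axis`."""
    variances = [pvariance(values) for values in slices_along(volume, axis)]
    return max(range(len(variances)), key=variances.__getitem__)


# Hypothetical 3 x 2 x 2 volume: only the middle z-slice has any contrast.
volume = [
    [[0, 0], [0, 0]],
    [[0, 10], [0, 10]],
    [[1, 1], [1, 1]],
]
print(largest_variance_slice(volume, axis=0))
```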

6.4. Orthogonal standard-deviation projections (False-color)

The images show the map projected in three orthogonal directions in false colour.

If half maps are provided, the resulting raw map will also be shown projected in three orthogonal directions.

The greyscale images are matched to a colour using the GLOW colour lookup table (from Fiji). The minimum value of the projection is matched to green with RGB (0, 138, 0) and the maximum value is matched to blue with RGB (0, 0, 255). Intermediate values are matched to varying shades of orange from dark to light, where dark colours are low pixel values and light colours are high pixel values.

(image of orthogonal directions in false colour)

6.5. Orthogonal surface views

The images show the 3D surface of the map at the recommended contour level. These images, in conjunction with the slice images, may facilitate assessment of whether an appropriate contour level has been provided. If half maps are provided the 3D surface of the resulting rawmap will be shown below. The raw map’s contour level was selected so that its surface encloses the same volume as the primary map does at its recommended contour level.

These images are generated using ChimeraX (Goddard et al., 2018).

(image of orthogonal surface views)
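The volume-matching rule for the raw map contour level can be sketched as follows: count the voxels at or above the recommended level in the primary map, then choose the raw map level that the same number of voxels reach. This is a minimal sketch assuming both maps share the same grid and voxel size; the function name is hypothetical and the report software may handle ties or interpolation differently.

```python
import numpy as np

def matching_contour_level(primary, primary_level, raw):
    """Raw-map level that encloses as many voxels as the primary map does."""
    n_enclosed = int(np.count_nonzero(primary >= primary_level))
    if n_enclosed == 0:
        raise ValueError("primary contour level encloses no voxels")
    n_enclosed = min(n_enclosed, raw.size)
    # Sort all raw values in descending order; the n-th largest value
    # is the level that n voxels reach (ties may enclose a few more).
    sorted_desc = np.sort(raw, axis=None)[::-1]
    return float(sorted_desc[n_enclosed - 1])

# Hypothetical maps: the raw map is the primary map on a different scale.
primary = np.arange(27, dtype=float).reshape(3, 3, 3)
raw = primary * 2.0

raw_level = matching_contour_level(primary, primary_level=20.0, raw=raw)
n_primary = int(np.count_nonzero(primary >= 20.0))
n_raw = int(np.count_nonzero(raw >= raw_level))
```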

6.6. Mask visualisation

This section shows the 3D surface view of the primary map at 50% transparency in yellow overlaid with the specified mask at 0% transparency in blue. A mask typically either encompasses the whole structure and indicates the removal of noise from the peripheries of the map, or separates out a domain, a functional unit, a monomer or an area of interest from the larger structure. These images are generated using ChimeraX (Goddard et al., 2018).

(image of mask views)

7. Map analysis

The map analysis section contains statistical analysis of the EM volume. The information is given as a set of graphs.

7.1. Map value distribution

The map value distribution is plotted in 128 intervals along the x-axis. The y-axis is logarithmic. A spike at around 0 usually indicates that the volume has been masked.

(image of map value distribution graph)
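A histogram of this kind can be sketched with NumPy as below. The example assumes 128 equal-width bins between the minimum and maximum map value, and uses a hypothetical map in which half of the box has been masked to zero, to show where the spike comes from.

```python
import numpy as np

def map_value_histogram(volume, bins=128):
    """Count map values in `bins` equal-width intervals from min to max."""
    counts, edges = np.histogram(volume, bins=bins)
    return counts, edges

# Hypothetical map: Gaussian noise with half of the box masked to zero.
rng = np.random.default_rng(0)
volume = rng.normal(loc=0.5, scale=1.0, size=(16, 16, 16))
volume[:8] = 0.0

counts, edges = map_value_histogram(volume)
peak = int(np.argmax(counts))
# The masked voxels pile up in the single bin that contains zero.
peak_interval = (edges[peak], edges[peak + 1])
# For a logarithmic y-axis, empty bins must be handled explicitly.
log_counts = np.log10(np.maximum(counts, 1))
```

Because all masked voxels share exactly the same value, they fall into one bin and dominate the plot even on a logarithmic scale.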

7.2. Volume estimate by contour

The volume estimate graph shows how the enclosed volume varies with contour level. The specified contour level is shown as a vertical line and the intersection between the line and the curve gives the volume of the enclosed surface at the given threshold.
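A single point on such a curve can be estimated by counting the voxels at or above the contour level and multiplying by the volume of one voxel. The sketch below assumes cubic voxels with the size given in Å; the function name and the test map are hypothetical.

```python
import numpy as np

def enclosed_volume_nm3(volume, contour_level, voxel_size_angstrom):
    """Volume (nm^3) of all voxels at or above the contour level."""
    n_voxels = int(np.count_nonzero(volume >= contour_level))
    voxel_volume_nm3 = (voxel_size_angstrom / 10.0) ** 3  # 10 Angstrom = 1 nm
    return n_voxels * voxel_volume_nm3

# Hypothetical 10 x 10 x 10 map in which half of the voxels hold signal.
density = np.zeros((10, 10, 10))
density[:5] = 1.0

# 500 voxels of (0.2 nm)^3 each give about 4 nm^3.
volume_at_half = enclosed_volume_nm3(density, contour_level=0.5,
                                     voxel_size_angstrom=2.0)
# Evaluating the function over a range of levels traces out the curve.
levels = np.linspace(0.0, 1.0, 5)
curve = [enclosed_volume_nm3(density, lv, 2.0) for lv in levels]
```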

If the molecular weight of the sample is provided by the author, the volume corresponding to the molecular weight is also indicated as a horizontal line. Ideally the horizontal and vertical lines will intersect at a single point on the volume estimate curve.

Volume curve calculation: a density value of 1.5 g/cm³ has been used to provide a rough estimate of the molecular volume, based on the molecular weight. The density of a biological sample can vary to a large degree, from as low as 1.2 g/cm³ for some proteins to close to 2 g/cm³ for nucleic acids with CsCl salts. The unit for molecular weight is kDa, and for the volume, nm³.
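The conversion from molecular weight to volume is a unit calculation: 1 Da is about 1.6605 × 10⁻²⁴ g and 1 cm³ is 10²¹ nm³, so at 1.5 g/cm³ each kDa corresponds to roughly 1.11 nm³. The sketch below works this through; the function and constant names are illustrative only.

```python
DALTON_IN_GRAMS = 1.66053906660e-24  # mass of 1 Da in grams
NM3_PER_CM3 = 1e21                   # (1e7 nm per cm) cubed

def volume_from_weight_nm3(weight_kda, density_g_per_cm3=1.5):
    """Molecular volume in nm^3 for a weight in kDa at a given density."""
    mass_g = weight_kda * 1000.0 * DALTON_IN_GRAMS
    return mass_g / density_g_per_cm3 * NM3_PER_CM3

# A hypothetical 100 kDa sample at the three densities quoted in the text.
v_default = volume_from_weight_nm3(100.0)        # about 110.7 nm^3
v_light = volume_from_weight_nm3(100.0, 1.2)     # about 138.4 nm^3
v_heavy = volume_from_weight_nm3(100.0, 2.0)     # about 83.0 nm^3
```

The spread between the light and heavy cases shows why the horizontal line on the graph is only a rough guide.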

The volume estimate graph should be treated as experimental. Some reasons why the sample and map based weights may not agree are:

Molecular weight is given for a fraction of the sample, i.e. one repeating unit, when the sample includes many.
Weight is given for a larger unit than what is in the EM volume, e.g. a whole fiber.
The sample has a heavier or lighter density than average.
No correction is attempted for stained samples.
A contour level that does not correspond to the estimated volume was provided by the author.

(image of volume estimate graph)

7.3. Rotationally averaged power spectrum

The rotationally averaged power spectrum (RAPS) may provide insight into the data processing steps leading to the map, in terms of:

CTF correction
Temperature factor correction
Low and/or high-pass filtering
Masking artifacts

The RAPS plot is only generated for cubic volumes. If half maps are provided the RAPS for the raw map is also provided.

(image of rotationally averaged power spectrum graph)
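A rotationally averaged power spectrum can be sketched as the squared Fourier amplitude averaged over spherical shells of equal spatial frequency. The code below is a minimal version assuming a cubic volume, integer-width shells and no windowing or normalisation; the report software may differ in these details, and the function name is hypothetical.

```python
import numpy as np

def rotationally_averaged_power_spectrum(volume):
    """Mean power per Fourier shell for a cubic 3D volume."""
    if volume.ndim != 3 or len(set(volume.shape)) != 1:
        raise ValueError("a cubic 3D volume is required")
    n = volume.shape[0]
    power = np.abs(np.fft.fftn(volume)) ** 2
    # Integer frequency index of every Fourier voxel along each axis.
    freq = np.fft.fftfreq(n) * n
    kx, ky, kz = np.meshgrid(freq, freq, freq, indexing="ij")
    shell = np.rint(np.sqrt(kx**2 + ky**2 + kz**2)).astype(int)
    n_shells = n // 2 + 1          # keep shells up to the Nyquist frequency
    keep = shell < n_shells
    sums = np.bincount(shell[keep], weights=power[keep], minlength=n_shells)
    counts = np.bincount(shell[keep], minlength=n_shells)
    return sums / counts

# A single-voxel impulse has a flat spectrum, which makes a simple check.
impulse = np.zeros((8, 8, 8))
impulse[0, 0, 0] = 1.0
raps = rotationally_averaged_power_spectrum(impulse)
```

The cubic-volume check mirrors the restriction stated above: with unequal box dimensions the shells would no longer correspond to a single spatial frequency.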

8. FSC validation

Fourier-Shell Correlation (FSC) is the most commonly used method to estimate the resolution for single particle and subtomogram averaging methods. The shape of the curve depends on the imposed symmetry, the mask, and whether or not the two 3D reconstructions used were processed from a common reference. The author-reported resolution is drawn as a vertical black line. A curve is displayed for the half-bit criterion, in addition to lines showing the 0.143 gold-standard cut-off and the 0.5 cut-off.

8.1. FSC

Graph of the FSC curve(s). The graph displays the author-provided curve, if available, and the curve calculated from the half maps, if present.

(image of the FSC graph)
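For readers who want to see how an FSC curve is obtained from two half maps, the sketch below correlates their Fourier transforms shell by shell. It assumes two cubic maps of identical size with no masking applied, and uses integer-width shells; it is an illustration of the standard definition, not the code used to produce the report.

```python
import numpy as np

def fourier_shell_correlation(half1, half2):
    """FSC per integer-radius shell for two cubic maps of equal shape."""
    if half1.shape != half2.shape or len(set(half1.shape)) != 1:
        raise ValueError("two cubic maps of identical shape are required")
    n = half1.shape[0]
    f1, f2 = np.fft.fftn(half1), np.fft.fftn(half2)
    freq = np.fft.fftfreq(n) * n
    kx, ky, kz = np.meshgrid(freq, freq, freq, indexing="ij")
    shell = np.rint(np.sqrt(kx**2 + ky**2 + kz**2)).astype(int)
    n_shells = n // 2 + 1
    keep = shell < n_shells
    idx = shell[keep]
    # Numerator: real part of the cross term; denominator: shell powers.
    cross = np.bincount(idx, weights=(f1 * np.conj(f2)).real[keep],
                        minlength=n_shells)
    p1 = np.bincount(idx, weights=(np.abs(f1) ** 2)[keep], minlength=n_shells)
    p2 = np.bincount(idx, weights=(np.abs(f2) ** 2)[keep], minlength=n_shells)
    denom = np.sqrt(p1 * p2)
    return np.divide(cross, denom, out=np.zeros(n_shells), where=denom > 0)

# Identical maps correlate perfectly; an inverted copy anti-correlates.
rng = np.random.default_rng(0)
half = rng.normal(size=(8, 8, 8))
fsc_same = fourier_shell_correlation(half, half)
fsc_inverted = fourier_shell_correlation(half, -half)
```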

8.2. Resolution estimate

The table contains global resolution estimates for the map.

(image of resolution estimates table)
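A global resolution estimate is commonly read off an FSC curve as the point where the curve first drops below a chosen cut-off. The sketch below does this with linear interpolation between sampled points; the interpolation scheme, the function name and the example curve are assumptions made for illustration.

```python
def resolution_at_threshold(spatial_freqs, fsc, threshold):
    """Resolution (1 / frequency) where the FSC first falls below threshold.

    spatial_freqs: increasing spatial frequencies in 1/Angstrom.
    fsc: FSC values sampled at those frequencies.
    Returns None if the curve never crosses the threshold.
    """
    for i in range(1, len(fsc)):
        if fsc[i - 1] >= threshold > fsc[i]:
            # Linear interpolation between the two bracketing samples.
            frac = (fsc[i - 1] - threshold) / (fsc[i - 1] - fsc[i])
            freq = spatial_freqs[i - 1] + frac * (
                spatial_freqs[i] - spatial_freqs[i - 1])
            if freq > 0:
                return 1.0 / freq
    return None

# Hypothetical FSC curve sampled at five spatial frequencies.
freqs = [0.0, 0.1, 0.2, 0.3, 0.4]
curve = [1.0, 0.9, 0.6, 0.1, 0.0]

res_half = resolution_at_threshold(freqs, curve, 0.5)      # about 4.55 Angstrom
res_0143 = resolution_at_threshold(freqs, curve, 0.143)    # about 3.43 Angstrom
res_none = resolution_at_threshold(freqs, curve, -1.0)
```

The stricter 0.5 cut-off is crossed at a lower spatial frequency than 0.143, so it always yields a numerically larger (more conservative) resolution value for the same curve.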

9. Map-Model fit

This section contains information regarding the quality of fit between the map and the atomic structure. The items in this section are generated using the EMDB Validation-Analysis (to be published), which makes use of TEMPy (Farabella et al., 2015) for map analysis. Per-residue inclusion information can be found in section 3.

9.1. Map-model overlay

The images show the 3D surface view of the map at the recommended contour level at 50% transparency in yellow with a ribbon representation of the model coloured in blue. These images are generated using ChimeraX (Goddard et al., 2018).

(image of EM map-model overlay)

9.2. Q-score mapped to coordinate model

The images show Q-score for each residue mapped to the coordinate model. Q-score values are calculated for all entries above 1.25 Angstroms in resolution. We present high Q-score values in cyan, low values in red and negative values in magenta. The Q-score methodology implemented here comes from Pintilie et al., 2020.

Q-score correlates very highly with resolution as measured using the Fourier Shell Correlation of two independent half-maps at a correlation value of 0.143. With this in mind, it is worth noting that low Q-score values are expected at resolutions where atoms are no longer easily resolved. Since Q-score is calculated on an atom-by-atom basis (non-hydrogen atoms), this also applies to the Q-scores given to residues and chains. In some cases, negative Q-scores may be present in a structure. These describe a negative correlation between the map values around the atom being measured and a Gaussian representing a well-resolved atom at high resolution.

(image of residue Q-score mapped to coordinate model)

9.3. Atom inclusion mapped to coordinate model

The images show atom inclusion for each residue mapped to the coordinate model. High atom inclusion is shown in blue, whilst low atom inclusion is shown in red. The atom inclusion is calculated atom by atom (non-hydrogen atoms) and averaged to produce the residue atom inclusion.

(image of residue atom inclusion mapped to coordinate model)

9.4. Atom inclusion

The atom inclusion graph displays the fraction of atoms that are inside the surface at a given contour level. For all-atom models, two curves are drawn: one for backbone atoms in blue, and one for all non-hydrogen atoms in green. The backbone set includes atoms N, C, O, CA for amino acids; C3', C4', C5', O3', O5', P for nucleic acids; and the first atom of any ligand. The curve for all non-hydrogen atoms is most suitable in cases where the resolution is sufficiently high to resolve side chains.

(image of EM atom inclusion graph)
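The fraction plotted on this graph can be sketched as follows: look up the map value at each atom position and count the atoms whose value is at or above the contour level. The example assumes atom positions have already been converted to the indices of their nearest voxels; that conversion, and any interpolation the report software may use, is left out. The function name and data are hypothetical.

```python
import numpy as np

def atom_inclusion_fraction(volume, atom_voxels, contour_level):
    """Fraction of atoms whose nearest voxel is at or above the level.

    atom_voxels: (N, 3) integer array of voxel indices, one row per atom.
    """
    idx = np.asarray(atom_voxels, dtype=int)
    values = volume[idx[:, 0], idx[:, 1], idx[:, 2]]
    return float(np.mean(values >= contour_level))

# Hypothetical map and four atoms, two of which sit in density.
density = np.zeros((4, 4, 4))
density[1, 1, 1] = 2.0
density[2, 2, 2] = 0.5
atoms = [(1, 1, 1), (2, 2, 2), (3, 3, 3), (0, 0, 1)]

at_high_level = atom_inclusion_fraction(density, atoms, contour_level=1.0)
at_low_level = atom_inclusion_fraction(density, atoms, contour_level=0.4)
```

Evaluating the function over a range of contour levels, once for the backbone atoms and once for all non-hydrogen atoms, would give the two curves described above.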

9.5. Map-model fit summary

The table shows atom inclusion and Q-score per chain and for the whole structure. These are the average values of atom inclusion and Q-score calculated on an atom-by-atom basis (non-hydrogen atoms). All per-residue atom inclusion and Q-score information is also present in the relevant validation CIF and XML files.

(image of map-model fit summary table)
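Averaging on an atom-by-atom basis means that every atom carries equal weight, so a chain value is not simply the mean of its residue values. The sketch below illustrates this with a hypothetical list of per-atom records; the record layout and function name are assumptions made for the example, not the format of the validation files.

```python
from collections import defaultdict

def mean_by_group(atoms, key):
    """Average the per-atom 'value' within each group returned by `key`."""
    totals = defaultdict(lambda: [0.0, 0])
    for atom in atoms:
        group = key(atom)
        totals[group][0] += atom["value"]
        totals[group][1] += 1
    return {group: total / count for group, (total, count) in totals.items()}

# Hypothetical per-atom inclusion flags (1.0 = inside the contour).
atoms = [
    {"chain": "A", "residue": 1, "value": 1.0},
    {"chain": "A", "residue": 1, "value": 0.0},
    {"chain": "A", "residue": 2, "value": 1.0},
    {"chain": "B", "residue": 1, "value": 0.0},
]

per_residue = mean_by_group(atoms, key=lambda a: (a["chain"], a["residue"]))
per_chain = mean_by_group(atoms, key=lambda a: a["chain"])
overall = mean_by_group(atoms, key=lambda a: "all")["all"]
```

In this example chain A averages 2/3 over its three atoms, whereas the mean of its two residue values (0.5 and 1.0) would be 0.75.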

References

I. J. Bruno, J. C. Cole, M. Kessler, J. Luo, W. D. S. Motherwell, L. H. Purkis, B. R. Smith, R. Taylor, R. I. Cooper, S. E. Harris, and A. G. Orpen. Retrieval of crystallographically-derived molecular geometry information. J. Chem. Inf. Comput. Sci., 44:2133–2144, 2004.
V. B. Chen, W. B. Arendall III, J. J. Headd, D. A. Keedy, R. M. Immormino, G. J. Kapral, L. W. Murray, J. S. Richardson, and D. C. Richardson. MolProbity: all-atom structure validation for macromolecular crystallography. Acta Cryst., D66:12–21, 2010.
R. A. Engh and R. Huber. International Tables for Crystallography, Volume F. Crystallography of Biological Macromolecules., Chapter 18.3 Structure quality and target parameters, pages 382–392. Kluwer Academic Publishers, 2001.
Z. Feng. Validation-pack. https://sw-tools.pdb.org/
S. Gore, S. Velankar and G. J. Kleywegt. Implementing an X-ray validation pipeline for the Protein Data Bank. Acta Cryst., D68:478–483, 2012.
S. Gore, E. S. Garcia, P. M. S. Hendrickx, A. Gutmanas, J. D. Westbrook, H. W. Yang, Z. K. Feng, K. Baskaran, J. M. Berrisford, B. P. Hudson, Y. Ikegawa, N. Kobayashi, C. L. Lawson, S. Mading, L. Mak, A. Mukhopadhyay, T. J. Oldfield, A. Patwardhan, E. Peisach, G. Sahni, M. R. Sekharan, S. Sen, C. H. Shao, O. S. Smart, E. L. Ulrich, R. Yamashita, M. Quesada, J. Y. Young, H. Nakamura, J. L. Markley, H. M. Berman, S. K. Burley, S. Velankar, G. J. Kleywegt. Validation of Structures in the Protein Data Bank. Structure, 25:1916–1927, 2017.
R. Henderson, A. Sali, M. L. Baker, B. Carragher, B. Devkota, K. H. Downing, E. H. Egelman, Z. Feng, J. Frank, N. Grigorieff, W. Jiang, S. J. Ludtke, O. Medalia, P. A. Penczek, P. B. Rosenthal, M. G. Rossmann, M. F. Schmid, G. F. Schröder, A. C. Steven, D. L. Stokes, J. D. Westbrook, W. Wriggers, H. Yang, J. Young, H. M. Berman, W. Chiu, G. J. Kleywegt, C. L. Lawson. Outcome of the First Electron Microscopy Validation Task Force Meeting. Structure, 20:205–214, 2012.
Goddard TD, Huang CC, Meng EC, Pettersen EF, Couch GS, Morris JH, Ferrin TE. UCSF ChimeraX: Meeting modern challenges in visualization and analysis. Protein Sci., 27(1):14–25, 2018.
Farabella, I., Vasishtan, D., Joseph, A.P., Pandurangan, A.P., Sahota, H. & Topf, M. TEMPy: a Python library for assessment of three-dimensional electron microscopy density fits. J. Appl. Cryst., 48:1314–1323, 2015.
Pintilie, G., Zhang, K., Su, Z., Li, S., Schmid, M. F., & Chiu, W. Measurement of atom resolvability in cryo-EM maps with Q-scores. Nature Methods, 17(3):328–334, 2020.
G. N. Parkinson, J. Vojtechovsky, L. Clowney, A. T. Brünger, and H. M. Berman. New parameters for the refinement of nucleic acid containing structures. Acta Cryst., D52:57–64, 1996.
R. J. Read, P. D. Adams, W. B. Arendall III, A. T. Brunger, P. Emsley, R. P. Joosten, G. J. Kleywegt, E. B. Krissinel, T. Lütteke, Z. Otwinowski, A. Perrakis, J. S. Richardson, W. H. Sheffler, J. L. Smith, I. J. Tickle, G. Vriend, and P. H. Zwart. A new generation of crystallographic validation tools for the Protein Data Bank. Structure, 19:1395–1412, 2011.
J. S. Richardson, B. Schneider, L. W. Murray, G. J. Kapral, R. M. Immormino, J. J. Headd, D. C. Richardson, D. Ham, E. Hershkovits, L. D. Williams, K. S. Keating, A. M. Pyle, D. Micallef, J. Westbrook and H. M. Berman. RNA backbone: consensus all-angle conformers and modular string nomenclature (an RNA Ontology Consortium contribution). RNA, 14:465–481, 2008.
O. S. Smart, and G. Bricogne. Multifaceted Roles of Crystallography in Modern Drug Discovery (G. Scapin, D. Patel and E. Arnold eds.), Achieving High Quality Ligand Chemistry in Protein-Ligand Crystal Structures for Drug Design, pages 165–181. Springer Netherlands, Dordrecht, 2015. https://www.globalphasing.com/buster/wiki/index.cgi?BusterReport
S. Tsuchiya, N. P. Aoki, D. Shinmachi, M. Matsubara, I. Yamada, K. F. Aoki-Kinoshita and H. Narimatsu. Implementation of GlycanBuilder to draw a wide variety of ambiguous glycans. Carbohydrate Res., 445:104–116, 2017.
J. M. Word, S. C. Lovell, T. H. LaBean, H. C. Taylor, M. E. Zalis, B. K. Presley, J. S. Richardson, D. C. Richardson. Visualizing and quantifying molecular goodness-of-fit: small-probe contact dots with explicit hydrogen atoms. J. Mol. Biol., 285:1711–1733, 1999.
wwPDB. The standard geometry compilation used in wwPDB validation protocols, 2012.
