What's New

The December 10, 2014 release offers features related to large structure support.

Support for Integration of Large Structures with the Main PDB Archive

Large structures now represented in single files

With this week's update, large structures (containing more than 62 chains and/or more than 99,999 ATOM lines) are represented as single files and have been fully integrated into the main PDB FTP archive in both PDBx/mmCIF and PDBML formats. Previously, large structures were split across multiple "SPLIT" entries, which have now been obsoleted.
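The size thresholds above can be expressed as a simple check. The following is a minimal sketch, not code from the wwPDB or RCSB PDB; the function name `exceeds_pdb_limits` and its input (one chain ID per ATOM record) are assumptions made for illustration. The 62-chain ceiling corresponds to the single-character chain identifiers available in the PDB format (A-Z, a-z, 0-9), and the 99,999 ceiling to its five-column atom serial number field.

```python
import string
from typing import Iterable

# Single-character chain IDs usable in the PDB format: A-Z, a-z, 0-9.
PDB_CHAIN_IDS = string.ascii_uppercase + string.ascii_lowercase + string.digits
MAX_CHAINS = len(PDB_CHAIN_IDS)  # 62
MAX_ATOM_LINES = 99_999          # largest value that fits the 5-column serial field


def exceeds_pdb_limits(atom_chain_ids: Iterable[str]) -> bool:
    """Return True if a structure is too large for a single PDB-format file.

    `atom_chain_ids` (hypothetical input) yields the chain ID of every
    ATOM record in the structure, one item per record.
    """
    chains = set()
    n_atom_lines = 0
    for chain_id in atom_chain_ids:
        chains.add(chain_id)
        n_atom_lines += 1
    return len(chains) > MAX_CHAINS or n_atom_lines > MAX_ATOM_LINES
```

A structure that trips either condition would, under the policy described here, be distributed in PDBx/mmCIF and PDBML formats only.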

For large structures, a separate directory in the PDB FTP archive provides TAR files of "best-effort", minimal PDB-format files containing authorship, citation details, and coordinate data. Each TAR file also includes an index file that maps the chains in the large entry to the chains in the limited PDB-format files. DOIs for large structures will point to these TAR files.

Large structures will only be distributed in the main PDB FTP directory in PDBx/mmCIF and PDBML formats, including biological assembly files. Structures that do not exceed the limitations of the PDB format will continue to be provided as PDB files in the archive for the foreseeable future.

Detailed information is available at http://wwpdb.org.

The RCSB PDB website has been updated to support these new files. Users searching for ID codes of "SPLIT" entries will be automatically redirected to the combined entry. Download and Display options for coordinate files access the corresponding files in the main archive.

Download menu for a large Panicum Mosaic Virus (PDB ID 4V99). Note that the PDB File (Text) and PDB File (gz) are greyed out as they are not available. Instead, links are provided to the PDBx/mmCIF, PDBML/XML, and TAR file of PDB format-like files (tar.gz).

Visualization of Large Structures

A multi-scale rendering option has been implemented for the efficient display of large structures in Simple Viewer and Protein Workshop. These viewers are accessible from the Structure Summary page.

View links for Simple Viewer and Protein Workshop on Structure Summary page
(Example: PDB ID: 3IYV)

We thank Henry Truong (UCSD Computer Science) for working on this project as part of the UCSD 2014 STARS program (Summer Training Academy for Research in the Sciences).

Very large structures can be challenging for visualization programs. To improve loading time, each protein residue is rendered as a single sphere at its alpha-carbon position, using an average radius for its residue type. This results in a low-resolution surface as shown in the images below.
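The per-residue reduction can be sketched as follows. This is an illustrative approximation and not the code used in Simple Viewer or Protein Workshop: the `Atom` and `Sphere` types, the `coarse_grain` function, and the radius values are all hypothetical placeholders. The input is assumed to contain protein atoms only, so that the atom name "CA" always means an alpha carbon.

```python
from typing import Iterable, List, NamedTuple


class Atom(NamedTuple):
    chain_id: str
    res_seq: int
    res_name: str
    atom_name: str
    x: float
    y: float
    z: float


class Sphere(NamedTuple):
    x: float
    y: float
    z: float
    radius: float


# Placeholder radii in angstroms, chosen for illustration only;
# these are NOT the values used by the viewers.
RESIDUE_RADIUS = {"GLY": 2.0, "ALA": 2.5, "TRP": 3.5}
DEFAULT_RADIUS = 3.0  # fallback for residue types missing from the table


def coarse_grain(atoms: Iterable[Atom]) -> List[Sphere]:
    """Keep one sphere per residue, centered on its alpha carbon."""
    return [
        Sphere(a.x, a.y, a.z, RESIDUE_RADIUS.get(a.res_name, DEFAULT_RADIUS))
        for a in atoms
        if a.atom_name == "CA"
    ]
```

Because every residue collapses to one sphere, the number of objects to draw drops from the atom count to the residue count, which is where the reduction in loading time would come from in a scheme like this.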

Multi-scale renderings of a vault ribonucleoprotein PDB ID 2QZV (view in Protein Workshop).

Multi-scale renderings of coxsackievirus PDB ID 4Q4W (view in Simple Viewer).
Multi-scale renderings of clathrin PDB ID 3IYV (view in Protein Workshop).

For large protein-nucleic acid complexes, such as ribosomes, protein chains are rendered as low-resolution surfaces and nucleic acid chains as ribbons.
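One way to picture this per-chain choice is a small classifier that picks a representation from the residue names in a chain. The sketch below is an assumption-laden illustration, not the viewers' logic: the function name `representation_for_chain`, the set of nucleotide names, and the majority rule are all hypothetical.

```python
from typing import Iterable

# Standard RNA and DNA residue names; modified nucleotides are ignored here.
NUCLEOTIDES = {"A", "C", "G", "U", "DA", "DC", "DG", "DT"}


def representation_for_chain(residue_names: Iterable[str]) -> str:
    """Return "ribbon" for mostly-nucleic chains, otherwise "surface".

    The majority rule is an assumption made for this sketch.
    """
    names = list(residue_names)
    nucleic = sum(1 for name in names if name in NUCLEOTIDES)
    if names and nucleic > len(names) / 2:
        return "ribbon"
    return "surface"
```

For a ribosome, a classifier like this would send the rRNA chains to the ribbon renderer and the ribosomal proteins to the low-resolution surface renderer.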

Multi-scale rendering of 70S ribosome PDB ID 4V4G (view in Simple Viewer).

Queries for common large structures: Ribosomes and Viruses

Users can quickly find all ribosomes and viruses in the PDB using the simple search in the top bar. For example, entering "ribosome" in the text search box returns a "View Ribosomes" option in the search suggestions (see image below). Similarly, entering "virus" retrieves all virus structures.

A text search for the keyword "ribosome" presents the "Retrieve" feature for quickly finding all ribosome structures.
