PDB Statistics: Growth in Number of Unique Protein Sequences in Released PDB Structures (Cumulative) at Identity 95%

This chart shows the annual and cumulative numbers of unique protein sequences in released PDB structures since the beginning of the PDB archive. It can be viewed at several levels of sequence identity. The cumulative bars show the growth in unique protein sequences (number of polymeric entities) over time. The yearly bars (dark blue) show how many new protein sequences were added in a given year.

Note: The total number of sequence clusters in the statistics table links to the sequence cluster group search result page. To balance performance, the numbers are calculated with a default precision threshold, so the statistics count may differ slightly from the actual non-redundant group search result when the result count approaches or exceeds 10,000. The group search result page provides the accurate count; the statistics page shows the trend.

Sequence cluster level: 95% sequence identity

Year  Number of New Protein Sequences  Total Number of Protein Sequences
1976  13                               13
1977  10                               23
1978  3                                26
1979  6                                32
1980  4                                36
1981  10                               46
1982  18                               64
1983  11                               75
1984  11                               86
1985  12                               98
1986  9                                107
1987  11                               118
1988  25                               143
1989  46                               189
1990  52                               241
1991  56                               297
1992  66                               363
1993  231                              594
1994  462                              1,056
1995  343                              1,399
1996  406                              1,805
1997  561                              2,366
1998  757                              3,123
1999  897                              4,020
2000  1,006                            5,026
2001  1,042                            6,068
2002  1,113                            7,181
2003  1,558                            8,739
2004  2,122                            10,861
2005  2,342                            13,203
2006  2,641                            15,844
2007  2,968                            18,812
2008  2,758                            21,570
2009  2,815                            24,385
2010  2,873                            27,258
2011  2,638                            29,896
2012  2,881                            32,777
2013  3,088                            35,865
2014  3,798                            39,663
2015  3,146                            42,809
2016  3,726                            46,535
2017  4,070                            50,605
2018  3,729                            54,334
2019  4,087                            58,421
2020  5,018                            63,439
2021  4,468                            67,907
2022  5,503                            73,410
2023  5,233                            78,643
2024  5,508                            84,151
2025  6,204                            90,355
2026  2,153                            92,508
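
The two count columns are related by a running sum: each year's total should equal the previous total plus that year's new sequences. The following is a minimal sketch in Python that checks this for the first five rows of the table (1976 to 1980). The names `rows` and `running_totals` are illustrative and are not part of any PDB tool or API.

```python
from itertools import accumulate

# (year, new sequences, reported cumulative total), copied from the
# first five rows of the table; extend the list to check more years.
rows = [
    (1976, 13, 13),
    (1977, 10, 23),
    (1978, 3, 26),
    (1979, 6, 32),
    (1980, 4, 36),
]


def running_totals(new_counts):
    """Return the cumulative sum of the yearly additions."""
    return list(accumulate(new_counts))


computed = running_totals([new for _, new, _ in rows])
reported = [total for _, _, total in rows]

# Collect every year whose reported total differs from the running sum.
mismatches = [
    (year, calc, rep)
    for (year, _, _), calc, rep in zip(rows, computed, reported)
    if calc != rep
]

print("computed totals:", computed)
print("mismatches:", mismatches)
```

An empty `mismatches` list means the reported totals agree with the yearly additions for the rows checked.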