PDB Statistics: Growth in the Number of Unique Protein Sequences in Released PDB Structures (Cumulative) at 30% Sequence Identity

This chart shows the annual and cumulative numbers of protein sequences in released PDB structures since the beginning of the PDB archive, and it can be viewed at several levels of sequence identity. The cumulative bars show the growth in the number of unique protein sequences (polymeric entities) over the archive's history. The yearly bars (dark blue) show how many new protein sequences were added in each year.
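
The relationship between the two series can be sketched as a running sum: each cumulative total is the previous year's total plus that year's new sequences. The minimal Python example below assumes exactly that relationship and uses the 1976 to 1980 values from the table on this page; it is an illustration, not the procedure RCSB uses to compute these statistics.

```python
from itertools import accumulate

# Yearly counts of new protein sequences for 1976 to 1980, taken from the table on this page.
years = [1976, 1977, 1978, 1979, 1980]
new_per_year = [12, 9, 3, 2, 3]

# Assumption: the cumulative total for a year is the sum of all yearly counts up to and including it.
cumulative = list(accumulate(new_per_year))

for year, new, total in zip(years, new_per_year, cumulative):
    print(f"{year}: +{new} -> {total}")
```

Running the sketch prints one line per year, ending with "1980: +3 -> 29", which matches the 1980 total in the table.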

Note: The total number of sequence clusters in the statistics table links to the corresponding sequence cluster group search results page. To balance performance against precision, these statistics are calculated with a default precision threshold, so the count shown here may differ slightly from the actual number of non-redundant groups returned by the group search when that number approaches or exceeds 10,000. Use the group search results page for an exact count and this statistics page for the overall trend.

New = Number of New Protein Sequences added that year; Total = Total Number of Protein Sequences (cumulative).

Year    New   Total
1976     12      12
1977      9      21
1978      3      24
1979      2      26
1980      3      29
1981      7      36
1982     14      50
1983      7      57
1984     11      68
1985     11      79
1986      8      87
1987      8      95
1988     18     113
1989     28     141
1990     33     174
1991     40     214
1992     52     266
1993    144     410
1994    296     706
1995    225     931
1996    264   1,195
1997    380   1,575
1998    461   2,036
1999    559   2,595
2000    635   3,230
2001    680   3,910
2002    692   4,602
2003    970   5,572
2004  1,378   6,950
2005  1,458   8,408
2006  1,598  10,006
2007  1,689  11,695
2008  1,559  13,254
2009  1,484  14,738
2010  1,423  16,161
2011  1,254  17,415
2012  1,350  18,765
2013  1,446  20,211
2014  1,674  21,885
2015  1,391  23,276
2016  1,593  24,869
2017  1,636  26,505
2018  1,634  28,139
2019  1,664  29,803
2020  2,058  31,861
2021  1,632  33,493
2022  2,043  35,536
2023  2,003  37,539
2024  2,030  39,569
2025  2,238  41,807
2026    675  42,482
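
Because every total should equal the previous total plus that year's new count, the table can also be checked mechanically. The Python sketch below transcribes the values shown above and flags any year where that relationship breaks. The helper names running_total_mismatches and peak_year are hypothetical, introduced only for this illustration, and are not part of any RCSB tool.

```python
# Values transcribed from the table above (30% sequence identity), 1976 to 2026.
YEARS = list(range(1976, 2027))
NEW = [
    12, 9, 3, 2, 3, 7, 14, 7, 11, 11, 8, 8, 18, 28, 33, 40, 52,
    144, 296, 225, 264, 380, 461, 559, 635, 680, 692, 970,
    1378, 1458, 1598, 1689, 1559, 1484, 1423, 1254, 1350, 1446, 1674,
    1391, 1593, 1636, 1634, 1664, 2058, 1632, 2043, 2003, 2030, 2238, 675,
]
TOTAL = [
    12, 21, 24, 26, 29, 36, 50, 57, 68, 79, 87, 95, 113, 141, 174, 214, 266,
    410, 706, 931, 1195, 1575, 2036, 2595, 3230, 3910, 4602, 5572,
    6950, 8408, 10006, 11695, 13254, 14738, 16161, 17415, 18765, 20211, 21885,
    23276, 24869, 26505, 28139, 29803, 31861, 33493, 35536, 37539, 39569, 41807, 42482,
]


def running_total_mismatches(years, new, total):
    """Return the years whose total is not the previous total plus that year's new count.

    Assumption (hypothetical helper): the first year starts from a previous total of 0.
    """
    mismatches = []
    previous = 0
    for year, added, cumulative in zip(years, new, total):
        if previous + added != cumulative:
            mismatches.append(year)
        previous = cumulative  # compare each row against the published previous total
    return mismatches


def peak_year(years, new):
    """Return (year, count) for the year with the most new protein sequences."""
    return max(zip(years, new), key=lambda pair: pair[1])


if __name__ == "__main__":
    print("mismatched years:", running_total_mismatches(YEARS, NEW, TOTAL))
    print("peak year:", peak_year(YEARS, NEW))
```

For the values above, the check finds no mismatched years, and the largest single-year addition is 2,238 new sequences in 2025. Comparing each row against the published previous total, rather than a recomputed sum, keeps one transcription error from being reported in every later year.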