PDB Statistics: Growth in Number of Unique Protein Sequences in Released PDB Structures (Cumulative) at Identity 30%

This chart shows the annual and cumulative numbers of protein sequences in released PDB structures, from the beginning of the PDB archive onward. The chart can be viewed at several levels of sequence identity. The cumulative bars show the growth in unique protein sequences (number of polymeric entities) over time. The yearly bars (dark blue) show how many new protein sequences were added in a given year.
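As a small illustration of how the cumulative bars relate to the yearly bars, the sketch below rebuilds the running total from the yearly counts. It uses only the 1976 to 1985 values from the table on this page; the names `new_per_year` and `cumulative_totals` are made up for this example and are not part of any PDB tool or API.

```python
from itertools import accumulate

# Yearly counts of new protein sequences (30% identity), 1976-1985,
# copied from the statistics table on this page.
new_per_year = {
    1976: 12, 1977: 9, 1978: 3, 1979: 2, 1980: 3,
    1981: 7, 1982: 14, 1983: 7, 1984: 11, 1985: 11,
}

def cumulative_totals(yearly):
    """Return {year: running total} for a {year: new count} mapping.

    Hypothetical helper: each cumulative value is the sum of the
    yearly counts up to and including that year.
    """
    years = sorted(yearly)
    totals = accumulate(yearly[y] for y in years)
    return dict(zip(years, totals))

totals = cumulative_totals(new_per_year)
for year, total in totals.items():
    print(year, new_per_year[year], total)
```

Each printed row has the same three fields as a table row: year, new sequences that year, and the cumulative total.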

Note: The total number of sequence clusters in the statistics table links to the sequence cluster group search result page. To balance performance, the statistics are calculated with a default precision threshold, so the count shown here may differ slightly from the actual non-redundant group search result when the result count approaches or exceeds 10,000. The group search result page provides the accurate count; the statistics page shows the trend.


Sequence cluster level: 30% sequence identity

Year  Number of New Protein Sequences  Total Number of Protein Sequences
1976  12                               12
1977  9                                21
1978  3                                24
1979  2                                26
1980  3                                29
1981  7                                36
1982  14                               50
1983  7                                57
1984  11                               68
1985  11                               79
1986  8                                87
1987  8                                95
1988  18                               113
1989  28                               141
1990  33                               174
1991  41                               215
1992  52                               267
1993  144                              411
1994  296                              707
1995  225                              932
1996  264                              1,196
1997  381                              1,577
1998  461                              2,038
1999  559                              2,597
2000  637                              3,234
2001  678                              3,912
2002  694                              4,606
2003  970                              5,576
2004  1,378                            6,954
2005  1,458                            8,412
2006  1,596                            10,008
2007  1,688                            11,696
2008  1,557                            13,253
2009  1,484                            14,737
2010  1,423                            16,160
2011  1,253                            17,413
2012  1,348                            18,761
2013  1,443                            20,204
2014  1,673                            21,877
2015  1,391                            23,268
2016  1,592                            24,860
2017  1,638                            26,498
2018  1,631                            28,129
2019  1,663                            29,792
2020  2,061                            31,853
2021  1,634                            33,487
2022  2,046                            35,533
2023  2,005                            37,538
2024  2,030                            39,568
2025  2,237                            41,805
2026  707                              42,512