Deep Dive into EC3 Data
Intro
Over the last couple of years it’s been rare that a week goes by where I don’t hear about a new embodied carbon tracking tool in the AEC industry or a new policy requiring Environmental Product Declarations (EPDs) for building projects. Just last week my home state of New York announced a ‘Buy Clean Concrete’ mandate that requires EPDs to be submitted for all concrete mixes used in state construction projects beginning in 2025 (emissions limits are set at 150% of the regional baselines, which seems like a fairly useless limit, but that’s perhaps a topic for another day). While these new tools and policies represent a positive step forward in addressing carbon emissions in the building industry, I can’t help but wonder about the composition of the data underpinning their development.
Compared to operational carbon, which can be readily traced through energy bills, embodied carbon presents a much more challenging and complex data aggregation problem. And given its relatively recent emergence as a focal point within the industry, it’s no surprise that embodied carbon data is still young and evolving.
One of the best tools the industry currently has for embodied carbon data is the EC3 database by Building Transparency. In short, the EC3 database contains digitized EPDs from manufacturers around the world and is free and openly accessible. (I should probably note that while I’m admittedly a fanboy of EC3, I have no affiliation with them.) This accessibility has made it a common data source for many of the embodied carbon tools that have come out in recent years. While it’s great that the industry is beginning to make strides in connecting our design processes to real data, it’s important that we have a sound understanding of the data that decisions are being based on.
When I began collecting and analyzing the data that follows in this post, it was mainly to satisfy my own curiosity about the composition of the dataset. I had also recently seen a presentation in which somebody showed that, based on EC3 data, the Global Warming Potential (GWP) of concrete mixes in their region had been declining in recent years. I was curious whether this finding would hold beyond a local region, which could imply that some combination of awareness and good policies is having a positive impact at scale (TLDR: sort of… hard to say conclusively at this point). That all said, the following investigation into the dataset is really more about understanding the data than about drawing any grand conclusions.
Choosing the Data
For this analysis of the EC3 dataset I chose to focus on Ready-Mix concrete EPDs issued in the United States since 2018. This was a logical choice for a few reasons:
- Quantity of Data: Ready-Mix concrete EPDs make up a significant portion of the EC3 database, largely because ready-mix plants are numerous and geographically dispersed, and EPDs are typically issued per mix design at each plant.
- Impact of Material: Concrete – specifically the cement within the concrete – is known to be one of the major sources of carbon emissions within the built environment.
- Personal Interest: As somebody who works primarily in the structural engineering realm, I’m familiar with and interested in the material and its contribution to embodied carbon.
Additionally, limiting the dataset to the US removed some of the variables that could be at play when comparing between countries. I chose 2018 as the cutoff year since data uploaded before then was scarce. I’ve included further details on the dataset at the end of the post for anyone interested in getting further into the weeds. (I’ll note now that the Jupyter Notebook to replicate this study is available on my GitHub site, and I welcome others to tweak it to their own regions and/or materials of interest.)
The plots below mainly focus on 4000 psi Normal Weight (NW) concrete, which is one of the most commonly used concrete strengths in buildings and the strength with the most EPDs available in EC3’s database (34,898 EPDs within the US at the time of writing, before filtering). I also looked at the data for 3000 psi and 5000 psi NW mixes, which yielded similar findings, so for the sake of brevity I’ve mainly focused on 4000 psi. The GWP values for the EPDs and any baselines mentioned apply to product stages A1-A3 (raw material supply, transport to plant, and manufacturing).
Analysis: GWP of Concrete Mixes Issued over Time
This first plot aims to visualize when EPDs were uploaded and the range of their GWP values. Each dot represents an individual EPD for a 4000 psi NW concrete mix. A few observations that can be made from this data include:
- There seem to be large uploads of data on specific dates rather than a gradual accumulation of data. These show up as the dark vertical lines at specific times. (More on this in the next plot.)
- There’s not an immediately recognizable trend in terms of GWP values increasing or decreasing over time. (More on this in another upcoming plot…. oh the suspense!)
- The average reported GWP based on EC3 data thus far for 2023 is higher than the 2023 CLF baseline.
I investigate the first two observations further in the following plots. Regarding the baselines, the 2023 Carbon Leadership Forum (CLF) National baseline GWP for 4000 psi NW concrete is 235.6 kgCO2e/yd³ (often shown as 308 kgCO2e/m³), which is based on values provided in the NRMCA Industry Wide LCA Project Report v3.2 (2021). It’s important to note that the 2021 CLF National baseline GWP numbers for concrete mixes were based on an older report that provided GWP values for strength ranges (e.g., 3001-4000 psi) rather than specific strength values (e.g., 4000 psi). Therefore, if you see a baseline value for 4000 psi NW concrete of 359 kgCO2e/yd³, as shown in the boxplots on EC3’s website, just be aware of which year the baseline is referencing. Why a national baseline would decrease by over 34% in two years is a good question, but it’s also beyond the scope of this post.
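As a quick check on the arithmetic in the paragraph above, the sketch below converts the per-yd³ baseline to per-m³ and computes the drop from the 2021 value (359 kgCO2e/yd³) to the 2023 value (235.6 kgCO2e/yd³). The conversion uses 1 yd = 0.9144 m exactly, so 1 yd³ ≈ 0.7646 m³. The function names are illustrative only.

```python
# 1 yd = 0.9144 m exactly, so 1 yd³ = 0.9144³ m³
M3_PER_YD3 = 0.764554857984

def per_yd3_to_per_m3(gwp_per_yd3: float) -> float:
    """Convert a GWP intensity from kgCO2e/yd³ to kgCO2e/m³."""
    # The same emissions spread over a smaller volume (m³) give a larger number.
    return gwp_per_yd3 / M3_PER_YD3

def percent_decrease(old: float, new: float) -> float:
    """Percent reduction when going from `old` to `new`."""
    return (old - new) / old * 100

baseline_2023 = 235.6  # 2023 CLF national baseline, 4000 psi NW, kgCO2e/yd³
baseline_2021 = 359.0  # 2021 CLF value shown in EC3's boxplots, kgCO2e/yd³

print(f"{per_yd3_to_per_m3(baseline_2023):.0f} kgCO2e/m³")                  # 308 kgCO2e/m³
print(f"{percent_decrease(baseline_2021, baseline_2023):.1f}% decrease")  # 34.4% decrease
```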
Based on the EC3 data for 2023 so far, the average GWP is 266 kgCO2e/yd³, a bit higher than the 2023 baseline value (though well below the 2021 baseline value). Currently, 31% of the EPDs issued in 2023 for 4000 psi NW concrete fall below the CLF baseline. This feels like an appropriately aggressive yet achievable baseline value that anyone specifying concrete on projects should be aware of.
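The two 2023 figures (the average GWP and the share of EPDs below the baseline) could be computed with a sketch like the one below. It assumes each EPD has been flattened into a record with an issue date and an A1-A3 GWP value in kgCO2e/yd³. The `Epd` record and the `year_summary` function are hypothetical, and the sample values are invented for illustration rather than taken from EC3.

```python
from dataclasses import dataclass
from datetime import date
from statistics import mean

@dataclass
class Epd:
    # Hypothetical minimal record; field names are illustrative, not EC3's schema.
    issued: date
    gwp: float  # A1-A3 GWP in kgCO2e/yd³

def year_summary(epds, year, baseline):
    """Return (mean GWP, fraction strictly below baseline) for EPDs issued in `year`."""
    gwps = [e.gwp for e in epds if e.issued.year == year]
    if not gwps:
        raise ValueError(f"no EPDs issued in {year}")
    below = sum(1 for g in gwps if g < baseline)
    return mean(gwps), below / len(gwps)

# Invented toy data, not EC3 values.
sample = [
    Epd(date(2023, 2, 1), 220.0),
    Epd(date(2023, 5, 9), 260.0),
    Epd(date(2023, 7, 14), 300.0),
    Epd(date(2022, 11, 3), 250.0),  # ignored: issued in a different year
]
avg, share = year_summary(sample, 2023, baseline=235.6)
print(f"mean {avg:.0f} kgCO2e/yd³, {share:.0%} below baseline")  # mean 260 kgCO2e/yd³, 33% below baseline
```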
Analysis: EPD Counts Over Time by State
Returning to the first observation from the previous plot, I wanted to better visualize the aggregation of EPDs over time, as well as their distribution across states. The following area chart shows the count of EPDs issued by state beginning in 2018. While the previous chart suggested there were specific dates when large amounts of data were uploaded, the area chart shows the size of these large data dumps more clearly. Specifically, we can see large amounts of data being uploaded from late 2021 through early 2022, followed by a more gradual accumulation from then on.
Upon further investigation, these large data dumps appear to be the result of a single manufacturer issuing a large number of EPDs at once. For example, of the 32,036 EPDs for 4000 psi NW concrete remaining after filtering, 5,278 were issued on 10/04/2021 by Eastern Concrete, and 3,064 were issued on 11/29/2021 by National Ready Mix. It was later explained to me that this is typically the result of manufacturers using Climate Earth’s platform to rapidly generate EPDs in bulk, which are then auto-populated into EC3’s database. While that doesn’t mean this data is bad, these large jumps introduce a bit of an oddity when trying to look at trends in GWP values over time (as we will attempt next).
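You can spot bulk issuances like these by counting EPDs per manufacturer and issue date. Together, the two batches cited above account for 8,342 of the 32,036 filtered EPDs, or roughly 26%. The sketch below makes several assumptions. Records are (manufacturer, issue date) pairs. `find_bulk_batches` is a hypothetical helper. The 500-EPD threshold is an arbitrary choice and was not used in this analysis.

```python
from collections import Counter
from datetime import date

def find_bulk_batches(records, min_batch=500):
    """Return (manufacturer, issue_date, count) for groups with >= min_batch EPDs.

    `records` is an iterable of (manufacturer, issue_date) pairs; `min_batch`
    is an arbitrary illustrative threshold.
    """
    counts = Counter((mfr, issued) for mfr, issued in records)
    batches = [(mfr, d, n) for (mfr, d), n in counts.items() if n >= min_batch]
    # Largest batches first
    return sorted(batches, key=lambda b: b[2], reverse=True)

# Toy input with hypothetical plant names: one plant issues 600 EPDs on a
# single day, another issues one EPD per day over several weeks.
records = [("Plant A", date(2021, 10, 4))] * 600 + [
    ("Plant B", date(2022, 3, day)) for day in range(1, 29)
]
for mfr, d, n in find_bulk_batches(records):
    print(mfr, d.isoformat(), n)  # Plant A 2021-10-04 600
```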
The other striking – although not particularly surprising – finding here is that the data is heavily skewed towards certain states. EPDs from California and New Jersey make up over two-thirds of all the data for this particular strength, and over 90% of the data comes from fewer than ten states. The explanation for this is likely a messy mix of regional construction methods, population densities, and policies. That said, similar biases in the geographic source of the data can be found across most, if not all, of the EC3 product categories. These geographic biases should not prevent us from using the data, but users of this data should be aware of them.
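One way to put numbers on this kind of skew is to rank states by EPD count and find how many of the top states it takes to cover a given share of the data. The sketch below assumes a plain list with one state label per EPD. The helper names are hypothetical, the state labels are placeholders, and the counts are made up.

```python
from collections import Counter

def state_shares(states):
    """Return [(state, share_of_epds)] from most to least common."""
    counts = Counter(states)
    total = sum(counts.values())
    return [(state, n / total) for state, n in counts.most_common()]

def states_needed(states, target=0.9):
    """Smallest number of top states whose combined EPD count reaches `target` share."""
    counts = Counter(states)
    total = sum(counts.values())
    running = 0
    for i, (_, n) in enumerate(counts.most_common(), start=1):
        running += n
        if running / total >= target:
            return i
    return len(counts)

# Toy data: placeholder state labels with invented counts (one entry per EPD).
states = ["A"] * 50 + ["B"] * 20 + ["C"] * 15 + ["D"] * 10 + ["E"] * 5
top_two = sum(share for _, share in state_shares(states)[:2])
print(f"top two states: {top_two:.0%} of EPDs")           # top two states: 70% of EPDs
print(f"states needed for 90%: {states_needed(states)}")  # states needed for 90%: 4
```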
Analysis: Annual GWP Averages by Strength and State
As previously mentioned, I wanted to further explore whether GWP values were trending in the right direction. The original scatter plot was too messy to discern much in this regard, so I took annual averages instead (keeping in mind there are still 3+ months remaining in 2023). Overall, we see a downward trend (woo!) in the national averages for GWP, but we also see an uptick from 2022 to 2023, which raises some questions.
The plot above looked at the data for 3000 psi, 4000 psi, and 5000 psi NW concrete mixes over time. This was mainly done as a gut-check to make sure there weren’t wildly different trends, which would have been a red flag. Fortunately, the relative averages were what we would expect, since higher-strength mixes typically have higher cement content and therefore higher GWP values.
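Below is a sketch of the annual averages and the strength gut-check. It assumes the data has been flattened into (strength, year, GWP) tuples. `annual_means` and `strength_order_holds` are hypothetical helpers, and the toy values are invented. The check only confirms that, within each year, the average GWP rises with compressive strength.

```python
from collections import defaultdict
from statistics import mean

def annual_means(records):
    """Average GWP per (strength_psi, year) from (strength_psi, year, gwp) tuples."""
    groups = defaultdict(list)
    for strength, year, gwp in records:
        groups[(strength, year)].append(gwp)
    return {key: mean(vals) for key, vals in sorted(groups.items())}

def strength_order_holds(means):
    """True if, within every year, average GWP strictly increases with strength."""
    by_year = defaultdict(list)
    for (strength, year), avg in means.items():
        by_year[year].append((strength, avg))
    for pairs in by_year.values():
        avgs = [avg for _, avg in sorted(pairs)]
        if any(lower >= higher for lower, higher in zip(avgs, avgs[1:])):
            return False
    return True

# Invented toy records: (strength in psi, year issued, A1-A3 GWP in kgCO2e/yd³)
records = [
    (3000, 2022, 230.0), (3000, 2022, 240.0),
    (4000, 2022, 260.0), (4000, 2022, 270.0),
    (5000, 2022, 300.0), (5000, 2022, 310.0),
]
means = annual_means(records)
print(means)  # {(3000, 2022): 235.0, (4000, 2022): 265.0, (5000, 2022): 305.0}
print("higher strength -> higher GWP:", strength_order_holds(means))  # True
```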
Next, I wanted to explore how the trends compared between states. I picked six of the states with the most data and calculated averages for years in which they had 30 or more EPDs issued. I was curious whether most states were following the national trend, since that trend is heavily skewed by California. The results don’t point to any clear conclusions, but there are a couple of interesting things to note. For one, Oregon and Washington are the only two states shown where average GWP values decreased from 2022 to 2023. Also, mixes from plants in New Jersey show consistently higher GWP values than those from all other states shown in the plot.
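The state-level averaging can be sketched as follows, assuming (state, year, GWP) tuples. Groups with fewer than 30 EPDs are dropped, which mirrors the threshold described above. The 2022-to-2023 change is reported only when both years survive that filter. The helper names are hypothetical, and the state labels and values are placeholders.

```python
from collections import defaultdict
from statistics import mean

MIN_EPDS = 30  # minimum EPDs per state-year, matching the threshold described above

def state_year_means(records, min_count=MIN_EPDS):
    """Average GWP per (state, year), keeping only groups with >= min_count EPDs."""
    groups = defaultdict(list)
    for state, year, gwp in records:
        groups[(state, year)].append(gwp)
    return {key: mean(vals) for key, vals in groups.items() if len(vals) >= min_count}

def change_between(means, state, start=2022, end=2023):
    """Change in average GWP from start to end year; None if either year was filtered out."""
    a = means.get((state, start))
    b = means.get((state, end))
    if a is None or b is None:
        return None
    return b - a

# Invented toy data with placeholder state labels: (state, year, GWP in kgCO2e/yd³)
records = (
    [("X", 2022, 250.0)] * 30 + [("X", 2023, 240.0)] * 30
    + [("Y", 2022, 260.0)] * 30 + [("Y", 2023, 270.0)] * 10  # too few EPDs in 2023
)
means = state_year_means(records)
print(change_between(means, "X"))  # -10.0
print(change_between(means, "Y"))  # None
```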
Summary
While these are just a handful of potentially useful visualizations of a narrow sliver of the EC3 data, we can begin to get a sense of how the data is spread out both geographically and chronologically. As noted, this was mainly an exercise in poking around the data and gaining a better awareness of what’s there. Whether somebody is planning to use EC3 as a data source for a design tool, or a machine learning application, or some other study, it’s important to know what is and isn’t included.
As far as the idea that we’re seeing evidence of GWP reductions as a result of better policies and awareness – there does appear to be some early evidence of this in the data for certain regions, but it’s hard to say conclusively at this point. At least for concrete, I think we would want to see at least a couple more years of data in order to get a better sense of the trends in GWP values.
While EPDs have been around for a while, they’ve mainly been limited to LEED or other more specialized projects. As more state and local governments implement requirements and incentives around providing EPDs for building products, we should hopefully begin aggregating more data that represents a larger spread of manufacturers and regions. Many current and future baseline values will be determined at least in part by the data currently in EC3, which should be good motivation for designers and manufacturers to control their own destiny by contributing to the dataset.
Further Details on the Data
If you’ve made it this far, you’re probably a bit of a nerd (no offense) and may be curious about some of the details of the dataset I used. Below I’ve noted most of the steps taken when filtering the data, and I’ve also made most of this analysis available on GitHub as a Jupyter notebook. This notebook makes use of the ec3-python-wrapper project, which is another project of mine that I’ve covered in a previous post.
- All of the data plotted was pulled on 9/17/23.
- The initial query of the EC3 database included mixes that were classified as ReadyMix, 4000 psi, Normal Weight, had a plant based in the US, and were product specific. This returned 34,898 EPDs.
- I removed any data that was missing values for GWP or Compressive Strength, which slightly trimmed the dataset to 34,485 EPDs.
- I removed outliers based on GWP value using the Interquartile Range (IQR) method. It’s possible this threw out a handful of valid EPDs that may have been unique products, but I didn’t want these skewing the results too much anyway. This further trimmed the dataset to 34,033 EPDs.
- After looking at the initial dataset, I decided to cut off data prior to 2018; although some existed, it was fairly scarce compared to the data from 2018 onward. This left me with 32,036 EPDs, which formed the basis for the plots.
- When comparing between the different concrete strengths, the same filtering mentioned in the previous bullet points was applied to the other strengths.
- When plotting states, I picked six of the top seven states by EPD count and only plotted averages for years with 30 or more EPDs issued. Minnesota was excluded since there were only three years of data for it.
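The filtering steps listed above might look roughly like the following sketch. It assumes a hypothetical `Epd` record, which is not the EC3 API or ec3-python-wrapper schema. It also uses the conventional 1.5 × IQR fences, since the post does not state the multiplier. Quartiles come from `statistics.quantiles(..., method="inclusive")`, which interpolates linearly like pandas' default `quantile`.

```python
from dataclasses import dataclass
from datetime import date
from statistics import quantiles
from typing import List, Optional, Tuple

@dataclass
class Epd:
    # Hypothetical record layout for illustration; not EC3's actual field names.
    issued: date
    gwp: Optional[float]           # A1-A3 GWP, kgCO2e/yd³
    strength_psi: Optional[float]  # specified compressive strength

def iqr_bounds(values: List[float], k: float = 1.5) -> Tuple[float, float]:
    """Tukey fences [Q1 - k*IQR, Q3 + k*IQR]; k = 1.5 is an assumed convention."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def filter_epds(epds: List[Epd], first_year: int = 2018, k: float = 1.5) -> List[Epd]:
    # Step 1: drop records missing GWP or compressive strength.
    complete = [e for e in epds if e.gwp is not None and e.strength_psi is not None]
    # Step 2: drop GWP outliers outside the IQR fences (computed on all complete records).
    lo, hi = iqr_bounds([e.gwp for e in complete], k)
    inliers = [e for e in complete if lo <= e.gwp <= hi]
    # Step 3: drop EPDs issued before the cutoff year.
    return [e for e in inliers if e.issued.year >= first_year]

# Invented toy data: five typical 2019 mixes plus one record for each filter step.
sample = [Epd(date(2019, 1, day), gwp, 4000.0)
          for day, gwp in zip(range(1, 6), [240.0, 250.0, 260.0, 270.0, 280.0])]
sample += [
    Epd(date(2020, 6, 1), 900.0, 4000.0),  # extreme GWP, removed as an outlier
    Epd(date(2020, 6, 2), None, 4000.0),   # missing GWP, removed in step 1
    Epd(date(2016, 3, 1), 255.0, 4000.0),  # issued before 2018, removed in step 3
]
kept = filter_epds(sample)
print(len(kept))  # 5
```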