Exploring the CLF WBLCA Dataset

On April 2, 2025, by Jared

Intro

A few weeks ago, the Carbon Leadership Forum released their ‘WBLCA Benchmark Study v2’ report, which presents a detailed, harmonized dataset of 292 building projects across the US and Canada, collected from 30 different design firms. The report is based on what is likely the most detailed and robust Whole Building Life Cycle Assessment (WBLCA) dataset yet to be made public. As noted in the report, “This dataset fills critical gaps for the building industry, research, and policy communities, enabling them to analyze and compare the impacts of buildings, test or set performance targets, and motivate sustainable design and construction practices.”

While 292 projects may not seem like much, the level of detail and transparency in the data is rare for our industry. Other industry efforts, like Architecture 2030 and SE 2050, have collected larger datasets, but with less granularity (at least in what’s been shared publicly). It’s great to see firms from across the industry contributing to a common dataset, and I hope the results of this report will encourage further reporting in the future!

After reviewing the report and exploring the data, I had several questions I wanted to investigate further. Below, I share a few of these explorations. The report itself provides excellent details on data preparation and processing, so I won’t repeat that here, but I encourage anyone interested to read the report.

The dataset is split into two Excel files: one by project, covering metadata and LCA parameters, and another by material, with LCI and LCIA results. For now, my analysis focuses on the per-project data, though I may explore the material-level data in the future.

Data Explorations

The visualizations below explore the following three questions:

How do environmental impact results differ based on the LCA Software used (Tally LCA vs. One Click)?
How much do embodied carbon impacts change when looking at carbon intensity per area vs. per occupant? And what building typologies are most impacted by the difference?
How much does building height / number of stories impact embodied carbon intensity?

I’ve shared the code for generating each of the studies in a Python Jupyter Notebook on GitHub, which also includes a couple of other plots that I opted not to show below for the sake of brevity.

1. Impact of LCA Software

For this study, I explored how the LCA software used affects environmental impact results. The dataset includes projects analyzed with either Tally LCA or One Click LCA (the ‘Data Collection User Guide’ also accepted projects captured using Athena, but entries that used Athena appear to have been removed from the finalized dataset). The report details some of the differences in scope between the two tools and what has been done to harmonize the data as far as possible. The report also notes that many of the outliers seem to come from One Click LCA rather than Tally LCA – a trend I confirmed when analyzing the data (I’ve cropped the plots below to remove extreme outliers).

About the Data:

For the six impacts shown, I used the intensities normalized by total constructed floor area (CFA) rather than total gross floor area (GFA). As noted in the report, “For this dataset, CFA includes the floor area of any attached or integrated parking components whereas GFA (as defined by IPMS 2) is effectively the difference between the building’s constructed floor area and its parking area.”
Intensities for all metrics include life-cycle stages A to C
Data was filtered to only show New Construction
After filtering, the number of data points per tool is as follows: Tally LCA: 159 / One Click LCA: 84
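
For readers who want to reproduce this kind of comparison, here is a minimal sketch of the filter-then-group step using only the Python standard library. The field names (`tool`, `project_type`, `gwp_per_cfa`) and the sample values are hypothetical placeholders, not the actual column names or data from the CLF files, and this is not necessarily how the notebook does it.

```python
from collections import defaultdict
from statistics import median

# Hypothetical records: field names and values are illustrative only,
# not taken from the CLF dataset.
projects = [
    {"tool": "Tally LCA", "project_type": "New Construction", "gwp_per_cfa": 400.0},
    {"tool": "Tally LCA", "project_type": "New Construction", "gwp_per_cfa": 500.0},
    {"tool": "One Click LCA", "project_type": "New Construction", "gwp_per_cfa": 300.0},
    {"tool": "One Click LCA", "project_type": "Renovation", "gwp_per_cfa": 150.0},
    {"tool": "One Click LCA", "project_type": "New Construction", "gwp_per_cfa": 350.0},
]


def median_by_tool(records, metric):
    """Keep New Construction projects only, then return
    {tool: (project count, median intensity)} for the given metric."""
    groups = defaultdict(list)
    for record in records:
        if record["project_type"] != "New Construction":
            continue  # mirrors the New Construction filter described above
        value = record.get(metric)
        if value is None:
            continue  # skip projects that lack this metric
        groups[record["tool"]].append(value)
    return {tool: (len(values), median(values)) for tool, values in groups.items()}


summary = median_by_tool(projects, "gwp_per_cfa")
for tool, (count, mid) in summary.items():
    print(f"{tool}: n={count}, median={mid}")
```

Running the same function once per impact metric gives a per-tool summary for each of the six intensities.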

Takeaways

For each of the six impact intensities explored, there is a significant difference in results depending on the software used. This suggests that when comparing the results of an LCA study on a single project to a larger dataset, it is best to filter the dataset to the tool you used whenever possible.

2. Embodied Carbon by Area vs. by Occupant

The de facto basis for many building metrics has always been building area. It’s one of the few quantities that we consistently know on projects across typologies and scales. However, looking at energy metrics per square foot or square meter can often be deceiving. That’s why I was excited to see that the dataset also includes impact intensities per occupant and per residential unit (where applicable), which may offer a more meaningful comparison for certain building types.

Since we have intensities for all of the projects calculated both per square meter and per occupant, I was curious how shifting between the two would change how the projects rank relative to each other on embodied carbon intensity. I focused on how the shifts differed between market sectors to evaluate their sensitivities. To find the shift in rankings, I normalized the values for both embodied carbon per area and embodied carbon per occupant from 0 to 1, and then took the difference. Details on filtering and methodology are outlined below.

About the Data:

Embodied carbon considered for life-cycle stages A to C
As noted in the report, “the type of occupancy provided by data contributors is based on their applicable building and fire codes. Thus, occupancies reported are effectively the maximum allowable occupants of the buildings for fire safety, and not the average building occupants or full-time equivalent occupants of the buildings.”
Data was filtered to only show New Construction
Data was filtered to only buildings with no parking. This helped simplify the comparison since it meant that the values for constructed floor area (CFA) and gross floor area (GFA) were the same.
A small handful of projects with null values for the per-occupant impacts were removed.
Extreme outliers were removed based on the IQR method
The formula for the normalized shift was: Normalized Intensity Shift = Normalized Embodied Carbon Per Occupant – Normalized Embodied Carbon By Area
The remaining data was grouped by the Building Primary Use. Five uses were represented by only one or two projects each: Transportation Hub, Industrial, Lodging, Mercantile, and Food Service. For graphic simplicity, I removed these projects from the visuals, though they were still included during the normalization step.
After filtering, a total of 132 projects were included in the normalization calculations.
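
The outlier removal and the normalized shift described in the list above can be sketched roughly as follows. This is an illustration under stated assumptions rather than the notebook's actual code: the 1.5 × IQR multiplier is the common Tukey default (the multiplier actually used is not stated here), the function names are my own, and the sample numbers are made up.

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences placed k * IQR beyond the quartiles.
    k=1.5 is an assumed default, not a value stated in the post."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outliers(records, key, k=1.5):
    """Keep only the records whose `key` value lies within the IQR fences."""
    lower, upper = iqr_bounds([r[key] for r in records], k)
    return [r for r in records if lower <= r[key] <= upper]


def min_max(values):
    """Rescale values linearly to the 0-1 range.
    Assumes at least two distinct values (otherwise this divides by zero)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def normalized_shift(per_area, per_occupant):
    """Normalized per-occupant intensity minus normalized per-area intensity,
    computed project by project (the two lists must be in the same order)."""
    return [o - a for a, o in zip(min_max(per_area), min_max(per_occupant))]


# Made-up intensities for three projects (not from the dataset).
per_area = [200.0, 400.0, 600.0]
per_occupant = [9000.0, 3000.0, 6000.0]
shifts = normalized_shift(per_area, per_occupant)
print(shifts)
```

A positive shift means a project sits relatively higher in the per-occupant ranking than in the per-area ranking; a negative shift means the opposite. Note that the normalization runs over all retained projects before any grouping by building use.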

Takeaways

How we measure embodied carbon intensity significantly affects how buildings compare, with some typologies more impacted than others when switching between per-area and per-occupant metrics. Unsurprisingly, warehouses showed the largest shift, suggesting they’re best compared per area – though some sort of volumetric storage unit would probably be ideal. Residential, Public Assembly, and Education also experienced major shifts, reinforcing the need for careful unit selection – intuitively one might lean towards a per-occupant unit for these typologies.

To see whether building size influenced these shifts, I plotted rankings against total area. Aside from some larger degrees of variation at the very small end of the spectrum – which is to be expected since the smaller area numbers are more sensitive – there did not appear to be a significant correlation between building area and the shift in ranking.

3. Impact of Building Height

I was curious to see if the data would reveal any significant correlations between building height and embodied carbon intensity (using the constructed floor area method). Numerous studies (mostly based on simulated rather than real building data) have looked at the optimal building height from an embodied carbon perspective. While the exact heights offered as “optimal” vary based on how you measure, there’s some general consensus that as you get into high-rise buildings, the need for heavier structures and more foundations leads to a higher carbon intensity.

About the Data:

Intensities for all metrics include life-cycle stages A to C
Removed any projects where the building height was null
Extreme outliers were removed based on the IQR method
Data was filtered to only show New Construction
The number of data points in each bucket is shown in brackets next to the building heights in the chart below. The total number of projects included after filtering was 234.
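
As a rough sketch of the bucketing step, the snippet below assigns each project to a height range and counts the projects per bucket. The bin edges, labels, and sample heights are hypothetical; the actual buckets used in the chart are not listed in this post.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical bin edges in meters; not the buckets used in the chart.
EDGES = [10, 20, 30, 50]
# Each label covers a half-open range, e.g. "10-20 m" means 10 <= h < 20.
LABELS = ["<10 m", "10-20 m", "20-30 m", "30-50 m", ">=50 m"]


def height_bucket(height_m):
    """Return the label of the bucket that contains height_m."""
    return LABELS[bisect_right(EDGES, height_m)]


# Made-up building heights; None stands in for a null height.
heights = [4.5, 12.0, 18.0, 30.0, 75.0, None]

# Drop null heights first, then count the projects in each bucket.
counts = Counter(height_bucket(h) for h in heights if h is not None)
for label in LABELS:
    print(f"{label} ({counts[label]})")
```

Printing the count next to each label mirrors the bracketed counts shown beside the building heights in the chart.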

Takeaways

The dataset is fairly skewed towards low-rise and mid-rise buildings below 30 meters tall, so it’s difficult to draw any clear conclusions about the impact of building height on embodied carbon intensity from this dataset. I also made a similar plot based on the number of building stories (not shown here, but available in the Jupyter Notebook), which also did not reveal any clear correlations, but does show that approximately half of the data falls within 2 to 5 building stories. More data – particularly for buildings taller than 30 meters – would likely be needed to better evaluate the impact of building height.

Summary

I see this dataset and the corresponding report as a great tool for the industry. Buildings are inherently messy and complex, and it’s important to collect data that reflects this. While a sample size of 292 projects has its limitations, it’s a great step in the right direction and provides a framework for future data collection.

This deep dive was driven by curiosity, but I hope these explorations offer useful insights or spark new questions. As the industry continues to refine its approach to measuring environmental impacts, I see datasets like this being essential in shaping more informed policies and effective design strategies.