STA Consulting Engineers
Assessing AI for Building Plan Parameter Extraction
Testing what AI can reliably extract from real building plans
Benchmarking parameters, prompts, and the role of human review.
The Challenge
STA Consulting Engineers wanted to explore whether AI could help their engineering team extract important parameters from PDF building plans.
This was a valuable opportunity because plan review and parameter extraction can be time-consuming, repetitive, and difficult to scale. The key question was not simply whether AI could read the plans, but whether it could do so accurately enough to support a practical workflow.
Our Approach
We conducted a structured study using a large dataset of real building plans supplied by STA.
The work involved testing AI performance across a range of important parameters and comparing the impact of different prompt approaches on extraction accuracy. Rather than treating AI as a black box, the study was designed to evaluate performance in a systematic and measurable way.
What we evaluated
- AI extraction accuracy across multiple plan parameters
- The effect of different prompt designs on output quality
- Which information types were more or less suitable for AI extraction
- Where human interpretation remained important
What We Found
The results showed mixed accuracy across the different parameters tested.
Some parameters were more suitable for AI extraction than others, while certain types of information remained difficult to extract consistently. We also found that prompt design had a noticeable impact on results, with some prompt approaches performing better than others.
One of the most important findings was that the areas where AI struggled were often the same areas that humans found difficult to interpret from the source plans.
Illustrative results from the benchmark
STA supplied 200 test cases covering the range of parameters they wanted investigated. By way of example, this case study highlights two of them: roof pitch and roof shape.
Example 1: Roof Pitch
- The aim was to extract the roof pitch within a 5-degree tolerance of the baseline.
- The outcome was that 157 out of 200 plans (78.5%) were within that tolerance.
- 10 out of 200 plans, or 5%, were flagged as returning no result or requiring review.
Closer inspection of those no-result cases showed that some PDFs passed to the AI did not contain usable roof information. That is a useful result in itself, because those cases can be immediately flagged for human review rather than forcing a low-confidence answer.
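The scoring logic behind this example can be sketched as follows. This is a minimal illustration with hypothetical data and function names, not STA's actual pipeline: each extracted pitch is bucketed as within tolerance, outside tolerance, or flagged for human review when the AI returns no result.

```python
def score_pitch(extracted, baseline, tolerance=5.0):
    """Bucket each extracted roof pitch against its baseline value.

    `None` means the AI found no usable roof information, so the
    plan is flagged for human review instead of being forced to a
    low-confidence answer.
    """
    counts = {"within": 0, "outside": 0, "review": 0}
    for ext, base in zip(extracted, baseline):
        if ext is None:
            counts["review"] += 1          # flag for human review
        elif abs(ext - base) <= tolerance:
            counts["within"] += 1          # within the 5-degree tolerance
        else:
            counts["outside"] += 1
    return counts

# Hypothetical values in degrees; None = no usable roof information.
extracted = [22.5, None, 30.0, 45.0]
baseline  = [25.0, 20.0, 38.0, 44.0]
print(score_pitch(extracted, baseline))  # {'within': 2, 'outside': 1, 'review': 1}
```

The key design choice is that a missing answer is treated as its own outcome, so "no result" cases route straight to a reviewer rather than being counted as failures.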
Example 2: Roof Shape
- The aim was to classify the roof shape against the baseline categories used in the study.
- The outcome was that 156 out of 200 plans matched the baseline directly.
- 34 plans disagreed with the baseline, while 10 returned either no result or required human review.
Importantly, the largest disagreement cluster was Human = GABLE versus AI = HIP. This may partly reflect an interpretation convention rather than outright error: when a roof shape is hard to classify, a human reviewer tends to default to the more conservative gable classification.
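A tally of this kind can be computed from paired labels. The sketch below uses hypothetical labels (with "REVIEW" standing in for no-result / human-review cases) to show how direct matches, review cases, and the largest disagreement cluster would be counted.

```python
from collections import Counter

# Hypothetical paired classifications; "REVIEW" = no result / needs human review.
human = ["GABLE", "HIP", "GABLE", "HIP",    "GABLE"]
ai    = ["GABLE", "HIP", "HIP",   "REVIEW", "HIP"]

# Count each (human, ai) label pair.
pairs = Counter(zip(human, ai))

# Plans where the AI matched the baseline directly.
matches = sum(n for (h, a), n in pairs.items() if h == a)

# Plans routed to human review.
review = sum(n for (h, a), n in pairs.items() if a == "REVIEW")

# Genuine disagreements, e.g. the human=GABLE vs AI=HIP cluster.
disagreements = {k: n for k, n in pairs.items()
                 if k[0] != k[1] and k[1] != "REVIEW"}

print(matches, review, disagreements)
```

Breaking the results into these three buckets, rather than a single accuracy number, is what surfaced the GABLE-versus-HIP cluster in the first place.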
The Insight
Instead of asking, "Can AI replace this task?", the more useful question became:
How can AI and human review be combined to make the overall process faster, simpler, and more reliable?
The Outcome
The project gave STA clearer evidence about where AI could add value and where caution was needed.
It also helped shape a more practical path forward by highlighting:
- where AI extraction showed promise
- where prompt optimisation could improve results
- where human-in-the-loop review would remain important
- where workflow simplification could reduce checking effort
- how better process design could improve efficiency overall
Why This Matters
For organisations considering AI, the real value is often not a simple yes-or-no answer.
It is gaining a clearer understanding of what AI can do well, where it struggles, what level of human oversight is still needed, and how the workflow can be redesigned to achieve better outcomes.
This case study shows how systematic testing can turn AI from an uncertain idea into a more informed and practical decision.