
Boulder County Housing Price Modeling

This project used regression and clustering to identify how age, size, and location affect Boulder County housing prices. Larger, newer homes cost more, while location alone doesn’t explain market variation.

Tools: Excel (Linear Regression), JMP Pro 17 (K-Means Clustering), Python (Elbow Method)

Skills: Data cleaning, descriptive statistics, regression modeling, clustering, visualization, interpretation

Team: Megan Delasantos, Kevin Gonzalez, Evan Steinmetz

Course: CVEN 3227: Probability, Statistics & Decision

Year: 2023

Read Full Project Report: Boulder Real Estate Pricing Model.pdf


Overview

This project explored how location, building age, and square footage predict housing sale prices across Boulder County, Colorado. Using over 4,500 real-estate transactions, we built a regression and clustering framework to identify how these factors influence property values. The resulting insights are useful for both civil and environmental engineering decision-making.


Objective

To determine which factors most significantly affect housing prices and evaluate whether location alone defines market clustering across the county.


Methodology

1. Data Processing and Descriptive Analysis

  • Cleaned and standardized dataset (“Recent Sales by Property and Time Frame,” Boulder County 2023).

  • Examined three continuous variables (price, age, above-ground area) and one categorical variable (location, by city).

  • Calculated mean, median, standard deviation, and coefficient of variation to assess variability.


Key finding: Boulder County’s average home price was $513 k ± $462 k, showing substantial variation driven by mixed housing ages and market diversity.
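As a rough illustration of the descriptive statistics listed above, the following Python sketch computes the same four measures with the standard library. The price list is hypothetical and is not the project's data; the `describe` helper is a name introduced here for illustration. The last lines assume that the reported "± $462 k" is a standard deviation, which the page does not state explicitly.

```python
import statistics


def describe(values):
    """Return mean, median, sample standard deviation, and coefficient of variation."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    std = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return {"mean": mean, "median": median, "std": std, "cv": std / mean}


# Hypothetical sale prices in thousands of dollars, NOT the project's dataset.
prices_k = [250, 310, 420, 480, 515, 690, 1450]
summary = describe(prices_k)
print(summary)

# Coefficient of variation implied by the reported figures,
# ASSUMING the "± $462 k" is a standard deviation around the $513 k mean.
reported_cv = 462 / 513
print(round(reported_cv, 2))
```

Under that assumption the coefficient of variation is close to 0.9, meaning the spread in prices is nearly as large as the average price itself.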


2. Linear Regression Modeling

  • Built a multiple linear regression model of sale price on age, square footage, and dummy-coded location variables (with Boulder as the reference city).

  • Achieved R² ≈ 0.49, meaning the model explains roughly half of the variance in sale price, which is reasonable for socio-economic data with human-driven variability.

  • Identified a negative relationship with age (older homes → lower price) and a positive relationship with area (larger homes → higher price).

  • Found Boulder consistently outpriced surrounding cities; e.g., homes in Erie averaged $600 k less than equivalent Boulder homes.
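The project fitted this model in Excel. The Python sketch below only illustrates the idea of dummy coding against a reference city and of computing R²; it is not the project's workbook. The homes, the city list, and the coefficients in `TRUE_BETA` are all made up for the example, and the prices are generated without noise so that the fit recovers those coefficients and R² comes out at essentially 1, far higher than the 0.49 reported for the real data.

```python
def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def fit_ols(X, y):
    """Ordinary least squares via the normal equations (X^T X) beta = X^T y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)


def r_squared(X, y, beta):
    """Share of the variance in y explained by the fitted model."""
    pred = [sum(b * x for b, x in zip(beta, row)) for row in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


CITIES = ["Erie", "Longmont"]  # Boulder is the reference category (no dummy column)


def design_row(age, area, city):
    """Intercept, age, area, then one 0/1 dummy per non-reference city."""
    return [1.0, age, area] + [1.0 if city == c else 0.0 for c in CITIES]


# Hypothetical coefficients (price in $k): intercept, age, area, Erie, Longmont.
TRUE_BETA = [400.0, -2.0, 0.3, -150.0, -100.0]

# Hypothetical homes as (age in years, area in ft², city); NOT the project's data.
homes = [
    (10, 2000, "Boulder"), (50, 1500, "Boulder"), (30, 3000, "Boulder"),
    (5, 2500, "Erie"), (20, 1800, "Erie"), (40, 2200, "Erie"),
    (60, 1200, "Longmont"), (15, 2600, "Longmont"), (35, 1600, "Longmont"),
]
X = [design_row(*home) for home in homes]
y = [sum(b * x for b, x in zip(TRUE_BETA, row)) for row in X]  # noise-free prices

beta = fit_ols(X, y)
r2 = r_squared(X, y, beta)
for name, b in zip(["intercept", "age", "area"] + CITIES, beta):
    print(f"{name:>9}: {b:8.3f}")
print("R^2:", round(r2, 4))
```

Each city coefficient is read relative to Boulder: a negative value for a city means a home there is predicted to sell for less than a Boulder home of the same age and size.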

3. K-Means Clustering and Elbow Method

  • Performed clustering in JMP Pro 17 with k = 11 to test if price clusters followed geographic patterns.

  • No significant location-based separation was observed.

  • Used a Python elbow-method script to find the optimal number of clusters (k = 5).

  • Even with 5 clusters, location was not a primary driver, implying latent variables (perhaps neighborhood quality, amenities, or environmental factors) shaped market segmentation.
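The elbow method mentioned above can be sketched as follows. This is not the project's script: it is a minimal standard-library illustration that runs k-means (Lloyd's algorithm) for several values of k and prints the within-cluster sum of squares (WCSS) for each. The points are synthetic and form three groups by construction, so the curve flattens after k = 3 here, whereas the project reported an elbow at k = 5 on the real data. The deterministic farthest-point seeding in `init_centroids` is an assumption made to keep the example reproducible.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def init_centroids(points, k):
    """Deterministic farthest-point seeding (an assumption, chosen for reproducibility)."""
    centroids = [points[0]]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(farthest)
    return centroids


def kmeans_wcss(points, k, max_iter=100):
    """Run Lloyd's algorithm and return the within-cluster sum of squares."""
    centroids = init_centroids(points, k)
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


# Synthetic 2-D points in three well-separated groups; NOT the project's data.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10), (11, 11),
    (20, 0), (20, 1), (21, 0), (21, 1),
]

wcss = {k: kmeans_wcss(points, k) for k in range(1, 6)}
for k, value in wcss.items():
    print(k, round(value, 1))
```

Plotting WCSS against k and looking for the point where further clusters stop producing a large drop is the usual way to read the elbow; the choice remains a visual judgment rather than a formal test.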

Figure: Log-log plot of building value vs. above-ground square footage
Figure: Log-log plot of building value vs. age

Insights & Implications

  • Engineering Relevance: Regression outcomes offer a predictive tool for urban planning, cost estimation, and housing-demand forecasting. Environmental engineers can apply similar models to link socio-economic data with sustainability metrics.

  • Policy & Development Application: The model can help local agencies anticipate infrastructure needs and target sustainable urban growth, supporting equitable development across fast-growing communities.


Key Deliverables

  • Regression model equation and coefficients table (Excel).

  • Cluster summaries and frequency charts (JMP Pro).

  • Python script and scree-plot visualization for elbow-method determination.

Results Summary

Predictor  | Relationship          | Significance (p < 0.05) | Interpretation
Age (yrs)  | Negative              |                         | Older homes depreciate faster
Area (ft²) | Positive              |                         | Larger homes command higher price
Location   | Negative (vs Boulder) | ✔ for most              | Peripheral cities → lower average value





