Boulder County Housing Price Modeling

This project used regression and clustering to identify how age, size, and location affect Boulder County housing prices. Larger, newer homes cost more, while location alone doesn’t explain market variation.

Tools: Excel (Linear Regression), JMP Pro 17 (K-Means Clustering), Python (Elbow Method)

Skills: Data cleaning, descriptive statistics, regression modeling, clustering, visualization, interpretation

Team: Megan Delasantos, Kevin Gonzalez, Evan Steinmetz

Course: CVEN 3227: Probability, Statistics & Decision

Year: 2023

Read Full Project Report: Boulder Real Estate Pricing Model.pdf

Overview

This project explored how location, building age, and square footage predict housing sale prices across Boulder County, Colorado. Using more than 4,500 real-estate transactions, we built a regression and clustering framework to identify how these urban and building-level factors influence property values, insights useful for civil and environmental engineering decision-making.

Objective

To determine which factors most significantly affect housing prices and evaluate whether location alone defines market clustering across the county.

Methodology

1. Data Processing and Descriptive Analysis

Cleaned and standardized the dataset (“Recent Sales by Property and Time Frame,” Boulder County, 2023).
Examined three continuous variables (sale price, building age, and above-ground area) and one categorical variable (location, by city).
Calculated the mean, median, standard deviation, and coefficient of variation (standard deviation divided by the mean) to assess variability.

Key finding: Boulder County’s average home price was $513k with a standard deviation of $462k, showing substantial variation driven by mixed housing ages and market diversity.
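The four statistics above can be reproduced with Python's standard library. This is a minimal sketch using a handful of hypothetical sale prices (not the Boulder County data); the helper name `describe` is illustrative, and it assumes the sample (n - 1) standard deviation, which may differ from the convention used in the project. Applied to the reported county figures, the coefficient of variation is 462 / 513 ≈ 0.90, so the spread in prices is nearly as large as the average price itself.

```python
import statistics


def describe(values):
    """Mean, median, sample standard deviation, and coefficient of variation."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)  # sample SD (n - 1 denominator), an assumption
    return {
        "mean": mean,
        "median": statistics.median(values),
        "stdev": stdev,
        "cv": stdev / mean,  # spread relative to the average
    }


# Hypothetical sale prices in dollars, for illustration only
prices = [350_000, 420_000, 510_000, 640_000, 1_250_000]
for name, value in describe(prices).items():
    print(f"{name:>6}: {value:,.2f}")

# Reported county figures: CV = SD / mean
print(f"County CV: {462 / 513:.2f}")
```

Because the coefficient of variation is unitless, it lets price, age, and area be compared on the same footing even though they are measured in dollars, years, and square feet.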

2. Linear Regression Modeling

Built a multiple linear regression model relating sale price to age, square footage, and dummy-coded location variables (with Boulder as the reference city).
Achieved R² ≈ 0.49, meaning the model explains roughly half of the variance in sale price; this is reasonable for socio-economic data with human-driven variability.
Found a negative relationship with age (older homes → lower price) and a positive relationship with area (larger homes → higher price).
Found that Boulder consistently outpriced surrounding cities; e.g., homes in Erie sold for about $600k less than equivalent Boulder homes of the same age and size.
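A minimal sketch of the model structure, assuming the form price = β0 + β1·age + β2·area + Σ γ_city·[city], with Boulder as the dropped reference level. The project fit its model in Excel; the Python below is only an illustrative reimplementation that solves the ordinary least squares normal equations directly. The helper names (`dummy_code`, `solve`, `fit_price_model`), the city offsets, and the synthetic homes are all hypothetical. Each city coefficient γ is read like the Erie result above: the price difference from a Boulder home of the same age and area.

```python
def dummy_code(cities, reference="Boulder"):
    """One-hot encode cities, dropping the reference level to avoid perfect collinearity."""
    levels = sorted(set(cities) - {reference})
    rows = [[1.0 if city == level else 0.0 for level in levels] for city in cities]
    return levels, rows


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_price_model(ages, areas, cities, prices, reference="Boulder"):
    """OLS fit of price on age, area, and city dummies; returns (coefficients, R^2)."""
    levels, dummies = dummy_code(cities, reference)
    X = [[1.0, age, area] + d for age, area, d in zip(ages, areas, dummies)]
    p = len(X[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(X, prices)) for i in range(p)]
    beta = solve(xtx, xty)
    # R^2 = 1 - SS_res / SS_tot
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    mean_y = sum(prices) / len(prices)
    ss_res = sum((y - f) ** 2 for y, f in zip(prices, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in prices)
    names = ["intercept", "age", "area"] + levels
    return dict(zip(names, beta)), 1 - ss_res / ss_tot


# Synthetic, noise-free data built from made-up coefficients (not the project's results)
homes = [("Boulder", 10, 1500), ("Boulder", 40, 2500), ("Boulder", 25, 1800),
         ("Erie", 5, 2200), ("Erie", 30, 1600), ("Erie", 15, 3000),
         ("Longmont", 50, 1200), ("Longmont", 20, 2000), ("Longmont", 35, 2600)]
offset = {"Boulder": 0, "Erie": -150_000, "Longmont": -100_000}
cities = [c for c, _, _ in homes]
ages = [a for _, a, _ in homes]
areas = [s for _, _, s in homes]
prices = [400_000 - 2_000 * a + 200 * s + offset[c] for c, a, s in homes]

coefs, r2 = fit_price_model(ages, areas, cities, prices)
for name, value in coefs.items():
    print(f"{name:>9}: {value:,.1f}")
print(f"R^2 = {r2:.3f}")
```

Because the synthetic prices contain no noise, the fit recovers the made-up coefficients and R² is essentially 1; on real sales data, unmodeled factors pull R² down, as with the project's R² ≈ 0.49. For larger datasets, a library least-squares routine such as `numpy.linalg.lstsq` is numerically safer than forming XᵀX explicitly.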

3. K-Means Clustering and Elbow Method

Performed k-means clustering in JMP Pro 17 with k = 11 to test whether price clusters followed geographic patterns.
No significant location-based separation was observed.
Used a Python elbow-method script to find the optimal number of clusters (k = 5).
Even with 5 clusters, location was not a primary driver, implying latent variables (perhaps neighborhood quality, amenities, or environmental factors) shaped market segmentation.
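The elbow method runs k-means over a range of k and tracks the within-cluster sum of squares (WCSS); the "elbow" is where adding clusters stops reducing WCSS much. This is an illustrative sketch, not the project's script: it assumes Lloyd's algorithm with k-means++ seeding, keeps the best of several restarts per k, and locates the elbow automatically as the k whose WCSS lies farthest below the straight line joining the first and last points of the curve (a common "knee" heuristic; the project may instead have read the elbow off its scree plot by eye). The helper names and the synthetic points are hypothetical. With real features such as price, age, and area, each column should be standardized first, since unscaled price would dominate the Euclidean distances.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))


def kmeans_wcss(points, k, rng, iters=100):
    """One run of Lloyd's algorithm with k-means++ seeding; returns the final WCSS."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        # k-means++: sample the next center with probability proportional to D^2
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda j: sq_dist(p, centers[j]))].append(p)
        # Keep the old center if a cluster ends up empty
        new = [centroid(c) if c else centers[j] for j, c in enumerate(clusters)]
        if new == centers:
            break
        centers = new
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def wcss_curve(points, ks, n_init=10, seed=0):
    """Best-of-n_init WCSS for each k, reducing the chance of a poor local minimum."""
    rng = random.Random(seed)
    return [min(kmeans_wcss(points, k, rng) for _ in range(n_init)) for k in ks]


def elbow_k(ks, wcss):
    """Pick the k whose WCSS lies farthest below the chord from first to last point."""
    k0, w0, k1, w1 = ks[0], wcss[0], ks[-1], wcss[-1]
    slope = (w1 - w0) / (k1 - k0)
    gaps = [(w0 + slope * (k - k0)) - w for k, w in zip(ks, wcss)]
    return ks[gaps.index(max(gaps))]


# Hypothetical 2-D feature vectors: three tight, well-separated groups of four points
points = [(c + dx, dy) for c in (0, 100, 200)
          for dx, dy in ((-1, 0), (1, 0), (0, -1), (0, 1))]
ks = list(range(1, 11))
curve = wcss_curve(points, ks)
print([round(w, 1) for w in curve])
print("elbow at k =", elbow_k(ks, curve))
```

On these three well-separated groups the heuristic should return k = 3. On the housing data, a weak or ambiguous elbow is itself informative: it is consistent with the finding that no single variable such as location cleanly separates the market.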

Log-log plot of building value vs. above-ground square footage
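A straight-line trend on a log-log plot indicates an approximate power law, value ≈ a · area^b, where the slope b acts as an elasticity: a 1% increase in above-ground area corresponds to roughly a b% increase in building value. As a minimal sketch, a and b can be estimated by ordinary least squares on the logged data. The function name `loglog_fit` and the synthetic data are hypothetical, and no fitted values from the project are implied.

```python
import math


def loglog_fit(x, y):
    """Least-squares fit of log(y) = log(a) + b*log(x); returns (a, b) for y ≈ a * x**b."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    sxx = sum((u - mx) ** 2 for u in lx)
    b = sxy / sxx  # slope on the log-log plot
    a = math.exp(my - b * mx)  # back-transformed intercept
    return a, b


# Hypothetical data generated from value = 150 * area**1.1 (not the project's data)
areas = [800, 1200, 1800, 2500, 3500]
values = [150 * s ** 1.1 for s in areas]
a, b = loglog_fit(areas, values)
print(f"a = {a:.1f}, b = {b:.3f}")
```

Working in logs also compresses the long right tail of property values, which is one reason log-log plots are a common way to view price-versus-size data.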

Insights & Implications

Engineering Relevance: Regression outcomes offer a predictive tool for urban planning, cost estimation, and housing-demand forecasting. Environmental engineers can apply similar models to link socio-economic data with sustainability metrics.
Policy & Development Application: The model can help local agencies anticipate infrastructure needs and target sustainable urban growth, supporting equitable development across fast-growing communities.

Key Deliverables

Regression model equation and coefficients table (Excel).
Cluster summaries and frequency charts (JMP Pro).
Python script and scree-plot visualization for elbow-method determination.

Results Summary

Predictor	Relationship	Significance (p < 0.05)	Interpretation
Age (yrs)	Negative	✔	Older homes sell for less
Area (ft²)	Positive	✔	Larger homes command higher prices
Location	Negative (vs Boulder)	✔ for most	Peripheral cities → lower average value

Let's Connect!

Email: kevin.gsanchez@outlook.com
Telephone: 786-838-0096
