
Boulder County Housing Price Modeling
This project used regression and clustering to identify how age, size, and location affect Boulder County housing prices. Larger, newer homes cost more, while location alone doesn’t explain market variation.
Tools: Excel (Linear Regression), JMP Pro 17 (K-Means Clustering), Python (Elbow Method)
Skills: Data cleaning, descriptive statistics, regression modeling, clustering, visualization, interpretation
Team: Megan Delasantos, Kevin Gonzalez, Evan Steinmetz
Course: CVEN 3227: Probability, Statistics & Decision
Year: 2023
Read Full Project Report: Boulder Real Estate Pricing Model.pdf
Overview
This project examined how location, building age, and square footage relate to housing sale prices across Boulder County, Colorado. Using over 4,500 real-estate transactions, we combined regression and clustering to quantify how these factors influence property values, insights useful for both civil and environmental engineering decision-making.
Objective
To determine which factors most significantly affect housing prices and evaluate whether location alone defines market clustering across the county.
Methodology
1. Data Processing and Descriptive Analysis
Cleaned and standardized dataset (“Recent Sales by Property and Time Frame,” Boulder County 2023).
Examined continuous variables — price, age, above-ground area — and a categorical variable — location (city).
Calculated mean, median, standard deviation, and coefficient of variation to assess variability.
Key finding: Boulder County’s average home price was $513 k ± $462 k, showing substantial variation driven by mixed housing ages and market diversity.
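The descriptive statistics listed above can be sketched in a few lines of Python. This is an illustration only: the `describe` helper and the `sample_prices_k` values are hypothetical, not the project data, and the last step assumes the reported "± $462 k" is one standard deviation around the $513 k mean.

```python
import statistics


def describe(values):
    """Return mean, median, sample standard deviation, and coefficient of variation."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    std = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return {"mean": mean, "median": median, "std": std, "cv": std / mean}


# Hypothetical sale prices in thousands of dollars (illustration only).
sample_prices_k = [250, 400, 500, 650, 1200]
summary = describe(sample_prices_k)

# Coefficient of variation implied by the reported figures, ASSUMING the
# "± $462 k" is one standard deviation around the $513 k mean.
reported_cv = 462 / 513

print(f"sample CV = {summary['cv']:.2f}, reported CV = {reported_cv:.2f}")
```

Under that assumption, the reported figures imply a coefficient of variation of about 0.90, meaning the standard deviation is roughly 90% of the mean.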
2. Linear Regression Modeling
Built a multi-variable regression model relating sale price to age, square footage, and dummy-coded location variables (with Boulder as the reference category).
Achieved R² ≈ 0.49, meaning the model explains about half of the variance in sale price; we judged this sufficient for socio-economic data with human-driven variability.
Identified a negative coefficient for age (older homes → lower price) and a positive coefficient for area (larger homes → higher price).
Found Boulder consistently outpriced surrounding cities; e.g., homes in Erie averaged $600 k less than equivalent Boulder homes.
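The regression setup described above can be sketched as follows. The project fit its model in Excel; this is a minimal pure-Python illustration that assumes ordinary least squares solved through the normal equations. The records, the placeholder city name `OtherCity`, and the `true_beta` coefficients are invented for the example and are not the project's data or results.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def fit_ols(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) beta = X^T y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def design_row(age, area, city, cities, baseline):
    """Intercept, age, area, then one 0/1 dummy per non-baseline city."""
    dummies = [1.0 if city == c else 0.0 for c in cities if c != baseline]
    return [1.0, float(age), float(area)] + dummies


def r_squared(rows, targets, beta):
    """Share of the variance in the targets explained by the fitted model."""
    preds = [sum(b * x for b, x in zip(beta, r)) for r in rows]
    mean_y = sum(targets) / len(targets)
    ss_res = sum((y - p) ** 2 for y, p in zip(targets, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in targets)
    return 1.0 - ss_res / ss_tot


# Synthetic records (age in years, area in ft², city); NOT the project data.
cities = ["Boulder", "Erie", "OtherCity"]
records = [
    (10, 1500, "Boulder"), (40, 2200, "Boulder"), (25, 1800, "Boulder"),
    (5, 2000, "Erie"), (30, 1600, "Erie"), (15, 2400, "Erie"),
    (50, 1400, "OtherCity"), (20, 2100, "OtherCity"), (35, 1900, "OtherCity"),
]
# Made-up coefficients used only to generate noiseless prices (in $k):
# intercept, age, area, Erie dummy, OtherCity dummy.
true_beta = [400.0, -2.0, 0.25, -150.0, -100.0]

X = [design_row(age, area, city, cities, "Boulder") for age, area, city in records]
y = [sum(b * x for b, x in zip(true_beta, row)) for row in X]

beta = fit_ols(X, y)
print([round(b, 3) for b in beta])
print(round(r_squared(X, y, beta), 3))
```

Because Boulder is the baseline, each city dummy's coefficient reads as the price difference relative to a Boulder home of the same age and area, which is how the Erie comparison above is interpreted. The synthetic prices contain no noise, so the fit here is essentially perfect; real sales data would not be.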
3. K-Means Clustering and Elbow Method
Performed clustering in JMP Pro 17 with k = 11 to test if price clusters followed geographic patterns.
No significant location-based separation was observed.
Used a Python elbow-method script to find the optimal number of clusters (k = 5).
Even with 5 clusters, location was not a primary driver, implying latent variables (perhaps neighborhood quality, amenities, or environmental factors) shaped market segmentation.
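The elbow method mentioned above can be sketched as follows. This is not the project's script: it is a minimal pure-Python k-means (Lloyd's algorithm) with a deterministic farthest-point initialization, run on synthetic 2-D points that stand in for standardized features. The `pick_elbow` rule and its 10% threshold are an assumed heuristic; in practice the elbow is usually read off a plot of inertia against k.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def init_centers(points, k):
    """Deterministic farthest-point initialization (a stand-in for random starts)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns (centers, labels, inertia)."""
    centers = init_centers(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties out
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    inertia = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return centers, labels, inertia


def pick_elbow(inertias, min_gain=0.10):
    """Smallest k whose next cluster cuts inertia by less than min_gain of the k=1 value."""
    total = inertias[0]
    for i in range(len(inertias) - 1):
        if (inertias[i] - inertias[i + 1]) / total < min_gain:
            return i + 1  # inertias[0] corresponds to k = 1
    return len(inertias)


# Synthetic data: three tight, well-separated groups of four points each.
blob_centers = [(0.0, 0.0), (10.0, 10.0), (25.0, 0.0)]
offsets = [(-0.5, -0.5), (0.5, -0.5), (-0.5, 0.5), (0.5, 0.5)]
points = [(cx + dx, cy + dy) for cx, cy in blob_centers for dx, dy in offsets]

# Within-cluster sum of squares (inertia) for k = 1..5.
inertias = [kmeans(points, k)[2] for k in range(1, 6)]
for k, value in enumerate(inertias, start=1):
    print(k, round(value, 2))
print("elbow at k =", pick_elbow(inertias))
```

On this toy data the inertia drops sharply up to k = 3 and barely changes afterwards, so the heuristic settles on three clusters. Real housing data rarely separates this cleanly, which is why the choice of k usually involves some judgment.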


Insights & Implications
Engineering Relevance: Regression outcomes offer a predictive tool for urban planning, cost estimation, and housing-demand forecasting. Environmental engineers can apply similar models to link socio-economic data with sustainability metrics.
Policy & Development Application: Helps local agencies predict infrastructure needs and target sustainable urban growth, supporting equitable development across fast-growing communities.
Key Deliverables
Regression model equation and coefficients table (Excel).
Cluster summaries and frequency charts (JMP Pro).
Python script and scree-plot visualization for elbow-method determination.
Results Summary
Predictor | Relationship | Significance (p < 0.05) | Interpretation |
Age (yrs) | Negative | ✔ | Older homes sell for less |
Area (ft²) | Positive | ✔ | Larger homes command higher price |
Location | Negative (vs Boulder) | ✔ for most | Peripheral cities → lower average value |