A Briefing from COCO Consultancy
2026-03-17
Property sales and structural housing characteristics
Source: Philadelphia Property Sales
18,485 residential property sales from 2023 to 2024
Structural housing characteristics include home size, year built, bedrooms, bathrooms, interior and exterior condition, and other housing features
Neighborhood socioeconomic characteristics
Source: U.S. Census Bureau ACS
Data include median household income and households without a vehicle
Neighborhood amenities and accessibility
Source: OpenDataPhilly
Data include schools, major parks, transit stops, crime incidents, and vacant properties
Model 1: Structural
Housing characteristics only, including livable area, bedrooms, bathrooms, garage spaces, fireplaces, interior condition, and housing age
Model 2: + Spatial and socioeconomic context
Added crime, distance to school, distance to park, distance to SEPTA, nearby vacancy, median income, and households without a vehicle
Model 3: + Non-linear location effect
Added distance to Center City and its squared term
Model 4: + Interaction
Added an interaction between SEPTA distance and the share of households without a vehicle
Model 5: + Census tract fixed effects
Added tract fixed effects to capture unobserved neighborhood differences
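The five nested specifications above can be sketched as cumulative R-style formula strings. This is only an illustration: the column names, the `sale_price` response, and the `build_formulas` helper are hypothetical, since the briefing does not list the actual variable names or state whether price was transformed.

```python
# Hypothetical column names; the briefing does not give the real ones.
STEPS = [
    ("model_1", ["livable_area", "bedrooms", "bathrooms", "garage_spaces",
                 "fireplaces", "interior_condition", "house_age"]),
    ("model_2", ["crime_count", "dist_school", "dist_park", "dist_septa",
                 "vacant_nearby", "median_income", "pct_no_vehicle"]),
    # Distance to Center City plus its squared term (non-linear location effect).
    ("model_3", ["dist_center_city", "I(dist_center_city ** 2)"]),
    # Interaction between SEPTA distance and share of households without a vehicle.
    ("model_4", ["dist_septa:pct_no_vehicle"]),
    # Census tract fixed effects as a categorical term.
    ("model_5", ["C(census_tract)"]),
]


def build_formulas(response="sale_price"):
    """Return one formula per model; each model keeps all earlier terms."""
    formulas = {}
    terms = []
    for name, added in STEPS:
        terms = terms + added  # cumulative: every model nests the previous one
        formulas[name] = f"{response} ~ " + " + ".join(terms)
    return formulas


if __name__ == "__main__":
    for name, formula in build_formulas().items():
        print(name, "->", formula)
```

Building the formulas cumulatively makes the nesting explicit, so any gain in fit from one model to the next can be attributed to the terms added at that step.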
| Model | R² (CV) | RMSE (CV, $) | MAE (CV, $) |
|---|---|---|---|
| Structural | 0.555 | 135,959 | 92,434 |
| Spatial + Socioeconomic | 0.687 | 114,037 | 70,907 |
| Non-linear term | 0.702 | 111,347 | 68,884 |
| Interaction | 0.702 | 111,302 | 68,918 |
| Fixed Effect | 0.771 | 97,499 | 57,279 |
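
For reference, the three metrics in the table can be computed from held-out actual and predicted prices as in the minimal sketch below. The `cv_metrics` helper and the three example prices are made up for illustration; the briefing does not state the number of folds or whether metrics were pooled or averaged across folds.

```python
import math


def cv_metrics(actual, predicted):
    """Compute R², RMSE, and MAE for paired actual and predicted prices."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mean_actual = sum(actual) / n
    ss_res = sum(e ** 2 for e in errors)                   # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)   # total sum of squares
    return {
        "r2": 1 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
    }


if __name__ == "__main__":
    # Made-up prices, not taken from the Philadelphia data.
    actual = [200_000, 300_000, 400_000]
    predicted = [210_000, 290_000, 420_000]
    print(cv_metrics(actual, predicted))
```

RMSE squares each error before averaging, so it penalizes large misses more heavily than MAE; that is why RMSE is larger than MAE in every row of the table.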
The final model, with census tract fixed effects, performs best. Its cross-validated R² of 0.771 means it explains about 77% of the variation in home prices. Its RMSE of about $97,500 is roughly 28% lower than the baseline structural model's $136,000, and its mean absolute error is about $57,300.
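The improvement figure can be checked directly from the RMSE column of the table. The sketch below uses only the reported values; the `rmse_reduction` helper name is illustrative.

```python
# Cross-validated RMSE values copied from the model comparison table.
rmse_cv = {
    "Structural": 135_959,
    "Spatial + Socioeconomic": 114_037,
    "Non-linear term": 111_347,
    "Interaction": 111_302,
    "Fixed Effect": 97_499,
}

baseline = rmse_cv["Structural"]


def rmse_reduction(model):
    """Fractional drop in RMSE relative to the structural baseline."""
    return (baseline - rmse_cv[model]) / baseline


if __name__ == "__main__":
    for model in rmse_cv:
        print(f"{model}: {rmse_reduction(model):.1%} lower RMSE than baseline")
```

For the fixed-effects model this gives (135,959 − 97,499) / 135,959 ≈ 0.283, which matches the roughly 28% improvement reported.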
Residual is calculated as actual sale price minus predicted sale price.
Census Tract 1, highlighted in yellow, has the largest residual. It is the hardest neighborhood for the model to predict accurately, and because residuals there are positive, the model tends to underpredict prices in this tract.
It is located in Old City, which may reflect unique neighborhood characteristics, historic housing stock, or localized market dynamics that are not fully captured by the included variables.
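A tract-level residual summary of this kind can be sketched as follows. The tract labels and prices are invented for illustration, and averaging residuals within each tract is an assumption; the briefing does not say how residuals were aggregated for the map.

```python
from collections import defaultdict


def mean_residual_by_tract(sales):
    """sales: iterable of (tract, actual_price, predicted_price) tuples."""
    totals = defaultdict(lambda: [0.0, 0])  # tract -> [sum of residuals, count]
    for tract, actual, predicted in sales:
        totals[tract][0] += actual - predicted  # residual = actual - predicted
        totals[tract][1] += 1
    return {tract: total / count for tract, (total, count) in totals.items()}


if __name__ == "__main__":
    # Made-up sales, not taken from the Philadelphia data.
    sales = [
        ("tract_a", 650_000, 500_000),
        ("tract_a", 720_000, 610_000),
        ("tract_b", 250_000, 260_000),
        ("tract_c", 310_000, 300_000),
    ]
    means = mean_residual_by_tract(sales)
    hardest = max(means, key=lambda t: abs(means[t]))
    direction = "underpredicted" if means[hardest] > 0 else "overpredicted"
    print(hardest, means[hardest], direction)
```

A positive mean residual means actual prices exceed predictions on average, so the tract is underpredicted; a negative value means the opposite.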
The model predicts market value, not intrinsic or socially fair value
Because it is estimated from observed sale prices, the model reflects how the market values housing rather than what homes may be worth in a broader social or policy sense.
Some important local factors are still missing
The model does not fully capture factors such as renovation quality, historic character, and school reputation.
The model is not equally accurate everywhere
Some neighborhoods remain harder to predict and need closer local review.
Next step
Add more detailed neighborhood-level factors and update the model regularly, for example annually, with the latest property value data.