Research on factors influencing the housing price index: taking the USA as an example

This article aims to identify the factors that affect housing prices. Using 241 observations from the United States covering 2003 to 2022, the significant factors were analyzed with multiple linear regression. The results support the hypothesis that the five selected variables are related to housing prices. The article also collected additional data related to the housing price index for multivariate analysis, and used exploratory factors to test the significance of each variable. To check the validity of this procedure, the study compared the variance inflation factor (VIF) values and significance levels of these variables. Correlation analysis was used to test the relationship between DATE and five variables (income, housing subsidies, unemployment rate, houses for sale or sold, and total houses), as well as the magnitude of the impact of these factors on the housing price index. Overall, the volatility of the US housing price index can be assessed by the degree to which these factors affect it.


Introduction
The global economic landscape is experiencing unprecedented changes, with rapid advancements in economy and technology leading to frequent fluctuations in the housing price index. Several factors, including regional demand variations and policy differences, contribute to the unpredictability of these fluctuations. A stable price index environment is crucial for the health of the market economy, making it vital to comprehend the trends and determinants of price index volatility to make informed decisions. This knowledge is increasingly sought after across various sectors.
Given that real estate plays a fundamental role in economic growth, shifts in the real estate value index significantly impact the national economy and reflect on living standards. Research by He et al. highlighted that workforce costs, interest rates, and geographic locations substantially influence China's real estate market prices [1]. Furthermore, Zhao et al. noted that local regulations affect real estate values, suggesting the need for local governments to adopt appropriate and scientific approaches to real estate management [2]. Real estate prices are also influenced by other factors, as shown by Markus Baldauf et al., who demonstrated how housing prices vary in response to climate risks. They linked pricing to predictions of individual home flooding risks and climate change sentiments, revealing varied opinions on climate change's long-term effects on housing prices [3]. The aim of exploring the relationship between these diverse variables and prices is to enhance the accuracy and logical consistency of price forecasts.
In their research, numerous scholars employ different models to predict prices. Pai et al. compared four machine learning models for forecasting real estate prices: least squares support vector regression (LSSVR), classification and regression tree (CART), general regression neural network (GRNN), and back propagation neural network (BPNN). They found that LSSVR had the lowest mean absolute percentage error (MAPE), making it a promising forecasting technique [4]. Mawuli et al. introduced a logical smooth transition autoregressive fractional integration process for modeling and forecasting US house price volatility, showing that incorporating Markov switched multifractal (MSM) and the FIGARCH framework could enhance predictive accuracy [5].
Additionally, some researchers focus on analyzing large datasets to forecast real estate values. Archana et al. collected extensive data, including home sales from 2006 to 2010 in Ames, Iowa, to develop models for predicting final home sale prices [6,7]. Similarly, Wang et al. applied a housing characteristic price model in Shenzhen, using 100 housing samples to prove the model's effectiveness, providing valuable insights for developers, consumers, and policy assessment [8].
This article aims to analyze the factors influencing the real estate price index using multiple linear regression models, offering recommendations for investors and government decision-making based on the findings. The growing interest in housing price index data among scholars underlines the importance of such studies in addressing current challenges in real estate markets.

Data source
The data are taken from the "Factors influencing US houses" dataset on Kaggle. The dataset contains a US housing price index calculated in US dollars, with a total of 241 observations from January 2003 to December 2022.

Variable selection
House price indexes are the result of changes in national economic development and government intervention; they reflect the overall trend of national development and are influenced by government policy and major economic events [9,10]. The variables are listed in Table 1. Major events occur irregularly and unpredictably, so prices can change frequently and the magnitude of the changes is difficult to determine, as illustrated in Figure 1. Figure 1 shows that the house price index rose from January 2003 to March 2007 and then declined from April 2007 to February 2012, by which time it had fallen almost to its level of nine years earlier. This can be seen as a microcosm of the 2007 subprime mortgage crisis. Afterward, the house price index showed a stable upward trend until August 2022. From August 2022 to December 2022, the house price index showed a stable downward trend.

Data processing
We obtained a US housing price index dataset from Kaggle covering January 2003 to December 2022, which includes fields such as housing_subsidies, building_permits, const_price_index, GDP, house_for_sale_or_sold, income, interest_rate, total_houses, and the housing price index. To ensure that all features are on the same scale, we standardized the housing area. We also integrated other data from real estate companies, such as total_const_spending, urban_population, and unemployment_rate, to enrich the dataset and include more detailed housing information. We then conducted a consistency check to ensure that the relationship between the housing price index and housing area is reasonable. Finally, we stored the processed data in an Excel file for further analysis and modeling.
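The standardization step can be illustrated with a minimal sketch. It assumes z-score scaling (subtract the mean, divide by the population standard deviation), which the text does not specify; the function name `standardize` and all values are hypothetical and are not taken from the Kaggle dataset.

```python
from statistics import mean, pstdev

def standardize(values):
    """Return z-scores: (x - mean) / population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardize a constant column")
    return [(v - mu) / sigma for v in values]

# Hypothetical illustrative values, NOT real observations from the dataset.
columns = {
    "income": [10000.0, 10500.0, 11200.0, 12100.0],
    "unemployment_rate": [5.8, 5.1, 4.6, 9.3],
}

# After scaling, every column has mean 0 and standard deviation 1,
# so columns measured in different units become comparable.
scaled = {name: standardize(vals) for name, vals in columns.items()}
```

Scaling changes the units of the regression coefficients but not the fit itself, so it mainly helps when comparing the relative size of effects across variables.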

Correlation analysis
The analysis in this paper shows that many factors influence the housing price index. As Table 2 shows, the Pearson correlation coefficients between the various factors and housing prices reveal significant relationships. Specifically, correlation analysis was used to study the connection between DATE and five variables: income, housing subsidies, unemployment rate, houses for sale or sold, and total houses. Income and DATE have a strong positive correlation of 0.830. Housing subsidies and DATE are even more closely linked, with a correlation coefficient of 0.956.
The correlation between the unemployment rate and DATE is insignificant, with a coefficient of 0.047 and a p-value of 0.656, indicating no meaningful relationship. The relationship between houses for sale or sold and DATE is moderately positive, with a correlation coefficient of 0.478.
The total number of houses and DATE have a very strong positive correlation, with a coefficient of 0.967. In summary, the analysis highlights positive correlations between DATE and income, housing subsidies, houses for sale or sold, and total houses, with the unemployment rate showing no significant correlation.
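For readers who want to reproduce this kind of coefficient, the following is a minimal sketch of the Pearson correlation and its associated t statistic. The function names and the sample values are hypothetical; the data are illustrative only and do not come from the study.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two sequences of equal length >= 3")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic for H0: no correlation, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))

# Hypothetical example: a month index standing in for DATE,
# and a made-up series standing in for one of the variables.
month_index = [1, 2, 3, 4, 5, 6]
series = [10.2, 10.9, 11.1, 12.4, 12.6, 13.9]
r = pearson_r(month_index, series)
t = t_statistic(r, len(series))
```

A coefficient near 1 for a variable against DATE mostly indicates a time trend, so it is worth reading such values as "this variable grew over the period" rather than as evidence of an effect on prices.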

Multiple linear regression
The preceding analysis shows that housing prices are affected by many influences at once, and home buyers today evaluate a house from many different angles. After analyzing the Pearson correlation matrix of the various factors, multiple regression analysis was conducted (Table 3). The general mathematical model for multiple linear regression is:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$$

In the above formula, $\beta_0$ is a constant term and $\varepsilon$ is a residual term. In the presented model, the independent variables are housing_subsidies, house_for_sale_or_sold, unemployment_rate, const_price_index, and GDP, with home_price_index serving as the dependent variable. The model equation is expressed as follows:

$$\text{home\_price\_index} = \beta_0 + \beta_1\,\text{housing\_subsidies} + \beta_2\,\text{house\_for\_sale\_or\_sold} + \beta_3\,\text{unemployment\_rate} + \beta_4\,\text{const\_price\_index} + \beta_5\,\text{GDP} + \varepsilon$$

The $R^2$ value of 0.879 indicates that approximately 87.9% of the variation in home_price_index can be explained by housing_subsidies, house_for_sale_or_sold, unemployment_rate, const_price_index, and GDP. The model passes the F-test (F=137.018, p<0.001), suggesting that at least one of these five factors significantly influences home_price_index.
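The fitting procedure behind such a model can be sketched in a few lines. The code below is an illustrative ordinary least squares fit via the normal equations in plain Python; it is not the software the study used, all function names are hypothetical, and the data are made up and noise-free so that the known coefficients are recovered. In practice a statistics library would normally be used instead.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system (perfectly collinear predictors?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_ols(rows, y):
    """Fit y = b0 + b1*x1 + ... + bk*xk. Returns ([b0, ..., bk], R^2)."""
    X = [[1.0] + list(r) for r in rows]          # prepend the constant term
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(X, y)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in X]
    ybar = sum(y) / len(y)
    ss_res = sum((t - f) ** 2 for t, f in zip(y, fitted))
    ss_tot = sum((t - ybar) ** 2 for t in y)
    return beta, 1 - ss_res / ss_tot

def f_statistic(r2, n, k):
    """Overall F statistic for a model with k predictors and n observations."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

# Hypothetical noise-free data generated from y = 1 + 2*x1 - 3*x2.
rows = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6], [6, 5]]
y = [1 + 2 * a - 3 * b for a, b in rows]
beta, r2 = fit_ols(rows, y)   # beta is close to [1, 2, -3], r2 close to 1
```

The F statistic is kept as a separate helper because it depends only on $R^2$, the number of observations, and the number of predictors.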
Furthermore, the model was tested for multicollinearity using the variance inflation factor (VIF); a VIF value exceeding 10 indicates a collinearity problem. Where such problems arise, ridge regression or stepwise regression can be employed, and highly correlated independent variables should be examined closely, eliminated where appropriate, and the model re-estimated.
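The VIF check can be sketched as follows, assuming the standard definition $\mathrm{VIF}_j = 1 / (1 - R_j^2)$, where $R_j^2$ comes from regressing predictor $j$ on the remaining predictors. The function names and data are hypothetical; the third predictor is deliberately constructed as a near copy of the first so that the threshold of 10 is exceeded.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system (perfectly collinear predictors?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def r_squared(rows, y):
    """R^2 of an OLS regression of y on the given predictor rows plus a constant."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(X, y)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in X]
    ybar = sum(y) / len(y)
    ss_res = sum((t - f) ** 2 for t, f in zip(y, fitted))
    ss_tot = sum((t - ybar) ** 2 for t in y)
    return 1 - ss_res / ss_tot

def vif(columns):
    """Return one VIF per predictor column (columns must have equal length)."""
    n = len(columns[0])
    result = []
    for j, target in enumerate(columns):
        others = [c for i, c in enumerate(columns) if i != j]
        rows = [[c[t] for c in others] for t in range(n)]
        r2 = r_squared(rows, target)
        result.append(float("inf") if r2 >= 1 else 1 / (1 - r2))
    return result

# Hypothetical predictors: x3 is x1 plus small noise, so both should be flagged.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
x3 = [1.1, 1.9, 3.05, 4.0, 4.95, 6.1]
vifs = vif([x1, x2, x3])
flagged = [j for j, v in enumerate(vifs) if v > 10]   # indices above the threshold
```

Dropping one variable from each flagged pair and recomputing the VIFs is the simplest form of the "eliminate and reanalyze" step described above.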
A detailed analysis of the impact of each factor on home_price_index yields the following findings. Housing subsidies have a significant positive effect on home_price_index, with a regression coefficient of 2.392 (t=2.234, p=0.028). Similarly, the number of houses for sale or sold significantly raises home_price_index, with a regression coefficient of 0.249 (t=5.959, p<0.01). GDP also significantly raises home_price_index, with a regression coefficient of 0.016 (t=4.579, p<0.01). In contrast, the unemployment rate adversely affects home_price_index, with a regression coefficient of -6.771 (t=-6.856, p<0.01). The construction price index, however, has no statistically significant effect on home_price_index (regression coefficient of 0.075, t=0.563, p=0.575). In summary, housing_subsidies, house_for_sale_or_sold, and GDP positively influence home_price_index, the unemployment rate negatively influences it, and the construction price index has no significant effect.
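The link between a t statistic and its p-value can be sketched numerically. The code below integrates the Student t density with Simpson's rule to obtain a two-sided p-value; the function names are hypothetical. The degrees of freedom used in the example, 241 - 5 - 1 = 235, follow from the sample size and number of predictors stated in the text, but this is an assumption about how the reported p-values were computed, so the output need not match the reported values exactly.

```python
import math

def t_pdf(x, df):
    """Density of the Student t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t, df, steps=2000):
    """P(|T| >= |t|), using Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3            # probability mass between 0 and |t|
    return max(0.0, 1 - 2 * area)

# ASSUMPTION: residual degrees of freedom = n - k - 1 with n = 241, k = 5.
df = 241 - 5 - 1
p_const_price = two_sided_p(0.563, df)    # t reported for the construction price index
p_unemployment = two_sided_p(-6.856, df)  # t reported for the unemployment rate
significant = {"const_price_index": p_const_price < 0.05,
               "unemployment_rate": p_unemployment < 0.05}
```

Under this assumption the construction price index is far from the 0.05 threshold and the unemployment rate is far below it, which agrees with the qualitative conclusions in the text.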

Conclusion
The study selected 241 samples from the dataset, covering January 2003 to December 2022 and including 5 variables. Its method, multiple linear regression analysis, is accurate, effective, and comprehensive, because it analyzes multiple factors jointly and is supported by the Pearson correlation coefficients obtained for each variable.
In the analysis phase, this article uses a multiple linear regression model to identify possible relationships between variables and housing prices. To gain more information, this study considered exploratory factors and added interaction terms with coefficients to the equation. The factors found to have a positive impact on housing prices are income, housing subsidies, houses for sale or sold, and total houses, and these are the main factors. The unemployment rate shows no significant correlation in the Pearson analysis, although its regression coefficient is significantly negative.
This research gives people a reference for evaluating their dream house from different perspectives, and thus for setting an overall housing budget. However, there are still some shortcomings, such as the inability to establish causal relationships between variables, the relatively small sample size, and the fact that the data are not the latest version. To improve on this, future work should collect new data and use the control variable method to identify possible causal relationships between housing prices and these factors. The reference factors provided in this article will have different impacts depending on region and personal preference, so specific analysis is needed in real situations.

Table 1. List of variables.
5 National income of individuals

Figure 1. The trend of housing price index.