Case Study | Prediction model for imputation of missing data for young homeless people

Research overview

In England, youth homelessness is poorly documented in official statistics. To describe the problem more accurately, a leading UK charity for young homeless people created a comprehensive Youth Homelessness Databank, but found that the quality of the information provided was unsatisfactory.

For campaigning purposes, a clearer picture of youth homelessness and its scale is needed. To estimate the number of young people presenting as homeless to their local authorities, a model was required to fill the gaps in the Youth Homelessness Databank and to help raise public awareness of the issue as a whole.

The objective was to build a model that would predict the missing values in the databank. To this end, the stepwise method was judged to be the most promising.

How the research helps

The research analysed data from the Youth Homelessness Databank, which consists of five years of Freedom of Information (FOI) returns from questionnaires sent to all local authorities in England. The aim was to create a model that accurately estimates the number of young people presenting as homeless in those local authorities that did not respond to the FOI request. Although the research was written up in 2017, it and the resulting model are based on data for 2015.

The research and resulting model will assist the youth homelessness charity the Centre worked with by providing a more realistic estimate of the level of youth homelessness in England, which is believed to be significantly higher than official statistics suggest.

By highlighting the gap between the official figures and the number of young people who actually present as homeless or at risk of homelessness, charities working in this area aim to promote changes in Government policy and raise public awareness of the issue.


This research represents a detailed examination of the data available from local authorities in England on young people who are deemed homeless or at risk of homelessness. It provides an important analysis of the available data, identifies a number of inconsistencies and irregularities, and highlights where more data should be collected.

The findings have been used to produce and test a model which can be used by charities working in this area to better understand the number of young people who are homeless or at risk of homelessness.

The data

The research makes use of the following data:

Primary Data

A questionnaire was used to gather information on the number of 16-24-year-olds who presented themselves as homeless or at risk of homelessness, as well as their reasons for leaving their last settled home. Predictors were chosen on the basis of the literature review undertaken.

Secondary Data

Statistical data was gathered from providers such as NOMIS and the Office for National Statistics (ONS) to quantify the various risk factors.

When building the model, a number of variables that might lead to homelessness were considered. These included area characteristics, migration, poverty, poor health, substance abuse, poor education, housing, immigration and ethnicity.

Variables that would have been included had data been available include relationship breakdown, domestic violence and de-institutionalisation (e.g. care leavers or ex-prisoners).

The research

Discussions with those working in the area revealed a lack of commitment to predicting, or even estimating, the real level of homelessness. Although previous research had been published in this area, it was based on counting existing levels of vagrancy, and no prediction models had previously been attempted in the UK.

A literature review was conducted to determine the possible risk factors that lead to homelessness, and a number of models were considered to meet the requirements.

A stepwise regression model was created to impute the missing data; however, a number of limitations prevented the model from producing more robust results. These included poor data quality; differences in how young homeless people are defined, which significantly skewed the available data; irregularities in the data; and a lack of available information at local authority level. Although the model was not stable, it is strongly believed that more up-to-date and appropriate statistical variables for the risk factors would improve the validity of the model and increase the number of statistically significant predictors.
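The case study does not say which selection criterion, software or final variables the stepwise model used. As a rough illustration of how stepwise-regression imputation can work, here is a minimal Python sketch that assumes forward selection scored by the Akaike information criterion (AIC), with ordinary least squares fitted on the authorities that did respond. The function names (`forward_stepwise`, `impute`), the field names and the toy figures are all hypothetical and are not taken from the Databank.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical layout: one record per local authority, holding predictor values
# plus the target count, which is None where the authority did not respond.
Record = Dict[str, Optional[float]]


def ols_fit(X: List[List[float]], y: List[float]) -> List[float]:
    """Ordinary least squares via the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    k = len(X[0])
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * t for row, t in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        if abs(A[col][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * p for a, p in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def design(rows: Sequence[Record], names: Sequence[str]) -> List[List[float]]:
    """Design matrix: an intercept column followed by the chosen predictors."""
    return [[1.0] + [float(r[n]) for n in names] for r in rows]


def aic(rows: Sequence[Record], y: List[float], names: Sequence[str]) -> float:
    """AIC of a Gaussian linear model, up to a constant: n*ln(RSS/n) + 2p."""
    X = design(rows, names)
    b = ols_fit(X, y)
    rss = sum((t - sum(bi * xi for bi, xi in zip(b, x))) ** 2 for x, t in zip(X, y))
    n = len(y)
    return n * math.log(max(rss, 1e-12) / n) + 2 * len(b)


def forward_stepwise(rows: Sequence[Record], y: List[float],
                     candidates: Sequence[str]) -> List[str]:
    """Greedy forward selection: repeatedly add the predictor that lowers AIC
    the most; stop when no remaining predictor lowers it."""
    selected: List[str] = []
    best = aic(rows, y, selected)
    while True:
        remaining = [c for c in candidates if c not in selected]
        if not remaining:
            return selected
        scores = {c: aic(rows, y, selected + [c]) for c in remaining}
        name, score = min(scores.items(), key=lambda kv: kv[1])
        if score >= best:
            return selected
        selected.append(name)
        best = score


def impute(records: Sequence[Record], target: str,
           candidates: Sequence[str]) -> Tuple[List[str], List[Record]]:
    """Fit on records where `target` is known, then predict it for the rest."""
    observed = [r for r in records if r[target] is not None]
    y = [float(r[target]) for r in observed]
    chosen = forward_stepwise(observed, y, candidates)
    b = ols_fit(design(observed, chosen), y)
    filled: List[Record] = []
    for r in records:
        row = dict(r)  # copy so the input records are left unchanged
        if row[target] is None:
            x = [1.0] + [float(r[n]) for n in chosen]
            # A count of young people cannot be negative, so clip at zero.
            row[target] = max(0.0, sum(bi * xi for bi, xi in zip(b, x)))
        filled.append(row)
    return chosen, filled


if __name__ == "__main__":
    # Invented toy figures: six responding authorities and one non-responder.
    data: List[Record] = [
        {"unemployment": 3.1, "deprivation": 12.0, "presenting": 95.0},
        {"unemployment": 4.5, "deprivation": 18.0, "presenting": 140.0},
        {"unemployment": 2.2, "deprivation": 9.0, "presenting": 70.0},
        {"unemployment": 6.0, "deprivation": 25.0, "presenting": 190.0},
        {"unemployment": 5.4, "deprivation": 17.0, "presenting": 160.0},
        {"unemployment": 3.8, "deprivation": 20.0, "presenting": 120.0},
        {"unemployment": 5.0, "deprivation": 20.0, "presenting": None},
    ]
    chosen, filled = impute(data, "presenting", ["unemployment", "deprivation"])
    print("selected predictors:", chosen)
    print("imputed count:", round(filled[-1]["presenting"], 1))
```

Classical stepwise regression can also drop variables at each step, or use p-value thresholds instead of AIC; forward-only AIC selection is used here purely to keep the sketch short. With few responding authorities and many candidate risk factors, a selection like this can be unstable, which is consistent with the limitations described above.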

It is believed that better results could be achieved. A non-statistical linear model might be more beneficial. Recommendations were also made that the abnormalities in the data be discussed at source with those who gather it, and other data sources, especially from the private sector, were suggested for consideration.

Further information and links

Ivanov, Martin. Prediction model for imputation of missing values of youth homelessness provided by the local authorities in England

Youth Homeless Data Bank

Report author

Martin Ivanov


Dr Dila Agrizzi – University of Essex