West Yorkshire Crime Analysis

Published 22 Jun 2025. Written by Hal Kolb.

Executive Summary

Crime Type

Crime type is the primary feature of interest for this analysis, and the stacked area plot below shows the absolute count of each crime type for each month in the dataset.

Note that each month's data is plotted on the last day of that month, so each point represents the total for the month ending on that date.
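Below is a minimal sketch of this month-end convention, using only the Python standard library. It assumes the "Month" column holds strings such as "2020-04", which is the format of the police.uk downloads. The helper name `month_end` is hypothetical and does not come from the notebook.

```python
import calendar
from datetime import date


def month_end(month: str) -> date:
    """Return the last calendar day of a "YYYY-MM" month string."""
    year, mon = (int(part) for part in month.split("-"))
    # calendar.monthrange returns (weekday of the 1st, number of days in the month).
    last_day = calendar.monthrange(year, mon)[1]
    return date(year, mon, last_day)


# Each monthly total is plotted at the end of the month it summarises.
for m in ["2020-04", "2020-05", "2020-09"]:
    print(m, "->", month_end(m))
```

In pandas, adding `pd.offsets.MonthEnd(0)` to the first day of each month produces the same shift.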

Figure 1: Stacked area plot of the monthly count of each crime type. The height of each shaded band indicates the number of occurrences of that crime type in a given month.

The plot shows a notably rapid increase, and subsequent decrease, in reported crime over such a short period. Given that the data begins in April 2020, these trends are likely to have been affected by COVID-19 and the corresponding national lockdown.

To examine the link between these factors, the plot below shows the data alongside the dates of key lockdown changes.

Figure 2: Plot of the counts of each crime type per month with labels to indicate key COVID-19 lockdown changes. Lockdown timeline data from: Institute for Government (https://www.instituteforgovernment.org.uk/sites/default/files/timeline-lockdown-web.pdf)

The timeline on this plot shows that the number of crimes in the dataset was low during the peak of the lockdown and began to rise following the easing of restrictions, before falling again as measures were reintroduced during the second wave.

For a better sense of the changes in each specific crime type, the two plots below show the month-on-month changes in crime counts, both in absolute terms and as a percentage of the previous month's count. These figures are interactive, and each crime type can be shown or hidden by clicking its name in the legend.

Figure 3: Plot of the absolute change in the number of occurrences of each crime type relative to the previous month.
Figure 4: Plot of the percentage change in the number of occurrences of each crime type relative to the previous month.

Unexpected or Unusual Features

Crime type is the primary feature of interest for this analysis and contains some unusual features and missing data. The crime types are described in police.uk's FAQ at: https://www.police.uk/pu/about-police.uk-crime-data. All 14 documented types are present in the dataset. However, the data also contains an additional type that is not documented in the FAQ: "exclusive". There are 119 entries with this crime type. They occur roughly 20 times each month at a range of locations, with various LSOA codes and names and differing outcomes. The meaning of this crime type is unclear. Beyond occurring almost the same number of times each month, these entries do not exhibit a clear pattern. Given the lack of documentation regarding the meaning of this categorisation and my lack of familiarity with policing, these data were excluded from the analysis.

There are also 2000 entries with null values for the crime type. All of these null entries come from the May and September files, which contain exactly 1000 each. The exact counts of these null entries indicate a possible systematic error that caused them to be added. These entries also have null values in every field except Crime ID, so it is not possible to infer or impute the missing crime types. As such, these entries were also excluded from the analysis.


This section details the chronological process of loading, understanding, plotting and evaluating the data, with accompanying code and outputs. The full source code is available in the Jupyter notebook.

1. Load Data

First load a single csv to investigate the structure of the data.

Next, add a general function to load in all csvs from the data folder.

This works when each directory contains only a single csv - as is the case in the provided data. However, to make the process more general it would be useful to support multiple csvs per folder.

This works, but it can be improved by using a list comprehension for efficiency.
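Below is a minimal sketch of such a loader, using only the standard library. It assumes one sub-folder per month beneath a single data folder, as in the police.uk downloads. The function names `read_rows` and `load_all_csvs` are hypothetical. `Path.rglob` picks up any number of CSVs per folder, and the list comprehension flattens the rows from every file into one list.

```python
import csv
from pathlib import Path
from typing import Dict, List


def read_rows(path: Path) -> List[Dict[str, str]]:
    """Read one CSV file into a list of dicts keyed by its header row."""
    # "utf-8-sig" reads plain UTF-8 and also tolerates a byte-order mark, if present.
    with path.open(newline="", encoding="utf-8-sig") as f:
        return list(csv.DictReader(f))


def load_all_csvs(data_dir: str) -> List[Dict[str, str]]:
    """Load every CSV beneath data_dir, allowing any number of CSVs per folder."""
    # rglob searches all sub-folders. Sorting gives a deterministic order,
    # which for "YYYY-MM" folder names is also chronological.
    paths = sorted(Path(data_dir).rglob("*.csv"))
    # Flatten the per-file row lists into one list of records.
    return [row for path in paths for row in read_rows(path)]


# Hypothetical usage: records = load_all_csvs("data")
```

With pandas, the same pattern would typically pass each path to `pd.read_csv` and combine the resulting frames with `pd.concat`.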

Create functions for loading data.

Next, check that it works.

Now that all data is loaded it can be investigated.

2. Data Understanding

The meanings of each column are explained in this table from: https://data.police.uk/about/#columns

2.1 General info

2.2 Crime Types

These are similar to the crime categories listed above. However, the "all crime" category is understandably not present, and there is an extra category called "Exclusive".

The meaning of this "exclusive" crime type is not mentioned in the FAQ, nor in the list of Home Office Offence Codes provided here: https://www.police.uk/SysSiteAssets/police-uk/media/downloads/crime-categories/police-uk-category-mappings.csv

There are 119 entries of this type. Most months contain 20 of them, and they have a range of outcomes.
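The per-month tally of "Exclusive" entries can be checked with a `collections.Counter`. This sketch assumes records are dicts keyed by the police.uk column names ("Month", "Crime type"), as `csv.DictReader` produces. The function name and sample records are hypothetical. The comparison is case-insensitive as a precaution, since the exact capitalisation in the raw data is not restated here.

```python
from collections import Counter


def count_type_by_month(records, crime_type="Exclusive"):
    """Count records of one crime type per month, ignoring case."""
    target = crime_type.lower()
    return Counter(
        r.get("Month")
        for r in records
        if (r.get("Crime type") or "").strip().lower() == target
    )


# Invented sample records, for illustration only.
sample = [
    {"Month": "2020-04", "Crime type": "Exclusive"},
    {"Month": "2020-04", "Crime type": "Burglary"},
    {"Month": "2020-05", "Crime type": "exclusive"},
    {"Month": "2020-05", "Crime type": ""},
]
print(count_type_by_month(sample))
```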

Unfortunately, I cannot identify any pattern in the data recorded for this crime type.

Given my lack of familiarity with this area and the lack of documentation regarding the meaning of this categorisation I will exclude these data from the analysis.

2.3 Null Values

Additionally, there are null values present for "crime type", and likely other columns, which should be examined.

The documentation on the columns notes of the "Context" field that "Currently, for newly added CSVs, this is always empty", which explains the large number of null values in that column.

Otherwise, missing data may be due to the data not being available, not being entered, or an error.
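Before deciding how to treat the missing data, it helps to tabulate the blanks in each column. Below is a standard-library sketch. It assumes records come from `csv.DictReader`, which yields empty strings for blank fields. `null_counts` is a hypothetical helper, and the two records are invented.

```python
from collections import Counter


def null_counts(records):
    """Count blank or missing values in each column across all records."""
    counts = Counter()
    for record in records:
        for column, value in record.items():
            if column is None:
                # csv.DictReader stores surplus fields of malformed rows under None.
                continue
            # Blank fields arrive as "", and short rows are padded with None.
            if value is None or value.strip() == "":
                counts[column] += 1
    return counts


records = [
    {"Crime ID": "abc", "Crime type": "Drugs", "Context": ""},
    {"Crime ID": "def", "Crime type": "", "Context": ""},
]
print(null_counts(records).most_common())
```

Columns with no missing values simply do not appear in the returned `Counter`.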

2.3.1 Crime Type

2000 entries have no crime type.

These 2000 entries have no data present for any field except a crime ID. A round number of exactly 2000 missing values may indicate a systematic error, although 2000 would not divide equally across the 6 csvs. Given this, it is worth examining how many erroneous entries are present in each csv.

May and September each have exactly 1000 crimes with null types, while the other four months have none.
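These blank rows have no value even in the "Month" column, so they cannot be grouped by month. Counting them per source file works instead. Below is a minimal sketch of this check. It assumes the CSVs sit in sub-folders of one data folder, and the function name is hypothetical.

```python
import csv
from pathlib import Path


def null_crime_types_per_file(data_dir):
    """Map each CSV (relative path) under data_dir to its count of blank "Crime type" rows."""
    result = {}
    for path in sorted(Path(data_dir).rglob("*.csv")):
        with path.open(newline="", encoding="utf-8-sig") as f:
            # Relative paths stay unique even if two folders reuse a file name.
            result[path.relative_to(data_dir).as_posix()] = sum(
                1
                for row in csv.DictReader(f)
                if not (row.get("Crime type") or "").strip()
            )
    return result
```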

I am unable to discern a pattern in which entries are missing, or infer/impute any data due to the records being entirely blank. As such these entries will be dropped for the analysis.

2.3.2 Dataset Summary

The dataset comprises six csv files of street-level crime data between April and September 2020 (inclusive) from the West Yorkshire police. Cumulatively the data consists of 158,898 records, with the following 12 columns: Crime ID, Month, Reported by, Falls within, Longitude, Latitude, Location, LSOA code, LSOA name, Crime type, Last outcome category and Context.

3. Data Cleaning
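Following the decisions in Section 2, cleaning here amounts to dropping the blank rows and the "Exclusive" rows. Below is a minimal sketch of that step. It assumes dict records as produced by `csv.DictReader`, and both function names are hypothetical. The blank rows have no crime type, so the two groups cannot overlap. Assuming the 158,898 total includes both groups, cleaning should leave 158,898 - 2,000 - 119 = 156,779 records.

```python
def is_usable(record):
    """True unless the crime type is blank or the undocumented "Exclusive" type."""
    crime_type = (record.get("Crime type") or "").strip()
    return bool(crime_type) and crime_type.lower() != "exclusive"


def clean_records(records):
    """Keep only the records used in the analysis."""
    return [r for r in records if is_usable(r)]
```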

4. Plotting

4.1 Crime Type

First create a plot of the total number of crimes each month by type.
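The data behind this plot is a table of counts indexed by month and crime type. Below is a standard-library sketch. It assumes cleaned dict records with "Month" values in "YYYY-MM" form, and the function name is hypothetical.

```python
from collections import Counter, defaultdict


def counts_by_month_and_type(records):
    """Return {month: Counter({crime_type: count})} with months in order."""
    table = defaultdict(Counter)
    for r in records:
        table[r["Month"]][r["Crime type"]] += 1
    # "YYYY-MM" strings sort chronologically, so a plain sort orders the months.
    return {month: table[month] for month in sorted(table)}
```

With pandas, the equivalent wide table is commonly built with `df.groupby(["Month", "Crime type"]).size().unstack(fill_value=0)`. That is the shape `DataFrame.plot.area` uses for a stacked area chart.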

4.1.1 Plot overall crime each month

This plot shows an unusually rapid increase and then decrease in the number of crimes over a short period. The data covers April to September 2020, so this striking change can likely be attributed to the national COVID-19 lockdown, which began on 23 March 2020. The lockdown depressed the rate, which then increased gradually as restrictions eased, before falling again as restrictions were tightened in September.

To examine the link between these factors, it will be useful to plot the data alongside the dates of key lockdown rule changes.

4.1.2 Plot overall crime each month noting key lockdown regulation changes

To get a more specific view of how the prevalence of each crime is increasing or decreasing, it is helpful to plot the month-on-month changes on another graph.

4.1.3 Month-On-Month Changes

Figure 3: Plot of the absolute change in the number of occurrences of each crime type relative to the previous month.
Figure 4: Plot of the percentage change in the number of occurrences of each crime type relative to the previous month.
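Both figures can be derived from the ordered monthly counts of each crime type. The sketch below uses a hypothetical helper, not the notebook's code. It returns both the absolute change and the percentage change relative to the previous month. The counts in the example are toy numbers, not values from the dataset.

```python
from typing import List, Optional, Tuple


def month_on_month(counts: List[int]) -> List[Tuple[int, Optional[float]]]:
    """Return (absolute change, percentage change) for each month after the first.

    The percentage change is relative to the previous month's count and is
    None when the previous month had zero occurrences.
    """
    changes = []
    for prev, curr in zip(counts, counts[1:]):
        absolute = curr - prev
        percent = None if prev == 0 else 100 * absolute / prev
        changes.append((absolute, percent))
    return changes


# Toy counts for one crime type over three months (illustrative, not real data).
print(month_on_month([100, 150, 120]))  # [(50, 50.0), (-30, -20.0)]
```

Returning `None` for a zero previous count avoids dividing by zero. In pandas, `Series.diff()` and `Series.pct_change()` compute the same two quantities, with the latter returning a fraction rather than a percentage.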

Conclusion

Over the six-month period, the counts of each type of crime in the dataset increased during June and July as COVID-19 lockdown restrictions were eased, before falling again as measures came back into effect in September. Shoplifting and theft saw their largest increase (of roughly 40%) in July, the first full month after non-essential retail reopened on 15 June. Meanwhile, "anti-social behaviour" and "violence and sexual offences" had their largest increases in June and July respectively. The data also has some unusual features in the crime type column. Two months each contain exactly 1000 blank entries, and an undocumented "exclusive" crime type occurs roughly 20 times each month.

It is also worth noting that this data only represents crimes which the police were aware of and documented. It is therefore useful to consider to what extent these trends reflect actual changes in the amount of crime committed, and to what extent they reflect fewer crimes being spotted, reported and documented by the police. The effect of unreported crimes over this period will vary by crime type. For instance, shoplifting can only occur in stores, so it is expected to fall as shops close. Violent and sexual crimes, by contrast, can occur anywhere, and they will be harder to detect (and harder for victims to report) when they take place in the home. With more time, quantifying the effects of under-reporting by cross-referencing with other data sources would be a key area of focus, in addition to including, and comparing with, data from other years.


Extras

This section contains additional analysis which, while not pertinent to the task outlined, caught my interest and curiosity.

Latitude and Longitude

The latitudes and longitudes are extremely skewed, with many values falling outside the bounds of the UK, let alone West Yorkshire.

UK longitude range: -8.23 <-> 1.75 (NI to Norwich)
UK latitude range: 49.16 <-> 62.28 (Jersey to Faroe)

West Yorkshire longitude range: -2.21 <-> -1.09
West Yorkshire latitude range: 53.5 <-> 54
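A bounding-box check against the approximate West Yorkshire ranges above is one simple way to separate plausible coordinates from out-of-range ones. Below is a sketch of that check. The box is a rough rectangle rather than the county boundary, so points near its edges may belong to neighbouring areas. The function names are hypothetical.

```python
# Approximate West Yorkshire bounding box, from the ranges listed above.
WY_LAT = (53.5, 54.0)
WY_LON = (-2.21, -1.09)


def in_west_yorkshire_box(lat, lon):
    """True if (lat, lon) lies inside the rough West Yorkshire bounding box."""
    return WY_LAT[0] <= lat <= WY_LAT[1] and WY_LON[0] <= lon <= WY_LON[1]


def split_by_box(records):
    """Split records into (inside, outside_or_unparseable) by their coordinates."""
    inside, outside = [], []
    for r in records:
        try:
            lat, lon = float(r["Latitude"]), float(r["Longitude"])
        except (KeyError, TypeError, ValueError):
            # Missing, blank or non-numeric coordinates cannot be placed.
            outside.append(r)
            continue
        if in_west_yorkshire_box(lat, lon):
            inside.append(r)
        else:
            outside.append(r)
    return inside, outside
```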

While this is a somewhat interesting plot to look at, as crime only occurs where there are people to commit it, the data is mostly just showing the distribution of population.

Missing Crime IDs

29689 entries have no crime ID which is ~18.7% of the records. A cursory glance at some of the data indicates that many of these

4.1 Outcome