Crime type is the primary feature of interest for this analysis. The stacked area plot below shows the absolute count of each crime type for each month in the dataset.

Note that each month's data is plotted on the last day of the corresponding month and represents the total for that month.
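As a minimal sketch of that plotting convention, the helper below maps a `YYYY-MM` month string to the last calendar day of that month using only the standard library. The name `month_end` is illustrative and is not part of the notebook; with pandas, a similar shift can be obtained by adding `pd.offsets.MonthEnd(0)` to a datetime column.

```python
import calendar
import datetime as dt


def month_end(month: str) -> dt.date:
    """Return the last calendar day of a 'YYYY-MM' month string."""
    year, month_number = (int(part) for part in month.split("-"))
    # monthrange returns (weekday of the first day, number of days in the month)
    last_day = calendar.monthrange(year, month_number)[1]
    return dt.date(year, month_number, last_day)


print(month_end("2020-04"))  # 2020-04-30
```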

The plot shows a notably rapid increase in reported crime for such a short period. Given that the data begins in April of 2020, these trends will have been impacted by COVID-19 and the corresponding national lockdown.

To examine the link between these factors, the plot below shows the data alongside the dates of key lockdown changes.

The timeline on this plot shows that the number of crimes in the dataset was low during the peak of the lockdown and began to rise following the easing of restrictions, before falling again as measures were reintroduced during the second wave.

For a better sense of the changes in each specific crime, the two plots below show the month-on-month changes in crime counts in both absolute terms and relative to the total count the previous month. These figures are interactive and each crime type can be shown or hidden by clicking its name in the legend.
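The two measures can be sketched as follows. This reads "relative to the total count the previous month" as the change in a crime type's count divided by the previous month's total across all types; that reading is an assumption, and the counts below are hypothetical rather than taken from the dataset.

```python
import pandas as pd

# Hypothetical counts for two crime types over three months (not taken from the dataset)
counts = pd.DataFrame(
    {"Burglary": [100, 120, 90], "Drugs": [50, 80, 110]},
    index=pd.to_datetime(["2020-04-30", "2020-05-31", "2020-06-30"]),
)

# Absolute change: this month's count minus the previous month's count
absolute_change = counts.diff()

# Relative change: the absolute change divided by the previous month's total across all types
previous_total = counts.sum(axis=1).shift(1)
relative_change = absolute_change.div(previous_total, axis=0)

print(absolute_change)
print(relative_change)
```

The first month has no predecessor, so both measures are undefined (NaN) for it.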

Unexpected or Unusual Features

Crime type is the primary feature of interest for this analysis and contains some unusual features and missing data. The police.uk FAQ documents 14 crime types (excluding the "All crime" total) at: https://www.police.uk/pu/about-police.uk-crime-data/. Each of these types is present in the dataset; however, there is an additional type present in the data which is not documented in the FAQ: "Exclusive". There are 119 entries with this crime type, occurring roughly 20 times each month, at a range of locations, with various LSOA codes and names, and differing outcomes. The meaning of this crime type is unclear and, beyond occurring almost the same number of times each month, the entries do not exhibit a clear pattern. Given the lack of documentation regarding the meaning of this categorisation and my lack of familiarity with policing, these data were excluded from the analysis.

There are also 2000 entries with null values for the crime type. These null entries all come from the May and September files, with each file having exactly 1000. The exact counts of these null entries indicate a possible systematic error which caused them to be added. These entries also have null values in all fields (except crime ID), meaning it is not possible to infer or impute the missing crime types, and as such these entries were also excluded from the analysis.

This section details the chronological process of loading, understanding, plotting and evaluating the data, with accompanying code and outputs. The full source code is available in the Jupyter notebook here.

1. Load Data

import os
import pandas as pd

# Read data into frame and display the columns
df = pd.read_csv("data/2020-04/2020-04-west-yorkshire-street.csv", index_col=0)
df.columns

Index(['Crime ID', 'Month', 'Falls within', 'Longitude', 'Latitude', 'Location', 'LSOA code',
'LSOA name', 'Crime type', 'Last outcome category', 'Context'], dtype='object')

# Examine the frame head
df.head()

	Crime ID	Month	Falls within	Longitude	Latitude	Location	LSOA code	LSOA name	Crime type	Last outcome category	Context
0	NaN	2020-04	West Yorkshire Police	-1.550626	53.597400	On or near Swithen Hill	E01007359	Barnsley 005C	Anti-social behaviour	NaN	NaN
1	5ea1997471c9de64fcfcf1145cadfff71ba37f21668d25...	2020-04	West Yorkshire Police	-1.670108	53.553629	On or near Huddersfield Road	E01007426	Barnsley 027D	Burglary	Investigation complete; no suspect identified	NaN
2	NaN	2020-04	West Yorkshire Police	-1.862742	53.940068	On or near Smithy Greaves	E01010646	Bradford 001A	Anti-social behaviour	NaN	NaN
3	0d8ee70dbd3096b4d07059d7f7c310fbf5de9cb7d44c31...	2020-04	West Yorkshire Police	-1.879031	53.943807	On or near Cross End Fold	E01010646	Bradford 001A	Shoplifting	Investigation complete; no suspect identified	NaN
4	cb4709a03d98dc63ba4c1771171bc7a9353097f5851d80...	2020-04	West Yorkshire Police	-1.882481	53.924936	On or near Moorside Lane	E01010646	Bradford 001A	Violence and sexual offences	Unable to prosecute suspect	NaN

# Get a list of .csv files from the data directory

csvs = [f"{root}/{files[0]}" for root, dirs, files in os.walk("data/") if "west-yorkshire-street.csv" in files[0]]

# Combine all csv data into a single dataframe.

data: pd.DataFrame = None
for file in csvs:
    newData = pd.read_csv(file, index_col=0)
    if data is None:
        data = newData
    else:
        data = pd.concat((data, newData))

# Confirm that all the data is present

data["Month"].unique()

array(['2020-06', '2020-08', '2020-09', nan, '2020-07', '2020-05', '2020-04'], dtype=object)

This works when each directory contains only a single csv - as is the case in the provided data. However, to make the process more general it would be useful to support multiple csvs per folder.

csvs = []
for root, dirs, files in os.walk("data/"):
    for file in files:
        if ".csv" in file:
            csvs.append(f"{root}/{file}")
csvs

['data/2020-06/2020-06-west-yorkshire-street.csv',
 'data/2020-08/2020-08-west-yorkshire-street.csv',
 'data/2020-09/2020-09-west-yorkshire-street.csv',
 'data/2020-07/2020-07-west-yorkshire-street.csv',
 'data/2020-05/2020-05-west-yorkshire-street.csv',
 'data/2020-04/2020-04-west-yorkshire-street.csv']

csvs = [f"{root}/{file}" 
            for root, dirs, files in os.walk("data/")
               for file in files 
                  if ".csv" in file]
csvs

['data/2020-06/2020-06-west-yorkshire-street.csv',
     'data/2020-08/2020-08-west-yorkshire-street.csv',
     'data/2020-09/2020-09-west-yorkshire-street.csv',
     'data/2020-07/2020-07-west-yorkshire-street.csv',
     'data/2020-05/2020-05-west-yorkshire-street.csv',
     'data/2020-04/2020-04-west-yorkshire-street.csv']

def findCSVsInDir(dir: str) -> list:
    '''
    Takes in a directory as an input string and returns a list of paths to each csv in the folder or
    any subfolders.
    '''
    return [f"{root}/{file}"
            for root, dirs, files in os.walk(dir)
            for file in files
            if ".csv" in file]

def readAllCSVsInDir(dir: str) -> pd.DataFrame:
    '''
    Returns a single dataframe containing a concatenation of all csvs within a particular folder and any
    of its subfolders.
    '''
    data = None
    for file in findCSVsInDir(dir):
        newData = pd.read_csv(file, index_col=0)
        if data is None:
            data = newData
        else:
            data = pd.concat((data, newData))
    return data

crimesDF = readAllCSVsInDir("data")
print(crimesDF["Month"].unique())
crimesDF

['2020-06' '2020-08' '2020-09' nan '2020-07' '2020-05' '2020-04']

	Crime ID	Month	Reported by	Falls within	Longitude	Latitude	Location	LSOA code	LSOA name	Crime type	Last outcome category	Context
0	9cfc0ed854bc20e2402d91de03c01bb0eec53ca7d1e52f...	2020-06	West Yorkshire Police	West Yorkshire Police	-1.764583	53.534617	On or near Park/Open Space	E01007426	Barnsley 027D	Burglary	Status update unavailable	NaN
1	e8ef06134d7cbd661b44b14b0090f533d767b1c56702fc...	2020-06	West Yorkshire Police	West Yorkshire Police	-1.764583	53.534617	On or near Park/Open Space	E01007426	Barnsley 027D	Burglary	Investigation complete; no suspect identified	NaN
2	NaN	2020-06	West Yorkshire Police	West Yorkshire Police	-1.873004	53.941724	On or near Cornerstones Close	E01010646	Bradford 001A	Anti-social behaviour	NaN	NaN
3	NaN	2020-06	West Yorkshire Police	West Yorkshire Police	-1.882481	53.924936	On or near Moorside Lane	E01010646	Bradford 001A	Anti-social behaviour	NaN	NaN
4	NaN	2020-06	West Yorkshire Police	West Yorkshire Police	-1.873004	53.941724	On or near Cornerstones Close	E01010646	Bradford 001A	Anti-social behaviour	NaN	NaN
...	...	...	...	...	...	...	...	...	...	...	...	...
21780	d7659bf05dccb87e33d38dea8167489103fe6247a522de...	2020-04	NaN	West Yorkshire Police	NaN	NaN	No Location	NaN	NaN	Other crime	Investigation complete; no suspect identified	NaN
21781	d0f26c21c2c0aac15d667afa2a0406a2cdf1069be2b075...	2020-04	NaN	West Yorkshire Police	NaN	NaN	No Location	NaN	NaN	Other crime	Unable to prosecute suspect	NaN
21782	413e818c0b01d0614e55398654d6cb7a64d83bd12e5833...	2020-04	NaN	West Yorkshire Police	NaN	NaN	No Location	NaN	NaN	Other crime	Unable to prosecute suspect	NaN
21783	e124b1d4f0c201248c69971b862cec6425343b0342f73a...	2020-04	NaN	West Yorkshire Police	NaN	NaN	No Location	NaN	NaN	Other crime	Unable to prosecute suspect	NaN
21784	be1051a23575910d1b81883cf4f2cee81473bdf4ae069b...	2020-04	NaN	West Yorkshire Police	NaN	NaN	No Location	NaN	NaN	Other crime	Unable to prosecute suspect	NaN

158898 rows × 12 columns
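A variant of the file search above can be sketched with `pathlib`. The function name `find_csvs` is hypothetical and not part of the notebook. It matches on the file suffix rather than on a substring, so a name such as `notes.csv.bak` would not be picked up, and it sorts the results so that the load order does not depend on the filesystem.

```python
from pathlib import Path


def find_csvs(directory: str) -> list[Path]:
    """Return the paths of all .csv files in a directory and its subfolders, sorted by path."""
    # rglob searches recursively; sorting makes the order independent of the filesystem
    return sorted(path for path in Path(directory).rglob("*.csv") if path.is_file())
```

The resulting paths could then be read with `pd.read_csv` and combined in a single `pd.concat` call.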

2. Data Understanding

The meanings of each column are explained in this table from: https://data.police.uk/about/#columns

Field	Meaning
Reported by	The force that provided the data about the crime.
Falls within	At present, also the force that provided the data about the crime. This is currently being looked into and is likely to change in the near future.
Longitude and Latitude	The anonymised coordinates of the crime. See Location Anonymisation for more information.
LSOA code and LSOA name	References to the Lower Layer Super Output Area that the anonymised point falls into, according to the LSOA boundaries provided by the Office for National Statistics.
Crime type	One of the crime types listed in the Police.UK FAQ.
Last outcome category	A reference to whichever of the outcomes associated with the crime occurred most recently. For example, this crime's 'Last outcome category' would be 'Formal action is not in the public interest'.
Context	A field provided for forces to provide additional human-readable data about individual crimes. Currently, for newly added CSVs, this is always empty.

From: https://data.police.uk/about/#columns

The details of each crime type are as follows:

Crime type	Description
All crime	Total for all categories.
Anti-social behaviour	Includes personal, environmental and nuisance anti-social behaviour.
Bicycle theft	Includes the taking without consent or theft of a pedal cycle.
Burglary	Includes offences where a person enters a house or other building with the intention of stealing.
Criminal damage and arson	Includes damage to buildings and vehicles and deliberate damage by fire.
Drugs	Includes offences related to possession, supply and production.
Other crime	Includes forgery, perjury and other miscellaneous crime.
Other theft	Includes theft by an employee, blackmail and making off without payment.
Possession of weapons	Includes possession of a weapon, such as a firearm or knife.
Public order	Includes offences which cause fear, alarm or distress.
Robbery	Includes offences where a person uses force or threat of force to steal.
Shoplifting	Includes theft from shops or stalls.
Theft from the person	Includes crimes that involve theft directly from the victim (including handbag, wallet, cash, mobile phones) but without the use or threat of physical force.
Vehicle crime	Includes theft from or of a vehicle or interference with a vehicle.
Violence and sexual offences	Includes offences against the person such as common assaults, Grievous Bodily Harm and sexual offences.

From: https://www.police.uk/pu/about-police.uk-crime-data/

2.1 General info

# Check the number of records
len(crimesDF)

# Examine columns and their types
print(crimesDF.convert_dtypes().dtypes)

Crime ID                 string[python]
Month                    string[python]
Reported by              string[python]
Falls within             string[python]
Longitude                       Float64
Latitude                        Float64
Location                 string[python]
LSOA code                string[python]
LSOA name                string[python]
Crime type               string[python]
Last outcome category    string[python]
Context                           Int64
dtype: object

# Examine unique values for each column

for col in crimesDF.columns:
    numUnique = len(crimesDF[col].unique())
    print(f"{col}: has {numUnique} unique entries")

Crime ID: has 129208 unique entries
Month: has 7 unique entries
Reported by: has 2 unique entries
Falls within: has 2 unique entries
Longitude: has 25818 unique entries
Latitude: has 25131 unique entries
Location: has 18712 unique entries
LSOA code: has 1419 unique entries
LSOA name: has 1419 unique entries
Crime type: has 16 unique entries
Last outcome category: has 14 unique entries
Context: has 1 unique entries
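Note that `unique()` returns the missing value as one of its entries, so the counts above include null wherever a column contains one. A small sketch on toy data (not the dataset) contrasts this with `nunique()`, which ignores missing values by default:

```python
import pandas as pd

# Toy column standing in for "Crime type" (values are illustrative)
crime_type = pd.Series(["Burglary", "Drugs", "Burglary", None])

with_null = len(crime_type.unique())  # unique() keeps the missing value as an entry
without_null = crime_type.nunique()   # nunique() drops missing values by default

print(with_null, without_null)
```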

2.2 Crime Types

# List unique crime types present in the dataset

sorted(crimesDF["Crime type"].dropna().unique()) + ["Null"]

['Anti-social behaviour',
'Bicycle theft',
'Burglary',
'Criminal damage and arson',
'Drugs',
'Exclusive',
'Other crime',
'Other theft',
'Possession of weapons',
'Public order',
'Robbery',
'Shoplifting',
'Theft from the person',
'Vehicle crime',
'Violence and sexual offences',
'Null']

These are similar to the crime categories listed above; however, the "All crime" category is understandably not present, and there is an extra category called "Exclusive".
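The comparison can be made explicit with set differences. In this minimal sketch the two sets are typed in from the FAQ table and the notebook output above; in the notebook itself the observed set could instead be built from the non-null values of the "Crime type" column.

```python
# Crime types documented in the police.uk FAQ, excluding the "All crime" total
documented = {
    "Anti-social behaviour", "Bicycle theft", "Burglary", "Criminal damage and arson",
    "Drugs", "Other crime", "Other theft", "Possession of weapons", "Public order",
    "Robbery", "Shoplifting", "Theft from the person", "Vehicle crime",
    "Violence and sexual offences",
}

# Non-null crime types observed in the dataset, copied from the output above
observed = {
    "Anti-social behaviour", "Bicycle theft", "Burglary", "Criminal damage and arson",
    "Drugs", "Exclusive", "Other crime", "Other theft", "Possession of weapons",
    "Public order", "Robbery", "Shoplifting", "Theft from the person", "Vehicle crime",
    "Violence and sexual offences",
}

undocumented = observed - documented  # present in the data but not in the FAQ
missing = documented - observed       # described in the FAQ but absent from the data

print(undocumented, missing)
```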

# Show crimes which have an "exclusive" type

crimesDF.loc[crimesDF["Crime type"] == "Exclusive"]

	Crime ID	Month	Reported by	Falls within	Longitude	Latitude	Location	LSOA code	LSOA name	Crime type	Last outcome category	Context
223	NaN	2020-06	West Yorkshire Police	West Yorkshire Police	-1.734433	53.891805	On or near Parking Area	E01010773	Bradford 005C	Exclusive	NaN	NaN
2982	27d686161e51ad5806bc54eeb20eccf69141eb2f49cca7...	2020-06	West Yorkshire Police	West Yorkshire Police	-1.732338	53.807628	On or near Beldon Place	E01010828	Bradford 035A	Exclusive	Investigation complete; no suspect identified	NaN
3118	NaN	2020-06	West Yorkshire Police	West Yorkshire Police	-1.732609	53.800735	On or near Wingfield Mount	E01010832	Bradford 035E	Exclusive	NaN	NaN
3532	abbaa995c0069f4ac0a7a9d411c6004858a60d78fe811c...	2020-06	West Yorkshire Police	West Yorkshire Police	-1.734881	53.799194	On or near Alcester Garth	E01010607	Bradford 039B	Exclusive	Investigation complete; no suspect identified	NaN
3849	NaN	2020-06	West Yorkshire Police	West Yorkshire Police	-1.744363	53.792968	On or near A6181	E01033693	Bradford 039J	Exclusive	NaN	NaN
...	...	...	...	...	...	...	...	...	...	...	...	...
16266	013a511d6d775aabf07ce9c08ed0b23a4d4c6f5056eba1...	2020-04	NaN	West Yorkshire Police	-1.549470	53.775533	On or near Bude Road	E01011372	Leeds 086C	Exclusive	Investigation complete; no suspect identified	NaN
18231	ccb10c712da0ebf2162bf1ae6635c2406668bb15aa73c0...	2020-04	NaN	West Yorkshire Police	-1.501665	53.765960	On or near Park/Open Space	E01011470	Leeds 112C	Exclusive	Investigation complete; no suspect identified	NaN
18725	NaN	2020-04	NaN	West Yorkshire Police	-1.340439	53.722974	On or near Morton Crescent	E01011751	Wakefield 005C	Exclusive	NaN	NaN
19995	40c974c6a4274c85a103f7fc63725715b8816c37443a12...	2020-04	NaN	West Yorkshire Police	-1.310717	53.682332	On or near Pease Close	E01011846	Wakefield 023B	Exclusive	Action to be taken by another organisation	NaN
20322	NaN	2020-04	NaN	West Yorkshire Police	-1.362931	53.670685	On or near Verner Street	E01011780	Wakefield 027B	Exclusive	NaN	NaN

119 rows × 12 columns

# List the counts of each "last outcome" for these "exclusive" crimes 

exclusiveDF = crimesDF.loc[crimesDF["Crime type"] == "Exclusive"]
exclusiveDF["Last outcome category"].value_counts()

Last outcome category
Unable to prosecute suspect                            45
Investigation complete; no suspect identified          42
Court result unavailable                                4
Status update unavailable                               3
Offender given a caution                                1
Further investigation is not in the public interest     1
Formal action is not in the public interest             1
Local resolution                                        1
Action to be taken by another organisation              1
Name: count, dtype: int64

# Most of the crimes have the outcomes of "Unable to prosecute suspect" 
# and "Investigation complete; no suspect identified" which is
# roughly consistent with the distribution throughout the data.

crimesDF["Last outcome category"].value_counts(ascending=False)

Last outcome category
Unable to prosecute suspect                            63395
Investigation complete; no suspect identified          43585
Court result unavailable                               10575
Status update unavailable                               3605
Local resolution                                        2915
Offender given a caution                                1294
Further investigation is not in the public interest     1121
Action to be taken by another organisation               501
Formal action is not in the public interest              279
Further action is not in the public interest             140
Awaiting court outcome                                   131
Suspect charged as part of another case                   22
Offender given penalty notice                              1
Name: count, dtype: int64
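To put "roughly consistent" on a comparable footing, the counts can be converted to shares of the recorded (non-null) outcomes. The sketch below copies the relevant figures from the two outputs above; the variable names are illustrative.

```python
# Counts copied from the two value_counts() outputs above
exclusive = {
    "Unable to prosecute suspect": 45,
    "Investigation complete; no suspect identified": 42,
}
overall = {
    "Unable to prosecute suspect": 63395,
    "Investigation complete; no suspect identified": 43585,
}

# Totals of all non-null outcomes in each output
exclusive_total = 45 + 42 + 4 + 3 + 1 + 1 + 1 + 1 + 1
overall_total = (63395 + 43585 + 10575 + 3605 + 2915 + 1294 + 1121
                 + 501 + 279 + 140 + 131 + 22 + 1)

for outcome in exclusive:
    exclusive_share = exclusive[outcome] / exclusive_total
    overall_share = overall[outcome] / overall_total
    print(f"{outcome}: {exclusive_share:.1%} of 'Exclusive' vs {overall_share:.1%} overall")
```

Together the two outcomes account for about 88% of recorded outcomes for "Exclusive" crimes and about 84% across the whole dataset.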

# List the counts of each "LSOA code" for these "exclusive" crimes
exclusiveDF["LSOA code"].value_counts()

LSOA code
E01011811    3
E01010782    2
E01011677    2
E01033693    2
E01010627    2
            ..
E01011757    1
E01011752    1
E01011493    1
E01011433    1
E01011780    1
Name: count, Length: 108, dtype: int64

# List the counts of each "month" for these "exclusive" crimes 

crimesDF.loc[crimesDF["Crime type"] == "Exclusive"]["Month"].value_counts()

Month
2020-06    20
2020-08    20
2020-09    20
2020-07    20
2020-04    20
2020-05    19
Name: count, dtype: int64

There are 119 entries of this type, occurring 20 times in most months (19 in May), with a range of outcomes.

Unfortunately, I cannot identify any pattern in the data recorded for this crime type.

Given my lack of familiarity with this area and the lack of documentation regarding the meaning of this categorisation I will exclude these data from the analysis.

2.3 Null Values

Additionally, there are null values present for "crime type", and likely other columns, which should be examined.

# Count null values in each column

null_counts = pd.DataFrame({
       'column': crimesDF.columns,
       'null_count': [crimesDF[col].isnull().sum() for col in crimesDF.columns]
})
null_counts.sort_values(by="null_count", ascending=False)

	column	null_count
11	Context	158898
10	Last outcome category	31334
3	Falls within	31278
0	Crime ID	29689
2	Reported by	23785
7	LSOA code	5488
8	LSOA name	5488
4	Longitude	5487
5	Latitude	5487
1	Month	2000
6	Location	2000
9	Crime type	2000

The column documentation notes of the "Context" field: "Currently, for newly added CSVs, this is always empty." This explains the large number of null values in that column.

Otherwise, data may be missing because it was unavailable, was not entered, or was lost through an error.

2.3.1 Crime Type

# View crimes with missing types

crimesDF.loc[crimesDF["Crime type"].isna()]

	Crime ID	Month	Reported by	Falls within	Longitude	Latitude	Location	LSOA code	LSOA name	Crime type	Last outcome category	Context
22	013d7f93d61de0036327474674b7d5767ffc9f0b2787a8...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
25	e0ec19a6355822b23d0d2a8be119d730449fa107e34173...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
30	fff65e2c833cff7fed3e9f4e9e15ca33008f1b55584743...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
63	e1aa4834bce7d8b7a0c8d761eddcc944514809e61f6c27...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
74	056dab0d794c00ff3ed10dffb56d7bf4702c2adbf60700...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
...	...	...	...	...	...	...	...	...	...	...	...	...
24494	301c7de516f283f7796cb0a41c287849c4a3169b17880e...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
24496	f1c8b597d0830e2bed561230f8be7f3c52c9243b24b15f...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
24500	3430117d1065ff30530009ceaec4a9c75467244cee35d6...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
24572	960848317f66498ee2566de2a638046f8af6421c04f1f4...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
24577	66fa8bf450b0ea292e06165eaae98b8194e47bdff03194...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN

2000 rows × 12 columns

# Check how many entries are totally blank

crimesDF.loc[crimesDF.drop(columns="Crime ID").isna().all(axis=1)]

	Crime ID	Month	Reported by	Falls within	Longitude	Latitude	Location	LSOA code	LSOA name	Crime type	Last outcome category	Context
22	013d7f93d61de0036327474674b7d5767ffc9f0b2787a8...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
25	e0ec19a6355822b23d0d2a8be119d730449fa107e34173...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
30	fff65e2c833cff7fed3e9f4e9e15ca33008f1b55584743...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
63	e1aa4834bce7d8b7a0c8d761eddcc944514809e61f6c27...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
74	056dab0d794c00ff3ed10dffb56d7bf4702c2adbf60700...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
...	...	...	...	...	...	...	...	...	...	...	...	...
24494	301c7de516f283f7796cb0a41c287849c4a3169b17880e...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
24496	f1c8b597d0830e2bed561230f8be7f3c52c9243b24b15f...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
24500	3430117d1065ff30530009ceaec4a9c75467244cee35d6...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
24572	960848317f66498ee2566de2a638046f8af6421c04f1f4...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
24577	66fa8bf450b0ea292e06165eaae98b8194e47bdff03194...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN

2000 rows × 12 columns

These 2000 entries have no data present for any field except a crime ID. A round number of exactly 2000 missing values may indicate a systematic error, although 2000 would not divide equally across the 6 csvs. Given this, it is worth examining how many erroneous entries are present in each csv.

# Count missing entries in each csv

for month in ["04", "05", "06", "07", "08", "09"]:
      csvPath = f"data/2020-{month}/2020-{month}-west-yorkshire-street.csv"
      monthDF = pd.read_csv(csvPath)
      print(month, len(monthDF.loc[monthDF["Crime type"].isna()]))

May and September each have exactly 1000 crimes with null types, while the others do not have any.
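Because the blank rows also have a null "Month", they cannot be attributed to a month once the files have been concatenated, which is why each csv is re-read above. One alternative, sketched here under the assumption that the loader is free to add a column, is to record each row's source file at load time. The names `read_with_source`, `null_counts_by_file` and the "Source file" column are hypothetical and not part of the notebook.

```python
from pathlib import Path

import pandas as pd


def read_with_source(paths: list[str]) -> pd.DataFrame:
    """Concatenate csvs, recording each row's originating file in a 'Source file' column."""
    frames = [pd.read_csv(path).assign(**{"Source file": Path(path).name}) for path in paths]
    return pd.concat(frames, ignore_index=True)


def null_counts_by_file(frame: pd.DataFrame, column: str = "Crime type") -> pd.Series:
    """Count the rows in each source file where the given column is null."""
    return frame[column].isna().groupby(frame["Source file"]).sum()
```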

# Show entries with null crime types from May csv

mayDF = pd.read_csv("data/2020-05/2020-05-west-yorkshire-street.csv", index_col=0)
mayDF.loc[mayDF["Crime type"].isna()]

	Crime ID	Month	Reported by	Falls within	Longitude	Latitude	Location	LSOA code	LSOA name	Crime type	Last outcome category	Context
1	472dce151845a2dda9743bb01df3023f43e65137eb8409...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
53	612459be5a572b07fd7231477945210d86d0ebaa866199...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
61	7e94a8ca2bdd2e9d274e91b19305e98d014aa3902978b1...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
74	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
82	bce93b2672320dc226a1218a9ee5b55b1b482ef124cea8...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
...	...	...	...	...	...	...	...	...	...	...	...	...
24494	301c7de516f283f7796cb0a41c287849c4a3169b17880e...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
24496	f1c8b597d0830e2bed561230f8be7f3c52c9243b24b15f...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
24500	3430117d1065ff30530009ceaec4a9c75467244cee35d6...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
24572	960848317f66498ee2566de2a638046f8af6421c04f1f4...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
24577	66fa8bf450b0ea292e06165eaae98b8194e47bdff03194...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN

1000 rows × 12 columns

# Show entries with null crime types from September csv

sepDF = pd.read_csv("data/2020-09/2020-09-west-yorkshire-street.csv", index_col=0)
sepDF.loc[sepDF["Crime type"].isna()]

	Crime ID	Month	Reported by	Falls within	Longitude	Latitude	Location	LSOA code	LSOA name	Crime type	Last outcome category	Context
22	013d7f93d61de0036327474674b7d5767ffc9f0b2787a8...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
25	e0ec19a6355822b23d0d2a8be119d730449fa107e34173...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
30	fff65e2c833cff7fed3e9f4e9e15ca33008f1b55584743...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
63	e1aa4834bce7d8b7a0c8d761eddcc944514809e61f6c27...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
74	056dab0d794c00ff3ed10dffb56d7bf4702c2adbf60700...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
...	...	...	...	...	...	...	...	...	...	...	...	...
26950	be87c653d51bc7d5712d603efbb0758059e643f720e890...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
26962	dbb921c1ba8c326ae976ef64624fa075c289228c7e391a...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
26976	31141c7965263b6edb33ca7c98d5dae70b96fb90d0f60f...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
26990	6ab052be140ec8ec56f40c6e1e2c95bfff832c2bf58da1...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
26992	f763e9f3fbd4120a465a58e8f3191c5d2a66098079d5be...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN

1000 rows × 12 columns

I am unable to discern a pattern in which entries are missing, or infer/impute any data due to the records being entirely blank. As such these entries will be dropped for the analysis.

2.3.2 Dataset Summary

The dataset comprises six csv files of street-level crime data between April and September 2020 (inclusive) from the West Yorkshire police. Cumulatively the data consists of 158,898 records, with the following 12 columns: Crime ID, Month, Reported by, Falls within, Longitude, Latitude, Location, LSOA code, LSOA name, Crime type, Last outcome category and Context.

3. Data Cleaning

# Drop entries with "Exclusive" crime types, as discussed above.
# A boolean mask is used instead of drop() by index label because the
# concatenated frame repeats index labels across the monthly csvs.
crimesDF = crimesDF.loc[crimesDF["Crime type"] != "Exclusive"]

# Drop entries with null crime type
crimesDF = crimesDF.dropna(subset=["Crime type"])

# Convert dtypes
crimesDF = crimesDF.convert_dtypes()

# Convert month to datetime for ease of plotting
crimesDF["Month"] = pd.to_datetime(crimesDF["Month"], format="%Y-%m")
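Because `pd.concat` keeps each file's own row labels, the combined frame's index contains repeated labels (the output in section 1 shows the index restarting for each month). The sketch below uses toy data rather than the dataset to illustrate why filtering with a boolean mask is safer than label-based `drop` in that situation. It also computes the row count expected after cleaning from the figures reported above (158,898 records, 119 "Exclusive" entries and 2,000 null entries).

```python
import pandas as pd

# Two toy "monthly" frames whose row labels both start at 0
april = pd.DataFrame({"Crime type": ["Exclusive", "Burglary"]})
may = pd.DataFrame({"Crime type": ["Drugs", None]})
toy = pd.concat((april, may))  # index labels are now 0, 1, 0, 1

# Label-based drop removes every row labelled 0, including the unrelated "Drugs" row
by_label = toy.drop(toy.loc[toy["Crime type"] == "Exclusive"].index)

# A boolean mask removes only the rows that actually match
by_mask = toy.loc[toy["Crime type"] != "Exclusive"].dropna(subset=["Crime type"])

print(list(by_label["Crime type"]))
print(list(by_mask["Crime type"]))

# Row count expected after cleaning, from the figures reported above
expected_rows = 158898 - 119 - 2000
```

Comparing `len(crimesDF)` against `expected_rows` after cleaning would be one way to confirm that only the intended rows were removed.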

4. Plotting

4.1 Crime Type

4.1.1 Plot overall crime each month

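# Add a month-end date for each record, so that each month's total is plotted on its last day.
# NB: this derivation is an assumption; the cell defining "Month End" is not shown here.
crimesDF["Month End"] = crimesDF["Month"] + pd.offsets.MonthEnd(0)

# Create a frame with the counts for each crime grouped by month and type
crimeTypeCountsDF = crimesDF.groupby(["Month End", "Crime type"]).size().reset_index(name="Count")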

# Selection of colours based on those used by OWiD, from: https://gist.github.com/Digma/b91db287f8f577fae41c406892d46b15#file-ourworldindata_ghg_pandas-py

COLOR_SCALE = [
    "#6D3E91", "#C05917", "#58AC8C", "#286BBB", "#883039", "#BC8E5A", "#00295B", "#C15065", 
    "#18470F", "#9A5129", "#E56E5A", "#A2559C", "#38AABA", "#578145", "#970046", "#00847E", 
    "#B13507", "#4C6A9C", "#CF0A66", "#00875E", "#B16214", "#8C4569", "#3B8E1D", "#D73C50"
]

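# Imports used by the plotting cells (assumed; the notebook's import cell is not shown here)
import datetime as dt
import plotly.express as px
import plotly.graph_objects as go
from plotly.io import to_html
# Function to add labels to the figure
def addLabel(figure: go.Figure, text: str, xPos: dt.datetime = dt.datetime(year=2020, month=9, day=30), yPos: float = 0.5, ax: float = 45, ay: float = 0) -> None:
    figure.add_annotation(x=xPos, y=yPos, text=text, showarrow=True, arrowhead=0, xanchor="left", ax=ax, ay=ay, arrowcolor="#4d4d4d", arrowwidth=1)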

fig = px.area(crimeTypeCountsDF, x="Month End", y="Count",
              line_group="Crime type",
              color="Crime type", 
              category_orders={"Crime type": crimesDF["Crime type"].value_counts().index},
              markers=True,
              color_discrete_sequence=COLOR_SCALE,
              template="seaborn",
              title="Count of Each Crime Type Per Month")

# NB: These labels have hard-coded positions and should be removed/amended if using different data
yScale = 26000
addLabel(fig, "Violence and<br>sexual offences",   yPos=0.200*yScale, ay=32)
addLabel(fig, "Anti-social<br>behaviour",          yPos=0.440*yScale, ay=38)
addLabel(fig, "Public order",                      yPos=0.590*yScale, ay=32)
addLabel(fig, "Criminal damage<br> and arson",         yPos=0.690*yScale, ay=32)
addLabel(fig, "Other theft",                       yPos=0.765*yScale, ay=32)
addLabel(fig, "Burglary",                          yPos=0.817*yScale, ay=28)
addLabel(fig, "Vehicle crime",                     yPos=0.864*yScale, ay=22)
addLabel(fig, "Shoplifting",                       yPos=0.905*yScale, ay=18)
addLabel(fig, "Drugs",                             yPos=0.932*yScale, ay=11)
addLabel(fig, "Other crime",                       yPos=0.959*yScale, ay= 7)
addLabel(fig, "Robbery",                           yPos=0.972*yScale, ay=-1)
addLabel(fig, "Possession of weapons",             yPos=0.985*yScale, ay=-10)
addLabel(fig, "Bicycle theft",                     yPos=0.990*yScale, ay=-22)
addLabel(fig, "Theft from the person",             yPos=1.000*yScale, ay=-33)

fig.update_layout(
    height=500,
    showlegend=False,  # Hide the legend when using the hard-coded labels. Remove this for an auto-generated legend.
    margin=dict(l=0, r=10, t=40, b=0),
)

fig.update_xaxes(title="Date", range=[pd.Timestamp('2020-04-20'), pd.Timestamp('2020-11-10')], tickformat="%d %B<br>%Y", ticks="outside", showgrid=False)
fig.update_yaxes(ticks="outside", col=1)

fig.layout.xaxis.fixedrange = True
fig.layout.yaxis.fixedrange = True

fig_html = to_html(fig, include_plotlyjs=False, full_html=False, div_id="countGraphCrimeTypesPlot")
# print(fig_html)
fig.show()
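The label positions in the cell above are hard-coded as fractions of `yScale`. As a hedged alternative, they could be derived from the data by taking the midpoint of each band in the final month of the stack. The sketch below uses hypothetical counts and a hypothetical helper name, `stacked_midpoints`; neither is part of the notebook.

```python
from itertools import accumulate

# Hypothetical final-month counts, ordered from the bottom of the stack upwards
final_month_counts = {
    "Violence and sexual offences": 10000,
    "Anti-social behaviour": 6000,
    "Public order": 3000,
}


def stacked_midpoints(counts: dict[str, int]) -> dict[str, float]:
    """Return the vertical midpoint of each band in a stacked area chart."""
    tops = list(accumulate(counts.values()))  # upper edge of each band
    bottoms = [0] + tops[:-1]                 # lower edge of each band
    return {name: (bottom + top) / 2 for name, bottom, top in zip(counts, bottoms, tops)}


label_positions = stacked_midpoints(final_month_counts)
print(label_positions)
```

Each midpoint could then be passed as `yPos` to `addLabel`, although the arrow offsets (`ay`) would still need adjusting by hand where bands are narrow.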

This plot shows an unusually rapid increase and then decrease in the number of crimes over a short period. Given that the data covers April to September 2020, this striking change can likely be attributed to the national COVID-19 lockdown which began on 23 March 2020: counts were depressed at first, rose gradually as restrictions eased, and then fell as restrictions were reintroduced in September.

To examine the link between these factors, it will be useful to plot the data alongside the dates of key lockdown rule changes.

4.1.2 Plot overall crime each month noting key lockdown regulation changes

fig = px.area(crimeTypeCountsDF, x="Month End", y="Count",
              line_group="Crime type",
              color="Crime type", 
              category_orders={"Crime type": crimesDF["Crime type"].value_counts().index},
              markers=True,
              color_discrete_sequence=COLOR_SCALE,
              template="seaborn",
              title="Count of Each Crime Type Per Month")

# Hide the crime-type traces from the legend so that only the lockdown lines can appear in it
fig.update_traces(showlegend=False)

# NB: These labels have hard-coded positions and should be removed/amended if using different data
yScale = 26000
addLabel(fig, "Violence and<br>sexual offences",   yPos=0.200*yScale, ay=32)
addLabel(fig, "Anti-social<br>behaviour",          yPos=0.440*yScale, ay=38)
addLabel(fig, "Public order",                      yPos=0.590*yScale, ay=32)
addLabel(fig, "Criminal damage<br> and arson",     yPos=0.690*yScale, ay=32)
addLabel(fig, "Other theft",                       yPos=0.765*yScale, ay=32)
addLabel(fig, "Burglary",                          yPos=0.817*yScale, ay=28)
addLabel(fig, "Vehicle crime",                     yPos=0.864*yScale, ay=22)
addLabel(fig, "Shoplifting",                       yPos=0.905*yScale, ay=18)
addLabel(fig, "Drugs",                             yPos=0.932*yScale, ay=11)
addLabel(fig, "Other crime",                       yPos=0.959*yScale, ay= 7)
addLabel(fig, "Robbery",                           yPos=0.972*yScale, ay=-1)
addLabel(fig, "Possession of weapons",             yPos=0.985*yScale, ay=-10)
addLabel(fig, "Bicycle theft",                     yPos=0.990*yScale, ay=-22)
addLabel(fig, "Theft from the person",             yPos=1.000*yScale, ay=-33)


# Add lines for covid rule dates: (date, line colour, label text, label x-anchor)
covidEvents = [
    ('2020-03-23', "#bd6f51", "National<br>lockdown<br>begins", "right"),
    ('2020-04-30', "#B13507", "PM says<br>“we are past<br>the peak”<br>of the<br>pandemic", "right"),
    ('2020-05-10', "#984976", "PM calls<br>for return<br>to work", "left"),
    ('2020-06-01', "#5e6c8e", "Phased<br>school<br>return", "left"),
    ('2020-06-15', "#764c9d", "Retail<br>reopens", "left"),
    ('2020-09-14', "#00295B", "Gatherings<br>above six<br>banned", "right"),
    ('2020-09-22', "#883039", "PM announces<br>return to working<br>from home", "left"),
]

# Draw one dashed vertical line per event, with its label anchored at the top of the line
for date, lineColor, labelText, labelAnchor in covidEvents:
    fig.add_vline(x=pd.Timestamp(date), line_color=lineColor, line_dash="dash", line_width=3, opacity=1.0, showlegend=True, label=dict(
        text=labelText,
        textposition="end",
        yanchor="top",
        textangle=0,
        padding=10,
        xanchor=labelAnchor
    ))


fig.update_layout(
    showlegend=False,
    height=500,
    margin=dict(l=0, r=0, t=40, b=0),
    legend_title="Lockdown<br>Regulation Changes"
)

fig.update_xaxes(title="Date", range=[pd.Timestamp('2020-03-01'), pd.Timestamp('2020-11-20')], tickformat="%d %B<br>%Y", ticks="outside", showgrid=False)
fig.update_yaxes(range=[0, 35e3], ticks="outside", col=1)

fig.layout.xaxis.fixedrange = True
fig.layout.yaxis.fixedrange = True

fig_html = to_html(fig, include_plotlyjs=False, full_html=False, div_id="countGraphWithCovidData")
# print(fig_html)

fig.show()
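
The hard-coded label positions in the cell above could instead be derived from the data. The following is a minimal sketch under stated assumptions, not the method used in this notebook: it uses a small toy frame with "Crime type" and "Count" columns for a single month, and the helper name `stackedLabelPositions` is hypothetical. Each label is placed at the vertical midpoint of its band in the stacked area plot.

```python
import pandas as pd

def stackedLabelPositions(lastMonthDF, order):
    """Return the vertical midpoint of each band in a stacked area plot.

    lastMonthDF: one row per crime type with a "Count" column (toy data here).
    order: crime types listed from the bottom of the stack to the top.
    """
    counts = lastMonthDF.set_index("Crime type")["Count"].reindex(order)
    tops = counts.cumsum()      # upper edge of each band
    return tops - counts / 2    # midpoint = upper edge minus half the band height

# Toy counts for illustration only (not taken from the dataset)
toyDF = pd.DataFrame({
    "Crime type": ["Burglary", "Public order", "Anti-social behaviour"],
    "Count": [100, 300, 600],
})
order = ["Anti-social behaviour", "Public order", "Burglary"]

positions = stackedLabelPositions(toyDF, order)
print(positions)
```

Each value could then be passed as `yPos` to `addLabel`, which would remove the dependence on `yScale` when the data changes.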

To get a more specific view of how each crime's prevalence is increasing or decreasing, it is helpful to plot the month-on-month changes on another graph.

4.1.3 Month-On-Month Changes

# Group crime types and compute the month-on-month change and percentage change
crimeTypeCountsDF["Change"] = crimeTypeCountsDF.groupby("Crime type")["Count"].diff()
crimeTypeCountsDF["Change Proportion"] = crimeTypeCountsDF.groupby("Crime type")["Count"].pct_change()

crimeTypeCountsDF = crimeTypeCountsDF.replace(np.nan, 0) #Replace initial values with 0
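
To make the two calculations above concrete, here is a minimal sketch on toy data (the counts are invented for illustration and the frame name `toyDF` is an assumption). It shows that `diff` gives the change in count from the previous month, `pct_change` gives that change as a fraction of the previous month's count, and the first month of each crime type is NaN until it is filled with 0.

```python
import pandas as pd

# Toy monthly counts for two crime types (illustrative values only)
toyDF = pd.DataFrame({
    "Crime type": ["Burglary", "Burglary", "Burglary", "Drugs", "Drugs", "Drugs"],
    "Month End": pd.to_datetime(["2020-04-30", "2020-05-31", "2020-06-30"] * 2),
    "Count": [100, 120, 90, 50, 50, 75],
})

# Rows must be in date order within each group for diff/pct_change to be meaningful
toyDF = toyDF.sort_values(["Crime type", "Month End"])

grouped = toyDF.groupby("Crime type")["Count"]
toyDF["Change"] = grouped.diff()                   # count minus previous month's count
toyDF["Change Proportion"] = grouped.pct_change()  # change divided by previous month's count

# The first month of each group has no predecessor, so it is NaN until filled
toyDF = toyDF.fillna({"Change": 0, "Change Proportion": 0})
print(toyDF)
```

Note that a previous-month count of zero would make the proportional change infinite, so sparse crime types may need separate handling.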

# Plot the absolute month-on-month changes in crime counts

fig = px.line(crimeTypeCountsDF, x="Month End", y="Change",
              line_group="Crime type",
              color="Crime type", 
              category_orders={"Crime type": crimesDF["Crime type"].value_counts().index},
              markers=True,
              color_discrete_sequence=COLOR_SCALE,
              template="seaborn",
              title="Absolute Change in Crime Counts Over Time; Grouped by Crime Type")

fig.update_layout(
    height=500,
    margin=dict(l=0, r=10, t=40, b=0),
    yaxis_title="Change in crime count<br>relative to previous month"
)

fig.add_hline(
    y=0,
    line_color="black",
    line_width=1,
    line_dash="solid",
    opacity=0.5
)

fig.update_xaxes(ticks="outside", showgrid=False)
# Absolute changes are raw counts, so no percentage tick format is applied here
fig.update_yaxes(ticks="outside", col=1)

fig.update_traces(opacity=1.0, selector=dict(type='scatter'))

# Set crime types to be shown initially
visible_traces = {"Violence and sexual offences", "Anti-social behaviour", "Public order", "Criminal damage and arson"}
for trace in fig.data:
    trace.visible = True if trace.name in visible_traces else "legendonly"

# When you click on a trace in the legend, show/hide it
fig.update_layout(legend_itemclick="toggle", legend_itemdoubleclick="toggleothers")

# Prevent cropping/moving figure when clicking on the plot
fig.layout.xaxis.fixedrange = True
fig.layout.yaxis.fixedrange = True

fig_html = to_html(fig, include_plotlyjs=False, full_html=False, div_id="absoluteChangeCrimeGraph")
# print(fig_html)

fig.show()

# Plot the proportional month-on-month changes in crime counts 

fig = px.line(crimeTypeCountsDF, x="Month End", y="Change Proportion",
              line_group="Crime type",
              color="Crime type", 
              category_orders={"Crime type": crimesDF["Crime type"].value_counts().index},
              markers=True,
              color_discrete_sequence=COLOR_SCALE,
              template="seaborn",
              title="Proportional Change in Crime Counts Over Time; Grouped by Crime Type")

fig.update_layout(
    height=500,
    margin=dict(l=0, r=10, t=40, b=0),
    yaxis_title="Percentage change in crime count<br>relative to previous month"
)

fig.add_hline(
    y=0,
    line_color="black",
    line_width=1,
    line_dash="solid",
    opacity=0.5
)

fig.update_xaxes(ticks="outside", showgrid=False)
fig.update_yaxes(tickformat=".0%", dtick=0.05, ticks="outside", col=1)

fig.update_traces(opacity=1.0, selector=dict(type='scatter'))

# Set crime types to be shown initially
visible_traces = {"Shoplifting", "Anti-social behaviour", "Public order", "Theft from the person"}
for trace in fig.data:
    trace.visible = True if trace.name in visible_traces else "legendonly"

# When you click on a trace in the legend, show/hide it
fig.update_layout(legend_itemclick="toggle", legend_itemdoubleclick="toggleothers")

# Prevent cropping/moving figure when clicking on the plot
fig.layout.xaxis.fixedrange = True
fig.layout.yaxis.fixedrange = True

fig_html = to_html(fig, include_plotlyjs=False, full_html=False, div_id="proportionalChangeCrimeGraph")
# print(fig_html)

fig.show()

Conclusion

Over the six-month period, the counts of each type of crime in the dataset increased during June and July as COVID-19 lockdown restrictions were eased, before falling again as measures came back into effect in September. Shoplifting and theft saw their largest increase (of roughly 40%) in July, after retail reopened on June 15th, while "anti-social behaviour" and "violence and sexual offences" had their largest increases in June and July respectively. The data also has some unusual features in the crime type column, namely two months featuring an unexpected pattern of exactly 1000 blank entries, as well as an undocumented "exclusive" crime type occurring roughly 20 times each month.

It is also worth noting that this data only represents crimes which the police were aware of and documented. It is therefore useful to consider to what extent these trends are caused by actual changes in the amount of crime committed, and to what extent by fewer crimes being spotted, reported and documented by the police. The effect of unreported crimes over this period will vary by crime type. For instance, shoplifting can only occur in stores, and as such is expected to fall as shops close, whereas violent and sexual crimes can occur anywhere and will be harder to detect, and harder for victims to report, when they take place in the home. With more time, quantifying the effects of under-reporting by cross-referencing with other data sources would be a key area of focus, in addition to including, and comparing with, data from other years.

Extras

This section contains additional analysis which, while not pertinent to the task outlined, caught my interest and curiosity.

Latitude and Longitude

sns.violinplot(crimesDF["Latitude"], orient="h", width=0.9, gridsize=1000, linewidth=0.5);

The latitudes and longitudes are extremely skewed, with many values exceeding the bounds of the UK, let alone West Yorkshire.

UK Long range: -8.23 <-> 1.75 (NI to Norwich)
UK Lat range: 49.16 <-> 62.28 (Jersey to Faroe)

West Yorkshire Long range: -2.21 <-> -1.09
West Yorkshire Lat range: 53.5 <-> 54

# Get a frame with all non-null latitude and longitude coordinates 
locDF = crimesDF[["Latitude", "Longitude"]].dropna()

# Locate all entries which are within the expected bounds of West Yorkshire
locDF_Yorkshire = locDF.loc[(locDF["Latitude"] <= 54) & (locDF["Latitude"] >= 53) & (locDF["Longitude"] >= -2.21) & (locDF["Longitude"] <= -1.1)]

# Locate all entries which are roughly within the bounds of the UK
locDF_UK = locDF.loc[(locDF["Latitude"] <= 62) & (locDF["Latitude"] >= 49) & (locDF["Longitude"] >= -8.2) & (locDF["Longitude"] <= 1.75)]

# Locate all entries which are beyond the UK
locDF_non_UK = locDF.loc[~((locDF["Latitude"] <= 62) & (locDF["Latitude"] >= 49) & (locDF["Longitude"] >= -8.2) & (locDF["Longitude"] <= 1.75))]

# Check the ranges of the entries outside the UK
locDF_non_UK.max(), locDF_non_UK.min()

(Latitude     99.528496
Longitude     98.15016
dtype: Float64,
Latitude    -96.415042
Longitude   -99.342253
dtype: Float64)
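
The filters above all repeat the same bounding-box test with different limits. As a sketch of how that test could be factored out, the hypothetical helper `withinBounds` below returns a boolean mask using `Series.between` (inclusive at both ends by default). The coordinates in `toyLocDF` are toy values chosen for illustration, and the bounds are the same rough ones used above.

```python
import pandas as pd

def withinBounds(df, latRange, lonRange):
    """Boolean mask of rows whose coordinates fall inside an inclusive bounding box."""
    return (df["Latitude"].between(*latRange)
            & df["Longitude"].between(*lonRange))

# Toy coordinates: one in West Yorkshire, one elsewhere in the UK, one implausible
toyLocDF = pd.DataFrame({
    "Latitude":  [53.80, 54.82, 99.53],
    "Longitude": [-1.55, -1.60, 98.15],
})

# Same rough bounds as the filters above
inYorkshire = withinBounds(toyLocDF, (53, 54), (-2.21, -1.1))
inUK = withinBounds(toyLocDF, (49, 62), (-8.2, 1.75))

print(toyLocDF[inYorkshire])          # West Yorkshire only
print(toyLocDF[inUK & ~inYorkshire])  # in the UK but outside West Yorkshire
print(toyLocDF[~inUK])                # outside the UK
```

Keeping the masks rather than the filtered frames means they can be combined with `&` and `~` without re-typing the limits.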

# View the points outside the UK on a world map

fig = px.scatter_map(locDF_non_UK, lat="Latitude", lon="Longitude",
                           center=dict(lat=df["Latitude"].mean(), lon=df["Longitude"].mean()),
                           zoom=2,
                           opacity=1,
                           map_style="open-street-map",
                           )

# fig.show()
plot(fig, auto_open=True)

# Check for points which are within the UK but outside of West Yorkshire

df_all = locDF_UK.merge(locDF_Yorkshire.drop_duplicates(), on=['Latitude','Longitude'], 
                      how='left', indicator=True)

ukNotYorkshireDF = df_all.loc[df_all['_merge'] == 'left_only']
print(ukNotYorkshireDF["Latitude"].max(), ukNotYorkshireDF["Latitude"].min())
print(ukNotYorkshireDF["Longitude"].max(), ukNotYorkshireDF["Longitude"].min())
ukNotYorkshireDF

54.821672 54.089975
-0.891841 -1.598217

	Latitude	Longitude	_merge
9018	54.821672	-1.598217	left_only
9019	54.821672	-1.598217	left_only
9020	54.821672	-1.598217	left_only
9021	54.821672	-1.598217	left_only
9022	54.821672	-1.598217	left_only
9023	54.821672	-1.598217	left_only
9024	54.821672	-1.598217	left_only
9025	54.821672	-1.598217	left_only
9026	54.821672	-1.598217	left_only
35925	54.33734	-1.42805	left_only
50451	54.089975	-0.891841	left_only
63431	54.821672	-1.598217	left_only
89414	54.821672	-1.598217	left_only
89415	54.821672	-1.598217	left_only
89416	54.821672	-1.598217	left_only
89417	54.821672	-1.598217	left_only
89418	54.821672	-1.598217	left_only
89419	54.821672	-1.598217	left_only
89420	54.821672	-1.598217	left_only
89421	54.821672	-1.598217	left_only
89422	54.821672	-1.598217	left_only
89423	54.821672	-1.598217	left_only
89424	54.821672	-1.598217	left_only
89425	54.821672	-1.598217	left_only
89426	54.821672	-1.598217	left_only
89427	54.821672	-1.598217	left_only
89428	54.821672	-1.598217	left_only
89429	54.821672	-1.598217	left_only
116400	54.821672	-1.598217	left_only
116401	54.33734	-1.42805	left_only
116402	54.181446	-1.457476	left_only
138841	54.308725	-1.566156	left_only
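
The merge with `indicator=True` used above acts as an anti-join: rows whose coordinates appear only in the left frame are tagged `left_only`. The following minimal sketch shows the mechanism on toy frames (the names `left`, `right` and `leftOnly`, and the coordinate values, are assumptions made for illustration).

```python
import pandas as pd

# Toy frames: `left` plays the role of the UK-wide points, `right` the West Yorkshire subset
left = pd.DataFrame({"Latitude": [53.8, 54.8, 54.8], "Longitude": [-1.5, -1.6, -1.6]})
right = pd.DataFrame({"Latitude": [53.8], "Longitude": [-1.5]})

# indicator=True adds a "_merge" column recording where each row's key was found
merged = left.merge(right.drop_duplicates(), on=["Latitude", "Longitude"],
                    how="left", indicator=True)

# Rows found only in `left` are those absent from `right` (an anti-join)
leftOnly = merged.loc[merged["_merge"] == "left_only"]
print(leftOnly)
```

Duplicate rows in the left frame are retained, which is why the same coordinates repeat in the output above. The merge also produces a fresh integer index, so the row labels in that output are positions within `locDF_UK` rather than labels from `crimesDF`.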

fig = px.density_map(latLongDF_UK, lat="Latitude", lon="Longitude", z=None,
                           radius=7,
                           center=dict(lat=df["Latitude"].mean(), lon=df["Longitude"].mean()),
                           zoom=10,
                           opacity=0.5,
                           range_color=[0,len(latLongDF)*9e-5],
                           map_style="open-street-map",
                           hover_data=None)

# fig.show()
plot(fig, auto_open=True)

While this is a somewhat interesting plot to look at, as crime only occurs where there are people to commit it, the data is mostly just showing the distribution of population.

Missing Crime IDs

# Show crimes with missing IDs

crimesDF.loc[crimesDF["Crime ID"].isna()]

	Crime ID	Month	Reported by	Falls within	Longitude	Latitude	Location	LSOA code	LSOA name	Crime type	Last outcome category	Context
2	NaN	2020-06	West Yorkshire Police	West Yorkshire Police	-1.873004	53.941724	On or near Cornerstones Close	E01010646	Bradford 001A	Anti-social behaviour	NaN	NaN
3	NaN	2020-06	West Yorkshire Police	West Yorkshire Police	-1.882481	53.924936	On or near Moorside Lane	E01010646	Bradford 001A	Anti-social behaviour	NaN	NaN
4	NaN	2020-06	West Yorkshire Police	West Yorkshire Police	-1.873004	53.941724	On or near Cornerstones Close	E01010646	Bradford 001A	Anti-social behaviour	NaN	NaN
11	NaN	2020-06	West Yorkshire Police	West Yorkshire Police	-1.890771	53.946029	On or near Green Lane	E01010648	Bradford 001C	Anti-social behaviour	NaN	NaN
16	NaN	2020-06	West Yorkshire Police	West Yorkshire Police	-1.828609	53.920224	On or near Queen'S Gardens	E01010692	Bradford 001D	Anti-social behaviour	NaN	NaN
...	...	...	...	...	...	...	...	...	...	...	...	...
21420	NaN	2020-04	NaN	West Yorkshire Police	NaN	NaN	No Location	NaN	NaN	Anti-social behaviour	NaN	NaN
21421	NaN	2020-04	NaN	West Yorkshire Police	NaN	NaN	No Location	NaN	NaN	Anti-social behaviour	NaN	NaN
21422	NaN	2020-04	NaN	West Yorkshire Police	NaN	NaN	No Location	NaN	NaN	Anti-social behaviour	NaN	NaN
21423	NaN	2020-04	NaN	West Yorkshire Police	NaN	NaN	No Location	NaN	NaN	Anti-social behaviour	NaN	NaN
21424	NaN	2020-04	NaN	West Yorkshire Police	NaN	NaN	No Location	NaN	NaN	Anti-social behaviour	NaN	NaN

29689 rows × 12 columns

len(crimesDF.loc[crimesDF["Crime ID"].isna()]) / len(crimesDF)

0.18684313207214692
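
The same proportion can be computed as the mean of the boolean mask, and grouping that mask by crime type gives a quick view of where the missing IDs are concentrated. This is a sketch on toy records only (the values in `toyDF` are invented and do not come from the dataset).

```python
import numpy as np
import pandas as pd

# Toy records (illustrative only): NaN marks a missing crime ID
toyDF = pd.DataFrame({
    "Crime ID": ["a1", np.nan, np.nan, "b2", np.nan],
    "Crime type": ["Burglary", "Anti-social behaviour", "Anti-social behaviour",
                   "Drugs", "Burglary"],
})

missing = toyDF["Crime ID"].isna()

# The mean of a boolean mask is the fraction of True values
missingFraction = missing.mean()
print(missingFraction)

# Fraction of missing IDs within each crime type
missingByType = missing.groupby(toyDF["Crime type"]).mean()
print(missingByType)
```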

29689 entries have no crime ID which is ~18.7% of the records. A cursory glance at some of the data indicates that many of these

West Yorkshire Crime Analysis

Executive Summary

Crime Type