
Emerging Hotspot Analysis


        For the emerging hotspot analysis, the City of Vancouver’s crime dataset from 2007 to 2017 was downloaded as shapefiles from the city’s Open Data Catalogue and merged into a single layer with the ‘merge’ data management tool. Each year’s dataset was projected in the City of Vancouver’s default NAD 1983 UTM Zone 10N projection system, which was used as the common environment for all subsequent analyses. The dataset classified a number of types of crime, and point data for residential and commercial break-and-enters, thefts of vehicles, thefts from vehicles, thefts of bicycles, and ‘other theft’ (defined in the Vancouver Police Department’s data as a combination of personal property and bicycle theft) were selected as the attributes for further analysis. The date field in the Vancouver crime dataset, which was initially provided in a numerical format, was converted to a string using the field calculator and then into a time format (YYYYmmDD) with the ‘convert time field’ tool.
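The two-step date conversion can be sketched in plain Python; the function name and the exact numeric encoding (YYYYMMDD integers) are illustrative assumptions, not the dataset’s documented schema:

```python
from datetime import datetime

def numeric_date_to_time(value):
    """Mirror the conversion described above: the numeric date field is
    first turned into a string (field calculator step), then parsed into
    a time value ('convert time field' step)."""
    text = str(value)                      # e.g. 20170315 -> "20170315"
    return datetime.strptime(text, "%Y%m%d")

print(numeric_date_to_time(20170315).date())  # 2017-03-15
```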

 

        In order to conduct the hotspot analysis, a space-time cube first needed to be created by aggregating points of the different types of property crime of interest, which was done via the ‘create space time cube by aggregating points’ tool. This data structure aggregates points into space-time bins, which allows point data falling within a single cell to be compared through time. Space-time cubes were created for three categories of property crime: all selected crime types together, combined residential and commercial break-and-enters, and vehicle thefts. No template cube was used; a time step interval of one year, an end time step alignment, and fishnet aggregation shapes were chosen for all analyses to better compare longer-term trends in the data and to reduce temporal bias.
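The binning that the space-time cube performs can be illustrated with a minimal sketch, assuming projected (x, y) coordinates in metres and a one-year time step; the cell size and points below are made up:

```python
from collections import Counter

def bin_points(points, cell_size):
    """Aggregate (x, y, year) crime points into fishnet space-time bins.
    Each bin key is (column, row, year) and holds a point count, the same
    per-cell, per-time-step structure a space-time cube stores."""
    bins = Counter()
    for x, y, year in points:
        bins[(int(x // cell_size), int(y // cell_size), year)] += 1
    return bins

pts = [(10, 10, 2007), (40, 12, 2007), (10, 11, 2008)]
counts = bin_points(pts, cell_size=50)
# both 2007 points fall in fishnet cell (0, 0), so its 2007 bin counts 2
```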

 

        Once the three space-time cubes were created, the emerging hotspot analyses for the three categories of crime were completed with the ‘emerging hotspot analysis’ tool. The analysis variable for the complete crime dataset, the residential and commercial break-and-enter category, and the vehicle theft category was simply the count of the point data; the default neighborhood distances calculated by the tool were 747, 740, and 731 metres, with distance intervals of 146, 144, and 141 metres, respectively. A polygon analysis mask created from a shapefile of the local area boundary was also used, so that the analyses did not include areas beyond the city where no point data existed, which could otherwise skew the results. The emerging hotspot analysis produces surfaces that identify hot and cold spots, as well as whether those spots are changing through time. The tool can classify spots into 17 different categories; a precise definition of each category can be found in the appendix (Table A1).
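The trend component of those categories comes from a Mann-Kendall test applied to each bin’s time series (alongside the Getis-Ord Gi* hot/cold statistic). A minimal sketch of the Mann-Kendall S statistic, with made-up count series:

```python
def mann_kendall_s(series):
    """Mann-Kendall S statistic: sum of the signs of all pairwise
    differences, taken in time order. A strongly positive S indicates an
    increasing trend in a bin's counts, a negative S a decreasing one."""
    s = 0
    n = len(series)
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)
    return s

print(mann_kendall_s([1, 2, 3, 5, 8]))  # strictly increasing -> S = 10
print(mann_kendall_s([5, 5, 5]))        # flat -> S = 0
```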


Maximum Entropy Crime Suitability


        To begin the maximum entropy analysis, it was first necessary to find and download appropriate source data for both the crime occurrence points and the socioeconomic rasters used as predictor variables. These layers were sourced from the City of Vancouver’s Open Data Catalogue, beginning with the ‘crime’ layer.

 

        In order to avoid overpopulating the map with records and swamping the model’s ability to determine the relative contribution of the variables introduced, only the 2017 data were used in this analysis, and from that year six main types of crime were chosen for analysis: commercial break-and-enters, residential break-and-enters, ‘other’ theft, theft of vehicle, theft from vehicle, and theft of bicycles. These particular types of crime were highlighted due to the relative explicability of the underlying criminal activity: they represent targeted crimes of opportunity that would likely have strong spatial correlates (such as property value for break-and-enters, or proximity to bike lanes or rapid transit stations for bike theft). A ‘select by attributes’ command was used to create a new layer from the larger ‘crime’ dataset for each category, and each was saved within the project geodatabase.
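The ‘select by attributes’ filtering is equivalent to the following sketch; the field names (‘TYPE’, ‘YEAR’) and records are illustrative, not the actual dataset schema:

```python
# Stand-in records for the larger 'crime' dataset
crimes = [
    {"TYPE": "Theft of Bicycle", "YEAR": 2017},
    {"TYPE": "Mischief", "YEAR": 2017},
    {"TYPE": "Theft of Bicycle", "YEAR": 2016},
]

# One sub-layer per crime category was created in practice; the
# 'select by attributes' equivalent keeps only matching 2017 records.
sub_layer = [r for r in crimes
             if r["YEAR"] == 2017 and r["TYPE"] == "Theft of Bicycle"]
```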

 

        After the initial crime sub-layers were created, the predictor variables were sourced, also from the City of Vancouver’s Open Data Catalogue. Incorporating socioeconomic factors into the model was considered a priority in developing the initial parameters, so two files were downloaded from the Open Data website: the ‘Property Tax Report’ file and the ‘ICIS_GIS’ layer. The former is a catalogue of properties within the City of Vancouver’s jurisdiction with information on zoning, dates of improvements, and tax brackets, but most importantly BC Assessment values for the land and the buildings present on each site. Together these values were identified as a valuable proxy for the socioeconomic status of each property parcel, providing a very granular view of the relative value of the properties where crime may occur. The two files shared a common ‘PID’ (property identifier) attribute, sourced from BC Assessment records, which allowed the CSV records in the ‘Property Tax Report’ file to be joined to the ‘ICIS_GIS’ file, which contained the georeferenced polygons for each parcel. Before this could be done, the land value and improvement value assessments in the ‘Property Tax Report’ file were summed in Excel to create a new column, ‘TotalValue’. To save on extraneous processing time and power, the PID column and the new ‘TotalValue’ column were extracted from the larger CSV into a new file, ‘PTR2’, and uploaded into ArcMap using the ‘Excel to Table’ function.
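The slimming step performed in Excel can be sketched as follows; the column names and values are illustrative stand-ins, not the exact ‘Property Tax Report’ schema:

```python
import csv, io

# Tiny stand-in for the 'Property Tax Report' CSV (made-up records)
raw = io.StringIO(
    "PID,LAND_VALUE,IMPROVEMENT_VALUE\n"
    "123-456-789,500000,250000\n"
    "987-654-321,800000,100000\n"
)

# Sum the two assessments into 'TotalValue' and keep only PID plus
# TotalValue, producing the slimmed 'PTR2' table described above.
ptr2 = []
for row in csv.DictReader(raw):
    total = int(row["LAND_VALUE"]) + int(row["IMPROVEMENT_VALUE"])
    ptr2.append({"PID": row["PID"], "TotalValue": total})
```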

 

        Once ‘PTR2’ was uploaded into ArcMap, its attribute table was edited and a ‘find and replace’ command used to remove the extraneous hyphens from the PID field so that its records could be matched to those of the ‘ICIS_GIS’ layer; the result was saved as a new file called ‘PTR_Clean’. A join on the PID was then performed, appending the ‘TotalValue’ attribute to the ‘ICIS_GIS’ layer. Unfortunately, when this layer was displayed with symbology based on property value, it became apparent that many of the properties in the ‘ICIS_GIS’ file had no matching record in the ‘Property Tax Report’ CSV; the gaps in coverage were extensive enough that the data had to be discarded as unusable.
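The hyphen cleanup and the PID join can be sketched together; all PIDs, values, and field names below are made up for illustration:

```python
# Stand-ins for the two tables
ptr2 = {"123-456-789": 750000, "987-654-321": 900000}        # PID -> TotalValue
icis_parcels = [{"PID": "123456789"}, {"PID": "555555555"}]  # parcel polygons

# 'find and replace' step: strip hyphens so PIDs match ('PTR_Clean')
ptr_clean = {pid.replace("-", ""): value for pid, value in ptr2.items()}

# Join on PID, appending TotalValue; unmatched parcels get None,
# corresponding to the coverage gaps that made the layer unusable.
for parcel in icis_parcels:
    parcel["TotalValue"] = ptr_clean.get(parcel["PID"])

missing = sum(p["TotalValue"] is None for p in icis_parcels)
```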


        The other socioeconomic and spatial variables included in the model were also sourced from the City of Vancouver, namely the shapefiles ‘rapid_transit_stations’, ‘bikeways’, ‘homeless_shelters’, ‘park_polygons’, ‘public_streets’, ‘schools’, and ‘zoning_districts’. The ‘city_boundary’ shapefile was also downloaded from the Open Data Catalogue as a convenient processing-extent control for all subsequent steps. These files were all uploaded to ArcMap and their common projection (NAD 1983 UTM Zone 10N) assigned as the common environment. The ‘zoning_district’ file was intended to have its qualitative land use types assigned specific values depending on socioeconomic status and density (i.e. ‘agricultural’ receiving a lower assigned value than ‘single family dwelling’), but a literature search indicated a paucity of empirically derived methods for doing so. In light of this, and in the interest of avoiding unjustifiably arbitrary assignments, ‘zoning_district’ was discarded as a variable.


        To begin, the ‘Euclidean distance’ tool was used to calculate distance surfaces from ‘schools’, ‘rapid_transit_stations’, ‘bikeways’, ‘homeless_shelters’, and ‘public_streets’. Once this processing was complete, each resultant raster was reclassified from the default binning to ‘geometrical interval’ to better represent the surfaces within the test environment. As Maxent requires its raster inputs to be in ASCII format, the ‘raster to ASCII’ tool was used to convert each of the above layers into .asc files, which were then stored in a common folder for Maxent to analyze.
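A minimal sketch of both steps follows: a brute-force Euclidean distance surface (each cell’s distance to the nearest source cell, in cell units) and serialization to the ESRI ASCII grid format that Maxent reads. The grid size, sources, and cell size are made up:

```python
import math

def euclidean_distance_grid(rows, cols, sources):
    """Distance surface: each cell gets the Euclidean distance to the
    nearest source cell, as the 'Euclidean distance' tool computes
    before reclassification and export."""
    return [[min(math.hypot(r - sr, c - sc) for sr, sc in sources)
             for c in range(cols)] for r in range(rows)]

def to_esri_ascii(grid, cellsize, xll=0.0, yll=0.0, nodata=-9999):
    """Serialize a grid in the ESRI ASCII (.asc) layout: a six-line
    header followed by space-separated cell values, row by row."""
    header = (f"ncols {len(grid[0])}\nnrows {len(grid)}\n"
              f"xllcorner {xll}\nyllcorner {yll}\n"
              f"cellsize {cellsize}\nNODATA_value {nodata}\n")
    body = "\n".join(" ".join(f"{v:.3f}" for v in row) for row in grid)
    return header + body

grid = euclidean_distance_grid(3, 3, sources=[(0, 0)])
ascii_text = to_esri_ascii(grid, cellsize=30.0)
```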


        After the required raster files were produced, the crime data were prepared for export by using the ‘Add XY’ tool to add each point occurrence’s latitude and longitude coordinates to the attribute tables of the commercial break-and-enter, residential break-and-enter, ‘other’ theft, theft of vehicle, theft from vehicle, and theft of bicycle layers. Each of these layers was then exported with the ‘table to Excel’ function, and the resultant .xls files converted to .csv. Maxent requires only a title column and the latitude/longitude coordinates of each record for its analysis, so all extraneous information was deleted and the completed .csv files were placed together in a folder for input into Maxent. Maxent was then downloaded and run, with the ‘environmental layers’ directory set to the folder containing the rasters so that they could all be accessed together. Each set of sample occurrences was then chosen as the ‘samples’ input and the program run using the settings shown in Figure 1, producing graphical representations of the model output, response curves, omission/commission analyses, and variable contribution analyses. The ASCII file outputs were uploaded into ArcMap, projected in the common NAD 1983 UTM Zone 10N projection system, and are included within the results section and appendix.
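The trimming of the exported tables down to Maxent’s three-column samples format (species, longitude, latitude) can be sketched as follows; the attribute names (‘TYPE’, ‘POINT_X’, ‘POINT_Y’) and coordinates are illustrative stand-ins:

```python
import csv, io

# Stand-in attribute rows after the 'Add XY' step (made-up records)
rows = [
    {"TYPE": "theft_of_bicycle", "POINT_X": -123.12, "POINT_Y": 49.28, "YEAR": 2017},
    {"TYPE": "theft_of_bicycle", "POINT_X": -123.10, "POINT_Y": 49.27, "YEAR": 2017},
]

# Keep only the three columns Maxent reads, dropping everything else
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["species", "longitude", "latitude"])
for r in rows:
    writer.writerow([r["TYPE"], r["POINT_X"], r["POINT_Y"]])

samples_csv = buf.getvalue()
```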


Fig 1. MaxEnt model settings for suitability analyses of residential and commercial break-and-enters, thefts of vehicles, thefts from vehicles, thefts of bicycles, and 'other' property theft. 
