CompStat: A Major Advance in Police Management that Requires Forecasting

NYPD Commissioner William Bratton is credited with starting CompStat (short for Computer Statistics or Comparative Statistics) in 1994. The NYPD credits CompStat and related policies with the more than 70% reduction in murders and large decreases in other major crimes in New York City.

CompStat is a police version of management by objectives. It is a monthly peer review, accountability, and problem-solving process in which precinct commanders review the past month's performance measures and discuss actions/plans for the coming month.

CompStat has short-term forecasting needs and requirements that fuel the interest in crime forecasting and provide forecast specifications:

  • Short-term Hot Spot Forecasting - Needed are one-month-ahead forecasts for major crimes (so-called Part 1 crimes, including homicide, aggravated assault, larceny, robbery, burglary, rape, and motor vehicle theft) for the smallest geographic areas possible. Also needed is a GIS that can display forecasts as choropleth (color-shaded area) maps with drill-down to relevant individual crime points and records.
  • Counterfactual Forecasts - Business-as-usual, extrapolative forecasts are needed for evaluating the most recent historical month's crimes. Was last month's crime level in a particular area significantly higher or lower? Has there been a pattern change, or just an outlier data point? This is an area where traditional forecast tracking signals (e.g., CUSUM) and prediction intervals have application.
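As a concrete illustration, the tracking-signal idea above can be sketched in a few lines of Python. The smoothing constant, threshold, and series below are illustrative assumptions, not values from CompStat practice:

```python
# Sketch of a CUSUM-style tracking signal for flagging a pattern change
# in a monthly crime series. The smoothing constant and threshold are
# illustrative assumptions, not standards from the CompStat literature.

def tracking_signal(series, alpha=0.2, threshold=4.0):
    """Return a list of (forecast, signal, flagged) tuples.

    Uses simple exponential smoothing as the business-as-usual forecast,
    accumulates forecast errors, and divides by a smoothed mean absolute
    deviation (MAD). |signal| > threshold suggests a pattern change
    rather than ordinary noise.
    """
    level = series[0]          # initialize level at first observation
    cum_error = 0.0
    mad = 1.0                  # arbitrary positive starting MAD
    out = []
    for actual in series[1:]:
        forecast = level
        error = actual - forecast
        cum_error += error
        mad = alpha * abs(error) + (1 - alpha) * mad
        signal = cum_error / mad
        out.append((forecast, signal, abs(signal) > threshold))
        level = alpha * actual + (1 - alpha) * level
    return out

# Usage: a stable series followed by a jump trips the flag at the jump.
burglaries = [20, 22, 19, 21, 20, 21, 35, 38, 40, 42]
for fc, sig, flag in tracking_signal(burglaries):
    print(f"forecast={fc:5.1f}  signal={sig:6.2f}  flagged={flag}")
```

In a CompStat setting, a flagged month would prompt a closer look at the area rather than an automatic conclusion that conditions have changed.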

Research Papers

  • From the Special Section on Crime Forecasting, International Journal of Forecasting, Volume 19, No. 4 (October-November 2003):

Gorr, W.L. and R. Harries, “Introduction to crime forecasting,” pp. 551-555.

This short paper introduces the six papers comprising the Special Section on Crime Forecasting. A longer title for the section could have been "Forecasting crime for policy and planning decisions and in support of tactical deployment of police resources." Crime forecasting for police is relatively new. It has been made relevant by recent criminological theories, made possible by recent information technologies including geographic information systems (GIS), and made desirable because of innovative crime management practices. While focused primarily on the police component of the criminal justice system, the six papers provide a wide range of forecasting settings and models including UK and US jurisdictions, long- and short-term horizons, univariate and multivariate methods, and fixed boundary versus ad hoc spatial cluster areal units for the space and time series data. Furthermore, the papers include several innovations for forecast models, with many driven by unique features of the problem area and data.

  1. Harries, R., “Modelling and predicting recorded property crime trends in England and Wales – a retrospective,” pp. 557-566.

In 1999 the Home Office published, for the first time ever, 3-year ahead projections of property crime in England and Wales. The projections covered the period 1999–2001 and indicated strong upward pressure after five full years of falling crime. This pressure was generated by three factors: the number of young men in the general population, the state of the economy, and the fact that property crime appeared to be well below its underlying trend level. The projections received a mixed response, ranging from those who agreed that crime was set to rise but questioned the scale of any increase, to others who doubted the value of this type of econometric modeling. In fact, property crime did increase in 1999, although not at the rate suggested by the models—and indeed levels of burglary continued to fall. This paper addresses some of the reasons for this disparity as well as considering various criticisms of the Home Office approach.

  2. Deadman, D., “Forecasting residential burglary,” pp. 567-578.

Following the work of Dhiri et al. [Modeling and predicting property crime trends. Home Office Research Study 198 (1999). London: HMSO] at the Home Office predicting recorded burglary and theft for England and Wales to the year 2001, econometric and time series models were constructed for predicting recorded residential burglary to the same date. A comparison between the Home Office econometric predictions and the less alarming econometric predictions made in this paper identified the differences as stemming from the particular set of variables used in the models. However, the Home Office and one of our econometric models adopted an error correction form which appeared to be the main reason why these models predicted increases in burglary. To identify the role of error correction in these models, time series models were built for the purpose of comparison, all of which predicted substantially lower numbers of residential burglaries. The years 1998–2001 appeared to offer an opportunity to test the utility of error correction models in the analysis of criminal behavior. Subsequent to the forecasting exercise carried out in 1999, recorded outcomes have materialized, which point to the superiority of time series models compared to error correction models for the short-run forecasting of property crime. This result calls into question the concept of a long-run equilibrium relationship for crime.

  3. Gorr, W.L., A. Olligschlaeger, and Y. Thompson, “Short-term forecasting of crime,” pp. 579-594.

The major question investigated is whether it is possible to accurately forecast selected crimes 1 month ahead in small areas, such as police precincts. In a case study of Pittsburgh, PA, we contrast the forecast accuracy of univariate time series models with naïve methods commonly used by police. A major result, expected for the small-scale data of this problem, is that average crime count by precinct is the major determinant of forecast accuracy. A fixed-effects regression model of absolute percent forecast error shows that such counts need to be on the order of 30 or more to achieve accuracy of 20% absolute forecast error or less. A second major result is that practically any model-based forecasting approach is vastly more accurate than current police practices. Holt exponential smoothing with monthly seasonality estimated using city-wide data is the most accurate forecast model for precinct-level crime series.
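The winning method from this study, Holt exponential smoothing with seasonality estimated from pooled city-wide data, can be sketched as follows. The smoothing constants and toy data are illustrative assumptions, and the seasonal indices use a simple ratio-to-overall-mean calculation rather than the authors' exact estimation procedure:

```python
# A minimal sketch of the approach described above: estimate monthly
# seasonal indices from a pooled (city-wide) series, then apply Holt
# linear exponential smoothing to a deseasonalized precinct series.
# Smoothing constants and data are illustrative assumptions.

def seasonal_indices(citywide, period=12):
    """Ratio-to-overall-mean seasonal indices from a pooled series."""
    overall = sum(citywide) / len(citywide)
    return [
        (sum(citywide[m::period]) / len(citywide[m::period])) / overall
        for m in range(period)
    ]

def holt_forecast(series, alpha=0.3, beta=0.1):
    """One-step-ahead Holt (level + trend) forecast."""
    level, trend = series[0], series[1] - series[0]
    for y in series[2:]:
        forecast = level + trend
        last_level = level
        level = alpha * y + (1 - alpha) * forecast
        trend = beta * (level - last_level) + (1 - beta) * trend
    return level + trend

# Two years of toy city-wide data (summer peak) and one small precinct.
citywide = [100, 90, 95, 105, 120, 140, 150, 145, 125, 110, 100, 95,
            102, 92, 97, 108, 124, 143, 152, 148, 128, 112, 103, 97]
indices = seasonal_indices(citywide)
precinct = [round(0.1 * y) for y in citywide]   # toy precinct series

# Deseasonalize with city-wide indices, forecast, then reseasonalize.
deseason = [y / indices[i % 12] for i, y in enumerate(precinct)]
next_month_index = indices[len(precinct) % 12]
forecast = holt_forecast(deseason) * next_month_index
print(f"one-month-ahead precinct forecast: {forecast:.1f}")
```

The key design idea, per the paper, is that seasonality is estimated from the large city-wide aggregate, where it is reliable, and then applied to the small, noisy precinct series.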

  4. Felson, M. and E. Poulsen, “Simple indicators of crime by time of day,” pp. 595-601.

Crime varies more by hour of day than by any other variable. Yet case counts decline greatly when data are fragmented into hourly intervals. Summary indicators are needed to conserve degrees of freedom while making hourly information available for description and analysis. This paper describes some new indicators that summarize hour-of-day variations. A basic decision is to pick the first hour of the day, after which summary indicators are easily defined. These include the median hour of crime, crime quartile minutes, crime’s daily timespan, and the 5-to-5 share of criminal activity; namely, that occurring between 5:00 AM and 4:59 PM. Each summary indicator conserves cases while offering something suitable to forecast.

  5. Liu, H. and D.E. Brown, “Criminal incident prediction using a point-pattern-based density model,” pp. 605-622.

Law enforcement agencies need crime forecasts to support their tactical operations; namely, predicted crime locations for next week based on data from the previous week. Current practice simply assumes that spatial clusters of crimes or "hot spots" observed in the previous week will persist to the next week. This paper introduces a multivariate prediction model for hot spots that relates the features in an area to the predicted occurrence of crimes through the preference structure of criminals. We use a point-pattern-based transition density model for space–time event prediction that relies on criminal preference discovery as observed in the features chosen for past crimes. The resultant model outperforms the current practices, as demonstrated statistically by an application to breaking and entering incidents in Richmond, VA.
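The "current practice" baseline that the paper improves upon, persistence of last week's hot spots, can be sketched as a simple grid-count ranking. The grid size, coordinates, and number of cells returned below are illustrative choices:

```python
# The persistence baseline described above: predict next week's hot
# spots as the grid cells with the most incidents last week. Grid size
# and the number of hot cells returned are illustrative assumptions.

from collections import Counter

def hot_cells(points, origin=(0.0, 0.0), cell_size=1000.0, top_k=3):
    """Bin incident coordinates into a uniform grid and return the
    top_k busiest cells as ((col, row), count) pairs."""
    counts = Counter()
    for x, y in points:
        col = int((x - origin[0]) // cell_size)
        row = int((y - origin[1]) // cell_size)
        counts[(col, row)] += 1
    return counts.most_common(top_k)

# Usage: last week's breaking-and-entering incidents (toy coordinates).
last_week = [(150, 220), (300, 180), (450, 900), (2100, 2300),
             (2250, 2400), (2380, 2150), (2420, 2490), (5100, 300)]
print(hot_cells(last_week))
```

The paper's contribution is to replace this persistence assumption with a multivariate density model of the area features criminals appear to prefer.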

  6. Corcoran, J.J., I.D. Wilson, and A. Ware, “Predicting the geo-temporal variations of crime and disorder,” pp. 624-634.

Traditional police boundaries—precincts, patrol districts, etc.—often fail to reflect the true distribution of criminal activity and thus do little to assist in the optimal allocation of police resources. This paper introduces methods for crime incident forecasting by focusing upon geographical areas of concern that transcend traditional policing boundaries. The computerized procedure utilizes a geographical crime incidence-scanning algorithm to identify clusters with relatively high levels of crime (hot spots). These clusters provide sufficient data for training artificial neural networks (ANNs) capable of modeling trends within them. The approach to ANN specification and estimation is enhanced by application of a novel and noteworthy approach, the Gamma test (GT).


Working Papers and Technical Reports

Cohen, J., C. K. Durso, and W. L. Gorr, "Estimation of crime seasonality: a cross-sectional extension to time series classical decomposition," Heinz School Working Paper, 2003-18.

Reliable estimates of crime seasonality are valuable for law enforcement and crime prevention. Seasonality affects many police decisions from long-term reallocation of uniformed officers across precincts to short-term targeting of patrols for hot spots and serial criminals. This paper shows that crime seasonality is a small-scale, neighborhood-level phenomenon. In contrast, the vast literature on crime seasonality has almost exclusively examined crime data aggregations at the city or even larger scales. Spatial heterogeneity of crime seasonality, however, often gives rise to opposing seasonal patterns in different kinds of neighborhoods, canceling out seasonality at the city-wide level. Thus past estimates of crime seasonality have vastly underestimated the magnitude and impact of the phenomenon. We present a model for crime seasonality that extends classical decomposition of time series based on a multivariate, cross-sectional, fixed-effects model…

Gorr, W.L. and A.M. Olligschlaeger, Final project report: Crime hot spot forecasting, modeling and comparative evaluation, NIJ Grant 98-IJ-CX-K005, 2002.

This report is a detailed summary of early work on time-series-based crime forecasting, based on Pittsburgh, Pennsylvania crime data. It provides a comprehensive test of univariate and multivariate time series methods for one-month-ahead crime forecasts for use in CompStat meetings or other organizational contexts for tactical deployment of police resources. Results indicate that seasonality and time-space-lagged leading indicators play important roles in accurately forecasting crime. The strongest determinant of high forecast accuracy for univariate methods is the average crime volume in individual time series, with at least 35 crimes per month needed.


Olligschlaeger's Dissertation

One of the factors leading to increased attention to crime forecasting in the U.S. was the completion of Andreas M. Olligschlaeger's seminal dissertation in 1997, Spatial Analysis of Crime Using GIS-Based Data: Weighted Spatial Adaptive Filtering and Chaotic Cellular Forecasting with Applications to Street Level Drug Markets.

Presentations by Olligschlaeger at the second and third Crime Mapping Research Conferences held by the National Institute of Justice showed that short-term, leading indicator models could forecast crime in small areas with reasonable accuracy. About this time, police departments across the country were having big successes in mapping real-time crime data, and were thus primed for the next step of one-month-ahead crime forecasts.

Olligschlaeger used a strong experimental design, a large crime space-time data sample, spatial data processing in a geographic information system based on uniform grid cells covering a police jurisdiction, and a comparison of alternative methods, both simple and advanced.

Crime Data for Download

Available for download from this Web page are crime space and time series data for several crimes in Pittsburgh, Pennsylvania and Rochester, New York. Also available are corresponding contiguity matrices (for use in calculating spatial statistics) and GIS map layers. These are the data sets used in the research reported in Cohen & Gorr (2005), Final Report: Development of Crime Forecasting and Mapping Systems for Use by Police, National Institute of Justice Grant 2001-IJ-CX-0018. By agreement with the Pittsburgh Bureau of Police, the Rochester Police Department, and the Carnegie Mellon University Institutional Review Board, we are not allowed to release the individual report point data. All that can be made available are the aggregate crime space and time series data.


The data are unadjusted (for days per month) monthly time series for four geographies: 1990 census tracts, police car beats, an aggregation of police car beats called beats plus, which are smaller than precincts, and police precincts (also an aggregation of car beats). We refer to an individual tract, car beat, etc. as a district. There are eight multivariate time series tables, stored as comma separated value (.csv) files, one for each geography and city. Click links below to download corresponding data sets:

PghCarBeat.csv

RochCarBeats.csv

PghCarBeatPlus.csv

RochCarBeatsPlus.csv

PghPrecincts.csv

RochPrecincts.csv

PghTracts.csv

RochTracts.csv


where Pgh = Pittsburgh and Roch = Rochester.

The rows are monthly observations by district. Variable names are in the first row of each table and include district, year, month, and several crime types. Crimes are counts of police offense reports, except for C_Drugs and C_Shots in the Pittsburgh data sets, which are counts of 911 calls for service with duplicate calls for the same incidents removed.
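The layout just described can be read into per-district monthly series with a few lines of Python. The column names used here (DISTRICT, YEAR, MONTH, BURGLARY) are illustrative assumptions; check the first row of the downloaded file for the actual variable names.

```python
# Sketch of reading one of the .csv data sets into per-district monthly
# series. The embedded sample and its column names are assumptions for
# illustration only; a real run would open the downloaded file instead.

import csv
import io
from collections import defaultdict

sample = io.StringIO(
    "DISTRICT,YEAR,MONTH,BURGLARY\n"
    "1,1995,1,14\n"
    "1,1995,2,11\n"
    "2,1995,1,30\n"
    "2,1995,2,27\n"
)

series = defaultdict(list)   # district -> [(year, month, count), ...]
for row in csv.DictReader(sample):
    series[row["DISTRICT"]].append(
        (int(row["YEAR"]), int(row["MONTH"]), int(row["BURGLARY"]))
    )

for district, obs in sorted(series.items()):
    print(district, obs)
```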

Also included are zipped GIS folders for Pittsburgh.

Primer on Police Data

  • Computer-Aided Dispatch (CAD) or 911 Call-for-Service Data - These data come primarily from citizen complaints about crimes or disturbances; however, some CAD calls are officer-initiated, as when an officer sees a crime being committed while on patrol. Generally, all police events start with a CAD data record and ID number. A problem with this kind of data is that CAD calls are, for the most part, perceptions of untrained observers, and citizens sometimes distort calls to get faster service (for instance, reporting a more severe type of incident than actually occurred). So police often view individual CAD data points as unreliable measures. Nevertheless, CAD data are more representative than offense or arrest records of the volume and extent of crimes that do not have victims, such as drug dealing, prostitution, and gambling. CAD data also provide some excellent leading indicator variables, such as shots-fired reports and various loitering and public disturbance calls.


  • Offense Report Data - When an officer believes that a crime has been committed, he or she should write an offense report giving the crime type (or all crime types if multiple offenses were committed), the address, the date and time (or date and time interval), and many other variables. Offense data are the best indicator of crimes with victims, such as homicide, robbery, aggravated assault, burglary, larceny, and motor vehicle theft. Generally, statistics from offense reports are thought to under-represent the true levels of crime; for example, police might not report crimes with low solvability factors in order to keep case closure rates high. Also, victims sometimes do not report crimes such as rape.


  • Arrest Report Data- When a suspect is arrested, a report is filed relating back to an offense report and giving data on the crimes committed, the arrested person, arrest location, etc.


  • Special Event Data - These are discrete events that are generally known ahead of time, such as sporting events and concerts, that are associated with increased crime levels. While such data are not collected systematically, they should be, and they should be incorporated into forecasts.


Crime Model Variables

  • Dependent Variables - Crime counts per unit time and observation unit; for example, burglaries per month in a particular car beat, census tract, or grid cell. In areas with few crimes, such variables are often Poisson distributed. In high-crime areas, these variables can be treated as continuous. Chapter 3 of CrimeMapTutorial shows an example map with 2,000-foot grid cells. (Our research has shown that this grid size is too small, and that 4,000 feet is about as small as grid cells can be for a place like Rochester, NY.)


  • Leading Indicator Variables - Our research has found some success in using CAD data and lesser crimes as leading indicators for serious crimes. These include CAD data such as shots-fired, public disturbance, and prostitution calls, and lesser offenses such as simple assaults and trespassing. We have lagged grid counts of such indicators over time by one month and over space from contiguous grid cells. Follow-on research with these variables has used car beats and census tracts instead of grid cells.


  • Causal Variable Fixed Effects - These variables do not vary over time, but only across space. When building multivariate forecast models that include observations across space and over time, the results often exhibit severe spatial heteroscedasticity. In other words, the fitted model does not pass through the center of the crime count data cloud; some areas are consistently fitted too low or too high. Including fixed effects on crime potential, with variables on socio-economic status and land uses, is a remedy. These include census variables for populations with low human capital and family status, populations in crime-prone age groups, etc. Another source of data is electronic yellow pages, which list commercial sites of various kinds (for example, bars, check cashing businesses, retail stores, and restaurants) along with their street addresses. See Cohen, J., C.K. Durso, and W.L. Gorr, "Estimation of Crime Seasonality: A Cross-Sectional Extension to Time Series Classical Decomposition," Heinz School Working Paper 2003-18, August 2003, for a working paper that develops causal fixed effects variables for crime forecasting.


  • Dummy Variable Fixed Effects - It is possible to include a dummy variable for each grid cell except one, which is suppressed for estimation purposes. These variables do a good, albeit non-insightful, job of reducing spatial heterogeneity.


  • Seasonal Dummy Variables - Seasonal dummies (for example, one for each month except a suppressed reference month) are often important components of crime forecast models. They can have multiplicative or additive forms.

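The leading indicator construction described above (one-month time lags plus spatial lags from contiguous areas) can be sketched as follows. The cells, counts, and contiguity lists are toy values; in practice the downloadable contiguity matrices would supply the neighbor lists:

```python
# Sketch of space-time lag construction for leading indicator variables:
# for each cell, the indicator lagged one month in time, plus the
# same-month sum over contiguous cells. All data here are toy values.

def space_time_lags(counts, contiguity):
    """counts: {cell: [monthly indicator counts]},
    contiguity: {cell: [neighboring cells]}.
    Returns {cell: [(time_lag, space_lag), ...]}, where the entry at
    position t would serve as a predictor for month t+1."""
    months = len(next(iter(counts.values())))
    lags = {}
    for cell in counts:
        rows = []
        for t in range(months - 1):
            time_lag = counts[cell][t]
            space_lag = sum(counts[n][t] for n in contiguity[cell])
            rows.append((time_lag, space_lag))
        lags[cell] = rows
    return lags

# Toy 3-cell strip: A - B - C (B is contiguous to both A and C).
shots_fired = {"A": [5, 7, 6], "B": [2, 3, 4], "C": [9, 8, 10]}
contiguity = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
print(space_time_lags(shots_fired, contiguity))
```

These lagged columns would then enter a multivariate model alongside fixed effects and seasonal dummies as described in the bullets above.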

Spatial Data Processing

There are several steps in the spatial processing and aggregation of crime forecast model variables; see Chapter 2 of CrimeMapTutorial. For police data, these include:

  1. Preprocessing or cleaning address data - includes standardizing addresses to eliminate irrelevant text (like apartment numbers or notes like "rear of store"), replacing place names such as "Carnegie Mellon University" with a street address like "5000 Forbes Ave", standardizing connectors in street intersections such as the "&" in "Craig St & Forbes Ave", and various other data cleaning steps. Many of these steps are included as parts of geographic information systems (GIS).


  2. Address matching incident location data- uses sophisticated matching algorithms in GIS packages with street centerline maps to transform street addresses into map coordinates.


  3. Spatial overlay of incident points - Once data are address matched and exist as mapped points, it is possible to use spatial processing to assign correct area identifiers like zip code, census tract, police car beat, or grid cell number. This step allows data aggregation into crime counts by geographic area and time interval.
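Step 3 can be illustrated with a toy version of the point-in-polygon test that underlies spatial overlay. A real GIS handles projections, edge cases, and thousands of polygons at once; this is only a sketch of the idea, and the beat boundary and incident coordinates are invented:

```python
# Sketch of spatial overlay: a standard ray-casting point-in-polygon
# test for assigning incident points an area identifier. The car beat
# boundary and incident coordinates are toy values.

def point_in_polygon(x, y, polygon):
    """polygon: list of (x, y) vertices. Ray-casting test."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

car_beat_7 = [(0, 0), (4, 0), (4, 3), (0, 3)]   # toy beat boundary
incidents = [(1.0, 1.0), (5.0, 1.0), (2.5, 2.9)]
assigned = [p for p in incidents if point_in_polygon(*p, car_beat_7)]
print(assigned)  # incidents falling inside car beat 7
```

Once every point carries an area identifier, aggregation into crime counts by area and month is a simple group-and-count.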


Kate J. Bowers, Shane D. Johnson, Ken Pease, “Prospective Hot-Spotting,”
Jill Dando Institute of Crime Science, University College London, Third Floor, 1 Old Street London, EC1V 9HL.

Recent research conducted by the authors demonstrates that the risk of burglary is communicable, with properties within 400 meters of a burgled household being at a significantly elevated risk of victimization for up to two months after an initial event. We discuss how, using this knowledge, recorded crime data can be analyzed to generate an ever-changing prospective risk surface…

Full text of six papers from Special Section on Crime Forecasting, International Journal of Forecasting, Volume 19, No. 4 (October-November 2003).

The six papers comprising the special section represent the first effort to establish this new application area. For over 30 years, businesses have forecasted product demand to improve planning and increase efficiency. It was not until recently, however, that advances in crime theories and widespread diffusion of IT and management innovations in police departments made forecasting feasible and relevant for police. The special section addresses unique features and challenges of forecasting crime and presents corresponding modeling and methodological innovations.

Wired article, "Cloudy with a Chance of Theft," September 2003.

New Scientist article, "Computer Model Forecasts Crime Sprees," August 2003


Crime forecasting is an emerging application area for forecast research. While there have been isolated papers in the literature, it is only recently that there has been major interest and thus research programs in the area. This interest has been fueled by the availability of electronic police records for analysis, availability of geographic information systems (GIS) software and street maps for spatial data processing and display, advances in criminology for model specification, and advances in police management that place the focus on performance measures. Andreas M. Olligschlaeger's Ph.D. dissertation was seminal in opening the field. The National Institute of Justice, and in particular its Mapping and Analysis for Public Safety Program, has funded research grants in the U.S., and the UK's Home Office has had an active research program in the area. This research brings to bear many of the advances in the forecast literature, but also addresses unique aspects of the crime forecasting problem, including the following.

  • Short-Term Crime Forecasting - requires forecasting over space and time series data, such as monthly crime levels across uniform, square grid cells within a city. The grid cells need to be as small as possible, less than a mile on a side, in order to support targeting patrols and other police interventions. In this setting, it is critical to manage the small-area estimation problem; namely, to find means to accurately estimate models based on small and therefore noisy data aggregates. Data pooling across grid cells, in some form, is necessary to improve accuracy. My feeling, based on results from our current research, is that multivariate models estimated across all grid cells, instead of univariate models for each grid cell, are perhaps the best approach.

  • Multivariate Crime Forecasting - both for the short and long term, draws on the vast and fascinating criminology literature plus modeling approaches from the field of spatial econometrics. These literatures provide appealing theories for controlling for fixed effects of place (i.e., crime patterns depend on the nature of local populations and land uses, both of which change slowly over time), for incorporating spatial interactions (e.g., using spatial and time lags to represent crime displacement to nearby areas caused by a crackdown on drug dealing), and for specifying leading indicators for use in short-term forecasting (a version of the "Broken Windows" theory suggests that "soft crimes" harden over time into serious crimes).

Other issues and materials of interest on this site include:

Crime Data

A primer on standard police data is provided, as well as a discussion of variables and spatial data processing.

Police Management


Tutorials (To be updated)


Software (To be added)

Crime Forecast Audit Tool (To be added)