USTAA: United States Traffic Accident Analysis

date:May 2020
team:3
tech:Bootstrap Template, HTML, JS, XAMPP (Apache Webserver, PHP), ORACLE RDBMS, SQL, C++

Introduction

USTAA is a web application that was built for my Database Design class as part of my BS in Computer Science from the University of Florida. Primarily a learning exercise, this project required a database application to be designed and implemented around a sufficiently large set of time-series data to facilitate construction of complex SQL queries.

USTAA dashboard view.
USTAA dashboard view.

See the system architecture diagram describing USTAA below. We used a Bootstrap Template to allow us to quickly build out our frontend dashboard and charts. We used XAMPP to easily deploy Apache Webserver with the PHP Module to processes our PHP scripts. OCI-8 allows our backend scripts to access the ORACLE database instance.

USTAA System Architecture Diagram
USTAA System Architecture Diagram.

See the Entity Relationship Model describing the USTAA database below. This model was primarily built around the data in "A Countrywide Traffic Accident Dataset" (Moosavj et al.)1 and extended with data from the United States Census Bureau. The USTAA database holds ~ 5.5 million records.

USTAA Entity Relationship Model
USTAA Entity Relationship Model.

My Contributions

ER Model and Database Schema Design

I worked with my teammates to develop the USTAA Entity Relationship Model and Schema Design. Our design process started with an intuitive solution based on our prior experience with relational databases. We then used techniques learned in class to normalize the model to the third normal form (3NF).

Database Schema Implementation

I wrote the SQL DDL scripts to transform our ER-Model into the actual relational schema of the USTAA database.

USTAA DDL script, left. USTAA data script, right.
USTAA database scripts. DDL script, left. Data script, right.

Data Preparation and Data Script Creation

I cleaned our data and created the SQL insertion scripts so that we could load them into the USTAA database. At ~ 5.5 million records, this was more challenging than I would have initially expected. The .csv file containing the Moosavj et al.1 data set was so large it couldn't be opened for easy review in Excel. I was able to use Bruce Martin's reCsvEdit to break the file into 6 part files for easier review.

I initially tried using Excel to clean the data and produce the necessary SQL insertion statements, however this proved impractical even with automatic calculation disabled. Trying to do this with cell formulas only resulted in Excel running for hours before eventually crashing. I ended up using C++ to clean the data and generate the SQL insertion statements I needed for the USTAA install scripts.

A screenshot of C++ data cleaning code. The "observations" function is being called from main.
C++ data cleaning excerpt. The collisionDetection map is passed into observations() to detect unique constraint errors in the input data. The observations() function consumes CSV data and outputs insertion statements for the "Observation" table of the USTAA database.
Backend SQL query for "Comparison of Traffic Events between Cities" custom query.
C++ data cleaning excerpt. Here the observations() function checks a line of input data for incomplete or duplicate primary keys, sets the value of missing fields to NULL, and then writes the corresponding insertion statement to the output string stream buffer.

Custom Reports Implementation

I implemented the custom reports functionality for USTAA allowing users to query the database more directly.

Demo of the "Comparison of Traffic Events between Cities" custom query.
Demo of the "Comparison of Traffic Events between Cities" custom query.

Each custom report consists of several input fields and a submission button. When the query is submitted, JQuery reads the input fields from the UI then makes an asynchronous request via ajax() to the custom reports PHP script.

Frontend asynchronous request code using JQuery for "Comparison of Traffic Events between Cities" custom query.
Frontend asynchronous request code using JQuery for "Comparison of Traffic Events between Cities" custom query, JavaScript.

This script identifies and runs the appropriate query against the USTAA database, then formats the output in JSON and passes the result back to the frontend. Once the response is received, JQuery is used to update the appropriate elements in the webpage.

Backend SQL query for "Comparison of Traffic Events between Cities" custom query.
Backend SQL query for "Comparison of Traffic Events between Cities" custom query, PHP.

Project Context and Relevance

USTAA was one of the first web development projects I ever completed. Prior to working on USTAA I had extensive experience with ORACLE DBMS in my position at Info Tech, Inc. where I routinely set up and managed ORACLE database instances for my product team. As a result, this project was a nice bridge from my prior experience in automated testing to database design and web development.

I also liked this project because it gave me a chance to develop an older style WAMP-like application. I had prior experience working with Apache Webserver and OpenSSL from Info Tech, Inc., but I had never developed a web application before this project. Working with JQuery helped me better understand the historical progression from the use of XHR, to libraries like JQuery, to promises, and finally to async/ await when implementing AJAX in web development.

Work Cited

  1. Moosavi, Sobhan, Mohammad Hossein Samavatian, Srinivasan Parthasarathy, and Rajiv Ramnath.“A Countrywide Traffic Accident Dataset.”, arXiv preprint arXiv: 1906.05409 (2019).
Back to portfolio.