Government and Non-Emergency Services Data — EDA

İsmail Can Demirkan
3 min read · Dec 21, 2020

In this story, I will describe how I explored the NYC 311 service data. What is 311? It is the service that handles all requests for government and non-emergency services in New York.

When I first found the data, a file called 311.csv, I thought it would not be a tough one for EDA, but it was not easy at all. The data contains 518K entries with 38 columns and covers all requests made in June, July, and August 2014.

Since you need a methodology first, I started by looking at the shape of the data and its info. Then I checked which columns hold categorical values and what the main complaint types are. I also checked for missing data or nulls using Pandas and NumPy. Finally, I showed complaints by borough and visualized them using Matplotlib and Seaborn.
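The first checks described above (shape, column types, nulls) can be sketched roughly as follows. This is only an illustration, not the original notebook: the helper name first_look and the tiny sample frame are made up, and the column names are assumed to match those in 311.csv.

```python
import pandas as pd


def first_look(df: pd.DataFrame) -> pd.DataFrame:
    """Summarize each column: dtype, null count, and number of unique values."""
    summary = pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "nulls": df.isna().sum(),
        "unique": df.nunique(dropna=True),
    })
    # Columns with the most missing values come first
    return summary.sort_values("nulls", ascending=False)


# Tiny made-up sample standing in for 311.csv (column names are assumptions)
sample = pd.DataFrame({
    "Complaint Type": ["Noise - Residential", "Illegal Parking", "Noise - Residential"],
    "Borough": ["BROOKLYN", "QUEENS", None],
    "Closed Date": ["2014-06-02 10:00:00", None, None],
})

print(sample.shape)   # (rows, columns)
summary = first_look(sample)
print(summary)
```

On the real file, the same helper could be called on the frame returned by pd.read_csv("311.csv"). Columns with few unique values are good candidates for categorical treatment.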

These were the main columns that I needed to investigate in detail.

Columns of the Data

Every complaint has a creation date and a close date, so I needed a new metric per complaint to calculate the average response time. I have to say that converting the datetime format to a date format was hard for me. For each entry, I pulled the day, month, and year and wrote them to a new column. After this, it was quite easy to calculate the rest.
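One possible way to build such a metric with Pandas is shown below. It is a sketch under assumptions: the column names "Created Date" and "Closed Date", the new columns created_day and response_hours, and the sample rows are all hypothetical, and the response time is expressed in hours for convenience.

```python
import pandas as pd


def add_response_time(df, created="Created Date", closed="Closed Date"):
    """Return a copy with a date-only column and a response time in hours."""
    out = df.copy()
    # Unparseable or missing timestamps become NaT instead of raising
    out[created] = pd.to_datetime(out[created], errors="coerce")
    out[closed] = pd.to_datetime(out[closed], errors="coerce")
    # Keep only the calendar date of creation (no time of day)
    out["created_day"] = out[created].dt.date
    # Time between creation and closing; NaN while the complaint is still open
    out["response_hours"] = (out[closed] - out[created]).dt.total_seconds() / 3600
    return out


# Made-up rows for illustration only
sample = pd.DataFrame({
    "Created Date": ["2014-06-01 08:00:00", "2014-07-04 09:30:00", "2014-08-10 12:00:00"],
    "Closed Date": ["2014-06-01 20:00:00", "2014-07-06 09:30:00", None],
})

out = add_response_time(sample)
mean_hours = out["response_hours"].mean()  # open complaints (NaN) are skipped
print(out[["created_day", "response_hours"]])
```

Complaints without a close date keep a missing response time, so they do not distort the average.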

As a result, across the 518K entries there are 189 different complaint types, and the top complaint is <Noise Residential>.
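Counting the distinct complaint types and finding the most frequent one takes a single value_counts call. The snippet below is a small sketch; the column name "Complaint Type", the helper name, and the sample values are assumptions used for illustration.

```python
import pandas as pd


def complaint_type_summary(df, column="Complaint Type", n=3):
    """Return the number of distinct complaint types and the n most frequent."""
    counts = df[column].value_counts()  # sorted descending, missing values excluded
    return len(counts), counts.head(n)


# Made-up rows for illustration only
sample = pd.DataFrame({
    "Complaint Type": [
        "Noise - Residential", "Noise - Residential", "Illegal Parking",
        "Blocked Driveway", "Noise - Residential",
    ]
})

n_types, top = complaint_type_summary(sample)
print(n_types)
print(top)
```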

The unique borough categories were: Brooklyn, Queens, Manhattan, Bronx, Staten Island, Other.

Brooklyn has the highest complaint count compared to the other boroughs.
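A comparison like this can be made by turning the counts per borough into shares. The following is a minimal sketch: the column name "Borough" and the sample rows are assumptions, and the numbers are invented, not the real 311 figures.

```python
import pandas as pd


def borough_share(df, column="Borough"):
    """Return the fraction of complaints per borough, largest first."""
    return df[column].value_counts(normalize=True)


# Made-up rows for illustration only (not the real distribution)
sample = pd.DataFrame({
    "Borough": ["BROOKLYN"] * 4 + ["QUEENS"] * 3 + ["MANHATTAN"] * 2 + ["BRONX"]
})

share = borough_share(sample)
top_two = share.head(2).sum()  # combined share of the two busiest boroughs
print(share)
print(top_two)
```

The same Series can be passed to a Matplotlib or Seaborn bar plot to visualize complaints by borough.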

In conclusion, Brooklyn and Queens account for 50% of all complaints, so extra staff should be allocated to these boroughs. Most of the complaints are about noise. The average response time was calculated for every complaint. July has the biggest volume of complaints. General construction complaints have the highest average response time. The least frequent complaint types can be ignored because they have only 1, 2, or 3 entries in total.
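Findings such as the busiest month and the slowest complaint type come from simple group-by aggregations. Here is a sketch with invented numbers; the column names, including the hypothetical response_hours column holding the hours between creation and closing, are assumptions.

```python
import pandas as pd

# Made-up rows for illustration only; "response_hours" is a hypothetical column
sample = pd.DataFrame({
    "Created Date": pd.to_datetime([
        "2014-06-10", "2014-07-02", "2014-07-15", "2014-07-20", "2014-08-05",
    ]),
    "Complaint Type": [
        "Noise - Residential", "General Construction", "Noise - Residential",
        "General Construction", "Noise - Residential",
    ],
    "response_hours": [2.0, 100.0, 4.0, 140.0, 6.0],
})

# Complaints per calendar month (6 = June, 7 = July, 8 = August)
per_month = sample["Created Date"].dt.month.value_counts().sort_index()

# Mean response time per complaint type, slowest first
avg_response = (
    sample.groupby("Complaint Type")["response_hours"]
    .mean()
    .sort_values(ascending=False)
)

busiest_month = per_month.idxmax()
slowest_type = avg_response.index[0]
print(per_month)
print(avg_response)
```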

What future work can be done? The 189 different complaint types can be mapped into a smaller set of categories. The number of complaints for every region can be predicted using Linear Regression. The raw data can be expanded to cover several years.
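As a rough idea of how the mapping into smaller categories might start, a keyword lookup is one simple option. Everything in this sketch is hypothetical: the category names, the keywords, and the function name are placeholders, and a real mapping would need to be built by reviewing all 189 complaint types.

```python
from collections import Counter

# Hypothetical categories and keywords; the first matching category wins
CATEGORY_KEYWORDS = {
    "Noise": ("noise",),
    "Streets and Parking": ("parking", "street", "driveway"),
    "Housing and Construction": ("construction", "heating", "plumbing"),
}


def to_category(complaint_type: str) -> str:
    """Map a raw complaint type to a broad category by keyword match."""
    text = complaint_type.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return category
    return "Other"  # anything not covered by the keywords above


# Made-up complaint types for illustration only
raw_types = [
    "Noise - Residential", "Illegal Parking", "General Construction",
    "Noise - Street/Sidewalk", "Rodent",
]
category_counts = Counter(to_category(t) for t in raw_types)
print(category_counts)
```

Because categories are checked in order, a type such as "Noise - Street/Sidewalk" lands in "Noise" rather than in the street-related category.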
