During my time at General Assembly I worked on a project whose goal was to optimize the use of pesticides to limit the occurrence of West Nile Virus in the city of Chicago. The optimization took the form of predicting the occurrence of West Nile Virus from environmental factors, so that pesticides could be applied more effectively. The project came from a Kaggle competition, and it required a lot of exploratory data analysis and data cleaning before modeling and prediction could begin.
The city of Chicago has several hundred mosquito traps that it examines to measure the number of mosquitoes in the area and check for the presence of West Nile Virus. Environmental data are gathered by two weather stations that record daily measurements, including temperature, humidity, and precipitation.
During the course I wanted to explore more powerful visualization tools, so I revisited the project and explored the data using Bokeh. Below is a map of the mosquito trap locations and pesticide spray locations. Bokeh makes it easy to zoom and pan on interactive maps provided by Google. The visualizations below are served by a Bokeh server running on Heroku. While this is overkill for simple EDA, anyone who wants to share interactive Bokeh dashboards over the internet must host them somewhere that people can access.
Below are some visualizations of yearly weather patterns. Higher temperatures and drier conditions were both correlated with more occurrences of West Nile Virus.
The Kaggle competition splits the odd years off as training data and uses the even years as test data. Because no two consecutive years appear in the training set, establishing year-to-year trends was very difficult.
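To make the odd/even split concrete, here is a minimal sketch in plain Python. The `records` list and the `split_by_year_parity` helper are hypothetical names with made-up values, used only for illustration; they are not the competition's actual files or the code from my notebook.

```python
from datetime import date

# Hypothetical trap records: (collection date, 1 if West Nile Virus was found).
# The values are invented for illustration only.
records = [
    (date(2007, 8, 1), 0),
    (date(2008, 8, 1), 1),
    (date(2009, 7, 15), 1),
    (date(2010, 9, 2), 0),
    (date(2011, 8, 20), 0),
    (date(2012, 8, 5), 1),
]


def split_by_year_parity(rows):
    """Return (train, test): odd-year rows for training, even-year rows for testing."""
    train = [row for row in rows if row[0].year % 2 == 1]
    test = [row for row in rows if row[0].year % 2 == 0]
    return train, test


train, test = split_by_year_parity(records)

# Collect the distinct years that ended up on each side of the split.
train_years = sorted({d.year for d, _ in train})
test_years = sorted({d.year for d, _ in test})
print(train_years)
print(test_years)
```

Every training year is separated from the next by a held-out test year, so a model never sees two consecutive seasons side by side.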
Although I tried many models, including deep and convolutional neural networks, the best-performing model was a logistic regression with a ROC AUC score of 0.68. This can be seen in the visualization below. The Jupyter Notebook is also available on my GitHub.
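For readers unfamiliar with the metric, ROC AUC can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, so 0.5 is chance and 1.0 is a perfect ranking. The sketch below computes it directly from that definition. The `roc_auc` function, labels, and scores are made up for illustration; this is not the scoring code used in the competition or in my notebook, and in practice one would normally use a library routine instead.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie in score counts as half a correctly ranked pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Invented labels (1 = virus present) and predicted probabilities.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.4, 0.3, 0.2, 0.6]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

In this toy example four of the six positive/negative pairs are ranked correctly, giving an AUC of about 0.667, which is close to the 0.68 reported above: better than chance, but far from a perfect ranking.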