The data we decided to use is the Yellow Taxi trip sheet data from the NYC Taxi & Limousine Commission. When we stumbled upon it, we were surprised by how much data was available. It also offered a variety of ways it could be displayed.
Since the data was about taxi routes, it reminded us of line-based data visualizations. Two examples came to mind: the first shows plane flights, and the second illustrates mouse movement tracking. These ideas motivated us to draw trails for the taxi routes. We thought it would be interesting to see trends and concentrations of data within certain categories.
Pursuing the idea, we quickly realized how much data we were dealing with. Referring back to the image of the data above, there were over a million data entries a month. We knew that bringing in that much data would result in unfavorable load times. After talking it over again, our solution was to bring the data in through a custom API and store it on a local server. Doing so would greatly reduce load times when loading the data points. However, this was our first experience with this approach. We soon had to shrink our data pool to a manageable size, opting to use a single day’s worth of data instead, which still yielded a bit under 300 thousand entries.
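To illustrate the step of cutting a month of data down to a single day, here is a minimal sketch in Python. It is not our actual API code. The column name `tpep_pickup_datetime`, the timestamp format, and the sample rows are assumptions made for the example, and the helper `trips_on_day` is hypothetical.

```python
import csv
import io
from datetime import date, datetime

# Assumed column name and timestamp format; adjust to match the real file.
PICKUP_COLUMN = "tpep_pickup_datetime"
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def trips_on_day(rows, day):
    """Yield only the rows whose pickup timestamp falls on `day`."""
    for row in rows:
        picked_up = datetime.strptime(row[PICKUP_COLUMN], TIMESTAMP_FORMAT)
        if picked_up.date() == day:
            yield row


# Tiny in-memory stand-in for a monthly file (made-up rows, not real data).
SAMPLE_CSV = """tpep_pickup_datetime,trip_distance,tip_amount
2016-01-01 00:05:00,1.2,2.00
2016-01-01 23:59:59,3.4,0.00
2016-01-02 00:00:00,0.8,1.50
"""

reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
one_day = list(trips_on_day(reader, date(2016, 1, 1)))
print(len(one_day), "trips kept for the chosen day")
```

Because the rows are streamed through a generator, the full month never has to sit in memory at once. Only the filtered day is kept.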
As we split up to tackle the project on two fronts, I took on the task of sketching out a rough idea of what we wanted. On the front end, our goal was to use the Google Maps API to display our data. The visualization would plot both pick-up and drop-off locations using the latitude and longitude information from each route. Each route would be drawn “as the crow flies,” connecting the pick-up and drop-off points with a straight line. On hover, each route’s drop-off location would display additional information about the route, such as distance (the actual distance this time), tip amount, etc. In addition, we would have a time slider to limit the number of routes shown at a time and reduce clutter.
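The two ideas in this sketch, straight-line routes and a time slider, can be mocked up outside the browser. The Python below is only an illustration under stated assumptions, not the front-end code. The `Route` fields, the helper names `haversine_km` and `routes_in_window`, and the sample coordinates are all made up for the example. It uses the haversine formula to measure the "as the crow flies" distance, which can then sit next to the actual distance from the data.

```python
import math
from dataclasses import dataclass
from datetime import datetime
from typing import List, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass
class Route:
    pickup_time: datetime
    pickup: Tuple[float, float]   # (latitude, longitude)
    dropoff: Tuple[float, float]  # (latitude, longitude)
    trip_distance: float          # actual distance as reported in the data
    tip_amount: float


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle ("as the crow flies") distance between two points, in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lng2 - lng1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def routes_in_window(routes: List[Route], start: datetime, end: datetime):
    """Keep routes picked up in [start, end), as a time slider would."""
    return [r for r in routes if start <= r.pickup_time < end]


# Made-up sample routes, for illustration only.
routes = [
    Route(datetime(2016, 1, 1, 8, 15), (40.7580, -73.9855), (40.7061, -74.0087), 4.1, 2.50),
    Route(datetime(2016, 1, 1, 8, 45), (40.7484, -73.9857), (40.7794, -73.9632), 2.6, 0.00),
    Route(datetime(2016, 1, 1, 9, 5), (40.7308, -73.9973), (40.7527, -73.9772), 2.0, 1.00),
]

visible = routes_in_window(routes, datetime(2016, 1, 1, 8), datetime(2016, 1, 1, 9))
for r in visible:
    straight = haversine_km(*r.pickup, *r.dropoff)
    print(f"{r.pickup_time:%H:%M} straight-line {straight:.2f} km, tip {r.tip_amount:.2f}")
```

On the map itself, each visible route would become one line between its two points, and moving the slider would simply change `start` and `end`.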