The When and Where of Pittsburgh’s Bunched Buses

Categories: Analytics, Transportation, Visualization

Having previously established that Pittsburgh’s buses do bunch, and having built a technical platform for archiving and accessing historic bus service data, I sought to extend this inquiry by quantifying the frequency, timing, and location of bus bunching. The resulting project is available online at:


Vehicle locations for all buses servicing routes 61, 71, P1, and G2 were recorded once every sixty seconds throughout March 2016, resulting in a dataset of some 1.5 million timestamped bus locations. Each location data point is processed to measure the distance to the nearest other bus operating at the same moment on the same route in the same direction. If that distance is less than 250 meters, the point is classified as “bunched.”
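The classification step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the record fields (`vehicle`, `route`, `direction`, `minute`, `lat`, `lon`) and the function names are hypothetical, and it assumes straight-line (great-circle) distance between buses, which may differ from how distance was measured in the project.

```python
import math
from collections import defaultdict

BUNCH_THRESHOLD_M = 250.0  # threshold stated in the text


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    earth_radius_m = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_m * math.asin(math.sqrt(a))


def classify_bunched(observations, threshold_m=BUNCH_THRESHOLD_M):
    """Tag each observation with its nearest same-route, same-direction bus.

    `observations` is a list of dicts with the (assumed) keys
    vehicle, route, direction, minute, lat, lon.
    """
    # Only buses on the same route, in the same direction, at the same
    # moment are compared with one another.
    groups = defaultdict(list)
    for obs in observations:
        groups[(obs["route"], obs["direction"], obs["minute"])].append(obs)

    results = []
    for group in groups.values():
        for obs in group:
            distances = [
                haversine_m(obs["lat"], obs["lon"], other["lat"], other["lon"])
                for other in group
                if other["vehicle"] != obs["vehicle"]
            ]
            nearest = min(distances) if distances else None
            bunched = nearest is not None and nearest < threshold_m
            results.append({**obs, "nearest_m": nearest, "bunched": bunched})
    return results
```

A bus with no companion on the same route and direction at that minute gets `nearest_m` of `None` and is never counted as bunched.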

Across all routes, approximately 18% of data points were “bunched.” With only 5% of observations bunched, Route G2 is the best performer among the selected routes. Route 71A measured worst, with over 26% of observations bunched. Most bunched P1 observations are attributable to observations near the terminals on either end of its run.



Humans are really good at pattern detection, and especially at detecting visual patterns. In hopes of finding patterns to shed some light on the bunching phenomenon, I created two distinct visualizations. The first presents bunching proportionally to all bus services. The second visualizes bunching by the frequency of occurrence. Proportions can reveal the onset and persistence of a phenomenon (within specific trips, say). The frequency presentation provides a high level overview of patterns across multiple simultaneous routes throughout the course of the day.

For the proportional view, time and space are bucketed: time into 10-minute windows, and space into 1,000-foot segments along the bus’s route path. The proportion of bunched observations is calculated for each bucket.
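As a rough sketch of the bucketing described above (the function name and the tuple layout of the input are hypothetical, chosen only for illustration), each observation is assigned to a time window and a route segment, and the bunched share is computed per bucket:

```python
from collections import defaultdict

TIME_BUCKET_MIN = 10    # 10-minute windows, as in the text
SPACE_BUCKET_FT = 1000  # 1,000-foot segments, as in the text


def bunched_proportions(observations):
    """Return {(time_bucket, segment): share of observations bunched}.

    `observations` is an iterable of (minute_of_day, distance_ft, bunched)
    tuples, where distance_ft is the bus's distance along its route path.
    """
    counts = defaultdict(lambda: [0, 0])  # bucket -> [bunched, total]
    for minute_of_day, distance_ft, bunched in observations:
        key = (
            int(minute_of_day // TIME_BUCKET_MIN),
            int(distance_ft // SPACE_BUCKET_FT),
        )
        if bunched:
            counts[key][0] += 1
        counts[key][1] += 1
    return {key: b / total for key, (b, total) in counts.items()}
```

Each key identifies one cell of the proportional chart: the first element is the index of the 10-minute window within the day, the second the index of the 1,000-foot segment along the route.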

This project’s one true innovation is the choice of labels for the vertical axis (bus distance along its route path). The axis labels correspond to the neighborhood boundaries crossed as the bus progresses. This makes the distance scale instantly comprehensible for anyone familiar with Pittsburgh’s neighborhoods.


Visualized in this manner, clear horizontal lines emerge corresponding with areas of congestion (e.g. the intersection of Forbes Ave and Murray Ave is the obvious band between South Squirrel Hill and Greenfield).

The proportional visualization can only display one route at a time. Gaining a 10,000-foot, multi-route view requires a heatmap and a timelapse. Fortunately, CartoDB has an “off the shelf” tool to build exactly such a visualization. Daily observations for March 1 – 31 were combined by time of day. The resulting heatmap shows all routes in the project at once, or the user can zoom in and look at any specific area (e.g. to compare the patterns along Forbes Avenue (outbound 61) and Fifth Ave (inbound / outbound 71 and inbound 61)).
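Combining a month of observations by time of day amounts to discarding the date and keeping only the clock time. The sketch below illustrates the idea under stated assumptions: the input layout, the 10-minute bucket size, and the rounding of coordinates to three decimal places are all hypothetical choices for this example, not details of the CartoDB workflow.

```python
from collections import Counter
from datetime import datetime


def bunched_counts_by_time_of_day(records, bucket_min=10):
    """Count bunched observations per (time-of-day bucket, location cell).

    `records` is an iterable of (iso_timestamp, lat, lon) tuples for
    observations already classified as bunched. The date is dropped so
    that every day of the month falls onto the same 24-hour clock.
    """
    counts = Counter()
    for timestamp, lat, lon in records:
        t = datetime.fromisoformat(timestamp)
        minute_of_day = t.hour * 60 + t.minute
        # Start minute of the bucket this observation falls into.
        bucket_start = (minute_of_day // bucket_min) * bucket_min
        # Rounding coordinates groups nearby points into one cell
        # (an assumption made for this sketch).
        counts[(bucket_start, round(lat, 3), round(lon, 3))] += 1
    return counts
```

Cells with higher counts correspond to hotter spots at that time of day.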


Results and Discussion

Both visuals show trends, though the trends are not entirely obvious. Location seems to be a far stronger determinant of bunching than time. Bunching seems to peak at approximately hourly intervals, and the movement of heat trails in the heatmap shows repeated patterns of bunching. I was surprised by the limited impact of rush hour (or the persistence of bunching outside rush hour), and by the dramatic difference between the on-street routes of 61 and 71 and the busway routes of P1 and G2. (When the Port Authority suggests it wants to build a busway through Oakland, it’s easy to see why.)

These visualizations provide a foundation to begin to address the real questions of “why does bunching occur?” and “what can be done about it?” The presence of trends in the data points to the opportunity to predict when and where bunching will occur. If bunching can be predicted, it may be possible to reduce it by, for example, making microadjustments to bus departures (leaving 90 seconds earlier than scheduled, say), or by taking advantage of natural differences between drivers (such as staggering fast drivers with slow drivers when bunching is likely to occur). By developing a method to identify the location and time of bunching occurrences, this project lays a foundation for future causal analysis.

The two visualizations and a short narrative are presented together at


After additional user feedback and revision, it was time to share my findings. I shared the link via social media, where it was picked up by Pittsburgh’s excellent Eat That, Read This (a mention in ETRT is a quick way to become “Pittsburgh famous,” or “Pittsburgh infamous,” as the case may be). From there, the project was picked up and shared by several local civic organizations and by Pittsburgh’s hip, tech-savvy mayor, Bill Peduto.


I’ve also had occasion to speak with several news organizations and to present the visualization several times on campus. While my project hasn’t ended the bunching problem, I’m humbled to get to contribute to the conversation about Pittsburgh’s transportation needs.

Press & Presentations

Mark is a data science master’s student and practitioner at Carnegie Mellon University with interests in transit, environmental policy, and social justice.