Transit Time to Clinic

At the Children’s Hospital of Pittsburgh’s Primary Care Center in Oakland, many patients rely on public transit for transportation to and from their doctor appointments. To support physician researcher Dr. Ana Malinow in investigating the extent to which long travel times negatively impact appointment keeping, I created an ArcMap accessibility utility to estimate the time in transit to reach the Oakland clinic from any location in Allegheny County.

Compared with using Google Maps or another travel planning service, the ArcMap accessibility utility 1) can estimate thousands of travel times imported from a spreadsheet and 2) may be used with patients’ Protected Health Information that could not otherwise be shared with a third party.

To create the utility, I built a multimodal network dataset in ArcMap by combining the city street network with the Port Authority of Allegheny County public transit network. This enables either a Closest Facility analysis that calculates the walking plus transit travel time to the clinic from a list of locations (i.e. patient homes), or a Service Area analysis showing isochrones (graduated zones of travel time, e.g. the area in which it is possible to reach the clinic with 15 – 30 minutes of travel time).

The image above shows the paths and travel times for a set of example addresses.
The graduated shades above show zones of travel times to the clinic.

To validate the travel time predictions created by the accessibility utility, I compared the utility’s predicted travel times to those provided by Google Maps and the Port Authority’s Trip Planner. I compared predicted times for ten points corresponding to the centroids of zip codes with significant patient populations.

Predicted Travel Time Comparison between the accessibility utility (ArcGIS), Google Maps, and the Port Authority's Trip Planner.

This validation shows that the accessibility utility produces travel time estimates substantially consistent with other travel time estimators.

To learn more about the project:
Check out a PowerPoint summary of the project.
Or download the full project report.

This utility has been used to support ongoing research in patient appointment keeping.

The When and Where of Pittsburgh’s Bunched Buses

Having previously established that Pittsburgh’s buses do bunch, and having built a technical platform for archiving and accessing historic bus service data, I sought to extend this inquiry by quantifying the frequency, timing, and location of bus bunching. The resulting project is available online at:


Vehicle locations for all buses servicing routes 61, 71, P1, and G2 were recorded once every sixty seconds throughout the month of March, 2016, resulting in a dataset of some 1.5m records of timestamped bus locations. Each location data point is processed to measure the minimum distance to the next closest bus operating at the same moment on the same route in the same direction. If less than 250 meters away from the next bus servicing the same route, the point is classified as “bunched.”
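The classification rule above can be sketched in a few lines of Python. This is a minimal illustration rather than the project's actual code: the record fields (`timestamp`, `route`, `direction`, `vehicle_id`, `lat`, `lon`) are hypothetical names, timestamps are assumed to be already snapped to the same minute, and distance is assumed to be straight-line (haversine) rather than distance along the route.

```python
import math
from collections import defaultdict

BUNCH_THRESHOLD_M = 250  # threshold stated in the text


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    r = 6371000.0  # mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def classify_bunched(observations, threshold_m=BUNCH_THRESHOLD_M):
    """Flag each observation whose nearest same-route, same-direction bus
    at the same timestamp is closer than threshold_m.

    observations: list of dicts with the (hypothetical) keys
    timestamp, route, direction, vehicle_id, lat, lon.
    """
    # Group buses that are operating at the same moment on the same route and direction.
    groups = defaultdict(list)
    for obs in observations:
        groups[(obs["timestamp"], obs["route"], obs["direction"])].append(obs)

    results = []
    for obs in observations:
        peers = groups[(obs["timestamp"], obs["route"], obs["direction"])]
        distances = [
            haversine_m(obs["lat"], obs["lon"], p["lat"], p["lon"])
            for p in peers
            if p["vehicle_id"] != obs["vehicle_id"]
        ]
        nearest = min(distances) if distances else None
        bunched = nearest is not None and nearest < threshold_m
        results.append({**obs, "nearest_m": nearest, "bunched": bunched})
    return results
```

A bus with no companion on its route at that moment gets `nearest_m = None` and is never counted as bunched. Comparing every pair within a group is quadratic, which is fine here because only a handful of buses share a route, direction, and minute.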

Across all routes, approximately 18% of all data points were “bunched.” With only 5% of observations bunched, Route G2 is the best performer among our selected routes. Route 71A measured worst, with over 26% of all observations bunched. (Most bunched P1 observations are attributable to observations near the terminals on either end of its run.)



Humans are really good at pattern detection, and especially at detecting visual patterns. In hopes of finding patterns to shed some light on the bunching phenomenon, I created two distinct visualizations. The first presents bunching proportionally to all bus services. The second visualizes bunching by the frequency of occurrence. Proportions can reveal the onset and persistence of a phenomenon (within specific trips, say). The frequency presentation provides a high-level overview of patterns across multiple simultaneous routes throughout the course of the day.

For the proportional view, time and space are bucketed: time into 10-minute windows, and space into 1000-foot segments along the bus’s route path. The proportion of bunched observations is calculated for each bucket.
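The bucketing step might look something like the sketch below. The input shape (minutes since midnight, distance along the route in feet, and a bunched flag) is an assumption made for illustration; only the 10-minute and 1000-foot bucket sizes come from the description above.

```python
from collections import defaultdict

TIME_BUCKET_MIN = 10   # 10-minute windows, as described in the text
DIST_BUCKET_FT = 1000  # 1000-foot segments along the route path


def bunched_proportions(points):
    """Return {(time_bucket, distance_bucket): share of bunched observations}.

    points: iterable of (minutes_since_midnight, distance_along_route_ft, bunched)
    tuples. This input layout is hypothetical.
    """
    counts = defaultdict(lambda: [0, 0])  # bucket -> [bunched count, total count]
    for minute, dist_ft, bunched in points:
        key = (int(minute // TIME_BUCKET_MIN), int(dist_ft // DIST_BUCKET_FT))
        counts[key][1] += 1
        if bunched:
            counts[key][0] += 1
    return {key: b / total for key, (b, total) in counts.items()}
```

Each key is a cell of the time-by-distance grid, so the result maps directly onto a heat grid with time on one axis and distance along the route on the other.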

This project’s one true innovation is the choice of labels for the vertical axis (bus distance along its route path). The axis labels correspond to the neighborhood boundaries crossed as the bus progresses. This makes the distance scale instantly comprehensible for anyone familiar with Pittsburgh’s neighborhoods.


Visualized in this manner, clear horizontal lines emerge corresponding with areas of congestion (e.g. the intersection of Forbes Ave and Murray Ave is the obvious band between South Squirrel Hill and Greenfield).

The proportional visualization can only display one route at a time. Gaining a 10,000-foot, multi-route view requires a heatmap and a timelapse. Fortunately, CartoDB has an “off the shelf” tool to build exactly such a visualization. Daily observations for March 1 – 31 were combined by time of day. The resulting heatmap shows all routes in the project at once, or the user can zoom in and look at any specific area (e.g. comparing the patterns along Forbes Avenue (outbound 61) and Fifth Ave (inbound / outbound 71 and inbound 61)).


Results and Discussion

Both visuals show trends, though the trends are not entirely obvious. Location seems to be a far stronger determinant of bunching than time. Bunching seems to peak at approximately hourly intervals, and the movement of heat trails in the heatmap shows repeated patterns of bunching. I was surprised by the limited impact of rush hour (or the persistence of bunching outside rush hour), and by the dramatic difference between the on-street routes of 61 and 71 versus the busway routes of P1 and G2. (When the Port Authority suggests it wants to build a busway through Oakland, it’s easy to see why.)

These visualizations provide a foundation to begin to address the real questions: “why does bunching occur?” and “what can be done about it?” The presence of trends in the data points to the opportunity to predict when and where bunching will occur. If predicted, it may be possible to reduce bunching by, for example, making microadjustments to bus departures (leaving, say, 90 seconds earlier than scheduled), or by taking advantage of natural differences between drivers (such as staggering fast drivers with slow drivers when bunching is likely to occur). By developing a method to identify the location and time of bunching occurrences, this project lays a foundation for future causal analysis.

The two visualizations and a short narrative are presented together at


After additional user feedback and revision, it was time to share my findings. I shared the link via social media, which in turn was picked up by Pittsburgh’s excellent Eat That, Read This (mention in ETRT is a quick way to make one “Pittsburgh famous”, or “Pittsburgh infamous” as the case may be). From there, the project was picked up and shared out by several local civic organizations, including Pittsburgh’s hip, tech-savvy mayor, Bill Peduto.


I’ve also had occasion to speak with several news organizations, and to present the visualization on several occasions on campus. While my project hasn’t ended the bunching problem, I’m humbled to get to contribute to the conversation about Pittsburgh’s transportation needs.

Press & Presentations

Health Information Exchange: Realizing the Value of Health Information Technology

I had the privilege of presenting on health information exchange (HIE) this morning to Prof. Rema Padman’s Health Information Technology class. My goal was to put HIE in context, explain its history and challenges, and provoke discussion as to the future form of exchanging electronic health information.

When Congress allocated $35bn to support the adoption and implementation of information technology in healthcare, America made a down payment on the promise of health information technology to increase the efficiency, affordability, safety, and quality of healthcare in America.

The result of that $35bn in HIT investment has been largely to transform silos of paper-based health information into silos of electronic health information. Electronic records have improved quality and efficiency within healthcare organizations, and have possibly been a driving factor behind the consolidation that has characterized healthcare over the last decade. Much of the promise of health information technology remains out of reach, however, until healthcare providers are meaningfully connected and exchanging health information.

The result of $35bn in health IT investment has largely been to transform paper silos to digital silos

Health information exchange addresses this need. Health information exchange is both the verb of (mass) exchange of health information between healthcare organizations, and also the noun for the entity (usually a non-profit entity, often with state or federal grant support) that provides the infrastructure and connectedness supporting such an exchange.

Health information exchange has many obvious benefits, but those benefits accrue primarily to patients and health plans (at present), and to the community and population (in the idealized HIE realization). As currently implemented and under current reimbursement schemes, healthcare providers seldom directly benefit from HIE (except to the extent that they value providing affordable, quality care), but are nonetheless expected to foot the bill for HIE (based on the currently popular business models for HIE entities).

I presented three (out of many) cases where HIE reduces costs or improves care:

  1. Reduction in duplicative tests and services
  2. Improvement in the quality of emergency care
  3. Reduction of preventable hospital readmissions

The takeaways from today’s talk:

  1. Health information exchange contains the potential to improve the quality and affordability of healthcare
  2. HIE faces considerable barriers to adoption, including gaining provider trust, matching provider incentives, and creating sustainable business models for the HIE organization
  3. Health Information Exchanges are one of many competing mechanisms for the exchange of health information. A decade ago, the Personal Health Record was the darling child of electronic health exchange. Today, the PHR is dead, and HIE is the current contender. HIE faces competition, however, from direct (vendor-mediated) connections, from upstart “HIE-like” private companies aggregating pharmacy benefit and lab result information, and from future disruptions in the market.

There’s no guarantee of the future success of HIE. Although HIE has far greater adoption than the PHR ever did, the future success of HIE depends on its ability to overcome obstacles to adoption, while simultaneously increasing the quantity and quality of data available to its members through connections with laboratories, PBMs, and payers.

The slides from my talk today are available below:

CMU Health Information Systems Presentation: Health Information Exchange: Realizing the Promise of HIT

Data Science Careers

I came across a quote yesterday in Cathy O’Neil and Rachel Schutt’s Doing Data Science that really resonates:

The best minds of my generation are thinking about how to make people click ads… That sucks.

~Jeff Hammerbacher

One surprise about data science is that most data science jobs exist within the marketing departments of large corporations. Marketing departments have “big data” on their potential customers, a clear business case for hiring smart people to mine those insights, and budgets with which to pay those smart people. But I can’t help but agree with Mr. Hammerbacher.

I’m grateful to my Heinz College public policy peers for their constant reminder of the broader, more interesting world in which data science has just as much to offer. Data and funding are greater challenges in this broader world than in the corporate world, but so too is the potential for impact.

Pittsburgh Bus Wait Times

A simple website displaying wait times between buses

It’s a bit messy (time constraints!) but I recently put together a simple web page that displays the average time between bus arrivals for any PAAC Route 61A/61B/61C/61D stop. It also shows the average wait time, and the excess wait times caused by variance in arrival time spacing.
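For readers curious how those three numbers relate, here is a rough sketch. It assumes passengers arrive at the stop uniformly at random, in which case the expected wait is the sum of squared headways divided by twice the sum of headways. That is a standard textbook result, not necessarily the exact calculation used on the site, and the function and field names are made up for illustration.

```python
def wait_time_summary(arrival_minutes):
    """Summarize headways and waits from a list of bus arrival times (in minutes).

    Assumes passengers arrive uniformly at random, so the expected wait is
    sum(h^2) / (2 * sum(h)) over the observed headways h.
    """
    arrivals = sorted(arrival_minutes)
    headways = [later - earlier for earlier, later in zip(arrivals, arrivals[1:])]
    if not headways:
        raise ValueError("need at least two arrivals to compute a headway")

    mean_headway = sum(headways) / len(headways)
    expected_wait = sum(h * h for h in headways) / (2 * sum(headways))
    ideal_wait = mean_headway / 2  # wait if buses were perfectly evenly spaced
    return {
        "mean_headway": mean_headway,
        "expected_wait": expected_wait,
        "excess_wait": expected_wait - ideal_wait,
    }
```

With perfectly even spacing the excess wait is zero. The more unevenly the same number of buses arrive, the larger it grows, even though the average time between buses stays the same.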

This website can be accessed at the following link:

Human Stories vs Data

I came across an insightful (and indicting) quote tonight in a data visualization paper:

I think people have begun to forget how powerful human stories are, exchanging their sense of empathy for a fetishistic fascination with data, networks, patterns, and total information. … Really, the data is just part of the story. The human stuff is the main stuff, and the data should enrich it.

~Jonathan Harris

The quote led me to Jonathan Harris’s website, which profiles his work as a digital artist. My immediate impression is of an artist of the first rate, who uses programming and data as his materials and media. Check him out.

Location and Activity Data

As a fun exercise in my Data Science Pipeline class, I used my smartphone to collect location data for approximately three weeks. A built-in algorithm also attempted to determine my activity (e.g. riding in vehicle, walking, etc.). By combining my location data with my timestamped activity data, I was able to produce a map of travels and modes of transportation:

Around Home / Squirrel Hill


Around Pittsburgh

Around Pennsylvania
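Combining the two streams comes down to matching each location fix with the most recent activity reading. The sketch below shows one way to do that with the standard library. The tuple layouts, the `max_gap_s` cutoff, and the "unknown" label are all assumptions for illustration, not details of the original pipeline.

```python
from bisect import bisect_right


def label_locations(locations, activities, max_gap_s=300):
    """Attach the most recent activity label to each location fix.

    locations:  list of (timestamp_s, lat, lon) tuples
    activities: list of (timestamp_s, label) tuples
    A fix with no activity reading in the preceding max_gap_s seconds
    is labeled "unknown". All of these shapes are hypothetical.
    """
    activities = sorted(activities)
    times = [t for t, _ in activities]

    labeled = []
    for ts, lat, lon in sorted(locations):
        # Index of the last activity reading at or before this fix.
        i = bisect_right(times, ts) - 1
        if i >= 0 and ts - times[i] <= max_gap_s:
            label = activities[i][1]
        else:
            label = "unknown"
        labeled.append((ts, lat, lon, label))
    return labeled
```

The labeled fixes can then be colored by mode of transportation when drawn on a map.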

For more, view the live version of the project here:

Pittsburgh Bus Bunching Project Featured on CMU Students for Urban Data Analytics Website

What began as a casual observation that Pittsburgh’s buses, when running late, often arrive in pairs turned into a data warehouse and empirical investigation. Fellow students Bhavna, Ranjana, Rohita, Enbo and I built a data warehouse to capture the real-time bus location data published by the Port Authority of Allegheny County. Our analysis of the data revealed that, indeed, buses do tend to “bunch.”

I’m excited to share that this project and its results are currently featured on the CMU Students for Urban Data Analytics website.

Check out the post here:

The post was also picked up in today’s issue of Eat That, Read This:

SUDS Post Featured in Eat That, Read This

A big thank you to my classmates who I had the pleasure of working with on this project, and to Students for Urban Data Analytics for featuring the project!