In the past couple of years the Centers for Medicare and Medicaid Services (CMS) has released an unprecedented amount of data on healthcare providers participating in Medicare, often as a result of Freedom of Information Act requests. In this series of two posts we’ll take a look at two CMS datasets in particular: reimbursements for Medicare services and referral patterns among participating providers. In today’s post we’ll examine some findings around payments and referral patterns at a higher level. Tomorrow, we’ll dive deeper into distinct clusters within a specific provider’s referral neighborhood. In both cases, we’ve focused on visualizing the data, an important step in exploratory data analysis.
About the Payment Data
The CMS released payment data (officially known as the Medicare Provider Utilization and Payment Data: Physician and Other Supplier Public Use File) for 2012, which contains a breakdown of Medicare Part B fee-for-service encounters across all participating providers. For each provider, we can see which services were performed (including HCPCS code and description), how many times each service was performed, and the average submitted charge, allowed charge and subsequent payment. The intricacies of submitted charges, allowed charges, and insurance payment amounts are outside the scope of this post, where we’ll be focusing on the final payment amount, which represents how much Medicare pays providers for services. Specifically, for each provider we multiply the number of times each service was provided by the average payment amount for that service, then sum over all of the provider’s services to estimate the total Medicare reimbursement amount.
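As a rough illustration of that calculation, here is a minimal Python sketch. The row layout, NPIs, and dollar figures are made up for the example and do not reflect the actual column names or values in the CMS file.

```python
from collections import defaultdict

# Hypothetical rows: (provider NPI, HCPCS code, times performed, average Medicare payment).
# These values are invented for illustration; they are not taken from the CMS data.
rows = [
    ("1000000001", "99213", 100, 50.00),
    ("1000000001", "92014", 10, 80.00),
    ("1000000002", "99213", 20, 50.00),
]

def total_payments(rows):
    """Estimate total reimbursement per provider: sum of count * average payment."""
    totals = defaultdict(float)
    for npi, _hcpcs, count, avg_payment in rows:
        totals[npi] += count * avg_payment
    return dict(totals)

totals = total_payments(rows)
```

Because the file reports only an average payment per service, the result is an estimate of each provider’s total rather than an exact sum of individual claims.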
About the Referral Patterns Data
We will be looking at a dataset the CMS says represents "referral patterns" among Medicare providers, specifically the data for 2012-2013 (all of calendar year 2012 and part of 2013). Referral relationships among providers are inferred based on individual patients seeing multiple providers within a given timeframe. For our purposes here we are using the 30-day interval data, meaning that if a patient sees provider A for some service and then provider B within 30 days, a referral from provider A to B is inferred.
Note that these are merely inferred referral relationships–there is no direct evidence presented in the data to indicate any actual referrals took place. These inferences are particularly weak in the case where a patient receives services from multiple providers, interleaved over time. For instance, consider a primary care physician who performs an initial consultation and orders diagnostic lab work for a patient, then sees the patient again for a followup. While an actual referral of sorts is present from the physician to the lab, the data will also include an inferred referral from the lab back to the physician. Alternatively, a patient may see two providers for completely unrelated issues during a 30-day timeframe, which also results in an inferred referral.
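To make the inference rule and its weakness concrete, here is a small Python sketch. It assumes a simple rule (any two visits by the same patient to different providers within the window count as a referral from the earlier provider to the later one), which is my reading of the description above and not necessarily how the CMS computes the dataset. The visit records are invented.

```python
from collections import Counter
from datetime import date, timedelta

def infer_referrals(visits, window_days=30):
    """visits: list of (patient_id, provider, visit_date).

    Returns a Counter mapping (from_provider, to_provider) to the number
    of patients who saw the second provider within the window after the first.
    """
    window = timedelta(days=window_days)
    by_patient = {}
    for patient, provider, day in visits:
        by_patient.setdefault(patient, []).append((day, provider))

    edges = Counter()
    for events in by_patient.values():
        events.sort()  # chronological order
        pairs = set()
        for i, (day_a, prov_a) in enumerate(events):
            for day_b, prov_b in events[i + 1:]:
                if day_b - day_a > window:
                    break  # later visits fall outside the window
                if prov_a != prov_b:
                    pairs.add((prov_a, prov_b))
        for pair in pairs:
            edges[pair] += 1  # count each patient once per directed pair
    return edges

# The physician -> lab -> physician followup scenario described above.
visits = [
    ("patient-1", "physician", date(2012, 3, 1)),
    ("patient-1", "lab", date(2012, 3, 3)),
    ("patient-1", "physician", date(2012, 3, 10)),
]
edges = infer_referrals(visits)
```

Under this rule the followup visit produces a lab-to-physician edge alongside the genuine physician-to-lab one, which is exactly the kind of spurious inference described above.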
Due to the weak nature of these inferred referrals, I like to call the referral patterns data a "co-occurrence graph" instead. This is somewhat analogous to a co-occurrence network in computational linguistics, which is a model of how words or concepts co-occur in text. While not all of the inferred relationships in the co-occurrence graph represent actual referrals, we can assume some relationship exists, however weak or indirect, in order to examine the neighborhood or sphere of influence for providers.
For the purpose of this post we’ll be examining only the data covering physicians, i.e. individual providers with an M.D. or D.O. degree. Payment and co-occurrence data for all other providers are excluded from the analysis and visualization below.
Here are the five top-grossing physicians for 2012 in terms of Medicare reimbursements.
Those are some truly astonishing numbers! Of course we must bear in mind that these total payment amounts reflect only gross reimbursement without any indication of associated overhead such as support staff and equipment or materials cost. It is also likely in some cases that subordinates submit Medicare claims using the supervising physician’s NPI, which would inflate the apparent income for that individual physician. Nonetheless, the billing practices of the top two physicians have drawn scrutiny from federal investigators in recent years.
Visualizing Co-occurrence Neighborhoods
Enough with the preamble–let’s see some visualizations! The following pictures show the immediate co-occurrence graph neighborhoods for the top-grossing physicians. That is, each node in these graphs is directly connected to the center node of interest according to the inferred referrals in the original CMS dataset. Additionally, many of the neighboring nodes are interconnected among themselves. Nodes in the graphs are sized according to total Medicare payment amounts, and they are color-coded according to specialty.
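For readers who want to reproduce this kind of neighborhood, here is a minimal Python sketch of extracting it from an edge list. It treats the inferred referrals as undirected for the purpose of finding neighbors, which is an assumption on my part, and the node names are placeholders.

```python
def neighborhood(edges, center):
    """Return the center node plus its direct neighbors, and the edges among them."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    # The center and every node directly connected to it.
    nodes = adjacency.get(center, set()) | {center}
    # Keep edges whose endpoints are both inside the neighborhood, so
    # connections among the neighbors themselves are preserved.
    kept = {(a, b) for a, b in edges if a in nodes and b in nodes}
    return nodes, kept

# Placeholder graph: D and E are not directly connected to A.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]
nodes, kept = neighborhood(edges, "A")
```

Here the B-C edge is kept because both endpoints are neighbors of A, while D and E drop out entirely.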
While we will talk about geographic locations for physicians in these graphs, keep in mind that geolocation is not explicitly modeled here. Rather, all of the following pictures were generated using a force-directed graph drawing algorithm. While many variations of such algorithms are available, in general they share a common trait: nodes that are highly interconnected tend to attract each other while nodes that are not interconnected are repelled. This tends to group node cliques or clusters in the final layout.
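To show the attract-and-repel idea in code, here is a simplified sketch in the spirit of the Fruchterman-Reingold algorithm. It is not the algorithm or tool used to produce the pictures in this post; the constants (ideal distance, starting temperature, iteration count) and the toy graph are arbitrary choices for illustration.

```python
import math
import random

def force_layout(nodes, edges, iterations=200, seed=0):
    """Return {node: [x, y]} from a simplified spring-style layout."""
    rng = random.Random(seed)  # fixed seed so the layout is repeatable
    pos = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in nodes}
    k = 1.0 / math.sqrt(len(nodes))  # preferred distance between neighbors

    for step in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}

        # Every pair of nodes repels, more strongly when close together.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = max(math.hypot(dx, dy), 1e-9)
                push = k * k / dist
                disp[a][0] += dx / dist * push
                disp[a][1] += dy / dist * push
                disp[b][0] -= dx / dist * push
                disp[b][1] -= dy / dist * push

        # Connected nodes attract, more strongly when far apart.
        for a, b in edges:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            dist = max(math.hypot(dx, dy), 1e-9)
            pull = dist * dist / k
            disp[a][0] -= dx / dist * pull
            disp[a][1] -= dy / dist * pull
            disp[b][0] += dx / dist * pull
            disp[b][1] += dy / dist * pull

        # Cap each move by a "temperature" that cools toward zero.
        temperature = 0.1 * (1 - step / iterations)
        for n in nodes:
            length = max(math.hypot(disp[n][0], disp[n][1]), 1e-9)
            move = min(length, temperature)
            pos[n][0] += disp[n][0] / length * move
            pos[n][1] += disp[n][1] / length * move
    return pos

def mean_distance(pos, pairs):
    """Average Euclidean distance over the given node pairs."""
    return sum(math.dist(pos[a], pos[b]) for a, b in pairs) / len(pairs)

# Toy graph: two fully connected groups joined by a single bridge edge.
cluster_a = ["a1", "a2", "a3", "a4"]
cluster_b = ["b1", "b2", "b3", "b4"]
nodes = cluster_a + cluster_b
edges = [(x, y) for group in (cluster_a, cluster_b)
         for i, x in enumerate(group) for y in group[i + 1:]]
edges.append(("a1", "b1"))

pos = force_layout(nodes, edges)
within = mean_distance(pos, edges[:-1])
between = mean_distance(pos, [(a, b) for a in cluster_a for b in cluster_b])
```

Comparing `within` against `between` shows the grouping effect: nodes in the same tightly connected group should settle closer to each other than to nodes in the other group, even though no location information was supplied.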
By far the highest-grossing Medicare physician, ophthalmologist Dr. Melgen has a comparatively small co-occurrence neighborhood with only 113 immediate neighbors. Most of his neighbors are diagnostic radiologists. You can see that his neighborhood is segmented into two nearly distinct clusters of physicians. Each cluster is tightly connected within itself but very loosely connected to the other. He also completely dominates his neighboring physicians in terms of Medicare reimbursement amount, represented by node size.
The neighborhood for cardiologist Dr. Qamar is not so neatly separable and represents more co-occurrence relationships, totalling 404 immediate neighbors. While he also clearly dominates his neighbors in Medicare money, the difference is not so stark as with Dr. Melgen. His neighborhood is also more varied in terms of specialties, but it lacks the same sort of distinct clustering of neighbors.
Last, and in my opinion most interesting, is the co-occurrence neighborhood for Dr. McGinnis, a pathologist. With 1,750 immediate neighbors, Dr. McGinnis’ neighborhood is the largest of the top three physicians, and his neighbors cover a wide spectrum of specialties. There are more than a dozen distinct neighbor clusters in this graph.
Stay tuned for tomorrow’s post covering a deeper dive into a few of the distinct clusters in Dr. McGinnis’ co-occurrence neighborhood.