As a result of Freedom of Information Act requests, the Centers for Medicare and Medicaid Services (CMS) has released an unprecedented amount of data on healthcare providers participating in Medicare. An earlier blog post went into more detail about what this data contains and explored the co-occurrence neighborhoods of the five top-grossing physicians. This post also looks at the top-grossing physicians and the co-occurrence data, but explores them in the context of geographical relationships across the United States. Part 1 introduces the data, explains why we use data visualization to analyze it, and explores trends in the payment data by mapping the highest-paid physicians across the United States.
What is Co-occurrence Data
Co-occurrence is the term we use for what are often called "referral patterns" among Medicare providers. Because these relationships are inferred merely from individual patients seeing multiple providers within a given timeframe, the word “referral” overstates what is actually a weak relationship.
Concretely, we use the 30-day interval data here: if a patient sees provider A for some service and then sees provider B within 30 days, a referral from provider A to provider B is inferred.
Note that these are merely inferred referral relationships; nothing in the data directly indicates that any actual referral took place. The inferences are particularly weak when a patient receives services from multiple providers, interleaved over time. For instance, consider a primary care physician who performs an initial consultation and orders diagnostic lab work for a patient, followed by a follow-up visit with the same physician. While an actual referral of sorts exists from the physician to the lab, the data will also include an inferred referral from the lab back to the physician. Alternatively, a patient may see two providers for completely unrelated issues within a 30-day timeframe, which also results in an inferred referral.
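The inference rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method CMS actually used: the `(patient, provider, date)` record layout, the function name `infer_cooccurrences`, the provider labels, and the choice to count every qualifying pair of visits on strictly different days are all hypothetical.

```python
from collections import Counter
from datetime import date, timedelta


def infer_cooccurrences(visits, window_days=30):
    """Count inferred A -> B "referrals" from (patient, provider, date) records.

    Assumption: an edge A -> B is counted whenever the same patient sees
    provider A and then a different provider B on a later day that is at
    most `window_days` afterwards.
    """
    by_patient = {}
    for patient, provider, day in visits:
        by_patient.setdefault(patient, []).append((day, provider))

    window = timedelta(days=window_days)
    edges = Counter()
    for history in by_patient.values():
        history.sort()  # chronological order within one patient
        for i, (day_a, prov_a) in enumerate(history):
            for day_b, prov_b in history[i + 1:]:
                if day_b - day_a > window:
                    break  # later visits are even further away
                if prov_a != prov_b and day_b > day_a:
                    edges[(prov_a, prov_b)] += 1
    return edges


# Made-up visits mirroring the scenarios in the text.
example_visits = [
    # Consultation, lab work, follow-up: yields edges in both directions.
    ("patient-1", "primary-care", date(2012, 1, 2)),
    ("patient-1", "lab", date(2012, 1, 5)),
    ("patient-1", "primary-care", date(2012, 1, 12)),
    # Two unrelated visits inside the window still produce an edge.
    ("patient-2", "dermatologist", date(2012, 3, 1)),
    ("patient-2", "cardiologist", date(2012, 3, 20)),
    # Visits more than 30 days apart produce nothing.
    ("patient-3", "primary-care", date(2012, 1, 1)),
    ("patient-3", "cardiologist", date(2012, 3, 1)),
]

example_edges = infer_cooccurrences(example_visits)
for (source, target), count in sorted(example_edges.items()):
    print(f"{source} -> {target}: {count}")
```

The first patient shows the weakness discussed above: the follow-up visit creates a lab-to-physician edge even though no such referral happened.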
Why use Data Visualization
Our visual system takes in a huge amount of information every second, which makes it extremely well suited to spotting patterns. A major part of data science is being able to conceptualize and understand the data you are working with, and data visualization is an incredibly useful tool for doing so. A common example of just how useful it can be is Anscombe’s Quartet, devised by statistician Francis Anscombe. It consists of four data sets that share nearly identical summary statistics: the means and variances of x and y, the correlation between them, and the fitted linear regression line. Looking at the data in a table, it is not obvious that the data sets differ in any major way. When the data is graphed, however, it becomes quite clear that the differences are vast. This post applies data visualization to the CMS co-occurrence data by mapping it across the United States.
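To make the Anscombe example concrete, here is a small Python sketch that computes the shared summary statistics for the four data sets. The values are the published quartet; the helper name `summarize` and the dictionary layout are just illustrative choices. The code only confirms that the numbers agree; seeing how different the data sets really are still requires plotting them.

```python
from statistics import mean, variance

# Anscombe's quartet: the first three sets share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                  6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                   6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
            5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return means, sample variances, Pearson correlation, and the
    least-squares regression line y = intercept + slope * x."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    slope = sxy / sxx
    return {
        "mean_x": mx,
        "mean_y": my,
        "var_x": variance(xs),
        "var_y": variance(ys),
        "corr": sxy / (sxx * syy) ** 0.5,
        "slope": slope,
        "intercept": my - slope * mx,
    }


summaries = {name: summarize(xs, ys) for name, (xs, ys) in quartet.items()}
for name, stats in summaries.items():
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

Rounded to two decimals, the four rows are almost indistinguishable, which is exactly why a table alone hides the differences.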
For this post we examine only the data covering physicians, i.e., individual providers with an M.D. or D.O. degree. Payment and co-occurrence data for all other providers are excluded from the analysis and visualization below. “Top-grossing” here refers only to 2012 Medicare reimbursements.
The earlier post lists the five top-grossing physicians for 2012 (the most recent data available) in terms of Medicare reimbursements (copied here for convenience).
Of course, the list goes far beyond just these five doctors, so let’s take a look at the 50,000 top-grossing providers to see how they are distributed geographically across the U.S.
Here, the highest-grossing physicians are represented by large orange vertices, while the lowest-grossing physicians are shown as small purple vertices. The blue, medium-sized vertices represent physicians who fall somewhere in the middle. As you can see, our good friend Salomon Melgen ($20,827,328) is located in West Palm Beach, Florida, and his runner-up buddy Asad Qamar ($18,154,753) is close by in Ocala, Florida. Melgen was sent to prison earlier this year in a Medicare fraud case, after investigators started looking into this very dataset for physicians who were billing Medicare more than anyone else. Qamar was also accused of fraud, and the investigation revealed that he had even been “using his children as political pawns” by making large donations in their names.
Michael McGinnis ($12,577,006), shown above by the big orange vertex in New Jersey, runs three practices there. He states that he is high on the list because nearly 30 pathologists who work at those three practices were using his NPI (National Provider Identifier) to bill. This could help explain why his co-occurrence neighborhood graph contains many distinct clusters: different pathologists billing under his NPI across three practices may well have referred patients to different networks of other physicians.
Looking at this map, it is clear that higher-grossing physicians are more likely to be found in some regions than in others. Major cities are a common theme for the highest-grossing physicians across the United States, and Florida stands out, perhaps because of its large share of retired residents and the two fraud cases already mentioned.
Now that we have discussed the vertices of the map in detail, you might be wondering what the edges represent. The edges connecting the physician vertices represent the co-occurrence relationships between those physicians. Part 2 of this post explores what these co-occurrence relationships mean, while keeping the insights from the payment data in view.