In Part 1 of this 2-part post, we introduced a dataset from the Centers for Medicare and Medicaid Services (CMS), defined co-occurrence relationships between physicians based on that data, and used data visualization to examine the CMS payment data geographically, uncovering some trends related to the highest grossing physicians. Now in Part 2, to further share the benefits of visualizing and analyzing data in graph form, we’ll delve into the co-occurrence relationships between these physicians and discuss what can be inferred.
Co-Occurrence Referral Relationships
In order to get a better idea of the co-occurrence relationships (defined in Part 1), let’s keep the vertex size the same as the image in Part 1 (based on total payments), but change the color of the vertices to represent the number of co-occurrence referral patterns (edges) between various physicians (vertices). To reiterate, an edge drawn between two physician vertices represents a patient that has seen both of those physicians within a 30-day time frame.
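To make the edge definition concrete, here is a minimal Python sketch of how co-occurrence edges and per-physician edge counts could be derived from raw visit records. The record layout, the identifiers, and the choice to treat exactly 30 days as inside the window are all assumptions for illustration; the CMS data may well arrive already aggregated, and this is not the pipeline used to produce the graphs in this post.

```python
from collections import defaultdict
from datetime import date
from itertools import combinations

# Hypothetical visit records: (patient_id, physician_id, visit_date).
visits = [
    ("p1", "dr_a", date(2014, 1, 2)),
    ("p1", "dr_b", date(2014, 1, 20)),  # 18 days after dr_a: co-occurrence
    ("p1", "dr_c", date(2014, 3, 15)),  # over 30 days from both: no edge
    ("p2", "dr_a", date(2014, 2, 1)),
    ("p2", "dr_b", date(2014, 2, 10)),  # second shared patient for dr_a/dr_b
    ("p3", "dr_b", date(2014, 5, 1)),
    ("p3", "dr_c", date(2014, 5, 30)),  # 29 days apart: co-occurrence
]

WINDOW_DAYS = 30  # assumption: a gap of exactly 30 days still counts


def co_occurrence_edges(visits, window_days=WINDOW_DAYS):
    """Return {(physician_u, physician_v): number of shared patients}."""
    by_patient = defaultdict(list)
    for patient, physician, day in visits:
        by_patient[patient].append((physician, day))

    weights = defaultdict(int)
    for patient_visits in by_patient.values():
        # Collect each physician pair at most once per patient.
        pairs = set()
        for (doc_u, day_u), (doc_v, day_v) in combinations(patient_visits, 2):
            if doc_u != doc_v and abs((day_u - day_v).days) <= window_days:
                pairs.add(tuple(sorted((doc_u, doc_v))))
        for pair in pairs:
            weights[pair] += 1
    return dict(weights)


def edge_counts(edges):
    """Number of distinct co-occurrence partners (edges) per physician."""
    counts = defaultdict(int)
    for u, v in edges:
        counts[u] += 1
        counts[v] += 1
    return dict(counts)


edges = co_occurrence_edges(visits)
degrees = edge_counts(edges)
```

In this toy data, `degrees` is the quantity that would drive vertex color, while the shared-patient count stored on each edge plays the role of edge weight.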
Here, the orange color now represents physicians with more co-occurrence referral patterns (edges), the purple color represents physicians with fewer, and the blue represents something in between the orange and purple. From a quick look at the graph, it becomes clear that while some higher grossing physicians appear to have a high rate of co-occurrence, not all high grossing physicians have the highest co-occurrence, as noted by some smaller orange vertices scattered throughout the graph. However, very few of the large vertices are on the purple side of the scale here, suggesting that very few of the top grossing physicians have a low co-occurrence. This indicates that, not surprisingly, having a high co-occurrence rate (perhaps even high referral rate) is an important and common feature among high grossing physicians.
Looking at the colors of the edges, it appears that some of the strongest co-occurrence patterns include physicians from Florida. This can be seen by looking at the bright orange edges stretching from there to a region as far west as Texas and other regions as far north as New England. It is also interesting to note that the two largest vertices in Florida, which were noted in Part 1 to be physicians associated with fraud, seem to have a lower co-occurrence. This may suggest that high grossing physicians with low co-occurrence (large darker colored vertices) are more likely to be associated with fraud.
In layperson's terms, using color, the above graph showed us which physicians referred more patients to each other, and, noting vertex size, which physicians made more money. To gain further insight into just the co-occurrence referral patterns piece of the puzzle, let's resize the vertices by number of edges to emphasize and match the color scale. In addition, we will decrease the sensitivity of the color scale on the top end to yield a higher percentage of orange vertices.
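As a rough illustration of what decreasing the sensitivity of the color scale on the top end can mean, the Python sketch below clamps edge counts to a cap before mapping them onto the purple-to-orange range. The function name, the sample counts, and the cap values are hypothetical, and the visualization tool behind these graphs may implement its scale differently.

```python
def color_position(edge_count, low, high):
    """Map an edge count to [0, 1]: 0 is the purple end, 1 is the orange end.

    Counts at or above `high` all saturate to 1.0 (fully orange).
    """
    if high <= low:
        raise ValueError("high must be greater than low")
    clamped = min(max(edge_count, low), high)
    return (clamped - low) / (high - low)


# Hypothetical edge counts for five physicians.
sample_counts = [2, 10, 40, 80, 200]

# Scale stretched to the maximum count: only the top physician is fully orange.
full_scale = [color_position(c, 0, 200) for c in sample_counts]

# Lower cap on the top end: every physician with 80 or more edges is orange.
capped_scale = [color_position(c, 0, 80) for c in sample_counts]
```

Lowering the cap pushes more vertices to the orange end, at the cost of no longer distinguishing among the most connected physicians.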
Looking at this graph, it appears there is a much higher co-occurrence rate, suggesting many more referrals between physicians, on the eastern side of the country. Just looking at vertex density alone in the graphs posted so far, there is clearly a higher percentage of our 50,000 top-grossing physicians in the east as well. Perhaps higher physician density could be leading to higher co-occurrence rates organically, without one physician actually referring a patient to another physician. This is one caveat of the “referral” data provided by the CMS.
Larger orange vertices seem to be prominent in major cities, just like in the graph that sorted size and color by total payments; however, unlike in that total payments graph, some larger orange vertices are located in smaller cities as well. While looking at these graphs, feel free to draw your own conclusions and make additional observations we did not directly discuss in this post. Also note the importance of visualizing data in different and unique ways, and the benefits that can be achieved from viewing data in a graph as opposed to merely looking at it in a table or spreadsheet.
Looking Deeper: Community Detection
In addition to visualizing the raw CMS referral (co-occurrence) data, performing data transformations or running the data through various algorithms and/or classifiers can also be useful. We will be using the Louvain community detection (aka clustering) algorithm, also called the Louvain modularity method, to detect communities, as the name implies. These communities will represent groups of physicians which have seen a high number of the same patients within a 30-day time frame. (For those without a data science background: clustering and community detection are useful ways to help us find communities of physicians that tend to refer patients to one another.)
The Louvain modularity method optimizes modularity, which in this case measures the density of links inside communities compared to links between communities. It does this by iteratively grouping vertices together into larger and larger groups of vertices with strong connections (highly weighted edges) until the modularity can no longer be increased, at which point the final clusters (communities) have been found. The Louvain modularity method is designed for networks, aka graphs, which makes it a nice fit for our purposes, since we are using a graph to look for the relationship between doctors and their referral networks.
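To give a feel for the quantity being optimized, here is a small Python sketch that computes the standard weighted modularity score for a given assignment of vertices to communities. It only scores a partition; it does not perform the Louvain method's iterative grouping. The toy graph (two triangles joined by a single edge) and all names are made up for illustration.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Weighted modularity Q of a partition of an undirected graph.

    edges: {(u, v): weight}, each undirected edge listed once.
    community_of: {vertex: community label}.
    """
    total_weight = sum(edges.values())  # m, the total edge weight
    internal = defaultdict(float)       # edge weight inside each community
    strength = defaultdict(float)       # summed weighted degree per community
    for (u, v), w in edges.items():
        strength[community_of[u]] += w
        strength[community_of[v]] += w
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += w
    # Q = sum over communities of (internal share) - (expected share)
    return sum(
        internal[c] / total_weight - (strength[c] / (2 * total_weight)) ** 2
        for c in strength
    )


# Toy graph: two triangles (a, b, c) and (d, e, f) joined by the edge c-d.
toy_edges = {
    ("a", "b"): 1, ("b", "c"): 1, ("a", "c"): 1,
    ("d", "e"): 1, ("e", "f"): 1, ("d", "f"): 1,
    ("c", "d"): 1,
}

two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {v: 0 for v in "abcdef"}

q_split = modularity(toy_edges, two_groups)  # 5/14, roughly 0.357
q_single = modularity(toy_edges, one_group)  # 0.0
```

Splitting the toy graph into its two triangles scores higher than lumping everything together, which is the kind of improvement the Louvain method searches for at each grouping step.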
The above graph was generated by performing the Louvain modularity method on the CMS co-occurrence data with a modularity of 0.948 (perfect would be 1.0), and then coloring the resulting communities for visualization. The size of the vertices here corresponds to the physician's payment information with larger vertices being higher grossing physicians.
With large scale graphs it can be difficult to tell just how many edges are between groups of vertices that are close together. The nice thing about using an algorithm to cluster groups of vertices is that it takes guessing out of the equation and simply displays something that is easier to visually process. For example, looking at any of the graphs shown before clustering, we might not have been able to tell that there were strong connections between distinct communities along the east coast of the United States.
These communities of providers can be thought of as large scale referral networks, assuming that whenever a patient was seen by two physicians within a 30-day time frame, one physician referred the patient to the other. Although a referral network spanning nearly the entire state of Texas seems farfetched to a certain extent, there is likely some truth behind the numbers. For example, across the Gulf of Mexico region, the community boundaries follow the state lines between Texas, Louisiana, Mississippi, Alabama, and Florida quite clearly, suggesting physicians in the south tend to refer patients within state.
For the sake of completeness, let’s decrease the size of the clusters to something that more closely resembles actual referral regions. This can be achieved by decreasing the resolution parameter of the Louvain modularity method.
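The effect of a resolution parameter can be sketched in Python with one common generalization of modularity, in which the expected-share term is multiplied by a factor `gamma`. This formulation is an assumption and not necessarily the one behind the graphs in this post. Conventions also differ between tools: in the form sketched here, a larger `gamma` favors smaller communities, whereas tools in which decreasing the parameter shrinks communities, as described above, use the inverse convention. The toy graph (four triangles connected in a ring) is made up for illustration.

```python
from collections import defaultdict


def modularity_with_resolution(edges, community_of, gamma=1.0):
    """Generalized weighted modularity; gamma scales the expected-share term."""
    total_weight = sum(edges.values())
    internal = defaultdict(float)   # edge weight inside each community
    strength = defaultdict(float)   # summed weighted degree per community
    for (u, v), w in edges.items():
        strength[community_of[u]] += w
        strength[community_of[v]] += w
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += w
    return sum(
        internal[c] / total_weight
        - gamma * (strength[c] / (2 * total_weight)) ** 2
        for c in strength
    )


# Toy graph: triangles t0..t3, each with vertices (t, 0), (t, 1), (t, 2),
# joined in a ring by one edge between consecutive triangles.
ring_edges = {}
for t in range(4):
    ring_edges[((t, 0), (t, 1))] = 1
    ring_edges[((t, 1), (t, 2))] = 1
    ring_edges[((t, 0), (t, 2))] = 1
    ring_edges[((t, 2), ((t + 1) % 4, 0))] = 1

vertices = [(t, i) for t in range(4) for i in range(3)]
fine = {v: v[0] for v in vertices}         # four communities, one per triangle
coarse = {v: v[0] // 2 for v in vertices}  # two communities of two triangles

# At gamma = 1.0 the finer partition scores higher (0.5 vs 0.375).
fine_wins_at_1 = (
    modularity_with_resolution(ring_edges, fine, 1.0)
    > modularity_with_resolution(ring_edges, coarse, 1.0)
)
# At gamma = 0.4 the coarser partition scores higher (0.675 vs 0.65).
coarse_wins_at_04 = (
    modularity_with_resolution(ring_edges, coarse, 0.4)
    > modularity_with_resolution(ring_edges, fine, 0.4)
)
```

Because the parameter changes which partition scores best, modularity values obtained at different resolution settings are not measured on quite the same footing.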
This time, we were only able to achieve a modularity of 0.903, which is slightly lower than before, but still good enough for our purposes. With the smaller communities, it is now clear just how diverse different regions of the country are. For example, you might expect to see many smaller clusters of referral communities in the New England area due to its dense population across many states, which is somewhat evident here. However, Florida seems to have even more communities than New England, which again could be because Florida is such a popular retirement destination.
Conclusions and Tooling
There are many more observations and speculations that can be drawn from the visualizations in both this blog post (Parts 1 and 2) and the CMS dataset as a whole, but hopefully we have provided you with an informative and interesting peek into both.