In the beginning there were four bags of data: members, claim lines, providers, and employers. The data format: IBM EBCDIC. Even worse, within each entity type we had multiple files from disparate sources with no naming conventions. This is the day-to-day of modern healthcare ETL.
Over on PokitDok's technical blog, the data science team steps through using Spark to handle the extract, transform, and load (ETL) process of converting legacy healthcare data into modern, analyzable formats. Check out the code we provided in our post and feel our pain.
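For a taste of the "extract" step, here is a minimal sketch in plain Python, not the Spark code from the post. It assumes the files use the `cp037` EBCDIC code page and a fixed-width record layout; the field names, offsets, and record length are hypothetical, and real files would need their own layout. It handles text fields only, so packed-decimal fields would need separate treatment.

```python
# Hypothetical fixed-width layout: (field name, start offset, length in bytes).
# Real layouts come from the source system's record definitions.
MEMBER_LAYOUT = [
    ("member_id", 0, 8),
    ("last_name", 8, 12),
    ("state", 20, 2),
]
RECORD_LENGTH = 22  # assumed total record size in bytes


def split_records(blob: bytes, record_length: int = RECORD_LENGTH):
    """Yield complete fixed-length records from a raw byte string."""
    for offset in range(0, len(blob) - record_length + 1, record_length):
        yield blob[offset:offset + record_length]


def decode_record(raw: bytes, layout=MEMBER_LAYOUT, codepage: str = "cp037") -> dict:
    """Decode one EBCDIC record into a dict of stripped text fields.

    cp037 is a single-byte code page, so byte offsets and character
    offsets line up after decoding.
    """
    text = raw.decode(codepage)
    return {name: text[start:start + length].strip()
            for name, start, length in layout}


def decode_file(blob: bytes) -> list:
    """Decode every complete record in a blob of EBCDIC bytes."""
    return [decode_record(rec) for rec in split_records(blob)]
```

The same per-record function could be mapped over records in a distributed job; how the post itself does this in Spark is best read there.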
Tags: Dev, Enterprise