On Full Metal Health: Dirty data ETL in healthcare

By PokitDok Team

In the beginning there were four bags of data: members, claim lines, providers, and employers. The data format was IBM EBCDIC. Even worse, each entity type arrived as multiple files from disparate sources, with no naming conventions. This is the day-to-day of modern healthcare ETL.

Over on PokitDok's technical blog, the data science team steps through using Spark to handle the extract, transform, and load (ETL) process of converting legacy healthcare data into modern, analyzable formats. Check out the code we provided in our post and feel our pain.
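To give a flavor of the problem, here is a minimal sketch of decoding one fixed-width EBCDIC record in plain Python. It is not the Spark code from the post. The field layout (`MEMBER_LAYOUT`), the record length, and the choice of code page `cp037` are all assumptions made up for illustration; real files may use a different EBCDIC code page and often contain packed-decimal fields that cannot be decoded as text.

```python
# Hypothetical fixed-width layout: (field name, start offset, length in bytes).
# These names and offsets are invented for illustration only.
MEMBER_LAYOUT = [
    ("member_id", 0, 8),
    ("last_name", 8, 12),
    ("state", 20, 2),
]
RECORD_LENGTH = 22  # assumed total bytes per record


def split_records(blob, record_length=RECORD_LENGTH):
    """Yield consecutive fixed-length byte chunks from a raw file blob."""
    for offset in range(0, len(blob), record_length):
        yield blob[offset:offset + record_length]


def decode_record(raw, layout=MEMBER_LAYOUT, encoding="cp037"):
    """Decode one EBCDIC record into a dict of whitespace-stripped strings.

    cp037 is a single-byte code page, so byte offsets and character
    offsets line up after decoding.
    """
    if len(raw) != RECORD_LENGTH:
        raise ValueError(f"expected {RECORD_LENGTH} bytes, got {len(raw)}")
    text = raw.decode(encoding)
    return {
        name: text[start:start + length].strip()
        for name, start, length in layout
    }


if __name__ == "__main__":
    # Build a sample record by encoding ASCII text to EBCDIC.
    sample = ("00012345" + "SMITH".ljust(12) + "TX").encode("cp037")
    for chunk in split_records(sample):
        print(decode_record(chunk))
```

In a Spark job, a function like `decode_record` could be mapped over fixed-length binary records, but how the post actually structures that step is best read from the post itself.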

The opinions expressed in this blog are those of the authors and not of PokitDok. The posts on this blog are for information only; they are not intended to substitute for a doctor-patient or other healthcare professional-patient relationship, nor do they constitute medical or healthcare advice.

  Tags: Dev, Enterprise
