Draining the Data Lake

Data is King—Once it is Cleaned, Structured and Fully Formed, that is

Blogger: Matthew Leick-Macari, Data Engineer | Omaha, NE, USA
July 21, 2016

In today’s fast-paced, data-driven world, having answers at your fingertips is a priority. Increasingly, our clients need data to predict what will happen in the future and to make data-driven planning and design decisions. Simply put: Data is king. You may have heard about predictive analytics and the power of information to transform our lives, but what about the data itself? Data doesn’t emerge from a vacuum fully formed and ready to be analyzed. Often, data must be cleaned, structured and transformed many times before it is ready for analysis.

Our Predictive Analytics team’s mission is to help inform the decision-making process for HDR and our clients. Before we can get to work, we need data. As a data engineer on the Predictive Analytics team, I think it’s important to explain what we do and to show how the data we wrangle and distill helps people make decisions and ensures we advise clients with the best possible information.

Our approach is simple: We take distinct data sources in many different forms, from flat files (data files that contain records with no structured relationships) to web APIs (application programming interfaces), and merge them into our ever-growing data warehouse. In a sense, we drain the data lake a little bit at a time to build and maintain a data warehouse that we use to improve our clients’ business outcomes. The details of what we do are a little more complicated, because the data comes to us in different shapes and formats. Our proprietary toolkit of data processing tools is always expanding to help manage and manipulate data.
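
To make the idea concrete, here is a minimal sketch of merging a flat file and a web API payload into one warehouse table. It is an illustration only, not our proprietary toolkit: the table name `facility`, the field names, and the sample records are all hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import csv
import io
import json
import sqlite3

# Hypothetical flat-file extract (CSV text stands in for a file on disk).
FLAT_FILE = """facility_id,name,beds
H001,North Clinic,25
H002,River Hospital,140
"""

# Hypothetical web API payload (a JSON string stands in for an HTTP response).
API_RESPONSE = '[{"id": "H003", "facilityName": "Lakeside Medical", "bedCount": 80}]'


def rows_from_flat_file(text):
    """Yield (facility_id, name, beds, source) tuples from CSV text."""
    for record in csv.DictReader(io.StringIO(text)):
        yield record["facility_id"], record["name"], int(record["beds"]), "flat_file"


def rows_from_api(payload):
    """Yield the same tuple shape from JSON that uses different field names."""
    for item in json.loads(payload):
        yield item["id"], item["facilityName"], int(item["bedCount"]), "web_api"


def load_warehouse(conn, rows):
    """Insert normalized rows into one shared table, creating it if needed."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS facility ("
        "facility_id TEXT PRIMARY KEY, name TEXT, beds INTEGER, source TEXT)"
    )
    conn.executemany("INSERT OR REPLACE INTO facility VALUES (?, ?, ?, ?)", rows)
    conn.commit()


# An in-memory database plays the role of the warehouse in this sketch.
conn = sqlite3.connect(":memory:")
load_warehouse(conn, rows_from_flat_file(FLAT_FILE))
load_warehouse(conn, rows_from_api(API_RESPONSE))

facilities = conn.execute(
    "SELECT facility_id, beds, source FROM facility ORDER BY facility_id"
).fetchall()
```

Each source gets its own small reader that converts records to a common shape, so adding a new format means writing one more reader rather than changing the loading step.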

On most projects, our primary data source is data provided by the client; however, we rely on other sources to help inform our analysis. For instance, for our healthcare projects, we’ve built a vast repository of medical coding standards to help make sense of healthcare data and the different medical coding nomenclatures. Being able to crosswalk between different medical coding and reporting systems is a key component of our analysis and a game changer when we’re presenting projections and trends to clients. This bridge also allows us to be flexible in using any data the client provides, and to group procedure and diagnosis information into meaningful categories for clients' planning and decision making.
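
A crosswalk can be pictured as a lookup table from each coding system's codes to a shared category. The sketch below assumes made-up system names (`system_a`, `system_b`), placeholder codes, and illustrative categories; none of them are real medical codes or our actual repository.

```python
from collections import Counter

# Hypothetical crosswalk: (coding_system, code) -> shared planning category.
# The systems and codes are placeholders, not real medical codes.
CROSSWALK = {
    ("system_a", "A100"): "Cardiology",
    ("system_b", "B-17"): "Cardiology",
    ("system_a", "A200"): "Orthopedics",
    ("system_b", "B-42"): "Orthopedics",
}


def categorize(records, crosswalk, default="Unmapped"):
    """Count records per shared category, whichever system each record uses."""
    counts = Counter()
    for system, code in records:
        # Codes missing from the crosswalk are kept and flagged, not dropped.
        counts[crosswalk.get((system, code), default)] += 1
    return dict(counts)


# Client data that mixes both coding systems, plus one unknown code.
client_records = [
    ("system_a", "A100"),
    ("system_b", "B-17"),
    ("system_b", "B-42"),
    ("system_a", "Z999"),
]
summary = categorize(client_records, CROSSWALK)
```

Counting unmapped codes separately, instead of silently discarding them, shows how much of a client's data the crosswalk does not yet cover.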

Our ever-expanding healthcare data warehouse includes metrics and data points covering all the hospitals in the US. These metrics include the number of ICU beds and general beds that are available in each hospital, the services provided, staffing and square footage. We track head counts for each department and can tie that information to quality metrics. Additionally, we measure where patients are coming from, by ZIP code and by the type of case mix at each hospital, which is a key building block for understanding current and future demand within a market.
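
As a rough illustration of the patient-origin measure, the sketch below computes the share of each hospital's visits that comes from each ZIP code. The hospital identifiers, ZIP placeholders, and visit records are invented for the example, and a real analysis would also break the result down by case mix.

```python
from collections import defaultdict


def patient_origin_shares(visits):
    """Return {hospital: {zip_code: share of that hospital's visits}}."""
    counts = defaultdict(lambda: defaultdict(int))
    for hospital, zip_code in visits:
        counts[hospital][zip_code] += 1

    shares = {}
    for hospital, by_zip in counts.items():
        total = sum(by_zip.values())
        shares[hospital] = {z: n / total for z, n in by_zip.items()}
    return shares


# Hypothetical (hospital, patient ZIP) visit records.
visits = [
    ("H001", "ZIP_A"),
    ("H001", "ZIP_A"),
    ("H001", "ZIP_B"),
    ("H001", "ZIP_A"),
    ("H002", "ZIP_B"),
]
shares = patient_origin_shares(visits)
```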

Gathering data about board-certified practitioners from federal and state registries allows us to build a complete picture of the healthcare provider community and how its members work together over time in a given area. This helps inform the strategic decision-making processes of our clients and community health providers.

In healthcare, as well as in other industries, changes in where and how people live have ramifications for planning and design. Demographic shifts, such as an aging population and increasing urbanization, affect what facilities people need today and in the future. Our models and analytical processes all rely on our large data warehouse of economic, socio-demographic, and health statistics.

On all of our projects, we must take into account space, location and geography. As a result, we have built all of our data infrastructure to support rich geographic information. We combine and layer all of our information by location, using the geospatial component to provide a coherent and impactful picture for our clients.

Wrangling messy data can be overwhelming, but that is what we do. We are data experts, excited to build our data warehouse or tackle data sets of any kind or size. I know the data we work with will constantly grow and expand. We’re equipped to meet the challenge. 

Image credit:  © Shutterstock.com / Doug Lemke