Overview
In this blog post, I provide a worked example demonstrating how to perform an analysis of blanks on a target dataset. When analyzing data, a typical first step is to understand where values are missing. Knowing where the gaps are helps you make more informed decisions about your analysis approach, such as whether to drop, fill in, or flag incomplete records.
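As a rough illustration of what an analysis of blanks produces, here is a minimal plain-Python sketch. It is not how Flow performs the analysis. It assumes records are dictionaries, and it treats `None`, empty strings, whitespace-only strings, and absent keys as blank. The function names `is_blank` and `blank_profile` and the sample data are hypothetical.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]


def is_blank(value: Optional[str]) -> bool:
    """Treat None and empty or whitespace-only strings as missing (an assumption)."""
    return value is None or str(value).strip() == ""


def blank_profile(records: List[Record]) -> Dict[str, Dict[str, float]]:
    """Return, for each column, the count and percentage of blank values."""
    # Collect every column name that appears in any record.
    columns = sorted({key for record in records for key in record})
    total = len(records)
    profile: Dict[str, Dict[str, float]] = {}
    for column in columns:
        # A key missing from a record is counted as blank too.
        blanks = sum(1 for record in records if is_blank(record.get(column)))
        pct = 100.0 * blanks / total if total else 0.0
        profile[column] = {"blank": blanks, "pct_blank": round(pct, 1)}
    return profile


# Hypothetical sample customer records.
customers: List[Record] = [
    {"name": "Ann Lee", "email": "ann@example.com", "phone": ""},
    {"name": "Bo Chen", "email": None, "phone": "555-0100"},
    {"name": "  ", "email": "cara@example.com", "phone": None},
]

if __name__ == "__main__":
    for column, stats in blank_profile(customers).items():
        print(column, stats)
```

Sorting columns by blank percentage then shows which fields need attention first. In this sample, that is `phone`.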
Overview
This blog post demonstrates how to identify and remove duplicate records from a dataset. I provide a worked example that shows how to configure and apply the deduplicate function to some sample customer data. The deduplicate function is a key action that allows workflow developers to build rich data validation and transformation rules.
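The general idea behind deduplication can be sketched in plain Python. This is not Flow's deduplicate function, and its configuration options may differ. The sketch assumes the following:

* Duplicates are defined by a chosen set of key fields.
* Values are compared after lowercasing and collapsing whitespace.
* The first occurrence of each key is kept.

The names `normalize` and `deduplicate` and the sample records are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Record = Dict[str, str]


def normalize(value: str) -> str:
    """Lowercase and collapse internal whitespace so trivial variants compare equal."""
    return " ".join(value.lower().split())


def deduplicate(
    records: Iterable[Record], key_fields: List[str]
) -> Tuple[List[Record], List[Record]]:
    """Split records into (unique, duplicates), keeping the first occurrence of each key."""
    seen = set()
    unique: List[Record] = []
    duplicates: List[Record] = []
    for record in records:
        # Build a comparison key from the normalized key fields.
        key = tuple(normalize(record.get(field, "")) for field in key_fields)
        if key in seen:
            duplicates.append(record)
        else:
            seen.add(key)
            unique.append(record)
    return unique, duplicates


# Hypothetical sample customer data: rows 1 and 2 differ only in case and spacing.
customers: List[Record] = [
    {"id": "1", "name": "Ann Lee", "email": "ann@example.com"},
    {"id": "2", "name": "ann  lee", "email": "ANN@example.com"},
    {"id": "3", "name": "Bo Chen", "email": "bo@example.com"},
]

if __name__ == "__main__":
    unique, duplicates = deduplicate(customers, ["name", "email"])
    print([r["id"] for r in unique], [r["id"] for r in duplicates])
```

The function returns the duplicates rather than discarding them. This lets you review what was removed, which is often useful when deduplication feeds a validation rule.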
Overview
In this article, I provide an introduction to measuring and evaluating data quality using Flow. I briefly discuss data quality dimensions and data quality assessment. Next, I examine how a schema-on-write approach increases the time and cost required to assess data quality, and I briefly discuss schema-on-read technology. I then introduce Flow's "Generic Data" technology as a solution to the shortcomings of both schema-on-write and schema-on-read for data quality work. Finally, I provide a hands-on example of assessing data quality in Flow Analytics using some sample name and address data.
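Data quality dimensions become concrete once each one is expressed as a score. The plain-Python sketch below computes two commonly cited dimensions over name and address records. It does not show how Flow measures them.

* **Completeness** is the percentage of non-blank cells.
* **Validity** is the percentage of present values that match a rule.

The validity rule here is an assumption chosen for illustration: a US ZIP or ZIP+4 postal code format. The function names and sample data are also hypothetical.

```python
import re
from typing import Dict, List, Optional, Pattern

Record = Dict[str, Optional[str]]

# Assumed validity rule: 5-digit US ZIP code, optionally followed by -NNNN.
ZIP_PATTERN: Pattern[str] = re.compile(r"\d{5}(?:-\d{4})?")


def completeness(records: List[Record], fields: List[str]) -> float:
    """Percentage of (record, field) cells that are non-blank."""
    cells = [(record.get(field) or "").strip() for record in records for field in fields]
    if not cells:
        return 0.0
    filled = sum(1 for cell in cells if cell)
    return 100.0 * filled / len(cells)


def validity(records: List[Record], field: str, pattern: Pattern[str]) -> float:
    """Percentage of non-blank values in `field` that fully match `pattern`."""
    values = [(record.get(field) or "").strip() for record in records]
    present = [value for value in values if value]
    if not present:
        return 0.0
    valid = sum(1 for value in present if pattern.fullmatch(value))
    return 100.0 * valid / len(present)


# Hypothetical sample name and address data.
addresses: List[Record] = [
    {"name": "Ann Lee", "street": "1 Main St", "zip": "30301"},
    {"name": "Bo Chen", "street": "", "zip": "3030"},
    {"name": "Cara Diaz", "street": "9 Elm Ave", "zip": "30302-1234"},
    {"name": "", "street": "5 Oak Rd", "zip": None},
]

if __name__ == "__main__":
    print("completeness:", completeness(addresses, ["name", "street", "zip"]))
    print("zip validity:", round(validity(addresses, "zip", ZIP_PATTERN), 1))
```

Validity is computed only over present values, so a blank postal code lowers completeness but not validity. This keeps the two dimensions from double-counting the same defect.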