In this article, I provide an introduction to measuring and evaluating data quality using Flow. I briefly discuss data quality dimensions and data quality assessment. Then I examine how a schema-on-write approach increases the time and cost required to assess data quality along with a brief discussion of schema-on-read technology. I then introduce Flow's "Generic Data" technology as a solution to the deficiencies of schema-on-write and schema-on-read for data quality. Finally, I provide a hands-on working example of doing data quality in Flow Analytics using some sample name and address data.
n this post, we build a reusable eight-step Flow that performs a basic Benford's Analysis on a sample data set. This Flow loads the sample data set then obtains the first digit from each observation, builds a hypercube and uses it to count the first digits, extracts a dataset containing the distribution and, finally, computes the expected distribution and compares it to the observed distribution by taking the difference.