This is the fifth installment in the series 4 Quick Tricks for Data Summary.
In Part 2 we covered how to load our example data collection into Flow. In Part 3 we covered the first of our four techniques: Generating a Descriptive Statistics Set. In Part 4 we covered the second of our four techniques: Creating a Metadata Collection.
In this post we will cover the third of our four techniques: Performing a Fast-Based Correlation Analysis on a Target Data Point.
A common task in data analysis is to discover variables which influence or may influence a target variable of interest.
Often, a go-to technique is to evaluate the correlation between pairs of variables. A strong correlation indicates that a pattern may exist and can signal a reason to analyze the relationship further.
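To make the idea of pairwise correlation concrete, here is a minimal Python sketch of the Pearson correlation coefficient for two lists of numbers. It is an illustration only: this post does not state which correlation measure Flow uses, so Pearson is an assumption, and the sample values are invented rather than taken from the example collection.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (covariance numerator)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant sequence has no defined correlation
        return float("nan")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical values, not taken from the example collection
ad_spend = [10, 20, 30, 40, 50]
sales = [12, 25, 31, 48, 55]
r = pearson(ad_spend, sales)
print(f"correlation: {r:.3f}")
```

A value near +1 or -1 suggests a strong linear relationship, while a value near 0 suggests little or no linear relationship.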
In this post we will continue to use the example data collection introduced in Part 1.
The example collection has a data point named 'Sales'. Our goal is to discover if any data points are correlated with Sales.
Performing a Fast-Based Correlation Analysis on a Target Data Point
Flow provides a short-hand correlation detection action called Correlation Analysis.
The Correlation Analysis action targets a response data point (variable of interest) and computes the slope and correlation of all numeric data points against the target.
The result of the correlation analysis is stored in a new generic data collection.
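For readers who want to see roughly what such an action computes, the following Python sketch reproduces the idea outside of Flow. It is not Flow's implementation. It assumes Pearson correlation, assumes the slope is the least-squares slope of the response regressed on each feature, and assumes the response itself is skipped. The function name `correlation_analysis` and the sample records are hypothetical; the output field names mirror those of the result collection.

```python
import math
from numbers import Real

def _is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(value, Real) and not isinstance(value, bool)

def correlation_analysis(records, response):
    """Return one row per numeric field, measured against the response field."""
    if not records:
        return []
    ys = [rec[response] for rec in records]
    n = len(ys)
    mean_y = sum(ys) / n
    var_y = sum((y - mean_y) ** 2 for y in ys)
    rows = []
    for name in records[0]:
        if name == response:
            continue  # assumption: the response is not paired with itself
        xs = [rec.get(name) for rec in records]
        if not all(_is_number(x) for x in xs):
            continue  # skip text and other non-numeric data points
        mean_x = sum(xs) / n
        var_x = sum((x - mean_x) ** 2 for x in xs)
        cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        if var_x == 0 or var_y == 0:
            continue  # constant values: slope and correlation are undefined
        rows.append({
            "Response DP": response,
            "Feature DP": name,
            "Correlation": cov / math.sqrt(var_x * var_y),
            "Slope": cov / var_x,  # least-squares slope of response on feature
        })
    return rows

# Hypothetical records; not the Ajax Associates data
records = [
    {"Region": "East", "Units": 1, "Discount": 0.3, "Sales": 10.0},
    {"Region": "West", "Units": 2, "Discount": 0.2, "Sales": 20.0},
    {"Region": "East", "Units": 3, "Discount": 0.1, "Sales": 30.0},
]
result = correlation_analysis(records, "Sales")
```

In this sketch the text field 'Region' is skipped, and 'Units' and 'Discount' each produce one row in the result.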
Adding the Correlation Analysis Action
Step 1.) Under Actions -> Summary Functions, select the Correlation Analysis menu item.
Step 2.) Target a Generic Collection. In this example we target the Ajax Associates Results collection.
Step 3.) Target a Response Data Point. In this example the response data point is 'Sales'.
Step 4.) In the Result Collection field, enter a name for the collection that will store the output of the action. In this example we name the result collection 'Correlation Test'.
Step 5.) Click OK to add the configured Correlation Analysis action to the Flow. Run the Flow to execute.
The result of our action is a new generic data collection named Correlation Test.
This generic data collection contains a generic object for each numeric data point in our target collection.
Each generic object contains four data points:
- Response DP
- Feature DP
- Correlation
- Slope
The 'Response DP' data point stores the name of the target response data point.
The 'Feature DP' data point stores the name of the paired numeric data point tested.
The 'Correlation' data point stores the computed correlation value of the Response DP and the paired Feature DP.
The 'Slope' data point stores the computed slope value of the Response DP against the paired Feature DP.
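Once a result collection has this shape, a natural next step is to rank the rows by the strength of their correlation. The Python sketch below shows one way to do that outside of Flow. The rows and their values are invented for illustration, the helper name `strongest` is hypothetical, and the 0.5 cutoff is an arbitrary choice rather than a recommended threshold.

```python
# Hypothetical rows shaped like the result collection; all values are invented
rows = [
    {"Response DP": "Sales", "Feature DP": "Units", "Correlation": 0.91, "Slope": 12.5},
    {"Response DP": "Sales", "Feature DP": "Discount", "Correlation": -0.62, "Slope": -340.0},
    {"Response DP": "Sales", "Feature DP": "Store Age", "Correlation": 0.08, "Slope": 0.4},
]

def strongest(rows, threshold=0.5):
    """Rows whose absolute correlation meets the threshold, strongest first."""
    keep = [row for row in rows if abs(row["Correlation"]) >= threshold]
    return sorted(keep, key=lambda row: abs(row["Correlation"]), reverse=True)

for row in strongest(rows):
    print(f'{row["Feature DP"]}: r={row["Correlation"]:+.2f}, slope={row["Slope"]:g}')
```

Ranking by absolute value keeps strong negative correlations in view alongside strong positive ones, since both can be worth a closer look.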
In this post we covered how to perform a third technique for data summary. Using the Correlation Analysis action we can add a workflow step which generates a collection holding computed correlation and slope values for each numeric data point against a target data point of interest.
In the next post we will explore the last technique for data summary in this series: Profiling Blanks or Null Values for a Target Collection.
Continue: 4 Quick Tricks for Data Summary - Part 6 - Profiling Blanks or Null Values for a Target Collection