Data to Decisions

Integration, Automation, and Data Analytics

Three Common Types of Data-Driven Decisions

Strategic, Tactical, and Operational Decisions

In my experience, most business decisions are, at least to some extent, data-driven decisions. This post takes a quick look at three types of business decisions that are improved and facilitated using a data-driven approach. The three types of business decisions are strategic, tactical, and operational, as depicted below.

 

Business Analytics - Three Types of Data-Driven Business Decisions

 

Strategic Decisions

Strategic decisions determine the long-term direction of a business or organization. They are more difficult to model and structure than tactical or operational decisions, although a number of high-level models, or frameworks, exist for strategic decisions (such as the Porter five forces model). It is certainly safe to say that strategic decisions are the least structured of the three considered here. One clear example of a strategic decision is the board and executive management of a hospital evaluating whether to expend capital to expand its geriatric facilities over a ten-year period, with the goal of meeting projected increases in demand from a growing population of elderly patients.

Tactical Decisions

These are medium-term decisions (generally three to twelve months) that in many cases involve the implementation of one or more strategic decisions. Tactical decisions also involve responding to unforeseen events. Examples include fiscal year budget adjustments, re-allocation of marketing resources, or short-term changes in product pricing. A specific example: a supplier unexpectedly raises prices, requiring the business to consider alternatives such as a temporary switch to an alternate supplier.

Operational Decisions

Operational decisions are semi-structured, short-term (generally less than three months) decisions, typically involving the tasks required to implement tactical decisions. An example is the decision to increase near-term staffing levels in anticipation of increased short-term demand for products and services.

Additional Decision Types

This blog is about data-driven decision making, business analysis, data management, and automation. The following two classes of business decisions are well covered elsewhere but still warrant a mention here:

  • Programmable decisions - decisions with a single fixed goal that follow a predefined set of steps or written, formal procedures prescribing certain actions.
  • One-time, non-programmable decisions - decisions that arise from one-time, non-routine events. Each has a highly unique goal and requires a one-time evaluation of a set of alternatives that are unlikely to recur.

The Six Steps of Data-Driven Decision Making

Fundamental Overview of the Data-Driven Decision Making Process

This post provides a high-level overview of the data-driven decision making process broken down into six steps. Successful adoption of data-driven decision making requires a general understanding of each of these steps, the processes involved, and how they relate to each other. The data-driven decision making process is fundamentally the same regardless of:

  • The type of business decision
  • The size of the business - from small to large
  • The number of decision makers - from one to many

In reality, it is impossible to carry out the six steps of data-driven decision making without the aid of technology. However, the process is not dependent upon a particular set of technologies. The purpose of this post is to build an understanding of the fundamentals, which will later aid the evaluation of particular technologies.

The Six Steps

If the culture of a business does not support the data-driven decision making process, no amount of investment in technology or consulting will confer any competitive advantage. Likewise, decision makers armed with sophisticated decision tools who still employ a ready, shoot, aim approach to data collection have wasted their companies' time and money. It is therefore important for executives and managers to understand both the individual steps in the process of data-driven decision making and the process as a whole. The following provides a high-level overview of the data-driven decision making steps.

Business Analytics - The Data-Driven Decision Process

Establish a Goal (Step One)

The single most important step in the data-driven decision process is the establishment of a goal. Failure to define a specific goal is like starting a journey with no destination: you may see some interesting sights, but who knows where you'll end up. A well-defined goal has two important attributes: it is both attainable and measurable. It almost goes without saying that no decision process will deliver results against an unattainable goal. Likewise, failure to precisely quantify what constitutes attainment of a goal is equally bad. In that case, a business may seem to have attained a goal when in fact it has not.

Define and Model Alternatives (Step Two)

Establishing a goal is about where to go. Modeling alternatives is about how to get there. There are two primary steps involved: first, quickly eliminate the majority of infeasible alternatives and, second, develop a short list of feasible alternatives. A quantitative model of each alternative on the final list must then be developed, as sketched below.
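To make this concrete, here is a minimal Python sketch of what a short list of modeled alternatives might look like; the alternative names and figures are hypothetical and stand in for whatever quantitative model a business would actually use.

```python
# A minimal sketch (hypothetical names and figures) of capturing alternatives
# as simple quantitative models so they can be screened and ranked.
from dataclasses import dataclass

@dataclass
class Alternative:
    name: str
    projected_revenue: float   # expected annual revenue impact ($)
    estimated_cost: float      # expected annual cost ($)
    feasible: bool             # survived the initial feasibility screen

    @property
    def net_benefit(self) -> float:
        return self.projected_revenue - self.estimated_cost

alternatives = [
    Alternative("Expand product syndication", 250_000, 90_000, True),
    Alternative("Open a second warehouse", 400_000, 420_000, False),
    Alternative("Discount legacy product line", 120_000, 60_000, True),
]

# Keep only feasible alternatives and rank them by a simple metric.
short_list = sorted(
    (a for a in alternatives if a.feasible),
    key=lambda a: a.net_benefit,
    reverse=True,
)
for a in short_list:
    print(f"{a.name}: net benefit ${a.net_benefit:,.0f}")
```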

Identify Required Data (Step Three)

Identify the data required by each alternative and by the metrics associated with the alternative's model. Here, the quality, accuracy, and timeliness of the data must be considered. If the required data is of low quality, inaccurate, or out of date, the alternative should be removed from the list.

Collect and Organize Data (Step Four)

It should be no surprise that the data-driven decision making process depends on rapid access to timely and accurate data. Nor should it be a surprise that data management considerations play a central part in the process. Without efficient and fast data management processes in place to collect and organize supporting data, a data-driven decision making process is nearly impossible to implement. It is therefore imperative to build data-driven decision making processes around sound data management processes.

Data Analysis (Step Five)

The relationship between goals, alternatives, metrics, and data forms the core of the data-driven decision making process. Well-defined goals, alternatives, and metrics should make this little more than a technical step. However, a business must have decision makers, analysts, or other resources who understand how to apply data analysis techniques within the context of a data-driven decision process.

Decide and Execute (Step Six)

Ultimately, someone has to decide, and often others must execute. While making decisions and executing them is a step in the data-driven decision process, this last step is also part of a varying number of other business processes. Including execution as a step is nonetheless necessary because, quite obviously, there is no value in making data-driven decisions if they are never executed. As a final note, the results of a decision and its execution provide feedback for the establishment of new goals, thereby forming a continuous loop.

Business Analytics - The Role of Metrics

Metrics and the Data-Driven Decision Making Process

This post discusses the role of metrics and their evaluation within the overall context of the six steps for making data-driven decisions. Reaching an established goal starts with defining a set of alternatives that will lead you there. Metrics are established to evaluate alternatives and their suitability for reaching a goal. It is the analysis and evaluation of metrics that turns a decision process into a data-driven decision process. The goal is to optimize decisions by improving the quality of metrics and improving their ongoing evaluation. This is depicted below as a linear relationship between the precision of analytic models and the quality of data-driven decisions.

Improving the Quality of Metrics

A collection of metrics forms a model. This model is used in each step of the data-driven decision process. It follows that improving the quality and precision of a model (as ultimately measured by outcomes) will improve the quality of data-driven decisions. Most common business problems do not require advanced analytical techniques. There is, however, a small set of relatively simple, commonly applied analytical techniques that can be used to improve the precision and application of models. These techniques generally fall within two broad areas of basic descriptive statistics: numerical and visual.

Three Widely Applied Numerical Techniques

There are three widely applied numerical techniques employed in business analytics (a brief code sketch follows the list). They are:

  • Data location (or measures of central tendency) - these techniques measure or summarize the point of central value within a given data set. Example techniques include the mean, median, and mode.
  • Variability or dispersion - these techniques measure the amount of scatter, or the distance of individual data points from their central location. Example techniques include the variance and average deviation.
  • Identification of outliers - this technique identifies data point values that lie far outside the typical dispersion of the other data points.
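As a quick illustration, here is a small Python sketch of the three numerical techniques using only the standard library; the sample sales figures are made up, and the two-standard-deviation rule is just one common convention for flagging outliers.

```python
# A small sketch of data location, variability, and outlier identification
# using Python's standard library; the figures are illustrative only.
import statistics

daily_sales = [1200, 1350, 1280, 1310, 1295, 1405, 1260, 4800]  # note the outlier

# Data location (central tendency)
mean = statistics.mean(daily_sales)
median = statistics.median(daily_sales)

# Variability (dispersion)
stdev = statistics.stdev(daily_sales)

# Outlier identification: flag points more than two standard deviations from the mean
outliers = [x for x in daily_sales if abs(x - mean) > 2 * stdev]

print(f"mean={mean:.1f}, median={median}, stdev={stdev:.1f}, outliers={outliers}")
```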

Three Widely Applied Visualization Techniques

Data visualization, or charting, techniques provide a way to visualize data location, variability, and outliers. Three of the most widely used data visualization techniques are listed below, followed by a short plotting sketch:

  • Histogram - a graphical representation of the variability or dispersion of data within a set
  • Frequency distribution - a graphical representation that plots how many times each unique data point value in a data set appears
  • Plot (box, scatter, bar, diagnostic, and others) - a graph that plots the value of each data point against two independent variables located on the x and y axes of a chart
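Here is a brief, illustrative sketch of the three chart types using matplotlib; the data is randomly generated and the figure layout is arbitrary.

```python
# A brief sketch of a histogram, a frequency distribution, and a scatter plot
# with matplotlib; the data is random and purely illustrative.
import random
import matplotlib.pyplot as plt

random.seed(0)
order_values = [random.gauss(100, 20) for _ in range(500)]
units_per_order = [random.randint(1, 5) for _ in range(500)]

fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(12, 3))

# Histogram: dispersion of order values
ax1.hist(order_values, bins=20)
ax1.set_title("Histogram of order value")

# Frequency distribution: how often each unique unit count appears
counts = {u: units_per_order.count(u) for u in sorted(set(units_per_order))}
ax2.bar(list(counts.keys()), list(counts.values()))
ax2.set_title("Frequency of units per order")

# Scatter plot: order value against units ordered
ax3.scatter(units_per_order, order_values, alpha=0.3)
ax3.set_title("Order value vs. units")

plt.tight_layout()
plt.show()
```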

Use Comparison to Analyze Patterns and Differences

By analyzing the patterns and differences in the sample data, generalizations can be formed regarding the relationships or associations among data sets and data points. This generally involves aggregating (or summarizing) the data along several different dimensions and then looking for patterns and differences.
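A minimal pandas sketch of this idea, with hypothetical region and quarter dimensions, might look like the following.

```python
# A small sketch of comparing aggregates across dimensions with pandas;
# the column names and values are hypothetical.
import pandas as pd

orders = pd.DataFrame({
    "region":  ["East", "East", "West", "West", "East", "West"],
    "quarter": ["Q1", "Q2", "Q1", "Q2", "Q2", "Q1"],
    "sales":   [1200, 1500, 900, 1400, 1100, 950],
})

# Aggregate sales along two dimensions, then compare the groups side by side.
summary = orders.pivot_table(index="region", columns="quarter",
                             values="sales", aggfunc="sum")
summary["change"] = summary["Q2"] - summary["Q1"]
print(summary)
```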

Summary

The goal of fact-based decision making is to improve the quality of decisions through the application of precise analytic models. The business analytic techniques used to find initial patterns and differences fall within two areas of descriptive statistics: numerical and visual. Three widely used numerical techniques are measures of data location, measures of variability, and outlier identification. Three commonly used visualization techniques are histograms, frequency distributions, and plots.

Emerging Technologies Push Business Analytics Towards Tipping Point

The Challenge of Emerging Technologies

In my prior post, I discussed the weaknesses of the current best-practice approach to business analytics. This post expands on that discussion within the context of emerging technologies.

Best-practice architectures have weaknesses relative to the current business environment. However, that environment is rapidly changing. Emerging technologies require businesses to create processes for increasingly decentralized information sources. Existing architectural weaknesses, combined with the demands of emerging technologies, are pushing the current best-practices architecture toward a tipping point.

The Tipping Point is Near

Competitive Advantages Realized by Early Adopters Will Create a Tipping Point

The tipping point will be reached when early adopters implement solutions that overcome the weaknesses of the current best-practices architecture and also address the need to operate in an increasingly decentralized information environment. Those businesses that successfully move beyond this tipping point will be in a superior competitive position.

Move Your Business beyond the Tipping Point

Businesses must adopt a new architecture to gather, analyze, report, and act on information. This architecture must overcome the limitations imposed by current best-practice architectures. To meet this fundamental business need, it must be radically simpler, faster, more powerful, and more agile than the current best-practices architecture.

What are some of the requirements of this new architecture? Here are a few, followed by a small sketch of incremental, in-memory processing:

  • Eliminate impediments that prevent businesses from rapidly gathering the data required to drive current, critical decisions.
  • Remove the need for an operational data store through adoption of solutions capable of incremental loading of required data.
  • Provide real-time, or near real-time, processing and analysis of incrementally acquired data.
  • Eliminate the requirement to store analytical data in intermediate, schema-bound data repositories.
  • Take advantage of increased memory and processing power to expand the capabilities of in-memory analytics.
  • Eliminate the time and expense of custom-coded middle-tier business logic.
  • Perform sophisticated mathematical and statistical operations on in-memory flat and hypercube data sets.
  • Enable businesses to implement business logic for management of decentralized business processes and workflows.
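As an illustration of the incremental, in-memory requirements above, here is a rough pandas sketch; the inline order feed and its column names are assumptions standing in for a real file or stream, not any particular product's API.

```python
# A rough sketch of incremental, in-memory acquisition with pandas: data is
# read in chunks and aggregated as it arrives, with no intermediate data store.
import io
import pandas as pd

# Stand-in for a large raw order feed; in practice this would be a file or stream.
raw_feed = io.StringIO(
    "order_date,order_total\n"
    "2024-01-05,120.0\n2024-02-11,80.0\n2024-04-02,200.0\n2024-05-19,150.0\n"
)

running_totals = {}

# Read the feed incrementally and aggregate in memory - no staging data store.
for chunk in pd.read_csv(raw_feed, chunksize=2):
    chunk["quarter"] = pd.to_datetime(chunk["order_date"]).dt.to_period("Q")
    for quarter, total in chunk.groupby("quarter")["order_total"].sum().items():
        running_totals[quarter] = running_totals.get(quarter, 0.0) + total

print(running_totals)  # a near-real-time view of total sales by quarter
```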

Evaluation of New Solutions

Any evaluation of solutions to meet these and other requirements must explicitly recognize point solutions developed to redress narrow limitations of the current best-practices architecture. These point solutions often fail to address the underlying weaknesses as a whole.

 

 

Beyond the Current Best-Practice Approach to Business Analytics

The Approaching Business Analytics Tipping Point

The competitive position of a business is greatly impacted by analytics; that is, the ability to gather, analyze, report, and act on information. The impact of analytics creates competitive forces that drive technological change. Over the past two decades, those forces have driven business adoption of core best-practice architectures that are rapidly approaching a tipping point. Competitive pressure, inherent architectural weaknesses, and newly emerging technologies will combine to compel businesses to adopt a new architecture. Some points to consider are:

  • Businesses have a basic need to gather, analyze, report, and act upon information.
  • This need is driven by competitive forces that also drive the adoption and growth of new technologies.
  • Competitive forces drove the evolution of the current best-practices architecture over the past two decades.
  • Nearly all current best-practices architecture follows a similar fixed implementation pattern.
  • Post-implementation, the fixed nature of best-practice architectures creates increasing problems that accumulate over time, adding to the size and complexity of the solution.
  • Competitive forces are driving the adoption of emerging technologies that compound the weaknesses of the current best-practices architecture.
  • A technology tipping point is rapidly approaching; it will create significant competitive advantages for businesses that strategically move beyond current best-practice architectures.

Analytics - a Basic Business Need

Driven by competitive forces, all businesses have the basic need to gather, analyze, report, and act on information. The fundamentals of this need remain nearly constant. However, competitive forces also drive the adoption of new technologies, which change the volume and type of information businesses must act on. The evolution of technology over the past two decades has resulted in the current set of core best-practice architectures employed to meet this basic need.

Variants of the current core best-practice architectures follow a similar implementation pattern. This pattern usually involves implementation of a solution to:

  • Identify, extract, and store fixed sets of data required for analysis
  • Create and load operational data stores, data marts, or data warehouses to store required data
  • Design business logic to access and process these fixed data sets
  • Develop static or ad-hoc reports and dashboards against these data sets

Weaknesses of the Current Best-Practices Architecture

 

Cumulative Change Adds Size and Complexity

Post-implementation, this best-practices architecture must be modified in response to changing business needs. Generally, each modification requires process changes and increases the volume of information. These cumulative changes add to the size and complexity of the implementation, and frequent optimizations become required to maintain acceptable response times.

 

Size and Complexity Create Business Frustration

Over time, frustrations with slower response times spur the creation of ad-hoc applications and local data marts to meet business exigencies. Point solutions have emerged to cope with these frustrations, including new software products and modifications to existing ones. However, these point solutions fail to address the underlying weaknesses in the current best-practice architecture.

 

The Challenge of Emerging Technologies

Emerging technologies are another reason businesses must adopt a new analytics architecture.

 

 

 

Four Key Benefits of Data Integration for Investment Advisory Firms

Data is an Important Resource for Investment Advisory Firms

Investment professionals, like so many of us, often find it difficult to tell the difference between 'information' and 'data'. However, the business impact of new data sources available to the financial services industry makes it imperative to understand the difference between these terms. Information, such as client reports, portfolio metrics, and dynamic portfolio recommendations, is the primary product produced by most advisory firms. Data, including market, client, and economic data, is the raw material used to produce it. Data is therefore one of the most important resources owned or used by an investment advisory firm.

The level of data integration employed by a firm is a good measure of the efficiency with which the firm produces its service. It may also be a good indicator of the quality of the firm's client service. This makes the level of data integration an important indicator of a firm's current and future health and profitability.

Unfortunately, merely mentioning the words data or integration is often enough to make many financial advisory professionals abruptly refer you to the nearest available IT employee. I am, of course, exaggerating to make a point, which is: investment advisory professionals should pay close attention to discussions of data and integration. Outlined below are four key reasons data integration deserves attention in financial services.

Four Key Reasons Data Integration Benefits Investment Advisory Firms

1. Increased New Business Opportunities

More often than not, the competition for new business is won by firms able to create information that serves client needs as understood from a complete 360° view of the client. Since the competition among firms for assets is a zero-sum game, information provided to clients based upon a complete and accurate view of their needs not only wins the day for the advisory service but also positively impacts client retention rates. Firms that have efficient data integration capabilities will be able to create and provide a better information service to their customers. That is, they will be able to offer timely, high-quality information that is uniquely tailored to the needs of their current and prospective clients. To generate this information, these firms will need to efficiently integrate data from a large number of heterogeneous sources, including internal operational data, third-party data, and newer data sources such as social media.

2. Increased Efficiency and Decreased Costs

There are two primary service delivery approaches implemented by investment advisory firms: best-of-breed and end-to-end. Each of these approaches creates unique integration complexities that increase costs - including data duplication, required customization, and process inefficiency.

Data is a perishable commodity - a raw material with a limited shelf life. Rapid and efficient access to timely data therefore reduces waste. Over time, the logistics of internal data integration have been made more complex by data duplication arising from siloed best-of-breed applications or inflexible end-to-end solutions. The addition of external data sources, such as social media, adds a new layer of complexity on top of this. As a result, investments in front, middle, and back-office technology don't easily translate into increased efficiency; more efficient use of existing technology primarily means more efficient data integration.

Data integration remains a complex process, so a "lightweight," low-cost, and efficient data integration solution, managed with proper skill and expertise, is needed to overcome this complexity. Such a solution, implemented as an overlay on existing technology, will greatly decrease costs.

3. Decreased Risk Exposure

Data integration efficiency directly impacts a firm’s risk exposure. Complex integration, outdated data, and poor data quality lead to the production of inconsistent, outdated, or inaccurate information – garbage in, garbage out. This greatly increases risk exposure.

Decreasing risk exposure requires timely and accurate data to be delivered both internally and externally. This enables informed decisions on the part of everyone involved in the delivery process. It also ensures that processes are consistent, which results in compliant information being provided to clients and regulators.

4. Increased Product Quality

Investment advisory firms’ primary product is information, and the quality of that product is directly impacted by the value and relevance of the underlying data.

Increased product quality requires advisory firms to have timely and efficient access to the most up-to-date and accurate data. In addition, product quality is increasingly defined by how well the information provided to clients reflects their unique personal circumstances. This requires the integration of data that is unique and customer-centric. Data integration flexibility, therefore, is becoming extremely important.

Data integration has a profound impact upon the quality of product produced by investment advisory firms. Moreover, product quality will increasingly be measured by how well a firm delivers information that is tailored to unique client needs. As clients begin to expect information tailored directly to their needs (think Amazon), efficient, high value data integration services will become more of a business critical function for investment firms.

Data Collection and Assembly - An Improved Approach for Business Analytics

Diminishing Marginal Returns of the Current Best Practice Approach to Business Analytics

As I discussed in my prior post, the marginal returns of the current best-practices approach to creating business analytics solutions diminish over time. This is because:

  1. New operational demands cause performance degradation
  2. Process dependencies accumulate
  3. ETL processes become more rigid and complex
  4. Metrics are not static, so incorporating new data is extremely important
  5. Incorporating new data, however, becomes increasingly difficult over time
  6. Touch points increase over time
  7. Incorporating changes into logical and physical schemas becomes more difficult as touch points increase
  8. Data access logic becomes more complex as logical and physical schemas change

Streamlined Business Analytics Data

Here are five suggestions for improving and streamlining the data collection and assembly process:

  1. Minimize data assembly tasks to enable rapid analysis and evaluation of metrics
  2. Use incremental data acquisition to eliminate the need for intermediate data stores
  3. Use a workflow-centric approach for rapid incorporation of new analytical data
  4. Avoid pre-configured, schema-bound data cubes for faster assembly of and access to analytical data cubes
  5. Use a configure, not code, approach to reduce or minimize the hand coding of business logic

A New Approach to Business Analytics Solution Development

An in-memory approach combined with the virtual creation of dimensional data provides a platform that enables rapid creation of business analytics and business intelligence solutions. Using the case study metric, this post illustrates how that approach simplifies the creation of decision-ready information.

My prior post walked through the tasks required to support this case study metric:

the dollar change in total sales during the test quarter compared to the prior five quarters.

Here is the sample presentation of the information required to evaluate this metric:

business analytics metric display

The In-Memory Approach

Here is a simple step by step walk through using a flexible in-memory approach that produces the same result in a small fraction of the time:

Step by step walk through of the in memory approach to business analytics.

When compared to the fixed, schema-bound methods used by data warehouse processes such as dimensional modeling and ETL, the approach shown above offers myriad advantages. Some of these are listed below, followed by a brief sketch of the approach:

  1. Direct, incremental import of raw data sources with no intermediate staging
  2. Work natively with in-memory denormalized data
  3. Add dimensions, transform, enrich, and perform operations on the data in memory
  4. Dynamically create in-memory hypercubes based upon selected dimensions
  5. Expand the analysis cube by adding or modifying facts derived from raw data, calculations, or transformations
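To make the walk-through concrete, here is a simplified pandas sketch of these steps; the inline collections, column names, and the two-way hypercube are illustrative assumptions rather than the actual case study implementation.

```python
# A simplified sketch of the in-memory approach with pandas: raw collections
# are imported directly, denormalized in memory, enriched with a quarter
# dimension, and pivoted into a small "hypercube".
import pandas as pd

# Raw collections held in memory (shown inline here; in practice they could
# be read incrementally from files or source systems).
orders = pd.DataFrame({"order_id": [1, 2, 3], "customer_id": [10, 11, 10],
                       "order_date": ["2024-01-05", "2024-02-11", "2024-04-02"]})
details = pd.DataFrame({"order_id": [1, 1, 2, 3], "product_id": [100, 101, 100, 101],
                        "quantity": [2, 1, 3, 4]})
products = pd.DataFrame({"product_id": [100, 101], "category": ["Widgets", "Gadgets"],
                         "unit_price": [20.0, 35.0]})

# Denormalize in memory - no staging database, dimensional model, or ETL jobs.
flat = details.merge(orders, on="order_id").merge(products, on="product_id")

# Enrich: derive facts and dimensions directly from the raw data.
flat["sales"] = flat["quantity"] * flat["unit_price"]
flat["quarter"] = pd.to_datetime(flat["order_date"]).dt.to_period("Q")

# Dynamically create an in-memory "hypercube" over the selected dimensions.
cube = flat.pivot_table(index="quarter", columns="category",
                        values="sales", aggfunc="sum")
print(cube)
```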

Collecting and Assembling Data - Part 2 - Business Analytics Case Study

Assembling Required Data to Support Metrics

Most non-trivial metrics used for business analytics are dependent on dimensional data. This installment of the business analytics case study continues the prior discussion of data collection and assembly. In prior posts, I discussed defining metrics and identifying required data. Once required data has been identified, its collection is merely a technical problem. That is, it must be located and physical access to it ensured.

Once data has been collected, it must be assembled to support metrics required for analysis and evaluation. The assembly of data often requires its transformation from a normalized to dimensional form. Unfortunately, this transformation can become a needlessly expensive and resource intensive step.

A Data Assembly Example Using a Single Metric

To illustrate the challenges of assembling data to support metrics, I will use the data requirements of the business analytics case study metric discussed earlier. The example metric is: "the dollar change in total sales during the test quarter compared to the prior five quarters". The data required to support this metric is listed below and labeled as either fact or dimension.

  1. Dollar change in sales (fact)
  2. Total sales (fact)
  3. Quarter (dimension)
  4. Prior five quarters (dimension)

The Starting Point - Insufficient Data

In my prior post on data collection and assembly, I introduced four data collections and combined those sets into a single data collection. The result is shown below:

denormalized data

It is noteworthy that the above table contains none of the facts or dimensions required for evaluation of this metric. Nor are those facts or dimensions available in the other case study data sets. This is not unusual. Facts and dimensions must often be created from the required raw data through operations that transform or enrich the data.

Information for Decision Makers

To evaluate the example metric, decision makers require a view of information similar to the one shown below:

table of metric values for business analytics

The above information must be derived from the available data. That is, we will need to derive time periods and time series, and perform other calculations, to obtain this information. In addition, these operations must be performed quickly and accurately so the required information is delivered to decision makers in a timely manner.

Data assembly remains one of the main obstacles to achieving the fundamental goal of timely, accurate information delivery to decision makers. This can be attributed largely to the continued wide acceptance and use of best-practice architectures for the development and delivery of business analytics solutions.

Metric Evaluation Using Dimensional Data Models and Data Warehouses

The goal is to derive the information needed to evaluate the metric from the required raw data. How would this goal be achieved using the current best-practices architectural approach? Here is a brief outline of the steps involved:

  1. Identify, extract, and store fixed sets of data required for analysis based upon a pre-defined logical dimensional data model.
  2. Create the required physical data stores, which might include a staging repository and data warehouse.
  3. Employ ETL processes to populate the data warehouse with the required data.
  4. Design business logic to access and process these fixed data sets.
  5. Develop static or ad-hoc reports and dashboards against these data sets.

To evaluate our example metric using this approach, we would first employ steps one and two above. That is, create a logical data model to represent the required facts and dimensions. Even a cursory review of this topic is well beyond the scope of this post. That said, I have created an extremely simple snowflake schema to represent the metric. See below:

Snowflake schema

Admittedly, this is not much of a snowflake schema.
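For comparison, here is a toy sketch of this kind of fact-and-dimension layout using an in-memory SQLite database from Python; the table and column names are simplified assumptions and not the case study's actual schema.

```python
# A toy sketch of a fact table surrounded by dimension tables, built in an
# in-memory SQLite database; names and values are simplified assumptions.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE date_dim    (date_key INTEGER PRIMARY KEY, quarter TEXT);
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, name TEXT, unit_price REAL);
CREATE TABLE order_fact  (order_key INTEGER PRIMARY KEY,
                          date_key INTEGER REFERENCES date_dim(date_key),
                          product_key INTEGER REFERENCES product_dim(product_key),
                          quantity INTEGER,
                          sales_amount REAL);
""")
conn.executemany("INSERT INTO date_dim VALUES (?, ?)",
                 [(1, "2023-Q4"), (2, "2024-Q1")])
conn.executemany("INSERT INTO product_dim VALUES (?, ?, ?)",
                 [(1, "Widget", 25.0), (2, "Gadget", 40.0)])
conn.executemany("INSERT INTO order_fact VALUES (?, ?, ?, ?, ?)",
                 [(1, 1, 1, 4, 100.0), (2, 2, 1, 2, 50.0), (3, 2, 2, 1, 40.0)])

# Total sales by quarter - the grain of the example metric.
for row in conn.execute("""
    SELECT d.quarter, SUM(f.sales_amount)
    FROM order_fact f JOIN date_dim d ON f.date_key = d.date_key
    GROUP BY d.quarter
"""):
    print(row)
```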

Information for Decision Making - Logical, Physical, and Presentation

The snowflake schema places the facts in a central table, with the dimension tables radiating around it (thus the snowflake). The purpose is to model a business process at a particular level of granularity. Here, the sales process is represented at the granularity of an order record. All the data operations (i.e. grouping and aggregation) are represented in the order fact table. At the physical level, this unburdens transactional systems and provides very fast query times against large data sets. In addition, it is relatively flexible, so new data can be added and additional operations performed.

The design of the logical and physical data models comprises only the first two steps in this process. In addition, we must develop ETL processes to load the required data. Then, a data access layer must be created and implemented. The data access layer will query the data warehouse and supply the required information to decision makers via sets of reports and/or dashboards.

A Process in Need of Improvement

The marginal returns of this architectural approach slowly diminish over time. New operational demands cause the performance of the original design to degrade. This process is exacerbated by dependencies among the required steps. In addition, ETL (extract, transform, load) processes become more complex as new data must be added and old data removed. Will this metric be accurate two years from now? Changes to the logical and physical schema impact more and more external touch points; for example, new users, new reporting systems, and re-purposing increase complexity. Data access logic and reporting can also become increasingly rigid. For instance, new reports often conflict with facts or dimensions that are no longer used.

How to Improve and Streamline Data Collection and Assembly

Here are five prescriptions for improving and streamlining the data collection and assembly process:

  1. Minimize data assembly tasks to enable rapid analysis and evaluation of metrics
  2. Use incremental data acquisition to eliminate the need for intermediate data stores
  3. Use a workflow-centric approach for rapid incorporation of new analytical data
  4. Avoid pre-configured, schema-bound data cubes for faster assembly of and access to analytical data cubes
  5. Use a configure, not code, approach to reduce or minimize the hand coding of business logic

In my next post, I will demonstrate a technology platform that implements these prescriptions and enables dramatic improvements to the data-driven decision making process.

Collecting and Assembling Data - Business Analytics Case Study

Overview of Collecting and Assembling Required Data

As discussed in my last post, the steps in the data-driven decision process follow one another in a logical way. Alternatives represent feasible paths to achieving business goals. Each alternative is evaluated against a set of metrics. Metrics are composed of facts, which are associated with certain data set attributes, or dimensions. For some companies, such as Amazon, business analytics are built into nearly all of their business processes. However, Amazon was afforded the luxury of building analytics into its core business processes from the ground up. Most businesses must overlay analytics on top of existing processes and therefore face a number of challenging hurdles. Data is one of the most prominent.

Data Hurdles

As stated above, most businesses have existing processes and supporting systems that present organizational and technical impediments to collecting and assembling data. However, the competitive advantages realized by companies like Amazon have spurred an increasing number of businesses to move toward a data-driven decision model. The adoption of data-driven decision making can be highly disruptive for many businesses. One of the principal reasons for this is the technical hurdles businesses face in collecting and assembling data. These data issues are among the main reasons such initiatives fail. Data collection and assembly are often the costliest part of any business analytics project.

Information and Data

This post uses the goals and data sets from the case study to examine some of the fundamental issues presented by data collection and assembly.

In my prior post on identifying required data, I discussed the dimensions and facts, or metrics, that will be used to evaluate alternatives. Data-driven decisions often require the integration, aggregation, and summary of data from many disparate sources. Actionable information is created when data is properly aggregated and summarized.

Once created, this information must be presented in a format that eases analysis and evaluation. In our example, the proposed metrics require that data on sales, profit, product margin, and number of customers be aggregated and then summarized by quarter; the summary results are illustrated below.

Data required for analysis of the business analytics case study decision
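A condensed pandas sketch of how such a quarterly summary could be produced is shown below; the flat data set, its column names, and the margin formula (taken from the case study's later definition of total sales divided by total cost) are assumptions for illustration.

```python
# A condensed sketch of producing a quarterly summary of sales, cost, profit,
# margin, and unique customers with pandas; the data is illustrative only.
import pandas as pd

flat = pd.DataFrame({
    "order_date":  ["2024-01-15", "2024-02-03", "2024-04-20", "2024-05-11"],
    "customer_id": [101, 102, 101, 103],
    "sales":       [500.0, 300.0, 450.0, 700.0],
    "cost":        [350.0, 200.0, 300.0, 420.0],
})
flat["quarter"] = pd.to_datetime(flat["order_date"]).dt.to_period("Q")

summary = flat.groupby("quarter").agg(
    total_sales=("sales", "sum"),
    total_cost=("cost", "sum"),
    unique_customers=("customer_id", "nunique"),
)
summary["gross_profit"] = summary["total_sales"] - summary["total_cost"]
summary["net_sales_margin"] = summary["total_sales"] / summary["total_cost"]  # per the case study definition
print(summary)
```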

Example Data Collections

Sources of decision data could vary from a single, well organized relational database to files and spreadsheets spread across departments and locations. In this part of the discussion, I will focus on four data sets. The goal is to collect and assemble data so the facts and dimensions required to support the metrics are easily assembled. For purposes of illustration, I will assume the summary data shown above were created from four sets of raw data. These are represented below:

Raw data sets required by the business analytics case study.

The above diagram depicts four related data collections: orders, order details, products, and customers. These data collections could be drawn from a number of different sources, including a highly normalized relational database, individual text files, or a set of spreadsheets. In the latter two cases, the data sets might need to be retrieved from different departments, locations, or systems. It is very easy to become mired in the data collection process and lose sight of the real goal, which is accessing and evaluating the required dimensions and facts.

Data Collection and Assembly - A Simple Illustration

Remember, our goal is to derive the metrics that will be used to evaluate our alternatives. The fact that those metrics are formed from facts and dimensions drawn from various collections of data is merely a technical consideration. Unfortunately, this technical consideration often becomes the major focus of business analytics projects. As a result, the creation of dimensional data models and the construction of data warehouses all too often become the major focus of business analytics and, unfortunately, a major impediment to success.

In this case, I will focus upon the desired result: a simple, flat data structure that is easy to store, access, and operate upon. The following diagrams illustrate the process of creating this flat data structure using the data collections shown above.

Step 1 - combine the orders and order details data sets

In this step, we combine the order detail and order records by adding the order detail customer and product id fields to the orders data.

Business analytics case study - step 2 creating flat data representation

Step 2 - combine the product data with data set from step 1 above

Next we combine the product data with the data set created in step 1. In this case the product id from the products data set is dropped since it is already included. This results in a combined data set as depicted below.

Business analytics case study - step 3 creating flat data representation

Step 3 - combine the customer data with data set from step 2 above

Finally, the customer data is combined with the data set created in step 2. This provides a single, flat representation of the original four data sets.

Business analytics case study - flat data representation

This final data set resembles a familiar two-dimensional spreadsheet model. This two-dimensional data is obviously much easier to work with than our original four data sets. It is also important to note that both the facts and dimensions supporting the case metrics are represented in this structure.
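Here is a compact pandas sketch of steps 1 through 3; the toy records and column names mirror the four collections described above but are assumptions about the underlying raw data.

```python
# A compact sketch of flattening the four collections with successive merges;
# the records and column names are simplified assumptions.
import pandas as pd

orders = pd.DataFrame({"order_id": [1, 2], "customer_id": [10, 11],
                       "order_date": ["2024-01-05", "2024-01-09"]})
order_details = pd.DataFrame({"order_id": [1, 1, 2], "product_id": [100, 101, 100],
                              "quantity": [2, 1, 3]})
products = pd.DataFrame({"product_id": [100, 101], "unit_price": [20.0, 35.0]})
customers = pd.DataFrame({"customer_id": [10, 11], "region": ["East", "West"]})

step1 = order_details.merge(orders, on="order_id")    # step 1: orders + order details
step2 = step1.merge(products, on="product_id")        # step 2: add product data
flat = step2.merge(customers, on="customer_id")       # step 3: add customer data

print(flat)  # one flat, spreadsheet-like table holding all four collections
```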

In a later post, I will show how to use this flat data structure to perform operations that derive the facts to support the business case metrics.

Business Metrics and Required Data - Business Analytics Case Study

Overview

This is the first installment in a detailed walk through of each step in the case study on data-driven decision making. The case study concerns a fictional online retailer evaluating the impact of product syndication on their sales, profitability, and customer base. My previous posts provided an overview of steps one through six of the data-driven decision process.

This installment provides a deeper dive into analyzing the metrics discussed in step two to identify the data each requires.

Business Goals, Metrics, and Supporting Data

A business goal usually contains verbs that specify direction, such as increase, decrease, minimize, or maximize, and nouns representing something quantifiable, e.g. sales. An alternative is a way to perform the actions (verbs) in order to reach the quantifiable goal. Metrics are used to measure and evaluate direction and progress. They are generally stated as a sum, difference, or count.

To illustrate, I will use the first metric defined in the case overview: "The dollar change in total sales during the test quarter compared to the prior five quarters". The measurable components of this metric consist of numeric and time data, as follows (a small calculation sketch appears after the list):

  • dollar change (numeric)
  • total sales (numeric)
  • test quarter (time)
  • prior five quarters (time)
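Here is a minimal pandas sketch of evaluating this metric; the monthly sales figures are invented, and grouping by quarter is just one straightforward way to derive the time dimension.

```python
# A minimal sketch of the example metric: the dollar change in total sales
# during the test quarter compared to each of the prior five quarters.
import pandas as pd

sales = pd.DataFrame({
    "order_date": pd.date_range("2023-01-01", periods=18, freq="MS"),
    "amount": [100, 120, 90, 130, 140, 110, 150, 160, 120,
               170, 180, 130, 190, 200, 140, 210, 220, 150],
})
sales["quarter"] = sales["order_date"].dt.to_period("Q")

total_sales = sales.groupby("quarter")["amount"].sum()   # total sales (fact) by quarter (dimension)
test_quarter = total_sales.index[-1]                     # the test quarter
prior_quarters = total_sales.iloc[:-1]                   # the prior five quarters

# Dollar change in total sales: test quarter versus each prior quarter
dollar_change = total_sales.loc[test_quarter] - prior_quarters
print(dollar_change)
```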

Metrics, Dimensions, and Factual Data

Each metric represents a measure of some fact, which is usually, but not always, numerical data. Facts are groupings, summaries, or aggregations associated with a dimension drawn from a collection of data. For example, one measure identified above is total sales. This is a fact drawn from a collection of sales data. The metric “total sales” is meaningless until we answer the question: what sales should be totaled? The answer to this question is the dimension, which provides the required context. As discussed above, the required context is the dollar change in total sales over a quarter. So the time period, quarter, is the dimension and total sales is the fact.

For each fact and dimension, we must determine if the required data is available. For example, the total sales fact requires raw sales data. Here, it is derived from the sales order and pricing data collections. To illustrate, here is an order detail record from the sample data:

Business Analytics - Order Record

This record contains a quantity but no price. Therefore, price data must be identified. This is contained in the collection of product data shown below.

Business Analytics - Order Details Record

Calculating the total sales fact therefore requires two data sets: order detail and product pricing data. This is shown below:

Business Analytics - Product Record
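A short pandas sketch of deriving the total sales fact from these two data sets might look like the following; the record layouts are simplified assumptions based on the description above.

```python
# A short sketch of deriving total sales from order detail quantities and
# product pricing; the record layouts are simplified assumptions.
import pandas as pd

order_details = pd.DataFrame({"order_id": [1, 1, 2], "product_id": ["P1", "P2", "P1"],
                              "quantity": [3, 1, 2]})
product_prices = pd.DataFrame({"product_id": ["P1", "P2"], "unit_price": [25.0, 80.0]})

lines = order_details.merge(product_prices, on="product_id")
lines["line_sales"] = lines["quantity"] * lines["unit_price"]

total_sales = lines["line_sales"].sum()   # the basic total sales fact
print(total_sales)
```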

The Three Remaining Metrics

The second metric mentioned in the case overview is: "gross profit in dollar terms measured over the test quarter compared to the prior five quarters". For each metric, I will omit measures mentioned previously. Here is the numeric and time data that make up the measurable components of this metric:

  • total cost
  • gross profit (total sales minus total cost)

The third metric is "net sales margin calculated over the test quarter compared to the prior five quarters". This metric requires:

  • total sales divided by total cost

Finally, the fourth metric is "the total number of unique customers measured over the test quarter compared to the prior five quarters". This metric requires:

  • a count of unique customer purchases

For example, if customers A, B, and C each made two separate purchase transactions at different times, there are six transactions but only three unique customers.
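A tiny pandas sketch of this distinction, using made-up transactions:

```python
# Six transactions, three unique customers.
import pandas as pd

transactions = pd.DataFrame({
    "customer_id": ["A", "A", "B", "B", "C", "C"],
    "order_total": [50, 75, 20, 60, 90, 30],
})
print(len(transactions))                      # 6 transactions
print(transactions["customer_id"].nunique())  # 3 unique customers
```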

Summary and Next Step

This simple example is meant to illustrate how the first three steps of the fact-based decision making process relate to each other at a detailed and logical level. Alternatives are merely different routes that may lead to a quantified business goal. An alternative is evaluated against a set of metrics. Each metric contains facts, or measures, that must be analyzed and evaluated along various dimensions drawn from an identifiable collection of data. Once required data has been identified, it must be collected, assembled, and organized to support analysis and evaluation. This is the subject of my next post on collecting and assembling data sets - part one.