Market Basket, table calc, LOD calc, Workout Wednesday

By:**Rosario Gauna **@rosariogaunag

In Week 14 of Workout Wednesday, @AnnUJackson prepared us a frequency matrix challenge. Link to the Challenge

This challenge reminded me of the week 42 of last year published by @EmmaWhyte, referring to the analysis of market basket. Although the graphs presented in both challenges are different, the basic solution procedure is very similar.

The analysis of market basket is a common request, particularly in certain industries such as online retail, to make suggestions on what other product might be of interest to the customer at the time of purchase.

Next, I will list the main steps required:

**Step 1: Create an INNER-JOIN from the data table of “Orders” in Excel from SuperStore with itself.**

The graph requested in this challenge is a symmetric matrix of frequencies of the number of orders for each combination of Sub-Categories. The dimensions of the matrix show the need to perform an INNER-JOIN of the data table with itself through the “Order ID” dimension.

The variables of the second table used in the JOIN (Orders1) will be differentiated from the original table with the termination (Orders1).

**Step 2: Calculate the number of unique orders that are sold for each combination of Sub-Categories. This is the calculation of the frequency of orders.**

Unique Orders = COUNTD([Order ID]) |

**Step 3: Obtain the average sales for each of the Sub-Categories of the selected combination.**

Average Sales = SUM({FIXED [Row ID] : AVG([Sales])}) / [Unique Orders] |

Average Sales (Orders1) = IIF(ATTR([Sub-Category]) = ATTR([Sub-Category (Orders1)]), NULL, SUM({FIXED [Row ID (Orders1)] : AVG([Sales (Orders1)])}) / [Unique Orders]) |

I believe that the formulas of this step are crucial to achieve the correct numbers, so I will delve into its explanation and its relationship with the operation of the INNER-JOIN.

When using the INNER-JOIN technique of a table with itself, caution should be exercised if the analysis is performed on an aggregate dimension, i.e. a Sub-Category can have several records of different products purchased within the same order or basket and causes the details of the second table of the INNER-JOIN to be repeated as a result of the JOIN.

If the requested challenge were about combinations of products instead of combinations of Sub-Categories, the required formulas would be quite simple because each line represents a product and at the product level there is no duplication. However, in this challenge the analysis is done at the level of Sub-Categories combination.

The previous table shows an example of the JOIN resulting from the Order “CA-2014-103317”. From the Sub-Category “Furnishing” two different products registered in lines 5437 and 5439 are sold and for the Sub-Category (Orders1) of “Binders” a single product registered in line 5438 is sold.

When you want to analyze the sales “Furnishing” with “Binders”, if you take the sales directly from the Sub-Category of Orders1 of “Binders” you would be duplicating the sale. That is, there is a repetition for each different product sold from “Furnishing”. See Row ID 5438 from (Orders1) on the right side in red.

Therefore, to ensure that the information is considered without duplication, you must initially calculate the average sales for each unique “Row ID” with the following formula *{FIXED [Row ID] : AVG([Sales])}*

The next step is to obtain the average of the sales by Sub-Category for the selected combination, which are obtained by adding the average sales for each Row-Id and the result is divided by the number of unique orders for the combination of Sub- Selected categories

**Why do I do the manual calculation of the average and do not use the “Average” formula? **Because if we apply the average formula, the results obtained would be from the average sales by Row-Id, and the challenge requests the average sales per Order of each Sub-Category.

Additionally, to the formula of (Orders1) the condition is included so that the value is null in case the Sub-Category and the Sub-Category (Orders1) matches. On the diagonal, only the average sales are considered.

**Step 4: Calculate the name of the sub-categories in upper case and start the definition of the graph:**

Upper Sub-Category = UPPER([Sub-Category]) |

Upper Sub-Category (Orders1) = UPPER([Sub-Category (Orders1)]) |

Place the variables “Upper Sub-Category” in Rows and “Upper Sub-Category (Orders1)” in Columns (If you prefer you can create the formulas directly on the shelves of rows and columns without having to define them as new variables).

Sort down both variables, by the number of unique orders obtained in step two.

Select Square in the Mark type.

**Step 5: Customize the displayed text, the tooltips and the color depending on whether it is on the diagonal line or outside the diagonal line.**

*A) Calculate the Text label displayed on the diagonal of the matrix*

This label will be displayed only when the selected Sub-Category and Sub-Category (Order1) **match**.

Label Diagonal = IIF(ATTR([Sub-Category])=ATTR([Sub-Category (Orders1)]),[Unique Orders],NULL) |

Move the newly created variable to the Text shelf.

*B) Customize the tooltips of the diagonal vs. the rest of the matrix*

The following variables will only show information when the selected Sub-Category and Sub-Category (Order1) **are different**.

Label SubCategories A & B = [Sub-Category] + IIF([Sub-Category]<>[Sub-Category (Orders1)], ” & ” + [Sub-Category (Orders1)],””) |

Label SubCategories B = IIF([Sub-Category]<>[Sub-Category (Orders1)], [Sub-Category (Orders1)] + “:”,NULL) |

These variables, plus the “Unique Orders”, “Sub-Category”, “Avg Sales” and “Avg Sales (Orders1) should be included in the tooltips shelf.

*C) Customize the color of the diagonal vs. the rest of the matrix*

Color = IIF(ATTR([Sub-Category])<>ATTR([Sub-Category (Orders1)]), [Unique Orders], -20) |

The challenge asks us to use the purple color in the cells of the matrix that are outside the diagonal line and to use the darker tonalities for the frequency of orders of greater value.

In my first version I used the blue color, I like the blue color. Now I will use the purple color.

We need a little trick to achieve this.

The first step to follow is to select the Red-Blue-White Diverging key.

Do not think that I forgot what I just told you or that it is a joke, we are going to present the matrix in purple.

The Red-Blue-White Diverging selection is only one step to achieve the purple color, as there is no predefined purple diverging option that you can select.

The next step is to change the color blue by purple, for which, it is simply required to double-click the blue color and indicate the color that will replace it (I used the color “990099”)

Finally, in the advanced color options, the center is set to a value of -20, which is the value that the formula establishes for all the cells in the diagonal, to force the value of -20 to correspond to the color of white.

**Step 6: Get the combination with higher average sales per order and customize the Text and tooltips.**

Customize the Text shelf to include a point, in the highest average sales cell.

Label Dot = IIF(WINDOW_MAX([Avg Sales] + [Average Sales (Orders1)])=[Avg Sales] + [Average Sales (Orders1)], “●”, “”) |

Customize your tooltip to include a legend, in the cell with the highest average sales.

Label Dot 2 = IIF(WINDOW_MAX([Avg Sales] + [Average Sales (Orders1)])=[Avg Sales] + [Average Sales (Orders1)], “highest average sales per order”, “”) |

** ****Step 7: Adjust formats. And ready!**

I hope you find this blog useful.

If you have any questions, do not hesitate to contact me on Twitter (@rosariogaunag)

Regards!

Rosario Gauna