LOD calc, average, challenge, Workout Wednesday

By: Rosario Gauna @rosariogaunag

In Week 25 of Workout Wednesday, @LukeStanke had prepared one of his Machiavellian challenges and not satisfied, he posted the following image on twitter.  Link to the challenge

What do you think? I definitely think Luke was looking to be excluded from several lists of friends.

In the challenge, Luke asked us to use clusters and it was a subject that I did not know about. Therefore, the first thing I did was look for material on the subject and of course I had no problems in finding what I was looking for quickly. After reviewing the material, I could verify that the hard work of the clusters is done by Tableau. My first plus point.

Now the subject of how to make the graph was missing. The graphics are just about lines and dots, but still I could not decipher the trick?

I read the “spoilers” that Luke gave us and I started to try ideas. However, starting without a defined plan, although in many cases it works very well for me, at other times it takes me through complex paths. After more attempts than I thought, I managed to finish the challenge.

Now when reviewing the final solution, I can see that the steps required are not complex.

When this happens to me, I like to take some time to better understand what the challenge was asking for, what I did and what I can improve.

Step 1: Work the data so that you have two marks for each data. One of the techniques to achieve this goal is to create the “Union” of the data of “Orders” of Superstore with itself.

The “Union” allows us to have two “marks” for each element to be plotted. Why do I need two “marks”? In this exercise, we will use one of the “marks” to graph the starting point of the line (small dot) and the other “mark” to graph the end point of the line (large dot).

To identify whether the data is the first copy or the second copy, Tableau automatically creates a new dimension called “Table Name” with the values “Orders” and “Orders1”.

Step 2: Use the Tableau Cluster functionality.

First, let’s understand. What are the Clusters for?

Clusters help us in the analysis of complex data, by creating groupings of dimensions considering similarities between the analyzed data. In such a way that the elements of a cluster have a more similar behavior among them, than with the rest of the elements of other clusters.

For this exercise, the clusters will be defined with:

• The dimensions of “Sub-Category” and “Region”.
• The values that will be evaluated to find similar patterns are: the average percentage of 2015 profit and the simple average of the 2015 discount.

Step 2.1 Create the “Profit Ratio” variable.

 Profit Ratio = SUM([Profit]) / Sum([Sales])

It should be noted that the average discount that the challenge asks us is a simple average and not the weighted average. Therefore, it is not required at the moment to calculate another variable.

If you want to try the discount amount with weighted averages. Simply calculate the discount value in \$ and then obtain the proportion that represents the sum of the discounts in \$ vs. the sum of sales.

Step 2.2 Create the base graph to define the clusters with the dimensions and “measures” to analyze.

Move to the shelves of:

• Filter ⬅   “Year” and select year 2015.
• Columns ⬅   “Discount” and with the right click change the aggregation to “Average”
• Rows ⬅   “Profit Ratio”
• Detail ⬅   “Sub-Category” and “Region”

Step 2.3 Use the option to create Tableau Clusters.

In the “Analytic” menu select the cluster model and drag the option to the graph. The number of clusters or groupings to be used in the challenge is three.

If you have a different number, you must adjust the number to three clusters.

Step 2.4 Define as a new dimension the cluster of Clusters calculated in the previous step.

Select the newly created clusters field located on the colors shelf and drag it to dimensions. Rename the new dimension as “Clusters 2015”

I want to clarify that this graphic was created exclusively with the purpose of defining the clusters. Once your goal is met, this graphic can be eliminated.

Step 3: Calculate the variable “Year”

 Year = DATEPART(‘year’, [Order Date])

Step 4: Understand two key requirements of the challenge.

Before continuing, let’s review two requirements of the challenge:

• The small dots are for the combination “Sub-category” and “Region”
• The large dots are for the clusters.

In order to present the small dots and the large dots in the graph, we will use the duplicity of the data defined in the first step of the procedure.

We will use the first copy of the data identified in the variable “Table Name” with the value of “Orders” to define the values of the small dots.

And the second copy of the data table will be used to define the values of the large dots.

Step 5: Calculate the variable “Avg Profit Ratio”.To obtain the values of the axis of the “y” of the two dots.

 REQUEST FORMULA For Small Dots The value should be calculated at level: Year Sub-Category Region Avg Profit Ratio= IIF([Table Name]=”Orders”, {FIXED [Year], [Sub-Category], [Region] : SUM([Profit]) / SUM([Sales])}, For Large Dots The value should be calculated at level: Year Clusters 2015 {FIXED [Year], [Clusters 2015] : SUM([Profit]) / SUM([Sales])})

Step 6: Calculate the variable “Avg Discount”.To obtain the values of the axis of the “x” of the two dots.

 REQUEST FORMULA For Small Dots The value should be calculated at level: Year Sub-Category Region Avg Discount = IIF([Table Name]=”Orders”, {FIXED [Year], [Sub-Category], [Region] : AVG([Discount])}, For Large Dots The value should be calculated at level: Year Clusters 2015 {FIXED [Year], [Clusters 2015] : AVG([Discount])})

Step 7: Calculate the variable “Label”, which will help us customize the header in the tooltips.

 REQUEST FORMULA For Small Dots The header includes the dimensions: Sub-Category Region Clusters 2015 Label = IIF([Table Name]=”Orders”, [Sub-Category] + ”  |  ” + [Region] + ”  |  ” + [Clusters 2015], For Large Dots The heading will be the name of the Cluster. [Clusters 2015])

Step 8: Change the aggregation level of the “Avg Profit Ratio” and “Avg Discount” to “Average”

Step 9: Create the graph corresponding to the years 2015 and 2016. Define the type of line graph.

Move to the shelves of:

• Filter ⬅   “Year” and select years 2015 and 2016
• Columns ⬅   “Year”
• Columns ⬅   “Avg Discount” should be displayed with aggregation at “Average level”
• Rows ⬅   “Profit Ratio” should be displayed with aggregation at Average level
• Detail ⬅   “Sub-Category”, “Region”
• Path ⬅   “Table Name”

Step 10: Complement the graph to include a second axis to display the information in dots. Move to the Rows shelf a new copy of the “Profit Ratio” variable and indicate that it is a synchronized dual axis.

Step 11: Change the type of the graph to circles. Position yourself on the “Marks” shelf corresponding to the second axis and select the type of circle graph.

Add new items to the shelves:

• Color ⬅   “Clusters”
• Size ⬅   “Table Name”
• Tooltips ⬅   “Label”

Step 13: Adjust the colors used and formats.

Step 14: Duplicate the graph for the years 2017 and 2018.Just adjust the respective “Year” filter