In one of my previous articles, I addressed the question of how granular the analyzed data in retail analytics should be.

There is a tangible cost resulting from the efforts of collecting, maintaining, and continuously analyzing huge granular data sets. Professionals tend to underestimate the effort needed in the data collection and organization phases.

In addition, there’s a trade-off between *precision* and *accuracy*. Often, as precision rises, accuracy decreases, and vice versa. For example, a prediction of aggregated monthly demand for a specific product across an entire retail chain might be *more accurate* than a prediction of the granular demand for that product in a specific store on a specific day.
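To make this trade-off a little more concrete, here is a minimal sketch. It assumes, purely for illustration, that demand in each store-day is independent and shares the same mean and standard deviation; the numbers and the helper name `relative_variability` are hypothetical and not taken from any real data set. Under that assumption, the relative variability of total demand shrinks with the square root of the number of store-days being summed, which is one reason aggregated forecasts tend to be easier to get right.

```python
import math

def relative_variability(mean, std, n_units):
    """Coefficient of variation of total demand summed over n_units
    independent store-days sharing the same mean and std (an assumption)."""
    total_mean = n_units * mean
    total_std = math.sqrt(n_units) * std
    return total_std / total_mean

# Hypothetical numbers, for illustration only.
granular = relative_variability(mean=10, std=5, n_units=1)
aggregated = relative_variability(mean=10, std=5, n_units=100)

print(f"one store-day:         {granular:.2f}")
print(f"100 store-days summed: {aggregated:.2f}")
```

Real demand is rarely independent across stores and days, so the actual gain from aggregation may well be smaller than this idealized case suggests.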

Besides these known costs, one must understand that conclusions can change dramatically with the level of data aggregation. Analyzing aggregated data can lead to an entirely different set of conclusions and can serve different applications.

Let me use a very simple example, based on a Wikipedia article on “basket analysis,” to illustrate the point.

Basket analysis answers the fundamental question: “Which groups of items are likely (or unlikely) to be purchased together?” For example, one can determine whether beer and potato chips, or a certain shampoo and a hair conditioner, are often purchased together and, if so, by which customers.

Basket analysis provides a better understanding of the customer’s individual purchase behavior at the transaction level. This is why this type of analytics is often referred to as “impulsive customer purchase” analysis: for example, a customer picks up a hair conditioner because it sits on the shelf near the shampoo.

Basket analysis is well established and is helpful in many applications. For example, it can be used to establish a bundling price, provide personalized shopping coupons, or even offer insight into how best to design a planogram.

Figure 1 shows a basket analysis example with five products and eight transactions. Of course, this example is too small to derive any statistical evidence and should be used for illustrative purposes only.

The table shows eight transactions, two per day, for a basket that contains up to five products: butter, bread, milk, beer, and diapers. The first transaction represents a purchase of three products (butter, milk, and diapers), whereas the second transaction on the same day represents a purchase of only two products (bread and beer).

Using a conventional basket analysis method, one can begin to understand the relationships between different products. Let’s consider only two patterns (out of hundreds possible) – one between butter, bread, and milk and another pattern between beer and diapers.

- The probability of buying milk is 62.5%, since milk was purchased in 5 out of 8 transactions.
- The (conditional) probability of buying milk given that butter and bread are purchased is 100%, since milk was also purchased in both of the two transactions in which butter and bread were purchased.

Thus, one can say that there is a *lift* of approximately 40 percentage points in the probability of selling milk when both butter and bread are sold (from 62.5% to 100%, a ratio of 1.6). In other words, the probability of buying milk is higher if the basket contains butter and bread. Insights of this type are important, for example, in bundling the three products together at a discounted price or in providing personalized promotions to customers who often buy butter and bread but seldom buy milk.
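The arithmetic behind this lift can be reproduced from the counts quoted above. This is only a sketch: the helper name `lift_metrics` is my own, not a library function. Note that the conventional definition of lift is the ratio of the two probabilities, so the code reports it alongside the percentage-point difference.

```python
from fractions import Fraction

def lift_metrics(n_total, n_target, n_antecedent, n_both):
    """Return (baseline probability, conditional probability, lift ratio).

    n_total      -- number of transactions
    n_target     -- transactions containing the target item (milk)
    n_antecedent -- transactions containing the antecedent items (butter and bread)
    n_both       -- transactions containing antecedent and target together
    """
    baseline = Fraction(n_target, n_total)
    conditional = Fraction(n_both, n_antecedent)
    return baseline, conditional, conditional / baseline

# Counts taken from the text: 8 transactions, milk in 5,
# butter and bread together in 2, all three together in 2.
p_milk, p_milk_given_bb, lift = lift_metrics(8, 5, 2, 2)

print(float(p_milk))                    # 0.625
print(float(p_milk_given_bb))           # 1.0
print(float(lift))                      # 1.6
print(float(p_milk_given_bb - p_milk))  # 0.375
```

With only two transactions containing both butter and bread, these figures are illustrative rather than statistically meaningful.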

In the second example:

- The probability of buying diapers is 75%, since diapers were purchased in 6 out of 8 transactions.
- The (conditional) probability of buying diapers given that beer is purchased is also 75%, since diapers were also purchased in 3 out of the 4 transactions in which beer was purchased.

Thus, in this case, there is seemingly no lift, implying that there is a very weak correlation between diapers and beer, if any at all.
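Another way to read the same counts is as an independence check: if beer and diapers were purchased independently, the share of baskets containing both would equal the product of the individual shares. A small sketch, using only the counts given in the text (the variable names are mine):

```python
from fractions import Fraction

# Counts taken from the text.
n_total = 8    # transactions
n_beer = 4     # transactions containing beer
n_diapers = 6  # transactions containing diapers
n_both = 3     # transactions containing both

p_beer = Fraction(n_beer, n_total)
p_diapers = Fraction(n_diapers, n_total)
p_both = Fraction(n_both, n_total)

# Share of baskets with both items that independence would predict.
expected_if_independent = p_beer * p_diapers

# Lift as a ratio: observed joint share over the independent expectation.
lift = p_both / expected_if_independent

print(p_both, expected_if_independent, float(lift))
```

Here the observed and expected shares coincide, so the lift ratio is exactly 1, which matches the "no lift" reading at the transaction level.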

Now let’s assume that the two transactions per day are aggregated into daily totals, as seen in Figure 2. One can now see a direct correlation between the quantity of milk and the quantity of bread sold. Moreover, it seems that the quantity of milk sold does not depend at all on the quantity of butter sold, and a correlation between the quantities of diapers and beer sold suddenly becomes very evident. That is, aggregating the data not only provided new analytical outcomes but also challenged some of the insights obtained when analyzing the data at a higher resolution.
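The mechanics of this shift can be sketched in a few lines. The data below is a made-up toy set, deliberately extreme, and is not the data from Figure 1 or Figure 2; the quantities and the `pearson` helper are assumptions for illustration. In this toy set, beer and diapers never share a basket, yet their daily totals move together perfectly.

```python
from collections import defaultdict
import math

# Made-up toy data (NOT the article's figures): (day, {product: quantity}).
transactions = [
    (1, {"diapers": 1}), (1, {"beer": 1}),
    (2, {"diapers": 2}), (2, {"beer": 2}),
    (3, {"diapers": 3}), (3, {"beer": 3}),
    (4, {"diapers": 4}), (4, {"beer": 4}),
]

# Transaction level: how many baskets contain both products?
baskets_with_both = sum(
    1 for _, basket in transactions if "beer" in basket and "diapers" in basket
)

# Daily level: sum the quantities of each product per day.
daily = defaultdict(lambda: defaultdict(int))
for day, basket in transactions:
    for product, qty in basket.items():
        daily[day][product] += qty

days = sorted(daily)
beer = [daily[d]["beer"] for d in days]
diapers = [daily[d]["diapers"] for d in days]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

print("baskets containing both:", baskets_with_both)
print("daily correlation:", pearson(beer, diapers))
```

The same pair of products therefore looks unrelated at the basket level and strongly related at the daily level, simply because the two views answer different questions.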

When taking this perspective, one could gain new and different insights at each resolution level.

The above results also immediately raise another question: which resolution should be used? The answer depends both on the data (where we can find more informative correlations and patterns) and on the intended application.

If the scenario involves the individual purchasing behavior of customers at a single point in time, then the transaction level is probably the right choice. On the other hand, if the intent is, for instance, to better localize the assortment at the store level or to find operational opportunities for better supply and allocation, then aggregated data would make a lot more sense.
