Outlier Analysis

 

Introduction

An outlier is an observation that deviates significantly from most other values in a population. Outliers often stem from measurement variability or data anomalies, and they can substantially distort statistical analyses, so identifying and handling them correctly matters.

Overview

Lumenore provides several outlier detection methods for different scenarios: the Z-Score or Modified Z-Score method for detecting outliers in a single Key Performance Indicator (KPI) or variable (one-dimensional outlier detection), the Density-based method for detecting outliers across two KPIs (two-dimensional outlier detection), and Time Series Anomaly detection for a single KPI tracked over date or time. The methods are described below:

 

Outlier Detection Methods

Z-Score:

The Z-Score is a statistical measure of how many standard deviations an observation lies from the mean. An observation with a Z-Score above 3 or below -3 is treated as an outlier. For instance, in monthly sales data, months with exceptionally high Z-Scores can be considered outliers.
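The rule above can be sketched in a few lines of Python. This is an illustrative sketch, not Lumenore's implementation: the function names, the sample data, and the choice of the population standard deviation are all assumptions made for the example.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Return the Z-Score of each value: (value - mean) / standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        # All values are identical, so nothing deviates from the mean.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def z_score_outliers(values, threshold=3.0):
    """Return the indices whose Z-Score is above +threshold or below -threshold."""
    return [i for i, z in enumerate(z_scores(values)) if abs(z) > threshold]


# Hypothetical monthly sales: 23 ordinary months followed by one spike.
base = [100, 104, 98, 102, 96, 101, 99, 103, 97, 100, 105, 95]
monthly_sales = base + base[:11] + [300]

print(z_score_outliers(monthly_sales))  # -> [23]
```

Only the final month (index 23) is flagged; every other month lies well within one standard deviation of the mean.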

Modified Z-Score:

The Modified Z-Score method replaces the mean and standard deviation used by the traditional Z-Score with the median and the median absolute deviation (MAD) from the median. Because the median and MAD are far less affected by extreme values, the score is less sensitive to the very outliers it is trying to detect, which makes the method more robust.
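A minimal sketch of the median/MAD calculation follows. The 0.6745 scaling constant and the 3.5 cutoff follow the commonly cited Iglewicz and Hoaglin convention; the article does not state which constants Lumenore uses, so both are assumptions, as are the function names and the sample data.

```python
from statistics import median


def modified_z_scores(values, scale=0.6745):
    """Return scale * (value - median) / MAD for each value."""
    med = median(values)
    # MAD: the median of the absolute deviations from the median.
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # More than half the values equal the median; the score is undefined,
        # so this sketch simply reports no deviation.
        return [0.0 for _ in values]
    return [scale * (v - med) / mad for v in values]


def modified_z_outliers(values, threshold=3.5):
    """Return the indices whose absolute Modified Z-Score exceeds the threshold."""
    return [i for i, m in enumerate(modified_z_scores(values)) if abs(m) > threshold]


# Hypothetical small sample with one extreme value at the end.
values = [10, 12, 11, 13, 12, 11, 10, 12, 95]

print(modified_z_outliers(values))  # -> [8]
```

Here the median is 12 and the MAD is 1, so the value 95 scores about 56 and is flagged, while the extreme value itself barely moves the median or the MAD.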

Time Series Anomaly:

A time series anomaly is a data point that diverges significantly from the trend, seasonality, or cyclic pattern of the rest of the time series. In Lumenore, a moving window threshold method combined with the Z-Score checks for outliers at each window position and reports the anomalies once the window has traversed the entire time series.
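One way to picture a moving window combined with the Z-Score is sketched below. The article does not say how Lumenore positions or sizes the window, so this example assumes a trailing window that scores each point against the values immediately before it; the function name, window size, and data are hypothetical.

```python
from statistics import mean, pstdev


def rolling_z_anomalies(series, window=5, threshold=3.0):
    """Flag indices whose value is far from the mean of the preceding window."""
    anomalies = []
    for i in range(window, len(series)):
        history = series[i - window:i]  # trailing window, excludes the current point
        mu = mean(history)
        sigma = pstdev(history)
        if sigma == 0:
            continue  # flat window: a Z-Score cannot be computed
        if abs(series[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical daily order counts with a single spike at index 10.
daily_orders = [20, 22, 21, 23, 22, 21, 20, 22, 23, 21, 60, 22, 21, 23, 20, 22]

print(rolling_z_anomalies(daily_orders))  # -> [10]
```

Because each point is compared only with its recent history, the spike is flagged but the normal values that follow it are not, even though the spike temporarily inflates the window's standard deviation.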

 

How to Perform Outlier Analysis in Lumenore

Step 1: After accessing “Do You Know”, select “Outliers.”

Step 2: Now, click on “Create New Insight Outliers”.

Step 3: Choose a Schema and click “Next.”

Note: Ensure the prerequisites (a KPI, a Date, and an Attribute) for predictive analysis are met.

Step 4: Select the following:

  • Select outlier metric: Choose one or two metrics (KPIs) for outlier analysis.
  • Select outlier attribute: Identify an attribute for outlier identification (e.g., Product, Region, Customer Segment).
  • Select outlier bucket: Choose a bucket of observations (e.g., Quarter, Product Category, Country).
  • Select algorithm: Pick the appropriate method (Modified Z-Score, Time-Series Anomaly, or Density Based).
  • Select threshold sensitivity: Set the limit that decides when a data point counts as an outlier. A less strict limit flags more outliers.
  • Do you want to add filters?: Optionally add filters based on conditions.

Click “Next” after filling in the required details.

Note: If you choose to add filters, a window for creating them appears. Define filters by groups or conditions as needed.

Step 5: Customize the insight narration to define the variables used, then click “Save” to proceed.

Step 6: Name the insight for future access (default suggestion provided).

Step 7: After saving, select “Execute”.

Output (Insights)

As an example, an insight for outlier analysis was created using Profit (outlier metric), State (outlier attribute), and Order Date-Quarterly (outlier bucket) with the Z-Score algorithm. Red dots in the chart represent outliers.