An outlier is an observation that deviates significantly from most values in a population, often because of measurement variability or data anomalies. Outliers can substantially affect statistical analyses, so identifying and handling them correctly is important.
Lumenore provides several outlier detection methods for different scenarios: the Z-Score or Modified Z-Score method for detecting outliers in a single Key Performance Indicator (KPI) or variable (one-dimensional outlier detection), the Density-based method for detecting outliers using two KPIs (two-dimensional outlier detection), and Time Series Anomaly detection for a single KPI over date or time. These methods are described below:
Outlier Detection Methods
The Z-Score is a statistical measure of how many standard deviations an observation lies from the mean. An observation with a Z-Score above 3 or below -3 is treated as an outlier. For instance, in monthly sales data, months with exceptionally high Z-Scores can be considered outliers.
The Modified Z-Score method replaces the mean and standard deviation used by the traditional Z-Score with the median and the median absolute deviation from the median (MAD). Because the median and the MAD are themselves less affected by extreme values, the Modified Z-Score is less sensitive to outliers and therefore more robust.
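As a rough illustration of the Z-Score rule, the sketch below flags values more than 3 standard deviations from the mean. It is not Lumenore's implementation: the function names, the use of the population standard deviation, and the sample sales figures are all assumptions made for the example.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Return the Z-Score of each value: (value - mean) / standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    if sigma == 0:
        # All values are identical, so nothing deviates from the mean.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

def z_score_outliers(values, threshold=3.0):
    """Return the positions whose Z-Score is above +threshold or below -threshold."""
    return [i for i, z in enumerate(z_scores(values)) if abs(z) > threshold]

# Hypothetical monthly sales: thirteen ordinary months and one exceptional month.
monthly_sales = [100, 102, 98, 101, 99, 103, 97, 100, 102, 98, 101, 99, 100, 250]
print(z_score_outliers(monthly_sales))  # prints [13]
```

Only the last month is flagged. Its Z-Score is about 3.6, while every other month stays well within 1 standard deviation of the mean.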
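The sketch below shows one common way to compute a Modified Z-Score. The scaling constant 0.6745 and the cutoff of 3.5 are widely used conventions, not confirmed Lumenore settings, and the function names and sample data are hypothetical.

```python
from statistics import median

def modified_z_scores(values):
    """Return 0.6745 * (value - median) / MAD for each value."""
    med = median(values)
    # MAD: the median of the absolute deviations from the median.
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # The score is undefined when MAD is zero; this sketch simply
        # returns zeros (a simplification, not a general rule).
        return [0.0 for _ in values]
    return [0.6745 * (v - med) / mad for v in values]

def modified_z_outliers(values, threshold=3.5):
    """Return the positions whose absolute Modified Z-Score exceeds threshold."""
    return [i for i, m in enumerate(modified_z_scores(values)) if abs(m) > threshold]

# Hypothetical monthly sales containing two extreme months.
sales = [100, 102, 98, 101, 99, 103, 97, 100, 250, 260]
print(modified_z_outliers(sales))  # prints [8, 9]
```

On this data a mean-based Z-Score would score both extreme months at roughly 2, below the cutoff of 3, because the two extremes inflate the mean and standard deviation. The median and MAD barely move, so both months are still flagged.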
Time Series Anomaly:
A time series anomaly is a data point that diverges significantly from the trend, seasonality, or cyclic pattern of the time series. In Lumenore, a moving window threshold method is combined with the Z-Score: outliers are identified at each window position, and the anomalies are reported once the window has traversed the entire time series.
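The source does not spell out how Lumenore defines its window, so the sketch below uses one common variant of a moving-window Z-Score check: each point is compared with the mean and standard deviation of the window of points immediately before it. The window size, threshold, function name, and data are assumptions for illustration.

```python
from statistics import mean, pstdev

def moving_window_anomalies(series, window=5, threshold=3.0):
    """Flag points that lie more than `threshold` standard deviations
    from the mean of the `window` points immediately preceding them."""
    anomalies = []
    for i in range(window, len(series)):
        history = series[i - window:i]  # the trailing window, excluding point i
        mu = mean(history)
        sigma = pstdev(history)
        if sigma == 0:
            # A flat window gives no scale to measure against; skip this point.
            continue
        if abs(series[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies

# Hypothetical KPI with a gentle upward trend and one spike at position 9.
kpi = [10, 11, 12, 11, 13, 12, 13, 14, 13, 40, 15, 14, 16, 15, 17]
print(moving_window_anomalies(kpi))  # prints [9]
```

Because the statistics are recomputed at every window position, the slow upward trend is not flagged, while the spike stands out against its local neighbourhood.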
How to Perform Outlier Analysis in Lumenore
Step 1: After accessing “Do You Know”, select “Outliers.”
Step 2: Now, click on “Create New Insight Outliers”.
Step 3: Choose a Schema and click “Next.”
Note: Ensure the prerequisites for predictive analysis (a KPI, a Date, and an Attribute) are met.
Step 4: Select the following:
- Select outlier metric: Choose one or two metrics (KPIs) for outlier analysis.
- Select outlier attribute: Identify an attribute for outlier identification (e.g., Product, Region, Customer Segment).
- Select outlier bucket: Choose a bucket of observations (e.g., Quarter, Product Category, Country).
- Select algorithm: Pick the appropriate method (Modified Z-Score, Time-Series Anomaly, or Density Based).
- Select threshold sensitivity: Set the limit that decides when a data point counts as an outlier. A less strict limit flags more outliers.
- Do you want to add filters? – Optionally add filters based on conditions.
Click “Next” after filling in the required details.
Note: If you choose to apply a filter, a window for creating filters appears. Define filters by groups or conditions as needed.
Step 5: Customize the insight narration to define the variables used, then click “Save” to proceed.
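To illustrate the threshold sensitivity setting from Step 4, the sketch below counts how many points a Z-Score rule flags as the cutoff is lowered. How Lumenore maps its sensitivity setting to a numeric cutoff is not documented here, so the cutoff values, function name, and profit figures are purely hypothetical.

```python
from statistics import mean, pstdev

def count_outliers(values, threshold):
    """Count values whose absolute Z-Score exceeds `threshold`."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return 0  # identical values: nothing can be an outlier
    return sum(1 for v in values if abs(v - mu) / sigma > threshold)

# Hypothetical profit figures with one moderate and one extreme value.
profits = [12, 15, 14, 13, 16, 15, 14, 30, 13, 15, 14, 45, 16, 13, 15, 14]

# Lowering the cutoff (a less strict limit) flags more points.
for cutoff in (3.5, 3.0, 1.0):
    print(cutoff, count_outliers(profits, cutoff))
```

With these figures the counts are 0, 1, and 2 respectively: the strictest cutoff flags nothing, the middle one flags only the extreme value, and the loosest one also flags the moderate value.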
Step 6: Name the insight for future access (default suggestion provided).
For clarity, an insight for outlier analysis has been created using profit (outlier metric), State (outlier attribute), and Order Date-Quarterly (outlier bucket) with the Z-Score algorithm. Red dots in the chart represent outliers.