Identifying Outliers
Outliers are data values significantly separated from the main cluster, identifiable visually in dot plots, histograms, and box plots. Identifying them is a statistics skill in enVision Algebra 1 Chapter 11 for Grade 11. In {88, 92, 85, 95, 34, 91}, the value 34 is an outlier because all other scores cluster between 85 and 95. In a week of summer temperatures {28, 29, 31, 30, 27, 29, 15}°C, the 15°C reading stands apart from the cluster of values between 27°C and 31°C. Outliers must be examined in context: they may indicate unusual events, measurement errors, or genuinely interesting cases. They can also heavily influence the mean.
Key Concepts
Outliers are values that are significantly different from the rest of the data in a set. An outlier appears separated from the main cluster of data points and can often be identified visually in data displays like dot plots, histograms, or box plots. Outliers should always be examined in context to determine if they represent errors, unusual but valid observations, or the most important data points in the set.
Common Questions
What makes a value an outlier in a data set?
An outlier is a value that is notably separated from the main cluster of data. It does not follow the general pattern and stands out visually in data displays.
In {88, 92, 85, 95, 34, 91}, which value is the outlier and why?
34 is the outlier. All other values cluster between 85 and 95, but 34 is 51 points below the next lowest value (85), clearly separated from the group.
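The separation described above can be checked numerically by sorting the data and looking at the gaps between neighboring values. The sketch below is only an informal aid that mirrors what the eye does on a dot plot, not a formal outlier test; the function name largest_gap is hypothetical.

```python
def largest_gap(values):
    """Return (gap, lower, upper) for the widest gap between
    neighboring values once the data are sorted.
    Assumes at least two values."""
    data = sorted(values)
    # Pair each value with the next one and record the distance between them.
    gaps = [(b - a, a, b) for a, b in zip(data, data[1:])]
    return max(gaps)

scores = [88, 92, 85, 95, 34, 91]
gap, lower, upper = largest_gap(scores)
print(gap, lower, upper)  # 51 34 85
```

Every other gap in the sorted scores is 3 points or less, so the 51-point gap between 34 and 85 is what makes 34 stand out.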
How can you spot an outlier in a box plot?
Outliers appear as individual points plotted beyond the whiskers. By the usual convention, each whisker extends to the most extreme data value that lies within 1.5 × IQR of the nearer quartile (below Q1 or above Q3), so any point farther out than that is marked as an outlier.
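The 1.5 × IQR rule can be sketched in a few lines. This example assumes the common textbook convention of taking Q1 and Q3 as the medians of the lower and upper halves of the sorted data (leaving out the middle value when the count is odd). Software packages may compute quartiles slightly differently, so their cutoffs can differ. The function name iqr_outliers is hypothetical.

```python
from statistics import median

def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR below Q1 or above Q3.
    Quartiles are medians of the lower and upper halves (an assumed
    convention). Assumes at least two values."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    # Skip the middle value when the number of data points is odd.
    upper = data[half:] if len(data) % 2 == 0 else data[half + 1:]
    q1, q3 = median(lower), median(upper)
    iqr = q3 - q1
    low_cutoff, high_cutoff = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low_cutoff or x > high_cutoff]

print(iqr_outliers([88, 92, 85, 95, 34, 91]))      # [34]
print(iqr_outliers([28, 29, 31, 30, 27, 29, 15]))  # [15]
```

For the scores, Q1 = 85 and Q3 = 92, so IQR = 7 and the cutoffs are 74.5 and 102.5; only 34 falls outside. For the temperatures, Q1 = 27 and Q3 = 30, so the cutoffs are 22.5 and 34.5, and only 15 falls outside.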
Should outliers always be removed from data analysis?
No. First determine whether the outlier is a data error (then correct or remove it) or a legitimate extreme value (then keep it and note it). Removing valid outliers can misrepresent the data.
How do outliers affect the mean vs. the median?
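Outliers pull the mean toward them, often substantially, because every value contributes to the sum. The median is resistant to outliers because it depends only on the middle value (or the two middle values) of the ordered data, not on how extreme the other values are. This is why the median is often preferred for skewed data.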
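As a quick illustration using the test scores from earlier on this page, the sketch below computes the mean and median with and without the outlier 34. It uses Python's standard statistics module. Dropping 34 here is only for comparison and is not a recommendation to remove it.

```python
from statistics import mean, median

scores = [88, 92, 85, 95, 34, 91]
# Same data with the outlier left out, for comparison only.
without_outlier = [s for s in scores if s != 34]

print(round(mean(scores), 1), median(scores))                    # 80.8 89.5
print(round(mean(without_outlier), 1), median(without_outlier))  # 90.2 91
```

The single value 34 moves the mean by more than 9 points, while the median moves by only 1.5.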