Property
In general, a cluster is a set of data points that lie in close proximity to one another, and clustering is the tendency of data to form such groups.
Outliers are data points that notably deviate or “stand out” from the general behavior of the data set.
Once outliers have been identified graphically, the researcher must use the context of the data to justify treating them as outliers.
At the graphical stage, an outlier is simply any data point that “stands apart from the general trend”, regardless of the reason.
Examples
- On a scatter plot of house prices, most houses in a neighborhood cluster together. A single, very expensive mansion would be an outlier.
- In a study of test scores versus homework completion, most students form a cluster. A student with zero homework but a perfect score would be an outlier requiring investigation.
- A plot of animal weights and lifespans shows a cluster for most mammals. A point representing a tortoise, with its long lifespan and moderate weight, would be an outlier compared to the mammals.
Explanation
Clustering occurs when data points group together, showing a common trend. An outlier is a point that sits far away from this main group. It is crucial to investigate an outlier, because it could be a recording error or a genuinely unusual and important observation.
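The idea of a point that "sits far away from the main group" can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the (homework completion %, test score) values are made up for the example, the function name flag_outliers is hypothetical, and the rule used here (flag any point whose distance from the median center is more than k times the typical distance) is one simple choice among many, with k = 3 picked arbitrarily.

```python
from math import hypot
from statistics import median


def flag_outliers(points, k=3.0):
    """Return the points that lie unusually far from the center of the data.

    Assumption for this sketch: "far" means more than k times the median
    distance from the median center. This is not a standard threshold.
    """
    # Center of the main group: the median of each coordinate, which is
    # barely affected by a single extreme point.
    center_x = median(x for x, _ in points)
    center_y = median(y for _, y in points)

    # Straight-line distance of every point from that center.
    distances = [hypot(x - center_x, y - center_y) for x, y in points]

    # The median distance describes how spread out the cluster typically is.
    typical = median(distances)

    return [p for p, d in zip(points, distances) if d > k * typical]


# Made-up data: (homework completion %, test score) for seven students.
students = [(90, 88), (85, 82), (95, 91), (80, 79), (88, 85), (92, 90), (0, 100)]

print(flag_outliers(students))  # prints [(0, 100)]
```

The code only reports which point stands apart. Whether the student with zero homework and a perfect score reflects a data-entry error or a real, unusual case still has to be decided from the context.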