Introduction to Error Bar in Exploratory
An Error Bar chart is useful when you want to know the variability or uncertainty of the reported values. It helps you see visually whether the differences observed among groups (or categories) are likely to be statistically significant.
I have created a short video to demonstrate the Error Bar chart in Exploratory.
If you prefer reading, please read on.
When to use Error Bar?
Here is data about people who died while in custody in Texas over the last 10 years or so.
I quickly created a bar chart to show the average age for each ethnic group: African American, Anglo Saxon, and Hispanic.
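For readers who want to see what the bar chart is computing, here is a minimal sketch of a per-group average in Python. The records and the function name `average_age_by_group` are hypothetical and made up for illustration; they are not the actual Texas data or anything Exploratory exposes.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (ethnicity, age) records, invented for illustration only.
records = [
    ("African American", 40), ("African American", 44),
    ("Anglo Saxon", 50), ("Anglo Saxon", 54),
    ("Hispanic", 35), ("Hispanic", 39),
]

def average_age_by_group(rows):
    """Collect ages per group, then return the mean age of each group."""
    ages = defaultdict(list)
    for group, age in rows:
        ages[group].append(age)
    return {group: mean(values) for group, values in ages.items()}

averages = average_age_by_group(records)
```

Each bar in the chart corresponds to one entry of `averages`: one mean value per category on the X-Axis.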
Just by looking at this, we can see that the average age of the Anglo Saxon group is higher than the others and that the Hispanic group is the youngest.
If we change the category (X-Axis) to Sex, we can see that the average age for Male is higher than for Female.
So the question here is, can we be confident about these differences? In other words, are these differences ‘statistically significant’?
This is where you want to switch to the Error Bar chart type.
The main bars (or boxes) still show the average age for each ethnic group, but now there are vertical lines at the top of the boxes. These are called Error Bars, and they show the ‘uncertainty’ or the ‘errors’. In Exploratory, the ranges for the error bars are calculated as either Standard Error of Mean or 95% Confidence Interval (Margin of Error). Here’s how each type is calculated, using Age as an example.
Standard Error of Mean (SEM) for Age
SEM = sd(age) / sqrt(number_of_the_people)
95% Confidence Interval (ME - Margin of Error) for Age
ME = qt(0.95 / 2 + 0.5, n - 1) * SEM
Here n is the number of people in the group, and 0.95 / 2 + 0.5 = 0.975 is the quantile that leaves 2.5% in each tail for a 95% interval.
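The two calculations above can be sketched in Python as follows. This is only an illustration, not Exploratory's implementation: the function names and the sample ages are hypothetical, and because Python's standard library has no Student's t quantile, the sketch substitutes the normal quantile (about 1.96 for 95%) for qt. For small groups a t-based margin would be somewhat wider.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def standard_error(values):
    """SEM = sample standard deviation / sqrt(number of values)."""
    return stdev(values) / sqrt(len(values))

def margin_of_error(values, confidence=0.95):
    """Margin of error using a normal quantile as a stand-in for qt()."""
    # For confidence = 0.95 this is the 0.975 quantile, about 1.96.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * standard_error(values)

ages = [30, 40, 50, 60, 70]  # hypothetical ages, for illustration only
sem = standard_error(ages)
me = margin_of_error(ages)
# The error bar spans the mean minus and plus the chosen range.
interval = (mean(ages) - me, mean(ages) + me)
```

The Confidence Interval range is always wider than the Standard Error range for the same data, since it is the SEM multiplied by a factor greater than 1.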
If we simply want to compare the error bars themselves, we can switch the Graph type to Marker.
This helps us to see a clear difference in the average ages among them.
We can also switch the range to Confidence Interval under Range Type.
Now we can see that the bars for African American and Hispanic are slightly overlapping, but the bar for Anglo Saxon is still in a completely different range.
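The visual check described here, whether two error bars share any part of their range, can be written down as a small helper. This is a sketch with a hypothetical function name and made-up interval values chosen only to mirror the pattern in the chart; they are not the actual numbers from the data.

```python
def intervals_overlap(a, b):
    """Return True when intervals a=(low, high) and b=(low, high) share any point."""
    return a[0] <= b[1] and b[0] <= a[1]

# Hypothetical confidence intervals for average age, for illustration only.
african_american = (40.1, 42.0)
hispanic = (38.5, 40.4)
anglo_saxon = (48.0, 50.2)

slight_overlap = intervals_overlap(african_american, hispanic)
clearly_apart = not intervals_overlap(african_american, anglo_saxon)
```

Treat this as a visual rule of thumb rather than a formal test: non-overlapping intervals suggest a real difference, while overlapping ones only mean the chart alone cannot settle the question.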
Next, we can assign Sex to Color.
We can see a clear difference between Male and Female within each ethnic group.
If we zero in on Female by disabling Male, we can see that the average age for African American is higher than the others.
Looking at the error bars, though, it’s hard to say the difference between African American and Anglo Saxon females is statistically significant, because their error bars overlap widely.
One last thing.
I have filtered the data to keep only these three ethnic groups. What if we compare all the ethnic groups? We can go back to the step before the filtering by clicking on that step on the right-hand side.
Now we can see the wide range of the error bars for groups like American Indian, Asian, Middle East, etc.
If we look at the Ethnicity column in Summary view, we can see that these groups have far fewer rows than African American, Anglo Saxon, and Hispanic.
When a group has only a small number of observations, the error range becomes wider, because the standard error is the standard deviation divided by the square root of the group size. Hence, it’s harder to conclude statistical significance for these groups.
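The effect of group size follows directly from the SEM formula given earlier. The sketch below holds the standard deviation fixed at a hypothetical value of 15 and varies only the number of people in the group; the function name and all numbers are assumptions for illustration.

```python
from math import sqrt

def sem_for_size(sd, n):
    """Standard error of the mean for a group of n people with standard deviation sd."""
    return sd / sqrt(n)

# Hypothetical standard deviation of age, identical for every group.
sd_age = 15.0

# Same spread in the data, different group sizes.
sem_by_size = {n: sem_for_size(sd_age, n) for n in (9, 100, 2500)}
```

With the same spread, a group of 9 people has an SEM of 5 years, while a group of 2,500 has an SEM of 0.3 years, which is why the small groups show such long error bars.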