Box Plot

Box Plots, also called Boxplots or Box and Whisker Plots, are diagrams that display a graphical summary of a data set: its median, its quartiles, its spread, and any outliers. They were first introduced in 1969 by John Tukey.

Box plots are non-parametric, in that you don’t need to make any assumptions or have any knowledge of the underlying statistical distribution of the data.

A box plot can be confusing to users who see one for the first time, but once understood, it is a popular graph because it is simple and conveys a lot of information in a small space. Box plots are useful for comparing distributions between several groups or sets of data, as we will show in the example near the bottom of the page. They can be drawn either horizontally or vertically.

To understand the box and the lines (whiskers), let’s start with the top section of the box, and use the following data set with 25 results (sorted from smallest to largest, which is important for creating the box plot).

  1. 126
  2. 132
  3. 138
  4. 140
  5. 141
  6. 141
  7. 142
  8. 143
  9. 144
  10. 144
  11. 144
  12. 145
  13. 146
  14. 147
  15. 148
  16. 148
  17. 149
  18. 149
  19. 150
  20. 150
  21. 150
  22. 154
  23. 155
  24. 158
  25. 161

The top of the box represents the 75th percentile (the 3rd quartile). This is the median of the upper half of the data, meaning the points above the middle point (#13), which are #14 through #25. With 12 points in that half, the median falls halfway between #19 and #20, which are 150 and 150. Halfway between those two numbers is 150.

The line across the middle of the box is the median, which is the 50th percentile. With 25 data points, the middle one is data point #13, halfway between #1 and #25, which lands directly on 146.

The bottom of the box represents the 25th percentile (the 1st quartile). This is the median of the lower half of the data, points #1 through #12, which falls halfway between #6 and #7. The average of 141 and 142 is 141.5.

To determine the whiskers, we will calculate the Interquartile Range (IQR), and multiply it by 1.5. The IQR is the length of the blue box, which is the difference between the 75th and 25th percentiles (150 and 141.5), which would be 8.5. Multiply 8.5 x 1.5, and you get 12.75.
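The arithmetic above can be checked with a few lines of Python. This is a minimal sketch, not a definitive implementation: the helper name `quartiles` is made up for illustration, and it assumes the convention used in the walkthrough, where the middle point is left out of both halves when the number of points is odd. Other quartile conventions exist and can give slightly different values.

```python
from statistics import median

# The 25 sorted results from the data set above.
data = [126, 132, 138, 140, 141, 141, 142, 143, 144, 144, 144, 145, 146,
        147, 148, 148, 149, 149, 150, 150, 150, 154, 155, 158, 161]


def quartiles(sorted_values):
    """Return (Q1, median, Q3) for an already sorted list.

    Assumption: with an odd number of points, the middle point is
    excluded from both the lower and the upper half.
    """
    n = len(sorted_values)
    lower_half = sorted_values[: n // 2]        # points #1..#12 when n = 25
    upper_half = sorted_values[(n + 1) // 2:]   # points #14..#25 when n = 25
    return median(lower_half), median(sorted_values), median(upper_half)


q1, med, q3 = quartiles(data)
iqr = q3 - q1               # length of the box
whisker_reach = 1.5 * iqr   # distance used to place the whisker limits

print(q1, med, q3)          # 141.5 146 150.0
print(iqr, whisker_reach)   # 8.5 12.75
```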

To calculate the upper whisker limit, add 12.75 to the 75th percentile (the top of the box).

150 + 12.75 = 162.75

To calculate the lower whisker limit, subtract 12.75 from the 25th percentile (the bottom of the box).

141.5 – 12.75 = 128.75

Each whisker is drawn to the most extreme data point that is still inside these limits (132 and 161 for this data set). If there are any data points outside the limits (below 128.75 or above 162.75), then mark those as asterisks (*). Here, only 126 falls outside.

According to Wikipedia, there are multiple approaches to calculate or determine the ends of the whiskers. We showed you the 2nd option in the list below, which is consistent with the single asterisk (outlier) present in the first graph on this page.

  • the minimum and maximum of all of the data
  • the lowest datum still within 1.5 IQR of the lower quartile, and the highest datum still within 1.5 IQR of the upper quartile
  • one standard deviation above and below the mean of the data
  • the 9th percentile and the 91st percentile
  • the 2nd percentile and the 98th percentile
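The second option in the list can be sketched in Python as follows. This is an illustration under stated assumptions rather than a reference implementation: the function name `tukey_whiskers` and the parameter `k` are hypothetical, the 1.5 IQR distance is measured from the lower and upper quartiles as that option describes, and the quartiles are taken as the medians of the lower and upper halves with the middle point excluded.

```python
from statistics import median

# The 25 sorted results from the data set above.
data = [126, 132, 138, 140, 141, 141, 142, 143, 144, 144, 144, 145, 146,
        147, 148, 148, 149, 149, 150, 150, 150, 154, 155, 158, 161]


def tukey_whiskers(sorted_values, k=1.5):
    """Whisker ends and outliers using k * IQR measured from the quartiles.

    Assumption: quartiles are the medians of the lower and upper halves,
    with the middle point excluded when the count is odd.
    """
    n = len(sorted_values)
    q1 = median(sorted_values[: n // 2])
    q3 = median(sorted_values[(n + 1) // 2:])
    reach = k * (q3 - q1)
    low_limit, high_limit = q1 - reach, q3 + reach

    inside = [x for x in sorted_values if low_limit <= x <= high_limit]
    outliers = [x for x in sorted_values if x < low_limit or x > high_limit]
    return {
        "low_limit": low_limit,
        "high_limit": high_limit,
        "whisker_low": min(inside),    # lowest datum still within the limit
        "whisker_high": max(inside),   # highest datum still within the limit
        "outliers": outliers,          # points to mark with asterisks
    }


result = tukey_whiskers(data)
print(result["low_limit"], result["high_limit"])      # 128.75 162.75
print(result["whisker_low"], result["whisker_high"])  # 132 161
print(result["outliers"])                             # [126]
```

With these assumptions, 126 is the only point flagged as an outlier for this data set.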

Let’s take a look at an example, where box plots can quickly convey information. If you wanted to compare the number of hours worked each day, you could create a box plot for each worker on the same graph.

For the three workers, you can quickly conclude the following:

  1. Sara works fewer hours than Seth and Mary, because her blue box is lower than the other two. Her median line is lower, and the entire blue box is lower (which is the middle 50% of her data).
  2. Seth is very consistent around 8 hours a day, since his blue box has the smallest height (more data fits into a smaller range of values, so he has less variation above or below 8 hours).
  3. Seth has more outliers in his data, represented by the asterisks. This is because he is more consistent around 8 hours, so anytime he varies more than an hour above or below 8 hours, it is flagged as an outlier (unusual), given his past history of consistency. Mary also has readings in the 10 or 11 hours a day range, but since her hours vary more around 8 hours, they are not deemed to be outliers as often.

There are other interpretations to make, but these stand out at a quick glance.

Using the recommendations in this reference guide, here is a review of the steps to create a box plot from a set of data (using the 1.5 IQR method for the whiskers).

  1. Determine Median (50th percentile) = 146
  2. Determine 1st quartile (25th percentile) = 141.5
  3. Determine 3rd quartile (75th percentile) = 150
  4. Calculate Interquartile Range (IQR) by taking Q3 – Q1 = 150 – 141.5 = 8.5
  5. Calculate outlier range “whiskers” as (1.5 * (Q3-Q1)) = 12.75 below Q1 and above Q3 (128.75 to 162.75)
  6. Draw the box from Q1 to Q3, with a line through the median
  7. Add asterisks if data outside outlier range
