
Hi there.

My name's Ms. Lambell.

You've made such a fantastic choice deciding to join me today to do some maths.

Come on, let's get going.

Welcome to today's lesson.

The title of today's lesson is comparing with box plots, and that's within the unit graphical representations of data and that's cumulative frequency and histograms. By the end of this lesson, you'll be able to interpret a box plot and compare two box plots.

Keywords that we'll be using in today's lesson, always worth a quick recap I think, are box plot, median and range.

Remember, this is a diagram that clearly shows the minimum and maximum value of a set of data along with the three quartiles, the upper and lower quartile and the median.

The median is the central or middle piece of data when the data are in numerical order.

It is a measure of central tendency and it represents the average of the values.

The range is a measure of spread.

It's found by finding the difference between the highest and the lowest values.

Now all of those terms should be familiar to you by now but, like I said, it's always worth a quick recap.

Today's lesson is split into two separate learning cycles.

In the first one, we will concentrate on interpreting a box plot, what's that box plot telling us about that information? And in the second one, we will focus on using it to compare.

That's the most powerful thing about a box plot is that you can compare two sets of data very, very quickly.

Let's get going with the first one.

Like I said, we will be interpreting a box plot.

I'd like you please to have a go at completing the following using the words at the bottom of the page.

And your words are, A, upper quartile.

B, largest.

C, 25%.

D, median.

And E, difference.

Now, you should be familiar with all of this.

I know it's a bit mean to chuck you straight in at the deep end and ask you to do a check of your understanding from previous lessons, but I know that you'll be able to do it.

Don't worry if you can't remember 'cause, obviously, in a moment, we'll go through the answers, but pause the video, give it your best shot, and then come back when you are done.

Well done, thank you for doing that.

The range is the difference, E.

So E should be there.

The range is the difference between the smallest and the, and it should be B in there, largest values.

Anything below the something represents 50% of the data or 50% of the data is halfway through.

And we know that halfway through the data is the median, so should have been D.

Anything below the median represents 50% of the data.

Anything below the lower quartile represents what percentage of the data?

And we know that a quartile splits the data into four equal parts and the lower quartile represents 25%.

The missing percentage there was C.

Anything below the lower quartile represents 25% of the data.

And then obviously we know what's going to go in that final one.

But let's just make sure we understand anything below the something represents 75% of the data.

We know it's, A, upper quartile 'cause that's the only one left.

But remember that the upper quartile is three quarters of the way through the data, 75% of the way through the data.

Now that's going to be really important for what we're doing today.

So I'm just gonna read through each of those, making sure I read the words rather than the letters.

The range is the difference between the smallest and the largest values.

Anything below the median represents 50% of the data.

Anything below the lower quartile represents 25% of the data.

Anything below the upper quartile represents 75% of the data.

We are going to draw a box plot representing the temperature in 11 cities and we're going to use the following information.

So instead of me giving you the raw data of those 11 cities and asking you to find the upper and lower quartiles, the median, et cetera, I'm going to give you the information in a slightly different way.

But you are going to be able to draw a box plot just the same.

The lowest temperature was 11 degrees C.

The median temperature was 21 degrees C.

75% of the temperatures were less than or equal to 30 degrees C.

The range of temperatures was 20 degrees C.

The middle 50% of the temperatures had a range of 16 degrees C.

We are now going to use each of those statements to draw part of our box plot.

Let's start off nice and easy then.

The lowest temperature is 11.

So we know the lowest value in our box plot is 11.

So let's mark that in.

The next statement we can see that the median temperature was 21 degrees, therefore we're going to draw a median line in at 21.

Remember the height of that line doesn't matter.

Next one's a little bit trickier.

75% of the temperatures were less than or equal to 30 degrees C.

Well, we know that 75% of the data is the upper quartile, so we now know that the upper quartile needs to go at 30.

So let's draw that in.

The range of the temperatures was 20.

We know that the lowest temperature was 11 and we know that the range is 20.

So we know the difference between the lowest and the highest has to be 20, meaning that highest temperature must have been 31.

Let's just check 31, subtract 11 does give me a range of 20.

So there is our highest value.

And then the middle 50% of the temperatures had a range of 16.

Remember, our box represents the middle 50% of the data.

We already have our upper quartile at 30 and we have our median at 21.

So we now need to identify where the lower quartile is going to be.

The difference between the upper and the lower quartile is 16.

So just as I did with the range, I now know that my lower quartile is at 14.

I can then complete my box plot.
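If you'd like to check that working, here is a minimal sketch in Python that rebuilds the five box plot values from the clues about the 11 cities. The function name and its parameter names are just made up for this illustration; the only facts it uses are that the range is the highest take away the lowest, and that the width of the box is the upper quartile take away the lower quartile.

```python
def five_number_summary(lowest, median, upper_quartile, full_range, middle_range):
    """Rebuild the five box plot values from the clues in the example.

    Hypothetical helper for illustration only.
    """
    # range = highest - lowest, so highest = lowest + range
    highest = lowest + full_range
    # width of the box = upper quartile - lower quartile
    lower_quartile = upper_quartile - middle_range
    return {
        "lowest": lowest,
        "lower_quartile": lower_quartile,
        "median": median,
        "upper_quartile": upper_quartile,
        "highest": highest,
    }


# Clues from the temperature example, all in degrees C
temperatures = five_number_summary(
    lowest=11, median=21, upper_quartile=30, full_range=20, middle_range=16
)
print(temperatures)
```

The highest value comes out as 31 and the lower quartile as 14, matching the box plot we have just drawn.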

And there's quite a lot to go through there.

Just make sure that you are happy with that.

If not, rewind the video and then move on when you are happier.

We'll take a look at another example though, so don't worry.

This is the data representing heights of members of a running club.

The tallest member was 1.89 metres.

50% of the heights are below 1.65 metres.

25% of the heights were greater than 1.71 metres.

The range of heights was 68 centimetres and the middle 50% of the heights had a range of 0.25 metres.

Again, we'll go through this step by step.

The tallest member was 1.89 metres.

Now, before placing that onto our graph, or our box plot I should say, it's really important we've interpreted the scale, and we can see that every five squares is representing 0.1.

That means that each square is representing 0.02.

So 1.89 is here.

50% of the heights are below 1.65.

That means that the median is 1.65.

Moving on, 50% of the heights are below 1.65 metres.

We know that the 50% of the heights is going to be the median, so we can draw our median line in.

25% of the heights were greater than 1.71 metres.

So 25% of the heights were greater.

That means 75% were less than.

This means then that the upper quartile is 1.71 metres.

Let's draw that line in.

We know that the range of the heights was 68 centimetres.

Now we need to take care here because we've got the height of the tallest member in metres and this is in centimetres.

So I am going to change 1.89 metres into centimetres, which is 189 centimetres.

And then I'm going to work out where my lowest value is going to be by subtracting the 68.

So if I take my 1.89 metres, or 189 centimetres, and subtract 68 centimetres, I end up with 121 centimetres, so my lowest height is 1.21 metres.

The middle 50% of the heights had a range of 0.25.

We know that the upper quartile was at 1.71.

We need the range to be 0.25.

That means my lower quartile is going to be at 1.46.

And then we complete the box plot.
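The thing to take care with there was the mix of metres and centimetres. Here is a small sketch in Python of the same working, assuming we convert everything to whole centimetres first so that the subtraction is in one unit. The helper name `metres_to_cm` is only for illustration.

```python
def metres_to_cm(metres):
    """Convert metres to whole centimetres (hypothetical helper)."""
    # round() guards against tiny floating point errors such as 188.99999...
    return round(metres * 100)


tallest_cm = metres_to_cm(1.89)          # tallest member
upper_quartile_cm = metres_to_cm(1.71)   # 25% of heights are greater than this
range_cm = 68                            # already given in centimetres
middle_range_cm = metres_to_cm(0.25)     # range of the middle 50%

# lowest = highest - range
shortest_cm = tallest_cm - range_cm
# lower quartile = upper quartile - width of the box
lower_quartile_cm = upper_quartile_cm - middle_range_cm

print(shortest_cm, "cm is the shortest height")
print(lower_quartile_cm, "cm is the lower quartile")
```

Dividing each result by 100 takes us back to metres, which is the unit on the box plot scale.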

Have a go now for me please at this check for understanding.

Below is a box plot showing the test scores of a class.

I'd like you please to fill in the gaps.

The lowest score was something marks.

The something score was 38 marks.

75% of the data is greater than so many marks.

What percentage of the data is 24 marks or less? The range of the middle 50% of data is how many marks, and the upper quartile is how many marks?

Pause the video, get your answers to these and then come and check in with me when you are done.

Great, let's check through those answers then.

The lowest score was six.

We can see that the extreme point on the left of the box plot is six.

The highest score was 38.

So we find 38 on our box plot scale and we can see that clearly the highest is the other extreme.

75% of the data is greater than how many marks.

75% of the data is greater.

That means 25% of the data is less than, and that's the lower quartile.

The lower quartile is at 14.

What percentage of the data is 24 marks or less? So we find 24 on the marks and we can see that that is the line within the box and that therefore is the median.

And then we know that the median represents 50% of the data.

The range of the middle 50% of the data is how many marks.

That's basically the width of the box.

And the width of the box is 14.

It goes from 14 to 28.

So that is a difference of 14.

The upper quartile is how many marks, and we know that the right hand edge of the box gives us the upper quartile and we can read that off as 28.
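To keep those five readings together, here is a minimal sketch in Python that stores the values we have just read off the test scores box plot and works out the two measures of spread. The class name `BoxPlot` and its method names are assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass
class BoxPlot:
    """The five values read off a box plot (hypothetical helper class)."""
    lowest: float
    lower_quartile: float
    median: float
    upper_quartile: float
    highest: float

    def full_range(self):
        # difference between the two extreme points
        return self.highest - self.lowest

    def middle_range(self):
        # width of the box: range of the middle 50% of the data
        return self.upper_quartile - self.lower_quartile


# Values read from the test scores box plot in the check
test_scores = BoxPlot(lowest=6, lower_quartile=14, median=24,
                      upper_quartile=28, highest=38)

print("Range of the middle 50%:", test_scores.middle_range())
print("Range of all the scores:", test_scores.full_range())
```

The width of the box comes out as 14, the same answer we read from 14 to 28 on the scale.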

How did you get on? Well done, fantastic job on that.

Let's now move on.

You are now going to have a go at Task A, so very similar to what we've just been doing.

So you're gonna pause the video and then you are going to find the missing values or words and then you'll come back when you are done.

Well done.

And B, again, I'd like you to fill in the gaps please using the box plot about heights.

Super, and then question two.

I'd like you to draw a box plot using the information about the 11 cities and their temperatures.

Great work.

And B, let's finish off Task A with question 2b.

Again, pause the video and come back when you are done.

Well done.

Let's check those answers.

So we should have, the lowest score was four.

That was the extreme point to the left.

The highest score was 34.

We could see that the extreme point to the right was at 34.

So that's the highest.

75% of the data is greater than 16.

So that is the lower quartile.

50% of the data was below 22 marks, 'cause we can see that that is the line within the box, which is the median.

And we know that that's 50% of the data.

The range of the middle 50% of the data, basically the width of the box, is 10, and the upper quartile was 26.

We can just read that off.

We know that that is the right hand side of our box.

And onto 1b, the shortest person was 1.27 metres.

The tallest person was 181 centimetres.

I hope you noticed there that I'd asked for that in centimetres and you'd done that conversion.

If you put 1.81, you just needed to check carefully what units we wanted the answer in.

25% of the data is greater than 1.66 metres.

50% of the data is greater than 1.52 metres.

The range of the middle 50% of data is 0.24 metres and the lower quartile was 142 centimetres.

Now because that was the second question, I did put in some questions there where I was switching between centimetres and metres.

So I'm hoping that didn't trip you up, but I'm sure it didn't.

Onto question two then.

This is what your box plot should look like.

You should have your lowest value at 13, lower quartile at 16, median at 25, upper quartile at 28, and your highest value at 34.

And then B, your lowest value should have been at 1.23 metres.

The lower quartile should have been at 1.47 metres, the median at 1.59 metres, then the upper quartile at 1.68 metres, and then the tallest person was 1.82 metres.

How did you get on? Well done.

Now we can move on then to our second learning cycle.

So we're gonna look now at how we can use our box plots to compare sets of data.

Box plots are an excellent way of comparing different sets of data because at a glance we are able to see which of the sets of data has, for example, the smallest value, the highest value, the highest median, the smallest range, the smallest range of the middle 50% of data.

And remember that's signified by the width of the box.

Below are the box plots of test scores for two different classes.

What can you tell from these box plots? Pause the video and write down as many things as you can that you can tell me about these two different classes using those box plots.

Pause the video and come back when you've written down lots of wonderful things for me to see.

Well done.

I'm gonna go through some of these now and then what I'd like you to do is to check them off your list and then I'll also ask you questions at various points.

Class B did better on average.

How do we know that Class B did better on average? The reason we know that Class B did better on average was because their median was higher.

If we look at the line within the box, that's the median, and Class B, we can see it was higher.

For Class B, the median was 24.

And Class A, the median was 22.

So on average, Class B did better.

The middle 50% of the scores for Class A were much more consistent.

I'm wondering if you wrote that down.

How do we know that? And we know that because the box is narrower for Class A.

Class A, we can see the box is narrower, meaning that's showing more consistency for that middle 50% of the data.

The range of the middle 50% of data for A was 10 and for B it was 14.

So we can see that A was more consistent.

Hopefully you also notice that Class B had the highest score outta the two classes.

But how do we know that? And we know that because the right hand mark for Class B is further to the right.

We can see that, for A, the highest score was 34, but for B it was 38.

But we might not need to know exactly what those scores were.

We can just at a glance recognise that Class B had the highest score outta the two classes.

Sometimes it's not obvious from the box plot which set of data has the largest range.

And it certainly looks here very difficult to tell which has the largest range.

So I'd like you please to write down for me what is the range of the scores for Class A? Remember the range is the largest take away the smallest, so it's the largest score for the class take away the smallest score for the class.

The range then for Class A is 30, the highest was 34, the lowest was four.

That gives a range of 30.

What was the range of scores for Class B? What did you come up with there? Should have got 32.

The highest was 38 and the lowest was six.

Difference between 38 and six is 32.

So the range for Class A was 30 and the range for Class B was 32.

What can we deduce from this? We can deduce the scores for Class A were slightly more consistent than Class B.

It's really important that rather than just stating the two different ranges, we actually talk about what that means.

And the range shows us how consistent the data is.
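Here is a minimal sketch in Python of the comparisons we have just made between Class A and Class B, using the values read from the two box plots. The function name `compare_box_plots` and the wording of the sentences it builds are assumptions for illustration; the point is that each comparison names the class, says what it means for the scores, and gives the reason.

```python
def compare_box_plots(name_a, a, name_b, b):
    """Compare two box plots given as (lowest, LQ, median, UQ, highest).

    Hypothetical helper: returns a list of comparison sentences.
    """
    low_a, lq_a, med_a, uq_a, high_a = a
    low_b, lq_b, med_b, uq_b, high_b = b
    statements = []

    # Average: compare the medians (the line inside each box)
    if med_a != med_b:
        better = name_a if med_a > med_b else name_b
        statements.append(
            f"{better} did better on average as their median was higher.")

    # Spread of all the data: compare the ranges
    range_a, range_b = high_a - low_a, high_b - low_b
    if range_a != range_b:
        steadier = name_a if range_a < range_b else name_b
        statements.append(
            f"{steadier} had more consistent scores as their range was smaller.")

    # Spread of the middle 50%: compare the widths of the boxes
    box_a, box_b = uq_a - lq_a, uq_b - lq_b
    if box_a != box_b:
        steadier = name_a if box_a < box_b else name_b
        statements.append(
            f"The middle 50% of scores for {steadier} were more consistent "
            "as their box was narrower.")

    return statements


class_a = (4, 16, 22, 26, 34)
class_b = (6, 14, 24, 28, 38)
statements = compare_box_plots("Class A", class_a, "Class B", class_b)
for sentence in statements:
    print(sentence)
```

Notice the sketch never just prints the two medians or the two ranges on their own; it always says which class and why.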

Have a go at this question.

You can complete the missing information using the box plots.

And these box plots are about house prices in London represented by the purple box plot and the rest of England and Wales.

And that is represented by the green box plot.

Pause the video and then have a go.

Pause the video now and I'll be waiting when you get back.

Great work.

Let's check those answers.

Now we've got a lot of missing words in this sentence.

So what's important? The important words here are average.

And we know that in a box plot it is the median that shows the average, and the median is represented by the line within the box, and higher.

So we wanted to know which one had a higher median and that was London.

So houses are more expensive in London on average as their median was higher.

So we don't just say London's median was higher.

We actually state what this is telling us.

And this data is about house prices.

So our answer must refer to house prices.

Something had the cheapest house overall.

So if we're looking for the cheapest house overall, we're looking for the point that is furthest to the left.

And we can clearly see that's the rest of England and Wales.

Something house prices were less consistent.

The range is our measure of consistency.

Which box plot was the widest? And it's really obvious from this box plot to see that London, London had a very wide range of house prices, which we probably would expect wouldn't we? So London's prices were less consistent and that's because they had a larger range.

Often, you'll be given a box plot and some information and asked to draw a box plot and then make some comparisons.

Now it's really difficult to make comparisons currently.

So what we're going to do, as the question's asked us to do anyway, is to draw a box plot.

Here is my box plot for Class A.

And you're more than familiar with drawing box plots.

So you know how I've drawn that.

When we compare data, we must comment on two things.

The first of those is the average, and that's represented on a box plot by the median.

And the median is the line within the box.

We should also compare using the range, so the range of the data, the difference between the highest and the lowest.

And remember that's represented by the width of the entire box plot.

Looking at these two box plots, we are able to say that on average Class A did better as their median score is higher.

So notice I don't just state the medians, I don't also just say Class A's median is higher.

I have to make sure that I'm interpreting the actual data.

And the data here is about test scores.

This means that Class A did better as their median score was higher.

So I've said who was better and I've given a reason why I know that.

The scores of Class B are less consistent than Class A as their range is larger.

We can see that the box plot for Class B is spread over a wider range than Class A.

Again, it's really easy to see this once we've drawn a box plot.

The middle 50% of data of Class A is more consistent than Class B as the box is narrower, showing more consistency in their scores than B's.

So we could also consider the range of the middle 50% of data rather than the range of the whole of the dataset.

Wow, onto Task B already.

I'd like you please to pause the video and fill in the missing gaps.

Good luck, and then come back when you're done.

Well done.

And question number two.

You're gonna compare the data represented on these pairs of box plots.

And this is the time that pupils could hold their breath for.

Really important you are specific about what the data is telling you and how you know that.

What part of the box plot did you use? Pause the video and then come back and we'll check those answers for you.

Well done.

And part b, the masses of apples from two different shops.

So again, you are comparing the two sets of data and I'm expecting two comparisons, remember.

Well done.

And three, this time I'd like you please to draw the box plot and then make a comparison between Class A and Class B.

Pause the video and then come back when you are done.

Well done on those.

Let's check those answers for you.

One, the average maximum daily temperature was higher or you may have put hotter in the 2000s on average, as the median was higher.

The range of the middle 50% of temperatures has not changed as the boxes are the same width.

The 1960s had the lowest overall average maximum daily temperature.

And overall we can see that the average maximum daily temperatures have risen or you may have chosen to write increased.

Question 2a, you should have something along these lines.

On average, the pupils in Class B could hold their breath for longer as their median was higher.

Class B were much more consistent in their times as the box, the middle 50%, is narrower.

So you should have referred to the range and the median.

B, on average, the apples from Shop B were heavier as their median was higher.

And the apples from Shop A had more consistent masses as their range is smaller.

And then on to question three, you needed to draw the box plot.

So pause the video, check your box plot, and then these are your two comparisons.

On average, Class A did better as their median was higher.

The middle 50% of data of Class B was more consistent than Class A as the box is narrower, showing more consistency in their scores than A's.

Now let's summarise our learning from today's lesson.

We know that box plots are an excellent way of showing data, but they're even more powerful when we're comparing sets of data.

It allows us very quickly at a glance to see which set of data has the smallest value or the highest value, or the highest median, the largest range, or maybe the smallest range of the middle 50% of data.

So it's a really great tool for allowing us to make comparisons very, very quickly.

Remember, when we're making those comparisons, to refer to the median and also the context of the question, and to the middle 50% of data or the range and how that is affecting the consistency.

And again, referring this back to the data.

Thank you so much for joining me and I hope to see you again really soon.

Take care of yourself and goodbye.
