
Hello everyone, my name's Mr. Gratton and thank you so much for joining me for another maths lesson.

In today's lesson, we will be comparing three summaries of a dataset, the mean, the median, and the mode, and building an understanding of the different insights each one gives about a dataset.

Pause here to familiarise yourself with the definitions of data, the mean, the median, the mode, and central tendency.

Before we look more into the insight that each summary gives us, let's practise calculating the mean, median, and mode together.

As I've already introduced, all three of these averages, the mean, the median, and the mode, are methods of summarising a dataset, usually, but not always, using a single value.

The mean is the sum of all of the data points in a dataset divided by the number of data points.

Whereas the median is the middle value after you've put the dataset into order of size.

If there are two such middle values, add them together and then divide their sum by two.

And lastly, the mode is the most frequently occurring data point in a dataset.

Whilst it is not necessary to put the dataset in order of size to find the mode, it is helpful, since all of the data points with the same value are grouped together, and so, easy to spot their frequencies.
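As a rough sketch of the three definitions above, here is how they might be computed in Python with the standard library's `statistics` module (`multimode` needs Python 3.8 or later). The function name `summarise` and the small dataset used with it are hypothetical, chosen only for illustration.

```python
from statistics import mean, median, multimode


def summarise(data):
    """Return the mean, median and mode(s) of a list of numbers."""
    ordered = sorted(data)  # ordering helps with the median and with spotting modes
    return {
        "mean": mean(ordered),        # sum of the values divided by how many there are
        "median": median(ordered),    # middle value; halves the sum of the two middle values if n is even
        "modes": multimode(ordered),  # every value tied for the highest frequency
    }
```

For example, `summarise([10, 5, 3, 7, 5])` gives a mean of 6, a median of 5 and a single mode of 5. Returning the modes as a list leaves room for datasets with more than one mode.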

Calculating one of the averages of a dataset gives us such a different insight compared to looking at the raw data.

Actually, it's very hard to find any insight by just looking at a bunch of numbers.

It gives us a far more direct insight on a dataset than trying to interpret a graph or a chart, such as this dot plot.

Calculating an average is a direct, straight to the point way, of summarising a dataset by one or a few values.

For example, the mean of 24.45 is visually around here, but what does it mean? Well, a mean is a typical value.

This means that if we were to take all of the data points, add their values together, and then share the total equally, each data point would have a value of 24.45.

Usually a mean is either central to a dataset, or close to values of a dataset that are highly populated or highly frequent.

Okay, let's have a look at an example.

The Oakfield Academy basketball team played six matches, and these were their scores per match.

In order to calculate some of the averages, specifically the median and the mode, we can put these in order of size.

So these are the exact same scores, just now smallest to largest.

To calculate the mean, we can add together all six scores, and then divide by six because there are six data points, in this case, six matches.

The mean is 13.

This means that the mean average, or typical amount of points they scored each match is 13.

In order to find the median, we first of all have to put the numbers in order of size.

Thankfully, we've already done that.

Next up, we look to the middlemost number in the dataset.

In this case, there are two middlemost numbers.

In order to calculate the median, I add both of those numbers together, so 12 and 14, the answer to that is 26.

I then halve that total to get 13, the median.

Using the median, we can also say that the basketball team scored on average 13 points each match.
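The "add the two middle values, then halve" rule can also be written out by hand. This is a minimal sketch, not taken from the lesson; `median_by_hand` is a hypothetical name, and the lists it is tried on are made-up values rather than the basketball scores.

```python
def median_by_hand(data):
    """Median found by ordering the data and locating the middle."""
    ordered = sorted(data)  # step 1: put the data in order of size
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd count: a single middle value
    # even count: add the two middle values together, then halve the total
    return (ordered[mid - 1] + ordered[mid]) / 2
```

With an even number of values, such as `[2, 8, 4, 6]`, the two middle values 4 and 6 are averaged to give 5; with an odd number, such as `[7, 1, 3]`, the single middle value 3 is returned.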

So, does that mean that every average will give me an answer of 13? Whose statement is correct? Andeep says, "To summarise a dataset, I only ever need to calculate one of the averages, as they'll always give me the same answer anyway." Whereas Sam says, "Sometimes the mean will give me a different answer to the mode or the median. Each average gives a different summary of the data." Let's see who's correct by calculating the one average we haven't considered yet, the mode.

Here are the data points again, the basketball scores.

What is the mode of the dataset? Well, mode means the most frequent value, which is 14.

Hang on a second: the mean and the median were both 13, so the mode is different in value from the mean and the median.

The mean, the median, and the mode can be the same, but they do not need to be the same.

Quite frequently they will all be different from each other, even if only slightly.

Okay, onto a quick check for understanding.

Here is a dataset.

Two of the averages are the same, whilst one of the averages is different from the other two.

By calculating all three of the averages, find out which one of the three is the odd one out.

Pause now to give yourself some time to calculate all three of the averages.

Remember, it is best practice to put a dataset in order if you can.

This will help with finding the median and the mode.

The answer is the mode, the mode is the odd one out.

This is because the mean and the median are both 26, whilst the mode is 25.

Next quick check, which of these numbers is the mode of the dataset? There might be an extra word that you want to use when describing the mode of this particular dataset.

Pause now to have a look through each of those options, and think about which extra word you might want to use.

Interestingly, there are two modes to this dataset, 15 and 20.

If a dataset has two modes, it is called bimodal.

You do not find the middle of them, like you do with the median.

You simply keep both modes and say, this dataset is bimodal with 15 and 20.
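A sketch of this rule, assuming Python's `statistics.multimode`: every value tied for the highest frequency is kept, and nothing is averaged. The helper `describe_modes` and the data it is tried on are hypothetical.

```python
from statistics import multimode


def describe_modes(data):
    """Describe the mode(s) of a dataset without averaging tied modes."""
    modes = sorted(multimode(data))
    if len(modes) == 1:
        return f"mode: {modes[0]}"
    if len(modes) == 2:
        # bimodal: keep both values; do NOT take their midpoint as you would for a median
        return f"bimodal at {modes[0]} and {modes[1]}"
    return f"{len(modes)} modes: {modes}"
```

For the made-up list `[3, 7, 3, 9, 7, 1]`, both 3 and 7 appear twice, so the description is "bimodal at 3 and 7" rather than a single value of 5.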

Onto some independent practice tasks.

I have given you three datasets, each dataset has one of the three averages already calculated for you.

Complete the other two averages for each of the three datasets.

Pause here to give yourself some time to do all six of those calculations.

And onto question number two.

With these three datasets, I've given you none of the averages.

So you will have to calculate the mean, the median, and the mode for all three datasets.

This is because in question three, you need to use your answers to question two.

One of these datasets is different in some way to the other two.

After you have found the mean, the median, and the mode, try and identify which dataset is the odd one out, and explain why you think that is the odd one out.

Pause now to give yourself some time to do both those questions.

Onto the answers for question one.

For dataset A, the median was two and the mode was one.

For dataset B, the mean was 24 and the median was 25.

For dataset C, the mean is 5.81, and the dataset is bimodal, with modes at 7 and 8.5.

And for question number two, here are all the averages.

Pause now to compare all of your averages to the ones on screen.

Okay, but which one was the odd one out? Well, there are actually many possible responses to this question, depending on what you were focusing on.

Well done if you got any of those answers, or any other reasonable response to finding which one was the odd one out.

For the next two cycles of this lesson, we'll be looking at the mean, the median, and the mode in a slightly different way.

What happens if the average that we've calculated looks different to what we want or what we expect? Is it because we've calculated it incorrectly? Is it even a representative summary at all? And is it even still useful? Let's have a look first at the mean.

The mean can be a useful representation of a dataset if the mean lies close to groups of common or highly frequent values, or if the mean lies towards the centre of a dataset.

In summary, the mean is a very useful representation if it is close to many data points and is close to the centre of its distribution.

Let's look at this practically on a dot plot.

The mean of this dataset is approximately 42.4.

We know that the mean can be a very representative summary of this dataset, as it fits both of the criteria we just looked at.

It both lies close to other common data points (42 and 43 are very common data points), and 42.4 is located very close to the centre of the distribution of the dataset.

Therefore, the mean is a representative summary of this dataset.

Here's an example of some raw data.

To calculate the mean, I add up all of the numbers, then divide by seven because there are seven data points.

The mean of 23.3 is a representative summary of the data, as it is close to 23, which is the middlemost value of the dataset.

Here's a second example, the mean happens to be between 53 and 56.

Whilst the mean of 55.2 isn't the most central value, it's still pretty close.

And it is definitely close to other common data points, the 53 and the 56.

Therefore, 55.2 is also a representative summary of this dataset.

Let's have a look at this third dataset.

Let's find the mean by adding up all the numbers, and then dividing by 10 because there are 10 data points.

The mean is 16.3, which lies around here on the dataset.

Laura says, "The mean of 16.3 cannot be correct." It is nowhere near the centre of the dataset, and it's pretty far away from, well, everything except for the 20.

However, Sophia has a different perspective.

She said the mean must be correct because the calculations, as you can see on screen, were done correctly.

And so, who's correct, Laura or Sophia? 16.3 is the correct mean for the dataset.

There are no errors in the calculations that we've done to get 16.3.

Even if a mean is correctly calculated, it may not be the most representative summary to summarise the dataset if its value isn't close to the centre, and/or it isn't close to many common or frequently occurring data points.

In this example, the mean of 16.3 is not close to the centre of the data, and it's only close to the data point 20, which appears once.

Whilst 16.3 is a correctly calculated mean, it may not be the most representative summary of this dataset.

Other averages, or other calculations, may be more representative when summarising the dataset in one value.

However, even if an average isn't representative of a dataset, it can still be very useful when understanding the dataset.

For example, with 16.3 as the mean, it definitely isn't representative, but it is very helpful in showing us how unbalanced the dataset is.

Remember, calculating the mean is the same as adding up the values of all the data points and redistributing the total so that every data point has the same value.

After this redistribution, we can see that every single data point, except for the 20, changes in size dramatically.

This shows how unbalanced the dataset is, and so the mean is still very useful when understanding the properties of a dataset.
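The "share everything out equally" view of the mean can be sketched in a few lines. This is an illustration under assumptions: `redistribute` is a hypothetical helper, and the dataset `[2, 3, 3, 4, 18]` is made up to mimic one large value; it is not the dataset from the lesson. Each data point receives the mean, the total stays the same, and the list of changes shows which points move the most.

```python
def redistribute(data):
    """Share the total equally so that every data point receives the mean."""
    share = sum(data) / len(data)  # the mean
    shared = [share] * len(data)   # every data point now has the same value
    # how much each original data point changes after the equal sharing
    changes = [share - x for x in data]
    return shared, changes
```

For the made-up list, every value becomes 6: the small values each rise by 2 to 4, while 18 drops by 12, which is the kind of imbalance described above.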

Typically, a dataset is less likely to have a representative mean if it has one or two extremely large or extremely small values.

In this example, 27 and 49 are much, much larger than the rest of the data points, which have a value between seven and 12.

The mean of this dataset is around 14.4, which is above every single data point, except for those two extremely large values.
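To see how one or two extreme values pull the mean away from the rest of the data, a sketch like this compares the mean with the median and lists the values above the mean. The dataset is hypothetical: seven values between 7 and 12 plus 27 and 49, loosely imitating the shape described above, so its mean is not the 14.4 from the lesson. `compare_centre` is an illustrative name.

```python
from statistics import mean, median


def compare_centre(data):
    """Return the mean, the median and the values lying above the mean."""
    m = mean(data)
    med = median(data)
    above_mean = [x for x in sorted(data) if x > m]
    return m, med, above_mean


made_up = [7, 8, 8, 9, 10, 11, 12, 27, 49]  # hypothetical data with two extreme values
```

For `made_up`, the mean is about 15.7 while the median is 10, and only 27 and 49 lie above the mean, so the median sits among the bulk of the data and the mean does not.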

Right, onto some checks for understanding.

In which of these dot plots would the mean be a representative average or summary of the data? Pause here to think through your conclusions.

The answers are A and D.

For A, 4.5 is pretty much right in the centre of the dataset, and it's close to five, which is a frequently occurring value.

For D, the mean of four is even more representative a summary of the dataset, as four is exactly in the centre, and four is the most frequently occurring, or the modal, value in the dataset.

For B, you've got those two extremely large values that make the mean not quite as representative.

Whereas for C, 4.5 might be close to the centre of the dataset, but it's nowhere close to any highly frequent values in that dataset.

In which of these small datasets would the mean be a suitable and representative summary of the data? Think about where each of the means would go on each dataset, and then think whether its placement and its proximity to other data points make it a representative or unrepresentative summary.

Pause now to have a look through all four of those datasets.

The answers are A and C.

The means of A and C are pretty central, or close to the centre of the dataset, whilst also being very close to frequent data points.

In the case of A, it is very close to the frequent values of four and five.

In comparison, B, whilst fairly central, is very far away from all of the other data points.

Meanwhile, D is definitely not central, and it is very far away from the common data points, 44 and 47.

Okay, what is the mean of this dataset? Sophia says that the mean is 27.75; however, Laura says that makes absolutely no sense.

27.75 is bigger than all of the data points, even bigger than 22, which is the largest data point.

Who's correct, Sophia or Laura? Laura is correct, and Sophia is most certainly incorrect.

None of the averages can have a value greater than the largest data point, or less than the smallest data point.

In this example, you can never have an average greater than 22 or less than four.

This means that the mean, and the median and the mode as well, must all lie somewhere between four and 22 in the range of that dot plot.
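The "between the smallest and largest data point" rule gives a quick sanity check for any claimed average. A minimal sketch; `could_be_an_average` is a hypothetical name, and the data it is tried on are illustrative values spanning 4 to 22.

```python
def could_be_an_average(value, data):
    """A mean, median or mode must lie between the smallest and largest data points."""
    return min(data) <= value <= max(data)
```

A value such as 27.75 fails the check for data running from 4 to 22, so it cannot be any of the three averages, whatever method produced it. Passing the check does not prove a value is correct; it only rules out impossible answers.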

Here is Sophia's method.

She added up all of the data points, then divided by four to get a mean of 27.75.

At what step did Sophia go wrong in her method? Pause here to have a quick look at her method, and then the data presented above.

Sophia says, "I divided by four because there are four columns of data on the dot plot." This is incorrect.

You still need to divide by 11, as there are 11 dots in the dot plot.

Or if you look at the raw data, there are 11 data points.

This is a good way of spotting whether the mean is a sensible answer or not.

If it does not lie between the smallest and the largest data point, it is guaranteed to be wrong.

And this check for understanding demonstrates that point really well.
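Sophia's slip was dividing by the number of columns instead of the number of dots. Here is a sketch of the correct calculation from a dot plot, assuming the plot is stored as a hypothetical mapping from each value to its number of dots; the frequencies used with it are invented, not read from the lesson's dot plot.

```python
def mean_from_dot_plot(freq):
    """freq maps each value on the dot plot to how many dots sit above it."""
    total = sum(value * count for value, count in freq.items())
    n_dots = sum(freq.values())  # number of data points, NOT the number of columns
    return total / n_dots


invented = {4: 2, 10: 3, 15: 4, 22: 2}  # hypothetical dot plot: 4 columns, 11 dots
```

With these invented frequencies the total is 142 across 11 dots, giving a mean of about 12.9. Dividing by the 4 columns instead would give 35.5, which is larger than the maximum value of 22 and therefore impossible.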

Three of the five values shown on this slide represent the averages, the mean, the median, and the mode, for this dot plot.

And the other two are either impossible or improbable to be the averages, based on what we've discussed so far.

Which of these values do not represent an average, and why? Pause to have a look at all five of those values and the dot plot, and come up with your answer.

The values that do not represent an average are 16 and 49.

49 is bigger than the maximum value of the dataset, and therefore it is impossible to represent any of the averages.

In comparison, 16 is the maximum value with a very low frequency.

The averages are less likely to be the maximum or minimum values, unless they have a very high frequency.

Okay, onto some checks for understanding.

For question one, I've given you four means and three datasets.

One of the means is an odd one out, but the other three match with the three datasets.

Using your understanding of what can and cannot be a mean, match each mean to its correct dataset.

And for question number two, calculate the mean of this dataset, and then explain whether it is a representative summary of the dataset or not.

Pause now to give both those questions a go.

Onto question number three.

Jacob calculates the mean of this dataset to be 10.

Without calculating the mean in any capacity, explain how you know for certain Jacob is incorrect.

For part B, then calculate the mean.

Then for part C, using the mean you've just calculated, give one possible explanation for why Jacob got a mean of 10 rather than the correct mean.

Pause now to give that a go.

And for question number four, match the mean to each dot plot.

Pause here to have a look, and remember, one mean does not match with one of the dot plots.

And finally, question number five, answer these two questions.

Question A, one of the students thinks that the mean is not a representative summary of the dataset.

Explain why their observation is valid.

And for part B, which of these statements is still true, knowing that the mean is not a representative summary for this dataset? Pause to give both of these questions a go.

Okay, onto the answers.

For question one, dataset A matches with a mean of six, B with a mean of zero, and C with a mean of 14.

For question number two, the mean is 39, which is not a representative summary of the data, because 39 is neither near the centre of the dataset, nor is it close to, well, any values.

For question number 3A, 10 is far lower than the lowest data point of 103, and therefore it is impossible for 10 to represent the mean.

If you calculated the mean correctly, it would've been 110, not 10.

And for question 3C, there are many valid answers.

Pause here to see if any of these valid answers match with your own.

For question number four, dot plot C matches with 5.3, A with 10.2, and B with 11.9.

17.1 is far too big to be a reasonable mean for any of those dot plots.

Finally, for question number five, the mean is 57.3.

The student's observation is valid, because 57.3 is neither close to the centre of the data (it is too far to the left), nor is it close to any other score.

However, this does not mean that the summary, the mean, is not useful.

The mean being unrepresentative definitely shows how unbalanced the distribution of marks is.

We've got two very small scores of 11 and 13, and then a lot of similar scores at 72 to 80.

Right, we've analysed the mean, but what about the median and the mode? Let's have a look.

Starting with the mode, if the mode exists, it is usually very easy to spot, like in this dot plot.

If I were to take all 173 data points on this dot plot, and then calculate the mean or the median for them, it would either take me a long time, or I would require computer assistance.

But even without knowing the value of this data, you can see where the mode is, it is right here.

Now that I've put a scale to it, I can see that the mode is here.

The mode of this dataset is 60, super easy to spot.

Furthermore, the mode is especially useful if the dataset is noticeably bimodal or even trimodal, such as in this dataset.

This dataset is clearly bimodal at the points 74.5 and 80.

This dataset has two distinct modes that are relatively far apart in the data.

Neither of the other averages, the mean or the median, can use two numbers to summarise a dataset.

That is a benefit of the mode over those other two.

On the other hand, the mode may not be very representative of the dataset if the modal frequency is not particularly high compared to the other frequencies in a dataset.

So let's compare two different distributions.

In this distribution, the dataset is very clearly bimodal at 12 and 19, because those two peaks are very high compared to the other values on that dataset.

However, with this dataset, the two peaks at 16 and 22 aren't particularly high in comparison to the other values around it.

And so what is the point in representing this very evenly distributed data using 16 and 22? If a dataset has far too many modes, is there really any significance in representing the data by all of those values? In this dataset, you've got modes at zero, two, four, six, eight, 10, and 12.

Is it really a summary if I'm listing more than half of the values that occur in a dataset? No, not really.

A typical bit of advice is that, unless you've got a very large range of data, if a dataset has more than three modes (that is, more than trimodal), it is not very sensible to summarise it using the mode, as too many modes make it unrepresentative.
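The rule of thumb above could be encoded as follows. The cut-off of three modes comes from the advice in this lesson rather than from any formal standard, and `mode_summary`, its `max_modes` parameter and the data it is tried on are all hypothetical.

```python
from statistics import multimode


def mode_summary(data, max_modes=3):
    """Return the modes, or None when there are too many to be a useful summary."""
    modes = sorted(multimode(data))
    if len(modes) > max_modes:
        return None  # e.g. seven tied modes: listing them barely summarises anything
    return modes
```

A made-up dataset where 0, 2, 4, 6, 8, 10 and 12 each appear twice returns `None`, whereas one where only 2, 4 and 6 tie for the highest frequency still returns `[2, 4, 6]`.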

Just because a mode is unrepresentative does not mean it isn't useful; it can still be very informative.

In this example, we can say there is no mode, because there are too many highly frequent values.

This is a helpful summary when paired with the statement that there is a lot of variety in the frequency of different values in this dataset.

This implies that there are several equally highly frequent values, in addition to lower frequency values.

On the other hand, this dataset is bimodal, but only just, because most other frequencies are similar to the modal frequency.

This means that there is a high level of consistency, or very little variation in the frequency of each value in this dataset.

And one final advantage of the mode is it is the only one of the three averages that can be used when we're dealing with non-numerical data, specifically qualitative data.

Here's a list of different colours.

The mean is impossible 'cause you can't add colours together, and there's no order for you to put them in for the median to be possible.

But we can see that pink is the modal colour, 'cause it appears more than blue, green and purple.
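Because the mode only needs counting, it works for categories such as colours. A sketch using Python's `collections.Counter`; the list of colours is made up so that pink is the most frequent, and `modal_category` is a hypothetical helper. Calling `statistics.mean` on the same list would raise a `TypeError`, because colours cannot be added together.

```python
from collections import Counter


def modal_category(items):
    """Return every category tied for the highest count."""
    counts = Counter(items)       # how often each category appears
    top = max(counts.values())    # the modal frequency
    return sorted(k for k, c in counts.items() if c == top)


colours = ["pink", "blue", "pink", "green", "purple", "pink", "blue"]  # hypothetical list
```

For this list, pink appears three times, blue twice, and green and purple once each, so the modal colour is pink.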

Onto some checks for understanding for the mode.

Match the dot plot to the statement about the mode.

Pause here to take some time to read each statement and match it to the correct dot plot.

And the correct answers are as follows.

Onto the median, where would the median be on this dot plot? So Aisha theorises that "The median will be around the number 16, since that's the middle of the number line being used" to represent this dot plot.

On the other hand, Lucas says "The median will be at six, since that is the middle data point, no matter how spread out the data is." Let's go through the method to find the median thoroughly.

There are 16 data points in this dataset, and so the median will be at the 8.5th data point.

This will be in between the eighth and the ninth data point.

Starting from the smallest value, three, let's go and find the eighth and the ninth data points.

First, second, third, fourth, fifth, sixth, seventh, eighth, and ninth data points.

Because the eighth and the ninth data points are both six, the median is also six, and so Lucas is correct.

This is a benefit of the median.

It will represent a central value of an ordered dataset, most of the time, no matter how spread out or diverse the values of the dataset are.

This is especially true when there are one or two extremely large or extremely small values in the dataset, as the median is hardly affected by them.
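The counting method used above can be sketched for a dot plot stored as value-to-frequency pairs; the 8.5th position for 16 points corresponds to (n + 1) ÷ 2. The dot plot used with it is hypothetical: 16 dots starting at 3, with a few large values, chosen so that the 8th and 9th data points are both 6. `median_from_dot_plot` is an illustrative name.

```python
def median_from_dot_plot(freq):
    """freq maps each value to its number of dots; returns (position, median)."""
    ordered = []
    for value in sorted(freq):
        ordered.extend([value] * freq[value])  # read the dots from left to right
    n = len(ordered)
    position = (n + 1) / 2  # e.g. 16 dots -> the 8.5th data point
    if n % 2 == 1:
        return position, ordered[n // 2]
    # even count: halfway between the (n/2)th and (n/2 + 1)th data points
    return position, (ordered[n // 2 - 1] + ordered[n // 2]) / 2


spread_out = {3: 2, 4: 2, 5: 2, 6: 4, 20: 3, 30: 3}  # hypothetical: 16 dots
```

Even though the values stretch out to 30, the median of `spread_out` is still 6, because only the order of the data points matters, not how far apart they are.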

There are certain types of distribution of a dataset that make the median not as effective a representation.

This typically occurs when the values towards the middle of the data are not as frequent as the values closer towards the edges of a dataset.

For example, this occurs if a dataset is bimodal, like in this diagram.

Here, we've got the modes at three and nine, but the median is at the number six.

Whilst the modes both appear five times, the location of the median, six, only appears once.

And the numbers nearby it, five and seven, only appear once between them.

Therefore, the median isn't super representative, because its value and the value of the numbers around it are not super frequent.

Right, whose statement is correct for this dot plot? Aisha says, "The mode is a representative summary as the dataset is distinctly bimodal at 64 and 80." Whereas Lucas says the median is not a representative summary of the dataset, as the median of 72, well, it isn't even a data point.

And it also isn't close to many frequent values.

They are both correct.

Sometimes one summary is far more representative than a different one.

That's a benefit of having three summaries, one of them will be more representative than another quite a lot of the time.

However, just like with the other two averages, an unrepresentative median may still be very useful.

In the case of the median, it is a single value that separates the dataset into two subgroups.

The 50% of the dataset that is less than or equal to the median, and the 50% of the dataset that is greater than or equal to the median.

In this example, 72 splits the dataset right down the middle into two halves.
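The "splits the dataset into two halves" property can be checked directly. A minimal sketch with a hypothetical bimodal dataset of ten values clustered around 64 and 80, so that its median of 72 is not itself a data point; `split_at_median` is an illustrative name.

```python
from statistics import median


def split_at_median(data):
    """Return the median and the two halves it separates the data into."""
    m = median(data)
    lower = [x for x in sorted(data) if x <= m]  # values at or below the median
    upper = [x for x in sorted(data) if x >= m]  # values at or above the median
    return m, lower, upper


bimodal = [60, 62, 64, 64, 64, 80, 80, 80, 82, 84]  # hypothetical data
```

For `bimodal`, the median is 72 and exactly five values fall on each side. When values are tied at the median itself, both groups include them, which is why the halves are described as "less than or equal to" and "greater than or equal to".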

Onto our final checks for understanding.

Match the dot plots to the statements about their medians and modes.

Pause here to match the statements to those dot plots.

Right, and the answers are as follows.

Onto our final set of independent practise tasks.

For question one, match each dataset to the average that is a fairly representative summary of it.

For question number two, which of these three averages is the most representative for the given dot plot? Write a sentence to explain which one you've chosen.

Pause here to answer both of those questions.

And for question three, we're going to focus only on the median and the mode.

For which of these two datasets is the median the more representative summary, and for which one is the mode the more representative summary? Write a sentence for each of the two datasets.

Pause now to consider your explanations.

Okay, for question number four, by calculating the mean, the median, and the mode for both of these datasets, consider which of the averages is the most representative summary for each.

And for a bit of evaluation, what benefit might there still be for using the averages that are less representative? Pause now to work through 4 A, B, and C.

And finally, for question number five, for each of the mean, the median, and the mode, you've got two statements, one that is a positive, and one that is a potential drawback.

Match each statement with each of the three averages, and put it into the correct row to show whether it is a positive or a potential drawback.

Pause now to match all six of those statements.

Onto the answers, for question one, A matches with a median of 70, B matches with a mean of 14.9, and C matches with a mode of 12.

Well done if you got all three of those.

For question number two, the most representative average for this dot plot is the mode.

This is because there is a very clear bimodality to it, with peaks at three and eight.

Onto question number three.

For dot plot A, the median is more representative because, well, quite frankly, there is no mode if there are six most frequent values.

Whereas for dot plot B, the mode is much more representative because the median is nowhere near the high frequency values of three and nine.

For question four, dataset one, the mean is 7.4, the median three, and the mode three.

The median and the mode are the most representative summaries.

However, we can still use the mean because it shows how unbalanced the data points in the dataset are.

For dataset number two, the mean is 4.4, the median four, and the mode is trimodal at two, four, and six.

Because both the median and the mean are pretty close together, they would both make fairly representative summaries.

On the other hand, the mode is not as representative because, for such a small dataset, having three modes (being trimodal) means that none of the three modes has much significance.

Finally, for question number five, these are the correct positives and drawbacks put with the correct averages.

Pause here to match what you've done with the answers on screen.

And that's all for today's lesson.

Thank you so much for joining me in this lesson, where we have looked at the mean, the median, and the mode altogether, and used them as summaries to represent a set of data.

We've also looked at the mean, the median, and the mode, and examples where they may be less representative summaries of a dataset, but also considered their usefulness even if they are not that representative.

That is all for today's lesson.

Thank you so much for joining me, and I hope to see you soon for some more maths.

Have a good day.