Hello there, and thank you for choosing today's lesson.

My name is Dr. Rowlandson and I'll be guiding you through it.

Welcome to today's lesson from the unit of comparisons of numerical summaries of data.

This lesson is called checking understanding of summary statistics from a list.

And by the end of today's lesson we'll be able to calculate the mean, the median, the mode, and the range from a list of data.

Here is a previous keyword that we'll use again during today's lesson, so you may want to pause the video if you want to remind yourself of what this means before pressing play to continue.

The lesson contains two learning cycles.

In the first learning cycle, we're going to be calculating summary statistics, and then in the second learning cycle, we're gonna use that to solve some problems involving summary statistics.

But let's start off with calculating summary statistics.

Here we have Lucas, Jacob, and Sofia.

They have been provided with some data in a list.

The data is about the number of houses on streets in one part of a town, and there's the list of data we can see.

They are trying to work out what the typical number of houses per street is for their town, and they do this by looking at data and summarising it in their own words.

Let's hear from them.

Lucas says, "They differ by quite a lot from four to 83!" Jacob says, "Yes they do, but most of them are around 30 to 40." And Sofia says, "Yes, and three streets have 33 houses on them." It looks like what they're trying to do is summarise the data by looking at it and explaining what they can see in their own words.

But this can also be done formally using summary statistics and summary statistics can be calculated from this data.

For example, here we have the data which is currently unordered and that can make it difficult to analyse.

The data is more easily analysed when it's written in order like we can see here.

Now we've done that, let's go back to what they said.

Lucas said that they differ quite a lot from four to 83.

Lucas's comment seems to be about the spread or how varied the data is.

This can also be done with a formal summary statistic.

An example of that is a range.

The range is a measure of spread or dispersion which indicates how varied or consistent the data is, and the range can be calculated by subtracting the lowest value from the highest value.

So in this case, the range of the number of houses per street is 83 subtract four which is 79.

So let's check what we've learned.

Which of these data sets has the greatest range? You've got a, b, c, and d to choose from.

Pause the video while you make a choice and press play when you're ready for an answer.

The answer is a, and the range for each data set is displayed on the screen.

Let's go back now to our data about houses on streets and take a look at what Jacob said.

Jacob said that most streets have around 30 to 40 houses on them.

Jacob commented on what the more typical values were in this data, and there are some ways to do that formally as well.

A measure of central tendency is a summary measure that attempts to describe a whole set of data with a single value that represents the middle or the centre of its distribution.

One measure of central tendency is the median.

The median is the central or middle piece of data when the data are in numerical order, and that's what we can see with this data here.

So as they are in order, we can look for the middle by either going from the outside of the data to the inside or working out which position is the middle, and we get a median of 33.

This is because 33 is in both the 10th and 11th positions and the median is in between those two positions.

So the median is also 33.

So let's check what we've learned.

What is the median of this data set? Pause the video while you work it out and press play when you're ready for an answer.

The median is 119.

It is the most central value.

True or false? The median of this data set is eight.

Is that true or is it false? Pause the video while you make a choice and choose a justification as well, and then press play when you're ready for an answer.

The answer is false because the data is not in order.

Let's now look at another example of finding the median and then you are gonna have a go again yourself.

What is the median of this ordered data set? Well, let's take a look.

If we cross off the first and the last and then the second and the second to last and so on until we get to the middle, what we'll see is that the middle is in between 20 and 24.

So what we need to do is find what value is in the middle between 20 and 24.

We need to find a midpoint.

One way we can do that is by adding them together and dividing by two.

20 plus 24 equals 44, and when we divide that by two we get 22, so the median is 22.
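As a rough sketch of the midpoint method just described, the short Python function below finds the median of a list that is already in order. The function name and the example list are made up for illustration; only the two middle values, 20 and 24, come from the lesson.

```python
def median_of_ordered(ordered):
    """Return the median of a list that is already in ascending order."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of values: there is a single middle value.
        return ordered[mid]
    # Even number of values: take the midpoint of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


# Hypothetical ordered data set whose two middle values are 20 and 24.
example = [11, 15, 20, 24, 30, 32]
print(median_of_ordered(example))  # 22.0
```

The same function handles a list with an odd number of values, where no midpoint calculation is needed.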

Your turn now.

What is the median of this ordered data set? Pause the video while you work it out and press play when you're ready for an answer.

The median is 27 and there are your calculations.

Let's now go back to our data about houses on streets and look at what Jacob said again.

He was commenting on what the more typical values were in this data set, and we previously looked at a measure of central tendency called the median.

Another measure of central tendency is the mean.

The mean or the arithmetic mean for a set of numerical data is the sum of the values divided by the number of values.

For example, for this data we would add together all the values and divide by 20 because that's how many values there are.

This would give 687 divided by 20, which is 34.35.

Let's check what we've learned.

Which of the following calculations will find the mean of this data?

You have four options to choose from, and it could be more than one.

Pause the video while you make your choices and press play when you're ready for answers.

The answers are c and d, and why is that? Well, with a, there are six pieces of data.

With b, you haven't got any brackets.

So without the brackets only the 12 will be divided by six.

For part c, you've got the brackets, which sort out that issue.

For part d, you don't have any brackets, but the numerator in this fraction will be evaluated first, so it will be equivalent to the calculation in c.

So let's take a look again at the data about houses on streets and look at what Sofia said.

She said that three streets have 33 houses on.

Sofia was commenting on the most common number of houses per street.

This is another measure of central tendency, which is the mode, and the mode is the most frequent value.

Some data though has no mode and some data can have more than one mode.

In this case, the modal number of houses is 33 because that's the data value with the highest frequency.

So let's check what we've learned.

Here we have a dataset which is bimodal.

The modes are three and six and below we have another dataset.

What you need to do is complete this ordered dataset so that it is bimodal, and there might be more than one answer.

Pause the video while you work this out and press play when you're ready to see the answer.

Well, there are two possible answers to this.

It could be 27 or it could be 28.

So back to our data about houses on streets, this data can be summarised using summary statistics.

For example, using measures of central tendency.

The mean number of houses per street is 34.35.

The median number of houses per street is 33 and the mode number of houses is 33.

Or we can summarise this data using a measure of spread.

For example, the range is 79 houses.

Now, Lucas, Jacob, and Sofia did not calculate any of these formal summary statistics.

They just looked at the data and described what they could see in their own words, but they did manage to analyse the data fairly well without using any of these formal summary statistics.

Why was that? Why do you think they were able to summarise the data just by looking at it and describing it in their own words, and when might they struggle to do that? Pause the video while you think about this and press play when you are ready to continue.

Well, the dataset was fairly small.

There were only 20 pieces of data.

Imagine if they tried to do this with a larger dataset, they might have some trouble there.

Let's take a look at an example.

Wow, that's a lot of numbers, but I have seen larger data sets than that before.

For large data sets, summary statistics are harder to estimate just by looking at the data and noting what you observe.

So the calculations are needed to get a better understanding of the data and technology such as spreadsheet software would be the most efficient way to process this for large data sets.

What you don't wanna be doing is adding together all these numbers yourself manually and then dividing by how many there are when you could use something like a spreadsheet to do that for you.

So let's check what we've learned.

A large data set of 6,500 values has a sum of 169,000.

The minimum value is 14 and the maximum value is 37.

What is the range and the mean of this data? Pause the video while you work these out and press play when you are ready for an answer.

Here are the answers.

The range is 23, the mean is 26.
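For anyone who wants to check those two answers, here is a minimal Python sketch using only the figures given in the question. The variable names are chosen for illustration.

```python
# Figures given in the question about the large data set.
count = 6500          # number of values
total = 169_000       # sum of all the values
minimum = 14
maximum = 37

# Range: highest value subtract lowest value.
data_range = maximum - minimum

# Mean: sum of the values divided by the number of values.
mean = total / count

print(data_range)  # 23
print(mean)        # 26.0
```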

Okay, it's over to you now for task A.

This task has one question and here it is.

You have some cards.

You need to group the cards together that represent the same data set and complete the cards that have gaps.

Pause the video while you work through this and press play when you're ready to see the answers.

Here are the answers to this question.

Pause the video while you check this with your own and then press play when you're ready to continue with the lesson.

You're doing great so far.

Now let's move on to the second part of today's lesson, which is problems involving summary statistics.

Laura has collected some data about the number of books eight different pupils are currently reading.

She calculates that the mean number of books being read is two, but what does this actually mean? Pause the video while you think about this and press play when you're ready to continue.

It means that if the total number of books were evenly distributed, they would each have two books.

It doesn't mean that each person is reading two books; some might be reading fewer than two books, some might be reading more, but if they all brought their books in, put them in a pile and then shared them out equally between them, they would all have two books.

So how many books would be read by the eight pupils in total? Pause the video while you think about this and press play when you are ready to continue.

Well, there are two books per pupil and there are eight pupils.

So two multiplied by eight means there are 16 books in total.

If we know the actual number of books seven of the pupils were reading, can we calculate the number of books that the eighth pupil was reading?

In this case, for Sofia on the far right-hand side, we don't know how many books she was reading, but could you work that out with the information you can see for the rest of them? Pause the video while you have a think about this and press play when you're ready to continue.

Well, we know that the total needs to be 16 and the total of the seven pupils so far is 13.

So the eighth pupil must be reading three books.
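The reasoning above can be sketched in a few lines of Python. The function name `missing_value` is made up for this illustration; the numbers are the ones from Laura's data.

```python
def missing_value(mean, count, known_total):
    """Find the one unknown value, given the mean, the number of values
    and the sum of all the other values."""
    required_total = mean * count      # what all the values must sum to
    return required_total - known_total


# Mean of 2 books across 8 pupils; the seven known pupils total 13 books.
print(missing_value(2, 8, 13))  # 3
```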

Let's check what we've learned.

A data set of 15 values has a mean of 12.

What is the sum of these 15 values? Pause the video while you work this out and press play when you're ready for an answer.

The answer is 180, which we can get by multiplying 12 by 15.

Here we have some data and oh no, ink has ruined one piece of data.

But don't worry, we know that the mean of this data is 12.

So what number is missing? Pause the video while you work this out and press play when you are ready for an answer.

The answer is 11.

Here we have Aisha, Andeep, Izzy, and Jun.

They are trying to imagine what a dataset might look like based on the summary statistics.

Now they are only using small data sets with integer values to check their understanding of summary statistics.

So let's hear from them.

Each of them is going to come up with a data set of five values that has a mode of three.

Let's take a look at what they came up with.

Here's Aisha's, here's Andeep's, here's Izzy's, and here's Jun's.

As we can see here, each of them has a data set with five values that has a mode of three because three is the most frequent value in each dataset.

Can you think of a data set that has a mode of three? Pause the video while you have a think and press play when you're ready to continue.

Now they're going to come up with a data set of five integer values with a median of four.

Let's hear what they came up with.

Here's Aisha's, here's Andeep's, here's Izzy's, and here's Jun's.

What did you think of these? I'm not sure all of these have a median of four. Who hasn't got four as the middle value? Pause the video while you think about this and press play when you're ready to continue.

Well, it's Aisha, her data is unordered.

When ordered it would be three, three, three, three, four, so three would be the median value of her dataset.

Can you think of a data set where the median is four?

Pause the video while you come up with your own and press play when you're ready to continue.

Now they're going to come up with a data set of five integer values with a range of five.

Let's hear what they came up with.

Here's Aisha's, here's Andeep's, here's Izzy's and here's Jun's.

I'm not sure all of these data sets have a range of five.

Who hasn't got a range of five? Pause the video while you work this out and press play when you're ready to continue.

It looks like it's Jun, his data is unordered.

His lowest value is one and his highest value is seven so his range would be six.

Can you think of a data set that has a range of five? Pause the video while you come up with your own and press play when you're ready to continue.

Now they're going to come up with a data set of five integer values with a mean of six.

Let's hear what they came up with.

Here's Aisha's, here's Andeep's, here's Izzy's and here's Jun's.

We could check these by adding them together and dividing by five, but there might be a quicker way of checking them.

If these data sets each have a mean of six, what should the total be? Pause the video while you think about that and press play when you're ready to continue.

The total should be 30 because the mean is the value when the data is evenly distributed across all data points.

Can you come up with your own set of data where the mean is five? Pause the video while you come up with your own and press play when you're ready to continue.

Now they're going to come up with a data set of five integer values with a mean of five and a median of five.

Let's see what they came up with.

Here's Aisha's, here's Andeep's, here's Izzy's and here's Jun's.

I wonder if we can make this a bit more challenging.

Is it possible to find a data set that also has to have a mode of three as well as a mean of five and a median of five? Pause the video while you think about this and press play when you're ready to continue.

Well, the ordered dataset would have to be three, three, five, and two other numbers.

Let's call them x and y, where x and y are different, greater than five and would total 14.

Let's think about why that is the case.

You need to have three and three because the mode is three and then the other three numbers would all have to be different to each other so that none of them have a frequency of two.

You need to have five in the middle of your data set, which means x and y would have to be greater than five because the two threes are less than five.

That will keep five in the middle and they would have to sum to 14 because the data so far sums to 11, but we want them to sum to 25 so that the mean is five.

That means there's 14 left over.

So x and y would have to total 14.

So what are the values of x and y? Well, if they are integers, six and eight.
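One way to confirm that six and eight is the only integer pair is to try every possibility. The sketch below does that in Python; the name `pairs` is chosen for illustration, and the conditions are the ones stated above (x and y different, both greater than five, totalling 14).

```python
# Ordered data set: 3, 3, 5, x, y
# Conditions: x and y are integers, both greater than 5,
# different from each other, and x + y = 14.
pairs = [(x, 14 - x) for x in range(6, 14) if x < 14 - x]

print(pairs)  # [(6, 8)]
```

Writing the condition as `x < 14 - x` keeps the pair in order and rules out seven and seven, which would make the data set bimodal.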

So let's check what we've learned.

Find a data set of five integer values that has a mean of eight, a mode of six and a median of seven.

Pause the video while you work this out and press play when you're ready for an answer.

Let's take a look at this together now.

As the median is seven, the numbers in order must be two unknown values, let's call them a and b, seven and then two more unknown values, let's call them c and d, but seven needs to be in the middle.

The mode of six means that actually this data set must be six, six, seven, and then c and d, where c and d are different to each other so that this doesn't become bimodal.

The mean being eight means that value c and d must add to 21 and they must be greater than seven, where c and d are not equal to each other so that it doesn't become bimodal.

Now there are multiple answers you can have for this, but an example answer is six, six, seven, 10 and 11.
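As a hedged check of this reasoning, the Python sketch below lists every integer data set of the form six, six, seven, c, d that meets the conditions described, then confirms each one with the standard `statistics` module. The variable name `candidates` is made up for illustration.

```python
from statistics import mean, median, multimode

# Ordered data set: 6, 6, 7, c, d
# Conditions: c and d are integers greater than 7, c != d, c + d = 21.
candidates = [[6, 6, 7, c, 21 - c] for c in range(8, 21) if c < 21 - c]

for data in candidates:
    # Each candidate should have mean 8, median 7 and a single mode of 6.
    assert mean(data) == 8
    assert median(data) == 7
    assert multimode(data) == [6]

print(candidates)
```

The example answer from the lesson, six, six, seven, 10 and 11, appears in the printed list alongside the other valid answers.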

Let's take a look at another type of problem now involving summary statistics.

Each of the six tables in an art classroom has a paint pot with some paint in.

At the end of the lesson, the teacher redistributes the paint evenly across the pots ready for the next lesson.

They now all have 45 millilitres of paint in them.

The teacher then finds a new paint pot with 80 millilitres of paint in it.

So they redistribute the paint evenly across the seven paint pots.

How much is now in each paint pot? Pause the video while you think about this and press play when you're ready to continue.

To begin with, there are six pots of 45 millilitres, so that is 270 millilitres of paint altogether.

Then the extra 80 millilitres of paint makes that a total of 350 millilitres of paint.

When you distribute that evenly across the seven paint pots, it means they would each have 50 millilitres.

You may be wondering what this problem has to do with summary statistics.

Well, this idea of the teacher pooling all the paint together and then distributing it evenly across each of the pots, that's the same idea as what we do when we calculate the mean.

So that means this problem could have been worded like this instead.

Six paint pots have a mean of 45 millilitres of paint per pot.

Another paint pot has 80 millilitres of paint.

What is the mean volume of paint of these seven paint pots? Can we see how this is the same as a problem we've just done? There are six pots with a mean of 45 millilitres, so that is a total of 270 millilitres of paint.

Then the extra 80 millilitres of paint makes that a total of 350 millilitres of paint to be shared between seven pots, and each pot would have 50 millilitres once shared equally.

So the mean of the seven paint pots is 50 millilitres.
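The paint pot calculation can be written as a small Python function. This is only a sketch; the name `mean_after_adding` is made up for illustration and it assumes exactly one new value is being added.

```python
def mean_after_adding(old_mean, old_count, new_value):
    """Return the new mean after one extra value joins the data."""
    total = old_mean * old_count + new_value   # recover the old total, add the new value
    return total / (old_count + 1)             # share across one more item


# Six pots with a mean of 45 ml, plus one more pot of 80 ml.
print(mean_after_adding(45, 6, 80))  # 50.0
```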

That's pretty much the same solution that we had on the previous problem, but the wording changed to consider the mean.

So let's check what we've learned.

Lucas tracks his mobile phone usage.

After the first five days, his mean hours per day is 4.5.

The next day he spends 7.5 hours on his mobile phone.

What is the mean number of hours for the first six days? Pause the video while you work this out and then press play when you're ready for an answer.

Okay, let's take a look at this.

There are five days of 4.5 hours, so that's a total of 22.5 hours altogether.

If we then add the sixth day to it, the 7.5 hours, we get 30 hours for those six days altogether.

If we then divide that 30 by six, that's five hours per day.

So Lucas's new mean time spent using his mobile phone is five hours per day.

Two groups of 10 pupils have had the mean age calculated.

Group one has a mean age of 14 years, and group two has a mean age of 15 years.

What is the mean age of both groups combined? We're going to work through that together in a moment, but perhaps pause the video while you think about what you might do to work this out before pressing play to continue. Let's try and visualise this problem now.

Group one has a mean age of 14 years, and that doesn't mean that each person in group one is exactly 14 years old, but what we do know is if we add the ages together, we would get 10 lots of 14 which is 140.

And group two their mean age is 15.

So if we add all their ages together, that would be 10 lots of 15 which is 150.

So what is the sum of all their ages? Both groups combined would have a total of 140 plus 150, that is 290 years.

And then if we distribute the 290 years evenly amongst all 20 of these people, we get 14.5 years.

Here is Andeep.

He looks at this and notices that the mean for group one was 14, the mean for group two was 15 and the mean for all of them together was 14.5, and he thinks maybe we didn't need to do this many calculations because this answer is the same as doing the mean of the means.

If we take the 14 and the 15, add them together and divide by two, we also get 14.5.

Do you think Andeep's method will always work?

Well, not necessarily.

Andeep's method works because group one and group two both have the same number of people in them, but if they had different numbers of people, this method wouldn't necessarily work.

Let's take a look at that now with a slightly different example.

Two groups of people have had the mean age calculated.

Group one has eight people with a mean age of 15 years, and group two has 12 people with a mean age of 20 years.

What is the mean age of both groups combined? Well, here's Andeep with his shortcut method.

He says the mean of the means is 17.5 years old, which is found by adding 15 and 20 and then dividing by two.

Let's check Andeep's method by calculating the total combined age and then finding the mean of all 20 people the way we did previously.

Here we have group one and the mean age was 15 and there were eight people.

So the total of their ages is 120.

Here's group two, their mean age was 20 and there were 12 people.

So their ages sum to 240.

If we find the sum of both groups combined, that would be 120 plus 240, which is 360, and then if we divide that by the number of people, that would be 18.

So the mean age is 18 years.

So Andeep seemed pretty convinced that finding the mean of the means would work, but why does finding the mean of the means not work in this particular example? Pause the video while you think about this and press play when you're ready to continue.

The reason why was because the group sizes in this case were different.

That means these groups were contributing a different proportion of the total.

And what you may notice is that the mean for both groups combined is closer to the mean for group two than it is to group one.

This is because group two has more people, therefore it is contributing a greater proportion towards the total.
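To see the difference between the two methods side by side, here is a minimal Python sketch. The function name `combined_mean` and the way each group is stored as a (mean, size) pair are choices made for this illustration.

```python
def combined_mean(groups):
    """groups is a list of (mean, size) pairs, one pair per group."""
    total = sum(group_mean * size for group_mean, size in groups)
    count = sum(size for _, size in groups)
    return total / count


# Equal group sizes: the mean of the means happens to agree.
print(combined_mean([(14, 10), (15, 10)]))  # 14.5
print((14 + 15) / 2)                        # 14.5

# Different group sizes: the mean of the means does not agree.
print(combined_mean([(15, 8), (20, 12)]))   # 18.0
print((15 + 20) / 2)                        # 17.5
```

Multiplying each mean by its group size is what gives the larger group its greater share of the total.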

So let's check what we've learned.

The mean of the daily maximum temperature in Eastbourne has been calculated for June, July and August for 2022.

What is the mean daily maximum temperature of the three summer months? Pause the video while you work this out and press play when you're ready for an answer.

Well, what you can't do is just add together these three numbers and divide by three because June has fewer days than July and August.

So you can multiply each mean by the number of days in that month, add those together and divide by the total number of days to get 23.0 to one decimal place.

And here's another problem type.

A numerical dataset is represented by the algebraic tiles below.

Which of the summary statistics, mean, median, mode, range can we write an algebraic expression for? Pause the video while you think about this and press play when you're ready to continue.

Well, not all of them.

The median and the range need the data to be ordered.

The order will differ depending on the value of x.

For example, if x was one, then the dataset would be 24, 11, 12, three, three, and 19.

So the range is 21 and the median is 11.5.

Whereas if x was six, then the dataset would be 44, 16, 22, 18, 18 and 14.

So the range is 30 and the median is 18.

However, we can work out some.

The modal algebraic expression is clearly three x.

That's because there are two expressions with three x.

But one thing to bear in mind is if x was equal to 10 for example, then two x plus 10 would be equivalent to three x.

So finding the mode would be easier once the value of x is established.

And the mean algebraic expression can be calculated, as a sum can be found and then distributed across all six tiles. We could add together all these algebraic terms and then divide by six to get two x plus 10.
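As a rough check of these results, the Python sketch below takes the two data sets quoted above for x equal to one and x equal to six, and compares their range, median and mean with the expression two x plus 10. The dictionary name `tiles` is made up for illustration, and the tile expressions themselves are not reproduced here, only the values stated in the lesson.

```python
from statistics import mean, median

# Data sets quoted in the lesson for two values of x.
tiles = {
    1: [24, 11, 12, 3, 3, 19],
    6: [44, 16, 22, 18, 18, 14],
}

for x, data in tiles.items():
    data_range = max(data) - min(data)
    print("x =", x)
    print("  range: ", data_range)      # changes with x: 21, then 30
    print("  median:", median(data))    # changes with x: 11.5, then 18
    print("  mean:  ", mean(data))      # matches 2x + 10 both times
    print("  2x+10: ", 2 * x + 10)
```

The mean agrees with two x plus 10 for both values of x, while the range and median depend on how the values happen to be ordered.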

Let's check what we've learned.

Work out an expression for the mean of these six algebraic expressions.

Pause the video while you do that and press play when you're ready for an answer.

Two x plus five, feel free to pause the video if you want to check the working.

Okay, it's over to you now for task B.

This task has eight questions and here are questions one, two, and three.

Pause the video while you work through these and press play when you're ready for the next set of questions.

And here are questions four, five and six.

Pause the video while you work through these and press play when you're ready for the final questions.

And here are questions seven and eight.

Pause the video while you work through these and press play when you are ready for answers.

Okay, let's see how we got on.

Here are the answers to questions one, two, and three.

Pause the video while you check these against your own and press play when you're ready for more answers.

And here are the answers for questions four, five, and six.

Pause the video while you check these against your own and press play when you're ready for the rest of the answers.

And here are the answers to question seven.

Pause the video while you check them and press play when you're ready for the answers to question eight.

Here are the answers to question eight.

Pause the video while you check this and press play when you're ready to summarise today's lesson.

Fantastic work today.

Now let's summarise what we've learned during this lesson.

A list of data can be summarised using statistics.

For example, measures of central tendency, which include the mean, the mode, and the median.

Or we could have measures of spread.

A measure of spread or dispersion could be the range.

The larger the dataset, the less practical it is to calculate summary statistics by hand.

So spreadsheets and other software can be helpful as they can produce some statistics for huge data sets.

Thank you very much.

Have a great day.