Hello, I'm Mrs. Lashley and I'm gonna be working with you as we go through the lesson today.
I'm really hoping you're ready to try your best and make the most of this lesson.
Today's learning outcome is to be able to evaluate different statistical measures, which you've met before, and draw conclusions about a data set.
There are now two slides of words that you should be familiar with from prior learning, but you may wish to pause the video just so that you can reread them and make sure you feel okay before we start this lesson.
So this is the first slide of words.
Second.
So once again, you may wish to pause the video just so that you can read them and check you're okay before we move on.
So our lesson has got two learning cycles.
The first one is all about summary statistics.
We're gonna review what they are.
And then the second learning cycle is about drawing conclusions so that we meet our outcome for today's lesson.
So we're gonna make a start on summary statistics.
So on the screen there are four small data sets.
Small because really data sets are normally very large, full of many values, whereas these only have seven in them.
So because we can see the data sets, we know that they're fairly different from each other.
But what do the summary statistics tell us about these data sets? Well, the mean is one of our summary statistics.
It's a measure of central tendency and we find the mean by finding the sum of all the data and then dividing by how many pieces of data there are.
If we do that for these four data sets, what do we get? Well, they all have a mean of four.
So despite the fact that they're very different, which you can see from knowing the values within these small data sets, their measure of central tendency, the arithmetic mean, is the same: four.
So another summary statistic and also another measure of central tendency is the median and the median is the most central piece of data when the data is ordered.
All of our data sets are already in order.
So it's about finding that middle, that central piece of data.
And so here, for the first one, the median is four, and for the second set the median is also four.
The third set has a median of one, and the last one has a median of four.
So three of the four data sets have the same median.
So once again, we can see that these data sets are fairly different, but their two summary statistics, the mean and the median, have matched on three of the four of them.
So another summary statistic, and again another measure of central tendency, is the mode.
The mode is the most frequent piece of data.
So set one, all of them are four.
So we would say the mode is four.
Set two, none of the numbers, none of the values repeat.
There isn't a more frequent one than another.
So we'd say it has no mode.
Set three has got a mode of one.
Set four is bimodal.
Two and four both appear twice in the list.
So the mode is different in all the data sets.
So now we've seen that we've got up to three summary statistics and we start to see a difference now.
So if you only have the one summary statistic of the mean, it describes the data and suggests that they're all very similar data sets, but we know that they're different from each other.
And lastly, we have a range.
So this is a summary statistic, but it's not a measure of central tendency.
It's a measure of spread or dispersion.
It's looking at how varied the data is.
So we're gonna find that by doing the difference between the highest and the lowest values.
So on the first one they were all fours.
There was no range, they were not varied, they were exactly the same.
So the range is zero.
On the set two, the range is six.
On set three, the range is 15.
And on set four, the range is nine.
So we can see again that the range has given us some additional information that describes the differences between these four data sets.
I've already alluded to this, but the more summary statistics you have, the greater understanding of the dataset you'll gain.
Because if you didn't have the values, if you only had the summary statistics, and you only had the mean for all of them, you might think that the data sets have a very similar structure, a similar distribution.
But we can see having four summary statistics that actually they're quite varied and different.
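If you'd like to check the four summary statistics yourself, here is a minimal sketch in Python. The transcript doesn't list the values on the slide, so the four data sets below are hypothetical: they were chosen only so that they reproduce the means, medians, modes and ranges quoted above. The helper names `modes` and `summarise` are also just for illustration.

```python
from collections import Counter
from statistics import median


def modes(values):
    """Return the most frequent values, or an empty list if nothing repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # every value appears once, so there is no mode
    return sorted(v for v, c in counts.items() if c == highest)


def summarise(values):
    """Mean, median, mode(s) and range for a list of numbers."""
    return {
        "mean": sum(values) / len(values),       # total shared out evenly
        "median": median(values),                # middle value when ordered
        "modes": modes(values),                  # most frequent value(s)
        "range": max(values) - min(values),      # highest minus lowest
    }


# Hypothetical data sets: chosen only to match the statistics quoted in the
# lesson. They are NOT the actual values from the slide.
data_sets = {
    "set 1": [4, 4, 4, 4, 4, 4, 4],
    "set 2": [1, 2, 3, 4, 5, 6, 7],
    "set 3": [1, 1, 1, 1, 3, 5, 16],
    "set 4": [1, 2, 2, 4, 4, 5, 10],
}

summaries = {name: summarise(values) for name, values in data_sets.items()}

for name, stats in summaries.items():
    print(name, stats)
```

All four sets print a mean of 4, yet the median, the modes and the range differ, which is exactly why one summary statistic on its own isn't enough.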
So just a quick check, a true or false.
So two datasets have the same mean value.
This means the sets are identical.
True or false, so pause the video whilst you make a decision.
It might be that you want to go back over these last few slides.
Press play when you're ready to justify your answer.
So it's false.
We just saw we had four data sets all with the same mean value of four.
None of them were identical to each other.
They were all different data sets.
So what's the justification? It means that if the data was evenly distributed across the data set, it would be the same value.
On average, the data sets are the same.
So once again, pause the video and then, when you're ready to check, press play.
So it's the top one.
The mean value as a measure of central tendency is giving you a single value that describes the data.
This has been calculated by finding the total and then evenly distributing it across every data point.
So let's have a look at what the summary statistics mean in context. So a netball manager needs to select a goal shooter, and we're gonna use the initials GS, for the next match.
She's got a choice of two goal shooters.
The club statistician has provided some summary statistics about goals scored over the last 20 matches to help inform the manager's decision, because ultimately the role of the goal shooter in the team is to score the goals.
So we want to know how good they are at scoring goals.
The mean number of goals per match is the same for both of the goal shooters.
So goal shooter one has a mean of 21 and goal shooter two also has a mean of 21.
So based purely on this information, the manager could choose either of them for the next match because on average, by the mean, they score 21 goals per match.
The statistician also gives the range.
So the goal shooter one has a range of 18 and the goal shooter two has a range of seven.
So despite the fact that they both score on average, using the mean as our average, 21 goals per match, the range shows that goal shooter two is more consistent because their range is smaller, their data, the data that has been collected is less varied.
So they are potentially more reliable: they are more likely to score close to that 21.
So with this additional information, the manager is more likely to choose goal shooter two because they are more consistent.
We get another summary statistic and this time it's the mode.
So the goal shooter one's mode is 25.
So remember that means that the most frequent value is 25, whereas goal shooter two's is 20.
So goal shooter one has a higher modal value than goal shooter two.
So we know that they are less consistent because their range is larger, but actually their modal value is higher.
So their most frequent amount of goals scored is 25.
Using all of the summary statistics that the club statistician has provided, who would you select? Would you go for goal shooter one or goal shooter two? Pause the video whilst you think about that and come up with a decision and justify your reasoning and then press play.
So personally I think I would still go for goal shooter two.
I know that goal shooter one more frequently scores a higher number of goals.
But because of that inconsistency shown by the range, there are gonna be some matches where goal shooter one scores a lot lower than the mean of 21.
So I'm gonna go for goal shooter two personally because of their consistency and the fact that their mode of 20 is very close to their mean of 21.
And so I think I'm gonna get about 20 or 21 goals from goal shooter two in a match.
I'm not guaranteed that with goal shooter one; there's much more variety in their scores.
So here's a check.
Data has been collected and processed about the number of sweets per packet in two different multipacks.
So you've got the mean number of sweets per pack, the modal number of sweets per pack, and the range of sweets per pack.
So the company claims that there is an average of eight sweets per pack. Which multipack supports the claim the most? So pause the video.
Use those summary statistics to decide which of the multipacks supports the claim that there is an average of eight sweets per pack.
And press play when you're ready to check your answer.
So I went for multipack one, but there is a chance that you went for multipack two, and that's fine as long as you've justified it using the summary statistics. You might wanna do that with your partner if you're sat next to somebody, and discuss why you went for the one you went for.
So multipack one, both measures of average, which is the mean and the mode, are eight or greater, whereas on multipack two the mean was seven.
So that was lower than the claim. There is more consistency in multipack two because the range was smaller, but it is consistent around an average between seven and eight.
So actually that consistent average is gonna be less than the claim.
So multipack one meets the claim or gives more, and if anything you want more sweets.
So continuing, we've got Izzy here and her most recent test scores for maths.
So we can see we've got seven test scores.
She can represent them as a dot plot.
So each dot is above the value of her test score.
So we can see the 46 is her lowest, up to the 91, which was her highest test score, and she can work out the summary statistics: mean, median, mode and range.
So the mean is 72.
So you can see that's quite a central value because it's a measure of central tendency, it's not a score that she actually achieved.
And remember that's because this is taking the total of all of her scores and then evenly distributing it back out.
The median score is 77.
So because there are seven scores, that's gonna be the fourth piece of data, when in order.
The modal score, she got 86 twice.
So that's her modal score.
And her range is 45.
So that's the difference, how varied her scores are.
That's the difference from her best score of 91 down to her worst score, if you want to look at it like that, of 46.
And Jacob's got his most recent test scores for maths as well.
So again, we've got a list of data, seven pieces of data and he can represent it on a dot plot.
And just straight away, I'm hoping you can see there's a very different distribution of data compared to Izzy's.
So if he works out the summary statistics for his seven scores, his mean is 77, his median is 76, his mode is 73, but his range is much smaller.
So his modal value and median value were actually less than Izzy's, but the range is much smaller.
So if we look at both of them together, we can use the summary statistics to compare their scores.
So we've got their means and the median, mode and the range in a table.
And what does that actually mean? How can we interpret the differences between the two of them? So Izzy had a mean of 72 and Jacob had a mean of 77.
So Jacob's average, his typical score when it's evenly distributed across all tests, was higher.
Their central scores, their medians, were similar to each other.
77, 76 is not much between them.
Izzy's most frequent score was higher and Izzy's scores were more varied.
Jacob's were more consistent.
And you might be able to think about why that might be.
Perhaps Izzy was trying really hard at the start.
If you look, if we decide that this list of data is chronological, maybe she tried really hard, she revised really hard, got that 91 and then potentially became a bit complacent.
So her scores dropped off and then she sort of picked it up again at the end.
Whereas maybe Jacob has been consistent throughout all seven tests by revising the same way, or by changing the way he revises because he noticed that way works better for him, and therefore his range is much smaller.
And so perhaps that's why she has a modal score of 86 because at the start she was working hard for it and then became complacent.
So a check for you.
So these are two other pupils, Alex and Jun.
And we've got their summary statistics.
So which of these statements is accurate for comparing the mean of Alex's and Jun's scores? Pause the video whilst you read through your options and then press play to check.
So it's C. The mean is the average, or the one we usually mean when we use the word average, and that's because it spreads the data out evenly.
So the typical score, Alex's typical score was 75 compared to Jun's 67.
So it was higher.
If Alex's scores were all summed together and distributed equally, that mean score would be higher than if Jun did the same.
Another check, which of these statements is accurate to compare the range of Alex's and Jun's scores? Once again, pause the video, read the statements, make your decision and then press play to check it.
So it's A, the variety of scores.
The range is a measure of spread or dispersion.
It's looking at how varied the data set is.
And you can see that Alex's range is smaller: it's 15 compared to Jun's 33.
So we're onto the first task of the lesson.
So question one. In order to check her progress in maths from year seven to year eight in more detail, Aisha calculated the median and range of her scores in her weekly quizzes. So we're not comparing her to another pupil; we are comparing her to herself, just a year later.
And you need to match the summary statistic to its correct interpretation.
So press pause whilst you're doing that question and when you press play, we're gonna go through the answers.
So it was A and D. The median is that middle or central piece of data when in order.
So her middle score in year eight was higher than in year seven.
Her median in year eight was 18, her median in year seven was 17.
And B says her middle score, so that was the median, but it's got it the wrong way round.
It says that year seven was higher than year eight, which is incorrect.
Her scores were more varied in year eight than in year seven.
Well if it's about variety, that's to do with the range and that's wrong.
It was D, her scores were more varied in year seven than in year eight because the range was higher.
The higher the range, the more dispersed the data, the less consistent if you were thinking of it from like a sports point of view.
So eight is greater than six and therefore in year seven her scores were more spread out, they were more varied.
So we're now up to the second learning cycle and this one's all about drawing conclusions.
We touched a little bit on drawing conclusions in that first learning cycle, but we are really gonna focus on that as our outcome for this learning cycle.
So here we've got some data about two companies, Data Inc and Stats and Co.
The two data sets are showing the number of hours a set of employees work each week across two small businesses.
So they're small businesses, we've got small data sets, and it's about the number of hours that the employees work.
In which business did the employees work a greater number of hours per week? So from looking at those data sets, can you answer that? In which business, Data Inc or Stats and Co, did the employees work a greater number of hours per week? So in order to really compare the data, you've got to remember that data sets are often not this small.
Data sets are often very large and we'd probably use some sort of spreadsheet software to help us with the analysis. Because you cannot compare all the numbers in the raw data, we start to use our summary statistics.
So where possible, we want to use more than one.
And we saw that in the first learning cycle that when you have a mean, they can all have the same mean, but it doesn't mean the data sets are identical.
So the more summary statistics you have, the better your understanding and the fuller your picture of the real data set.
So let's go back to this question of which one has a greater number of hours per week.
So if we work out the average, which will be the mean, the typical number of hours worked for people in Data Inc is higher.
So it's got a mean of 33.9 compared to a mean of 32.3 in the Stats and Co company.
But Stats and Co had less variation, with a range of six compared to a range of 17.
So we are using, when we're drawing our conclusions, we're using at least one measure of central tendency.
So that's our mean, our median or our mode and a measure of spread, which for us at the moment, that is range.
So in conclusion, we could say that Data Inc works a greater number of hours on average per week, especially since their modal number of hours is also higher.
In the conclusion, we are adding another statistical summary to back up our claim.
So if we look at this, the parts of my conclusion, the first part is we're using a measure of central tendency and here it is the mean.
And we are giving the values, we're stating the actual value of the mean.
We're not just saying, oh, it's higher, we've calculated it and we're adding it into our conclusion.
We're then putting a measure of dispersion, which for us, as I say at this moment in time, that is the range.
In your future learning, you're gonna learn different measures of dispersion, but currently it's the range.
And then we're gonna conclude what we're trying to say and support it with a third summary statistic, which is going to be another measure of central tendency.
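That structure can be sketched as a small Python helper. This is only an illustration under stated assumptions: the function name `compare` and the dictionary layout are made up for this sketch, it doesn't handle ties, and it only uses the mean and range quoted above for Data Inc and Stats and Co (the modal values aren't given in the transcript, so the third supporting statistic is left out).

```python
def compare(name_a, stats_a, name_b, stats_b):
    """Build the first two parts of a conclusion: central tendency, then spread.

    Hypothetical helper for illustration only; assumes the two means differ
    and the two ranges differ.
    """
    pairs = [(name_a, stats_a), (name_b, stats_b)]

    # Part 1: a measure of central tendency, stating both values.
    (high_name, high), (low_name, low) = sorted(
        pairs, key=lambda pair: pair[1]["mean"], reverse=True
    )
    central = (
        f"{high_name} has the higher mean "
        f"({high['mean']} compared to {low['mean']})."
    )

    # Part 2: a measure of spread, again stating both values.
    (steady_name, steady), (varied_name, varied) = sorted(
        pairs, key=lambda pair: pair[1]["range"]
    )
    spread = (
        f"{steady_name} is less varied, with a range of "
        f"{steady['range']} compared to {varied['range']}."
    )

    return [central, spread]


# Values quoted in the lesson for the two businesses (hours per week).
lines = compare(
    "Data Inc", {"mean": 33.9, "range": 17},
    "Stats and Co", {"mean": 32.3, "range": 6},
)

for line in lines:
    print(line)
```

The point of the design is that each sentence quotes the actual values rather than just saying "higher" or "lower"; the final judgement, and the third statistic that backs it up, is still yours to write.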
So a check for you. The table of summary statistics shows the number of hours worked by the employees of two different businesses.
You've got All Stats and Infotech.
Which combination of these statements creates the correct comparison of these results? So pause the video, think about the structure of your conclusion and decide which ones would make the best conclusion for comparing these two companies.
Press play when you're ready to check.
So A, All Stats had a higher typical number of hours worked, so that's their mean value.
And then going on to C, when talking about the range, All Stats also had less variation in the number of hours worked.
And then we're concluding, therefore All Stats worked a greater number of hours.
So we've got a measure of central tendency.
Doesn't have to be the mean, but that's the one that was written on the check.
Then followed by a measure of dispersion, which is gonna always be range for us currently and followed by a conclusion.
And the conclusion could be even better if a final comparison was used.
So if we change that statement, we could have said, in addition, All Stats' most frequent value is 35 compared to Infotech's 31.
All Stats definitely work a greater number of hours.
So we've added an additional summary statistic to really back up our claim.
So conclusions can be made when comparing two data sets about the same context.
So that could be working hours of the two companies like we just saw or test results over a period of time.
However, they can also be used when interpreting the results of an investigation.
So, Andeep surveyed a sample of 12-year-old children about which genres of music they like to listen to.
So he's done it as a tally collection.
So he's got rock, country, pop, dance, R&B and classical.
And Sophia is putting together a playlist for the local elderly care home's communal room.
So she says, "Based on Andeep's data, I will mostly put pop, rock and dance on the playlist."
What could be wrong about Sophia's conclusion? Pause the video whilst you think about that and press play for us to move on.
I'm hoping you came up with something like: elderly people might not like the same genres of music as children.
Andeep's sample was 12-year-old children and she is trying to make a playlist for a local elderly care home.
So they might have different tastes in music.
So when we make conclusions from data investigations, we should think about the limitations of our findings.
We should take care not to overgeneralize our results.
So here are some examples.
Izzy says, "I surveyed 20 people and found that they spend an average of one hour, 32 minutes using their phones per day." Sam says, "This must mean that everyone in the world uses their phones for one hour, 32 minutes every day." So Sam's conclusion may not be valid because time using a phone may vary according to age, what country you are in and other factors.
So this is an overgeneralization from the results of somebody else's statistical investigation.
We should also consider things that might have biassed the investigation before we draw conclusions.
So Alex says, "I went to a swimming pool and surveyed a sample of adults about how much time they spend swimming per week.
The mean amount of time was two hours, 15 minutes." Does this mean that every adult spends around two hours, 15 minutes a week swimming? What do you think? No.
So the people at the swimming pool may swim more than the general population.
So the sample of people that Alex has surveyed may be biassed and therefore lead to false conclusions.
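To see how much a biased sample can distort a mean, here is a small Python sketch. Every number in it is invented purely for illustration: a hypothetical population of ten adults, where the three regular swimmers are the only ones Alex would meet at the pool. The group names and the helper `mean` are assumptions of this sketch, not part of Alex's survey.

```python
def mean(values):
    """Total of the values shared out evenly."""
    return sum(values) / len(values)


# Hypothetical weekly swimming times in minutes for a population of 10 adults.
# These figures are made up to illustrate the effect, not real survey data.
population = {
    "regular swimmers": [120, 150, 135],        # the people found at the pool
    "everyone else": [0, 0, 0, 30, 0, 0, 15],   # people who rarely or never swim
}

# Surveying at the pool only reaches the regular swimmers.
pool_sample = population["regular swimmers"]

# The whole population includes everybody, swimmers or not.
whole_population = [
    minutes for group in population.values() for minutes in group
]

pool_mean = mean(pool_sample)
population_mean = mean(whole_population)

print("mean from the pool sample:", pool_mean)              # 135.0 minutes
print("mean for the whole population:", population_mean)    # 45.0 minutes
```

In this made-up population the pool sample gives 135 minutes, which is two hours, 15 minutes, while the mean for everyone is only 45 minutes. The sample was fine for describing people at the pool; it just doesn't describe every adult.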
Continuing with where we might find pitfalls in our conclusions: data collected from a sample might only be valid for the sample, or for a population that is fully represented by the sample.
So here we've got an example of this.
Izzy says, "I surveyed four people from each year group in my school about how long it takes to get to school.
The mean was 15.4 minutes."
So is it fair to say that pupils in Izzy's school take around 15.4 minutes to get to school? Is that fair to say that based off of her data that she collected and has analysed? Yes.
So she surveyed a sample that represented the full school, because she took four people from each year group.
If she had only done four people from one particular year group, then that would be less representative of the whole population, which in this case is the pupils of that school and therefore you probably can't draw this conclusion.
But her sample was trying to be representative of the whole population and therefore we can probably extend it to say that this is what the average pupil would take at that school.
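One way to picture "four people from each year group" is the sketch below. It assumes the four pupils are picked at random within each year group, which the lesson doesn't actually specify, and the school roll (30 made-up pupils in each of years 7 to 11) and the function name `sample_per_year` are hypothetical.

```python
import random

random.seed(0)  # fixed seed so the sketch gives the same picks each run

# Hypothetical school roll: 30 pupils in each of year groups 7 to 11.
school = {
    year: [f"Y{year}-pupil{i}" for i in range(1, 31)]
    for year in range(7, 12)
}


def sample_per_year(roll, per_year=4):
    """Pick the same number of pupils at random from every year group."""
    return {
        year: random.sample(pupils, per_year)
        for year, pupils in roll.items()
    }


sample = sample_per_year(school)

for year, pupils in sample.items():
    print(f"Year {year}: {pupils}")
```

Because every year group contributes the same number of pupils, no single year can dominate the results, which is what makes it more reasonable to extend the conclusion to the whole school.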
Is it fair to say that school children in the UK will take around 15.4 minutes to get to school? What do you think about that one? So for the last one, it was fair to say that for the population of the school. Is it now fair to extend that to the population of all children that go to school in the UK? No.
So it doesn't represent all school children in the UK.
It represents this very localised population of this particular school.
And you can think about why that might be.
There are some pupils in the UK that will have to travel a very far distance.
Maybe they live in a village and they're at secondary school and they have to travel out to a secondary school.
It might be that people have relocated but haven't changed their school and so have to travel quite a far distance.
And the converse of that is there are some big cities that have many secondary schools, if we just focus on secondary schools, and their travel time will be very short because they don't have to travel very far.
So here is a check.
Izzy asks 10 of her classmates what their favourite pizza topping is.
The most common pizza topping was pepperoni.
Can we conclude that all pupils of Izzy's age like pepperoni the best? Pause the video and think about that.
And when you are ready to check, press play.
So no, the sample may not represent all pupils of Izzy's age.
And one of the reasons I've put, and you may have come up with your own, is she only asked 10 of her classmates.
And so therefore in that sample there may have been no vegetarians or vegans.
And so therefore, with pepperoni being a meat topping, the result might be biassed.
It might have been skewed by the fact that she didn't ask any vegetarians or vegans, who will have a different favourite topping.
Another check.
Sam went to a gaming cafe and asked all the customers how much time per week they spend playing board games.
The mean average was five hours.
Can we conclude that all adults will play about five hours of board games per week? Again, pause the video, decide if that's a yes or a no and justify why.
Press play when you're ready to check.
So no, this sample may not represent all adults as the sample was customers of a gaming cafe.
So it may be biassed towards adults that already play board games regularly and therefore the mean average may be too high for the general population of adults.
So up to the last task, it's got two questions to it.
So question one, you need to use the table of summary statistics.
So you need to complete the sentences using one of the two towns or a number.
Press pause whilst you're doing that question.
And then when you press play, we move to question two.
So question two shows data from the 2021 census about how people travel to work in Cambridge.
If we use this data from Cambridge as a sample, then we can conclude that 16.8% of people in the UK travel to work by bicycle.
Explain why this conclusion could be wrong.
So pause the video and then when you're ready to check your answers to question one and question two, press play and we'll go through them.
So here are the answers for question one.
So remember you were filling in the gaps in the statements based on the summary statistics.
It may have been the town or city that it was appropriate for, or one of those summary statistics.
So the mean number of days of air frost in Bradford is lower by 0.6 days.
So checking the summary statistics, we can see that Bradford had the lower mean, by 0.6.
The range of 13 in Yeovilton shows more variability in the number of days of frost each year.
So because it had a higher range, it's saying that the number of days of frost was more varied.
So in conclusion, there was more frost in Yeovilton supported by the fact that the middle value of seven is higher by 1.5 days.
So we can see that structure of the conclusion.
We've given a measure of central tendency, followed it up by the measure of spread or dispersion, and then concluded with an additional measure of central tendency.
Question two, you are gonna have potentially different answers and it might be that you need to check with the person next to you or your teacher whether yours is a valid explanation as to why the conclusion could be wrong.
So the sample only uses data from a single location.
So this data is only about Cambridge.
It's not about lots of representative cities or towns of the UK.
The UK proportion of who cycles to work is 2% based on the 2021 census.
You weren't to know that figure, but it just shows that Cambridge is a very special case for cycling to work.
And if you are somebody that's ever visited Cambridge, you'll know how many bicycles there are there.
And that's not the norm for a UK town or city.
So making a conclusion about the whole of the UK based on one single location is going to lead to false conclusions.
So in today's lesson, we have looked at what statistical summaries are, we've reminded ourselves about those.
So the measures of central tendency are mean, median and mode.
And then we have a measure of dispersion, which is our range.
And each of the summaries gives us a different insight into the data set.
And the more statistical summaries that you can calculate or you know, the better the understanding of the dataset that you are working with.
For the conclusions that you draw, you need to be mindful of how the data was collected and the limitations of the sample.
So not to overgeneralize, think about if there's any bias within the sample and just being very mindful that you're not making a false conclusion.
Really well done today.
I look forward to working with you again in the future.