Hello, my name is Dr. Rolandson.

I'm delighted that you'll be joining me in today's lesson.

Let's get started.

Welcome to today's lesson from the unit of numerical summaries of data.

This lesson is called problem solving with numerical summaries, and by the end of today's lesson, we'll be able to use our understanding of various statistical summaries to solve problems. Here's a reminder of some keywords that you may be familiar with and I'll be using again in today's lesson.

This lesson contains two learn cycles.

In the first learn cycle, we'll be critically evaluating statistical conclusions, and in the second learn cycle, we'll be considering the context behind statistical conclusions.

But let's start with learn cycle one, critically evaluating statistical conclusions.

Statistical conclusions can be made when we compare two datasets.

Datasets can be compared by considering the central tendency and the spread of each group.

Central tendency can be described using an average such as the mean, the median, or the mode, and spread can be described using the range.
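
As a rough illustration of those four measures, here is a minimal Python sketch. The scores are invented for this example and are not the lesson's data, and the function name `summarise` is hypothetical.

```python
from statistics import mean, median, multimode

def summarise(scores):
    """Return the three averages and the range for a list of numbers."""
    return {
        "mean": mean(scores),
        "median": median(scores),
        # multimode returns a list, since data can have more than one mode
        "modes": multimode(scores),
        "range": max(scores) - min(scores),
    }

# Hypothetical scores, invented for this sketch.
example_scores = [46, 50, 53, 53, 58]
summary = summarise(example_scores)
print(summary)
```

The mean, median, and mode each describe central tendency, while the range is the only one of the four that describes spread.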

With each comparison, it is helpful to explain what the results mean in context.

Here, we have two dot plots.

The dot plots show the scores for two teams on a video game.

So in the top dot plot it's for Team A, and we've got a scale from 43 up to 63, and each dot represents a person.

So one person got 46 points.

And we've got the same for Team B.

Make two comparisons between the teams' video game scores.

Pause the video and have a think about what sorts of things we might compare between these two teams and then press play when you're ready to continue.

One thing that you may want to compare between these two datasets is the central tendency, and we've got a few different options for what type of average we can use to compare central tendency.

For example, we could say that the mean for Team A, which was 53.2, is greater than the mean for Team B, which is 50.2, and then explain what that means in context.

In this context, it suggests that Team A scored better on average than Team B.

This response is very thorough and very clear.

It tells the reader all they need to know without necessarily looking at the graphs.

It says which measure is being used, it says the mean.

It says what those means are in the text as well in the brackets.

So you know Team A's mean and Team B's mean without looking at the data again.

It says what the comparison is.

It says Team A is greater than Team B, and it also explains what that means in context.

It says about scores because this is about a video game score, and that's also really important because sometimes higher numbers tend to be a good thing, but sometimes in some contexts, the lower number might be a good thing.

For example, in golf, it is good to get a lower score rather than a higher one.

That wasn't our only option for comparing central tendency though.

Another one could have been to use the median, so we could say the median of Team A, which is 53, is greater than the median of Team B, which is 50.

And again, explain what it means in context.

It suggests that Team A scored better on average than Team B.

Or we could use the mode as our average and say the mode of Team A, which is 53, is greater than the mode of Team B, which is 48.

And again, explain what it means in context.

This suggests that Team A scored better on average.

So that's one comparison, comparing a central tendency.

All three of those options kind of say the same thing.

They tell us where the middle of the data is or the typicality of the data.

Let's say something different now.

Let's compare the spread.

We can do that by looking at the range.

We can say the range of Team A, which is 14, is smaller than the range of Team B, which was 18.

And explain in context that this suggests that Team A's scores were more consistent.

They were closer together.

These answers can then be combined to make a complete solution to this question by choosing one of the averages and the range to compare.

The reason why we just use one of the averages is 'cause they all kind of say a similar thing about the data.

They all say about central tendency.

So we say something about central tendency and we say something about spread.

That's two distinct comparisons.

Whereas if we make two comparisons using two averages, they kind of are saying the same thing twice.

So for example, our solution could be the mean of Team A, 53.2, is greater than the mean of Team B, 50.2, suggesting that Team A scored better on average.

And the range of Team A, 14, is smaller than the range of Team B, 18, suggesting that Team A's scores were more consistent.
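
The structure of that two-part answer (name the measure, state both values, say which is greater) can be sketched in Python. This is only an illustration: the function name `compare` is hypothetical, the summary values are the ones quoted in the lesson, and the step of explaining the result in context still has to be written by a person.

```python
def compare(measure, name_a, value_a, name_b, value_b):
    """Return a sentence naming the measure, both values, and the comparison."""
    if value_a > value_b:
        relation = "is greater than"
    elif value_a < value_b:
        relation = "is less than"
    else:
        relation = "is equal to"
    return (
        f"The {measure} of {name_a} ({value_a}) {relation} "
        f"the {measure} of {name_b} ({value_b})."
    )

# One average and one measure of spread give two distinct comparisons.
average_sentence = compare("mean", "Team A", 53.2, "Team B", 50.2)
spread_sentence = compare("range", "Team A", 14, "Team B", 18)
print(average_sentence)
print(spread_sentence)
```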

Here's another example.

Jacob has made a comparison between two teams. We've got Team C represented in the top dot plot and Team D represented in the bottom one.

Jacob says, "Team C scored higher than Team D on average." What evidence would support Jacob's claim? Pause the video and have a think about what statistical measure we could use to support Jacob's claim, and then press play when you're ready to continue.

We could evidence Jacob's claim by saying the median for Team C, which was 50, is higher than the median for Team D, which was 49, and that would suggest that Team C, yeah, it did score higher on average than Team D.

On the other hand, what evidence would counter Jacob's claim? What statistical measure could you use to argue that Jacob is wrong? Pause the video.

Have a think about this and press play when you're ready to continue.

If we wanted to counter Jacob's claim, we could say that the mean for Team D, which is 50.5, is higher than for Team C, which is 49.5.
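
To see how the median and the mean can point in opposite directions, here is a small Python sketch. The two lists are invented so that the averages disagree; they are not the scores from the lesson's dot plots.

```python
from statistics import mean, median

# Hypothetical scores, invented so that the two averages disagree.
team_c = [40, 49, 50, 50, 51]
team_d = [47, 48, 49, 55, 61]

# The median supports "Team C scored higher on average"...
median_favours_c = median(team_c) > median(team_d)
# ...while the mean supports the opposite conclusion.
mean_favours_d = mean(team_d) > mean(team_c)

print("Median higher for Team C:", median_favours_c)
print("Mean higher for Team D:", mean_favours_d)
```

In this made-up data, one low score pulls Team C's mean down without moving its median, which is why the choice of average changes the conclusion.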

Now we have Laura who has made a comparison between these two teams. And she says, "Team D's scores are more consistent than Team C's scores." What evidence could be used to support Laura's claim? Pause the video, have a think about it, and press play when you're ready to continue.

We could support Laura's claim by saying how the range for team D, which is 15, is smaller than the range for team C, which is 21.

The range being smaller suggests that the points are all closer together.

That means the data's more consistent.

If we wanted to try and argue with Laura's claim, what could we say based on what we can see in this dot plot? Pause the video and think about how you might argue against Team D's scores being more consistent, and then press play when you're ready to continue.

While the range for Team C is bigger than Team D, we could argue against that showing consistency by saying how Team C's range is only larger because of these two very extreme values.

If the spread was measured in a different way that didn't use these two extreme values, then Team C would probably be more consistent, because all the rest of its data is between 46 and 53, whereas the majority of the data for Team D is spread over a wider interval, between 45 and 60.
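
One possible way to measure spread without the extreme values is to drop the lowest and highest score before taking the range. That choice is an assumption made for this sketch, not a method named in the lesson, and the helper name `trimmed_range` is hypothetical. The scores are also invented: only the full ranges of 21 and 15 are taken from the lesson.

```python
def full_range(scores):
    """Ordinary range: highest value minus lowest value."""
    return max(scores) - min(scores)

def trimmed_range(scores, drop=1):
    """Range after discarding the `drop` lowest and `drop` highest values."""
    if len(scores) <= 2 * drop:
        raise ValueError("not enough values left after trimming")
    kept = sorted(scores)[drop:len(scores) - drop]
    return max(kept) - min(kept)

# Hypothetical scores, built so the full ranges match the lesson (21 and 15).
team_c = [35, 46, 47, 48, 50, 50, 51, 52, 53, 56]
team_d = [45, 46, 48, 49, 49, 52, 55, 58, 59, 60]

print("Full ranges:", full_range(team_c), full_range(team_d))
print("Trimmed ranges:", trimmed_range(team_c), trimmed_range(team_d))
```

With these invented scores the full range makes Team D look more consistent, while the trimmed range makes Team C look more consistent, which mirrors the argument against Laura's claim.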

Let's check what we've learned there so far.

The table shows summary statistics for the scores for two teams on a video game.

Which measures show how highly each team scored? The options are A, the mean, B, the median, C, the mode, and D, the range.

You can choose more than one.

Pause the video, make your choices, and press play when you're ready to continue.

The answer is the mean, median, and mode all show how highly each team scored because they give a sense of the central tendency of the data.

Which team scored higher on average? Is it Team A, Team B, or they both scored equally high? Pause the video, make your choices, and press play when you're ready for an answer.

The answer is Team A scored higher on average, and we can see that because the mean, median, and mode for Team A are all greater than the mean, median, and mode for Team B.

Which measure shows how consistent each team's scores were? Is it mean, median, mode, or range? Pause the video, make a choice, and press play when you're ready to continue.

The answer is D.

The range is a measure of spread, so it indicates how consistent or varied a dataset is.

So which team had the most consistent scores? Is it Team A, B, or they were equally consistent? Pause the video, make a choice, and press play when you're ready for an answer.

The answer is Team B.

Team B has the smallest range.

Therefore, the two end points of the data, the highest and the lowest, are closer together than with Team A.

Therefore, the scores are more consistent with Team B.

Okay, it's over to you now for task A.

This task contains two questions with each question presenting you with a context and some data.

In question one, the table shows a statistical summary of data taken from The Met Office from 1941 to 2022.

The data shows the amount of rainfall per month in Durham and Lerwick.

You need to make two comparisons between the amount of rainfall in Durham and Lerwick based on this data.

Make sure those comparisons are distinct based on what's been discussed in this lesson.

You need to write a sentence or two for each comparison.

Pause the video, have a go, and press play when you're ready for question two.

And here is question two.

Two classes practise throwing basketballs into a hoop.

Each person had 20 attempts and the dot plot shows the number of points each person scored.

Lucas and Aisha used the data to make some conclusions.

Lucas says, "Team B scored higher than Team A on average," and Aisha says, "Team A's scores were more consistent." You've then got three questions to consider.

For each one, write down a sentence or two and provide the data that backs up your sentence.

Pause the video, have a go at this, and press play when you're ready for answers.

Well done with that.

Let's now go through some answers.

Question one, make two comparisons between the amounts of rainfall in Durham and in Lerwick.

One of these comparisons needs to be using an average and the other one should be looking at the spread.

So the mean of Lerwick, which is 98.3, is greater than the mean of Durham, which is 53.6, meaning that Lerwick got more rainfall on average.

Some things to look for here, have you explicitly stated which average you're using? In this case, it's the mean here.

Have you said what those averages are? Have you made a comparison and said that Lerwick is greater than Durham, and have you explained what it means in the context of this data? This data is about rainfall.

If you don't mention rainfall in your response, then that's not putting it into context.

So that word rainfall hopefully is in your answer somewhere.

Now, we've used the mean here.

We could also use the median instead and make the same point, but using 48.4 and 92.

The second comparison looking at spread, we could use the range.

The range for Durham, 194.2, is less than the range for Lerwick, which is 301.7, meaning that Durham's rainfall is more consistent or more predictable.

Again, we are explicitly stating which measure we're using, the range, we're making a comparison, saying one is less than the other, and we're putting it in context.

So hopefully one of your responses is using an average.

That could be the mean or it could be the median and the other one is using the range.

Let's go through now question two.

So Lucas said Team B's scores were higher than Team A's on average and we wanna give evidence to support Lucas's conclusion.

We could give evidence by saying that the median for Team B, which is 15, is greater than the median for Team A, which is 14.

If we want to counter Lucas's conclusion for part B, we could say the mean for Team B, which is 12.5, is less than the mean for Team A, which is 14.1.

And Aisha says Team A's scores were more consistent.

If we want to give evidence to support Aisha's conclusion for part C, we can say the range for Team A, which is three, is less than the range for Team B, which is 17.

Great work so far.

Let's now move on to learn cycle two, which is considering the context behind statistical conclusions.

When examining statistical summaries such as averages, it can be helpful to consider the context that the data was collected in.

That's because data collected in another context may result in different results.

And that means results from one context cannot always be compared to the results from a different context.

For example, in the 2021 to 2022 football season, Chelsea won the Women's Super League, scoring an average of 2.91 goals per game.

Sofia makes a conclusion based on this data.

She says, "Wow! In that same year, my school team scored an average of 3.7 goals per game.

That means our team was better than Chelsea." Do you agree with Sofia's conclusion? Pause the video, have a think about this, and press play when you're ready to continue.

Hopefully you're thinking no.

Sofia's team is not necessarily better than Chelsea's based on this data because Sofia's team and Chelsea play in different leagues, so it's very different contexts.

Chelsea scored fewer goals on average, probably because they were playing better opponents, with all respect to Sofia's team and their opponents.

Here's a different scenario.

Jun is looking at his test scores from different years at school.

He says, "When I was 11 years old, my average test score was 72%, and when I was 14 years old, my average test score was still 72%.

That must mean I did not improve in this subject for three years."

Do you agree with Jun's conclusion? Explain why you agree or disagree.

Pause the video, have a think, and press play when you're ready to continue.

Hopefully you're thinking no, you don't agree.

The tests happened during different school years, so they would've at least contained different questions or most likely would've contained different questions.

And the test that Jun did at age 14 would've probably been more difficult than the test he did when he was 11 years old.

So the fact he scored the same percentage on these two tests, where one is harder than the other because a bit more time has passed, might actually mean that he has improved.

It's just that the tests are getting harder at the same time.

So it's hard to compare the percentages of these two tests because they happened in very different contexts.

Here's another scenario.

Aisha competes in a 100-meter race competition each year.

She says, "In last year's final, I got a silver medal for coming in second place.

And in this year's final, I got a gold medal for coming in first place.

I must have run faster this year than I did last year." Do you agree with Aisha's conclusion and can you explain why you agree or disagree? Pause the video, have a think, and press play when you're ready to continue.

Hopefully you're thinking, no.

She might have run faster this year than last year, but that's not necessarily the case based on what she said.

We need to remember that when you run a race, you run against opponents and you get a gold medal or you come first place because you run faster than the other people you're against.

So the reason why she only got second place last year might have been because she ran against faster opponents last year than she did this year.

This time Aisha says, "In last year's competition, my average race time was 14.1 seconds.

In this year's competition, my average race time was 13.7 seconds.

I must have run faster this year than I did last year." Do you agree with Aisha's conclusion this time? And can you explain why? Pause the video while you think and press play when you're ready to continue.

Hopefully this time you're thinking yes.

That's because the data that Aisha's provided supports the conclusion she's made.

Aisha's average time was quicker this year than it was last year, and that's regardless of who her opponents were.

It's the time of her race this time.

So the fact it's a shorter time means she must have run faster regardless of who she's against.
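
The link between a shorter time and a faster run comes from speed being distance divided by time. A minimal Python sketch, using the 100-metre distance and the two average times Aisha quotes (the function name `average_speed` is hypothetical):

```python
def average_speed(distance_m, time_s):
    """Average speed in metres per second."""
    return distance_m / time_s

# Same distance both years, so only the time changes.
speed_last_year = average_speed(100, 14.1)
speed_this_year = average_speed(100, 13.7)

print("Faster this year:", speed_this_year > speed_last_year)
```

Because the distance is the same in both years, the comparison does not depend on anyone else in the race, unlike the medal positions.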

Let's check what we've learned there.

True or false? Conclusions to investigations should take into account the context which the data was collected in.

Is that true or false? And your justifications are, A, the context can distract you from the numerical results of the investigation.

And B, the context can help you see whether there might be an alternative explanation for the results in the investigation.

Pause the video, make your choices, and press play when you're ready for an answer.

The answer is true, and that's because the context can help you see whether there might be an alternative explanation for the results of the investigation.

For example, with Aisha's race, the context might say the reason why she only got a silver medal last time was 'cause she was against faster opponents.

Okay, it's over to you now for task B.

This task contains four questions.

In each question, someone will present you with some data and the conclusion, and what you need to do is explain why their conclusion could be wrong.

Do this by writing a sentence or two.

Really take into account the context from which they've made their conclusion.

Here are questions one and two.

Pause the video while you have a go and then press play when you're ready for the next questions.

And here are questions three and four.

Pause the video while you have a go and press play when you're ready for some answers.

Great job with that.

Let's now work through these one at a time.

Question one, Alex has represented his school and his county in swimming races.

He says, "When I raced for my school, I got a gold medal.

But then when I raced for my county, I only got a bronze medal.

That must mean I swam slower when racing for my county." Explain why he could be wrong.

Well, Alex may still have swum at the same speed but against faster opponents when racing for the county rather than the school.

Question two, Laura's looking at her school reading book records.

She says, "When I was six years old, I read an average of 9.2 books per month.

I now read an average of 1.4 books per month.

I must be a slower reader now than I was when I was six years old." Can we think why she might be wrong here? Well, books are different lengths, and the chances are the books that Laura read when she was six years old were probably much shorter than the books she's reading now.

So she's not necessarily reading slower, she's just reading bigger books.

Question three, a survey found that 18% of workers in the town of Scarborough walk to work and that 7% of workers in the city of London walk to work.

Izzy says, "There are more people walking to work in Scarborough than in London." Why is Izzy wrong? Well, one reason could be that the populations of cities are usually greater than the populations of towns.

And you can see it in the wording of the question, it says the town of Scarborough and the city of London.

So 7% of a large population could actually be more than 18% of a small population.

If you think of a really, really big number and find 7% of it, then think of a small number and find 18% of that, you might find that your 7% is bigger than your 18%.
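
Here is that idea with numbers, as a small Python sketch. The two workforce sizes are round figures invented for illustration and are not real data for Scarborough or London; the function name `count_from_percentage` is hypothetical.

```python
def count_from_percentage(percent, population):
    """Number of people that `percent` per cent of `population` represents."""
    return population * percent / 100

# Hypothetical workforce sizes, chosen only to show the effect.
small_town_workers = 50_000
big_city_workers = 4_000_000

walkers_town = count_from_percentage(18, small_town_workers)
walkers_city = count_from_percentage(7, big_city_workers)

print("Town walkers:", walkers_town)
print("City walkers:", walkers_city)
```

With these made-up sizes, the smaller percentage still represents far more people, so a percentage alone cannot tell us which place has more walkers.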

Question four, a survey in 1971 found that 15% of workers in Liverpool travelled to work by bus.

And a survey in 2023 found that 11% of workers in Liverpool travelled to work by bus.

Jacob says, "Liverpool does not need to run as many buses as they did in 1971 because there are fewer people using them to travel to work." Why might Jacob be wrong here? Well, the population of Liverpool may have increased since 1971.

It's kind of a similar situation to question three.

11% of a large population could be more people than 15% of a small population.
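
We can also ask how much the number of workers would need to grow for 11% now to be as many people as 15% was then. The sketch below is only an illustration: the function names are hypothetical, and the two workforce sizes are invented, not real figures for Liverpool.

```python
def count_from_percentage(percent, population):
    """Number of people that `percent` per cent of `population` represents."""
    return population * percent / 100

def growth_needed(old_percent, new_percent):
    """Growth factor at which new_percent of the new population equals
    old_percent of the old population."""
    return old_percent / new_percent

factor = growth_needed(15, 11)  # roughly 1.36

# Hypothetical workforce sizes: growth of 1.4 times, just above the factor.
workers_1971 = 100_000
workers_2023 = 140_000

bus_users_1971 = count_from_percentage(15, workers_1971)
bus_users_2023 = count_from_percentage(11, workers_2023)

print("Growth factor needed:", round(factor, 2))
print("More bus users in 2023:", bus_users_2023 > bus_users_1971)
```

So Jacob's conclusion only holds if the number of workers grew by less than about 1.36 times, and the survey percentages alone do not tell us whether it did.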

Fantastic work today.

Here's a summary of what we've learned in this lesson.

Comparisons can be made between datasets by using statistical summaries such as averages and the range.

However, the validity of those statistical conclusions should be critically evaluated both by the researcher themselves who are making those conclusions as part of the data handling process, but also by anyone who might then use those conclusions to take actions elsewhere.

That's because statistical summaries may be selected or omitted when the data is used in the real world.

And in order to make accurate conclusions to a statistical evaluation, we really need the data and knowledge about how it was collected and the conditions it was collected in, such as the context.

Well done today.