Lesson video

Hello everyone, and thank you all for joining me, Mr. Gratin, in a sampling lesson.

Well, we will look at some limitations of a sample including sample size and bias.

Pause here to check some of the keywords that we'll be using today.

First up, let's have a look at how our conclusions may change or not as we collect different samples from a population.

An investigation collects data on 15 restaurants in Oakfield.

Out of the 15 restaurants, six of them reported a decrease in profits over the last year.

So, our conclusion is 40% of all restaurants in Oakfield suffered a drop in profits.

Laura says that sampling is great, because our representative sample can be generalised to the rest of the population, but Alex isn't so sure.

What if you collected a different sample and got different results?

Surely the conclusions will change if the data is different.

If you conduct an investigation with a random sample and then repeat the same investigation again with the exact same population and sampling techniques, but with a different random sample of people or things being chosen, then there is still no guarantee that the two samples will look the same.

This means any conclusions you make may change if you take a second, different sample.

Let's use a dice rolling investigation to demonstrate this.

Imagine rolling a dice with integers one to six a total of 30 times.

Here is a dot plot showing the outcomes of this investigation.

There are 30 dots, because we rolled the dice 30 times.

From this particular sample, we could make a conclusion that the dice is very biassed towards six appearing more frequently and biassed against four appearing at all.

But what if we roll the exact same dice another 30 times, giving these results?

Unlike the first investigation, for this second sample we could conclude that the dice is very biassed towards the outcome of three appearing and actually, against six appearing instead.
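
The two 30-roll investigations can be imitated with a short simulation. This is only a sketch, assuming a fair six-sided dice modelled with Python's `random` module; the function name `roll_sample` and the seed values are illustrative and are not part of the lesson, and the counts it prints will not match the lesson's dot plots.

```python
import random
from collections import Counter

def roll_sample(n_rolls, seed):
    """Roll a fair six-sided dice n_rolls times and count each outcome."""
    rng = random.Random(seed)  # seed is an assumption, used only so a run is repeatable
    rolls = [rng.randint(1, 6) for _ in range(n_rolls)]
    counts = Counter(rolls)
    # Include outcomes that never appeared, with a count of zero
    return {face: counts.get(face, 0) for face in range(1, 7)}

sample_one = roll_sample(30, seed=1)
sample_two = roll_sample(30, seed=2)
print(sample_one)
print(sample_two)
```

Trying a few different seeds tends to give noticeably different counts each time, even though the simulated dice never changes.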

This variance, or natural difference in the data gained from different samples, also appears when collecting a sample from a population such as a classroom of students.

Grab a calculator and we can demonstrate this now.

Imagine every student in a classroom generating a random number using their calculator like so.

Select the catalogue button, choose Probability, then scroll down and select Random Number.

Press execute to generate a random number.

The teacher in the class could choose a sample of five students to say the value of a random number they generated.

If the teacher picked a different five students from the class, is it likely that the same five random numbers would be given?

I very much doubt it.

Both Jacob and Aisha conduct an investigation.

Jacob asked several of their friends and all of them said hockey, whereas Aisha asked 10 of her friends and most of them said rugby.

Pause now to consider whose conclusion to this investigation is correct.

It is impossible to tell which conclusion is correct, or if either is even close to being correct, but why?

Pause here to consider which of these statements helps explain why it's impossible to tell whose conclusion is correct.

Every sample is different and can lead to different conclusions.

Furthermore, both samples were very biassed, because they asked only certain groups of people rather than a random sample of people from the whole population, from the whole group of people who go to the school.

Everything that we've said so far seems pretty negative.

Laura sees all of these criticisms and asks, "What's even the point in taking a sample if you can't even generalise the results?"

But Alex has hope.

What if the data collection method changed?

What if a larger sample size helps improve how much we can generalise the results?

Let's demonstrate what Alex is trying to say by going back to our previous investigations where we rolled a dice 30 times.

30 times being quite a small sample size.

The data collected from a small sample size can be impacted a lot by a few anomalous results.

For example, a few extra rolls landing on three in sample two makes the data seem incredibly biassed, but in reality that difference is only three or four more rolls landing on a particular number.

That's quite unlikely, but not that unlikely to happen by random chance, especially with a small sample size.

So, imagine our investigation took 500 rolls instead of 30.

Our sample could look like this.

We could conclude that there is very little variance between each outcome.

It'd be hard to justify that any outcome is significantly more or less likely than another.

We could conduct this investigation again to get this sample.

Outcomes four and six seem slightly less likely than the others, but the likelihood of each outcome remains fairly equal.

Actually, in reality, all four investigations used the same fair dice, meaning that each outcome is equally likely to occur.

Which of these two investigations is more representative of a fair dice?

Pause now to think about or discuss the observations for sample two and three.

The left-hand sample three certainly looks more fair.

A few extra rolls of a particular number have proportionally less impact when the sample size is large.

However, the sample size is small in sample two, and therefore those extra few rolls of a certain number make a massive proportional difference.

Let's put some numbers to this.

In sample three, the difference in frequency between the most frequent outcome of five and the least frequent outcome of six is only a few percent.

However, in sample two, the most frequent outcome of three is 400% more likely than the least likely outcomes of one or six.

Now, that is a massive proportional difference.
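
The "percent more" comparison can be written out as a small calculation. This is a sketch only: the function name `percent_more_frequent` is made up for illustration, and the counts used below are hypothetical values chosen to show how a 400% figure can arise, not the data from the lesson's dot plots.

```python
def percent_more_frequent(most, least):
    """Percentage by which `most` exceeds `least`, relative to `least`."""
    if least <= 0:
        raise ValueError("least must be positive to compare proportionally")
    return (most - least) / least * 100

# Hypothetical counts, NOT the lesson's data:
# one outcome rolled 10 times, another rolled only 2 times
print(percent_more_frequent(10, 2))  # 400.0
```

With so few rolls, a gap of only eight rolls is enough to produce a very large percentage.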

But still note, even though sample three looks a lot closer to representing a fair dice there is still some evidence of variance between each outcome.

A sample could never be guaranteed to accurately represent a population or sample space.

It may happen to be a perfect representation, but this can only happen by chance.

In conclusion, the larger the size of a sample is, the more likely the sample will accurately represent a population by minimising the impact of a few anomalous results.

However, a large sample is still not guaranteed to perfectly represent a population and may still be vulnerable to variation and therefore two different large samples may still look different from each other.
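
This conclusion can be explored with a simulation that repeats the dice investigation many times at each sample size. It is a rough sketch, assuming a fair dice modelled with Python's `random` module; the helper names `frequency_spread` and `mean_spread`, the 200 repeats, and the seed are all illustrative choices rather than anything from the lesson.

```python
import random

def frequency_spread(n_rolls, rng):
    """Return the gap between the highest and lowest relative frequency."""
    counts = [0] * 6
    for _ in range(n_rolls):
        counts[rng.randint(1, 6) - 1] += 1
    freqs = [c / n_rolls for c in counts]
    return max(freqs) - min(freqs)

def mean_spread(n_rolls, trials=200, seed=0):
    """Average the gap over many repeated samples of the same size."""
    rng = random.Random(seed)  # seed is an assumption, for repeatable runs
    return sum(frequency_spread(n_rolls, rng) for _ in range(trials)) / trials

small = mean_spread(30)
large = mean_spread(500)
print(f"30 rolls:  average spread {small:.3f}")
print(f"500 rolls: average spread {large:.3f}")
```

On average the gap is smaller for 500 rolls than for 30, but it does not reach zero, which fits the point that even a large sample is not guaranteed to represent the population perfectly.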

Jacob and Aisha conduct their investigation for a second time.

Pause here to consider whose sample is more likely to be representative of the population.

Jacob's methodology certainly looks better, but why?

Pause here to identify the correct reasons.

Jacob's sample size was bigger at 94 versus 40 and it was a random sample versus Aisha's sample from only the netball club.

And pause here to think about or discuss what advice you could give to Aisha to make her sampling method better.

Aisha could collect a larger sample.

Let's beat Jacob by collecting one that is over 100 students.

We can also make the sample random so it's more likely to be representative of the whole population.

Okay, great stuff. Onto the practice task.

For question one, pause here to consider reasons for and the impact of the different samples and conclusions of Izzy's and Jun's investigations.

Okay, onto question two part a.

Pause here to explain whose data collection method was better, Lucas' or Andeep's.

And finally, part b, pause here to explain why Andeep's comment about sample size is incorrect.

Great effort on all of the thinking you've done so far about these limitations.

The answer to question 1a: different samples may always lead to different conclusions.

And for part b, under the assumption that both samples were randomly collected, then Izzy's larger sample size is more likely to represent the population.

For question 2a, Andeep's larger sample size is better.

You can identify the sample size of each sample by reading off the y-axis for each bar in each bar chart.

And for part b, whilst both samples led to similar results, the results from a sample with a larger sample size are more likely to represent the population.

The conclusions are more trustworthy, because there is less risk of influence from a small number of anomalous results.

Lucas's smaller sample size meant his results could have been influenced by these anomalous results and so are less likely to be trusted.

Yes, this wasn't the case this time, but there is always still a risk of a small sample size being greatly influenced by anomalous results.

We've referenced the influence of bias a little bit already, but how can an investigator influence the presence or absence of bias?

Well, let's have a look.

Sophia, the keen bean, is really excited to ask other students at Oakfield Academy what their opinions are for a new club that Sophia wants to organise.

Sophia is even sensible enough to collect a sample from all year groups to make the sample seem fairer.

But Laura thinks Sophia's excitement might influence how people respond to Sophia's investigation.

How an investigator interacts with a person as they conduct a survey may change what sort of information people are willing to give.

For example, if an investigator looks angry or provoked, the people being surveyed may feel uncomfortable and not want to give answers, or may give short and inaccurate answers in order to get out of there pretty quickly.

On the other hand, a friendly-looking investigator is more likely to receive accurate responses, because there's less reason for people to give false information or refuse to answer.

However, if an investigator looks too excited or enthusiastic about their investigation, like Sophia was earlier, then a person being surveyed may feel uncomfortable saying anything negative, for fear of upsetting the otherwise very keen and happy investigator.

If all the responses are positive, because of this, then what really was the point in collecting a sample in the first place?

There are many factors that could influence the quality or quantity of responses to a survey, many of which are situational.

These include a person's opinion about the investigator.

For example, some people may not want to answer questions given by a teenager and others may give intentionally misleading answers based on the way people look or dress.

None of these potentially discriminatory reasons are right for people to have, but sadly, they do exist and investigators have to take them into account when collecting a sample.

For this check, Sam attempts to collect a large random sample of 300 people, but ends up getting tired halfway through.

Pause here to consider which of these is a possible bias caused by Sam's data collection method?

People may not give thorough or accurate responses, because they're put off by Sam's tired demeanour during the survey.

Sophia doubles down on wanting to get people's opinions.

She says she'll only ask people who she thinks won't give dodgy responses, because of how excited she is.

Laura thinks that suggestion will make the sample very biassed, as not everyone in the population will have an equal chance of being chosen.

An investigator may be biassed in the way that they choose their sample, even if their intention is for the sample to be fair, random, and representative.

Whether consciously or subconsciously, an investigator may choose to survey people who look friendly and approachable, or people who share similar characteristics, such as age or interests, with the investigator.

These are people they feel more comfortable talking to and, importantly, people who the investigator thinks may be more likely to give the answers that the investigator wants and data that benefits them.

On the other hand, an investigator may choose not to survey people who do not look friendly, do not seem interested in participating in a survey, or do not appear likely to give answers that the investigator wants.

Okay, back to Sam's survey.

To speed up the data collection process, Sam asks familiar faces from a nearby school.

Pause here to consider which of these may be true due to Sam's data collection method?

Sam's sample may be biassed.

In fact, it is very likely to be biassed as only school-aged people were sampled.

Sam's sample may not be representative of the whole population of Rowanwood, but it will be more representative of the opinions of younger people in the population only.

Sophia implies that people who may be dismissive of her because of her excitement were not likely to join her club anyway.

So, what was even the point in asking them to begin with?

Laura surprisingly agrees for once.

Sophia's biassed sample actually makes sense.

It's actually a sensible thing to do.

Yes, the results won't be representative of the whole population, but it will be representative of a part of the population that is most relevant to Sophia.

And so, the data will be more relevant and helpful to Sophia.

In conclusion, it is sensible for Sophia to only ask people who may be interested in her club rather than wasting her time asking several people who she knows will never have any interest at all.

Bias is not necessarily a bad thing.

Sometimes it certainly is, but sometimes a strategic bias can help focus an investigation on certain groups only, helping any conclusions be more relevant to your investigation.

People with no interest or knowledge of your investigation may not give a helpful or informed response at all regardless of whether their response is positive or negative.

This is especially true, if your investigation is very specific or niche, unfamiliar to a lot of people, or where only certain groups within a population will have any understanding or valid opinion on what you are trying to investigate.

For example, there is no point in collecting a sample from the entire population if a majority of the responses will be "I do not know".

That won't make for an interesting or useful interpretation of your data.

A biassed approach would be to survey only people who are likely to give a response other than "I don't know", perhaps by choosing only certain groups of people within a population.

Yes, that sample will be biassed as you are excluding people from the population, but this biassed sample is helpful and will not impact your conclusion in any way other than that you collect more relevant data and less useless data.

And back to Sam once more.

Remember, Sam's investigation is to answer the question, what makes the town centre great?

Pause here to consider which of these groups of people could Sam avoid surveying in a way that will benefit their investigation?

People who just moved into the area may not have enough knowledge of the area to have an informed opinion.

Meanwhile, people who live in a different area may have that same lack of informed opinion, or a biassed opinion, because they want to make their own hometown seem better than this town.

Amazing, onto the practice.

For question one, pause here to consider whether Aisha and Alex's samples are biassed and representative of the population.

For question two, Izzy happily surveys parents from different local preschools.

Pause here to evaluate the responses Izzy may receive from the investigation.

Great work in evaluating bias and its impact on all of these investigations.

Onto the answers for question one.

Both samples are biassed as some people may have a higher chance of being chosen for the survey than others.

For part b, Aisha's sample is not representative of the whole population as only older people will be surveyed.

The opinions of young people will not be considered at all.

For question two, Izzy being polite may mean receiving more thoughtful answers, but sadly, her being a teenager may lead to discrimination against her, meaning some people may not want to give a response at all.

And lastly, Izzy's sample is very, very biassed, but Andeep doesn't seem to understand the context behind the bias.

Izzy's biassed sample is more relevant to her than a more random sample, as she's only asking parents of young kids who may have an informed opinion of a soft play area that their kids might use.

Great work everyone in considering bias from many different angles.

In this lesson, we have seen that different samples may lead to different conclusions, even if all other conditions are kept the same between samples.

We've also seen that variance is likely to be lower between two samples if the sample size is larger.

We've seen that how an investigator interacts with people may impact the accuracy of their responses, and that bias is not always a bad thing, depending on how the investigator consciously or subconsciously selects the biassed sample.

That was a really intensive lesson on the limitations of sampling.

I appreciate your effort all throughout, but that is all for this lesson.

So, until next time, take care and have an amazing rest of your day.
