Hello everyone and welcome to today's lesson on sampling.

Thank you for joining me, Mr. Gratton, as we use everything that we've learned about sampling to solve problems and look at sampling in context.

Pause here to familiarise yourself with some of the key words that we'll be using today.

First up, what makes for a good sample? What makes for a sample that gives you insight into the questions your investigation is based around? Well, let's find out.

An investigation asks, "What is your favourite sport?" If these 10 people in your friendship group were asked that question and you got these answers, are you satisfied with your investigation? As Laura says, "You'd get the same three answers over and over again even if you did a bigger sample, right? So what's the point in collecting a bigger sample?" Is Laura correct with her statement? Well, no, not likely.

Different samples of any size are likely to give a different set of results.

This is especially true if a different sample was collected from a different demographic of the population.

For example, asking the same question to pupils in the football team may lead to a higher proportion responding with football.

One other limitation of this sample is its really small sample size.

It's pretty rare that a sample of 10 people is enough when the population of a whole school could be in the hundreds or even thousands.

The bigger the sample size, the wider the range of responses you are likely to get.

This is because you are more likely to get responses from a larger variety of people within the population, each with different characteristics, thoughts, or interests.

So if you have a bigger sample size, this means your sample is more likely to be representative of the population, which is usually but not always the intention of taking a sample.

And speaking of a variety of people, well, this sample did not have much variety at all.

People within a friendship group are more likely to have similar interests.

Therefore, this sample isn't really helpful in getting the opinions from everyone at school where different friendship groups may have different interests to this one.

In conclusion, larger and more random samples usually lead to more diverse responses than a smaller and less random sample.

For example, a sample of 100 people chosen randomly is more likely to include responses from a range of people with a range of interests than a sample of 10 students from only one friendship group.
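
If you'd like to test this idea for yourself, here is a rough Python sketch. It assumes a made-up population of 1000 pupils; the sports and their counts are hypothetical and are not data from this lesson. It repeatedly draws random samples of 10 and of 100 and reports how many different sports turn up on average.

```python
import random

# Hypothetical population of 1000 pupils (illustrative numbers only,
# not taken from the lesson's frequency table).
SPORT_COUNTS = {
    "football": 400,
    "tennis": 240,
    "rugby": 150,
    "netball": 120,
    "cricket": 60,
    "golf": 30,
}


def build_population(counts):
    """Return one list entry per pupil, holding that pupil's favourite sport."""
    population = []
    for sport, count in counts.items():
        population.extend([sport] * count)
    return population


def average_distinct(population, sample_size, trials=1000, seed=0):
    """Average number of different answers seen across many random samples."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total = 0
    for _ in range(trials):
        sample = rng.sample(population, sample_size)  # without replacement
        total += len(set(sample))
    return total / trials


population = build_population(SPORT_COUNTS)
small = average_distinct(population, 10)
large = average_distinct(population, 100)
print(f"Average number of different sports in a sample of 10:  {small:.2f}")
print(f"Average number of different sports in a sample of 100: {large:.2f}")
```

Under these assumptions, the sample of 100 should pick up the rarer sports far more often than the sample of 10. Note that this only models sample size; it does not model the extra bias from asking a single friendship group.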

Speaking of these samples, let's compare the two.

We've got both the 10-person sample and the 100-person sample.

They were each asked the same question about their favourite sport.

The answers from the 100-person sample are in this frequency table.

Pause here to identify the differences between these two sampling methods, as well as the data collected from them.

Some of your observations could include a difference in sample size.

100 versus 10 is a big difference.

Furthermore, you've got a random sample versus a biassed sample.

Asking a friendship group is very biassed, as not everyone in the population is a part of that friendship group, making it an unfair sample.

In the larger sample, a greater variety of sports were given as responses such as golf and cricket, which were not seen in the other sample.

And between the two samples, different sports appeared a different proportion of the time.

For example, in the more biassed sample of 10 students, 50% of the responses were tennis, compared to only 24% in the random sample of 100 students.

Okay, let's have a look at an investigation that covers several different parts of the statistical enquiry cycle and think about ways that we can improve upon some of its flaws.

So, a baker wants to open a bakery in Oakfield, but isn't sure whether the locals will be interested in buying the bread the baker bakes or not.

The baker designs this questionnaire to give to people.

The baker surveys people by standing outside a supermarket in a nearby town, giving the questionnaire to the first 50 people that they see.

This frequency table shows the results from the 50 questionnaires that were given out.

Pause here to think about or discuss.

Can you spot any flaws in the methodology of this investigation? Okay, let's dig deep into improving this investigation, starting with this questionnaire.

Pause here to consider which of these could be added or changed to improve this question.

Well done if you were able to choose some of or all of a, b, e, and f.

But why were these options correct? Well, a question is most effective when there is a time frame in either the question or in the responses.

The current question can be answered in many ways.

Some might interpret this as once or twice a day.

Others might see it as once or twice a month.

How can we improve it? By either changing the question or the answers to, for example, how often do you buy bread per week? Or as a response, once or twice per week.

Furthermore, try to avoid using words without a clear meaning such as a little or often.

What often might mean for one person might be someone else's a little.

Words with a definite meaning, or better yet, numerical values are much clearer than these ambiguous words.

Also, all possible responses must be accounted for.

The improvement could be, for example, an other option, giving people an option to write down their own response.

However, this comes with its own issues with people giving their own ambiguous answers.

Therefore, an option such as three or more times a week covers everything that is quite frequent.

Furthermore, currently there is not an option that says, "I do not buy bread." Always remember to add either a zero or a none option if it is applicable to your question.

And pause here to have a look at some examples of improved questions.

Right.

Next up, let's have a look at the data collection strategy.

Pause here to consider which of these could have helped improve the sample that the baker collected.

Both c and d could've been improvements.

Let's see why.

We want the sample taken to be representative of the population that you care about.

A sample is more likely to be representative if the sample is collected in a suitable location.

The baker should have collected the sample in Oakfield, not in any other location.

The baker wanted to open the bakery in Oakfield, so it would make more sense if the baker asked people who lived in Oakfield about their bread-buying habits, not people in a completely different town.

Furthermore, the sample size could have been much larger.

The more people the baker asks, the more likely the responses will be representative of the whole population.

Here's what the baker could have done to improve their sample collected.

The baker could have collected a sample by giving out questionnaires to many people in Oakfield, many more than 50, and could also have collected this sample across multiple days in order to improve the likelihood of surveying a wider variety of people.

Asking lots of people across multiple days may mean you sample different groups of people on different days.

For example, a different type of person may be in the supermarket on a weekday compared to the weekend.

And lastly, let's have a look at the problems here, looking at the results from the survey.

Pause here to consider which of these statements are sensible conclusions from the table of results.

Let's see why b and e are correct.

And the reason why is, well, not everyone the baker gave the questionnaire to responded.

The frequency table only shows 35 results, but a sample of 50 people was collected by the baker.

Therefore, the baker only had a 70% response rate.
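
As a quick check of that arithmetic, here is a minimal Python sketch. The function name `response_rate` is just an illustrative choice; the numbers 35 and 50 are the ones from the baker's survey.

```python
def response_rate(responses_received, questionnaires_given):
    """Percentage of the people surveyed who actually gave a response."""
    if questionnaires_given <= 0:
        raise ValueError("questionnaires_given must be positive")
    return 100 * responses_received / questionnaires_given


# The baker gave out 50 questionnaires, but the table shows only 35 results.
baker_rate = response_rate(35, 50)
non_response = 100 - baker_rate
print(f"Response rate: {baker_rate:.0f}%")        # Response rate: 70%
print(f"Did not respond: {non_response:.0f}%")    # Did not respond: 30%
```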

What possible reason did the other 30% have for not responding? Well, possible reasons could include that the questionnaire, well, didn't have an option that applied to them, therefore they just chose to not answer at all.

However, we cannot know the reasons for certain.

Furthermore, any conclusions are, well, pretty useless since the sample collected wasn't representative of the population of Oakfield.

Rather, it was taken in a nearby town.

These conclusions are only ever useful if the baker is certain that the interests of both towns are similar, that they can say that the bread preferences of the other town can also apply to Oakfield itself.

And lastly, pause here to think about or discuss.

What answer would you give the baker on how they should have designed their investigation? Here are some possible bits of advice that you could give.

Pause again here to see if your suggestions match these onscreen.

Okay, great work so far on evaluating these slightly dodgy investigations.

Onto the practice task.

For this task, we'll look at seven different questions about this survey, which asks 250 students from the 1000-strong population of Oakfield Academy.

The question is, what is the school's favourite shape? Pause now to consider the answers to parts a to c.

And again, pause now to answer parts d to f.

And lastly, pause now to get planning.

Explain how you would've conducted this investigation in a much better way.

Brilliant work.

Here are the answers for part a.

The population are the students at Oakfield Academy.

And for part b, the suitability of the question really was not great.

There are far, far more than four shapes in the world, even though only four shapes were given.

And none of the options allowed you to give your own answer to this question.

So how on earth could anyone who had triangle as their answer actually answer this question at all?

For c, there are quite a few ways that a sample of 250 students could have been collected, including, for example, 50 students from each year group or using a random number generator to generate a simple random sample, maybe from the register of students at that school.
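
The first of those ideas, 50 students from each year group, could be sketched in Python like this. This is only an illustration under assumptions that are not stated in the lesson: five year groups (Years 7 to 11) with 200 students each, and made-up student IDs such as "Y7-001".

```python
import random


def sample_per_year_group(year_groups, per_group, seed=None):
    """Pick the same number of students at random from every year group."""
    rng = random.Random(seed)
    chosen = []
    for students in year_groups.values():
        # rng.sample picks without replacement, so no student appears twice.
        chosen.extend(rng.sample(students, per_group))
    return chosen


# Hypothetical register: Years 7 to 11, 200 students in each (assumed sizes).
year_groups = {
    year: [f"Y{year}-{number:03d}" for number in range(1, 201)]
    for year in range(7, 12)
}

sample = sample_per_year_group(year_groups, per_group=50, seed=1)
print(len(sample))  # 250
```

Taking an equal number from each year group guarantees every year is represented, which a single simple random sample of 250 does not.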

The response rate was 80% because only 200 out of the 250 people surveyed actually gave an answer.

This sample is really not very representative of the whole school because anyone whose favourite shape isn't circle, dodecagon, rhombus, or hexagon won't respond or just give a random answer to the question.

Furthermore, the 200 students who responded make up only 20% of the population of 1000, meaning that there is a risk of the sample not being representative of the other 80% of the population.

For part f, because the question is so incredibly badly designed, increasing the sample size will not make a meaningful difference to any conclusions that we can make for this investigation.

However, if the question was improved so that any shape could be given as an answer, then an increase in sample size could give a wider variety of shapes including niche or unusual shapes.

And finally, part g.

Here you can see an example of an improved question, one which gives more detailed options, including options to give specific types of triangles and quadrilaterals, such as a scalene triangle or a parallelogram.

Furthermore, an increase in sample size is really helpful: 400 compared to 250, along with ensuring that the sample is a simple random sample, perhaps using a register of students given in alphabetical order and, starting at the top, assigning each student on the register a unique number from 1 to 1000.
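
One possible way to carry out that numbering approach is sketched below in Python. It assumes each of the 1000 students already has a unique register number from 1 to 1000; the function name `simple_random_sample` and the seed value are illustrative choices, not part of the lesson.

```python
import random


def simple_random_sample(population_size, sample_size, seed=None):
    """Choose register numbers so every student is equally likely to be picked."""
    rng = random.Random(seed)
    register_numbers = range(1, population_size + 1)  # 1 to population_size
    # Sampling without replacement: no register number is chosen twice.
    return sorted(rng.sample(register_numbers, sample_size))


chosen_numbers = simple_random_sample(1000, 400, seed=42)
print(len(chosen_numbers))  # 400
```

The students whose register numbers appear in `chosen_numbers` would then be the ones given the questionnaire.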

We've already seen some pretty shocking sampling methods, which are definitely biassed samples, but are all biassed samples bad? Well, let's have a look.

But first, what is a biassed sample? Well, it's a sample where not every member in a population is equally likely to be selected for a sample.

Some groups in the population may be more or less likely to be selected than others.

Laura says, "It's always best to eliminate bias by changing your sampling technique because a biassed sample is always bad." But is Laura correct? Well, some types of biassed sample really are just that bad.

They're either bad because they are poorly planned or because they're not representative of a population that you want to get information from.

However, some types of biassed sample can be actually really helpful as they are even more representative of the type of people relevant to your investigation, compared to a more random sample of people from the population.

For example, a brand selling frying pans wants to investigate whether people think their top-selling frying pan is an effective one or not.

A biassed sample that will be helpful to the company will be one where the sample only collects opinions from people who have used their frying pan.

Pause now to think about or discuss why this might be.

Well, there's simply no point in asking someone how good a product is if they've never used it before.

Pause here to consider which of these data collection methods are likely to be effective at collecting a helpful biassed sample.

People at the kitchen aisle of a supermarket are probably more likely to be using frying pans, although this is still no guarantee that they'll use any at all or this particular one.

Local chefs are particularly good since they use frying pans a lot.

And if they have something good to say, then that'll probably also apply to other people who use their pans less intensively.

Whilst it might sometimes be good to target questions towards specific groups in a population, you have to ensure that there is variety in your sample.

For example, only asking chefs and no one else would mean only asking people who use the frying pan a lot every single day.

The company could then put a positive spin on this by saying, "If it's good enough for a chef, it's good enough for you." The bias is then designed to create a certain outcome that makes the company sound much better than it might already be.

Therefore, a negatively biassed sample is a type of sample whose responses are very different to those of the whole population.

The biassed sample is not in any way representative of the population or even amongst only those in the population that use the frying pan.

Pause here to consider which of these samples are more likely to be negatively biassed samples, ones that may affect the quality of the conclusions that you can make from that sample.

Rival brands are more likely to give negative reviews, and people who have stopped using the brand may only give their negative experiences with it.

Great stuff.

Onto the practice task where we have an investigation about toothpaste.

Pause here to write down r for random, g for good biassed, or b for a bad biassed sample for each of these five types of sample.

And lastly for question two, two universities investigated whether people wanted to apply for their university or not.

Pause here to consider examples of a reliable sample, a good sample, and a bad sample for these investigations.

Okay, onto the answers.

a is random because it says 60 people were selected randomly.

b is a good biassed sample because it focuses the investigation to only people who already use the toothpaste, reducing the number of unhelpful responses of, "I don't know, I've never used it before."

c is a bad biassed sample because employees at the company are more likely to give their positive responses.

Furthermore, whilst the selection for the company is random, it isn't a truly random sample as not everyone in the population is able to be a part of that sample.

And d hopefully is a good random sample because habits for cleaning teeth are unlikely to change based on location.

And this makes collecting a sample more time efficient than collecting it from multiple different locations.

e is, well, good or bad depending on whether the dentist is sponsored by that company or a different one.

If a dentist is sponsored by this company, they're more likely to give positive responses than honest ones.

And for question two, a larger sample is more likely to be reliable than a smaller sample.

An example of a good biassed sample would be to ask a large group of people across multiple colleges, as college students may be more likely to be researching different universities they might go to after they finish college.

An example of a bad biassed sample would be to only ask people who already go to the university of Elmwood.

They may be more likely to make their decision seem like a very good one by giving positive or overwhelmingly positive responses about that university.

Amazing work, everyone, on thinking deeply about a range of samples in a lesson where we have considered that larger samples are usually much, much better than smaller samples, because they open up a wider variety of responses that may make the sample more representative of the population.

Furthermore, conclusions from an investigation may be more reliable if the planning and data collection processes are very well designed.

Also, biassed samples can be actually quite useful and aren't always negative.

Biassed samples can be proportional to the population or focus on groups in the population that are more relevant to your specific investigation.

Thank you all so much for all of the effort that you've put into this lesson, as well as all throughout the learning of sampling.

That is all for me for now, so take care and have an amazing rest of your day.
