Hello everyone and welcome to today's lesson on sampling.

Thank you for joining me, Mr. Gratton, as we use everything that we've learned about sampling to solve problems and look at sampling in context.

Pause here to familiarise yourself with some of the key words that we'll be using today.

First up, what makes for a good sample? What makes for a sample that gives you insight into the questions your investigation is based around? Well, let's find out.

An investigation asks, "What is your favourite sport?" If 10 people in a friendship group were asked this question and you got these answers, is your investigation done? As Laura says, you'd get the same three answers over and over again even if you took a bigger sample, right? Is Laura correct? Well, not likely.

Different samples of any size are likely to give you a different set of results.

This is especially true if a different sample was collected from a different demographic of the population.

For example, you may be more likely to get an answer of football if you ask the football team what their favourite sport was.

One limitation of this sample is the sample size is really small.

It's rare that a sample of 10 people is enough when the population of a school, for example, is much, much larger than just 10 people.

The bigger the sample size, the wider the range of responses you are likely to get.

This is because you are more likely to get responses from a larger variety of people in the population, each with different characteristics, thoughts, or interests.

This means your sample is more likely to be representative of the population, which is usually but not always the intention of a sample.

Speaking of a variety of people, yeah, this sample does not have much variety at all.

People within a friendship group are more likely to have similar interests.

Larger and more random samples usually lead to more diverse responses than a smaller less random sample.

Okay, for this first check, we're sticking with this example, but now bringing in a different sample to compare.

Pause here to identify the differences between these two sampling methods, as well as the data collected from those two samples.

Some of the differences that you may have observed could include a different sample size.

The left-hand one had a sample size of 10 versus 100 for this new simple random sample.

Furthermore, one sample was random whilst the other was biassed.

In the larger sample, more different sports were given as responses, and furthermore, different sports appeared a different proportion of the time between the two samples.

Let's have a look at an investigation that covers several parts of the statistical inquiry cycle and think about ways that we can improve some of its flaws.

So a baker wants to open a bakery in Oakfield, but isn't sure whether the locals will be interested in buying the bread that the baker bakes or not.

The baker designs this questionnaire to give to people.

The baker surveys people by standing outside a supermarket in a nearby town, giving the questionnaire to the first 50 people that they see.

This frequency table shows the results from the 50 questionnaires that were given out.

Pause here to think about or discuss.

Can you spot any flaws in the methodology of this investigation? Right, let's improve on this investigation, starting with this questionnaire.

Pause here to consider which of these could be added or changed to improve this question.

And the answers are A, B, E, and F.

Let's see why these help.

A question is most effective when there is a timeframe in either the question or in the responses.

How could we improve it? By changing either the question or the answers to, for example, "How often do you buy bread per week?" or, as a response, "once or twice per week".

Furthermore, try to avoid words without clear meaning such as a little or often.

And also, all possible responses must be accounted for, such as having an "other" option for people to write their own answers into, or an option for zero or none, such as "I do not buy bread".

And pause here to have a look at some of the examples of our improved questions.

Okay, next up.

Let's have a look at their data collection strategy.

Pause here to consider which of these could have been improved in the sample that the baker collected.

And here are the answers, C and D.

Let's see why they are improvements.

Well, we want the sample taken to be representative of the population that you care about.

A sample is more likely to be representative if the sample is collected in a suitable location.

The baker wants to open a bakery in Oakfield, so it would make a lot more sense if the baker asked people who lived in Oakfield about their bread buying habits, not people in a different town.

Furthermore, the sample could be larger.

The more people the baker asks, the more likely the responses will be representative of that population.

Here's what the baker could have done to improve the sample that they collected.

The baker could have collected a sample by giving out questionnaires to many people in Oakfield, more than the original 50 people, across multiple days, in order to improve the likelihood of surveying a wider variety of people in Oakfield, not a nearby town.

And lastly, let's have a look at the problems here when looking at the results from the survey.

Pause here to consider which of these statements are sensible conclusions from this table of results.

Let's see why B and E are the correct answers.

Not everyone the baker gave the questionnaire to responded.

The frequency table only shows 35 results, but a sample of 50 people was collected.

The baker only had a 70% response rate.
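
If you'd like to check that figure yourself, the response rate is the number of responses divided by the number of questionnaires given out. Here is a minimal Python sketch of that arithmetic; the function name `response_rate` is only an illustrative choice and is not something from the lesson.

```python
def response_rate(responses: int, sample_size: int) -> float:
    """Return the response rate as a percentage of the sample."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return 100 * responses / sample_size


# The baker gave out 50 questionnaires and the table shows 35 results.
baker_rate = response_rate(35, 50)
print(f"Response rate: {baker_rate:.0f}%")
```

The remaining percentage is the share of the sample that did not respond, which is the group the next question asks about.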

What possible reason did the other 30% have for not responding? Possible reasons include the questionnaire not having an option that applied to them.

However, we cannot know this for certain.

Furthermore, any conclusions are, well, honestly, pretty useless since the sample collected was not representative of the population of Oakfield, but rather of a nearby town instead.

These conclusions may only be useful to the baker if the interests of both towns are similar, so that the bread preferences of the other town can also apply to Oakfield.

However, the bread maker may need to do some further investigation to confirm whether this is applicable or not.

And lastly, pause here to think about or discuss what advice would you give to the baker on how they should have designed the investigation? Here are some possible bits of advice that you could give.

Pause again here to see if your suggestions match these on screen.

Okay, onto the practise task.

For this task, we'll look at seven questions about this survey, which asks 250 students from the 1000-strong population of Oakfield Academy.

And the question is, what is the school's favourite shape? Pause now to answer parts A to C.

And pause now to answer parts D to F.

And one more time, pause here to get planning.

Explain how you would've conducted this investigation.

And great work everyone on your evaluation of this investigation.

Here are the answers for part A.

The population is the students at Oakfield Academy.

And for part B, the suitability of the question wasn't great.

There are far more than four shapes in the world, even though only four options were given.

With none of those options being to give your own answer, how could someone who wanted to say triangle even answer this question? For part C, there are quite a few ways that a sample of 250 students could have been collected, including 50 people from each year group or using a random number generator to generate a simple random sample or a stratified sample.
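
The two approaches mentioned for part C can be sketched with a random number generator in Python. This is only a rough illustration under stated assumptions: the students are numbered 1 to 1000 and split into 5 equal year groups of 200, which the lesson does not tell us, and the fixed seed is there purely so the sketch is repeatable.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch gives the same result each run

# Assumption: 1000 students numbered 1 to 1000.
students = list(range(1, 1001))

# Assumption: 5 equal year groups of 200 students each.
year_groups = [students[start:start + 200] for start in range(0, 1000, 200)]

# Simple random sample: every student is equally likely to be chosen.
simple_random_sample = rng.sample(students, 250)

# Sample of 50 people chosen at random from each year group.
stratified_sample = []
for group in year_groups:
    stratified_sample.extend(rng.sample(group, 50))

print(len(simple_random_sample), len(stratified_sample))
```

Taking 50 from each group only matches a proportional stratified sample because the groups were assumed to be equal in size; with unequal year groups, each group's share of the 250 would need to be scaled to its share of the school.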

And for part D, the response rate is 80% because there were only 200 responses in the frequency table from the 250 students in the sample.

And for part E, the sample is not very representative because anyone whose favourite shape isn't circle, dodecagon, rhombus, or hexagon will either not respond or just give a random answer.

Furthermore, the 200 responses only represent 20% of the population, meaning that there is a risk of this sample not being fully representative of the whole population.

For Part F, because the question is so badly designed, increasing the sample size honestly probably won't make a meaningful difference to how useful the data are.

However, if the question was improved so that any shape could be given as an answer, then an increase in the sample size could give a wider variety of shapes including some niche or unusual shapes that only one or two people may respond with.

And for Part G.

Pause here to have a look at an improved investigation.

We've seen some pretty shocking sampling methods, which are definitely biassed, but are all biassed samples bad? Let's have a look.

Remember, a biassed sample is a type of sample where not every member in a population is equally likely to be selected, where some groups may be more or less likely to be selected than others.

Laura thinks that bias should always be removed.

Is Laura correct? Well, some types of biassed sample are very bad.

They are just poorly planned sampling techniques that are not representative of a population.

However, some types of biassed sample can be very helpful as they are more representative of the type of people relevant to your investigation than perhaps a more random sample would be.

For example, a brand selling frying pans wants to investigate whether people in the general population think that their top selling frying pan is effective or not.

A biassed sample that would be helpful to them would be to only collect the opinions of people who have actually used their frying pan.

Pause here to think about or discuss why this might be.

Well, there's just no point in asking people how good a product is if they've never used it before.

Pause here to consider which of these data collection methods are likely to be effective at collecting a helpful biassed sample.

People at the kitchen aisle of a supermarket are probably more likely to use frying pans, although this is still no guarantee that they'll be using this particular one.

Local chefs are especially good since they'll be using frying pans quite a lot, and if they have something good to say, then that will probably also apply to other people who may use the pans less intensively.

However, you shouldn't restrict your sample to only one or two very specific types of people. Whilst collecting a sample from all sorts of people who use pans is good, you have to ensure that there is a notable amount of variety.

For example, only asking chefs would mean only asking people who use the frying pan a lot every single day.

The company could then put a positive spin on this bias by saying if it's good enough for a chef, it's good enough for you.

The bias is then designed to create a certain outcome that makes the company sound great, even better than it would otherwise.

Okay, pause here to consider which of these samples are more likely to be negatively biassed samples, ones that may affect the quality of the conclusions that you can make from the sample.

Rival brands are more likely to give negative reviews and people who have stopped using that brand may only give their negative experiences.

Okay, for this practise task we have this investigation about toothpaste.

Pause here to write down R for random, G for good biassed, or B for bad biassed sample for each of these five types of sample.

And for question two.

Two universities investigated whether people wanted to apply for their university or not.

Pause here to consider examples of reliable, good and bad samples for these investigations.

Okay, onto the answers.

For A, it is a random sample.

For B, it is a good biassed sample, whereas C is a bad biassed sample because surely the employees will only give good feedback about their own toothpaste.

D is a good biassed sample.

Whilst E, well, is good or bad, depending on whether the dentist is sponsored by that company, a different one or not sponsored at all and therefore wouldn't have any reason to give a false preference as their answer.

For question two.

A larger sample is more likely to be reliable.

An example of a good biassed sample would be to ask a large group of people in college, as they're likely to be researching different universities to go to after they finish college.

An example of a bad biassed sample would be asking only people who already go to that university.

Of course, they're more likely to make their decision seem like a clever, good one by only giving more positive responses.

So far we've seen both unhelpful and helpful samples.

One possible benefit from a sample would be to figure out the size of a population from a sample.

Let's see how.

Okay, if given either the size of a stratified sample or the size of a stratum (one group) in that sample, then it may be possible to figure out the size of the whole population or the number of people or objects in a stratum of that whole population.

Let's put some information into a ratio table to see how this works.

Here is a stratified sample of 100 people, where 94 of those people were students.

I know that there are in total 87 staff in the whole school.

So let's see if we can estimate how many students there are in that school.

Out of the 100 people in the sample, the remaining six were staff.

Therefore, the multiplier from sample to population is the outcome of 87 divided by the current value of six.

94 multiplied by this multiplier gives you 1,363 students.

We can apply this exact same multiplier to the total sample to get 1,450 people in total in that school.
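
The ratio table steps can also be written out as a short Python sketch. The function name `scale_from_stratum` and the dictionary layout are illustrative assumptions rather than anything shown in the lesson; the numbers are the ones from the school example.

```python
def scale_from_stratum(sample_counts, known_stratum, known_population):
    """Scale every stratum in a sample up to the population.

    The multiplier comes from the one stratum whose population size is known.
    """
    multiplier = known_population / sample_counts[known_stratum]
    estimates = {name: count * multiplier for name, count in sample_counts.items()}
    return multiplier, estimates


# Sample of 100 people: 94 students and the remaining 6 staff.
sample = {"students": 94, "staff": 6}

# There are 87 staff in the whole school.
multiplier, estimates = scale_from_stratum(sample, "staff", 87)
total = sum(estimates.values())

print(multiplier)             # 14.5
print(estimates["students"])  # 1363.0
print(total)                  # 1450.0
```

Adding up the scaled strata gives the same total as multiplying the whole sample of 100 by the multiplier, which is why the same multiplier works for every row of the table.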

Pause here to consider this sample of 400 stove tops and find the values of A to F in the ratio table in order to find an estimate for the total number of stove tops in this town.

The total sample size is 400, with 224 of that sample being gas burning stove tops.

In the town, there are a total of 9,944 induction stove tops.

By doing a subtraction, there are also 176 induction stove tops in the sample.

Taking one of these values as our multiplier, 400 multiplied by that multiplier gives you 22,600 stove tops in the town in total.

Whilst it is possible for this method to find the exact size of a population or size of a stratum, you are more likely to get an estimate instead.

Sometimes it's a close estimate and sometimes it's one that isn't particularly good.

Let's see why.

In this sample, there are a total of 65 students.

In the population, there are a guaranteed 78 maths students.

The multiplier uses the 78 maths students divided by the sample number of 23.

An estimate for the population is 65 multiplied by this multiplier, which is approximately 220.

However, 220 is the result of a rounded decimal.

We can also use this multiplier to get an estimated size of each stratum in the population.

The rightmost column shows the real number of students in the year group stratified by the subject that they got their highest grade in.

Except for maths, which we were given as exactly 78 students.

So how come there's a difference between the estimated and the real population? Pause now to think about or discuss these possible reasons.

When the original sample size of each stratum was calculated, the results were decimals, which we then rounded.

Therefore, when estimating the population from these rounded sample answers, we were using an inaccurate sample size of 23 maths students to estimate the size of this whole population.

This inaccuracy was then carried forward to the rest of our investigation.

You might notice that this inaccuracy between 22.53 and 23 is quite small.

It's less than 0.5, but when multiplied by the size of the sample, the inaccuracy scales up into a handful of people.
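
To see how that small rounding scales up, here is a rough Python sketch that estimates the population twice: once with the rounded sample value of 23 and once with 22.53. The variable names are illustrative assumptions, and 22.53 is itself only given to two decimal places, so the second estimate is still approximate.

```python
maths_in_population = 78        # given exactly in the lesson
sample_total = 65               # total number of students in the sample
rounded_maths_sample = 23       # maths students in the sample after rounding
unrounded_maths_sample = 22.53  # the value before rounding (2 decimal places)

# Estimate = sample total x (known population stratum / sample stratum)
estimate_rounded = sample_total * maths_in_population / rounded_maths_sample
estimate_unrounded = sample_total * maths_in_population / unrounded_maths_sample

difference = estimate_unrounded - estimate_rounded

print(round(estimate_rounded, 1))
print(round(estimate_unrounded, 1))
print(round(difference, 1))
```

The gap in the maths row is under half a person, but the gap between the two population estimates is several people, which is the scaling-up effect described above.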

Okay, for this check.

Pause here to first complete all the information in this ratio table, then choose appropriate values in the ratio table in order to calculate an estimate for the percentage of customers that are pay as you go.

Great effort on this question.

Here are the answers.

Pause here to check if all values on the ratio table and in the percentage calculation are correct.

And finally, here are the real numbers of customers in the population for each stratum.

Pause here to consider, which of these statements explains why there is a difference between the estimate and the real population? The sample was rounded and the estimated population then used those inaccurate rounded values.

Okay, onto the final practise questions.

Pause here to find estimates of populations and the size of a stratum for these two questions.

For question two, you'll need to construct your own ratio table.

And finally, question three.

By first calculating an estimate for the number of each animal in the forest, calculate the percentage difference from the true number of hedgehogs in the forest to your estimate.

Pause now for question three.

Okay, here are the answers to questions one and two.

Pause now to compare your answers to the ones on screen.

And for question three.

An estimate for the number of birds is 910, squirrels is 433, and hedgehogs is 43, for the total forest population of 1,516.

The difference between the estimated and true number of hedgehogs in the forest is down to the rounding of the sample for each animal.

There is a 43.3% difference between the true number of hedgehogs in the forest and the estimated number of hedgehogs from the sample.
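
A percentage difference like this one can be sketched in Python as the gap between the estimate and the true value, divided by the true value. Note the assumption: the true number of hedgehogs is not stated in this transcript, so the value of 30 below is a hypothetical figure chosen only because it is consistent with the estimate of 43 and the quoted percentage. The function name is also just illustrative.

```python
def percentage_difference(true_value: float, estimate: float) -> float:
    """Percentage difference from the true value to the estimate."""
    if true_value == 0:
        raise ValueError("true_value must not be zero")
    return 100 * abs(estimate - true_value) / true_value


estimated_hedgehogs = 43      # estimate from the sample, as in the answer
assumed_true_hedgehogs = 30   # ASSUMPTION: not given in the transcript

hedgehog_difference = percentage_difference(assumed_true_hedgehogs, estimated_hedgehogs)
print(f"{hedgehog_difference:.1f}%")
```

Dividing by the true value rather than the estimate matters here, because the question asks for the difference from the true number to the estimate.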

And that is an amazing effort in attempting these stratified sampling questions and for thinking deeply about a range of samples in a lesson where we have considered that larger samples are usually better than smaller ones because they open up a wider variety of responses that may make the sample more representative of the population.

Also, conclusions from an investigation may be more reliable if the planning and data collection process are well designed.

Furthermore, biassed samples can be useful and aren't always negative.

Biassed samples can be proportional to the population or focus on groups in the population that are more relevant to your investigation.

And lastly, the size of a population can be estimated from a stratified sample, although there may be a difference between the estimate and the true population due to rounding during the calculation of the sample.

Thank you all so much for your effort in this lesson and throughout your learning of sampling.

That is all from me for now.

So take care and have an amazing rest of your day.