Hello everyone, it's Mr. Millar here.
Welcome to the third lesson on statistics, and in this lesson we're going to be talking about sampling.
So first of all, I hope that you are all doing well.
Now, we're going to start this lesson, again, by having a look at this picture, this data-handling cycle.
Now, in the last lesson we talked about what a hypothesis was.
So it's a statement that we're going to use to test what we are interested in.
And we also talked about where we can look for data to test that hypothesis.
So we talked about primary sources of data and secondary sources of data.
Now, in this lesson we're going to be talking about the next stage of this cycle.
So once we've decided the hypothesis, and once we've decided where we're going to collect the data, we're now going to be thinking about sampling, which is, "Who are we going to ask? How many people are we going to ask? Who are they going to be?" That kind of question, which is really important because if we get the sampling wrong, then our data can be wrong, it can be inaccurate.
So it's really important that we get it right.
So let's have a look at the try this task.
Okay, so let's have a look.
Antoni is testing which is the most popular football team in the UK.
So maybe his hypothesis was, "Manchester United is the most popular football team in the UK." We don't know.
But anyway, that's what he is testing, and he is saying, "I will take a sample of 25 people near my home in North London." And Binh is saying, "Hm, I'm not sure this will be a fair sample." Why do you think Binh might be correct? Well, if you're thinking something like, if he's asking people in North London, the people he's asking are more likely to support teams in North London, like Arsenal, like Tottenham, and other teams, so his sample is likely to be biassed because it's not likely to be truly reflective of the most popular football team in the whole of the UK.
So he's only asking people from one area.
He's not asking people in Manchester, or Liverpool, or anywhere else.
So this is really important.
We can say that his data, Antoni's data, is likely to be biassed.
Now, bias occurs when data has not been collected fairly.
If a sample is biassed, it doesn't accurately reflect the population.
And the population, what that means, is the underlying group of people that we're interested in.
So we're interested in all football fans in the whole of the UK, not just in North London.
So this is definitely likely to be biassed, so we need to be really careful that we avoid this bias.
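If you'd like to see this kind of bias on a computer, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the support percentages, the population sizes, and the helper names `make_fans`, `north_london_team_share` and `most_popular` are all invented, not real football data. It compares Antoni's plan (25 people from North London only) with 25 people picked at random from the whole population.

```python
import random
from collections import Counter

random.seed(1)  # fixed seed so the run is repeatable

# Invented support levels (weights out of 100), for illustration only.
TEAMS = ["Arsenal", "Tottenham", "Manchester United", "Liverpool"]
WEIGHTS = {
    "North London": [45, 35, 10, 10],
    "Rest of UK": [10, 10, 45, 35],
}

def make_fans(region, how_many):
    """Create (region, favourite_team) pairs for one region."""
    picks = random.choices(TEAMS, weights=WEIGHTS[region], k=how_many)
    return [(region, team) for team in picks]

# Hypothetical population: 5,000 fans in North London, 95,000 elsewhere.
population = make_fans("North London", 5_000) + make_fans("Rest of UK", 95_000)

def north_london_team_share(fans):
    """Fraction of fans whose favourite is Arsenal or Tottenham."""
    local = sum(1 for _, team in fans if team in ("Arsenal", "Tottenham"))
    return local / len(fans)

def most_popular(fans):
    """Team named most often in this group of fans."""
    return Counter(team for _, team in fans).most_common(1)[0][0]

# Antoni's plan: 25 people, all from North London.
north_london_fans = [fan for fan in population if fan[0] == "North London"]
antoni_sample = random.sample(north_london_fans, 25)

# A fairer plan: 25 people picked at random from the whole population.
random_sample = random.sample(population, 25)

for label, group in [("Whole population", population),
                     ("Antoni's sample", antoni_sample),
                     ("Random sample", random_sample)]:
    share = north_london_team_share(group)
    print(f"{label}: most popular = {most_popular(group)}, "
          f"North London teams = {share:.0%}")
```

In this made-up population, the North London teams should take up a much bigger share of Antoni's sample than they do of the population as a whole, which is exactly the problem Binh is pointing out.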
Okay, let's have a look at the next slide.
So as we've said, once we've decided on the hypothesis to test, we need to decide who to ask.
This is where our idea of sampling is going to come in.
The group of people we want to test our hypothesis on is called the population.
So that's what I've just said.
If we're talking about football fans in the UK, our population is going to be all football supporters in the UK.
Now, there are two ways to ask our population.
So a sample is one of them, but the other one is a census.
Now, a census is where every single member of the population is asked.
So if our population is football fans in the UK, if we were to take a census, then that would mean asking every single football fan in the UK, which, I don't know if you agree with me, but I think that might be a little bit difficult.
But anyway, censuses are very important and they have been around for thousands of years.
The first people that we know that took a census were the ancient Egyptians because when they were building those big pyramids that you might have seen before, they used a census because they wanted to gather data on every single person who lived there so that they could know how many people could help to build those pyramids.
And in the UK, the first census was in 1801 and we have one every ten years.
So if you ask your family, they might remember the last one was in 2011.
And what happens is, they get a questionnaire and they have to fill it out, and everyone in the UK has to fill it out, each family has to fill it out.
And they ask all sorts of questions like, "Where do you live?", "How many people live in your household?", "What's your job?", "What's your income?" All those kinds of questions.
And it's really, really useful because it gives a lot of very valuable information about the UK and it helps with lots of things like planning for local services.
Anyway, that is what a census is: it's when we ask every single member of the population.
And a sample, we've talked about before, is where we don't ask everyone, we ask a smaller group of the population.
So with each of these, when we choose which one to use, there are advantages and disadvantages.
Have a think.
What do you think the advantages and the disadvantages of a census and a sample might be? Okay, well, a census.
If we're asking every single person, then our data is definitely going to be accurate and it's definitely going to be unbiased.
Because we're asking every single person, and if we're asking everyone, then we're going to have a complete set of information.
It's going to be great.
But what's the problem? Well, the problem is that if our population, the group of people that we need to ask, is really, really big, then it's going to take forever to ask everyone, and it's also going to be very expensive.
It costs a lot of money to carry out a survey on a large group of people.
Now, if our population is actually quite small, then it may not be too difficult to do a census.
It all depends on how big the population is.
But if it's, for example, football fans in the UK, then, you know, we're not going to be able to ask millions and millions of people what their favourite team is.
We will want to ask a sample in that case.
Anyway, the advantages and the disadvantages of a sample, what do you think? Well, the advantage is that, because we're only asking a smaller group of people, it's going to be much less expensive and it's going to be much quicker.
And we can do it in a short space of time and we can get some results, and if we carry out the sample, if we choose the correct sample, then it can be accurate and it can be unbiased.
But the disadvantage is that it's quite difficult to do this because if we do not collect a sample that is representative of the underlying population, it's going to be inaccurate and it's going to be biassed.
For example, in the previous example, if we were interested in football fans in the UK and what their favourite team was, and we only went to North London, then our sample is definitely going to be inaccurate, it's definitely going to be biassed because we are only asking in a specific area, which is likely to support certain teams. So really important that when we do a sample, we always try to make the number of people as big as we can because that would make it more accurate, but it must represent the underlying population so that it's not biassed.
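To see how a fair sample compares with a census, here is a small Python sketch. It is only an illustration under assumptions: the population of 100,000 fans, the 43% figure, and the helper names `sample_share` and `average_error` are all invented. It takes random samples of different sizes many times over and measures how far, on average, each one lands from the census answer.

```python
import random

random.seed(2)  # fixed seed so the run is repeatable

# Hypothetical population of 100,000 fans: True means "supports Team A".
# The 43% figure is invented for illustration.
population = [True] * 43_000 + [False] * 57_000

# A census asks everyone, so it gives the exact share.
census_share = sum(population) / len(population)

def sample_share(size):
    """Share of Team A supporters in one simple random sample."""
    return sum(random.sample(population, size)) / size

def average_error(size, repeats=200):
    """Average gap between the sample share and the census share."""
    gaps = [abs(sample_share(size) - census_share) for _ in range(repeats)]
    return sum(gaps) / repeats

# Try a small, a medium and a large random sample.
errors = {size: average_error(size) for size in (25, 250, 2500)}

print(f"Census share: {census_share:.2f} (asked {len(population):,} people)")
for size, error in errors.items():
    print(f"Random sample of {size:>5}: average error {error:.3f}")
```

Because these samples are picked at random from the whole population, the bigger ones should land closer to the census answer on average, while still asking far fewer people than a census would.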
Great.
Let's have a look at the independent task.
But before you do that, I just want to tell you a quick story about the dangers of choosing an incorrect sample.
So this guy here in the photo, you may recognise him from your history because he was actually a US president, who was president in the Second World War, for America.
He was a guy called Franklin Delano Roosevelt, or FDR for short, and he was president for a long time.
And there was an election in the US in 1936 between him, FDR, and a guy called Alf Landon.
And this poll from the Literary Digest asked 2.4 million people, "Who are you going to vote for? Are you going to vote for FDR, or are you going to vote for Landon?"
And 2.4 million people answered the survey and it was the biggest survey that had ever been compiled of this kind.
And the results came back and they said, Landon is going to win big time.
He's going to win 57% of the vote, FDR is only going to get 43%.
But election day actually came, and it turned out that this huge poll, this huge survey, was massively, massively out because FDR won with 62% of the vote, and Landon only got 38%.
So what happened here? Can you think why might this poll have been so wrong? Well, if you're thinking that the sample was biassed, then yes, it was biassed.
Really well done.
And it was biassed for a very specific reason because back in those days, how they did polls was, they sent out a letter to different households, and in this case, the addresses they wrote to came largely from telephone directories and car registration lists.
But back in 1936 not every household had phones, they were much less popular because back then they were much more expensive.
So the people that answered this poll tended to be people that were wealthier, that had phones.
And wealthier people tended to support Landon, who was the Republican candidate, whereas poorer people, who didn't have phones and who didn't answer the survey, tended to support FDR.
So these people, the people that were poor, that didn't have phones, were underrepresented in the poll.
And when the actual result came, it was a big shock because the survey, the poll, was biassed, it did not accurately reflect the underlying population.
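Here is a rough Python sketch of the same idea, in case you want to try it. The numbers are invented for illustration and are not the real 1936 data: the share of voters with phones, the support levels, and the names `make_voter` and `fdr_share` are all assumptions. It compares a very large poll of phone owners only with a much smaller poll picked at random from all voters.

```python
import random

random.seed(3)  # fixed seed so the run is repeatable

# Invented figures for illustration; not the real 1936 data.
PHONE_SHARE = 0.30           # share of voters who own a phone
FDR_SUPPORT_PHONE = 0.35     # FDR support among phone owners
FDR_SUPPORT_NO_PHONE = 0.75  # FDR support among everyone else

def make_voter():
    """Return (has_phone, votes_fdr) for one hypothetical voter."""
    has_phone = random.random() < PHONE_SHARE
    support = FDR_SUPPORT_PHONE if has_phone else FDR_SUPPORT_NO_PHONE
    return (has_phone, random.random() < support)

voters = [make_voter() for _ in range(200_000)]

def fdr_share(group):
    """Fraction of a group of voters who vote for FDR."""
    return sum(1 for _, votes_fdr in group if votes_fdr) / len(group)

true_share = fdr_share(voters)

# A huge poll that only reaches phone owners.
phone_owners = [voter for voter in voters if voter[0]]
big_biased_poll = random.sample(phone_owners, 40_000)

# A much smaller poll picked at random from all voters.
small_random_poll = random.sample(voters, 1_000)

print(f"All voters:              FDR {true_share:.1%}")
print(f"Phone-only poll (40,000): FDR {fdr_share(big_biased_poll):.1%}")
print(f"Random poll (1,000):      FDR {fdr_share(small_random_poll):.1%}")
```

In this made-up electorate the phone-only poll points to the wrong winner even though it is forty times bigger, because asking more people from the same unrepresentative group does not remove the bias.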
Anyway, it's an interesting story.
I'd recommend that you check it out if you're interested because these kinds of things still happen nowadays with modern politics in the UK and elsewhere, where polls can get it wrong, so it's a really interesting topic.
But in any case, let's move on now to the independent task.
Okay, so here are a couple of questions, so pause the video now to read through these questions and write down a couple of sentences on each one.
Pause the video for six or seven minutes.
Pause the video now.
Okay, let's go through these.
So the first one, we are testing the hypothesis that voters in the UK care more about the NHS than education.
Well, the hypothesis is about voters in the UK, so the population here is all voters in the UK.
And, you know, you should definitely take a sample here because the census is going to be massive.
There are over 15 million voters in the UK.
I think it's quite a bit more, actually.
So asking all of these people is going to take a very long time, it's going to be very difficult.
So a sample would be fine.
You make sure that it is a big enough sample and you've got to make sure that it's unbiased.
So you're not asking one group of people in particular.
So you're asking people who live all around the country, who come from all different kinds of backgrounds, different kinds of jobs, different ages, etc.
Second question is a question about the local gym.
So Malcolm is asking people on a Monday morning.
And if he wants to know whether this gym is good or not and he's asking people that go to that gym, well, this is going to be biassed because people that go to that gym are more likely to say it's good, of course.
If you go to that gym, you're more likely to like it.
So he's better off asking not just people that go to that gym, but people that go to other gyms, or people that don't go to any gym at all.
Another thing I'd say is that if you're asking on a Monday morning, that might attract a certain type of people that go to the gym.
So you're better off asking people throughout the week so you get a fair, representative sample of people that go to the gym.
Okay, great, let's move on to the explore task.
So as a reminder, if you've been watching the lessons so far this week, which I'm sure that you have, we're talking about a designer clothes brand called "Cool Sports Inc".
They want to hire one of you to do some data analysis, so tomorrow you're going to be doing a whole big thing, where you write a pitch.
But today they are talking about samples and they had this idea in the green box.
So the sample we're thinking of taking is a survey of 50 people on the TikTok app.
So you need to tell them why their sample is likely to be biassed and what approach to sampling you would recommend instead.
So two questions: Why is the sample biassed? What would you do instead? Have a think about this and, yeah, it's going to be useful for you in terms of next lesson's task.
Anyway, that is it for today.
I hope you've enjoyed it.
I hope you enjoyed that story.
It's a really interesting topic so I'd encourage you to do a bit more research if you want to.
And thanks very much for watching.
I will see you next time.
Have a great day.
Bye-bye.