Hello, I'm Mrs. Lansley.

And I'm gonna be working with you as we go through the lesson today.

I'm really hoping you're ready to try your best and make the most of this lesson.

So our learning outcome today is to be able to design and use data collection sheets for different types of data.

The Statistical Enquiry Cycle is a cycle used to carry out statistical investigation.

There are five stages.

It's an ongoing process as an evaluation may lead to a new or refined question.

And we're going to discuss the Statistical Enquiry Cycle during today's lesson.

So it's something you should have covered previously, but we will discuss it during the lesson.

So our lesson has got two learning cycles.

In the first one, we're going to focus on what bad questions are, and then in the second one, we're going to write good questions.

So if we know what a bad question is, we can hopefully avoid it in learning cycle two, where we are writing the good questions.

So let's make a start at looking at these bad questions.

So the Statistical Enquiry Cycle which has got five stages is on the screen, and we are really gonna be focusing on the second stage, which is collecting.

And during this stage your method is designed and completed, and it's going to include things like whether the data is primary or secondary, and how the data collection sheet will be formed for the type and volume of data.

So really, in your planning stage you'll have thought about who you're going to ask and how the data is going to be collected. And in this stage you're going to do the collection.

So Sam has decided that they will ask every pupil in their school what their favourite subject is.

They design a data collection sheet that looks like this.

Write down your favourite subject.

So it's basically a sheet of paper with a question or a statement for them to do.

And that's their data collection sheet.

They're gonna ask every pupil in their school.

So what are the limitations of this data collection sheet? So just think about that, pause the video.

What are the limitations of this data collection sheet? Press play when you are ready to check what I came up with compared to yours.

So it's time consuming.

If you imagine Sam is in a sort of general secondary school, asking every pupil probably means up to, or more than, a thousand pupils.

So that's gonna be quite a time consuming data collection sheet.

It's going to be very difficult to process after the collection because of how many answers there are in just a random order.

But what are the positives of this data collection sheet? So pause the video and consider that, if you're next to somebody you might want to discuss with them what are the positives of a data collection sheet that looks like this? Press play when you're ready to compare with mine.

You've got the full choice of subjects for the respondent.

It's not limiting them to Math, English, Science, PE.

They can write any subject that is their favourite.

It's also less likely to be swayed by previous answers because it's going to be quite a busy looking data collection sheet.

You're not going to be able to really pick up whether somebody else has said PE, and you don't want to look different by putting Math.

So the data sheet might have been better if it looked like this.

So which subject is your favourite? And they've got options there.

And then there's a tally chart to complete.

So the tally chart is going to make the processing much easier after the data has been collected.

So remember that was a limitation on that previous sheet that if you had nearly a thousand or maybe more than a thousand responses to your question, you're going to have to process that data in the processing and presenting stage of the Statistical Enquiry Cycle.

And so having to dredge through all of that would be time consuming.

Whereas, if you've got them to put their mark in a tally or you place their response in a tally, then you're gonna have already sort of done a little bit of processing.

The option of "other" allows all respondents to answer.

So that was one of the positives of the last data collection sheet.

That any subject could be written down by the respondent.

Here, we've got most of the sort of generic ones, but then we do have the option of other.

And so if none of those generic ones is your favourite subject, you would then say other.
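So if you wanted to see how a tally like this could be processed on a computer, here is a minimal sketch in Python. The list of options borrows the subjects mentioned earlier, and the sample responses are made up for illustration; neither is taken from Sam's actual sheet.

```python
from collections import Counter

# Hypothetical response boxes; the real sheet's options are not listed in the lesson.
OPTIONS = ["Math", "English", "Science", "PE"]

def tally(responses, options=OPTIONS):
    """Count responses per option, placing anything unlisted under 'Other'."""
    counts = Counter({option: 0 for option in options})
    counts["Other"] = 0
    for response in responses:
        key = response if response in options else "Other"
        counts[key] += 1
    return dict(counts)

# Made-up responses, for illustration only.
sample = ["PE", "Math", "Art", "PE", "Science", "Drama"]
result = tally(sample)
for subject, count in result.items():
    # One stroke per response, like marks on a paper tally chart.
    print(f"{subject:8} {'|' * count} ({count})")
```

Because every response lands in one of a small number of rows, the counts are ready to use as soon as the collection ends, which is the bit of processing the tally chart does for you.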

Regardless of the style of data collection sheet, the question that is posed is really important.

So we know we need a data collection sheet and it depends on the sort of type of data you're collecting and the volume of data that you are collecting.

But what question you ask them is important because otherwise your data could be quite poor in terms of what you've collected.

So Andeep wishes to find out about sleep habits.

So he asks the question, "How much sleep do you get?" How would you respond to Andeep's question? Just think about that in your head.

So Lucas says, "Quite a lot actually." Alex says, "About seven hours." And Jun says, "Well, it depends on lots of factors." Did you come up with anything like their responses or did you have a different response? So do you think Andeep got the responses that he was actually expecting when he asked the question? I don't think he did.

And I think he was probably expecting more quantitative data, numerical data like Alex's.

Alex sort of gave him a timeframe of how much sleep.

So one of his issues with the question he asked was the lack of timeframe.

And maybe that was your thought when you read it.

Maybe you thought, "What do you mean how much sleep do I get? How much sleep do I get when?" So it isn't clear if you should respond about your daily sleep habits, your weekly sleep habits, your monthly, or even your yearly sleep habits.

So this is a reason this is a bad question.

It doesn't give us a timeframe.

So the responses that Andeep has could be a variety of answers depending on different timeframes.

So Andeep's respondents write their answer to his improved question.

So "On average, how much sleep do you get per night?" So we now have a timeframe.

Here are some of the responses.

So 7 hours, seven hours, 420 minutes.

So what difficulties are there when it comes to the next stage of the Statistical Enquiry Cycle, the processing stage? So just think about that.

He's improved his question.

He's put a timeframe, we know we're discussing nightly sleep.

And these are three responses that he receives to his open text.

So he's got the same response, but they're written in different ways.

7 hours with the digit 7.

Seven hours using the word seven.

420 minutes using this different unit.

So different units of measure have been used, hours and minutes.

So response boxes are a way of avoiding this issue because those open text responses, if you think about the volume of data, if he was collecting it for his whole school or even his whole year, if he gets all of these different responses that actually are equivalent to each other, in the processing stage, he's gonna have to clean that data.

He's gonna have to decide which unit he wants to use, make all of them digits rather than words.

And that's gonna take time as well.
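To give a feel for what that cleaning involves, here is a rough sketch in Python that turns the three responses into one unit. The function name `to_hours` and the small table of number words are assumptions made for this example, and a real set of open text responses would need far more cases than this handles.

```python
import re

# Only the number words needed for this illustration; a real cleaner would need more.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6,
                "seven": 7, "eight": 8, "nine": 9, "ten": 10, "eleven": 11, "twelve": 12}

def to_hours(response):
    """Convert a free-text duration such as '7 hours' or '420 minutes' to hours.

    Returns None when the response cannot be interpreted.
    """
    pattern = r"\s*(\d+(?:\.\d+)?|[a-z]+)\s*(hours?|minutes?)\s*"
    match = re.fullmatch(pattern, response.lower())
    if match is None:
        return None
    amount_text, unit = match.groups()
    if amount_text in NUMBER_WORDS:
        amount = float(NUMBER_WORDS[amount_text])
    elif amount_text[0].isdigit():
        amount = float(amount_text)
    else:
        return None  # a word we do not recognise as a number
    return amount / 60 if unit.startswith("minute") else amount

responses = ["7 hours", "seven hours", "420 minutes"]
cleaned = [to_hours(r) for r in responses]
print(cleaned)
```

All three responses come out as the same number of hours, but notice how much code it took to cope with just three ways of writing one answer. Response boxes avoid that work.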

So Aisha wants to find out about how often teachers set homework.

She designs this question using response boxes.

So she's avoided the open text sort of issue, where people can give exactly the same response but write it differently.

So how often do your teachers set homework is her question.

And her response boxes are very often, quite often, often, rarely, never.

Laura and Alex are in the same classes, so they should respond in the same way: they receive the same amount of homework from their teachers because they're in the same classes.

However, they do not.

So why might this be? So pause the video.

Again, if you're with somebody you could discuss this with them.

Otherwise, just think to yourself, why might two pupils in the same classes respond differently to this question? Well, although there are response options, which will help Aisha in her processing stage, they're subjective and vague.

So Laura may think the teacher rarely sets homework.

Whereas for Alex, it might feel like the teacher is setting homework quite often.

So it's going to depend on their point of view of which of those words rather than the actual amount the teacher is setting.

So response boxes using words like often, too little, just right, rarely or quite a bit are ambiguous and mean different things to different individuals, so they should be avoided.

So be very careful about what you choose as your response boxes.

So Sophia wants to investigate the number of countries pupils of her age have visited and she designs this question.

So how many countries have you visited? 0-5, 5-10, 10-15 or more than 15 or 15 plus.

First she asks Andeep.

"I've visited seven countries." So he ticks the 5-10.

Then she asks Laura, "I've visited four countries." So she ticks the first response box, 0-5.

Then she asks Jacob.

And he completes the question like this. Why do you think he's done it like this? Why did he tick two? Again, you might want to pause the video whilst you think about it or discuss it.

And press play when you're ready.

So he has visited 10, which appears in two places, 5-10, 10-15 so he ticked both.

Response boxes shouldn't have an overlap because it causes confusion for the respondents.

They won't know which one to tick and therefore two people that travelled to 10 countries, one might tick the 5-10 box and the other might tick the 10-15 box.

Or they might tick both.

And so when you come to processing the data, you're gonna have this sort of inaccurate data.
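Here is a small sketch in Python of why Jacob ended up ticking two boxes. The boxes are written as inclusive ranges, and treating the last box as "16 or more" is an assumption for this example, since the lesson describes it as "more than 15 or 15 plus".

```python
# Sophia's boxes as (label, lowest, highest), both ends inclusive.
# The final box starting at 16 is an assumption made for this sketch.
BOXES = [("0-5", 0, 5), ("5-10", 5, 10), ("10-15", 10, 15), ("more than 15", 16, None)]

def matching_boxes(value, boxes=BOXES):
    """Return the labels of every box that contains the given value."""
    return [label for label, low, high in boxes
            if value >= low and (high is None or value <= high)]

print(matching_boxes(7))   # Andeep
print(matching_boxes(4))   # Laura
print(matching_boxes(10))  # Jacob
```

Andeep's and Laura's answers each fit exactly one box, but Jacob's answer of 10 fits two of them, which is the overlap causing the confusion.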

So Laura is a member of the school council.

And the school council wants to increase the number of pupils recycling their plastic bottles.

They write this question to try and gauge how the school body views recycling.

So recycling plastic bottles is really important as it reduces the amount of landfill, reduces CO2 emissions and consumes less energy than making new bottles.

Do you agree that recycling plastic bottles is really important? Yes, no.

Is this question a fair question to ask? Pause the video, consider it, think about it.

Press play when you are ready.

So no, we would say that this is a leading question.

It starts with this statement about why recycling plastic is really important. And then it says, do you agree that it's really important? So let's have a check, looking at some bad questions.

So how could this question be improved? How much TV do you watch? And then there's an open text response line.

So pause the video and then when you're ready to move on, press play.

So A and B would both improve the question.

So adding a timeframe, 'cause how much TV do you watch doesn't tell us over what period of time.

And providing response boxes so that we avoid having multiple answers that actually mean the same thing but are written in different ways.

So the question has now been improved.

How much TV do you watch a day? So there's our timeframe and we've removed the open text and we've put response boxes.

But what is still wrong with it? Pause the video, look through it and then when you're ready, press play.

So now there are overlaps.

So if you did watch five hours of television, do you tick the first box or the second box or both boxes? There isn't an option for someone who watches less than an hour of TV per day.

Maybe you don't watch TV at all.

There isn't an option for you to respond to this question.

So we're up to the task.

So the first task: for each question, identify why it is bad, if applicable.

So what's making them a bad question? And then question two, for each question from question one, write or select an unhelpful response to highlight the issue with each question, where applicable.

So pause the video, and when you've done question one and two, we'll go through the answers.

Question one, why are they bad, if applicable? So the first one, how much water do you drink? There is no timeframe.

You may have also said there are no response boxes, but the timeframe is the main issue here.

Part B, reading is proven to make you smarter.

Do you read books? That's a leading question.

It gives you something that is going to influence your answer.

Part C, how many vegetables do you eat in a week? A little, some, enough, lots, too much, other.

The responses are ambiguous.

A little to you could be different to a little to me.

Part D, how many bedrooms are in your house? 0-2, 3-5, 6+.

There are no issues with that question.

There's no overlaps on the response boxes.

And part E, how many minutes of revision do you do per night? 0-20, 20-40, 40-60, or more than 60.

So we've got the units within the question so from a processing point of view, we are only using one unit.

There is a timeframe per night, we've got response boxes, but it's the overlap that is the issue.

Question two, you needed to write or select an unhelpful response to highlight the issue.

So you could have had quite a lot of fun with this and maybe you wanna discuss with your partner, if you're sat next to someone, what they came up with.

So part A, how much water do you drink? Loads.

Because there wasn't a timeframe, and because there weren't response boxes or any kind of unit, your response could be very varied.

None, you might say, "I never drink water." Part B, reading is proven to make you smarter.

Do you read books? So this was a leading question, you were gonna say yes or no, but one way of responding is, "No, I'm smart enough."

Okay, so being careful with what responses you could receive from your questions.

Part C, how many vegetables do you eat in a week? A little, some, enough, lots, too much, other.

And so it might be, "Other, it depends. Is a pepper a vegetable?" So silly responses are not helpful, but your question might lead people to be able to answer with a sort of silly response.

D was a good question.

So there isn't an unhelpful response.

If you tick the 0-2, we know you've got 0-2 bedrooms in your house.

But E was one with the overlap.

So ticking two of the boxes isn't helpful when it comes to processing the data.

We don't know if you're in the 20 to 40 range or the 40 to 60.

Now we're gonna look at the second learning cycle, in which we are going to write good questions.

We're going to try and consider all those things that make a bad question and not do that, but instead write good ones.

So Jun is writing a question to find out about time spent listening to music.

He starts with this basic question.

"On average, how many hours do you spend listening to music?" It says on average, so it doesn't need an exact answer.

People can sort of estimate.

How many hours? We've got a unit there.

Do you spend listening to music? One thing that it doesn't have, which makes it sort of a poorer question, is that timeframe, and we need one to avoid confusion about how it is responded to.

So let's start that question again.

On average, how many hours do you spend listening to music each week? So once again, the average part allows people to estimate, they don't have to worry about not being exact.

Hours is the unit that you're asking them to respond with.

And each week, so they're now thinking about their habits over a period of a week.

If this was an open text response question, there might be a whole host of answers which will make that processing more challenging.

So response boxes are a way to reduce the variety of responses.

He's improved the question once again by adding response boxes.

So on average, how many hours do you spend listening to music each week? So 0-4 hours, 5-10 hours, 11-20 hours, 21 hours or more.

So the response boxes need to have no overlaps and contain no omissions.

And every respondent should be able to answer the question without issues.

So we've got 0-4, if you don't listen to music at all on average, you can tick that.

If you listen to music quite a lot, then there is that 21 hours or more.

It's about how many hours.

So you might be thinking, what if I listen to four and a half hours? Well, we're trying to use whole numbers of hours on average, which is why these sort of distinct groups of 0-4, then 5-10, are fine. We aren't really omitting four to five hours, because we want you to say on average.

So you can decide if you're gonna say that's five hours or four hours.
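If you wanted to check a set of response boxes for overlaps and omissions before using them, one way to sketch it in Python is below. It assumes whole-number answers starting from zero, with each box written as an inclusive range, and the function name `check_boxes` is made up for this example.

```python
def check_boxes(boxes):
    """Check whole-number response boxes for overlaps and omissions.

    boxes: list of (low, high) pairs with both ends inclusive; high is None
    for an open-ended final box such as "21 or more".
    Returns a list of problems found (empty if there are none).
    """
    problems = []
    ordered = sorted(boxes, key=lambda box: box[0])
    if ordered[0][0] > 0:
        problems.append(f"no box for values below {ordered[0][0]}")
    for (low_a, high_a), (low_b, high_b) in zip(ordered, ordered[1:]):
        if high_a is None or low_b <= high_a:
            problems.append(f"overlap around {low_b}")
        elif low_b > high_a + 1:
            problems.append(f"no box for {high_a + 1} to {low_b - 1}")
    if ordered[-1][1] is not None:
        problems.append(f"no box for values above {ordered[-1][1]}")
    return problems

# Jun's boxes: 0-4, 5-10, 11-20, 21 or more.
juns_boxes = [(0, 4), (5, 10), (11, 20), (21, None)]
print(check_boxes(juns_boxes))

# The revision question from the task: 0-20, 20-40, 40-60, more than 60.
revision_boxes = [(0, 20), (20, 40), (40, 60), (61, None)]
print(check_boxes(revision_boxes))
```

Jun's boxes come back with no problems, whereas the revision boxes are flagged for the overlaps at 20 and 40.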

So how many times have you been to the local swimming pool this month? How can this question be improved? So pause the video and think about that.

And then when you're ready, press play.

So it could be response boxes rather than an open text space.

So how many times? So it's about the amount, the frequency, not how long you are there, but how many times.

And the month, this month you've got a timeframe.

So with an open box response, people might write it with a word, so they might write seven as a word, or they might write it as the digit 7, which then makes the processing harder.

So if you put those response boxes in, it allows the data processing part to be more efficient.

Another check, "On average, how many hours do you spend travelling to work?" So 0-4 hours, 5-10 hours, 11-20 hours, 21 hours or more.

How can this question be improved? So pause the video and then when you're ready to check your answer, press play.

A timeframe.

We don't know whether it's how many hours I spend travelling to work per day, per week or per month. Okay, so per week would be one idea for a timeframe.

So sometimes, response boxes are not always the best option.

Up till this point we've been saying actually they're better than an open text box and that's because of the data processing element of it.

But they may limit the respondents and narrow your conclusions in the outcome to your Statistical Enquiry.

And so we don't want to do that.

We don't want to narrow the responses so much that we don't actually get any fruitful evaluation at the end.

And so here is an example of maybe one where a response box doesn't support finding the information that you need.

So what is your favourite flower? Rose, azalea, iris or other? So if you use this question to find out people's favourite flower, what would you expect your results to look like? So pause the video and discuss that one.

Why are the response boxes here not a good option? The likelihood is that you are going to have many of your responses as other.

Because there are a lot of different flowers.

And so there are only three named flowers here.

So that's going to put a lot of people's response as other.

For example, maybe it's tulip, you're gonna tick other.

Or maybe it's a sunflower, you're gonna tick other.

Maybe it's a pansy, and you tick other.

It's only if your favourite was a rose, or an azalea, or an iris that you would tick one of those.

And so most people will be selecting other.

It's really reduced and narrowed the amount of responses you get.

And so if you were running a marketing campaign and you were looking to use favourite flowers within it, it's not gonna be very helpful.

You're just gonna know not to use a rose, or not to use azalea, or not to use an iris because many people like other.

So having an open text box for this question would allow you to get a real gauge of what flowers people like.

So here's a check.

True or false? Giving open text boxes for responses should never be used in a question.

And then justify your answer.

The processing of the data is very challenging when the responses are so varied.

Or they should be used when there would be need for many response boxes to keep the data collected open.

Pause the video whilst you decide and then press play to check.

So that's false.

We just saw that there are times when open text responses are much more useful than response boxes.

And that's because they will keep the data collected open, not just lead you to have other as the modal response.

So collecting the data may be done in a variety of ways and depends on many factors.

And in the planning stage of the Statistical Enquiry Cycle, you'd really think about this.

So it could be done as a digital form or a paper-based form.

It might be individual tick boxes or recorded as a tally chart.

Thinking about your data collection sheet.

And each method has got positives and negatives, which might include time, cost and practicality, which would be considered during the Statistical Enquiry Cycle planning stage.

So digital forms are becoming the more standard way to collect data.

They have many functions built into them.

For instance, you can require a response before they can submit or move on.

So they can't just skip a question.

Whereas on a paper-based form, if they didn't want to answer that question, they just move forward onto the next question.

Digital forms have now got ways of making you respond before you can move on.

It can also restrict the number of responses to each question.

So on a response box question, it can be programmed that they can only select one answer or it might say tick all answers that apply.

Whereas again, on a paper-based form, the respondent can tick every single box or no boxes.
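As an illustration of the kind of rule a digital form applies, here is a minimal sketch in Python. It is not the behaviour of any particular form tool; the function `validate_answer` and its parameters are invented for this example, and the options are borrowed from Jun's music question.

```python
def validate_answer(selected, options, required=True, allow_multiple=False):
    """Return an error message for an invalid selection, or None if it is acceptable."""
    unknown = [choice for choice in selected if choice not in options]
    if unknown:
        return f"unknown option(s): {', '.join(unknown)}"
    if required and not selected:
        return "a response is required"
    if not allow_multiple and len(selected) > 1:
        return "select one answer only"
    return None

options = ["0-4 hours", "5-10 hours", "11-20 hours", "21 hours or more"]
print(validate_answer([], options))                           # skipped question
print(validate_answer(["0-4 hours", "5-10 hours"], options))  # two boxes ticked
print(validate_answer(["5-10 hours"], options))               # acceptable
```

A paper form cannot enforce either rule, so a skipped question or two ticked boxes only gets noticed later, at the processing stage.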

The data that is collected can also be processed very quickly in spreadsheet software.

Often these digital forms can then output the responses into spreadsheet software, which you can then process and make your charts from.

But what negatives are there to a digital form for data collection? So just think about that for a moment, pause the video.

And then when you're ready to look at what I came up with, press play.

So one of the negatives is there are still individuals that are not confident on digital devices or do not have access to them.

So if you only rely on a digital form, then you are going to isolate and remove people that you could need to ask and survey.

And because of that structure that we've already spoken about as a positive, where you can require a response before the respondent can submit or move on, some respondents might just select any answer just to complete the question.

It might not be a truthful or a representative response for themselves.

They're just fed up and they want to move on, so they just select something randomly.

So paper-based forms are the other way to collect data.

And this could be a form that individuals complete and submit.

So that could be a survey that's done through the post potentially, or it's a response that could be directly recorded onto a data collection sheet.

So it might be that you are stood in the town centre with your clipboard and you ask passers-by a question and you record their response.

Would smaller samples or larger samples better suit the individual data collection sheet? So if you are going to do one of those individual data questionnaires and people respond to it, and then submit it independently, is that better for smaller samples or larger samples? It's probably better for smaller samples because it's going to speed up the data processing stage of the Statistical Enquiry Cycle.

Because once you receive all of those questionnaires back, somebody has got to do data entry to then move on to the processing part.

So here's a check.

Data is going to be collected about toddlers' recognition of fruits and vegetables.

Would a digital form be a good way to collect this data? So pause this.

If you are sat next to somebody, I would really discuss this one.

You might have differing views on this.

And then when you're ready to press play and see what I came up with, do so.

So a digital form in my opinion would not be a useful way to do this.

Because it would rely on toddlers being able to use the technology.

And there will be some that can, but not all of them.

So the data would be collected through an experiment and the results written down on behalf of the toddler.

So something like an adult sat with a group of toddlers or an individual toddler, showing them an image of a fruit or a vegetable.

And if the toddler knows it's a pear for example, then the adult would be able to write that down on a data collection sheet.

So because of the age of the individuals that the data is being collected about, a paper-based form where they need to fill something in isn't very useful, because they probably can't read at this age.

And a digital form would rely too much on their use of technology.

We're up to the second task of the lesson.

There's only this one question.

So Izzy wants to conduct an investigation.

She's going to collect data regarding the respondent's age, opinion on hosting the Olympic Games and which sport is their favourite to watch.

So I'd like you to design the three questions Izzy should use to collect the data.

So pause the video whilst you consider that.

Remember you are writing good questions, so think about what makes a bad question and avoid that.

Press play when you're ready to check.

So we are gonna have different answers because you are allowed to write a question to collect the data.

But here is an example response.

So the age one, what is your age? I've used response boxes.

And that's because if they are grouped, respondents will be more likely to respond truthfully.

This is one of the sort of sensitive topics about collection of data, age.

Sort of a personal factor.

And so people don't particularly like to disclose their actual age.

So the use of response boxes and grouping the data will make them more likely to be truthful.

But it's important that there are no overlaps in the group.

I've written it up to 99.

That will exclude some people in our population.

There are some individuals that have lived to 100 or more.

So it might be that you would like an open box of 100+.

The reason that we might avoid writing that, though, is that you know that one's gonna be quite low in terms of its responses.

It might be that you have a box that they tick if it's a hundred or more, and they tell you the age.

And then when you are processing the data, you could extend that last box up to the highest age.

Izzy then wanted to collect data about opinions on hosting the Olympic Games.

So an example response, do you agree or disagree with the following statement? Hosting the Olympic Games is too expensive.

Strongly agree, agree, neither agree nor disagree, disagree, or strongly disagree.

So because you're trying to gauge an opinion, it's important that you try and make this a very neutral question.

So not to go do you agree with the following statement, but do you agree or disagree? It allows them to do either of them.

If you say, do you agree, you might lead them to feel like they should say yes.

With the responses it's very balanced.

We've got strongly agree all the way to strongly disagree.

And then there's that sort of neutral middle ground as well.

With an opinion-based question, it's important that you are trying to keep it very neutral and don't put an unintentional bias or your own opinion through the question.

And then lastly, what sport is their favourite to watch? So again, this is an example question.

You may have come up with your own.

So which Olympic sport is your favourite to watch? And I've used an open text box.

And that's because there are so many that the respondents could give.

If I'd used response boxes and had put maybe athletics, cycling, rowing, other, there is a high chance that I'm gonna get lots of others because the Olympics is a multi-sport competition.

And so it's better for me to leave this as an open text response so that they can give their actual answer.

So in summary, when writing a question for data collection, the question needs to be clear, unambiguous and non-leading.

Questions should have a timeframe, and either non-overlapping response boxes or an open text response.

The data collection sheet should consider the type and volume of data being collected and also whether it's digital or paper-based.

Well done today.

I hope you enjoyed the lesson.

And I look forward to working with you again in the future.
