Hello, my name is Dr. Rowlandson, and I'm excited to be guiding you through today's lesson.

Let's get started.

Welcome to today's lesson from the unit of graphical representations of data.

This lesson is called Constructing Scatter graphs, and by the end of today's lesson, we'll be able to construct scatter graphs from data presented in a number of different ways.

Here are the keywords that we'll be using in today's lesson.

Bivariate data is data that has two variables, where each data point for one variable has a corresponding data point for the other variable.

It's data that comes in pairs.

A scatter graph is a visual representation of bivariate data, and it can be useful when determining the nature of the relationship between the variables.

An independent or explanatory variable is a measure that is predicted to have an effect on another variable.

And a dependent or response variable is a measure whose value is predicted to be affected by an explanatory variable.

This lesson will have two learn cycles, with the first learn cycle focusing on plotting scatter graphs, and the second learn cycle focusing on deciding which axis to use for each variable.

To begin with, let's start with plotting bivariate data on a scatter graph.

Here we have Alex, Andeep, Izzy, and Sam.

And they compete in a sports competition.

Then they plot the long jump distances on the graph.

Here's Alex's long jump distance.

Here's Andeep's.

Here's Izzy's.

And here's Sam's.

This is a pretty basic graph, but we can still see a few things from it.

What observations could we make from this data? Pause the video, have a think about what observations you can make, and press play when you're ready to continue.

Some observations we could make are things such as Sam jumped the furthest and in this case, Alex jumped the least far.

You can also see that Izzy jumped 5.1 metres, and Andeep jumped around about 5.4 metres, for example.

There's quite a few things we can gather from a simple graph like this.

They also plot the high jump distances on another graph.

So, we have Alex's high jump distance here.

Here we have Andeep.

Here's Izzy.

And here is Sam.

Again, it's another simple graph, but what observations could we make from the data this time? Pause the video, have a think, and press play when you're ready to continue.

Once again, it's a basic graph, but we can see a few things from it, such as who jumped the highest in this case? It was Sam.

Who jumped the least high? It was Izzy.

And that Andeep jumped maybe around about 165 centimetres in this case.

They then look at both graphs side by side so we can see each person's high jump distance, and each of the same people's long jump distance as well.

Now we can see both sets of data side by side.

What connections could be made between the two sets of data? Pause the video, have a think, and press play when you're ready to continue.

We can see that Sam has jumped the highest and also jumped the furthest, followed by Andeep in both cases.

So, there may be a connection there about people who jump further may also jump higher as well.

But it doesn't follow that pattern perfectly because we can see that Alex jumped the third highest and Izzy jumped the fourth highest.

But when it comes to long jump, Izzy jumped the third longest and Alex jumped the fourth longest.

But there does seem to be some connection there, perhaps.

In this scenario, each person has two values of data that can be combined together to make a pair of bivariate data.

So, for example, Alex has a long jump distance of 4.8 metres and a high jump distance of 148 centimetres.

Those two pieces of data can be combined together to make a pair of bivariate data.

And that's what bivariate data does.

It compares the values of two variables by pairing each value of one variable with the value of another variable.

Long jump and high jump, for example.
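If you'd like to see that pairing written out, here is a minimal sketch in Python. It uses the four distances quoted in this lesson; the variable names (`long_jump_m`, `high_jump_cm`, `bivariate`) are just illustrative choices, not part of the lesson.

```python
# Values quoted in the lesson: long jump in metres, high jump in centimetres.
long_jump_m = {"Alex": 4.8, "Andeep": 5.4, "Izzy": 5.1, "Sam": 5.7}
high_jump_cm = {"Alex": 148, "Andeep": 165, "Izzy": 122, "Sam": 186}

# Pair each person's two values into one (long jump, high jump) tuple.
bivariate = {name: (long_jump_m[name], high_jump_cm[name]) for name in long_jump_m}

# Rank the same people by each variable, greatest first.
by_long = sorted(bivariate, key=lambda name: bivariate[name][0], reverse=True)
by_high = sorted(bivariate, key=lambda name: bivariate[name][1], reverse=True)

print(bivariate["Alex"])
print(by_long)
print(by_high)
```

Comparing the two rankings shows the same thing we noticed from the side-by-side graphs: Sam and Andeep come first and second in both, while Izzy and Alex swap places.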

And whereas previously we saw these two pieces of data plotted on two different graphs, bivariate data can be represented together in the same graph by plotting a scatter graph.

Let's look at that now.

Here we have Alex's bivariate data.

It's his long jump distance and his high jump distance.

And on the scatter graph, we can see we've got the horizontal axis, which says long jump in metres, and the vertical axis is for high jump in centimetres.

And we can plot Alex's bivariate data on that scatter graph in the same way that we plot coordinates.

We wanna go along 4.8 metres for long jump, and then we're going to go up 148 centimetres for high jump, and in the place where those two bits intersect we can plot Alex's data.

And here's Andeep's bivariate data.

His long jump distance was 5.4 metres, high jump distance was 165 centimetres, so we can plot Andeep's data there.

Here we have Izzy's long jump distance of 5.1 metres, and high jump distance of 122 centimetres.

We can represent Izzy there.

And Sam's long jump distance was 5.7 metres and high jump distance was 186 centimetres.

We can represent Sam there.

Now we can see the bivariate data represented on the same graph.
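The "go along, then go up" idea can also be sketched in a few lines of Python that draw a rough text version of the scatter graph. This is only an illustration: the axis ranges (4.5 to 6.0 metres and 100 to 200 centimetres), the grid size, and the function name `text_scatter` are all assumptions made for the sketch, not taken from the graph in the video.

```python
# (long jump in metres, high jump in centimetres) for Alex, Andeep, Izzy and Sam.
points = [(4.8, 148), (5.4, 165), (5.1, 122), (5.7, 186)]

def text_scatter(points, x_min, x_max, y_min, y_max, cols=21, rows=11):
    """Return lines of text with each (x, y) pair drawn as an 'x' on a character grid."""
    grid = [[" "] * cols for _ in range(rows)]
    for x, y in points:
        # Go along the horizontal axis first, then up the vertical axis.
        col = round((x - x_min) / (x_max - x_min) * (cols - 1))
        row = round((y - y_min) / (y_max - y_min) * (rows - 1))
        # The first line is printed at the top, so flip the row to put larger values higher.
        grid[rows - 1 - row][col] = "x"
    return ["".join(line) for line in grid]

# Assumed axis ranges, chosen only so that all four points fit on the grid.
lines = text_scatter(points, x_min=4.5, x_max=6.0, y_min=100, y_max=200)
print("\n".join(lines))
```

Each cross sits where its long jump value along the bottom meets its high jump value up the side, which is the same idea as plotting coordinates.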

What observations could we now make from this data? Pause the video, have a think and press play when you're ready to continue.

It looks like people who are good at long jump are also good at high jump.

We've got Sam, who has the greatest long jump distance and high jump distance, followed by Andeep.

And we've got Izzy and Alex, who are both jostling for third and fourth place.

So, it looks like there might be a connection between long jump distance and high jump distance, but we've only got these four people to base it on.

It might just be that that just happens to be the way it fell for these particular four people.

So, before we can decide whether or not there really is a connection between long jump and high jump, how could we be more confident about these observations? Pause the video, have a think, and press play when you're ready to continue.

One way that we can explore this connection further, and be more confident that a connection does exist between these two variables is to collect some more data and see if it follows a similar sort of pattern.

So, that's what they do.

They explore the connection further by adding more people's data to the scatter graph.

We've got Aisha, whose long jump was 5.9 and high jump was 152 centimetres.

There's Aisha.

We've got Jun, long jump 5.1 metres, high jump 141 centimetres.

We've got Sophia, whose long jump distance was 5.2 metres, high jump 134 centimetres.

We've been representing each person so far by plotting a picture of their face on a scatter graph, but I'm starting to have some problems with this.

What problems could there be with plotting pictures of faces on scatter graphs? Pause the video, have a think and press play when you've got an idea.

One problem with using faces is that it's not very accurate to read.

We can see Alex's face on that scatter graph, but which part of his face represents the precise point of his data? It's difficult to tell, now we can't see the numbers, whether he got 145, 150 for his high jump.

So, it's not very accurate to read.

Also, we can see now we've got quite a few pieces of data, but the pictures are starting to overlap.

The more data we plot in here, the less clear it's going to be to see, and it's going to be more and more difficult to read, and it's going to get really, really overcrowded with more data.

So, while it was quite nice to plot the faces on scatter graphs to begin with, we can see the problems with it.

So, what other things could we use to plot the points? Well, there tend to be a few different ways of plotting data on scatter graphs.

But two of the most common ways tend to be plotting points as dots on a scatter graph or plotting points as crosses on a scatter graph, where each dot or cross represents a pair of bivariate data.

And if we add more data now to these scatter graphs, we can see that even though there's a lot of data on those graphs, it's still quite clear where each dot is and it's not becoming too overcrowded straight away.

So, bivariate data can be represented on a scatter graph by plotting points.

And we'll be using dots as our points for the remainder of this example.

Each point on this scatter graph represents a pair of bivariate data, which in this example means it represents one of those people.

For example, this point here represents Alex's data because it's 4.8 across on the long jump and it's 148 up on the high jump.

This dot here represents Andeep, and this dot represents Sam.

Data can now be added to this scatter graph by plotting points.

For example, for Jacob, we can see Jacob's long jump distance was 5.9 metres and his high jump distance was 135 centimetres.

So, we can plot Jacob's point there.

And we can do the same for Laura.

Laura's long jump was 5.3 across the horizontal axis and 173 up the vertical axis.

We can plot Laura's data there.

Okay, let's check what we've learned so far.

Here we have a scatter graph with various pieces of bivariate data on, and we've got Jun's data in the top corner.

Jun's long jump distance was 5.1 metres, and high jump distance was 141 centimetres.

Which point on that scatter graph represents Jun's data? Is it A, B, C, or D? Pause, have a go and press play when you're ready for the answer.

The answer is A.

We can see that point A is 5.1 across on the horizontal axis, and 140 and a little tiny bit up on the vertical axis.

Let's check that in another way now.

Here we have a table with three people's bivariate data on and we have a scatter graph.

And one of the points in the scatter graph is highlighted.

Which person's data is highlighted in a scatter graph? Is it Izzy, Aisha, or Sophia? Pause the video, write down the name of the person and press play when you're ready to continue.

The answer was Aisha.

We can see that because it's 5.9 across on the horizontal axis, and it's 152 up on the vertical axis.

Okay, over to you now for task A.

This task contains two questions and here's the first one.

The table shows weather data from Bradford for five months in the year of 2021.

In the table, we can see one set of data is about the total rainfall.

That's how much rainfall fell within a single month, and that's in millimetres.

And the other data is about how many hours of sunshine were visible that month in total.

And we've got the months from February down to October.

In this task, you need to plot data from that table on the scatter graph.

Pause the video, have a go, and press play when you're ready for question two.

And here's question two.

Jacob runs a series of long-distance races.

For each race, he records his time and the elevation of the course.

And we can see the table below.

It's got two sets of data: elevation, which is in metres, and how much time it took, in seconds.

So, whereas in the examples earlier, each pair of bivariate data represented a different person, in this question, each pair of bivariate data is for Jacob, but each one represents a different race.

Your job in this question is to plot Jacob's data onto the scatter graph.

Pause the video, have a go, and press play when you're ready for the answers.

Well done with that one.

Here's what question one should look like after we've plotted a scatter graph.

And here's what question two should look like after we've plotted a scatter graph.

Great job so far.

We're now on to the second learn cycle, which is all about deciding which axis to use for each variable.

Here we have Lucas.

Lucas conducts an experiment where he times how long a kettle takes to boil for different volumes of water.

And here's the data from his experiment.

Across the top row, we have the volume of water, which is in litres, and goes from 0.25 litres all the way up to 1.5 litres.

And the bottom row, we've got the amount of time it takes for the kettle to boil in each case, and that is measured in seconds, with the lowest being 34 seconds and the highest being 237 seconds.

Based on Lucas's results, what observations could Lucas make from this data? Pause the video, have a think, and press play when you're ready to continue.

One thing that Lucas might think is the volume of water appears to be affecting the amount of time it takes for the kettle to boil.

We can see for the low amounts of water, 0.25 and 0.5, there's a low amount of time, 34 and 59.

But for high amounts of water, 1.25 litres and 1.5 litres, they're the highest times, 188 seconds and 237 seconds.

So, it looks like there might be a connection there.

In some cases with bivariate data, but not all cases, we might predict that one variable is affecting the values of the other variable.

In these cases, we can describe the variable that is predicted to be affecting the other one as the explanatory variable because we are predicting that changes in this variable explain the changes in the other variable.

The variable that is being affected is the response variable because we are predicting here that this variable is responding to changes in the other variable.

And it looks something like this, where we are predicting a cause and effect.

Changes to the explanatory variable cause changes to the response variable.

In this situation, the volume of water appears to be affecting the amount of time it takes for the kettle to boil.

So, we are suggesting here that the explanatory variable is the volume of water in the kettle and the response variable is the time it takes for the kettle to boil.

So, we could use the phrase explanatory variable and response variable or alternatively, one variable could be described as independent and the other as dependent.

In these cases, it is predicted that the value of the dependent variable is being affected by changes to the independent variable.

So, it looks like this.

Got the independent variable.

And when that changes, it causes a change to the dependent variable.

The values of the dependent variable depend on the values of the independent variable, but not the other way around.

So back to this example with the volume of water, and the time it takes the kettle to boil.

We could argue that Lucas can choose how much water he puts into the kettle.

Therefore, that is an independent variable.

And the time it takes for the kettle to boil seems to depend on how much water he puts in it.

Therefore, that variable is the dependent variable because the time it takes for the kettle to boil depends on the amount of water in the kettle.

Let's look at another example of this.

Sophia and Aisha are analysing some bivariate data about ice cream sales in a day and the maximum temperature for each day.

They're trying to decide out of the two of those, which is the dependent variable and the independent variable.

So, we've got ice cream sales and maximum temperature.

Which one do you think is the dependent and independent variable? Pause the video, have a think and press play when you're ready to continue.

Let's consider what it would mean if it was this way round, where ice cream sales are the independent variable and the maximum temperature for the day is the dependent variable.

This would suggest that if a vendor sells more ice creams, then the day would get hotter because the temperature depends on how much ice cream is sold in a day.

That doesn't really seem right.

When we think about it this way round, where the ice cream sales are the dependent variable and the maximum temperature is the independent variable, then this would suggest that if the day is hotter, then a vendor may sell more ice creams. That seems more plausible than the other way around.

In the example so far, it's been pretty clear which thing is affecting the other thing.

So, which is the independent variable, which is the dependent variable.

But this is not always clear.

Sometimes it's not clear which one is the independent variable, and which one is the dependent variable.

For example, wind speed and the amount of rain.

Does the wind speed affect the amount of rain, or does the amount of rain affect the wind speed, or do they not affect each other in any kind of way? So, it's not always the case that two variables are connected at all, and even if it does seem to be that the variables are connected it doesn't necessarily mean that one is having an effect on the other.

It could be that they're both affecting each other or that there is a third variable that is affecting both.

Bivariate data is not always a case of cause and effect.

So, in these cases, we may avoid defining variables using terms such as independent or explanatory, and dependent or response.

Let's check what we've learned then.

So, which two terms describe a variable that is predicted to be affected by another variable? Is it A, dependent variable, B, explanatory variable, C, independent variable, or D, response variable? Choose two.

Pause the video, have a go, press play when you're ready.

Our answers are A, dependent variable, and D, response variable.

Which two terms describe a variable that is predicted to be affecting another variable? Same responses again.

A, dependent variable.

B, explanatory variable.

C, independent variable.

And D, response variable.

Pause, choose two options and press play when you're ready for the answers.

The answers are B, explanatory variable and C, independent variable.

And let's check that now in another way.

An athlete times how long it takes to complete obstacle courses of varying length.

They record course length in metres and time to complete in seconds.

Which is the dependent variable in this case? Is it A, the course length, B, the time to complete, or in this case, is it C, neither of them are dependent variables? Pause the video, choose one, and press play when you're ready for the answer.

The answer is B, the time to complete.

As the length of the course gets longer, it's natural to take longer to complete the course.

Back to our example now with Lucas and his experiment.

Lucas wants to plot his data onto a scatter graph.

He's got data about volumes of water and about time.

And he's wondering which data should I label on each axis.

There is a convention behind this.

That is, when plotting bivariate data consisting of an independent variable and a dependent variable, the order of the axes is important.

The horizontal axis is used for the independent variable and the vertical axis is used for the dependent variable.

So, in this case, we've got the volume of water in a kettle is the independent variable.

We should put that on the horizontal axis.

And the time it takes for the kettle to boil is the dependent variable.

We should put that on the vertical axis.
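That convention is simple enough to write down as a tiny Python sketch. The function name `assign_axes` and the wording of the labels are made up for illustration; only the rule itself comes from the lesson.

```python
def assign_axes(independent, dependent):
    """Apply the convention: independent variable horizontal, dependent variable vertical."""
    return {"horizontal": independent, "vertical": dependent}

# Lucas's experiment: he chooses the volume, and the boiling time responds to it.
axes = assign_axes(
    independent="Volume of water (litres)",
    dependent="Time to boil (seconds)",
)

print("Horizontal axis:", axes["horizontal"])
print("Vertical axis:", axes["vertical"])
```

The code only records the decision. Working out which variable is the independent one still needs the kind of reasoning we used for the kettle and the ice cream sales.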

So, Lucas has plotted his scatter graph and he's plotted his points, but there are two missing.

Why are the two points missing from this scatter graph? Pause the video, have a think about what problem Lucas is having and press play when you're ready to continue.

The problem is that the axis for time only goes up to 150, but there are two times which are greater than this.

So, Lucas hasn't been able to plot 188 or 237 on his scatter graph.

So, when plotting scatter graphs, as well as deciding which axis to use for each variable, we also need to choose a suitable scale.

So, Lucas tries again, and this time he's managed to plot all his points in a scatter graph, but they're all bunched together.

Why is there so much unused space in a scatter graph? Pause the video, have a think about what the problem is this time, and press play when you're ready to continue.

The problem this time is that the axes go to numbers that are far beyond the limits of Lucas's data.

So, while it's good to be able to capture all the points in the graph, it's good to be able to make use of as much space as possible on the scatter graph, so it's clear to see what's going on.

So, what scale should Lucas use? We can see that the maximum data for water volume was 1.5.

And we can see across the horizontal axis, there are three large squares.

Therefore, if we use an interval size of 0.5, we'd have 0.5, one, 1.5.

That would fit on the graph nicely.

And the maximum amount of time, it was 237 seconds.

Now, there are three large squares on the vertical axis.

I don't think we could have 237 right at the top, but if we have an interval size of 100, then we've got 100, 200, 300.

That's enough to capture that highest point, but not so much that there's lots of wasted space above it.
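The reasoning for picking a scale can be sketched in Python as "take the smallest interval that still reaches the largest value". This is only one way of expressing it. The function name `smallest_covering_interval` and the lists of candidate intervals are assumptions for the sketch; the lesson itself only settles on 0.5 and 100 for three large squares.

```python
def smallest_covering_interval(max_value, large_squares, candidates):
    """Return the smallest candidate interval whose axis still reaches max_value."""
    for interval in sorted(candidates):
        # The top of the axis is the interval size times the number of large squares.
        if interval * large_squares >= max_value:
            return interval
    return None  # None of the candidates reaches the largest value.

# Lucas's data: longest time 237 seconds, largest volume 1.5 litres, three large squares each.
# The candidate lists are hypothetical options to choose between.
time_interval = smallest_covering_interval(237, 3, [50, 100, 200])
volume_interval = smallest_covering_interval(1.5, 3, [0.25, 0.5, 1])

print(time_interval)
print(volume_interval)
```

Going up in 50s only reaches 150, which is the problem Lucas had with his first attempt, while going up in 200s would reach 600 and leave most of the graph empty.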

So now the scatter graph is ready to plot the data, and it looks a bit like this.

Let's check what we've learned there.

In the table, we have bivariate data representing course length and time to complete.

Which variable from this data should go on the horizontal axis of a scatter graph? Is it A, course length, B, time to complete, or is it C, either? Pause the video, have a go, press play when you're ready for the answer.

The answer is A, course length.

That is the independent variable.

Therefore, it should go on the horizontal axis.

Here we have a graph.

We've got maximum temperature in degrees Celsius going across the horizontal axis.

And we've got amount of ice cream sales in pounds on the vertical axis.

True or false, the axes have been labelled correctly.

Choose true or false and then justify your answer.

Here are your options.

A, the dependent variable goes on the vertical axis, and B, the independent variable goes on the vertical axis.

Pause, have a go and press play when you're ready for the answer.

The answer is True.

And it's true because the dependent variable does go on the vertical axis, which in this case is the ice cream sales.

The ice cream sales depend on the temperature of the day.

In this question, we have one column from a table of bivariate data, and we can see the vertical axis of the scatter graph to represent that data.

And the question is, which would be the most appropriate scale for the vertical axis in this data? Would it be A, going up in 20s, B, going up in 50s, or C, going up in 200s? Pause the video, make a choice, and press play when you're ready for the answer.

If we went up in 20s, we'd have 20, 40, 60, 80, we wouldn't cover all the data in the table.

If we went up in 50s, we'd have 50, 100, 150 and 200.

That would cover all the data in the table.

If we went up in 200s, we'd have 200, 400, 600, 800.

That would also cover all the data in the table.

However, all the data will be within the first interval because the highest data point is 189.

So, we have three big squares of wasted space.

Therefore, the most appropriate scale would be B, going up in 50s.

Okay, over to you now for task B.

This task contains two questions.

In question one, you're presented with pairs of bivariate data in A, B, C, and D.

And for each pair of bivariate data, we need to determine which one is the independent variable, and which one is the dependent variable.

And in these cases, there will always be a dependent and independent variable.

So, for example, ice cream sales we would mark as dependent, and maximum temperature of the day we would mark as independent.

So, pause the video, write dependent and independent next to each one and press play when you're ready for question two.

And here's question two.

The data on the table shows ages and heights of a sample of six children.

We need to plot this data on a scatter graph.

And this time we need to do the whole thing ourselves from scratch.

We need to decide which axis to place each variable on.

We need to choose a scale for that axis, and then we need to plot the points.

Pause the video, have a go, and press play when you're ready for the answers.

Well done.

Let's go through the answers now for question one.

In 1A, we've got the amount of water in a kettle and the time it takes for the kettle to boil.

We've seen this example already.

The first one is independent.

The second one is dependent.

In 1B, we have the time it takes to complete a running course and also the steepness of the running course.

It's likely that when you have a really steep running course, it's going to slow you down.

So, the steepness of the running course is the independent one and the time it takes is the dependent one.

In C, we've got the age of the child, and the height of the child.

It's likely that as a child gets older to begin with, they'll get taller and taller.

So, the age of the child is independent, and the height is dependent.

And D, the number of floods reported in a month and the amount of rainfall in the month.

Well, the more rainfall there is, the more likely it is for there to be a flood.

So, the rainfall is independent, and the number of floods is dependent.

And in question two, we need to label the axes.

We'll put age across the horizontal axis because that one is the independent variable and height across the vertical axis because that's the dependent variable.

We then need to choose a scale.

It seems appropriate to go across in fives on the ages because then we go up to 15 and that's just above 11, which is the maximum age in the data.

And it seems appropriate to go up in 50s in the height because it goes up to 150 in that situation.

And that just about captures our highest height, which is 138.

However, other scales can work as well.

And then we plot our points, and it looks a bit like this.

Whatever scale you used, you should still have the same shape, but it may be more or less clear depending on how bunched together your points are.

Great work today.

Here's a summary of what we've learned.

We've learned how to plot bivariate data onto a scatter graph.

And data that is presented as a paired list can be represented on a scatter graph, as can data presented in a table.

We've also learned which way round the axes go.

We've learned that the independent variable goes on the horizontal axis, and the dependent variable goes on the vertical axis.

Well done.
