Hello there and welcome to today's lesson.

My name is Dr. Rowlandson and I'll be guiding you through it.

Let's get started.

Welcome to today's lesson from the unit of Graphical Representations of Data with Scatter Graphs and Time Series.

This lesson is called Checking Understanding of Scatter Graphs.

And by the end of today's lesson, we'll be able to plot and interpret data points on a scatter graph.

Here are some previous keywords that we'll use again during today's lesson.

So you may wanna pause the video if you want to remind yourselves what any of these words mean before pressing play to continue.

This lesson contains two learning cycles.

In the first learning cycle, we'll be plotting scatter graphs.

And then in the second learning cycle, we'll be interpreting scatter graphs.

Let's start off with plotting scatter graphs.

Here we have two tables that contain data taken from the Met Office about the weather during a year in Stornoway.

The table on the left has the months February, April, June, August, and October, and it shows the total rainfall, which is measured in millimetres.

And the table on the right has the same months, but it has the total amount of sunshine that was recorded for that month in hours.

Now, Sofia says, "I wonder if these two sets of data are related." Well, the two sets of data can be combined to make bivariate data.

Bivariate data compares the values of two variables by pairing each value from one variable with a value from the other variable.

And we can see that with this table here.

Each month has two pieces of data, the rainfall and the amount of sunshine.

For example, for February we can see we have 16 millimetres of rainfall and 87 hours of sunshine.

Those two pieces of data together combine to make a pair of bivariate data.

Sofia says, "I want to see if there's a relationship between these two variables, but it's not very easy to see by simply looking at the numbers in the table.

So I wonder if there's an easier way to see what's going on with this data." Well, bivariate data can be represented together in the same graph by plotting a scatter graph.

And one thing we'll need to do before we start plotting our scatter graph is consider what the scale will be on each axis.

When we look at the total rainfall, the minimum rainfall was 16 millimetres and the maximum amount of rainfall was 131 millimetres.

So Sofia says, "A sensible scale for rainfall could be from zero to 140 millimetres," because that will cover all the numbers.

For sunshine, the minimum duration of the sunshine was 87 hours and the maximum was 249 hours.

So Sofia says, "A sensible scale for sunshine duration could be from zero to 300 hours." So here we have our two axes for our scatter graphs, the horizontal one showing the total rainfall going from zero to 140, and the vertical one showing total sunshine going from zero to 300.
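Sofia's choice of scale can be sketched in a few lines of Python. This is only an illustration: the helper name `axis_upper_limit` and the step sizes of 20 and 100 are assumptions made for this example, chosen so that the results match the limits of 140 and 300 given in the lesson.

```python
import math

def axis_upper_limit(values, step):
    """Round the largest value up to the next multiple of `step`."""
    return math.ceil(max(values) / step) * step

# Minimum and maximum values quoted in the lesson.
rainfall_extremes_mm = [16, 131]
sunshine_extremes_hours = [87, 249]

# The step sizes are assumptions; any step that gives a tidy scale would do.
rainfall_limit = axis_upper_limit(rainfall_extremes_mm, step=20)
sunshine_limit = axis_upper_limit(sunshine_extremes_hours, step=100)

print(rainfall_limit, sunshine_limit)  # 140 300
```

Starting each axis at zero and rounding the maximum up means every data point fits on the grid.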

Each pair of bivariate data can be plotted as a pair of coordinates by marking a point or a cross on a grid.

Now, as we're going to plot this on a computer, we'll use points.

But if you're doing it by hand with a pencil, for example, you might find it easier to be accurate by drawing crosses instead.

Let's plot this data together now.

For February, we can see it had 16 millimetres of rainfall and 87 hours of sunshine.

So I'm going to go across on the horizontal axis 16, and up on the vertical axis 87 to plot a point here.

For April, it had 131 millimetres of rainfall and 200 hours of sunshine.

So that's across 131 and up 200.

It's here.

For June, I'm going to go across 38 and up 219.

And then for August and for October.

And there we have our five points of bivariate data plotted as a scatter graph.
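The pairing step can be sketched in Python as follows. This is a minimal sketch that uses only the three months whose values are read out in the lesson; the August and October values are not quoted, so they are left out here rather than guessed. The variable names are chosen for this example.

```python
# Values read out in the lesson, one table per variable.
rainfall_mm = {"February": 16, "April": 131, "June": 38}
sunshine_hours = {"February": 87, "April": 200, "June": 219}

# Pair each month's rainfall with the same month's sunshine:
# (across on the horizontal axis, up on the vertical axis).
points = [(rainfall_mm[month], sunshine_hours[month]) for month in rainfall_mm]

print(points)  # [(16, 87), (131, 200), (38, 219)]

# Every point should fit inside the chosen scales (0 to 140 and 0 to 300).
fits_on_grid = all(0 <= x <= 140 and 0 <= y <= 300 for x, y in points)
```

Each tuple is one pair of bivariate data, ready to be marked as a point or a cross.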

Now let's remember why we plotted this in the first place.

Sofia wanted to see whether or not there was a relationship between these two variables.

She says, "There doesn't appear to be any visible relationship between these two variables, but basing the conclusion on just five data points doesn't seem very reliable.

I wonder what it would look like if I used more data." So let's do that now.

This scatter graph shows data taken from the Met Office about the weather in Stornoway for every month from 1941 to 2022.

So there are lots and lots of data points now, and we can see a much clearer picture appearing.

Using more data can make any potential relationships between the two variables easier to see.

What can you see from this data? Can you see any kind of pattern or shape appearing in the points? Does it tell you anything about the relationship between rainfall and sunshine? Pause the video while you think about that and press play when you're ready to continue.

Well, let's hear what Sofia says.

She says, "I can now see a relationship between the variables." It looks like the more rainfall there is in a month, the less sunshine there tends to be in that month.

And we can see that because if we first look at some of the points that have low amounts of rainfall, we can see a lot of those have high amounts of sunshine duration.

They are quite high up on the vertical axis.

And as we go further and further to the right on that horizontal axis, the points tend to be getting lower and lower as we go.

So it looks like the more rainfall there is, there tends to be less sunshine.

Now there are a few exceptions to the rule.

We can see, for example, there are a few points clustered in the bottom left-hand corner.

Those points have low amounts of rainfall and low amounts of sunshine, but generally most points seem to be following that pattern.

So let's check what we've learned.

Here we have a table with some more data about the weather for four months, March, June, September, and December.

The data from the table is represented in the scatter graph we can see here.

And each point is labelled A, B, C, or D.

Which point represents the data for March? Pause the video while you write down a letter and press play when you're ready for an answer.

The answer is point A.

We can see that March has 26 millimetres of rainfall and 78 hours of sunshine, and that's what point A shows.

Which month is represented by point C? Pause the video while you write down a month and press play when you're ready for an answer.

The answer is June.

Point C is 101 across and 247 up, which is what the data is for June.

Now you might not be able to see exactly what position that point C is in, but if you compare it to the table, you can see that June is the closest to it.
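Matching an approximate reading to the closest row of the table can be sketched like this. The read-off `(100, 250)` is a hypothetical estimate made by eye, not a value from the lesson, and only the March and June rows are included because the September and December values are not quoted. Straight-line distance is a rough measure here, since the two axes use different units.

```python
import math

# Rows quoted in the lesson, as (rainfall in mm, sunshine in hours).
table = {"March": (26, 78), "June": (101, 247)}

# Hypothetical estimate of where point C sits, read off the graph by eye.
estimated_point = (100, 250)

# Pick the month whose table values are nearest to the estimate.
closest_month = min(table, key=lambda month: math.dist(table[month], estimated_point))

print(closest_month)  # June
```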

So let's talk a little bit more about scatter graphs now.

A scatter graph represents bivariate data visually by plotting each pair of data as a point or a cross on a grid.

And it can be useful when determining the nature of the relationship between the variables.

There might be a relationship, there might not be, but the scatter graph will hopefully help you see that.

Now, sometimes it may appear that one variable is affecting the other variable, and in those cases, we may describe those variables as an independent variable and a dependent variable.

An independent or an explanatory variable is a measure that is predicted to have an effect on the other variable.

And a dependent variable or a response variable, as it's sometimes called, is a measure whose value is predicted to be affected by an explanatory variable.

So the independent variable has an effect on the dependent variable, or you could say the dependent variable is affected by the independent variable.

Now, earlier we saw this scatter graph, which is about the amount of rainfall and the amount of sunshine that is recorded within a month.

For this data, it would seem most reasonable that the amount of rainfall per month affects the duration of visible sunshine.

The sun is always shining, we just sometimes can't see it because it's covered by clouds.

Therefore, the amount of rainfall we get would probably have an effect on the amount of sunshine we can see during that month.

So in that case, it seems most reasonable that the independent variable or the explanatory one would be the rainfall and the dependent or response variable would be the amount of sunshine duration.

In cases where it is clear that one variable is dependent on the other variable, that should affect which way round we plot on our axes.

The independent variable should be plotted on the horizontal axis and the dependent variable should be plotted on the vertical axis, like we can see here with the scatter graph.

We have the independent variable, which is the total rainfall, on the horizontal axis, and we have the dependent variable, which is the total amount of sunshine, on the vertical axis.

Now, making this decision isn't always easy, and Sofia says, "What if I don't know which variable depends on the other?" It's not always clear.

I wonder what this data would look like if I labelled the axes the other way around.

Let's take a look at that now.

Here we have the same data presented on two scatter graphs, and the only difference between them is the way that the axes are labelled.

With the one on the left, the sunshine duration is on the horizontal axis and rainfall on the vertical axis, and on the scatter graph on the right, it's the other way around.

We can see here that generally they look very similar.

There is a difference between them, and that is that switching the axes on the scatter graph causes the data points to be reflected in the line y equals x.

But the same relationship can still be seen.

We can still see that months with low amounts of sunshine tend to have high amounts of rainfall, and months with high amounts of sunshine tend to have low amounts of rainfall.

And we can see that with both of these graphs.
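Switching the axes amounts to swapping the two coordinates of every point, which is what a reflection in the line y = x does. Here is a minimal Python sketch using the three pairs read out earlier in the lesson; the helper name `swap_axes` and the test point `(50, 50)` are made up for this example.

```python
# (rainfall in mm, sunshine in hours) pairs quoted earlier in the lesson.
points = [(16, 87), (131, 200), (38, 219)]

def swap_axes(pairs):
    """Swap the coordinates of every point: (x, y) becomes (y, x)."""
    return [(y, x) for x, y in pairs]

swapped = swap_axes(points)
print(swapped)  # [(87, 16), (200, 131), (219, 38)]

# Swapping twice gets back to the original graph.
assert swap_axes(swapped) == points

# A point on the line y = x (hypothetical example) is its own mirror image.
assert swap_axes([(50, 50)]) == [(50, 50)]
```

No pair is broken up by the swap, so each month still has the same rainfall and the same sunshine, which is why the same relationship shows on both graphs.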

So if you are worried about getting the axes the wrong way around, try not to worry too much, because even if you get it wrong, it shouldn't have too much effect on the way that you interpret the data.

But if you have established one of the variables as being independent and the other one as being dependent, then as a convention, the independent variable should be plotted on the horizontal axis and the dependent variable on the vertical axis.

So in this case, the one on the right would be the more conventional way to plot it.

So let's put that into practice.

Here we have Alex and Aisha who are doing a data investigation about how house prices and people's income vary in different areas of the UK.

They collect data from the Office for National Statistics, that's the ONS, for each region in the UK in 2023.

Now, I wanna stress that Alex and Aisha haven't gone around and collected that data, the data was available already on the website page.

They're using secondary data.

Now, one of the variables they use is the median income per person in each region, and that is measured in pounds.

And the other variable is the median house price for each region, and that is also measured in pounds.

They're trying to decide which variable to plot on each axis.

Alex says, "I think the house prices in a region would affect how much people earn.

So I think the house prices are the independent variable and should go on the horizontal axis." Here we have Alex's scatter graph with the median house prices along the horizontal axis and the median annual income on the vertical axis.

However, Aisha doesn't agree with that.

She says, "I think the average income in a region would affect the house prices in that region.

So I think the income is the independent variable and that should go on the horizontal axis." So here is Aisha's scatter graph with the median annual income on the horizontal axis and the median house prices on the vertical axis.

It's still got the same shape, but the axes are plotted the other way around.

So here we have our two ideas from Alex and Aisha.

Alex thinks that the house prices are the independent variable and it affects how much people earn.

And Aisha thinks the income is the independent variable and it affects how much a house costs in a region.

Who do you agree with out of those two ideas? Pause the video while you think about it and press play when you're ready to continue.

So, I wonder what are your thoughts.

You may have had a particularly strong idea one way or the other, or you may have not had any idea at all and be completely unsure about which one is affecting the other.

And if you are unsure, that's absolutely fine.

The chances are you probably haven't had a lot of experience working with house prices and annual income data.

And if that's the case, it might not be clear to you which one is affecting the other.

Sometimes it may be clear to you which variable is the independent variable and which one is the dependent variable.

However, sometimes you may not be sure which variable might be affecting the other, especially if you do not have sufficient experience or knowledge in the subject matter.

If you are unsure in these situations, try and think about which variable seems the most likely to you to be affecting the other variable and have that as your independent variable and plot that on the horizontal axis.

But don't worry too much about getting it wrong because if you plot the axes the other way around, it shouldn't affect your interpretations of the data too much.

So let's hear an explanation from Aisha now about what she thought.

She says, "I think that people with higher incomes would have more money to spend on houses, so it would seem more likely that the average income in a region would affect the house prices in that region rather than the other way around.

So let's plot income on the horizontal axis as the independent variable." And it looks something a bit like this where we have median annual income on our horizontal axis as the independent variable, and we have the median house prices on the vertical axis as the dependent variable.

And you may have noticed another slight difference with this scatter graph, and that is that these two measures are now in thousands of pounds.

And that means when we are writing the numbers on our scale, we don't have to write 20,000, 25,000 and 30,000, which can be quite long numbers to write.

We can just put 20, 25, and 30 because the axis title says that those numbers represent thousands of pounds, 20,000 pounds and so on.
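The shortening of the scale numbers can be sketched in Python as below. The function name `tick_label` is an assumption made for this example; it only illustrates dividing by 1,000 when the axis title already says "thousands of pounds".

```python
def tick_label(pounds):
    """Label for an axis whose title says the values are in thousands of pounds."""
    return f"{pounds / 1000:g}"

# The three scale values mentioned in the lesson.
ticks_in_pounds = [20_000, 25_000, 30_000]
labels = [tick_label(pounds) for pounds in ticks_in_pounds]

print(labels)  # ['20', '25', '30']
```

The same division has to be applied to the data values themselves before plotting, so that points and scale agree.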

So let's check what we've learned there.

Which two terms describe a variable that is predicted to be affected by another variable? You've got four options to choose from, and two of them are correct.

Pause the video while you make your choices and press play when you're ready for answers.

The answers are a, the dependent variable, and d, the response variable.

Which two terms describe a variable that is predicted to be affecting another variable? Same options to choose from.

Pause the video while you make your choices and press play when you're ready for answers.

The answers are b, the explanatory variable, and c, the independent variable.

Okay, it's over to you now for Task A.

This task contains one question and here it is.

Here we have a table of data that shows travel data from the Office for National Statistics about the average distance that people travel to work and the total number of bus journeys in 2023 for different regions around the UK.

Now, the distance that people travel to work is labelled as the commuting distance on the table, and that is measured in kilometres, and the numbers you can see in the bus journeys there, they're all quite small numbers, they're all in the hundreds, but they represent millions of bus journeys.

So for example, where it says 113, that means 113 million bus journeys.

What you need to do is represent this data on a scatter graph.

Pause the video while you do this and press play when you are ready to see an answer.

So let's see how we got on with that then.

Here is a possible answer for how your scatter graph might look.

Your first decision was to decide which way round those axes should be labelled.

It seems most reasonable to assume that commuting distance would be the independent variable and the number of bus journeys in that area would be the dependent variable.

So we've labelled it that way round.

Another decision you would have had to make while plotting this is how to scale your axes.

Now, you may have used a different scale to what you can see on the screen here, but whichever scale you used, you should still get a similar shape to those data points on your scatter graph to what you can see here.

Okay, well done so far.

Now let's move on to interpreting scatter graphs.

Here we have a scatter graph that shows data from the ONS, the Office for National Statistics, about income and house prices for each region in England and Wales in 2023.

Each point in that scatter graph represents a different place in either England or Wales.

The horizontal axis shows us the annual income and that's in thousands of pounds.

And the vertical axis shows us the median house price in each region, which is also in thousands of pounds.

And you may notice in the bottom right-hand corner of this scatter graph, there is an icon which contains three chain links.

And that's not something you need to draw when doing a scatter graph.

In fact, don't draw that.

What that icon means is that in the slide deck, you can click on the image for the scatter graph and load up an interactive Desmos version of that same scatter graph.

We can zoom in, zoom out, move around, and explore the data in more depth.

So wherever you see that icon, you can click on the image to load up a Desmos version.

So let's now look at this scatter graph together and try to interpret what the positions of each point tell us about the data in that region.

And let's start by looking at the median house prices for each region.

Here we have a cluster of data points that are at the lowest point on this scatter graph.

Can we think what that must tell us about the median house prices in these regions? Well, Lucas says these data points show the regions that have the lowest median house prices, and then we have this point way up here, the highest point on the scatter graph.

What does that tell us about the median house price in that region? Well, that means this data point shows the region with the highest median house prices, and then we also have the median annual income going across the horizontal axis.

So this point here is the furthest to the left.

What does that tell us about the median annual income in that region? Well, Lucas says this data point shows the region with the lowest median income, and then we have this point far to the right.

This point shows the region with the highest median income.
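Reading off the extreme points can be sketched in Python. The region names and numbers below are made up purely for illustration and are not the ONS figures; each entry is (median annual income, median house price), both in thousands of pounds.

```python
# Made-up values for illustration only; these are NOT the ONS figures.
regions = {
    "Region P": (24, 150),
    "Region Q": (28, 310),
    "Region R": (35, 280),
    "Region S": (22, 190),
}

# Vertical position is the second coordinate (house price).
lowest_price = min(regions, key=lambda name: regions[name][1])    # lowest point
highest_price = max(regions, key=lambda name: regions[name][1])   # highest point

# Horizontal position is the first coordinate (income).
lowest_income = min(regions, key=lambda name: regions[name][0])   # furthest left
highest_income = max(regions, key=lambda name: regions[name][0])  # furthest right

print(lowest_price, highest_price, lowest_income, highest_income)
# Region P Region Q Region S Region R
```

Highest and lowest refer to the vertical axis, while furthest left and furthest right refer to the horizontal axis.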

And one thing we can't see from a scatter graph, the way it is now, is where these regions are in the UK, whether they are up north or down south or on the west coast or in Wales or in England or where.

But what we can do with a scatter graph is explore subgroups of data and label them differently so we can see which belongs to which.

Here we have a scatter graph, which is the same data, but just for regions in the south of England and the midlands of England in 2023.

And we can see that we have some points which are plotted as green circles and some plotted as black crosses.

The green circles represent regions that are in the south, and the black crosses represent regions that are in the midlands.

And once again, you'll see that this image is hyperlinked to a Desmos version.

So if you have access to the slide deck, you can click on that and you can take a look at different regions in the UK, whichever ones interest you.

What could we do with this data now we can see it as two separate subgroups? Lucas says we can make comparisons between regions in the south and the midlands using each variable.

Let's do that together.

Lucas compares the median annual income for regions in the south and the midlands of England.

That means we're looking here at the horizontal axis and what we can see here is there's not a lot of difference between where they are positioned in the horizontal axis.

There's a little bit of difference, but not very much.

Lucas says for the majority of the data points, the median incomes are similar in the south and the midlands.

So Lucas then compares the median house prices for the regions in the south and the midlands of England.

In other words, how high up they are on the scatter graph.

And what we can see here is that the majority of the points for the south tend to be higher than the majority of points for the midlands.

Not all of them.

Some of them are the other way around, but generally overall, the points for the south tend to be higher up on a scatter graph than the ones in the midlands.

So that means houses in the south tend to be more expensive than houses in the midlands.
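One way to check a subgroup comparison like this numerically is to compare a typical value for each group. The lesson judges it by eye from where the majority of points sit; the sketch below swaps in the median of each group as a stand-in for that judgement. All of the numbers are made up for illustration and are not the ONS data.

```python
from statistics import median

# Made-up (income, house price) pairs in thousands of pounds, NOT the ONS data.
south = [(27, 330), (29, 360), (31, 410)]
midlands = [(26, 210), (28, 240), (30, 260)]

def medians(points):
    """Return (median income, median house price) for one subgroup."""
    incomes = [income for income, _ in points]
    prices = [price for _, price in points]
    return median(incomes), median(prices)

south_income, south_price = medians(south)
midlands_income, midlands_price = medians(midlands)

print(south_income, midlands_income)  # 29 28   (similar horizontal positions)
print(south_price, midlands_price)    # 360 240 (south is higher up)
```

With these invented numbers the incomes are close together while the house prices are clearly apart, which mirrors the two comparisons Lucas makes.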

So let's check what we've learned there.

Here we have a scatter graph, which is all about jobs.

Each point on this scatter graph represents a different occupation.

The horizontal axis shows us the mean number of hours worked per week in that occupation.

And the vertical axis shows us the mean hourly pay for the occupation in pounds.

And the data is taken from the Office for National Statistics for 2023.

You'll see that four of the points are labelled A, B, C, and D.

And my question to you is which point represents the occupation that tends to work the fewest number of hours per week? Pause the video while you write down a letter and press play when you're ready for an answer.

The answer is point A.

We can see that because it is the furthest to the left on this scatter graph. Which point represents the occupation that tends to work the highest number of hours per week? Pause the video while you make a choice and press play when you're ready for an answer.

The answer is point C, and we can see that because it's the furthest to the right.

Which point represents the occupation that has the highest mean hourly pay? Pause the video while you write a letter and press play when you're ready for an answer.

The answer is point B, and we can see that because it is the highest point on this scatter graph.

Okay, it's over to you now for Task B. This task contains three questions, and here is question one.

We have a scatter graph that shows data taken from the ONS about the area size and the population of regions in England and Wales from 2021.

And you have three questions to answer about this scatter graph.

If you want to, you can click on the image for this scatter graph to load up a Desmos version and that allows you to zoom in and zoom out on different parts of it, but you don't necessarily need to do that.

Either way, pause the video while you do it and press play when you're ready for question two.

And here is question two.

This time we have a scatter graph which shows weather data taken from the Met Office about two regions in the UK, Lerwick and Eastbourne.

Each point represents a month from 1941 to 2022.

And you can see those points are labelled differently depending on whether it's for Lerwick or Eastbourne.

You have two questions to answer about this scatter graph.

Pause the video while you do that and press play when you're ready for question three.

Question three, this time we have a scatter graph that shows data from the ONS about house prices and annual income.

And each point represents a region in England and Wales.

And the points we have here are for regions in the south of England and the regions in the north of England.

And what you need to do is write two comparisons between the regions in the north and the south of England based on this data.

Write those comparisons as sentences describing which variables you are talking about each time.

Pause the video while you do this and press play when you're ready to go through some answers.

Let's see how we got on with these questions.

With question one, we have our scatter graph about the area and population of regions around the UK.

And for part a, we had to circle the data point that shows the region with the highest population, that is this data point here.

And for part b, we had to circle a data point that shows a region with the greatest area, that'll be this point here.

And for part c, draw a box around some points that show regions with a large area and a low population.

So we're looking to go further to the right on the horizontal axis, but low on the vertical axis.

That'll be these points here.

You might have included some other points or maybe have not included all these points, but generally we're looking at this area of the scatter graph.

And then question two, we have a scatter graph about the weather in Lerwick and Eastbourne.

Part a asked which region tends to get less sunshine, and to justify your answer.

Well, the answer is Lerwick.

The justification is that the majority of its points are further to the left on the graph compared to the points for Eastbourne.

And which region tends to be warmer? Justify your answer.

Well, that'll be Eastbourne, and we can see that from the data because the majority of the points are higher up the graph compared to Lerwick.

Not all of them, but the majority are.

And then question three, we have our scatter graph about house prices and annual income for regions in the south and north of England.

And we have to write two comparisons.

We'll compare each of these variables one at a time.

So we can say that income tends to be higher for regions in the south, and we can say that house prices tend to be more expensive for regions in the south as well.

Or you might have phrased those the other way around.

You might have said lower in the north and less expensive in the north.

Fantastic work today.

Let's now summarise what we've learned in this lesson.

Bivariate data compares the values of two variables by pairing each value from one variable with a value from the other variable.

And bivariate data can be represented together in the same graph by plotting a scatter graph, and by plotting a scatter graph, we can explore relationships or any potential relationships between those two variables.

When we plot a scatter graph, we need to think about which way round we label the axes.

The independent or the explanatory variable is placed on the horizontal axis, and the dependent or response variable is placed on the vertical axis in cases where it's clear which one is which, but if it's not clear, don't worry about it too much.

We have explored what happens if you switch those around.

Placing the independent variable on the other axis reflects the data points in the line y equals x, but the overall relationship is still displayed.

Great job today.

Thank you very much.