Hello, my name is Dr. Rowlandson and I'm thrilled that you're joining me in today's lesson.

Let's get started.

Welcome to today's lesson from the unit on graphical representations of data.

This lesson is called Interpreting Scatter Graphs.

And by the end of today's lesson, we'll be able to interpret scatter graphs using the data points that are shown.

In this lesson, we'll be introducing two new keywords.

The first word is interpolation, which is the process of estimating unknown values that lie inside the range of existing data.

And the other word is extrapolation, and this is the process of estimating unknown values that are outside the range of existing data.

We'll be exploring these words in more detail later in the lesson.

This lesson contains two learning cycles, with the first learning cycle looking generally at the shapes of scatter graphs to get a sense of any trends that they might show in the data, and the second learning cycle looking at using scatter graphs to make predictions about the data.

But to begin with, let's start by observing trends in scatter graphs.

Here we have a scatter graph that shows data taken from the Met Office about the weather in Heathrow for each month from 1941 all the way to 2022.

Across the X axis, it shows us the total sunshine duration, which is measured in hours.

That is the total number of hours that sunshine was visible for that month.

Up the vertical axis, we have the mean of daily maximum temperatures in degrees Celsius.

That was measured by recording the maximum temperature for each day of a month and finding the mean of those temperatures.

And each dot on the scatter graph represents a month in Heathrow.

We can see that the points in this scatter graph aren't just randomly all over the graph.

They seem to have a little bit of a shape to 'em.

So based on that, what observations could we make from this data? And does there appear to be a connection between the two variables based on this scatter graph? Pause the video, have a think, and press play when you're ready to continue.

Let's explore these questions in a bit more depth now.

We can interpret what the points in different sections tell us about the variables for those months.

So for example, in this section in the bottom left of the scatter graph, we can see the points in that section aren't very far across on the horizontal axis and they're not very far up on the vertical axis.

That means they must have had low amounts of sunshine, which is on the horizontal axis, and also low maximum temperatures, which is on the vertical axis.

The points in this top left section here aren't very far across on the horizontal axis, but they are higher up on the vertical axis.

That means they must have had low amounts of sunshine but high temperatures.

The points in this top right hand section here, those points are much further across on the horizontal axis and they're also high up on the vertical axis as well.

So that means they have high amounts of sunshine and also high temperatures.

The points in this bottom right-hand section here are far across on the horizontal axis, but they're not very high on the vertical axis.

That means the months represented by these points have high amounts of sunshine and low temperatures.

It seems like most of the data points are in these two zones here, the zone in the bottom left where there are low amounts of sunshine and low temperatures, and the zone in the top right where there are high amounts of sunshine and high temperatures.

For most of the months, they tend to be either low sunshine and low temperatures or high sunshine and high temperatures.
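The zone-by-zone reading above can be sketched in a few lines of Python. This is only an illustration: the data points are invented rather than taken from the Met Office, and the cut-off values that separate "low" from "high" (150 hours and 15 degrees Celsius) are assumptions chosen for the example.

```python
from collections import Counter

# Hypothetical (sunshine hours, mean of daily maximum temperature in °C) pairs.
# These are invented for illustration and are NOT the Met Office data.
months = [
    (40, 7.0), (55, 8.5), (60, 16.0), (210, 22.5),
    (250, 24.0), (230, 9.0), (70, 9.5), (220, 21.0),
]

# Assumed cut-offs between "low" and "high" on each axis.
SUNSHINE_SPLIT = 150
TEMP_SPLIT = 15.0


def zone(sunshine, temperature):
    """Describe which of the four zones of the scatter graph a point falls in."""
    horizontal = "high sunshine" if sunshine >= SUNSHINE_SPLIT else "low sunshine"
    vertical = "high temperature" if temperature >= TEMP_SPLIT else "low temperature"
    return f"{horizontal}, {vertical}"


# Count how many months land in each zone.
zone_counts = Counter(zone(s, t) for s, t in months)

for name, count in zone_counts.most_common():
    print(f"{name}: {count}")
```

With this made-up data, most months fall in the "low sunshine, low temperature" and "high sunshine, high temperature" zones, which mirrors the pattern described for the Heathrow graph.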

There also appears to be an upward trend.

The points get higher and higher as they go further and further across.

The upward trend suggests that months with more sunshine tended to be warmer than months with fewer hours of sunshine.

Here is another scatter graph now.

This scatter graph shows some other data from the Met Office about the weather in Heathrow for each month from 1941 to 2022.

This time the horizontal axis shows us the total rainfall in the month, and that's measured in millimetres, and the vertical axis shows us the total sunshine duration, which again is measured in hours.

The points in this scatter graph make a slightly different shape to the previous one.

So what observations could we make from this data? And does there appear to be a connection between the two variables this time? Pause the video, have a think, and press play when you're ready to continue.

This data appears to have a slight downward trend that suggests that months with more rainfall tend to have fewer hours of sunshine.

We can see that because the points tend to be sloping from the top left of the graph down to the bottom right of the graph.

However, it's not quite as pronounced as a shape as in the previous scatter graph.

So while this implies there may be some connection between total rainfall and total sunshine duration, the trend is weaker than the previous example and the connection between those variables is less clear in that case.

In this scatter graph, we have the horizontal axis showing the total rainfall and the vertical axis showing the mean of daily maximum temperatures.

And again, the shape of the points in this graph appears to be a bit different to the previous two.

What observations could we make from this data? And does there appear to be a connection between the two variables? Pause the video, have a think, and press play when you're ready to continue.

In this case, there does not appear to be a trend in the data.

It doesn't seem to be sloping upwards or sloping downwards or taking any kind of really clear shape.

Therefore, there is not a clear connection between the two variables.

We can't necessarily deduce that there is a connection between the total rainfall for a month and the mean of the maximum temperatures for that month.
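The three kinds of shape seen so far (upward, downward, and no clear trend) can also be described with a number. The sketch below uses the correlation coefficient, which is not part of this lesson and is swapped in here purely as one possible way of putting a number on the shape. The data values are invented, and the threshold of 0.3 for calling something a trend is an assumption.

```python
from math import sqrt


def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def describe_trend(xs, ys, threshold=0.3):
    """Label the shape of the points; the threshold is an assumed cut-off."""
    r = correlation(xs, ys)
    if r >= threshold:
        return "upward trend"
    if r <= -threshold:
        return "downward trend"
    return "no clear trend"


# Hypothetical data, invented for illustration only.
sunshine_hours = [40, 80, 120, 160, 200, 240]
max_temps = [7, 9, 13, 16, 20, 22]

rainfall_mm = [20, 40, 60, 80, 100, 120]
sunshine_for_rain = [200, 150, 170, 90, 120, 60]
temps_for_rain = [15, 8, 20, 9, 18, 12]

print(describe_trend(sunshine_hours, max_temps))       # sunshine against temperature
print(describe_trend(rainfall_mm, sunshine_for_rain))  # rainfall against sunshine
print(describe_trend(rainfall_mm, temps_for_rain))     # rainfall against temperature
```

A value close to 1 or -1 corresponds to a pronounced shape, and a value close to 0 corresponds to points that are spread all over the graph.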

Okay, let's check what we've learned so far.

Look at this scatter graph, where the horizontal axis shows the mean of daily minimum temperatures and the vertical axis shows the number of days with air frost, and we can see the points for each month.

Which of the statements below best describes the trend in this scatter graph? Here are the statements.

Statement A, warmer months tend to have more days with air frost.

Statement B, warmer months tend to have fewer days with air frost.

Statement C, there does not appear to be a connection between the variables.

Pause the video, make a choice between those three statements, and press play when you're ready for an answer.

The answer is B, warmer months tend to have fewer days with air frost.

We can see that because as we go across on the horizontal axis, the points seem to be getting lower and lower on the vertical axis.

In this scatter graph, we can see the horizontal axis shows us total rainfall in millimetres and the vertical axis, again, shows us the number of days with air frost for each month.

Which statement best describes the trend in this scatter graph? Here are the options.

Statement A, months with more rainfall tend to have more days with air frost.

Statement B, months with more rainfall tend to have fewer days with air frost.

Statement C, there does not appear to be a connection between the variables.

Pause the video, make a choice, and press play when you're ready for an answer.

The answer is C.

There does not appear to be a connection between the variables in this scatter graph.

That's because the points don't really have much of a shape.

They're not trending upwards.

They're not trending downwards.

They just tend to be all over the place really.

It does not seem that as the total rainfall increases, the number of days with air frost either increases or decreases very much at all.

So there does not appear to be a connection between the variables.

In the previous examples, the scatter graphs have represented data for a single location, Heathrow.

However, we can also plot multiple locations on the same scatter graph.

And by using different colours or different shapes for our points, we can see which points belong to one town and which points belong to another town.

By doing this, trends in the data can sometimes distinguish one group from another.

So for example, in this scatter graph, the points for Heathrow, which is in the south of England, are labelled with black circles, and the points for Lerwick, which is on an island off the north of Scotland, are labelled with green crosses.

Based on this, what observations could we make from this data now? Do there appear to be any differences between the data for the two towns? Pause the video, have a think, and press play when you're ready to continue.

Heathrow tends to have warmer maximum temperatures than Lerwick.

We can see that because the points for Heathrow, shown as black circles, get much higher up the vertical axis than the points for Lerwick.

We can also see that Lerwick has had lots of months with fewer hours of sunshine than Heathrow.

We can see that because the points represented by green crosses for Lerwick in this case tend to be bunched up quite a lot in that bottom left-hand corner.

There are lots of points that are not very far across on the horizontal axis.

Therefore, those months had very few hours of sunshine.

Whereas we can see that there are lots more points for Heathrow that are further across the horizontal axis.

Based on those observations, we can solve problems like this one.

A new data point records 220 hours of sunshine and a mean maximum daily temperature of 11 degrees Celsius.

However, imagine the data point didn't have the label next to it for which town it's from.

Based on those numbers, which town is it most likely to be from? Is it most likely to be from Heathrow or from Lerwick? Pause the video, have a think, and press play when you're ready to discuss this together.

If we look at where this data point would be on a scatter graph, it would be 220 across on the horizontal axis and 11 up on the vertical axis, so it would be in this zone here.

And we can see that most of the data that is already plotted around that area tends to be for Lerwick.

In most cases for Heathrow, when there are 220 hours of sunshine, the temperatures tend to be much higher.

Most of those points are above 15 degrees Celsius.

Whereas for Lerwick, we can see that at around 220 hours, most of the points seem to be around the 11 degrees Celsius mark.

So it is likely, but not certain, that our new data point is for Lerwick.
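The reasoning used here, looking at which town's points surround the new data point, can be sketched as a simple count of nearby points. The data below is invented for illustration, and the size of the "nearby" zone (20 hours either side and 2 degrees Celsius either side) is an assumption.

```python
# Hypothetical (sunshine hours, mean of daily maximum temperature in °C) pairs.
# Invented for illustration; these are NOT the real Heathrow or Lerwick records.
heathrow = [(200, 19.0), (215, 21.5), (225, 17.0), (230, 23.0), (150, 18.0), (210, 12.0)]
lerwick = [(205, 11.5), (215, 10.5), (225, 12.0), (230, 11.0), (60, 6.0), (10, 5.0)]


def count_nearby(points, sunshine, temperature, hours_window=20, temp_window=2.0):
    """Count the points that sit in a small zone around the new data point."""
    return sum(
        1
        for s, t in points
        if abs(s - sunshine) <= hours_window and abs(t - temperature) <= temp_window
    )


def most_likely_town(sunshine, temperature):
    """Pick the town with more existing points near the new data point."""
    near_heathrow = count_nearby(heathrow, sunshine, temperature)
    near_lerwick = count_nearby(lerwick, sunshine, temperature)
    if near_heathrow > near_lerwick:
        return "Heathrow"
    if near_lerwick > near_heathrow:
        return "Lerwick"
    return "unclear"


print(most_likely_town(220, 11.0))
print(most_likely_town(150, 18.0))
```

The result is only the more likely town, not a certain one, because a few points from the other town can still sit in the same zone.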

Okay, let's check how well we've learned that.

On the scatter graph, we have the horizontal axis showing total sunshine duration and the vertical axis showing the mean of daily maximum temperature.

A new data point records 150 hours of sunshine and a mean maximum daily temperature of 18 degrees.

Which town is it most likely to be from? Is it Heathrow or Lerwick? Pause the video, write down the name of one of those towns, and press play when you're ready for an answer.

If we look at the scatter graph for where 150 hours of sunshine and 18 degrees Celsius would be, we can see in that area, there are lots of black circles representing Heathrow, but there are no green crosses representing Lerwick.

Therefore, it is likely that our data point is for Heathrow.

And here's another one.

A new data point records only 10 hours of sunshine in the month.

Which town is it most likely to be from? Heathrow or Lerwick? Pause the video, write down the name of one of those two towns, which one you think it is, and press play when you're ready for an answer.

Let's look at 10 hours of sunshine on the horizontal axis.

All the points that are directly above 10 hours on the horizontal axis represent Lerwick.

Therefore, a data point with 10 hours of sunshine is more likely to be from Lerwick than from Heathrow.

And another one.

A new data point records a mean maximum daily temperature of 22 degrees Celsius.

Which town is this most likely to be from? Heathrow or Lerwick? Pause the video, write down the name of the town, and press play when you're ready for an answer.

Let's look at where 22 degrees is on the vertical axis.

If we go across from there, we can see that all the data points previously have been from Heathrow.

Therefore, a new data point of 22 degrees Celsius is more likely to be from Heathrow than Lerwick.

Okay, over to you now for task A.

This task contains two questions.

Here's the first question.

The scatter graph shows data from the Met Office about the weather in Stornoway for each month from 1941 to 2022.

And there are four statements written, A, B, C, and D.

Read each statement, look at the scatter graph, and then decide whether the statement is likely to be true or false based on the data that you can see.

And for each one, write true or false.

Pause the video, have a go, and press play when you're ready for question two.

Here is question two.

Here we have a scatter graph for two towns, Heathrow and Lerwick.

The horizontal axis shows the mean of the daily maximum temperature and the vertical axis shows the total rainfall.

And this time, we have four questions about the data.

For parts A, B, C, and D, look at the scatter graph and write down which town you think is the correct answer.

Pause the video, have a go, and press play when you're ready for answers.

Well done, let's go through answers to question one.

We need to write true or false for each statement based on the scatter graph.

Statement A, months with more rainfall tended to have less sunshine.

This appears to be true.

We can see that because there seems to be a downward trend, where, as the amount of rainfall increases on the horizontal axis, the height of the points on the vertical axis tends to decrease.

Statement B, months with more sunshine tend to have less rain.

That is also true.

We can see that points that are higher up the vertical axis tend to be closer to zero on the horizontal axis.

So points with lots of sunshine tend to have less rainfall.

Statement C, some months recorded no sunshine.

That was false.

We can see that if we look along the bottom of the graph.

We can see that there are no points that go as low as zero on the vertical axis.

And statement D, all the months that had more than 150 millimetres of rain also had less than 200 hours of sunshine.

That is true.

If we look at 150 millimetres on the horizontal axis, we can look at all the points to the right of it and see that they're all below 200 on the vertical axis.

Let's now go through the answers to question two.

In the graph, which town tends to get more rainfall? That is Lerwick.

We can see that because lots of the points for Lerwick tend to be higher up the vertical axis than the points for Heathrow.

Which town tends to get warmer? That is Heathrow.

We can see that because lots of the points for Heathrow tend to be further across on the horizontal axis than the points for Lerwick.

A new data point records five degrees Celsius and a hundred millimetres of rain, which town is it more likely to be for? It's Lerwick.

You can see that by looking at five degrees Celsius and seeing where the points are around a hundred millimetres and whether they are from Heathrow or Lerwick, and there's more of 'em from Lerwick.

And part D, a new data point records 20 degrees Celsius and 100 millimetres of rain.

Which town is it likely to be for? That is Heathrow.

We can mostly see that because there are no data points for Lerwick that are 20 degrees Celsius.

Great work so far.

Let's now go onto the second learning cycle, which is using a process called interpolation to make predictions based on the data.

Here we have a scatter graph where the points appear to have an upward trend.

And that means as the value of the horizontal axis increases for a point, it tends to be the case that the value of the vertical axis also increases as well.

In these cases, trends in bivariate data can be used to make predictions about one variable based on the other variable.

And when predictions such as these are between the values that are inside the range of existing data, this process is called interpolation.

The word interpolation starts with the prefix inter, which tends to mean either inside or between.

For example, internal tends to mean inside something.

So where we can see that all the points here lie between zero and 310 on the horizontal axis and zero and 30 on the vertical axis, interpolation is estimating values inside that data range.

Here's an example.

The scatter graph shows data from the Met Office about the weather in Heathrow for each month from 1941 to 2022, with the horizontal axis showing the total sunshine duration in hours and the vertical axis showing the mean of the daily maximum temperature in degrees Celsius.

What range of maximum daily temperatures might we predict for a month with 200 hours of sunshine? Let's solve that together now.

If we look across to 200 on the horizontal axis and look at the points that are directly above 200, we can see that most of the points tend to be within a short range.

They tend to be between 15 degrees Celsius and 25 degrees Celsius.

Not all of them, but a lot of them tend to be there.

Therefore, if we had another month with 200 hours of sunshine, we could probably expect the mean of the maximum daily temperature to be somewhere between 15 degrees Celsius and 25 degrees Celsius.

Let's do another one.

The mean maximum daily temperature for a month was 10 degrees Celsius.

What range of sunshine duration might we predict for this month? So this time, let's look at 10 degrees Celsius on the vertical axis and look at the points that tend to be in line with 10 degrees Celsius.

We can see that there are a lot of points that are bunched together, and those points are mostly between 35 hours and 125 hours.

Not all the points, but a lot of the points.

Therefore, we could predict that a month that has 10 degrees Celsius as its maximum daily temperature mean would have somewhere between 35 and 125 hours of sunshine.
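The two readings above follow the same steps: pick the known value on one axis, gather the existing points that are close to it, and read off the range those points cover on the other axis. Here is a minimal sketch of those steps. The data is invented, the function name `predicted_range` is hypothetical, and the window sizes that decide which points count as close are assumptions. It also reports the full range of the nearby points, whereas in the lesson we judge by eye where most of the points are.

```python
# Hypothetical (sunshine hours, mean of daily maximum temperature in °C) pairs.
# Invented for illustration; these are NOT the Met Office data.
months = [
    (190, 16.0), (195, 21.0), (200, 18.5), (205, 24.0), (210, 15.5),
    (60, 9.5), (90, 10.5), (120, 10.0), (40, 6.0), (260, 23.0),
]


def predicted_range(points, known_index, known_value, window):
    """Range of the other variable among points close to the known value.

    known_index is 0 when the sunshine is known and 1 when the temperature is known.
    Returns (lowest, highest), or None if no existing point is close enough.
    """
    other_index = 1 - known_index
    nearby = [
        point[other_index]
        for point in points
        if abs(point[known_index] - known_value) <= window
    ]
    if not nearby:
        return None
    return min(nearby), max(nearby)


# Known sunshine of 200 hours: what temperatures did similar months have?
temperature_range = predicted_range(months, 0, 200, window=10)

# Known temperature of 10 °C: what sunshine did similar months have?
sunshine_range = predicted_range(months, 1, 10.0, window=0.5)

print(temperature_range)
print(sunshine_range)
```

Returning `None` when there are no nearby points is a deliberate choice in this sketch: with no existing data around the known value, there is nothing to base an interpolated range on.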

Okay, let's check what we've learned there.

Here we have a scatter graph with total sunshine duration along the horizontal axis and the mean of maximum daily temperatures going up the vertical axis.

What range of maximum daily temperatures might we predict for a month with 50 hours of sunshine? Your options are A, somewhere between five degrees Celsius and 10 degrees Celsius, B, between eight degrees Celsius and 15 degrees Celsius, and C, between 12 degrees Celsius and 22 degrees Celsius.

Pause the video, choose a range, and press play when you're ready for the answer.

The answer is A.

We'd expect it to be somewhere between five and 10 degrees Celsius, which we can see by going to 50 on the horizontal axis and looking up and seeing how most of the points tend to be between five and 10.

Not all of them, but most of them.

What range of maximum daily temperature might we predict for a month with 100 hours of sunshine? Same options again.

Pause the video.

Press play when you're ready for an answer.

This time the answer is B.

We'd expect between eight degrees Celsius and 15 degrees Celsius, which we can see if we go across to a hundred, where most of the points seem to be between eight and 15.

What range of sunshine duration might we predict this time for a month with a mean maximum daily temperature of 20 degrees? Here are your options.

A, between 30 to 120 hours.

B, 120 to 200 hours.

Or C, 190 to 270 hours.

Pause the video, choose a range, and press play when you're ready for an answer.

The answer is B.

We'd expect between 120 and 200 hours of sunshine from a month with a mean maximum daily temperature of 20 degrees.

Let's look at this scatter graph again, but zoom out a little bit.

Laura is using the data to predict how warm it could get if a month had a larger duration of sunshine.

She thinks if next June has 450 hours of sunshine, then we would expect the mean maximum temperature to be between 40 degrees and 50 degrees.

Can you anticipate any problems that there could be with Laura's prediction? Pause the video, have a think, and press play when you're ready to continue.

The main problem with this prediction is that the data so far has not observed any months with more than 310 hours of sunshine, so it's very difficult to predict how temperatures would respond beyond this point.

We don't know whether the temperatures would continue to increase in the same way they have done so far or whether the temperatures might increase still but perhaps at a slower rate than they've done so far, or whether perhaps the temperatures might reach a bit of a maximum level.

When we make predictions that are outside the range of existing data, this process is called extrapolation.

Extrapolation starts with the prefix extra, and that extra prefix usually refers to outside.

So in this case, we can see that the majority of the data values are below 300 on the horizontal axis.

So to make predictions that are above 300 is outside our range of pre-existing data.

And that's what extrapolation is, estimating values outside the range of data we already have.

The problem with extrapolation is that it's less reliable than interpolation because it involves speculating about values that are beyond what has previously been observed.

Therefore, we can't rely on predictions made through extrapolation as well as we can rely on predictions made through interpolation.

So we have a contrast between two processes here.

Interpolation is estimating values that are inside the range of data that we already have, and extrapolation is estimating values outside the range of data that we already have.
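The contrast between the two processes comes down to one check: is the value we are predicting from inside or outside the range of data we already have? A minimal sketch of that check is below. The list of observed sunshine values is invented, apart from the largest value of 310 hours, which matches the largest sunshine duration mentioned for the Heathrow data. The function name `kind_of_estimate` is hypothetical.

```python
def kind_of_estimate(observed_values, new_value):
    """Say whether predicting from new_value is interpolation or extrapolation."""
    lowest = min(observed_values)
    highest = max(observed_values)
    if lowest <= new_value <= highest:
        return "interpolation"
    return "extrapolation"


# Hypothetical observed sunshine durations in hours, invented for illustration.
observed_sunshine = [20, 45, 90, 150, 200, 260, 310]

print(kind_of_estimate(observed_sunshine, 200))  # inside the observed range
print(kind_of_estimate(observed_sunshine, 450))  # Laura's value, outside the range
```

Laura's 450 hours of sunshine falls outside the observed range, so any temperature predicted from it is an extrapolation and is less reliable.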

Let's check what we've learned there.

What word describes the process of estimating unknown values that are inside the range of existing data? Pause the video, write down the word, and press play when you're ready for the answer.

The answer is, interpolation is a process of estimating unknown values that are inside the range of existing data.

So what word describes the process of estimating unknown values that are outside the range of existing data? Pause the video, write down the word, and press play when you're ready for the answer.

The answer is extrapolation.

That is a process of estimating values outside the range of existing data.

And true or false.

Extrapolation is as reliable as interpolation for predicting unknown values in bivariate data.

Choose whether you think that statement is true or false, and then justify your answer with one of the two options below.

A, the trend in the data will continue in the same way for values beyond what has previously been observed.

And B, it is difficult to predict whether a trend will continue in the same way for values beyond what has previously been observed.

Pause the video, choose true or false and a justification, and press play when you're ready for an answer.

The answer is false because it is difficult to predict whether a trend will continue in the same way beyond what's been observed.

It's over to you now for task B.

In this task, we have one question.

We have a scatter graph that shows the mean daily minimum temperature across the horizontal axis and the mean daily maximum temperature up the vertical axis.

Given the following minimum daily temperatures, estimate the range of maximum daily temperatures, and you've got three parts to that question.

Pause the video, have a go, and press play when you're ready for an answer.

Well done, let's go through this together now.

In part A, when the minimum temperature is 10 degrees Celsius, we'd expect the maximum temperature to be between 16 and 20 degrees Celsius.

Part B, when the minimum temperature is five degrees Celsius, we'd expect the maximum to be somewhere between 10 and 15 degrees Celsius.

And when the minimum temperature is zero degrees Celsius, we'd expect the maximum to be between five and eight degrees Celsius.

Wonderful work today.

Let's summarise what we learned in this lesson.

General trends may be observed from scatter graphs by looking at whether or not the points form some kind of clear shape and if that shape is going upwards or going downwards for example.

When there is a trend, scatter graphs can be used to estimate a second value based on a known first value.

And when we do that within the data range we already have, it's called interpolation.

However, if we do that outside the data range we already have, it's called extrapolation.

And extrapolation may not be valid as it's outside the data set.

Well done.
