Hello, my name is Dr. Rolson and I am happy to be helping you with your learning during today's lesson.

Let's get started.

Welcome to today's lesson from the unit of graphical representations of data with scatter graphs and time series.

This lesson is called "Problem solving with scatter graphs and time series." And by the end of today's lesson, we'll be able to use our understanding of various scatter graphs and time series in order to solve problems. Here are two previous keywords that are going to be used frequently during today's lesson.

So you may want to pause the video if you want to remind yourselves what these words mean before pressing play to continue.

This lesson contains two learning cycles.

In the first learning cycle, we're going to focus on identifying errors in graphs.

We'll take the role of a reviewer who looks at previous work and inspects the data to make sure everything is at least correct within the data itself.

And then in the second part of today's lesson, we'll take the role of a fact checker, someone who looks at claims that are made based on data and critiques them.

But let's start off with identifying errors in graphs.

Here we have Sam, and Sam has plotted a time series graph using data from the Office of National Statistics, the ONS.

The data is about the number of people in England that use the internet on a daily basis.

So we can see the horizontal axis has the years and the vertical axis has the number of people who use the internet daily, and that number is in the millions.

Sam looks at the graph and makes a conclusion from it.

Sam says, the number of people using the internet on a daily basis has decreased most years.

Is Sam correct here? Pause the video while you think about it and press play when you're ready to continue.

In this case, Sam is not correct.

Let's take a look at why.

Sam has scaled the horizontal axis with the years in descending order rather than ascending order. It goes from 2020 on the left to 2008 on the right.

Now this can make it look like the number of daily internet users has declined from one year to the next because as we read the graph from left to right, the data points are going further and further down the graph in most cases.

So it's understandable why Sam has made this incorrect conclusion because the graph has been presented in such a way that it makes it look like something is declining.

Sam says, let's now see what it looks like when I list the years in ascending order instead.

Sam redraws a time series graph with the years in ascending order along the horizontal axis.

So now it goes from 2007 on the left side of the axis to 2020 on the right side of the axis.

And what we can see more clearly here now is that as we read the graph from left to right, as we normally do with graphs, the data points tend to be getting higher and higher in most cases.

So Sam says, it is now clear to see that the number of people using the internet on a daily basis has increased most years.

Here we have Jacob. Jacob plots a time series graph using data from the ONS, the Office of National Statistics.

And the data is about the cost of renting a house in Yorkshire.

Along the horizontal axis we have the years, and along the vertical axis we have the average monthly rent, which is in pounds.

And Jacob looks at his graph and makes a conclusion.

He says, renting a house costs less than half price in 2019.

Maybe there was a sale on or something.

Is Jacob necessarily correct here? Pause the video while you think about this and press play when you're ready to continue.

It seems likely that the reason why Jacob's attention has been drawn to the data of 2019 is that it stands out from the rest of the data, it seems to be an outlier.

When a graph presents an outlier, it can be helpful to check the original data again, so let's do that.

When we look at 2019 on the graph, it says the average monthly rent is 260 pounds a month, but in the original data table it says it was 620 pounds per month.

So it looks like in this case, the point was plotted incorrectly.

This is what the point should have looked like if it was plotted correctly.

And as you can see, it's fitting more now with the rest of the data.

So if you spot an outlier in a graph, it may be worth checking the original data because it could be an error like it was here.
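The check described above, comparing each plotted point with the original data table, can be sketched in a few lines of Python. This is only an illustration and not part of the lesson: the function name `find_plotting_errors` is hypothetical, and only the 2019 figures (620 in the table, 260 on the graph) come from Jacob's example. The other years are made-up placeholders.

```python
def find_plotting_errors(table, plotted, tolerance=0.0):
    """Return the years where the plotted value differs from the table value."""
    return sorted(
        year
        for year, value in table.items()
        if abs(plotted[year] - value) > tolerance
    )


# Average monthly rent in pounds.
# Only the 2019 values are from the lesson; the rest are made-up placeholders.
table = {2017: 600, 2018: 610, 2019: 620, 2020: 630}
plotted = {2017: 600, 2018: 610, 2019: 260, 2020: 630}

errors = find_plotting_errors(table, plotted)
print(errors)  # [2019]
```

Checking every year, rather than only the points that stand out, also catches errors that happen to fit the trend of the rest of the data.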

Here we have Aisha.

Aisha plots a time series graph using data from the Office of National Statistics, the ONS, about the number of bus journeys in the UK.

So we have the horizontal axis showing us the years, and we have the vertical axis showing us the number of bus journeys taken, and that is in the billions.

Aisha looks at her graph and draws a conclusion.

She says, I must have made an error when plotting the data for 2020.

Is Aisha necessarily correct? Pause the video while you think about this and press play when you are ready to continue.

Well, we can see why Aisha might think this because it looks like there is an outlier in the data for 2020.

And when a graph presents an outlier, it can be helpful to check the original data again.

Let's do that again here.

So in the year 2020, it looks like the number of bus journeys on the graph is about 1.7 billion.

And if we look at the table, it also says 1.7 billion.

So in this case, this point was not an error, it was plotted correctly.

And sometimes an outlier can just be an unusual result.

In this case, the unusual result was probably due to the fact that in 2020, the UK locked down in response to a pandemic, which meant that travel restrictions were in place and people weren't necessarily using the buses as much as normal.

And then we can see it increases again the year after.

So as we've seen with the last two examples, when a graph presents an outlier, it can often be helpful to check the original data again. On the one hand, it could be an error, and that error needs to be corrected. On the other hand, the outlier could be plotted correctly but just be an unusual result, and it may be something interesting to pursue further in your data investigation.

So let's check what we've learned.

True or false, when a graph contains an outlier, it means an error has been made.

Is that true or is it false? Pause the video while you choose, and then press play when you're ready to see some justifications.

The answer is false.

So, what's the reason why? Here are two justifications you can choose from.

Pause the video while you make a choice and press play when you're ready for an answer.

It's false because an outlier could be a point that is plotted correctly but is just an unusual result.

So, true or false? When a graph does not contain any outliers, it means that no errors have been made.

Pause the video while you choose either true or false, and then press play when you're ready for the second part of this question.

The answer is false.

So, what's the reason why? Here are two justifications to choose from.

Pause the video while you choose and press play when you're ready for an answer.

It's false because a point could be plotted incorrectly while appearing to fit with the trend of the rest of the data.

Okay, it's over to you now for task A.

This task contains two questions, and in these questions you are gonna be taking the role of a reviewer or a proofreader, someone who looks at a report which includes data and checks to make sure that all the data and graphs are correct within that report.

Here's question one.

We have a time series graph that shows data from the ONS, the Office of National Statistics.

The data is about the number of people who took trips abroad each year from the UK.

And you've got a table at the bottom which presents the original data as well.

Now, two points have been plotted incorrectly on this graph and what you need to do is locate those two points and write down the years for them.

Pause the video while you do this and press play when you are ready for question two.

And here is question two.

You have another time series graph, and Andeep is trying to draw some conclusions based solely from this time series graph.

Could you please write down at least two problems with this graph that could cause Andeep some difficulties? Write this down as sentences, pause the video while you do this and press play when you're ready to work through some answers.

Okay, let's see how we got on with that.

So in question one, we had to locate the points that are plotted incorrectly on this time series graph.

A good place to start with this could be by looking for any outliers first.

Now, these could be errors.

They might not be errors, but they could be.

So let's check those first.

We can see that there are outliers in the years 2010 and 2020.

If we check those outliers against the original data, we can see that the data for 2010 was plotted incorrectly. It should be 30.4 but it looks more like it's 40 point something.

So that one's incorrect.

But the data for 2020, that one is plotted correctly.

So we found one error with year 2010.

To find the second error, we will need to work through the rest of the data points.

It might be that an error has been made, but it still looks like it fits with a trend for the rest of the data.

If you did that, you'd find your error in 2006.

The point for 2006 has been plotted too low, but not so low that it stands out too much from the rest of the data.

So our final answers are 2006 and 2010.

And then question two, we had this time series graph and you had to write down at least two problems with this graph that could cause difficulties for Andeep.

Well, one problem is that the scale on the horizontal axis is written in descending order.

Now, Andeep might have spotted this and it might not cause a problem, but it could cause a problem if it's not spotted straight away 'cause it looks like the data is decreasing from one year to the next rather than increasing in most years.

Another problem is that the scale on the vertical axis is not regular.

As we go from 0 to 200 on that vertical axis, the intervals go up in 50s.

But then from there onwards, from 200 up to 400, we can see that each major interval is 100, and this can cause some problems when interpreting the data 'cause it might look like the data has changed more than it has from one year to the next or changed less than it has from one year to the next.
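As a small illustration of what "not regular" means here, the gaps between consecutive tick marks can be checked with a short Python sketch. The tick values below follow the description above (steps of 50 up to 200, then steps of 100 up to 400); the function names are hypothetical and not part of the lesson.

```python
def tick_intervals(ticks):
    """Return the gaps between consecutive tick values on an axis."""
    return [later - earlier for earlier, later in zip(ticks, ticks[1:])]


def is_regular_scale(ticks):
    """A scale is regular when every gap between consecutive ticks is the same."""
    return len(set(tick_intervals(ticks))) <= 1


# Tick values as described for the vertical axis in question two.
ticks = [0, 50, 100, 150, 200, 300, 400]

print(tick_intervals(ticks))    # [50, 50, 50, 50, 100, 100]
print(is_regular_scale(ticks))  # False
```

On an irregular scale like this one, the same vertical distance on the page stands for a different change in the data depending on where it sits on the axis.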

What other problems could there be with this graph? Another one could be that the vertical axis is not labelled.

So he may not know what the context for this data is. He might not know what this graph is about.

Also, it looks like there is an outlier in the year 2013, or at least something unusual going on with the data during that year.

Andeep cannot check whether 2013 is an error or an unusual result without access to the original data.

He might have access to that, but we don't know for certain from this case here.

Great work so far.

Now let's move on to the second part of today's lesson where you're going to take the role of a fact checker or a verifier, someone who looks at claims that are made based on data and critiques the validity of those claims. So let's get started.

Here we have Sophia and Alex.

Sophia is editing her school's newspaper, the Daily Oak.

She reviews an article that Alex has written with the headline below, most people spend around five hours a week working abroad.

Hmm, that's an interesting headline.

Sophia says, "What? This doesn't seem right.

What evidence have you based this claim on?" And Alex says, "I've got data that supports this headline." So what data might Alex present to support his claim? Pause the video while you think about this.

What do you expect Alex to show that will back up his claim? And then press play when you're ready to continue.

So let's hear from Alex now who's going to try and convince us why this headline is true.

He justifies his claim with data.

Alex says, here is a time series graph showing the number of UK visits abroad each year.

We can see we have the horizontal axis showing us the years and the vertical axis showing us the number of people who went abroad each year, and that is in the millions.

He then says, "As you can see, the numbers are pretty steady until the year 2020, when it suddenly decreases." Alex then says, "We can compare this to a time series graph showing the average number of hours people work per week." Again, we can see a time series graph showing us years from 2005 onwards, but this time the vertical axis shows the mean number of hours worked per week by full-time workers.

Alex says, "This also has a sudden decrease in the very same year, in 2020.

The sudden decrease in each variable at the same time shows that they are clearly related.

Therefore, people must be spending part of each week working abroad." So Alex has put forward a pretty convincing argument using data.

But what is wrong with Alex's claim? Pause the video while you think about this and press play when you're ready to continue.

Well, Alex may be onto something when he says that these two sets of data are related in some way because we can see that they both drop at the same time in the same year.

But when two sets of data are related, it does not always mean that a change in one variable is causing a change in the other.

Sometimes two sets of data may be related due to a third variable.

For example, an event may occur which has an effect on multiple sets of data.

In this case, we can see that both the UK visits abroad and the number of hours worked on average by full-time workers both dropped in the same year, that year was 2020.

Both of these sets of data seem like they were affected by a national lockdown that happened in the year 2020, which was introduced in response to a pandemic.

So let's go back to Sophia and the Daily Oak.

She's editing her school's newspaper and she reviews an article that Lucas has written with the headline below.

The headline says, calling your baby Daniel will reduce how much they use the internet.

Sophia says, "You can't just make stories up for our newspaper, Lucas." But Lucas says, "I haven't made it up, and I've got the data to prove it's true." What data do you think Lucas might present to support his claim? Pause the video while you think about this and press play when you're ready to continue.

Lucas justifies his claim with data from the Office of National Statistics.

Here we have a time series graph with years going across the horizontal axis, and the vertical axis shows the number of newborn babies named Daniel each year, and that's in the thousands.

Lucas says, as you can see from the time series graph, the number of newborn babies named Daniel has decreased every year since 2008.

Meanwhile, the number of people who use the internet on a daily basis has increased nearly every year since 2008.

And we can see that from this time series graph, the horizontal axis shows us the years and the vertical axis shows us the number of people who use the internet on a daily basis in the millions, and it looks like the data is increasing from one year to the next.

Lucas also says, here's a scatter graph where each point represents a year.

So on the horizontal axis, it has the number of newborns named Daniel, and on the vertical axis it has the number of people using the internet on a daily basis.

And as we can see here, it shows a very clear negative correlation.

So what Lucas's data seems to show us here is that for each year when there are fewer babies named Daniel, there seems to be more and more people using the internet on a daily basis.

So this seems like a pretty convincing argument put forward with this data.

But what is wrong with Lucas's claim? Pause the video while you think about this and press play when you are ready to continue.

Well, these two data sets appear like they might be related.

However, when two data sets appear related, it does not always mean that a change in one variable is causing a change in the other.

In fact, sometimes two sets of completely unrelated data may have a correlation that is purely due to chance.

And this can happen when lots and lots of different data sets are analysed until an interesting result is eventually found.

In other words, if you just keep looking through time series data after time series data after time series data and just keep searching and searching and searching, the chances are you will find two sets of data where one of them is continuously increasing while the other one is continuously decreasing during the same years.

And those two sets of data may not be the slightest bit related. It just happens that one is increasing each year and the other one is decreasing, and that's what's happened here.

Between the two years that Lucas has chosen, the number of daily internet users tended to increase most years.

And then when it came to baby names, there are some baby names that increased each year and there are some baby names that decreased each year, and it happens that Daniel is a name that has decreased each year between those two years.

And that would've meant this would've had a negative correlation.
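To see how easily two unrelated trends can produce a strong correlation, here is a minimal Python sketch. It goes slightly beyond the lesson by measuring the correlation with Pearson's correlation coefficient, which the lesson does not use, and every number in it is made up for illustration. None of it is ONS data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Made-up yearly figures: one series that happens to fall each year
# and one that happens to rise each year.
babies_named_daniel = [4.0, 3.7, 3.3, 3.0, 2.6, 2.3]  # thousands (hypothetical)
daily_internet_users = [16, 20, 23, 27, 30, 34]       # millions (hypothetical)

r = pearson_r(babies_named_daniel, daily_internet_users)
print(round(r, 2))  # a value close to -1, a strong negative correlation
```

The coefficient comes out close to -1 simply because one list goes down while the other goes up over the same years. Nothing in the calculation says that either variable causes the other.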

Now, you may think that the claims that Lucas and Alex made in the newspaper headlines were a bit silly, and they were silly, but they are based on reasoning that could appear to be convincing to anyone who doesn't think carefully enough about what that data is really showing.

And you do sometimes find claims made by people or in the media or on the internet that are based on a similar reasoning to what Alex and Lucas have used.

So it's always worth being cautious and being critical.

So let's check what we've learned.

You've got four statements, A, B, C, and D.

Which statements are true for bivariate data with correlation? Pause the video while you choose, and it could be more than one, and then press play when you're ready for an answer.

The answers are B, C, and D are true.

When bivariate data has correlation, one variable might be affecting the other variable or both variables might be affected by a third variable, or the variables might be unrelated and just correlate by pure chance.

Okay, it's over to you now for task B.

This task contains two questions where you take the role of a fact checker or a verifier, someone who looks at claims made based on data and critiques the validity of those claims. Here's question one.

You have a newspaper headline which says, calling babies Arthur increases house prices.

Now this headline isn't just completely made up, it's based on those graphs in some sort of way.

So in part A, you need to first get to the bottom of where this headline has come from.

How has somebody used this data to come to that conclusion? So write two observations from the graphs that seemingly support the headline.

And then part B, explain why the headline is not necessarily correct.

Pause the video while you do this and press play when you are ready for question two.

And here is question two.

The time series graph shows data from the ONS, the Office of National Statistics, about the number of people in England who use the internet on a daily basis.

And Jun looks at this data and makes a claim.

He says, by the year 2040, there'll be over 80 million people in England using the internet on a daily basis.

So once again, Jun hasn't just made up this claim based on nothing, he's based it on something.

So in part A, what has Jun based this claim on? Precisely how did Jun reach this conclusion? And in part B, explain why his prediction may not be correct.

Pause the video while you work through this and press play when you're ready for some answers.

So let's see how we got on with that.

Here we had the newspaper headline that said, calling babies Arthur increases house prices.

And we got the data that supposedly backs that up.

In part A, you have to write two observations from the graph that seemingly support the headline.

Well, you could have said, the number of babies named Arthur and the mean house price both continuously increase from 2009 to 2019.

You can see this from the two time series graphs, which both continuously increase from 2009 to 2019.

And you could also have said that the two variables have a clear positive correlation, which you can see from the scatter graph.

So why is the headline not necessarily correct? Well, the two variables might happen to both be continuously increasing during the same period of time.

The mean house prices tend to increase from one year to the next, so that's happened from 2009 to 2019.

And as for baby names, some names tend to get more popular over a certain period of time, some tend to get less popular.

And if you look for enough time series data about baby names, you will find at least one where it happens to be increasing between those same years as well.

So the correlation may simply be due to chance.

And then in question two, we had a time series graph that shows the number of people in England who use the internet on a daily basis.

And Jun says that by 2040, there'll be over 80 million people in England using the internet on a daily basis.

Now, Jun hasn't just made this up, he has based it on something.

What's he based it on? Well, it looks like in the years shown on this time series graph, the number of people using the internet on a daily basis has increased pretty much each year.

And so far, it looks like it's increasing at an approximately constant rate.

So it looks like Jun has continued that trend onwards.

He's extrapolated upwards into the year 2040 and used that to read off the figure of 80 million.

So why is this prediction not necessarily correct? Well, the trend may not continue to increase at the same rate in the future.

Otherwise, the number of people using the internet on a daily basis in England would exceed the current population of England.
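Jun's reasoning can be imitated with a straight-line fit that is then extended to 2040. The Python sketch below is only an illustration: the yearly figures are made up to rise at a roughly constant rate, the population figure of about 56 million for England is a rough assumption, and `fit_line` is a hypothetical helper using ordinary least squares.

```python
def fit_line(xs, ys):
    """Least-squares straight line through the points; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up figures (millions of daily internet users), not ONS data.
years = [2008, 2012, 2016, 2020]
users_millions = [17, 25, 33, 41]

# Rough assumption for the population of England, in millions.
ASSUMED_POPULATION_MILLIONS = 56

slope, intercept = fit_line(years, users_millions)
prediction_2040 = slope * 2040 + intercept

print(prediction_2040)                                # 81.0
print(prediction_2040 > ASSUMED_POPULATION_MILLIONS)  # True
```

With these made-up numbers the line predicts more daily users than there are people, which shows why a trend cannot simply be assumed to continue at the same rate far beyond the years in the data.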

Fantastic work today.

Now let's summarise what we've learned.

The theme of the lesson has been about critiquing the validity of claims that are based on data.

A claim could be invalid based on an error in the graph or an error in the interpretation.

Errors in graphs may lead to incorrect conclusions, and not all errors are obvious to spot straight away.

Sometimes an error might be obvious if it presents itself as an outlier.

But not all outliers are errors, sometimes an outlier could just be an interesting result.

Time series graphs and scatter graphs are widely used in the media and they can be misinterpreted.

Sometimes there are outrageous claims, like the claims you've seen in this lesson today.

Outrageous claims can be easy to spot, but sometimes an incorrect claim may seem plausible based on the data that's presented.

So it's important to be critical and careful when drawing conclusions from data or when considering the claims made by others and whether or not they are true.

Great job today, thank you very much.
