
Hello, my name is Dr. Rowlandson, and I'll be guiding you through today's lesson.

Let's get started.

Welcome to today's lesson from the unit on numerical summaries of data.

This lesson is called Statistical Problems with Data Presentations, and by the end of today's lesson, we'll be able to choose an appropriate representation to explore a statistical problem.

Here are some keywords that you may already be familiar with and that we'll be using again in today's lesson.

This lesson contains two learn cycles.

In the first learn cycle, we'll be focusing on how to present data clearly, using a graph or other representation.

And the second learn cycle will focus on choosing which representations to use to illustrate our points.

But to begin with, let's start with presenting data clearly using graphs.

Here is a data handling cycle, the steps that someone may take when they are conducting an investigation involving data.

Now, visual representations of data, such as graphs, can help us during a couple of stages of this data handling cycle.

As researchers, we can use them to make sense of the data, because they present it in a way that lets us see things that are difficult to spot when we just look at a big list of numbers.

But also, graphs can help us communicate things about our data to others, and help others make sense of the data as well.

And we'll be looking at these two elements of visual representations during today's lesson.

If one of the benefits of using a graph or other visual representation is that it allows you or other people to make sense of the data by seeing things, then it's important that it's presented in a way that makes it clear to see what the data is showing.

That means using the available space effectively.

That can help make elements of the graph clear to see and interpret accurately.

To use the space effectively, we sometimes need to think about the scales on our axes, because the scales can drastically affect how a graph looks.

There is sometimes a question about whether or not you should start the scale on an axis at zero.

Well, there's no hard and fast rule about that.

It really depends on the graph you're using.

For some graphs, it is important to start the scales on zero, because it represents the data more accurately when the axes start at zero.

But for other graphs, starting at zero can actually cause some problems. For scatter graphs, starting each axis at zero can cause the points to become clustered together into a small space.

Here's an example of that.

We've got a scatter graph and both axes start at zero, but the problem is, all of the data points on both variables are quite high up that scale.

On the horizontal axis, all the data points are over 500, and all the data points on the vertical axis are over 100.

As a result, all the points on that scatter graph are squeezed into a tiny area of the graph.

Now, one of the benefits of using a scatter graph is that it lets you see the shape of the data, and see any relationships or associations within the data.

But here, because all the points are so close together and overlapping each other, we can't really see the shape of the data very well at all.

Therefore, starting the scales of a scatter graph away from zero can sometimes make more use of the space and illustrate correlations more clearly.

For example, if I take this same data in a scatter graph, and present it again, using scales that don't start at zero, it looks like this.

These are the same points from the same data as the left-hand scatter graph, but here the horizontal axis starts at 500 and the vertical axis starts at 100, and we can now see very clearly that there is a positive correlation between these two variables.
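As a rough sketch of this idea, one way to choose where a scatter graph axis starts is to round the smallest data value down to a convenient step. The function names `axis_start` and `share_of_axis_used`, the step sizes, the axis end of 700 and the data values are all hypothetical, chosen only to match the description that every horizontal value is over 500 and every vertical value is over 100.

```python
import math

def axis_start(values, step):
    """Round the smallest value down to the nearest multiple of `step`."""
    return math.floor(min(values) / step) * step

def share_of_axis_used(values, start, end):
    """Fraction of the axis length that the data actually spans."""
    return (max(values) - min(values)) / (end - start)

# Hypothetical data: every x value is over 500, every y value is over 100.
x_values = [520, 560, 610, 655, 700]
y_values = [115, 130, 142, 160, 171]

print(axis_start(x_values, 100))  # 500
print(axis_start(y_values, 50))   # 100

# Assumed horizontal axis ending at 700: compare starting at 0 with starting at 500.
print(round(share_of_axis_used(x_values, 0, 700), 2))  # 0.26
print(share_of_axis_used(x_values, 500, 700))          # 0.9
```

With these made-up values, starting the horizontal axis at 500 lets the points spread across about 90% of its length instead of about a quarter, which is what makes the correlation easier to see.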

Sometimes, when a graph such as a scatter graph does not start its axes at zero, it uses an axis break to highlight that this scale does not start at zero.

For example here, on this scatter graph, we can see on these axes, we've got these little zigzag lines like this.

These are called axis breaks, and they indicate that that scale on the axis starts somewhere above zero.

You won't always see these.

It really depends on who made the graph, and in particular which tool they used to make it, because some computer tools offer this feature and others don't.

So you might see it, you might not, but it's good to recognise when it's there.

For bar charts, not starting the vertical axis at zero can distort how large one bar looks compared with another.

For example, this bar chart here might show which type of car people use, petrol or diesel, in a sample of people.

And it looks like, in this bar chart, that there are twice as many people using petrol cars as diesel cars.

Well, that's because the scale does not start at zero here.

The scale actually starts at 50.

And if we think about those numbers now, 60 people use petrol and 55 people use diesel.

That's not twice as many people using petrol as diesel.

So not starting at zero misrepresents this data.

It distorts how these two bars compare with each other.

When we do start at zero, it gives a much more accurate picture of how these two groups compare.

We can now actually see, it's not twice as many people using petrol as diesel.

It's only a few more people using petrol than diesel in this particular set of data.
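The distortion can be checked with a little arithmetic. On an axis that starts at 50, a bar only shows the part of its value above 50, so its visible height is the value minus 50. This minimal sketch uses the petrol and diesel figures from the example; the function names are just for illustration.

```python
def visible_height(value, axis_start):
    """Height of a bar when the vertical axis starts at axis_start."""
    return value - axis_start

def height_ratio(a, b, axis_start=0):
    """How many times taller bar a looks than bar b."""
    return visible_height(a, axis_start) / visible_height(b, axis_start)

petrol, diesel = 60, 55

print(height_ratio(petrol, diesel, axis_start=50))  # 2.0 (looks twice as big)
print(round(height_ratio(petrol, diesel), 2))       # 1.09 (the true comparison)
```

Starting at 50 turns a difference of about 9% into a bar that looks twice as tall.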

Another thing to consider in order to make a graph really clear for someone to understand, is about labelling axes.

We need to remember that when we present graphs to other people, we're not always there with them while they are looking at the graph.

They might be looking at the graph on the page of a book or on a slide on a computer screen, and you might not be there to explain what all the parts of the graph mean.

Therefore, labelling axes can help a reader understand what the data is about, and what each part of the graph represents.

For example, the bar chart here shows weather data from 2021 taken from the Met Office.

Looking at this bar chart, we can see we've got a scale going from zero to 40, and we can see we've got two bars where one bar is higher than the other.

Is it clear what data this bar chart is showing? Just pause the video and have a little think about that for a second, and then press play when you're ready to continue.

No, it's not clear what this bar chart is showing.

We don't really know what the context of this is.

We know it's weather data for 2021, but we don't know whether it's about rainfall, daytime temperatures, the number of days with frost, or wind speed.

We don't know what each bar represents.

Is one bar autumn and the other bar winter? Or is one bar a town and the other bar another town? Are these different countries? It's difficult to make much sense of what this bar chart is showing with the information that it presents to us right now.

So with that in mind, how could this bar chart be clearer to someone looking at it? Pause the video, have a think, and press play when you're ready to continue.

This bar chart could be much clearer if we labelled each of the axes.

Now we've labelled the axes, we can make much more sense of what's going on here.

One bar represents Sheffield, and the other bar represents Tiree.

So we've got two different locations, one for each bar.

The vertical axis says days with air frost.

So we can see now that Sheffield had around 36 or 37 days of air frost.

It would be more helpful if the vertical axis said whether that was per month or per year.

We can probably guess that this is per year, because no month has more than 31 days.

So just that little bit of extra information helps us make much more sense of what this bar chart is showing us.

Okay, let's check what we've learned there so far.

True or false? The scales of a scatter graph should always start at zero.

Is it true or false? And choose a justification.

Pause the video, make your choices, and press play when you're ready for answers.

The answer is false, because not starting at zero can make it clearer to see connections between the variables.

Okay, it's over to you now for task A.

This task contains two questions, and here is question one.

The graph shows data from the ONS, which stands for the Office for National Statistics, about the populations of two towns in 2019.

Izzy says, "Cheltenham has twice as many people as Mansfield." There are two questions to consider about what Izzy has said there.

Pause the video, write a sentence or two for each question, and press play when you're ready for question two.

And here is question two.

The table shows data from the ONS about five towns in 2019.

Next to it is a scatter graph with these points plotted.

It also shows the top end of the scale on each axis, and it tells you what each axis represents.

What it doesn't show is the rest of each scale, or where the scales start.

So, looking at the table and the graph, complete the scales on the axes of the scatter graph.

Pause the video, have a go at this, and press play when you're ready for answers.

Well done with that.

Let's now work through question one together.

Izzy says, "Cheltenham has twice as many people as Mansfield," and part A asks you, "Explain why Izzy might think this, based on the bar chart." Izzy might think this because the bar for Cheltenham is twice as long as the bar for Mansfield.

Therefore you may think that Cheltenham has twice as many people as Mansfield, just by looking at the bars.

Part B asks you, "Explain why this is wrong." Well, the scale does not start at zero.

There are 110,000 people in Mansfield, and there are 120,000 people in Cheltenham.

120,000 is not twice as much as 110,000.
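If we assume the Cheltenham bar is exactly twice as long as the Mansfield bar, we can even work out where the hidden scale starts. Calling the starting value s, each bar's visible length is its value minus s, so:

```latex
\begin{aligned}
120\,000 - s &= 2\,(110\,000 - s) \\
120\,000 - s &= 220\,000 - 2s \\
s &= 100\,000
\end{aligned}
```

Under that assumption, the axis starts at 100,000, so the bars only show the 20,000 and 10,000 people above that value, which is why one bar looks twice as long as the other.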

So that's why the conclusion is wrong.

And now let's go through the answers to question two, where we had to complete the scales on the axes of this scatter graph.

We can see the highest value on each axis, but we don't know where either of the scales start.

One way to approach this is to look at the point with the lowest value for each variable to get a sense of where each scale might start, and then fill in the intervals in between.

And our answers would look a bit like this.

Well done so far.

Let's now move on to learn cycle two, which is choosing how to present data to illustrate a point.

During mathematics lessons, when you are first learning how to plot a particular type of graph, you know which type of graph to plot because that's what the lesson's about.

That's what you're practising.

When you're conducting a data investigation of some kind, you may need to make those decisions yourself, and choose which type of graph to use.

The usefulness of a graph can depend on the type of data you're working with, and also the conclusion that you're trying to illustrate to somebody else.

In other words, you want to choose a graph that helps you see what it is you want to see about the data, or helps someone else see what it is you want them to see about the data.

So the choice of graph may be down to what it is you're trying to analyse, or what message it is you're trying to communicate to others.

Here are some examples of graphs that you may be familiar with already.

Bar charts, pie charts, scatter graphs, and line graphs.

There are plenty of other types of graphs available, but let's just focus on these four for now.

What could be the advantages and disadvantages for using each type of graph? We're gonna explore this question shortly, but just pause the video first, have a think about this, and then press play when you're ready to continue.

Now, bar charts and pie charts can be useful for data that involve frequencies of different categories.

Imagine that we take a sample of people in a town and ask them which type of car they use: petrol, diesel, or electric. The data we get is presented in the table on the left here.

We could present that data either using a pie chart, like we can see in the middle here, or a bar chart, like we can see on the right.

And both of those types of graph represent this frequency data pretty well, and give a good sense of what's going on in this sample.

Each of these graphs still has its own advantages and disadvantages.

For example, which graph do you think makes it easier to compare the frequencies of petrol and diesel? Is it easier to see that comparison in the pie chart or in the bar chart? With the bar chart, you can see quite clearly that the bar for petrol is higher than the bar for diesel, but in the pie chart, it's not quite as easy to see which of those sectors is bigger.

There are also comparative bar charts.

Comparative bar charts can be used to compare data from two different groups when the total frequencies are roughly equal.

Imagine the same data investigation about types of car, but carried out in two towns.

We collected a sample of data from town A, and a sample of data from town B.

If we represent this data on a comparative bar chart, we've got purple bars for town A and teal bars for town B, and those bars are next to each other each time, so we can make direct comparisons between one town and the other.

For example, we can see that people in town B use fewer diesel cars than people in town A, because town B's bar for diesel is shorter than town A's, and we can see quite clearly that town B uses more electric cars than town A as well.

Now this works pretty well, because the total frequencies for each town are roughly similar.

But, comparative bar charts can be less useful for making comparisons between data sets that have drastically different frequencies.

For example, let's imagine that the data looks like this now.

In town A, the total frequency is about 100.

So only around 100 people were asked in this survey.

But in town B, around 2,000 people were asked.

Well, if we represent this data on a comparative bar chart, it looks like what you can see on the right now.

Every one of town B's bars is much taller than town A's bars.

And that doesn't really tell us anything about the choice of car types in these two towns.

It is simply because so many more people in town B were sampled than in town A.

So of course the bars are all gonna be bigger.

Pie charts, on the other hand, can be useful when making comparisons between data sets that have different total frequencies.

And that's because a pie chart shows each frequency as a proportion of the total for that data set.

For example, here is that same data that we saw previously on the comparative bar chart, now presented in a pair of pie charts.

Now, it doesn't matter that the sample sizes were so drastically different between town A and town B, because each sector on a pie chart represents the frequency of that category as a proportion of the sample size.

And that's the key thing here.

For example, we can see that the proportion of people using diesel cars is smaller in town B than in town A, because the sector for diesel in town B has a smaller angle than the sector for diesel in town A.
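Each sector's angle is the category's share of the total multiplied by 360 degrees, which is why different sample sizes don't matter. The frequencies below are hypothetical (the real ones are only in the lesson's table), chosen so that town A has 100 people and town B has 2,000, as in the example; the function name `sector_angles` is just for illustration.

```python
def sector_angles(frequencies):
    """Angle in degrees for each category: frequency * 360 / total."""
    total = sum(frequencies.values())
    # Multiplying before dividing keeps these whole-number examples exact.
    return {category: f * 360 / total for category, f in frequencies.items()}

# Hypothetical survey results with very different sample sizes.
town_a = {"petrol": 50, "diesel": 30, "electric": 20}      # total 100
town_b = {"petrol": 1100, "diesel": 400, "electric": 500}  # total 2000

print(sector_angles(town_a))  # {'petrol': 180.0, 'diesel': 108.0, 'electric': 72.0}
print(sector_angles(town_b))  # {'petrol': 198.0, 'diesel': 72.0, 'electric': 90.0}
```

Even though town B's diesel frequency (400) is far bigger than town A's (30), its diesel sector is smaller, because 400 out of 2,000 is a smaller proportion than 30 out of 100.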

So, what about line graphs? One thing that line graphs can be useful for is showing us how a variable changes over time.

For example, the table here shows us weather data taken from the Met Office.

And we can see that we've got months February, April, June, August, and October, and we can see the total rainfall for those months in millimetres.

If we represent this data on a line graph now, we've got the months going across the horizontal axis, that's time, and the vertical axis shows us the amount of rainfall each month, and what we can see is, as the year progresses, we go from having a lot of rainfall to not very much rainfall, to then more rainfall again later in the year.

It illustrates the changing of that variable over a space of time.

And what about scatter graphs? Well, scatter graphs can be useful for showing whether a relationship exists between two different variables in data that is bivariate.

In particular here, we're focusing on when we have two variables going on at the same time, and we wanna see whether they are connected or related or associated in some sort of way.

For example, here we have a table that shows data from the Met Office about weather, but this time we have two variables.

Rainfall is one of our variables, and the amount of sunshine in hours is the other variable.

If we wanna plot both of these on the same graph, that's where a scatter graph can be helpful.

We can have total rainfall across one axis, the amount of sunshine across the other axis, and now by plotting this, we can see here that there is a connection between these two variables.

It looks like they have negative correlation.

As the amount of rainfall increases in a month, the amount of sunshine decreases.

Scatter graphs, however, are less useful for data involving frequencies of different categories.

If we go back to our petrol, diesel, electric car scenario from earlier, how would you plot this on a scatter graph? It probably wouldn't be very useful to do that.

A bar chart or a pie chart would be much more appropriate for this data, because it consists of frequencies of categories.

Let's now consider the pros and cons of scatter graphs and line graphs a little bit more, by looking at this weather data from the Met Office about Sheffield and Stornoway for each month in 2019.

We have a scatter graph on the left, where the horizontal axis shows us the total sunshine duration in hours.

That's how many hours of sunshine there were in each month.

And the vertical axis shows us the mean daily maximum temperature in degrees Celsius.

That is calculated by taking the maximum temperature on each day of the month and finding the mean of those values.
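As a small sketch of that calculation: take the highest reading for each day, then find the mean of those daily maxima. The readings below are made up, and a real month would have 28 to 31 days rather than three; the function name is just for illustration.

```python
from statistics import mean

def mean_daily_maximum(daily_readings):
    """Mean of the highest temperature recorded on each day."""
    return mean(max(day) for day in daily_readings)

# Hypothetical readings in degrees Celsius, several per day.
readings = [
    [8.5, 11.0, 12.5, 9.0],  # day 1: maximum 12.5
    [7.0, 10.5, 9.5],        # day 2: maximum 10.5
    [9.0, 13.0, 14.5, 11.5], # day 3: maximum 14.5
]

print(mean_daily_maximum(readings))  # 12.5
```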

And on the line graph, we have the months going across the horizontal axis, with 2 being February, 4 being April, and 6 being June.

And then the vertical axis shows us the mean daily maximum temperature again.

So the same variable is on that vertical axis on each graph.

We have two locations, Sheffield and Stornoway, which are represented on each one.

Sheffield is represented by purple crosses, and we can see on the line graph those purple crosses are joined up with a purple line.

And Stornoway is represented by the teal spots.

Again, on the line graph, those teal spots are joined up with lines.

Look at these two graphs now, side by side.

Which graph shows most clearly that Sheffield is often warmer than Stornoway? Would you say that's the scatter graph, or the line graph? Pause the video, have a think about this, and press play when you're ready to continue.

Arguably, the line graph shows this most clearly.

Now in both of these graphs, the temperature is represented by how high those points are, 'cause the temperature is the vertical axis.

On the scatter graph, what we can see is there are some points which are higher for Sheffield than Stornoway, but there are also some points for Stornoway which are higher than they are for Sheffield.

That's because they are different months, and we can't really see what the months are on the scatter graph.

Whereas on the line graph, we can see that in almost every month, the purple cross is higher than the teal spot.

So we can see that Sheffield's temperature is warmer than Stornoway's temperature for most of the months on that line graph.

So then, which graph shows most clearly that warmer months have more sunshine? Pause the video, have a think, and press play when you're ready to continue.

In this one, the scatter graph shows it more clearly, 'cause what we're looking for here is a relationship between two variables.

Warmer months having more sunshine.

We're looking for how the temperature and the amount of sunshine connect with each other, and that's what is shown on the scatter graph.

We've got both of those variables represented together, and we can see how they're connected.

Whereas the line graph only shows one of those variables, temperature, so we can't see how temperature connects to sunshine.

Okay, let's check what we've learned there.

True or false? A bar chart is always clearer than a pie chart.

Choose true or false, and then one of these justifications.

Pause the video, make your choices, and press play when you're ready to continue.

The answer is false.

A bar chart is not always clearer than a pie chart.

Sometimes it is, but not always.

And that's because you can compare the proportions of categories between data sets of different sizes more easily with a pie chart, which may be useful in some situations.

Which graph is the most useful for illustrating a connection between two variables in bivariate data? Is it A, a bar chart, B, a line graph, C, a pie chart, or D, a scatter graph? Pause the video, make a choice, and press play when you're ready for an answer.

The answer is D, a scatter graph.

And that's because you use one axis for each variable, and then you plot it and you can then see how they are connected.

Which graph is most useful for illustrating how a single variable has changed over time? Pause the video, make a choice, and press play when you're ready for an answer.

The answer is B, a line graph.

That's because you can use your horizontal axis to represent the passing of time, and then see how the heights of the points change over that time.

Some representations of data can be useful to illustrate the context of that data.

Pictograms, for example, are very similar to bar charts, but they use pictures instead of bars to illustrate what the data is about.

For example, if I show you this pictogram here, without giving you any more information about the data, could you guess what the context of this data might be in this pictogram? Pause the video, have a think about this, and press play when you're ready to continue.

One example of what this context could be is the number of bikes sold per month by a shop.

We don't know that for sure, but we can probably guess that this data is about numbers of bicycles per month in some way.

This same data could be shown with a bar chart.

It kinda does the same thing.

The length of each row of images represents the frequency for each month, but what the pictogram also does is give us an instant sense of what the context of this data is.

It's about bicycles.

But pictograms are less useful when you need to divide pictures up precisely to represent frequencies accurately.

For example, if I wanted to present that 17 bikes were sold in month four, this would be difficult using this pictogram, because I would have to slice that bike up accurately to represent 17.
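A quick calculation shows why this is awkward. The pictogram's key isn't stated here, so the sketch assumes, purely for illustration, that each bike picture stands for 4 bikes (and then 5), and works out how many whole pictures are needed and what fraction of one more picture would have to be drawn. The function name is hypothetical.

```python
def pictures_needed(frequency, per_picture):
    """Split a frequency into whole pictures plus a leftover fraction of a picture."""
    whole, remainder = divmod(frequency, per_picture)
    return whole, remainder / per_picture

# Assumed keys (hypothetical): one picture = 4 bikes, or one picture = 5 bikes.
print(pictures_needed(17, 4))  # (4, 0.25): four whole bikes and a quarter of a bike
print(pictures_needed(17, 5))  # (3, 0.4): three whole bikes and 0.4 of a bike
```

A quarter of a picture is manageable, but 0.4 of a bike is hard to draw precisely, which is exactly the problem with pictograms.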

That's probably not really easy to do.

So what other type of graph could be used instead of this one? Pause the video, have a think, and press play when you're ready to continue.

Well, this data could be represented using a bar chart or a pie chart to show the frequencies for each month. Or, because the months show the passing of time, month one, two, three, four, a line graph could show it pretty well too. A bar chart or a line graph would also be a bit more accurate, 'cause it has a scale.

Another way that we sometimes see data presented visually is through something called infographics.

Infographics are sometimes used to present data in a way that illustrates its context.

Now these are not strictly graphs, but they can help people visualise data.

We often see them used in media that aims to try and present data to the public in a way that gives them a very quick sense of what the data is about, and what the data says, as well.

For example, take a look at this infographic.

Without giving you any more information about this data, I wonder if you can guess what the context for the data in this infographic might be.

Pause the video, have a think, and press play when you're ready to continue.

Now, we don't know for certain precisely what this infographic is about, but some guesses we might have could be things like, it could be the percentage of people who said they would recommend product A and the percentage of people who would recommend product B in some kind of survey, for example.

The fact that we have pictures of people in this infographic gives you a quick sense that this is data taken from people.

It's not about animals or plants or things like that.

It is some kind of data taken from people, and that might be an opinion survey.

And the shading of the people in these two infographics gives you a sense that more people like product A than product B, or whatever it was they were asked about.

Here's another infographic.

It is based on data taken from the ONS, the Office for National Statistics, about Leeds in 2015.

Without giving you any more information, can you think, what could this data be about? And what conclusions does this infographic try to show? Pause the video.

Have a think about these two questions and press play when you're ready to continue.

Looking at this infographic, we can see there is black smoke coming out of a factory, a house, and a car.

We can make a good guess that this is probably about greenhouse gas emissions, or something to do with pollution in some kind of way.

And it's making comparisons between industry, homes, and transport.

And what conclusions does it show? Well, one conclusion we can take is that most emissions in Leeds in 2015 came from transport.

Now, we can see that by looking at the numbers, but by presenting that data as smoke clouds of different sizes, it makes that conclusion stand out much more clearly and instantly through a visual means.

Now, the smoke clouds are different sizes, and the sizes of images in infographics are sometimes scaled to match the numbers, but not always.

We need to remember that infographics are not strictly graphs; they are just visual ways of communicating data, and accuracy is not always their main aim.

Sometimes they are accurate, but not necessarily.

Therefore, sometimes images are not to scale, as the purpose is simply to highlight which numbers are bigger than others, and get across the context of those numbers as well.

Whereas a bar chart could be more useful when accuracy is important.

We may wanna use a bar chart to present this data to a scientific audience, because accuracy might be a key thing in those scientific discussions, and we might wanna use an infographic to present this data to the public, who are not necessarily as concerned about accuracy.

They just want to kind of get a bit more of an instant sense of what the message is they need to take away from the data.

Let's check what we've learned there.

The infographic below is based on data from the Met Office about Lowestoft in 2008.

What do you think the data could be about? Pause the video, write something down, and press play when you're ready for an answer.

This data is about the amount of rainfall during each season in Lowestoft.

Starting with the lowest, sort these seasons according to the amounts of rainfall.

Pause the video, have a go, and press play when you're ready for an answer.

And our answer is: winter has the least rainfall, then summer, then spring, and then autumn.

Now you might have got that by looking at the numbers, 119, then 144, or you might have got it just visually, by looking at the sizes of the water droplets instead.

Okay, over to you now for task B.

This task contains three questions, and here is question one.

The table shows weather data from the Met Office for Cardiff in 1995, and you need to plot a graph to illustrate how the duration of sunshine changed during the year.

The question doesn't tell you which type of graph to draw.

You need to make that decision yourself, and choose the most appropriate graph to illustrate what the question is asking you to illustrate.

Also, once you've chosen the graph, you then need to choose your scales and label your axes to make your graph as clear as possible.

Pause the video, have a go at this, and press play when you're ready for question two.

And here is question two.

We have data again from the Met Office for Cardiff in 1995, but the data on the table is a bit different.

This time it says, plot a graph to illustrate the relationship between sunshine duration and maximum temperature.

Again, you need to choose the most appropriate type of graph.

You need to label your axes clearly, and also choose an appropriate scale that makes it clear to see.

Pause the video, have a go at this, and press play when you're ready for question three.

And here is question three.

Once again, we have weather data from the Met Office, but this time it is for four locations in 1995.

You need to plot a graph to illustrate how the total number of days with air frost differs between these four locations.

Once again, you need to choose the most appropriate type of graph for illustrating this point clearly, and label it all clearly as well.

Pause the video, have a go, and press play when you're ready for answers.

Well done with that.

Here is what question one may look like.

The choice of graph here is a line graph, because we're trying to illustrate how the duration of sunshine changed during a year.

Across the horizontal axis, we've got months labelled from one to 12, and then up the vertical axis we've got sunshine duration, and then the data's plotted like you can see on the screen.

Here is question two.

The choice of graph here was a scatter graph because we're trying to illustrate a relationship between two variables.

Sunshine duration, and maximum temperature.

You may have labelled your axes the other way around.

One thing to bear in mind with a scatter graph is that it's the usual convention to put the independent variable on the horizontal axis and the dependent variable on the vertical axis. Here we'd assume that the amount of sunshine is likely to affect the temperature, which is why the axes are this way around.

And in question three, the choice of graph here was a bar chart.

That's because we have frequency data, the number of days with air frost at each location, and we also want to make comparisons between these four locations. By presenting the data as bars, we can see very visually, without looking at any numbers, that Durham has the most days with air frost.

And Ballypatrick has the fewest days with air frost.

Wonderful work today.

Here's a summary of what we've learned in this lesson.

The theme of the lesson has been about choosing appropriate visual representations and presenting them clearly in order to either be able to analyse data and make interpretations out of it as a researcher, or communicate your data to others in a way that makes it clear for them to make interpretations from.

Therefore, graphical representations should effectively represent the data.

The scales and the axes should be chosen carefully, to give a clear and also accurate representation of that data as well.

Some graphical representations are easier to understand than others, so we should always bear that in mind, especially when we're communicating data to other people, but also, an appropriate graphical representation should be chosen to clearly illustrate the conclusion it is that you want to communicate to other people.
