Good day to you all.

I'm Mr. Gratton and welcome to another maths lesson.

In today's lesson, we will look at different statistical representations as a means of comparing two data sets.

Pause here to have a quick look at the definitions of statistical summary, central tendency and spread or dispersion.

Okay, first up, let's have a look at comparing summary statistics to statistical graphs, but why should we use summaries and graphs? Well, because trying to understand and compare large data sets of raw data is often impractical, so using summary statistics or statistical graphs helps us to interpret and explain what a dataset is trying to show.

And using both numerical and graphical summaries together will build a much better picture of a dataset, especially a large dataset, compared to using one over the other.

And here's an example of that.

Alex wants to know how tall he is compared to the other 11 year olds at his school.

He looks at the raw data of the heights of all of the other 11 year olds.

Is it easy to understand how tall or short he is just by looking at a massive bunch of numbers? No.

However, using both numerical statistics and a graph will help him a lot.

He is 140 centimetres, which we can now see is larger than the modal and median height using the numerical statistics, and he's very much in the mix of the majority of students as seen by the graph.

Today's lesson will focus on the idea that, usually, the more summaries the better.

Using only one summary statistic gives very little insight into how a dataset is distributed.

For example, a range of seven tells you very little beyond how varied the data are.

It says absolutely nothing about the scale or the value of the data that you're dealing with, just that the minimum and maximum values vary by seven.

This shows that point very clearly.

Each of these data sets has a range of seven, and yet their distributions all look so different from each other.
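
As a rough sketch of that point, the Python below builds three made-up frequency tables (they are assumptions for illustration, not the data shown on the slides) that all share a range of seven but have different modes and shapes.

```python
from statistics import multimode

# Hypothetical frequency tables (value -> frequency); not the lesson's actual data.
charts = {
    "single_peak": {40: 1, 41: 2, 42: 4, 43: 6, 44: 7, 45: 4, 46: 2, 47: 1},
    "bimodal": {40: 2, 41: 6, 42: 3, 43: 1, 44: 3, 45: 6, 46: 2, 47: 1},
    "skewed": {40: 8, 41: 6, 42: 4, 43: 3, 44: 2, 45: 1, 46: 1, 47: 1},
}

def expand(freq):
    """Turn a frequency table into a sorted list of raw data points."""
    return [value for value, count in sorted(freq.items()) for _ in range(count)]

def data_range(freq):
    """Largest value with a non-zero frequency minus the smallest."""
    present = [value for value, count in freq.items() if count > 0]
    return max(present) - min(present)

for name, freq in charts.items():
    # Same range every time, but the modes reveal very different distributions.
    print(name, "range:", data_range(freq), "modes:", multimode(expand(freq)))
```

Every table reports a range of seven, so the range alone cannot tell them apart; the mode is what separates them.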

But what if we had a second summary statistic? The mode.

Bimodal at 41.5 and 45.

It becomes so much clearer which bar chart represents this data now.

This bar chart is the only one that is bimodal, so it must be the correct one.

The other two were not bimodal and so could not match the summary statistics given.

Whilst more summary statistics are usually better, some summaries are a lot less useful at representing the data they were calculated from.

For example, the mean of 43.4 represents a part of the dataset with only a small number of data points near the dip of the two bimodal peaks, so isn't the most useful summary statistic to describe this dataset with.

A range of seven and the fact that it is bimodal are far more representative descriptions of this dataset.

Whilst always important to consider and still very useful, the mean and median are less likely to be representative measures when a dataset is bimodal, especially when trying to compare the summary statistics to the graphical representation of the same dataset.

The mean can also be affected by outliers or extremely small or large data points.

On this bar chart, there are no outliers.

The mean is therefore relatively close to the peak of the dataset.

However, the moment outliers are introduced, the mean moves much further away from those more high frequency values.

The mean moves further away from the peak.

So, the further away the mean is from the more frequent data points, the less representative the mean is when trying to describe a graphical representation of the data.
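
A minimal sketch of this effect, using made-up numbers (an assumption, not the values on the bar chart): adding two large outliers drags the mean away from the peak while the mode stays put.

```python
from statistics import mean, mode

# Hypothetical data with a clear peak at 52 and no outliers.
no_outliers = [50, 51, 51, 52, 52, 52, 53, 53, 54]

# The same data with two extremely large values added.
with_outliers = no_outliers + [90, 95]

print("Without outliers: mean", mean(no_outliers), "mode", mode(no_outliers))
print("With outliers: mean", round(mean(with_outliers), 1), "mode", mode(with_outliers))
```

The mode is 52 in both cases, but the mean climbs from 52 to above 59, well away from the most frequent values.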

Onto the median.

The median of 53 is the middle value of an ordered dataset, but on a bar chart, this means that the total heights of the bars to the left of the median should be approximately the same as the total heights of the bars to the right of the median.

In this example, the total height of the bars to the right of the median is 37, whilst to the left it is 38.

This shows that the median is a very good description of the centre of a dataset.
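
To see that balance in numbers, here is a small sketch with a made-up frequency table. The bar heights are assumptions, chosen only so that the totals match the 38 and 37 mentioned above.

```python
from statistics import median

# Hypothetical bar heights (value -> frequency), chosen to give left 38 and right 37.
freq = {50: 8, 51: 12, 52: 18, 53: 10, 54: 15, 55: 13, 56: 9}

# Expand the bar chart back into an ordered list of data points.
data = [value for value, count in sorted(freq.items()) for _ in range(count)]

middle = median(data)
left_total = sum(count for value, count in freq.items() if value < middle)
right_total = sum(count for value, count in freq.items() if value > middle)

print("median:", middle, "left total:", left_total, "right total:", right_total)
```

The two totals differ by only one, which is what we expect when the median sits at the centre of the ordered data.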

Okay, to figure out which graphs represent a dataset described by several summary statistics, it is best to first rule out the graphs that it cannot be, as sometimes this will be easier to spot.

A summary statistic will be revealed one at a time on the screen.

After each statistic is revealed, rule out the graphs it cannot be until only the correct graph remains.

First up, the mean is 27.

For non bimodal data or graphs that don't have outliers or extremely large or small values, the mean is more likely to be in the area around the peak.

Which of these bar charts do not have 27.4 in the region around the peak? It is clearly not B.

The mean has to be within the range of the dataset, and this dataset only goes between 40 and 50.

27.4 is clearly not in this range.

And for C, 27.4 is too far away from that peak of the dataset and it doesn't have any noticeable outliers and so isn't a sensible location for the mean.

Next up, the mode is 26.

The mode is the most frequent data point.

Graphically, this is represented by the tallest bar on the bar chart.

Both A and D have 26 as the mode, so we cannot rule out either.

And finally, the range is 10.

The range is the difference between the largest and smallest data points.

D's range is far too big, so A is the only valid graph for this summary statistic.

Therefore, graph A is the only valid graph that is representative of all three of these summary statistics.
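
The same elimination can be sketched in Python. The four frequency tables below are invented for illustration (the real bar heights are only on the slides), and the code computes each mean exactly rather than judging by eye whether it sits near the peak, so treat it as one possible way to check the reasoning.

```python
from math import isclose
from statistics import mean, multimode

# Hypothetical bar charts (value -> frequency) standing in for graphs A to D.
charts = {
    "A": {22: 1, 24: 2, 25: 3, 26: 6, 27: 4, 28: 4, 29: 3, 30: 4, 31: 2, 32: 1},
    "B": {40: 2, 42: 5, 45: 8, 47: 4, 50: 1},
    "C": {20: 1, 30: 2, 34: 5, 35: 9, 36: 4},
    "D": {18: 1, 22: 2, 23: 1, 24: 2, 26: 7, 28: 5, 30: 3, 34: 3, 38: 1},
}

def expand(freq):
    """Turn a frequency table into a sorted list of raw data points."""
    return [value for value, count in sorted(freq.items()) for _ in range(count)]

# Each statistic is revealed one at a time, in the same order as above.
checks = [
    ("mean is 27.4", lambda d: isclose(mean(d), 27.4, abs_tol=0.05)),
    ("mode is 26", lambda d: multimode(d) == [26]),
    ("range is 10", lambda d: max(d) - min(d) == 10),
]

remaining = list(charts)
history = []
for label, check in checks:
    # Keep only the graphs that still match the newly revealed statistic.
    remaining = [name for name in remaining if check(expand(charts[name]))]
    history.append(remaining)
    print(f"After '{label}': {remaining}")
```

With these made-up tables, B and C drop out on the mean, nothing changes on the mode, and D drops out on the range, leaving only A.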

We can do the exact same thing for dot plots as their properties are very similar to bar charts.

So the mode is 42.

It most certainly cannot be the top one as the mode here is 74 and the dataset doesn't even include 42.

The mean is 44.

Both of these remaining two are still valid.

Next up, the range is 20.

The range of the left dot plot is 36, which is far too high.

The right dot plot is the only one that matches all three of these summary statistics.

Okay, quick check.

Which of the mean, median, mode and range is incorrect for this bar chart? Pause here to consider each of these four summary statistics.

Well done if you spotted that the correct answer was the median.

The median is the central value when a dataset is put into order of size.

A maximum value with a low frequency can never be the median.

The maximum value in a dataset can be the median only if it is also a modal value that accounts for at least half of the data points.

Okay, next up, which of these summary statistics is applicable to bar chart A, bar chart B, both or neither? Pause to match the bar charts to each summary statistic and put a tick in each correct box.

Well done if you've got these as the answers.

Last check for this cycle.

A summary statistic will appear on screen one at a time.

Write down the name of the dot plot that you eliminate after each statistic appears because the dot plot will no longer match with that statistic.

Three summary statistics will appear and I'll pause briefly between each one.

First up.

Mode of 15, and mean of 16.

Which one graph is still applicable because it is represented by all three of these summary statistics? Well done if you spotted that B was the correct answer.

Okay, onto the practice.

Put a tick in each row for each bar chart that can be represented by the given summary statistic.

Pause to do this.

Question number two.

Each summary statistic matches with the two bar charts.

For each summary statistic, write down the correct value and the names of the two classes that it represents.

Pause to give this question a go.

For question number three, place a tick by each letter if all of the summary statistics in the section above the four letters accurately represent the bar chart.

Pause to give this question a go.

Pause here to have a look at all the answers for question one.

And pause here to have a look at the answers for question number two.

And finally for question number three, pause here.

Amazing work if you've got any of those questions correct.

In our next cycle we will be looking at completing graphs when given summary statistics.

Laura and Lucas decide to carry out an investigation about the number of students who use the library after school.

Laura asks whether there'll be the same number from each year group, whereas Lucas hypothesises that the proportion of each year group will change throughout the year.

Laura makes a plan.

They will collect data on certain days throughout the school year, and compare which year groups appear and which year groups do not appear as frequently.

The data Laura collects for September has a range of nine, but the bar chart is incomplete.

Can she accurately complete the bar chart with the information that she has? Well, there are 40 students in total.

By adding the heights of each bar, she can calculate how many students are already on the bar chart and how many are missing.

In this case, three students are missing from the bar chart.

Laura also says that the range of nine means that the highest year group must be year 10.

Year 11 must have a frequency and bar height of zero.

We know as long as year 10 gets at least one of the three remaining students, the other two students can be distributed between years nine and 10 in any way.

So for example, all three could go into year 10 or two could go into year 10 and one in year nine, or one could go into year 10 and the other two in year nine.

There are multiple solutions, so she would need more information in order to find the single correct distribution of students.
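
A short sketch of why the answer is not unique. It assumes, as in the example above, that the only condition is that year 10 receives at least one of the three missing students and the rest go to year nine.

```python
total_students = 40
students_shown = 37  # 40 in total with three missing, as stated above

missing = total_students - students_shown

# Each solution is (extra students in year 9, extra students in year 10).
# Year 10 must receive at least one so that it stays the highest year group.
solutions = [(missing - year10, year10) for year10 in range(1, missing + 1)]

for year9, year10 in solutions:
    print(f"Year 9 gets {year9}, Year 10 gets {year10}")

print("Number of valid ways:", len(solutions))
```

Three different completions all satisfy the range, which is why more information is needed.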

In March, Lucas collects data on the 64 students in the library, but again, he loses some data.

He knows that the dataset is bimodal and more year sevens were in the library than year sixes.

Our approach to filling in the gaps in Lucas's data is the same as before with Laura, by adding the heights of each bar.

46 is the number of students he has already represented.

64 take away 46 equals 18.

18 students have not been represented yet on this bar chart.

The data are bimodal.

This means that either both year six and year seven have an equal frequency that is greater than 10, or either year six or year seven has a frequency of 10 and the other has a lower frequency.

These are the only two conditions that would make this dataset bimodal.

However, it also says that year seven must have a higher frequency than year six, meaning that year six and seven cannot be bimodal with each other.

Therefore, year seven must have a frequency of 10 to become bimodal with year 11.

Since there were 18 students missing and 10 were in year seven, the remaining eight will go with year six.

This is the only possibility, so this is the fully correct bar chart.
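
This deduction can be checked by brute force. In the sketch below the known bar heights are assumptions: the lesson only tells us that they total 46 and that year 11 is the tallest at 10, so the other values (and the year groups they belong to) are invented to fit.

```python
# Hypothetical known bars that total 46 with Year 11 tallest at 10.
known = {"Year 5": 9, "Year 8": 9, "Year 9": 9, "Year 10": 9, "Year 11": 10}

total_students = 64
missing = total_students - sum(known.values())  # 64 - 46 = 18

solutions = []
for year6 in range(missing + 1):
    year7 = missing - year6
    if year7 <= year6:
        continue  # more year sevens than year sixes is required
    freqs = {**known, "Year 6": year6, "Year 7": year7}
    tallest = max(freqs.values())
    modes = [year for year, f in freqs.items() if f == tallest]
    if len(modes) == 2:  # bimodal means exactly two tallest bars
        solutions.append((year6, year7))

print("Missing students:", missing)
print("Valid (Year 6, Year 7) pairs:", solutions)
```

Under these assumptions only one pair survives: eight in year six and 10 in year seven.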

Okay, onto some checks.

Laura collects data on the 70 students in the library, but again loses some of that data.

She knows that the range of the data is eight.

Which year groups have data not currently shown on the bar chart? Pause now to consider what the range represents on a bar chart.

So, the range of the bar chart is eight and the highest aged year group is 11.

So 11 take away eight is three.

Year three is the youngest year group that should be included.

So years three, four, five, and six are missing from this bar chart.

However, all or most of the missing results could be from year three, so it is possible that some of years four, five, and six are already correctly represented with a bar height of zero.

Next question.

She also knows that the data are bimodal with modes of year six and year 11.

How tall should the bar representing year six be? Pause here to consider what the mode represents on a bar chart.

If year 11 had a frequency of 16, then year six should also have a frequency of 16.

Next up, Lucas remembers how many pupils were in year three and year four, so adds that information to the bar chart.

By reading the frequencies of year three and year four, find out how tall the bar representing year five should be.

Pause now to perform some calculations.

Okay, the total height of all of the bars is 66.

We know that there were 70 students in total, so 70 take away 66 means a height of four for year five.

Well done if you got that right.

Okay, onto the next set of practice tasks.

Alex collects data on 58 people in his local town centre, but loses some of it.

He knows the data has a range of six days and the modal number is two days.

Pause now to consider this information, answer these questions and complete the bar chart.

Finn does something similar.

He asks 46 people, and knows that the data are bimodal and that more people visited seven times per week than two times per week.

Pause here to answer these questions and complete the bar chart.

Here are the answers for Alex.

He must leave zero days blank as the range is too small to account for that zero days column.

18 people are yet to be represented and so that 18 must be split up between days one and two.

One day must have a frequency of at most six, whilst two days must have a frequency of at least 12.

For Jun, 25 people are yet to be considered.

The information given leads to three and four days being bimodal at exactly 12 each.

If either had a frequency of 11, then two days would have a frequency of two, breaking the criterion that it had a lower frequency than seven days.

A skill that may be new to you is the idea of sketching, rather than accurately drawing, graphs from statistics to quickly gain an idea of the distribution of a dataset.

There are many reasons for this, including not knowing all of the values or frequencies, not being sure of the scale of the axes, only knowing some of the summary statistics of a dataset, or simply planning before representing the full results on an accurate graph later on.

A graph sketch does not need to look perfect, but it should still follow the rules of what a graph, such as a bar chart, should and shouldn't look like.

Using only the summary statistics of mode as 40 and range as seven and labelling the minimum and maximum values on the x axis, sketch two different bar charts.

Good practice still means each bar must be of approximately the same width and still have gaps between each bar, preferably of the same width too.

The heights of the bars that you draw do not need to be perfectly accurate, but should preferably be in proportion with each other.

And step one, draw a set of axes, then plot a bar and label it if you consider it to be an important value, such as the minimum, maximum or a summary statistic like the mode.

Draw the rest of the bars in a way that does not break any of the other summary statistics.

In this example, the other bars must be lower than the bar height of the 40 because the 40 is the mode.

Also, make sure that you have the correct number of bars by looking at the range.

So here's a second different bar chart that fits with these summary statistics.

Okay, but what makes these sketches and not fully accurate bar charts? Well, notice that the y axis is not fully numbered.

The x axis is only numbered with important values, such as the mode, the maximum, and the minimum values.

The bar heights may not be drawn perfectly to scale, but the tallest bar should represent the modal value for example.

However, if this bar was moved further away from the rest, the range may look larger than it is.

This is why it is important to keep the bar and the gap widths as equal sized as possible.

Sometimes sketching a bar chart can be a little long-winded.

Dot plots or vertical line graphs are quick ways of sketching if you want to get an idea of a distribution.

What will a dot plot look like for this data? The range is 20 take away nine, which is 11.

The data are bimodal at 12 and 18, so those two columns have the most dots, and the median is 14 as there are the same number of dots to the left and the right of 14.
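
Here is one possible dataset that fits those three statistics, sketched as a text dot plot. The individual frequencies are assumptions; many other datasets would fit just as well.

```python
from statistics import median, multimode

# Hypothetical frequencies (value -> number of dots) fitting the statistics above.
freq = {9: 1, 12: 3, 13: 1, 14: 1, 16: 1, 18: 3, 20: 1}
data = [value for value, count in sorted(freq.items()) for _ in range(count)]

def dot_plot(freq):
    """Return a text dot plot with one column per whole-number value."""
    lowest, highest = min(freq), max(freq)
    tallest = max(freq.values())
    rows = []
    for level in range(tallest, 0, -1):
        # Draw a dot wherever the column is at least this tall.
        rows.append(" ".join(" o" if freq.get(v, 0) >= level else "  "
                             for v in range(lowest, highest + 1)))
    rows.append(" ".join(f"{v:2d}" for v in range(lowest, highest + 1)))
    return "\n".join(rows)

print(dot_plot(freq))
print("range:", max(data) - min(data))
print("modes:", multimode(data))
print("median:", median(data))
```

Moving the single dots around, while keeping five dots on each side of 14, gives a different-looking plot with the same range, modes and median.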

The same application works for a line graph like this, but can be even quicker as you can just draw a line rather than individual dots.

Both the dot plot and the vertical line graph show the above information but are still very different in their distribution.

Okay, let's do a demonstration.

I will sketch a vertical line graph with the information on the left.

After each step, follow my example for the information on the right.

Step one, draw your axes.

The height can be anything suitable, but the length should be a minimum of double the range of your dataset plus two extra squares.

In my example, the range is 10, so double the 10 to get 20, then plus an extra two to get 22.

This is always a sensible minimum for your x axis.
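
That rule of thumb is simple enough to write as a small helper. The function name and parameter names here are made up for illustration.

```python
def min_x_axis_squares(data_range, squares_per_value=2, extra_squares=2):
    """Suggested minimum x-axis length: double the range plus two extra squares."""
    return squares_per_value * data_range + extra_squares

# The demonstration uses a range of 10.
print(min_x_axis_squares(10))
```

For a range of 10 this gives 22 squares, matching the calculation above.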

Step number two, plan the location of key values on the x axis.

For example, consider where the modal value will be as well as the minimum and maximum values.

I choose my minimum value to be eight.

Because the range is 10, my maximum value is going to be 18.

Each two squares will be one vertical line, so count up two squares 10 times and at the end location, label it 18.

Along the way, also mark accurately in intervals of two, any other important values, such as the mode of 15, and the other important point of 12.

After you've done the x axis, step three is to draw the vertical lines or dots or bars, whatever type of graph that you're going to draw.

First of all, draw on the modal value, nearly as high as your y axis, just to make sure that no other bar or line goes any higher than it.

You don't want that because this is the mode.

On a sketch, it is optional to label anything on the y axis, but I want to do this for this particular example because it mentions that the data point of 12 has to have half the frequency of the mode, 15.

Therefore, if the bar or the line has a height of eight for the mode, then I know that the line has to have a height of four for this data point, 12.

Step four is to complete the rest of the graph, making sure that the minimum and maximum values are correct and no other line or bar goes higher than the modal one.
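
Before looking at an example, here is a hedged sketch of how a finished graph could be checked against the information from the demonstration: a mode of 15, a range of 10, and 12 having half the frequency of the mode. The line heights are invented, and `check_sketch` is a hypothetical helper, not part of any library.

```python
from statistics import multimode

# Hypothetical line heights for one possible sketch (value -> frequency).
sketch = {8: 2, 10: 3, 12: 4, 13: 5, 15: 8, 16: 6, 17: 3, 18: 1}

def check_sketch(freq, mode, data_range, half_value):
    """Check a sketch against the mode, the range and the 'half the mode' condition."""
    present = [value for value, height in freq.items() if height > 0]
    data = [value for value, height in freq.items() for _ in range(height)]
    return {
        "range": max(present) - min(present) == data_range,
        "mode": multimode(data) == [mode],
        "half": freq.get(half_value, 0) * 2 == freq[mode],
    }

results = check_sketch(sketch, mode=15, data_range=10, half_value=12)
print(results)
```

All three checks pass for this sketch; raising any other line to a height of eight or more would break the mode check.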

And here's an example of what yours might look like.

Okay, onto some practice on sketching graphs.

For part A, write down a calculation mentioned in the demonstration previously to show why the x axis is advised to be at least 16 squares across for this particular set of summary statistics.

After that, complete Jacob's vertical line graph, then draw your own that still satisfies these summary statistics.

Pause now to do that.

For question number two, sketch two different dot plots with one being bimodal, where the median is 14.

Pause now to do that, and remember there is no one correct answer because many graphs can have a median of 14.

Onto question number three.

Sketch a bar chart using this information and using the axis I have provided for scale.

Pause now to give that a go.

Finally, sketch a dot plot of this information.

You must draw this from scratch, including drawing a set of appropriate axes.

Pause now to give that a go.

Amazing effort.

For question one, you take the range of seven, double it to get 14, then add on an extra two.

That is why 16 is a suitable distance for the x axis.

On Jacob's drawing, the line at 80 must be the highest, but it can take on any height higher than the others.

Well done if you drew your own vertical line graph that has a range of seven with the longest line at 80.

And for question number two, here are some examples of what your dot plots may have looked like.

For question number three, here's an example of what your bar chart may have looked like.

As long as it starts on two, ends at seven and has the tallest bars at four and seven, it will be an accurate graph.

Here's an example of question four.

Zero should be on its own, with not many other data points close to it.

The most dots should be at 10, the mode, with more dots less than 10 than greater than 10 to help make that mean slightly lower than the mode.

Amazing work on a new and challenging skill.

You have persevered on a lesson where we have compared statistical summaries to graphs and completed incomplete bar charts using statistical information.

We have sketched graphs from scratch using this statistical information in order to get a quick understanding of the distribution of a dataset.

That is all for today's lesson.

I hope to see you soon for some more maths fun.

But for now, have a nice day.
