
Hi everyone, I'm Mr. Gratton.

Welcome to another maths lesson where I'll be your teacher today as we use different measures of central tendency and spread to compare at least two datasets.

A statistical summary is a set of statistics that sum up the properties and features of a dataset.

It may contain the mean, median, mode, and range.
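As a rough sketch of what a statistical summary contains, the Python snippet below computes all four statistics using the standard-library `statistics` module. The scores are hypothetical, made up for illustration, and the function name `statistical_summary` is our own rather than anything from the lesson.

```python
from statistics import mean, median, multimode

def statistical_summary(data):
    """Return the mean, median, mode(s) and range of a list of numbers."""
    return {
        "mean": mean(data),
        "median": median(data),
        "mode": multimode(data),  # a list, because a dataset can have more than one mode
        "range": max(data) - min(data),
    }

# Hypothetical test scores, not taken from the lesson
scores = [62, 75, 58, 75, 80, 70]
print(statistical_summary(scores))  # mean 70, median 72.5, mode [75], range 22
```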

If I refer to a summary statistic, that is one of the measures that we will use in a statistical summary.

Pause here to take a look at the definitions of central tendency and spread, or dispersion.

The first cycle will involve using one summary statistic to start comparing between different datasets.

And that's actually the true purpose of a summary statistic.

Using the mean or range for example, gives some insight when analysing one dataset, but it is much more insightful when comparing two datasets or two different populations.

Here are the test scores for Aisha and Sam.

Let's look at each summary statistic one at a time.

Starting with the mean.

Sam's mean is higher than Aisha's, therefore Sam's mean average score or typical score is higher.

We can say that on mean average, Sam scored five marks higher per test than Aisha did.

On the other hand, both of their medians are similar.

Therefore, both of their most central or middle scoring tests were of a similar value.

The median is very good at summarising these middle tests rather than focusing on the tests they did very well or very badly on.
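To see why the median focuses on the middle tests, here is a small Python sketch with two hypothetical sets of five test scores that differ only in one very bad test. The mean drops, but the median stays the same.

```python
from statistics import mean, median

# Hypothetical scores: identical except that the second list has one very bad test
steady = [60, 65, 70, 75, 80]
one_bad_test = [10, 65, 70, 75, 80]

print(mean(steady), median(steady))              # mean 70, median 70
print(mean(one_bad_test), median(one_bad_test))  # mean 60, median 70
```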

But Aisha's modal score was much higher.

This means that Aisha's most common scores were higher than Sam's most common scores.

This is easy to spot 'cause Aisha's score of 86 appears twice whilst Sam's score of 73 appears twice.

And lastly, Aisha's range is much greater than Sam's range.

The range is a different measure from the other three we've looked at.

It measures how spread out or varied a dataset is, whereas the mean, median and mode look at central tendency.

The bigger the range, the more varied the dataset.

Alternatively, the smaller the range, the more consistent the dataset.
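The range is simply the largest value minus the smallest value. Here is a minimal Python sketch using hypothetical scores (not Aisha's and Sam's actual tests) and a helper name, `data_range`, of our own.

```python
def data_range(data):
    """Range = largest value minus smallest value."""
    return max(data) - min(data)

# Hypothetical scores for two students
varied = [40, 95, 60, 86, 86]
consistent = [70, 73, 73, 76, 78]

print(data_range(varied))      # 55: a bigger range, so a more varied set of scores
print(data_range(consistent))  # 8: a smaller range, so a more consistent set of scores
```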

Therefore, Aisha had a bigger variety of scores whilst Sam had a more consistent set of scores.

Here is a summary of the four summary statistics that we have just discussed.

Okay, here are three quick checks for understanding. Which of these statements are true for the mode of Sophia's and Andeep's statistical summaries? Pause to look through all of these options.

The answer is B.

The mode looks at the most frequent values, and Sophia's and Andeep's modes were very similar, at 77 and 76.

Similar again, but this time for the mean.

Pause to see which of these three statements is true using their statistical summaries.

The answer is C.

The mean describes the mean average, or typical, value.

In this case, Sophia's typical score is nine higher than Andeep's typical score.

If you add up all of Sophia's scores and share the total equally between her tests, each share would be nine higher than if Andeep did the same with his set of scores.
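The "add up and share equally" idea can be written out directly. This Python sketch uses hypothetical scores for four tests each, chosen so that the means differ by nine; it is not the data from the checks.

```python
def mean_by_sharing(scores):
    """Add up all the scores, then share the total equally between the tests."""
    total = sum(scores)
    return total / len(scores)

# Hypothetical scores, four tests each (not the lesson's data)
sophia = [80, 85, 90, 77]
andeep = [70, 76, 81, 69]

print(mean_by_sharing(sophia))  # 83.0
print(mean_by_sharing(andeep))  # 74.0
# Each equal share is 9 marks higher for Sophia, so over 4 tests her total is 4 * 9 = 36 higher
print(sum(sophia) - sum(andeep))  # 36
```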

Lastly, the range.

Pause now to choose the most representative statement from the statistical summaries.

The answer is A.

The range describes variety.

And Andeep had a much greater variety in his scores than Sophia did.

Lucas used dot plots to compare his test scores between year seven and year eight.

Lucas's mean score increased from 63 to 70.

This means his typical score increased over the two years.

On average, he scored seven more marks per test in year eight than in year seven.

The median looks at his middle-scoring test rather than the tests he did best or worst on.

Looking at his middle scores, you can see that he improved by 10 marks between the two years.

For the mode, we can look at the tallest column of dots.

In year seven, there was one such column at 75, whilst in year eight, there were two, at 55 and 75.

This means Lucas's data is now bimodal.

Lucas's most frequent score of 75 hasn't changed, but now he also has a second equally common score of 55.
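Python's `statistics.multimode` returns every value that is joint most common, which makes a bimodal dataset easy to spot. The data below is hypothetical and only mimics the shape described for Lucas's dot plots.

```python
from statistics import multimode

# Hypothetical dot-plot data (not Lucas's actual scores)
year_seven = [55, 60, 75, 75, 75, 80]
year_eight = [55, 55, 55, 70, 75, 75, 75]

print(multimode(year_seven))  # [75]: one tallest column
print(multimode(year_eight))  # [55, 75]: two equally tall columns, so bimodal
```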

And finally, the range.

Lucas's range has decreased from year seven to year eight.

This is shown by a smaller distance between the lowest value dot and the highest value dot on each dot plot.

This means Lucas has become more consistent over the two years, which is especially impressive since his mean score has also increased.

Lucas's scores therefore seem to have become better and more consistent across the two years.

Onto the next check for understanding.

Which of these statements is accurate for the range of these two dot plots? Pause to evaluate these statements.

The answer is A.

This is because there is more variation in class A: the distance between the smallest-value dot and the largest-value dot on its dot plot is bigger.

Onto the mode.

Pause here to evaluate which statement or statements about the mode are correct.

Well done if you spotted that there are two correct answers, A and B.

For A, modal frequency means the number of dots, or data points, at the modal value.

The modal score of class B was 55 compared to 49 in class A, but they both had a modal frequency of three, because there were three dots or data points representing that value.
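Modal frequency is just a count. Here is a sketch using `collections.Counter` on hypothetical class scores, chosen so that each class has a modal frequency of three as described; the helper `mode_and_frequency` is our own.

```python
from collections import Counter

def mode_and_frequency(data):
    """Return the modal value(s) and how many times they appear (the modal frequency)."""
    counts = Counter(data)
    modal_frequency = max(counts.values())
    modes = [value for value, count in counts.items() if count == modal_frequency]
    return modes, modal_frequency

# Hypothetical class scores (not the lesson's dot plots)
class_a = [40, 49, 49, 49, 52, 60]
class_b = [50, 55, 55, 55, 58]

print(mode_and_frequency(class_a))  # ([49], 3)
print(mode_and_frequency(class_b))  # ([55], 3): different modes, same modal frequency
```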

Right, time for some practice.

For question one, which statement is correct for the median and which one is correct for the range? The other three statements are not correct for these two summary statistics.

Pause now to read through all five options.

Here are Lucas's scores in English over two years.

Calculate summary statistics for these two datasets and interpret them.

Pause now to calculate the mean.

This time, we have Lucas's science practical scores.

Using only the mode, write in as much detail as you can, whether Lucas has improved or not.

Pause to compare his results.

For question number four, Alex, Jun, Izzy, and Jacob all receive their end-of-term test scores.

Each student wants to pick only one summary statistic that best summarises and represents their scores to impress their parents.

Each summary statistic can only be used once.

For example, the mean can only be used by one student.

Calculate the mean, median, mode and range for each of the students, then pick the summary statistic that you think each of them should use to represent their scores.

Pause here and good luck matching the summary statistic to the students.

Here are the answers.

For question one, A was the median whilst D was the range.

For question number two, Lucas's median or typical English score increased by three marks across the two years.

For question number three, Lucas's most frequent score increased from 10 to both 14 and 17.

This suggests he has greatly improved his science scores.

And here are the statistical summaries for all four of the students in question four.

The rightmost column shows the summary statistic that each student should have chosen.

Pause here to match all of the information on screen to all of the information that you have calculated.

Using one summary statistic to compare two datasets is good, but using multiple summary statistics will help build a more well-rounded comparison of the datasets, especially when the distribution of each one is very different from the other.

For example, these dot plots show the number of goals scored per match by two football players.

The mean number of goals for both Mary and Frank is three per match, but the two dot plots look dramatically different.

So how can the mean number of goals be the same? By using a second summary statistic, in this case the range, we can look at their scores in more detail.

Mary had a lot of variation in the goals that she scored.

Her range was seven but Frank's range was only three, meaning he more consistently scored around three goals, whilst Mary sometimes scored a lot of goals but at other times didn't score many at all.

By using a third summary statistic, we can help support this claim.

Mary's modal score was zero, whilst Frank's data was bimodal, with modes of three and four.

Using more summary statistics helps make a better comparison between two datasets.
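Mary's and Frank's match-by-match goals are not listed in the transcript, so the Python sketch below uses hypothetical goals chosen to reproduce the statistics described: a mean of three for both players, ranges of seven and three, and a mode of zero versus modes of three and four.

```python
from statistics import mean, multimode

# Hypothetical goals per match over 8 matches, chosen to match the described statistics
mary = [0, 0, 0, 7, 7, 4, 3, 3]
frank = [1, 3, 3, 4, 4, 2, 4, 3]

for name, goals in [("Mary", mary), ("Frank", frank)]:
    # Same mean for both players, but very different range and mode
    print(name, mean(goals), max(goals) - min(goals), multimode(goals))
```

Looking only at the first number printed for each player would hide how different the two players are; the range and mode reveal it.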

Comparing the datasets using only the mean gives a little insight into the typical number of goals scored, but without any context of how varied each match can be.

Comparing the ranges of the two players as well as their means gives us more detail, this time on the variation between them.

Using a further measure of central tendency, such as the mode, helps us justify the assessment we made with the mean and the range even further.

Onto some checks. Two footballers, Laura and David, played in 10 matches.

The range of each player's goals was five.

Which of these statements is correct? Pause to consider which statements are definitely true with this limited information.

The answers are B and D.

The range shows variation.

An equal range means an equal amount of variation.

However, the range gives no insight into the mean or any of the other averages.

And it certainly does not tell you anything about the number of goals scored or the distribution of their dot plots.
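A quick way to convince yourself that an equal range says nothing about the mean: here are two hypothetical players over 10 matches, both with a range of five goals. These are not Laura's and David's real results.

```python
def mean_and_range(goals):
    """Return (mean, range) for a list of goals per match."""
    mean_goals = sum(goals) / len(goals)
    spread = max(goals) - min(goals)
    return mean_goals, spread

# Two hypothetical players over 10 matches
player_one = [0, 1, 1, 2, 2, 2, 3, 3, 4, 5]
player_two = [1, 2, 3, 3, 3, 4, 4, 5, 5, 6]

print(mean_and_range(player_one))  # (2.3, 5)
print(mean_and_range(player_two))  # (3.6, 5): same range, different mean
```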

Next check.

Sticking with David and Laura, which of these statements are correct interpretations of their dot plots? Pause to compare the dot plots and these statements.

The answer is B.

The only correct statement about the mode is David's modal score is three compared to Laura's modal score of two.

By considering each dot on each dot plot as a data point, calculate the mean for each football player and then select the correct statement about the mean.

Pause to give yourself time to do this.

And the answer is A.

Laura's typical score is half a goal higher than David's.

This means that, on average, Laura scored half a goal more per match than David.
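Each dot on a dot plot is one data point, so the mean is the total of each value multiplied by its number of dots, divided by the total number of dots. Here is a sketch assuming the dot plot is stored as a hypothetical mapping from each value to its dot count (the data is made up, not Laura's or David's).

```python
def mean_from_dot_plot(dot_counts):
    """dot_counts maps each value on the axis to the number of dots above it."""
    total = sum(value * dots for value, dots in dot_counts.items())
    number_of_dots = sum(dot_counts.values())
    return total / number_of_dots

# Hypothetical dot plot: goals per match over 10 matches
example = {0: 1, 1: 2, 2: 3, 3: 2, 4: 2}
print(mean_from_dot_plot(example))  # 2.2 goals per match
```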

Lastly, for these checks, here is a statistical summary of Laura and David's goals.

Which of these conclusions are sensible? Pause to look through the statistical summary table and choose any of these sensible conclusions.

And the answer is all of them.

Even if we use many summary statistics to compare two datasets, the comparisons we make are ultimately open to different interpretations by different people.

This is especially true if the results are reasonably close like with David and Laura's scores.

It is a very important skill to be able to communicate the comparisons you make with the summary statistics that you have calculated.

In which of these two businesses did the employees work a greater number of hours per week? In your comparison, it is best to always use at least one measure of central tendency (the mean, the median or the mode) and one measure of spread, in this case the range.

Here is a model explanation communicating all of the summary statistics that you could find for these two datasets.

Whilst the typical number of hours worked by people in Data Incorporated is higher, with a mean of 33.9 compared to 32.3, Stats & Co. had less variation in the number of hours worked, with a range of six compared to 17.

In conclusion, I think that employees at Data Incorporated work a greater number of hours per week, especially since their modal number of hours worked is also higher, at 37 compared to 34.

This is a model explanation for comparing these two datasets because it has a measure of central tendency (the mean, in this case), a measure of dispersion (the range), and, in the conclusion, a third statistic (in this case the mode) to support the interpretation already made.
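The structure of the model explanation (central tendency, then spread, then a supporting statistic) can be sketched as a small Python function. The figures are the ones quoted in the model explanation for Data Incorporated and Stats & Co.; the function name, the dictionary layout and the fixed sentence templates are our own assumptions, the conclusion wording only suits this hours-worked context, and the sketch ignores ties.

```python
def model_comparison(name_a, a, name_b, b):
    """Central tendency first, then spread, then the mode as a supporting statistic.
    Assumes the two datasets do not tie on any statistic."""
    higher_mean = name_a if a["mean"] > b["mean"] else name_b
    less_varied = name_a if a["range"] < b["range"] else name_b
    higher_mode = name_a if a["mode"] > b["mode"] else name_b
    lines = [
        f"Mean: {higher_mean} is higher ({name_a} {a['mean']}, {name_b} {b['mean']}).",
        f"Range: {less_varied} is less varied ({name_a} {a['range']}, {name_b} {b['range']}).",
        f"Mode: {higher_mode} is higher ({name_a} {a['mode']}, {name_b} {b['mode']}).",
    ]
    if higher_mean == higher_mode:
        lines.append(f"Conclusion: {higher_mean} works more hours, supported by the mode.")
    else:
        lines.append("Conclusion: the mean and mode disagree, so the comparison is less clear.")
    return "\n".join(lines)

data_inc = {"mean": 33.9, "range": 17, "mode": 37}
stats_co = {"mean": 32.3, "range": 6, "mode": 34}
print(model_comparison("Data Incorporated", data_inc, "Stats & Co.", stats_co))
```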

Final few checks.

This table of statistical summaries shows the number of hours worked by the employees of two different businesses.

By selecting three of these statements, create a well-structured comparison of these two businesses.

Pause here to give yourself time to read through these six statements and consider which combination makes the most sense for these two statistical summaries.

The correct combination of statements is A, C, and D.

However, the conclusion would be even more effective if a further statistic, the mode, were used in addition to the other two.

Onto the final set of practice questions. Select the correct combination of three statements that compare the results of Ella and Wayne.

Pause now to give yourself some time to look through all six of these options and choose the most representative three.

For question number two, look at the statistical summary of these two locations and complete each sentence using either the word Oxford or Stornoway, or the correct number or calculation from the summary table above.

Pause now to fill in all of those gaps.

And similar again for question three.

Complete the sentences for parts A, B, C, and D using both these dot plots.

I'll pause four times, once for each part of the question as it appears on screen.

Pause now to answer question part A.

And pause again for question part B.

And again for part C, pause now.

And here is part D.

Pause for this last part of the question.

By using the raw data for both Tiree and Valley, calculate the mean, median, and mode in order to complete the statistical summary table.

Afterwards, interpret each pair of summary statistics by making a comparison between Tiree and Valley.

Pause now to give yourself some time to calculate all of these summary statistics and make those interpretations.

For question number five, now's your chance to create a high-quality comparison of locations A and B from scratch by interpreting both of the dot plots.

And by using at least two summary statistics, write a few sentences comparing which location had the higher monthly temperature.

Pause now to articulate and write your comparisons.

Here are the answers.

A, D, and E are the correct statements and conclusions.

For question number two, Stornoway had the lower mean days of air frost.

Oxford had the higher range.

And in conclusion, Oxford had more air frosts, supported by a higher median.

Question three A: the means were roughly equal, at around 12.5 to 13 minutes each.

Part B, class B had the much higher median of 13 compared to nine.

Class B also had a slightly higher modal time.

For part C, class A had a much higher range, 31 minutes compared to four minutes, showing more variation.

And in conclusion, class B took longer on average due to the higher modal and median results.

Here are the summary statistics and interpretations for Tiree and Valley.

Pause here to give yourself some time to compare your interpretations to the ones on screen.

And finally, here is a sample response for question five.

Pause to compare your responses for each of the following.

The mean and the median, pause now.

And pause now for the mode and range.

Pause now for the conclusions.

Very well done on getting through a very analytical and communication-heavy lesson based on calculating statistical summaries and interpreting what they mean when comparing two or more datasets, where using more summary statistics helps build a better, but never complete, picture of those datasets.

Thank you so much for joining me in today's lesson.

I hope to see you again for another maths lesson, but for now, have a nice day.