Hello everyone and welcome to this exciting lesson on numerical summaries with me, Mr. Gratton.

Thank you for joining me in a lesson where we will be comparing different data sets shown in a variety of different diagrams and representations using measures of central tendency and spread.

Pause here to check the definitions of some of these representations.

First up, let's compare different representations of the same data set.

It is sometimes possible to compare equivalent information between two different representations of a data set or we can use one representation to complete a different incomplete representation of the same data.

Comparisons may be possible between frequency tables, some of which may be grouped, bar charts, pie charts, and stem and leaf diagrams.

It is a little trickier to show a data set from one of these representations on a scatter graph as well, so we will not be covering scatter graphs in this lesson.

Pause now to spot the two pairs of representations that represent the same data set.

Can you spot similarities between the two matching representations that allow you to put them into pairs? This pie chart and bar chart may show the same data.

Both of these show the frequencies of four different categories: engine, chassis, aero, and wheels.

And the E for engine is the mode for both charts because the sector of the pie chart and the bar on the bar chart are the largest.

We can spot these similarities.

However, we cannot say for certain that both these graphs represent the exact same data set unless more information is provided or we are told explicitly that they do represent the same data set.

In comparison, this grouped frequency table and stem and leaf diagram may also both show the same data.

This is because both of these show numerical data split into six groups.

And importantly the number of data points in each group match between them both.

But again, it isn't possible to say for certain that they both represent the same data set.

This is especially true because we do not know the exact values each row of the frequency table represents.

For example, in that top row of 40 to 50, we do not know if these two values are 49 or not.

And furthermore, because the stem and leaf diagram does not have a key, we don't actually know if 4, 9 represents 49 or something else such as, for example, 4.9.

Okay for this check, both of these charts do represent the same data.

Pause here to answer these two questions.

And the answer is more than 25% of students walked to school.

This was more easily answered using the pie chart as the sector for walk was greater than a right angle, greater than 90 degrees and therefore was greater than 25% of the whole pie chart.

And pause again here for this next check about two graphs that show the same data.

The modal class is 1 to 1.9 centimetres and this can be found fairly easily from both the pie chart and the stem and leaf diagram.

Ah, but this question is ever so slightly different, not the modal class, this time an exact mode.

Pause here to see if you can identify the mode from the data and with which diagram was it possible.

The mode of 1.5 centimetres is only possible from the stem and leaf diagram as it is the only diagram that shows the raw data out of the two.

The pie chart only shows the groups of data.

If you know for certain that two representations show the same data set, then it is possible to use one representation to complete another.

The bar chart is incomplete but represents the same data as this completed pie chart.

We can use a protractor to measure the size of the angles on the pie chart.

This will help us analyse relationships between different subgroups of data.

Using the pie chart shows us that the sector for tea is 150 degrees and here are the other angles.

Notice how coffee is half the angle of water.

This means the coffee on the bar chart will be half the height of water at 3 rather than 6.

The same applies for juice being half the height of tea at 5 compared to 10.
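
As a rough sketch of this proportional reasoning, the short Python example below scales bar heights to pie chart angles using the one pairing stated in the lesson: tea at 150 degrees with a bar height of 10. The function name `angle_for_height` is made up for illustration, and the angles for juice, water, and coffee are calculated here from the bar heights rather than read from the lesson's pie chart.

```python
# Known pairing from the lesson: tea has a 150-degree sector and a bar height of 10.
TEA_ANGLE = 150
TEA_HEIGHT = 10


def angle_for_height(height, known_angle=TEA_ANGLE, known_height=TEA_HEIGHT):
    """Scale a bar height to a pie chart angle using one known (angle, height) pair."""
    degrees_per_unit = known_angle / known_height  # 15 degrees per unit of frequency
    return height * degrees_per_unit


# Bar heights mentioned in the lesson.
heights = {"tea": 10, "juice": 5, "water": 6, "coffee": 3}

# Angles computed from the heights (an illustration, not measured values).
angles = {drink: angle_for_height(h) for drink, h in heights.items()}

for drink, angle in angles.items():
    print(f"{drink}: height {heights[drink]}, angle {angle} degrees")
```

Because the scaling is the same for every bar, halving an angle always halves the height, which is why coffee comes out at half of water and juice at half of tea.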

For this check, using the information given, how many squares tall should the bar on the bar chart be to represent milkshake? Pause now to have a look at both of these diagrams and consider your answer.

Milkshake has an angle double that of fizzy.

Double of 6 squares is 12 squares.

For this more challenging check, we have a stem and leaf diagram that correctly shows the number of people aged between 20 and 29.

There are nine data points in that row on the stem and leaf diagram.

Using the angles given on the pie chart, how many data points are missing from the row of the stem and leaf diagram for 30 to 39-year-olds.

Pause now to do this question.

On this pie chart, the angle representing 20 to 29 is three times the size of the angle representing 30 to 39.

Therefore there is 1/3 the number of data points in the group for 30 to 39 compared to the group 20 to 29.

If there are nine data points in the 20 to 29 group, then there should be 1/3 of nine, which is three data points in the 30 to 39 row.

One data point is already given, so there are two data points that are missing.
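
The same reasoning can be sketched in a few lines of Python. The function name `missing_points` and its parameter names are invented for this illustration; the numbers are the ones from the check above.

```python
def missing_points(reference_count, angle_ratio, already_shown):
    """Return how many data points are missing from a stem and leaf row.

    reference_count: data points in a row that is known to be complete
    angle_ratio: how many times larger the reference sector is than the target sector
    already_shown: data points already written in the incomplete row
    """
    expected = reference_count / angle_ratio  # the target row should hold this many
    return expected - already_shown


# 20 to 29 has nine data points and a sector three times the size of 30 to 39;
# one data point is already shown in the 30 to 39 row.
missing = missing_points(reference_count=9, angle_ratio=3, already_shown=1)
print(missing)
```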

Using the information in the pie chart again, how many data points are missing from the row of the stem and leaf diagram for 40 to 49-year-olds? Pause now to consider this.

In the pie chart, the angle for 40 to 49-year-olds is also 72 degrees.

Therefore, there are also three data points missing from the 40 to 49-year-old row.

And finally pause here to think about or discuss using only the pie chart, is it possible to fully complete the stem and leaf diagram? And the answer is no.

Whilst we can figure out the number of data points in each group, we cannot figure out the exact values of these data from only the pie chart.

Great thinking and analysis so far.

Onto the practise questions.

For question one.

There are five questions to do with this stem and leaf diagram and pie chart that both show the same data.

Pause here to write down the answers to the first three of these questions in the correct column that shows which representation you used to answer that question.

And pause again here for parts d and e.

For question two, both the bar chart and the stem and leaf diagram show the same data, but both are incomplete.

The 60 to 69 bar of the bar chart is correct, but some of the data in the 60s group of the stem and leaf diagram is missing.

No other row of data on the stem and leaf diagram is incorrect.

Knowing that the modal value of this data set is 69, pause now to complete both diagrams. Okay, for the answers to question one, the modal class is 1,000 to 1,900.

This can be found from both the stem and leaf diagram and the pie chart.

The modal amount is 900.

This can only be found from the stem and leaf diagram.

The group of earnings that make up 25% of the data is 0 to 900.

Whilst you can find this out from the stem and leaf diagram, it is much easier using the pie chart because the 0 to 900 sector is exactly 90 degrees.

Exactly one more person earned between 2,000 and 2,900 pounds per month compared to 3,000 to 3,900 pounds per month.

We could figure this out using only the stem and leaf diagram.

And lastly, the range is 3,000 pounds and can be found by subtracting the smallest data point from the largest data point on the stem and leaf diagram.

Okay, for question two, pause here to check the heights of your bars with the ones on screen.

And furthermore for the stem and leaf diagram, the modal value of 69 needs a total frequency of 4, as the value 42 already appears three times.

Therefore a leaf of 9 needs to appear three times on the 60s row of the stem and leaf diagram.

We've already looked a little bit at the mode and the range, but what other summary statistics can we find and from which representations?

How simple it is to find summary statistics such as the mean, median, mode, and range will vary depending on the representation we have to find them from.

So each of these representations shows the number of goals per match the Oakfield Academy were able to score in a season.

And from these representations the mode and the range can be found.

The mode is one goal per match as the tallest bar, largest sector, and highest frequency all show a modal value of 1.

Furthermore, the range is three goals per match, found by subtracting the smallest value from the largest value, which can be read from all three representations.

However, finding the median from a pie chart is challenging, and finding the mean from a pie chart is absolutely impossible, at least without knowing further information such as the total frequency or the frequency of one data value.

We can find the median from both the bar chart and the frequency table by considering running totals using the frequency of each datum.

From the frequency table, we can create a third column that shows the running totals like so.

However, from the bar chart, it'll help to first of all write down the frequency of each bar like this.

The running totals can then be calculated left to right like this.

Notice how these are the same running totals as in the frequency table since both show the same data.

The median can be found by taking the maximum running total and adding one then dividing by two to give the 13th data point in the data set when ordered.

We identify the first running total that reaches or exceeds this position to give a median of one goal per match.
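
Here is a minimal Python sketch of the running-total method. The frequencies below are hypothetical: the transcript does not list them, so they have been chosen only to agree with the totals quoted in the lesson (25 matches, a mode of 1, and a median of 1). The variable names are also just for illustration.

```python
from itertools import accumulate

# Hypothetical frequency table (not taken from the lesson's diagrams).
goals = [0, 1, 2, 3]
frequencies = [6, 9, 6, 4]

# Running totals, built left to right like the third column of the table.
running_totals = list(accumulate(frequencies))  # [6, 15, 21, 25]

# Position of the median: (total frequency + 1) / 2.
median_position = (running_totals[-1] + 1) / 2  # 13.0

# The median is the first value whose running total reaches that position.
# Note: for an even total the position ends in .5, and if the two middle
# data points fall in different rows you would average those two values.
median = next(
    value
    for value, total in zip(goals, running_totals)
    if total >= median_position
)

print(running_totals, median_position, median)
```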

Pause here to think about or discuss which representation do you think was easier to calculate the median.

Next up, we can also find the mean from both the bar chart and the frequency table.

We can do this by considering the product of the number of goals and its frequency.

For the frequency table, it is as straightforward as multiplying each datum by its frequency in that third column.

However, for the bar chart, we can find the products by multiplying the value of each bar, found on the x-axis, by the height of the bar, the value on the y-axis.

The sum of the products is 33.

This can be found in a similar way from both representations.

The mean is therefore 33/25, where 25 is the total frequency or the total heights of all the bars in the bar chart, giving you 1.32 goals per match.
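
The product method could be sketched in Python as follows. The frequencies are hypothetical, chosen only so that the sum of the products is 33 and the total frequency is 25 as stated in the lesson; the actual table from the video is not reproduced here.

```python
# Hypothetical frequency table (assumed values that match the lesson's totals).
goals = [0, 1, 2, 3]
frequencies = [6, 9, 6, 4]

# Product column: each number of goals multiplied by its frequency.
products = [value * freq for value, freq in zip(goals, frequencies)]

total_products = sum(products)      # sum of the product column
total_frequency = sum(frequencies)  # total number of matches

mean = total_products / total_frequency
print(products, total_products, total_frequency, round(mean, 2))
```

The same calculation works from the bar chart: the x-axis values play the role of `goals` and the bar heights play the role of `frequencies`.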

Pause here to think about or discuss which representation do you find easier to calculate the mean? For this check, pause here to calculate the mean of this data set represented as a frequency table.

To calculate the mean, you need a product column with these values and then the total of the products and the total frequency are 118 and 64.

118 divided by 64 is approximately 1.84, the mean number of days of snow in a month.

And for this check, make a decision, calculate the median for this data set using either the table or the bar chart.

Pause now to do this.

We can first of all consider the running totals from either representation.

The median is at the 32.5th data point and is therefore two days.

It is also possible to calculate summary statistics from representations that show grouped data such as this grouped frequency table.

Pause here to think about or discuss possible advantages and disadvantages that you might have come across before when calculating summary statistics from grouped data.

Both representations can show the modal class.

So in this case we've got a modal class of 1.1 to 1.2 in the stem and leaf diagram, but 1.2 to 1.25 in the grouped frequency table.

Notice how both modal classes are different.

This is because each representation groups the data into intervals in different ways.

On the other hand, only the stem and leaf diagram can show an exact modal value, in this case of 1.21 kilogrammes.

This is because a stem and leaf diagram also shows raw data whilst the grouped frequency table only ever shows grouped data.

And lastly, an exact mean can be found from a stem and leaf diagram, in comparison to a grouped frequency table for which only an estimate for the mean can be found.

To find the exact mean from a stem and leaf diagram, we have to go through a really long-winded process of finding the sum of all of the data points, in this example 35.35.

The mean is therefore 35.35 divided by the 30 leaves, the 30 data points in this tree, giving 1.178 as the exact mean.

Whilst still a lot of work, finding the estimate of a mean from a grouped frequency table is slightly more efficient and can be done by considering two extra columns of data, the midpoint column, and the product column for the product of the frequencies and the midpoint.

We then need to find the sum of all the products and the sum of all of the frequencies.

In this scenario, the estimated mean of 1.18 is not too far away from the exact mean of 1.178.
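
To make the midpoint method concrete, here is a small Python sketch. The grouped table in it is entirely hypothetical, with simple whole-number intervals, because the lesson's own table is not reproduced in this transcript; only the method (midpoint column, product column, then dividing the two totals) comes from the lesson.

```python
# Hypothetical grouped frequency table: (lower bound, upper bound, frequency).
groups = [
    (10, 20, 3),
    (20, 30, 5),
    (30, 40, 2),
]

# Midpoint column: halfway between the bounds of each interval.
midpoints = [(lower + upper) / 2 for lower, upper, _ in groups]

# Product column: midpoint multiplied by frequency.
products = [mid * freq for mid, (_, _, freq) in zip(midpoints, groups)]

total_products = sum(products)
total_frequency = sum(freq for _, _, freq in groups)

# This is only an estimate, since every value is treated as its interval's midpoint.
estimated_mean = total_products / total_frequency
print(midpoints, products, estimated_mean)
```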

For this check, pause here to find the range of the data from the stem and leaf diagram.

The exact range is 0.34 kilogrammes.

Pause here again to find the maximum possible range of the data from the grouped frequency table.

The maximum possible range is 0.35 kilogrammes.

We can find this by subtracting the lower bound of the lowest interval from the upper bound of the highest interval.
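
As a sketch of that subtraction in Python: the interval bounds below are hypothetical, since the transcript does not show the table. They assume seven intervals of width 0.05 starting at 1.00 kilogrammes, chosen only so that the result agrees with the 0.35 kilogrammes quoted above. `Decimal` is used so the subtraction is not affected by floating-point rounding.

```python
from decimal import Decimal

# Hypothetical interval bounds in kilogrammes (assumed, not from the lesson's table).
width = Decimal("0.05")
start = Decimal("1.00")
intervals = [(start + i * width, start + (i + 1) * width) for i in range(7)]

lowest_lower_bound = min(lower for lower, _ in intervals)
highest_upper_bound = max(upper for _, upper in intervals)

# Maximum possible range: highest upper bound minus lowest lower bound.
max_possible_range = highest_upper_bound - lowest_lower_bound
print(max_possible_range)
```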

Pause here to think about or discuss why there is a difference between the two ranges.

The median class from the grouped frequency table is 1.15 to 1.2.

Pause here to identify whether the true median lies in this interval or not.

The true median can be found from the stem and leaf diagram and is 1.185.

So yes, the true median does lie in the interval from the grouped frequency table.

Great work, let's find some summary statistics for different data sets in this practise task.

Pause here for question one where you have to find the mean, median, mode, and range from both of these representations.

And for question two, complete this table of summary statistics using either exact values or estimates or intervals using both the stem and leaf diagram and the grouped frequency table.

Pause now to do this.

Great work.

Pause here to compare the values in your frequency table to the one on screen.

And note the mean is 1.22 and the mode is 1.

For the same data set, the median is 1 and the range is 3.

And for question two, pause here to compare all of the information you wrote down in the table of summary statistics and compare it to the one on screen.

So far we've looked at summaries from different representations of one single data set, but what about comparing different data sets shown to us in different ways? You may have to compare different data sets represented in different ways because you've collected secondary data about a population over time or about two different populations from two different sources where each source has presented their findings using different graphs or tables.

In order to compare these data sets, you'll have to identify summary statistics from both representations.

For example, we have two data sets.

One showing data from Oakfield and the other from Rowanwood.

Each one shows the number of hours worked per person per week rounded down to the nearest whole hour.

Your investigation is to figure out in which location did people work longer hours, Oakfield or Rowanwood? Pause now to think about or discuss how would you answer this question.

The main way is to compare the same summary statistic or multiple summary statistics across both data sets.

Whilst it is possible to make comparisons between the two samples, any summary statistics about Rowanwood will either be imprecise or an estimate due to the data being given in a grouped frequency table rather than in a representation that shows individual data values.

For these two data sets, pause here to find a mode or modal class for each sample.

For the stem and leaf diagram, the mode is 32.

But you can also find the modal class, the longest row of data at 30 to 40.

On the other hand, the modal class of Rowanwood is 26 to 34.

Okay, let's now use these summary statistics.

Pause here to consider which of these statements are appropriate interpretations of the modes and modal classes that we calculated.

Both Rowanwood and Oakfield have similar modal hours worked.

However, it is impossible to say for certain which location had a higher modal hours worked because of the modal class being the only way of representing the mode for Rowanwood.

Next up, the mean.

Given that the mean of Oakfield is around 31.9 hours, pause here to find an estimate for the mean of Rowanwood.

The missing values are as follows and then the mean is approximately 31.3 hours rounded to one decimal place.

Let's now use the means that we calculated.

Pause here to consider which of these statements are appropriate interpretations of the mean and estimated mean that we calculated.

Whilst Oakfield did have a higher mean, both were pretty similar.

Furthermore, comparisons are not perfect since Rowanwood's mean is only an estimate.

In order to justify whether the means of 31.9 and 31.3 are similar or not in this context, calculating the range to consider how spread out each data set is will also be very helpful.

The greater the range or variety of results, the less significant this 0.6 difference actually is.

As we saw, some estimated summary statistics can lead to helpful comparisons.

However, the way any extreme data or outliers are handled may make comparisons for data represented in different forms incredibly challenging.

For example, in these two locations, there was exactly one day which was impacted by torrential rain.

This decreased the number of butterflies that could be recorded.

In which location can you see the outlier more efficiently? The dot plot shows the outlier crystal clear at 12 butterflies.

However, whilst the large interval size of 0 to 50 in the grouped frequency table may suggest extreme values, it hides the fact that there is only one extremely small data point.

From the data alone, we could assume that there were many data points that were significantly lower than the more central values.

Sticking with this butterfly context, pause here to find the mode or modal class of these two data sets.

The mode is 64 butterflies as seen from the dot plot whilst the modal class from the frequency table is 0 to 50 butterflies.

Now that we know the mode or modal class, pause here to think about or explain whether a comparison between the modal number of butterflies in each of these two locations is sensible.

It is not very sensible.

This is because the modal class is much, much wider than all other intervals in the grouped frequency table, with a class width of 50 compared to the 5 of the other intervals.

And lastly, pause here to think about or discuss one suggestion for how the data of Elmsgrove could have been collected differently to make our comparisons more helpful.

The investigator could have changed the size or number of intervals they grouped the data into on the grouped frequency table.

And especially they could have evaluated how they handled the 0 to 50 interval and how it showed that single outlier.

For example, they could have represented the outlier as a separate data point rather than making an interval that is disproportionately large compared to the other intervals just to include one extreme datum.

Brilliant.

For question one of this final practise task, match the statistical diagram or table showing samples A, B, and C to the summary statistic below.

And for question two, come to a conclusion for which sample shows the highest number of people per day.

Pause now to do both of these questions.

And for question three, come to a conclusion over which bookshop sells more books per day and explain any limitations to your conclusions.

Amazing work.

Here are the answers to question one.

Sample A is summary 3, B is summary 1, and C is summary 2.

For question two.

Sample B shows the highest number of people per day due to the mean and median being higher than the others, whilst also being quite consistently higher due to the range being less than or equal to the other ranges.

For question three a, the statistical summary for Oakbooks is a mean of 39.4, median of 41, mode of 49, and range of 26.

Whilst for Elm Reads, we have an estimated mean of 39.9, a median class of 25 to 30, a modal class of 30 to 90, and an average expected range of 50.

Both Oakbooks and Elm Reads have summary statistics that could justify a conclusion that they sell more books than the other one.

Pause here to have a look at one possible explanation that concludes that Elm Reads sells more books.

And finally pause here to look at the limitations that impact how confidently you can make the conclusion that Elm Reads sells more books.

These limitations include explaining how results from the grouped frequency table are imprecise if given as an interval, or estimates if using the midpoints of those intervals.

Take a deep breath everyone.

Well done on all of this intense analysis in a lesson where we have seen that information, including summary statistics, can be found from data sets shown in different ways.

And that we can use one representation of a data set to complete a different representation of the same data set.

We have delved deep into comparing two different data sets given in different representations and how limitations to our comparisons can reduce the confidence of any conclusions.

You all deserve a well-earned break after that lesson.

But I hope to see you soon for another challenging maths lesson.

I've been Mr. Gratton, so take care and goodbye.