Hello and welcome.
Thank you so much for joining me, Mr. Gratton, for today's numerical summaries lesson, where we'll be looking at calculating the median, mode and range from a grouped frequency table.
Pause here to have a look at the definition of the word frequency.
An interval is all the values between two points.
These values can either be discrete or continuous.
And the interval can sometimes be represented using a pair of inequality symbols.
We'll be looking at the definitions of modal class and median class throughout the lesson.
First up, let's have a look at understanding grouped frequency tables and how we can find the modal class using one.
For small data sets, the mode is fairly easy to spot.
But for larger data sets, there are too many points for us to easily identify the most frequent one.
A frequency table is a much neater way of representing this data, where the modal value is the datum with the highest frequency.
The datum four appears 18 times, more than any other datum.
But blimey, this data set is even bigger than the previous one, but Andeep's suggestion seems quite sensible.
Let's also represent this data set as this frequency table.
Aaah, it's correct.
The frequency table is still so big that we can't even see all the rows.
And if we can't see all of the data, then we can't possibly find the mode itself.
This is an interesting suggestion from Andeep.
Rather than representing one datum per row, how about grouping the values so that the table has fewer rows? Something a little bit like this.
It looks quite similar to a frequency table, but a little bit more complicated with all of the inequality symbols.
This is called a grouped frequency table.
Rather than showing the frequencies of individual values, it shows the frequency of an interval or group or class of values.
This allows us to represent data sets with a massive variety of values in a frequency table without needing to have dozens upon dozens of rows.
X just represents the value of any data point within an interval.
This row shows the frequency of values that are greater than or equal to 40.
Notice how X is greater than or equal to this lower bound.
Whilst 50 is strictly larger than X, meaning X can never ever be 50 in this particular row of data, but it can be a number ever so slightly less than 50, such as 49 or 49.9.
We can say that this row represents all values that are greater than or equal to 40, which are also less than 50.
The data in an interval can either be discrete or continuous, but if the data are only integers, this row considers only the values 40, 41, all the way up to 49.
These 10 numbers appear a total of 39 times in the data set.
Some of those numbers may not appear at all, whilst others may appear more frequently than other ones.
From a grouped frequency table, we do not know the exact value of any data point in the data set or its frequency.
All we can interpret from that row of information is that a combination or group of these numbers, the numbers within this interval, appear a total of 39 times.
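As a rough sketch of how raw data ends up in rows like this one, the short Python example below counts values into equal-width intervals of the form lower ≤ x < upper. The data values, the interval width of 10 and the function name `grouped_frequencies` are all made up for illustration; they are not the data set from the lesson.

```python
from collections import Counter

def grouped_frequencies(data, width):
    """Count how many values fall in each class lower <= x < lower + width."""
    # int(x // width) * width is the lower bound of the class containing x,
    # so 49.9 lands in the class starting at 40 and 50 in the class starting at 50.
    counts = Counter(int(x // width) * width for x in data)
    return {(lower, lower + width): counts[lower] for lower in sorted(counts)}

# Hypothetical raw data, chosen to show what happens at the boundaries.
data = [40, 41.5, 49, 49.9, 50, 57, 38, 44]
table = grouped_frequencies(data, 10)

for (lower, upper), frequency in table.items():
    print(f"{lower} <= x < {upper}: frequency {frequency}")
```

Notice that once the data is grouped, the table only stores a count per interval, so the individual values can no longer be recovered from it.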
The modal class is the interval with the highest frequency.
This means that there are more data points in this interval than in any other interval shown on this grouped frequency table.
To find the modal class, identify the highest frequency in the frequency column. In this case, 47 is a higher frequency than 34, 41, 25, et cetera.
It is then the interval with that highest frequency that is the modal class.
The interval with a frequency of 47 is the interval 50 to 60.
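The same search can be sketched in a few lines of Python. The frequencies 39 (for 40 to 50) and 47 (for 50 to 60) come from the table described above; which intervals the frequencies 34, 41 and 25 belong to is an assumption made only so that the example runs.

```python
# Each row is ((lower bound, upper bound), frequency), meaning lower <= x < upper.
table = [
    ((20, 30), 34),  # interval assumed for illustration
    ((30, 40), 41),  # interval assumed for illustration
    ((40, 50), 39),
    ((50, 60), 47),
    ((60, 70), 25),  # interval assumed for illustration
]

def modal_class(table):
    """Return the interval with the highest frequency."""
    interval, _frequency = max(table, key=lambda row: row[1])
    return interval

lower, upper = modal_class(table)
print(f"Modal class: {lower} <= x < {upper}")
```

The function returns the interval rather than the frequency, which matches the point that a modal class is always given as an interval.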
This is a second representation of a grouped frequency table for the same set of data.
However, it only shows the frequency of integer values.
It does not accurately represent continuous data, whilst the representation with the inequalities works for both discrete and continuous data.
Pause here to think about or explain why.
Okay, for this first check, pause here to identify the modal class of this data set.
Remember, a modal class is given as an interval, not a frequency, and definitely not as a midpoint of an interval.
Whilst verbally, saying an interval is 110 to 150 is absolutely fine, when written, it must be written in the exact same inequality form as seen in the table, in order to be crystal clear on whether it is the lower bound or the upper bound that is inclusive or exclusive to that interval.
Okay, for this next check, for this integer-only data set, which of these are the possible integers within the modal class of the data set? Pause here to consider all five options.
The modal class is 23 to 28, but the lower bound is excluded and the upper bound is included, so the possible integers do not include 23 but they do include 28.
And finally, here's a bit of problem solving.
The modal class of this data is 15 to 20.
Pause here to suggest possible values for what a and b could be.
A can be any frequency greater than 82, which is currently the highest frequency for the interval 25 to 30.
B is not the frequency of the modal class, so any frequency less than your written value of a is also fine.
So far, we've only looked at grouped frequency tables whose intervals are all the same size.
This does not need to be the case.
Intervals can be of any size and can vary within a data set.
We can make a table look much more efficient by showing consecutive lower frequency intervals combined together like so.
Here we have three intervals with a total of only five data points.
Does the grouped frequency table really need to be that long? No.
We can combine these three intervals into one large interval of 80 to 110.
The size of each interval is a decision made either during the planning stage, before the data is collected, or afterwards, when trying to represent raw data appropriately in a table.
Deciding in advance can be due to a question being given in questionnaire form or data being collected as a tally.
If the intervals are chosen afterwards, there is much more informed flexibility over what the size of each interval can be.
It is important to note that the modal class is not a definitive thing.
Different groupings of the same data set may still result in different modal classes.
Here, the modal class is 20 to 30.
Whilst here, it's in a small interval of 30 to 35.
Whilst here, it's in the massive interval of 50 to 110.
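To see this effect outside of the tables on screen, here is a minimal Python sketch that groups one data set in two different ways and reports the modal class each time. The data values and both sets of interval boundaries are hypothetical, chosen only so that the two groupings disagree.

```python
from bisect import bisect_right

def group(data, edges):
    """Count values into intervals edges[i] <= x < edges[i + 1]."""
    counts = [0] * (len(edges) - 1)
    for x in data:
        i = bisect_right(edges, x) - 1  # index of the interval containing x
        if 0 <= i < len(counts):        # ignore values outside all intervals
            counts[i] += 1
    return list(zip(zip(edges, edges[1:]), counts))

def modal_class(table):
    """Return the interval with the highest frequency."""
    return max(table, key=lambda row: row[1])[0]

# Hypothetical data set, not the one shown in the lesson.
data = [21, 23, 24, 26, 28, 31, 32, 33, 34, 45,
        52, 57, 63, 71, 78, 85, 96, 104]

equal_width = list(range(20, 111, 10))  # 20, 30, ..., 110
mixed_width = [20, 30, 35, 50, 110]     # one small and one very large interval

print(modal_class(group(data, equal_width)))
print(modal_class(group(data, mixed_width)))
```

With equal intervals of width 10 the densest interval wins, whereas the wide interval in the second grouping collects many sparse values and becomes the modal class purely because of its size.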
Pause here to think about or discuss which of these groupings you think best shows this data set.
Knowing that the modal class may change depending on how you group the same data set, pause here to write down two different modal classes, one for each table that represents the same data set.
The two modal classes are zero to 40 and 40 to 60.
And pause here to think about or discuss how to explain how you know the modal class cannot be either of these two intervals even though both their frequencies are unknown.
From the leftmost table, we can see that the interval 80 to 120 has a frequency of 35.
So in the most extreme case, for the rightmost table, the interval 80 to 100 could have a frequency of 35 and the interval 100 to 120 could have a frequency of zero or vice versa.
A frequency of 35 is still not the highest frequency, with the interval 40 to 60 having the highest frequency of 55, thus still being the modal class no matter what the frequencies of those two other intervals are.
Great stuff so far.
For this practise task, pause here to attempt question one, to identify the modal class of each grouped frequency table.
For this question, complete the rightmost grouped frequency table using the information from the left one and identify the modal class for both completed tables.
For B, write down a list of possible integer data points for the boxed interval.
There are many possible solutions.
Pause here for question two.
Well done so far for your effort.
For question one, the first modal class is 22 to 26, then 30 to 42 and 165 to 180.
And for question two, the frequencies for the rightmost table are 15, 34, 57, and 61, and the modal classes are 50 to 60 and 60 to 80.
For part B, your answer must have exactly eight integers because the frequency is eight.
They can be any combination of integers from 10 to 19, but not the excluded number 20.
Any given integer can occur multiple times.
Okay, the next summary statistic that we'll be looking at is the range.
The range from a list of raw data is the largest data point subtract the smallest data point.
So in this case, 8 - 1 = 7.
The method to find the range from a frequency table is exactly the same.
We look towards the column with the datum in it.
The range is the largest data point subtract the smallest data point, so 22 - 15 = 7.
Note that the frequency of each data point does not matter for the range.
It does not factor into any calculation when considering a range of data.
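As a small illustration, the range from an ungrouped frequency table can be sketched in Python like this. The smallest and largest values, 15 and 22, match the example above; the values in between and all of the frequencies are invented for the sketch.

```python
# Keys are the data values, values are their frequencies.
# Only 15 and 22 come from the lesson; everything else is hypothetical.
frequency_table = {15: 4, 17: 9, 18: 2, 20: 6, 22: 1}

def data_range(frequency_table):
    """Largest data value subtract smallest data value; frequencies are ignored."""
    return max(frequency_table) - min(frequency_table)

print(data_range(frequency_table))
```

Changing any of the frequencies to a different positive number leaves the result unchanged, because only the keys of the table are used.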
So, finding the range from either one of these two representations is pretty straightforward since we know the exact values of all of the data points in the data set.
However, finding the range from a grouped frequency table is a lot less straightforward.
This is because we do not know the exact values of any data point in the data set.
Rather, we are only given the intervals.
For example, we do not know the exact value for this one single data point in this interval.
All we know is that it has a value between 100 and 110, not including 110 itself.
So it could be 100, 108, 109.5, et cetera.
And knowing its value is very important since it is the single maximum data point which plays a part in the calculation of the range.
For a grouped frequency table, there are two ways of considering a range.
Method one is to find the maximum possible range.
We can do this by considering the upper bound of the highest value interval subtract the lower bound of the lowest value interval.
So for this data set, we have 110 subtract 20 equals a maximum possible range of 90.
This method is guaranteed to overestimate your range.
However, in doing so, it considers every single data point in your data set, meaning no important data are excluded.
Method two is the average expected range: it is the midpoint of the highest value interval subtract the midpoint of the lowest value interval.
For this data set, we have 105 take away 25, both midpoints, giving an average expected range of 80.
This method is more likely to give you a potentially closer estimate of the range when compared to the true range.
However, there is a risk that some data points are omitted from its calculation.
Which range is preferable when comparing method one to method two will really depend on the context that it is used in.
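Both methods can be sketched side by side in Python. The lowest interval of 20 to 30 and the highest interval of 100 to 110 follow the example above; the intervals in between are assumed, and the function names are simply labels for this sketch.

```python
# Intervals as (lower bound, upper bound). Only the first and last are taken
# from the lesson's example; the ones in between are assumed.
intervals = [(20, 30), (30, 40), (40, 50), (50, 60), (60, 70),
             (70, 80), (80, 90), (90, 100), (100, 110)]

def midpoint(interval):
    lower, upper = interval
    return (lower + upper) / 2

def max_possible_range(intervals):
    """Upper bound of the highest interval subtract lower bound of the lowest."""
    return max(upper for _, upper in intervals) - min(lower for lower, _ in intervals)

def average_expected_range(intervals):
    """Midpoint of the highest interval subtract midpoint of the lowest."""
    return midpoint(max(intervals)) - midpoint(min(intervals))

print(max_possible_range(intervals))      # 110 - 20
print(average_expected_range(intervals))  # 105 - 25
```

Only the first and last intervals affect either result, so the frequencies are left out of the sketch entirely.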
As with the modal class, the average expected range may vary depending on the size and distribution of the intervals in the data set.
This is because the midpoint of an interval depends on the size of that interval.
These two grouped frequency tables both show the same data set but have different average expected ranges due to the data set being grouped differently.
However, the maximum possible range of 110 take away 20 equals 90 is the same across both tables.
For this check, pause here to consider both types of estimated range for this data set.
The upper bound of the highest value interval is 800, whilst the lower bound of the lowest value interval is 200, for a maximum possible range of 600.
The midpoints for the highest and lowest value intervals are 700 and 275, giving an average expected range of 425.
Okay, for this check, the maximum possible range for this data set is 46.
Pause here to consider what the value of X is.
If the maximum possible range is 46, then something subtract four is 46.
Therefore, X equals 50, as 50 subtract 4 is 46.
Brilliant.
Onto the practise task for the range.
Pause here to find both the maximum possible range and average expected range of these three data sets.
And for question two, these two tables show the same data.
Find both types of ranges from both tables and explain why one range changes whilst the other stays the same.
Great effort on applying your knowledge to this second summary statistic.
Pause here to compare your answers to question one with the ones on screen, and notice how even though two data sets both share a range of 50, the two data sets look wildly different.
This is also in part due to the fact that 50 is the maximum possible range of one data set whilst it is the average expected range of the other.
For question two, we have a consistent maximum possible range of 60, whilst the average expected range differs at 50 and 40 respectively.
The maximum possible range does not change because the absolute upper and lower bounds of 80 and 20 do not change.
However, because some intervals have changed in size, some midpoints also change, affecting the size of the average expected range.
Okay, the last summary statistic that we'll be looking at today is the median class.
Let's see what this one's about.
In order to find the median value from a frequency table, we need to consider a running total of frequencies.
It's usually helpful to draw on a running total column for any table that you need to find the median from.
So our first running total is just the frequency eight.
Our next running total is the sum of the frequencies 8 + 14, then 8 + 14 + 11, then 8, 14, 11, and 16, and finally, you get the idea, 8, 14, 11, 16, and 12 all added together.
We can then evaluate each of these sums to get running totals of 8, 22, 33, 49, and 61.
After finding all of the running totals, we take the maximum running total, which is also the total frequency of the data set and use this value to find the position of the median.
The position of the median for this data set is the total frequency of 61, then we add 1 and divide that sum by 2.
Therefore, the median is the 31st data point in an ordered list shown by this frequency table.
Okay, so we know that the median is the 31st data point, but what's the value of the 31st data point? To find the value of the median, we need to look at the first running total greater than the median position of 31.
That's this row currently containing the running total of 33.
This is because the 23rd to 33rd data points lie in this row.
The 31st data point is three, and so the median value is also three.
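The whole process can be sketched in Python. The frequencies 8, 14, 11, 16 and 12 are the ones from this table; the data values 1 to 5 are an assumption, based only on the third row having the value three. The comparison is written as "greater than or equal to" so that a median position exactly equal to a running total still lands in that row.

```python
from itertools import accumulate

values = [1, 2, 3, 4, 5]           # assumed data values, one per row
frequencies = [8, 14, 11, 16, 12]  # frequencies from the lesson's table

# Each running total is the previous running total plus the current frequency.
running_totals = list(accumulate(frequencies))

# The last running total is the total frequency of the data set.
median_position = (running_totals[-1] + 1) / 2

# The median lies in the first row whose running total reaches the position.
# This sketch assumes an odd total frequency, so the position is a whole number.
median = next(value for value, total in zip(values, running_totals)
              if total >= median_position)

print(running_totals, median_position, median)
```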
This method is identical to finding the median class from a grouped frequency table.
Like the modal class, which is an interval of data, the median class is the interval that contains the median data point, even if we do not know the exact value of the median data point.
Firstly, let's find the running total.
The position of the median is the 152 + 1, then divided by 2 equals 76.5th data point.
The 46th to 87th data points have values in the interval 30 to 40 even if we do not know the exact values of these data points, therefore the median class is 30 to 40.
Okay, for this check, pause here to calculate the running totals A and B.
The running totals are 14 and 36.
And now pause here to find the position of the median data point using all of these running totals.
The position of the median data point is the 34th data point.
We look to the row containing the first running total greater than 34, which is this row.
The median class is the interval 80 to 120.
A more efficient method of finding the running totals is to consider the running total for the previous interval plus the new frequency.
I'll demonstrate a few steps for the table on the left and then let you try those exact same steps for the table on the right.
The first running total is always just the first frequency, so for my data set, that is two.
The second running total is the previous running total of 2 plus the current frequency of 5, giving 7.
We do this again, 7 + 11 = 18, then 18 + 12 = 30, and finally, 30 + 7 = 37.
Pause now to give this a try for the right hand table.
Okay.
Your running totals are 18, 30, 40, 47 and 49.
Remember that final running total 49 is also the total frequency of your data set.
Okay, after finding all of the running totals, let's put our attention to the maximum running total.
The median position is this maximum running total plus one, then divided by two.
So for my data set, the median position is the 19th data point.
Pause here to try this yourself for your data set.
Okay, your median is the 25th data point.
After finding the median position, we take the median position and find the first running total greater than this position.
For my data set, 30 is the first running total greater than the 19th data point, and so the median is in this row.
Therefore, the median class for my data set is 150 to 200.
Pause now to find the median class of your data set.
The median class for your data set is the class 50 to 100.
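Here is a minimal Python sketch of the three steps applied to both tables. The frequencies are recovered from the running totals given above; the interval boundaries (equal classes of width 50 starting at 0) are an assumption that is consistent with the two median classes stated, and `median_class` is just a name chosen for this sketch.

```python
from itertools import accumulate

def median_class(intervals, frequencies):
    """Return the interval containing the median position (total + 1) / 2."""
    running_totals = list(accumulate(frequencies))
    position = (running_totals[-1] + 1) / 2
    for interval, total in zip(intervals, running_totals):
        if total >= position:  # first running total that reaches the position
            return interval

# Assumed intervals, each meaning lower <= x < upper.
intervals = [(0, 50), (50, 100), (100, 150), (150, 200), (200, 250)]

left_frequencies = [2, 5, 11, 12, 7]     # running totals 2, 7, 18, 30, 37
right_frequencies = [18, 12, 10, 7, 2]   # running totals 18, 30, 40, 47, 49

print(median_class(intervals, left_frequencies))
print(median_class(intervals, right_frequencies))
```

Both tables here have an odd total frequency, so the median position is a whole number; the sketch does not deal with a median that falls between two neighbouring intervals.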
Okay, let's use this knowledge for this practise task.
Pause here to find the median class for these two data sets.
And note, the sentence saying "the first running total greater is" means: what is the first running total value that is greater than the median position that you calculated?
And for question two, let's bring together everything from this lesson.
Find the median class, modal class, and the two types of range for this data set.
And as a bonus, question three, it is possible to estimate the median from a grouped frequency table with a single value rather than a class interval.
Can you suggest a suitable estimate for the median and give an explanation for why you think this number is a suitable estimate? Pause now for these final two questions.
Okay, onto the answers.
Pause here to check all calculations and answers for question one.
For question two, the modal class is 3,000 to 4,000, whilst the maximum possible range is 9,000.
In comparison, the average expected range is a lot lower at 6,750.
The median class is 2,000 to 3,000, but let's have a look at an estimate for the exact value of the median for this bonus question three.
If we assume that all values in the median class of 2,000 to 3,000 are evenly distributed, and this is an assumption, then the median at the 45th position is going to be much, much closer to the upper bound of 3,000, which is at the 47th position using our running totals than the lower bound of 2,000 at the 27th position.
If it's that close to 3,000, then any estimate between roughly 2,750 and 3,000 itself seems pretty reasonable.
However, this is only an estimate.
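One way to turn this reasoning into a single number is to slide along the median class in proportion to where the median position sits between the two running totals. The Python sketch below does this under the same even-distribution assumption, using the positions 27, 45 and 47 from the explanation above; the function name and its parameters are hypothetical, and this is only one reasonable way of producing an estimate.

```python
def estimate_median(lower, upper, total_before, total_through, position):
    """Estimate the median by assuming values are spread evenly across the class.

    total_before  : running total at the lower bound of the median class
    total_through : running total at the upper bound of the median class
    position      : position of the median data point
    """
    steps_into_class = position - total_before
    steps_across_class = total_through - total_before
    return lower + steps_into_class * (upper - lower) / steps_across_class

estimate = estimate_median(lower=2000, upper=3000,
                           total_before=27, total_through=47, position=45)
print(estimate)
```

The result sits inside the suggested range of roughly 2,750 to 3,000, which fits the argument that the median is much closer to the upper bound than to the lower bound.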
Great work, and thank you all so much for all the effort that you've put into this lesson.
Well, we've looked at grouped frequency tables as a way of representing massive data sets containing a wide variety of values.
These groups are also called intervals or classes.
We've seen that the modal class is the interval with the highest frequency.
We've also seen that there are two types of estimate for the range.
These are the maximum possible range and the average expected range.
And finally, finding the median class involves considering the running totals and identifying which interval the median value lies in.
Once again, I appreciate you being here for this lesson.
Thank you for joining me, and until our next maths lesson together, take care and goodbye.