video

Lesson video

In progress...

Loading...

Hello, my name is Dr.

Rowlandson, and I'm thrilled that you're joining me in today's lesson.

Let's get started.

Welcome to today's lesson from the unit of comparisons of numerical summaries of data.

This lesson is called, Statistical summaries using technology.

And by the end of today's lesson we'll be able to use technology, such as spreadsheets, to effectively summarise and represent data.

Here is a previous keyword that we'll use again during today's lesson.

You may want to pause the video if you want to remind yourself what this word means, and then press Play when you're ready to continue.

This lesson has two learning cycles.

In the first learning cycle, we're going to be using spreadsheets to analyse data from a list.

And then in the second learning cycle we'll be using spreadsheets to analyse frequency tables.

Let's start off with using spreadsheets to analyse data in a list.

Spreadsheet software can be used to calculate summary statistics and plot graphs both quickly and accurately.

Examples of spreadsheet software include Google Sheets, Microsoft Excel, and Apple Numbers, but there are others available as well.

The demonstrations included in this lesson are done in Google Sheets, and there'll be some links to some data files for you to use, which will also be in Google Sheets.

However, you can download those files and open them in whichever spreadsheet software you would like to use.

Now, while the demonstrations in this lesson do use Google sheets, other software usually have the same functions, however, the buttons might be in slightly different places.

Here's a reminder of some of the basic features of a spreadsheet.

You have a grid which can be looked at as columns, which are all labelled with letters, or as rows, which are labelled with numbers, and within the grid there are cells, and each cell can be referenced with its cell name.

That is the combination of its letter and its number, for example, the cell you can see highlighted here is D3.

And these cells are normally filled with data, formulas, or any other information that you input into your spreadsheet.

When you want to perform a calculation in a spreadsheet, you click on a cell and you type your calculation into where it says Insert Function.

However, when performing any mathematical calculation in a spreadsheet, you must always type the equals symbol into the cell first.

And there's also a button on here for inserting charts and graphs too.

A spreadsheet is like a giant calculator, and just like your pocket calculator, you'll tell your spreadsheet what calculations are due by typing in numbers and symbols using your keyboard.

On your keyboard, there are buttons for each of the four basic operations.

Now two of these symbols are the same as the ones on your calculator, that is addition and subtraction.

These two symbols can be found on keys on a computer keyboard.

However, if you look at a computer keyboard, you will not find a multiplication symbol on there or a division symbol.

You will find a letter x, but the letter x is not a multiplication symbol.

So, what do we use for those? Well, for multiplication, we use an asterisk, and for division we use a forward slash.

Please note that you cannot multiply by using the letter x on your keyboard.

If you type that in, the spreadsheet will interpret that as text or as a reference to one of the cells on your spreadsheet not as a multiplication.

Now, a spreadsheet can do more than just add, subtract, multiply, and divide.

With a spreadsheet, you can also input commands to perform more complex calculations such as averages, or finding a sum of large sets of numbers, and we'll see plenty of examples of that during today's lesson.

Throughout this lesson we'll be using a spreadsheet of data, which is currently saved as a Google Sheets file.

You can access this Google Sheets file by clicking on the link you can see on the slide deck here.

This link also appears several times throughout the slide deck.

All the links take you to the same file.

Once you click on the link, you won't be able to edit anything on that spreadsheet until you either download it or make you a copy of it yourself.

You can do that by clicking on File, and then if you have a Google account, you can click on Make a copy, and you'll copy the Google sheet to your Google account.

If not, you can click on Download, and open it in whichever spreadsheet software you'd like to use, such as Microsoft Excel.

Once you've done that, click on the tab that says Rainfall in June, and this will be the data that we'll use in the first half of today's lesson.

What you should see is a page that contains data taken from the Met Office about the amount of rainfall each June, and rainfall is measured in millimetres.

The data is for four separate locations, which you can see in the columns, and it is for years from 1941 to 2022, which you can see in the rows.

And what we'll do with this data is calculate some summary statistics.

Let's start with the mean.

To calculate the mean rainfall for June in Durham, first scroll to the bottom of Column B, and click on the empty cell next to where it says mean.

And then we're going to command the spreadsheet to calculate the mean.

First, we'll type in an equals sign so that the spreadsheet knows that what we're about to type in is not just text but is a function.

And then for the mean, we type the word =average open brackets, and then highlight all the rainfall data from that column.

Then close brackets and press Enter, and we can see that the mean is 52.

95853 under lots of more digits.

We could choose to round that ourselves to 53, but there are ways to round numbers using the spreadsheet itself.

Feel free to look for it later.

We can also calculate the median.

To calculate the median, repeat the same process as before, but enter the command: =median open bracket.

Then highlight the data, close your bracket and press Enter, and we can see that the median is 47.

1.

We can also calculate the mode.

To do that, repeat the same process as before, but this time enter the command: =mode and open bracket.

And then you'll highlight the data, and then close your bracket and press Enter, and you'll get a mode of 23 in this case.

Now you may notice that that average is quite different to the other two averages.

For this dataset, the mode may be less useful than the other averages, because its data can take any value within a reasonable range depending on the degree of accuracy for which it was measured.

The majority of the numbers in this column will have a frequency of one, because they are measured to one decimal place, and that gives a lot more possibilities for what they could be.

Now, let's look at how to calculate the range on a spreadsheet.

There's not a specific command for calculating the range in a spreadsheet, but it can be found by calculating the difference between the maximum and the minimum values, and that's what we're going to do.

So, to do that, first, type in the command: =max, which is short for maximum, and open your bracket.

Highlight the data in the column and close your bracket, but before you press the Enter button, we're gonna now subtract the minimum value.

So, type in subtract, and then min, which is short for minimum.

Open the brackets, highlight the data again, close your brackets, and now you can press Enter.

That has calculated the difference between the highest and the lowest.

It's done, the maximum value, subtract the minimum value.

Alternatively, this whole process could be done together by typing in what you can see here in one go into the formula bar.

Either way you get the same answer, which is 186.

1.

So let's check what we've learned now.

Open up the same spreadsheet.

The link is on this slide if you need it.

And then click on the tab for rainfall data, and then look at the second column of rainfall data, which is for Tiree.

Scroll to the bottom of that data, and calculate the mean rainfall for June and Tiree, which you can do by using the command you can see on the screen.

Pause the video while you do that, and press Play when you're ready to see the answer.

Once you've done that, you should get an answer of 66.

6 if you round it to one decimal place.

On the same spreadsheets, use the commands you can see on the screen to calculate the median rainfall in Tiree.

Pause the video while you do that, and press Play when you're ready to see the answer.

The answer is 66.

5 once it's rounded to one decimal place.

And again, on the same spreadsheets, use the command you can see on the screen to find the mode.

Pause the video while you do that, and press Play when you're ready to see the answer.

The answer is 69.

Again, on the same spreadsheet, use the command you can see on the screen to calculate the range.

Pause the video while you do that, and press Play when you're ready to see the answer.

Once you've done that, you should get an answer of 115.

Finally, rather than doing all those calculations again for the last two columns of data, we'll do something a little bit different here.

Highlight the mean, median, mode and range, which you've just calculated.

Then in the bottom right-hand corner of that highlighted area, you should see a spot.

Click on the spot in the bottom right corner of the highlighted region and drag it to the right to calculate the same summary statistics for Aberporth and Amargh.

Pause the video while you do that, and press Play when you're ready to see what the answers are.

Once you've done that, you should get the answers you can see on the screen here.

Pause the video while you check them, and press Play when you're ready to continue.

Spreadsheet software can also be used to create visual representations of data such as graphs, and this is so much quicker and more accurate than doing it by hand.

For example, to plot a time series graph of rainfall for June and Durham, we'll do the following.

First, highlight the columns containing the years, which will go on the horizontal axis, and the rainfall, which will be the vertical axis.

And then click on Insert and then click Chart, or you can click on the button, which has the same symbol on the toolbar.

Once you do that, Google Sheets will try and guess which type of graph will be most suitable for your data, and in this case, it's provided a time series graph.

This graph can be altered and customised using the Chart editor toolbar.

It should appear once you insert the graph, but to get it up at any point, double-click on the graph, and you should see this on the right-hand side of the page.

And under this toolbar, you can click on Customise to change the appearance of the graph in many different ways.

You can change the colours, the fonts, the size of different things.

You can change the scales, the labels, the headings, and so on.

Now, in that example, we just created a time series graph for Durham's data, but multiple data sets can be represented on the same time series graph.

For example, to plot a time series graph of rainfall for June in Durham and Tiree on the same graph, we can do it by highlighting all three columns.

One containing the years, one containing the rainfall in Durham, and the other containing the rainfall in Tiree.

And then, do the same thing again, click on Insert and click on Chart, and it plots a graph that looks a bit like this.

We can see a blue line, which is for Durham, and a red line, which is for Tiree, but unfortunately in this case, Google Sheets hasn't quite worked out how to label the two lines with the key.

I could change that though on the customised settings in the Chart editor toolbar.

What happens if you don't want to use the graph that Google Sheets has plotted for you? Well, the Chart editor toolbar can be used to change the type of graph that is used to represent the data.

So once you've plotted the graph, if you go on the Chart editor toolbar, and under Setup, you can see at the top it says Chart type.

Currently it shows a line graph, but we could change that to a column chart, or a pie chart, or many other different types of graphs.

However, do be cautious when choosing what type of graph to use.

The usefulness of each type of graph may depend on the type of data that is being represented, or the context of that data.

In this case, a pie chart is not appropriate for the data that we're trying to show here.

Let's check what we've learned now.

Open up the spreadsheet again, and the link is available on this slide if you need it.

On this spreadsheet, create a time series graph to represent the rainfall for June in just Tiree.

Pause a video while you do that, and press Play when you're ready to see an answer.

Hopefully, you've got something that looks a bit like this.

Maybe you've customised it to make it a bit clearer, but it should still have the same shape.

Now on the same spreadsheet, create a time series graph to represent the rainfall in Aberporth and in Amargh on the same graph.

Pause the video while you do this, and press Play when you're ready to see an answer.

So you've got something that looks a bit like this.

Once again, you may have customised it and also entered something for the key, but you should still have the same shape representing the same data.

Okay, it's over to you now for Task A.

This task contains two questions, and they both use the same set of data.

Once again, we're going to use the same spreadsheet that we've been using during today's lesson, and the link is available here if you need it.

On that spreadsheet, at the bottom of the page, click on the tab for Bus journeys, and it should open up a dataset that has the years and it has the passenger journeys on local bus services for different regions around the UK.

The numbers that you can see are in the millions.

And what you're going to do with this data is first for Part A, use the spreadsheet software to calculate some summary statistics.

And then, for Parts B and C, use your summary statistics to make some interpretations.

Pause the video while you do this, and press Play when you're ready for Question Two.

And here is Question Two.

Again, you'll be using the same spreadsheet and the bus journey data, but this time it's about plotting graphs.

In Parts A, B, and C, you need to plot some time series graphs, and then use them for Parts D and E to make some interpretations.

Pause the video while you do this, and press Play when you're ready to go through some answers.

Okay, let's take a look at some answers.

So, you need to calculate the mean, median and range for the number of bus journeys in each region in the UK.

Well, here are six of the regions.

Pause the video while you check these, and then press Play when you're ready to see the rest.

And here are the rest of them.

Pause the video while you check these, and then press Play when you're ready to continue going through answers.

In Part B, it said which region tends to have the highest number of bus passengers? And justify your answer.

The answer is London, and you can justify it with the higher of the averages that you found.

For example, you can say it has the highest mean, which was 2,074.

3 million, or you can say it has the highest median, which was 2,226.

7 million.

Part C, which region tends to have the lowest number of bus passengers? And justify your answer.

This is Wales.

It has the lowest mean, which is 102.

9 million, or you can say it has the lowest median, which was 108 million.

And then, Question Two, you had to plot a time series graph to show the number of passenger journeys in London.

It should look something a bit like this.

And then, in Part B, plot a time series graph to show all the regions in the dataset.

It should look something a bit like this.

And then, in Part C, you have to plot a time series graph to show all the regions outside of London that should look a bit like this.

What you might have noticed in Part B was that the data for London squashed all of the time series together, but now we've excluded London from this graph, we can see the time series data for each region much more clearly.

So then for Part D, it said which region has the lowest number of passenger journeys every year? That would be Wales, it's that line you can see at the very bottom.

And then for E, which two regions outside of London had a steady increase in the number of passenger journeys from 2005 to 2015? That was the South East and the South West.

Those two are both increasing during those years, while all the others are either decreasing or staying pretty much the same.

Great job there.

Now let's use the spreadsheet to analyse frequency tables.

During this part of the lesson, we'll be using the same spreadsheet that we have done throughout.

You can access this spreadsheet by clicking on the link you can see on this slide, or you can load up the previous saved version of what you made earlier.

This time, at the bottom of the page, click on the tab that says Commuting distances, and you'll bring up a page filled with tables.

Each table shows data from the ONS, that's the Office of National Statistics, about the distances that people travel to work in each region of England and Wales.

When you calculate an estimate of the mean from a grouped frequency table, it usually requires multiple steps.

Well, we can estimate the mean from this data using the same steps, but the calculations which are required to estimate the mean from grouped frequency data can be performed by the spreadsheets rather than by yourself each time.

And this can be particularly helpful when dealing with large numbers or a lot of different groups.

For example, one of the calculations you would need to perform to calculate an estimate of the mean, is find the total frequency.

To do this, we can click on an empty cell, and enter the command: =sum and open brackets, and highlight the frequencies.

Then close your brackets and press Enter, and it gives the sum of all the frequencies, which we can see in this case, is 976,945.

So, what else do you need to do to calculate an estimate of a mean from a grouped frequency table? Well, another thing you need to do is estimate the total distances for each group.

That involves a few steps.

One, is to type the midpoints for each row, which you can do in the next column.

And then what we normally do is we multiply the frequencies by the midpoints.

Now, we don't need to do that and type it in.

We can ask the spreadsheet to do it for us, and we only need to really ask the spreadsheet to do it once, because then we can copy what we've done to the other rows.

For example, we could take the first row, click on the next empty cell in that row for 0 to 2 kilometres, and then to enter the multiplication, type the equals sign, and then click on the two cells that you want to multiply with an asterisk in between.

This gives the value of 202,172.

Now we don't have to do that for each row, because what we could do is drag the spot that's in the bottom of that cell all the way down to the bottom row, and it'll copy the same calculation out again for each row.

And now we nearly have everything we need to estimate the mean, but we need one more thing.

We need to find the sum of the estimated total distances, and you can do that by clicking on a cell, pressing equals, type in sum, and highlighting the estimated total distances, and then press and Enter.

And now we have just one last step.

We need to divide the estimated total distances by the total frequency.

So, click on an empty cell where you want to disPlay the mean.

Press the equals sign, and then click on the sum of the estimated total distances.

Type a slash for divide, and then click on the total frequency.

So now we've told the spreadsheet to divide one cell by the other, and then press Enter.

And this gives an answer of 11,439 and so on.

Even though you've got the spreadsheet to do a lot of the work for you here, it's still worth double-checking that that mean makes sense with the data you've got.

So let's check what we've learned.

On the same spreadsheet, which you can open using the link on the bottom of this slide, look at the table for the North West, and then find the total frequency.

Pause the video while you do that, and press Play when you're ready to see an answer.

The answer is 2,701,777.

Now, on the next column, type in the midpoints for each group.

Pause the video while you do that, and press Play when you're ready to see what those are.

The midpoints are: 1, 3.

5, 7.

5, 15, 25, 35, 50, and 80.

Then, in the next column, calculate the estimated total distance for each group.

Pause the video while you do that, and press Play when you're ready to see the answers.

Here they are.

Rather than reading all the numbers out, the top row is 585,311, while the bottom row is 6,590,640.

If you've got those two correct, it's likely that you got the rest correct as well.

Now, calculate an estimate for the mean distance travelled to work in the North West.

Pause the video while you do this, and press Play when you're ready to see an answer.

The answer is 11.

2 kilometres.

That's when it's rounded to one decimal place.

Spreadsheet software can also be used to create visual representations of data.

For example, here we have the data for the North East again.

To plot a pie chart for the distances travelled to work for the North East, we can do that by highlighting the columns containing the distances and the frequencies.

Clicking Insert, and then click on Chart, and Google Sheets has interpreted this data for us and plotted it as a pie chart.

This can be altered and customised using the Chart editor toolbar, and it can be changed into other types of graphs using the Chart editor toolbar as well, such as a bar chart in this case.

But once again, some charts are more appropriate than others, depending on what data you're using.

So, let's check what we've learned there.

In the same spreadsheet, look at the table for the North West, and create a pie chart for this data.

Pause the video while you do that, and press Play when you're ready to see an answer.

Here is what an answer would look like.

Once again, you may have customised it by changing the colours or its layout, or changing the way that it's labelled, but hopefully you should still have a pie chart with the same size sectors.

Okay, it's over to you now for Task B.

This task contains two questions which will both use the same spreadsheets.

And if you need to access it again, you can click on the link you can see on this slide here, and then click on the Commuting distances tab.

In Question One, Part A, you need to calculate an estimate for the mean distance travelled to work in the Yorkshire and Humber region.

And then, Part B, do the same again for the East Midlands region, and then use your answers from Parts A and B to answer Question C.

Pause the video while you do that, and press Play when you're ready for Question Two.

And here is Question Two.

Once again, we'll use the same spreadsheet which you can access from clicking on the link on this slide.

And we're using the Commuting distances again.

This question is all about you having a bit of a Play with the data and with the spreadsheet to see what you can produce.

So, choose a region from the page, it could be any region you like.

If you live in one of those regions, you could choose a region you live in, or you can choose multiple different regions, and represent its data using different visual representations.

Try one type of graph first, and then switch it to a different type of graph, and a different type of graph, and see what you think of each one.

Consider which types of graphs or charts represent this data more clearly than others.

And then once you've done that, choose a type of graph which represents the data most clearly to you, and then customise its appearance.

You really can't get this wrong.

It's all about you exploring yourself, how to use the spreadsheet software.

Pause the video while you do that, and then press Play when you're ready to look through these questions together.

Let's see how we got on with that then.

Here's question one.

You got to first calculate an estimate for the mean distance travelled to work in the Yorkshire and Humber region.

That is 11.

8 kilometres, once it's rounded to one decimal place.

And then do the same for the East Midlands region.

That is 13.

3 kilometres, rounded to one decimal place.

Then Part C, in which of the two regions do people tend to travel the furthest to work? Well, that would be the East Midlands.

Then, Question Two: You have to choose a region and present its data using different visual representations, and then consider which type of graphs or charts represent the data more clearly than others.

And then finally, choose the graph type, which represents the data most clearly to you and customise its appearance.

Here's an example.

This is a pie chart that represents the data for the Yorkshire and Humber region.

A pie chart does show this data pretty clearly.

Fantastic work today.

Now, let's summarise what we've learned during this lesson.

Technology can be used to calculate numerical summary statistics.

Technology, such as spreadsheet software, like we've used today.

But other type of software also do similar things, and some data software can do even more.

But what you might find when you're working with technology, is that calculating summary statistics can be much quicker with something like a spreadsheet than it can be when you're doing calculations by hand, especially when there's a lot of data.

Technology can also be used to create visual representations of data as well, such as time series graphs and pie charts.

And, once again, the technology can usually plot it quicker, more accurately, and more clearly than we can do normally by hand.

However, it is important to select appropriate representations and summaries based on the context and the type of data you have.

Thank you very much.

Have a great day.