Hi, my name's Mr. Davison.

I'm so excited to be learning with you today.

Today's lesson is called Text File Size Calculation from the unit Representation of Text, Images and Sound.

By the end of today's lesson, you're going to be able to calculate and compare file sizes of text data when different encoding methods are used.

Today, our keywords are byte, which is a group of eight bits; number prefix, which is a word used in front of a number to represent a multiple of that number; encoding, which is putting the sequence of characters into an agreed format; and file size, a measure of how much binary data a file contains.

Today, we've got two learning cycles, so let's get on with the first, compare quantities of data.

When we're talking about computer data, we know that it's going to be stored as bits, and that bits are single binary digits with the value of either zero or one.

Now, bits are used to represent all data in computers, and data represented in binary can result in large sequences of bits, like the one we've got on screen.

Because computers deal with large numbers of bits, we find it easier to group them into collections, smaller sections that are easier to read and easier to compare.

These groups are the units used to measure data storage.

The most common grouping is eight bits, and we refer to that as a byte.

A sequence of 32 bits could therefore be counted as four bytes.

If we count in groups of eight along our 32 bits, we can see that we've got four bytes in total.

So let's just check that you've understood that.

Imagine I've got a binary sequence that is 16 bits long.

How many bytes is this? Well done, it's two bytes.

16 bits divided by eight gives us two bytes.
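
As a quick sketch of that grouping idea in Python (the function name bits_to_bytes is made up for this illustration, it isn't from the lesson), dividing a bit count by eight gives the number of bytes:

```python
BITS_PER_BYTE = 8  # one byte is a group of eight bits


def bits_to_bytes(bit_count):
    """Return how many bytes a given number of bits makes up."""
    return bit_count / BITS_PER_BYTE


print(bits_to_bytes(32))  # 4.0
print(bits_to_bytes(16))  # 2.0
```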

Unfortunately, sequences of data can become too large to express meaningfully, even if we use bytes.

And let's think of this as an example.

If Jacob says he's got a file for his homework stored on his computer, and it's 3,290,000 bytes long, he might wonder, is that a lot of data? And what Jacob might do is ask someone else to compare.

He asks Sam, and Sam replies that he has a photo on his phone, and that photo is reporting it's 4.21 megabytes.

So the units are different and the comparison is not immediately straightforward.

Sam isn't sure whether Jacob's file is larger, or if his photo is larger.

Number prefixes help us express multiples of a number, and this makes it easier to understand and compare sizes.

The number prefixes we use a lot in computing are kilo, mega, giga, tera, and peta.

And you can see there, they're multiples of a thousand.

So kilo is a thousand of something, mega is a million, and so on.

In computing, bytes are summarised in multiples of a thousand using those same number prefixes.

That would make one kilobyte a thousand bytes, one megabyte a million bytes, one gigabyte a billion bytes, one terabyte a trillion bytes, and one petabyte one quadrillion bytes.

We can see by using a number prefix, we can hide the fact that a number has so many zeros.
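
One way to picture those prefixes is as a lookup table. This is only a sketch, and the name PREFIX_MULTIPLIERS is an assumption made for the example. It uses the multiples of a thousand described in this lesson:

```python
# Number prefixes as multiples of a thousand, smallest to largest.
PREFIX_MULTIPLIERS = {
    "kilo": 10**3,   # a thousand
    "mega": 10**6,   # a million
    "giga": 10**9,   # a billion
    "tera": 10**12,  # a trillion
    "peta": 10**15,  # a quadrillion
}

for prefix, multiplier in PREFIX_MULTIPLIERS.items():
    # The :, format adds thousands separators so the zeros are easy to count.
    print(f"1 {prefix}byte = {multiplier:,} bytes")
```

Printing the table shows how many zeros each prefix is hiding.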

Now before we're able to compare, we've got to remember what those number prefixes are in order.

So the following quantities of bytes are in ascending size order, which means it goes from the smallest to the largest.

Fill in the gaps of the ones that are missing.

The first correct answer is megabyte.

And the second answer is terabyte.

So in ascending order, the order of the number prefixes when we're applying it to bytes are kilobyte, megabyte, gigabyte, terabyte, petabyte.

Let's try and apply that to what Jacob and Sam were discussing before.

So with these two measurements, we've got to get them to the same number prefix.

Once those prefixes are the same, comparison is a lot easier.

Converting to megabytes firstly would make it easier for Jacob to compare his data with Sam's.

He can convert his 3,290,000 bytes into megabytes.

To do that, we need to remember that one megabyte is one million bytes.

So that's six zeros on the end.

If we take that value, divide it by one million, we can convert 3,290,000 bytes into an equivalent of 3.29 megabytes.

Those numbers aren't different.

They're just expressed in a different way to make comparison easier.

And we can see that's 3.29 megabytes, which is a little smaller than Sam's 4.21 megabyte photo.

We could have done this a different way as well.

Jacob could have converted to kilobytes first and then to megabytes, making the maths that little bit easier.

We know that one kilobyte is a thousand bytes.

So if we take Jacob's homework file, 3,290,000 bytes, and divide that by a thousand, we'd be able to convert it firstly to 3,290 kilobytes.

We could have then divided by a thousand again to get to our 3.29 megabytes.

More steps, but we're dealing with easier calculations each time.

We could have done it in a third way entirely differently.

This time Sam could also have converted his megabyte value of the photo into bytes to compare the sizes.

As long as both people are working in the same units, our comparison will be straightforward.

So again, we know that one megabyte is one million bytes.

Sam's photo size is 4.21 megabytes.

So if you multiply that by one million, we know that Sam's file is 4,210,000 bytes.

So we would have our two values for comparison.

However, larger numbers are more difficult to work with.

It's not as easy to compare two sets of large numbers.
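
Here is a small sketch of those three routes in Python, using Jacob's and Sam's numbers. The variable names are assumptions made for the example, and round() is used in the last step only to tidy up floating-point arithmetic:

```python
BYTES_PER_KILOBYTE = 1_000
BYTES_PER_MEGABYTE = 1_000_000

jacob_bytes = 3_290_000  # Jacob's homework file
sam_megabytes = 4.21     # Sam's photo

# Route 1: bytes straight to megabytes.
jacob_megabytes = jacob_bytes / BYTES_PER_MEGABYTE

# Route 2: bytes to kilobytes, then kilobytes to megabytes.
jacob_kilobytes = jacob_bytes / BYTES_PER_KILOBYTE
jacob_megabytes_two_steps = jacob_kilobytes / 1_000

# Route 3: convert Sam's megabytes into bytes instead.
sam_bytes = round(sam_megabytes * BYTES_PER_MEGABYTE)

print(jacob_megabytes)                  # 3.29
print(jacob_megabytes_two_steps)        # 3.29
print(sam_bytes)                        # 4210000
print(jacob_megabytes < sam_megabytes)  # True: Jacob's file is the smaller one
```

Whichever route we pick, the comparison comes out the same way once both values use the same unit.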

Right, you're going to practise some of that now.

For the first task, I want you to use either the less than, the greater than or the equal sign to make each of these four statements correct.

Then I want you to complete the table to compare the total amount of bytes and the different number prefixes.

Lastly, I want you to help out Jacob and Sam.

Jacob is saying he wants to download a playlist onto his phone.

And it's saying that the file size of that playlist is 48,800,000,000 bits.

Sam is helping him out and is saying that the phone says it has five gigabytes of free storage.

So Sam thinks there should be enough storage for Jacob's playlist.

I want you to do some calculations and check to see if Sam is correct.

Explain your answer by comparing the quantity of data of the playlist with the amount of data storage that is available.

Pause the video and have a go now.

Well done, there was a lot of calculation to do there.

So in our first task, we had to use either the less than, the greater than or the equal sign to make each of the statements correct.

Firstly, one byte is less than 10 bits, because we know that one byte is actually eight bits and eight bits is less than 10.

2 bytes is 2 groups of 8, which is 16 bits, so the second statement needs an equal sign.

For the third statement, 3 bytes would be 24 bits.

3 times 8.

So we know that that's greater than 20 bits.

And lastly, 8 bytes is 64 bits.

And again, that needs a greater than sign because that's greater than 63.
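
Those four comparisons could be checked with a short sketch like this one. The function compare_bytes_to_bits is a made-up helper for illustration, not part of the task:

```python
def compare_bytes_to_bits(byte_count, bit_count):
    """Return '<', '=' or '>' for a number of bytes compared with a number of bits."""
    bits_from_bytes = byte_count * 8  # put both sides into bits first
    if bits_from_bytes < bit_count:
        return "<"
    if bits_from_bytes > bit_count:
        return ">"
    return "="


print(compare_bytes_to_bits(1, 10))  # <
print(compare_bytes_to_bits(2, 16))  # =
print(compare_bytes_to_bits(3, 20))  # >
print(compare_bytes_to_bits(8, 63))  # >
```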

For our table, on the first row, we know that 1 million bytes is 1 megabyte, so we put a 1 in the first row.

We know for the second row that 2 billion bytes times 8 is 16 billion bits.

On the third row, we could take our megabyte value and multiply it by a million to give us 40 billion bytes.

And on the last row, 12 million bits divided by 8 gives us 1,500,000 bytes.

That value is the same as 1.5 megabytes, which is also then the equivalent of 0.0015 gigabytes.

And lastly, we had to convert Jacob's playlist size, which was 48 billion, 800 million bits.

We had to convert that to gigabytes.

That would be the easier way to compare it to the storage, which was 5 gigabytes.

To convert that playlist size from bits into bytes, we'd first have to divide that value by 8.

And then to convert that value into gigabytes, we would have to divide it by 1 billion, giving us a result of 6.1 gigabytes.

Therefore, we can see that Jacob's playlist size is larger than the available storage on the phone.

He won't be able to download it.
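
The playlist check can be sketched in a few lines of Python. The variable names are assumptions for the example, and the numbers are the ones from the task:

```python
BITS_PER_BYTE = 8
BYTES_PER_GIGABYTE = 1_000_000_000

playlist_bits = 48_800_000_000
free_storage_gigabytes = 5

# Step 1: bits to bytes. Step 2: bytes to gigabytes.
playlist_bytes = playlist_bits / BITS_PER_BYTE
playlist_gigabytes = playlist_bytes / BYTES_PER_GIGABYTE

playlist_fits = playlist_gigabytes <= free_storage_gigabytes

print(playlist_gigabytes)  # 6.1
print(playlist_fits)       # False: 6.1 gigabytes is more than 5 gigabytes
```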

Let's try and apply some of this on the second learning cycle where we're going to calculate and compare text file sizes.

If we're going to be able to represent text on a computer, we're going to need either to use ASCII or Unicode, because we know those standards can be used to encode text characters.

ASCII uses 8 bits to represent each character, whereas Unicode uses either 16 or 32 bits, depending on the version being used.

Unicode is able to represent a wider range of characters than ASCII, but this has the effect of increasing the storage required for identical characters.

Let's consider this with an example.

Consider I've got some text: "ASCII and Unicode encode text."

If we count up the number of characters, we know that 30 characters are used to represent that sentence, including the spaces and the full stop.

If we use ASCII to encode the message, each character is going to require 8 bits for its representation.

Therefore, the total file size to store this text would be 8 times 30, which is 240 bits, or if we divide that by 8, 30 bytes.

Let's just think about that for a moment as well.

Now, Ayesha has spotted that if 1 byte is 8 bits, and if ASCII uses 8 bits per character, then surely every character is the equivalent of 1 byte of file storage.

We don't even have to do any multiplication.

If we count the number of characters, in this case 30, we know that each character is going to be 1 byte, so that sentence and all of those characters there are also going to take 30 bytes of storage.

So let's check you've understood that.

In bytes, what is the file size of an ASCII text file with the text "Snap!" in it? Well done! It's 5.

There are 5 characters, including the exclamation mark.

Each character is 1 byte, therefore 5 characters gives us 5 bytes.
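
Here is a minimal sketch of that ASCII calculation, assuming 8 bits per character as in this lesson. The function names are made up for the example. The last two lines cross-check the answer with Python's built-in ascii codec, which also stores one byte per character:

```python
ASCII_BITS_PER_CHARACTER = 8  # the lesson's assumption for ASCII


def ascii_size_bits(text):
    """Multiply the number of characters by 8 bits each."""
    return len(text) * ASCII_BITS_PER_CHARACTER


def ascii_size_bytes(text):
    """Divide the bit count by 8, which is the same as counting characters."""
    return ascii_size_bits(text) // 8


sentence = "ASCII and Unicode encode text."

print(len(sentence))               # 30 characters
print(ascii_size_bits(sentence))   # 240
print(ascii_size_bytes(sentence))  # 30
print(ascii_size_bytes("Snap!"))   # 5

# Cross-check: encode the text and count the bytes produced.
print(len(sentence.encode("ascii")))  # 30
print(len("Snap!".encode("ascii")))   # 5
```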

So let's look at that same sentence again.

If 32-bit Unicode encoding is used, each character requires 32 bits for its representation.

Therefore, the total file size to store this text would be 32 multiplied by 30 to give us 960 bits, and if we divide that by 8, that's 120 bytes.

Ayesha spotted that 32-bit Unicode requires 4 bytes per character.

She says she can count just the characters and multiply by 4 to find the file size.

There's no need to work out the number of bits, so 4 times 30 characters gives us 120 bytes for 32-bit Unicode representation.

Let's think about this same example for the encoding, but this time with 16-bit Unicode.

In bytes, can you tell me what is the file size of a 16-bit Unicode text file with the text "Snap!" in it? Well done! It's 10.

There are 5 characters, including the exclamation mark, and if each character is using 16 bits, that means it's using 2 bytes.

So if we take the 5 characters, multiply it by 2, that gives us 10 bytes.
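
The same idea works for any encoding where every character uses a fixed number of bits. This sketch uses a made-up function, fixed_width_size_bytes. As a cross-check it assumes Python's utf-32-le and utf-16-le codecs are reasonable stand-ins for the lesson's 32-bit and 16-bit Unicode, which holds for simple characters like these:

```python
def fixed_width_size_bytes(text, bits_per_character):
    """File size in bytes when every character uses the same number of bits."""
    total_bits = len(text) * bits_per_character
    return total_bits // 8


sentence = "ASCII and Unicode encode text."

print(fixed_width_size_bytes(sentence, 32))  # 120
print(fixed_width_size_bytes("Snap!", 16))   # 10

# Cross-check by encoding the text and counting the bytes produced.
print(len(sentence.encode("utf-32-le")))  # 120
print(len("Snap!".encode("utf-16-le")))   # 10
```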

ASCII encoding results in smaller file sizes for text files.

Unicode encoding, in contrast, results in larger file sizes for text files.

ASCII encoding has a smaller character set, so can only represent a limited number of characters, whereas Unicode encoding has a larger character set, so is able to represent a wider range of characters.

Let's explore this in more detail.

I'm going to give you a table, and I want you to complete that table to show how the file sizes of text files change when different encoding methods are used.

Once you've done that, I'm going to give you a scenario where I want you to compare different ways of encoding data.

Now imagine you've got a low-cost smartwatch which is being developed.

The developers want to include an English dictionary on it.

First thing I want you to consider is the dictionary has 126 million characters in the file itself.

I want you to calculate for me the file size of the dictionary in megabytes if we use ASCII encoding.

Then, to compare this against other methods, for the second part of this, I want you to also work out the file size of the dictionary in megabytes if a 16-bit Unicode encoding scheme is used.

By comparing those two encoding methods, you should then be able to judge what the effect is of using those.

Now, for the third part, the amount of storage the smartwatch has affects the final cost.

I want you to explain why encoding the dictionary using ASCII is the best choice in this particular scenario.

Once you've done that, I then want you to consider what would happen if we need to add to that dictionary.

The creators of the smartwatch would like to include an appendix to the dictionary providing definitions for different emojis.

Explain why ASCII would no longer be suitable.

And for the last question, the developers decide to use 32-bit Unicode.

How much additional storage is needed compared to ASCII? Describe what this will mean for the cost of the smartwatch.

Pause the video and have a go at all of those tasks.

For the first task, there was a lot of working out.

But those values that you can see on screen are the correct answers.

And for the second part of task B, remember that our low-cost smartwatch was being developed and it was going to have an English dictionary available on it.

If we used ASCII encoding, we know that one character is the equivalent of one byte of storage.

Therefore, 126 million characters would be 126 million bytes.

To convert that to megabytes, we divide by a million so we know that that dictionary file encoded in ASCII is 126 megabytes.

But if instead we used 16-bit Unicode encoding, we remember that one character is two bytes.

Therefore, 126 million characters would be 252 million bytes.

Again, dividing by a million, we can convert that number of bytes into 252 megabytes.
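
Those two dictionary calculations can be sketched like this. The function file_size_megabytes is a made-up helper, and the bytes-per-character values are the ones used in this lesson:

```python
BYTES_PER_MEGABYTE = 1_000_000


def file_size_megabytes(character_count, bytes_per_character):
    """Characters times bytes per character, then bytes to megabytes."""
    total_bytes = character_count * bytes_per_character
    return total_bytes / BYTES_PER_MEGABYTE


dictionary_characters = 126_000_000

print(file_size_megabytes(dictionary_characters, 1))  # 126.0 (ASCII)
print(file_size_megabytes(dictionary_characters, 2))  # 252.0 (16-bit Unicode)
```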

For part C, we're considering the cost of the smartwatch.

If we use ASCII encoding, because ASCII uses fewer bits per character than Unicode, we're going to need less storage on the smartwatch.

In this case, it's the cheaper option because we won't need as much.

So, whilst for storage it's better to use ASCII because we need less of it, it doesn't give us as many options in the characters that we can represent as Unicode does, which is important for part D.

The creators of the smartwatch want to add in emojis to the dictionary.

ASCII, as we know, is limited to 256 characters.

Emojis are not included in this, so Unicode would provide more characters and would be the better choice because we'd be able to represent emojis.
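
We can see this limit in practice with a small sketch. The helper can_encode_as_ascii is made up for the example. Note that Python's built-in ascii codec only covers 128 characters, which is narrower than the 256 mentioned here, but the result for emojis is the same either way:

```python
def can_encode_as_ascii(text):
    """Return True if every character in the text has an ASCII code."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


grinning_face = "\U0001F600"  # an emoji, written as a Unicode escape

print(can_encode_as_ascii("dictionary"))   # True
print(can_encode_as_ascii(grinning_face))  # False

# A Unicode encoding can store it; utf-32-le uses four bytes for the character.
print(len(grinning_face.encode("utf-32-le")))  # 4
```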

And lastly, we need to compare against 32-bit Unicode.

And how much additional storage would be needed compared to ASCII? Well, ASCII uses one byte per character, whereas 32-bit Unicode uses four bytes per character.

ASCII requires 126 megabytes, whereas 32-bit Unicode requires 504 megabytes.

Unicode, therefore, requires four times the amount of storage space, and this will increase the cost of the smartwatch.
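
To put a number on the additional storage, here is a short sketch using the two sizes above. The variable names are assumptions for the example:

```python
ascii_megabytes = 126        # one byte per character
unicode_32_megabytes = 504   # four bytes per character

additional_megabytes = unicode_32_megabytes - ascii_megabytes
times_larger = unicode_32_megabytes / ascii_megabytes

print(additional_megabytes)  # 378
print(times_larger)          # 4.0
```

So the 32-bit Unicode version needs 378 megabytes more than the ASCII version.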

Well done.

We covered a lot today and did lots of calculations.

Let's just see what we've learned.

Remember, a byte is equivalent to eight bits.

Number prefixes are used to summarise multiples of a number for easier comparison.

Different text encoding schemes use different amounts of bits to represent each character.

And file size measures the amount of binary data and is summarised using bytes and number prefixes.
