Hi, my name's Mr. Davidson, and I'm so excited to be learning with you today.

Today's lesson is called Text File Size Calculation from the unit, Representation of Text, Images, and Sound.

By the end of today's lesson, you are going to be able to calculate and compare file sizes of text data when different encoding methods are used.

Today, our keywords are byte, which is a group of eight bits; number prefix, which is a word used in front of a number to represent a multiple of that number; encoding, which is putting a sequence of characters into an agreed format; and file size, a measure of how much binary data a file contains.

Today, we've got two learning cycles, so let's get on with the first, compare quantities of data.

When we're talking about computer data, we know that it's going to be stored as bits, and that bits are single binary digits with the value of either zero or one.

Now, bits are used to represent all data in computers, and data represented in binary can result in large sequences of bits, like the one we've got on screen.

Because computers deal with large numbers of bits, we find it easier to group them into smaller sections, which makes them easier to read and easier to compare.

These groups are the units used to measure data storage.

The most common grouping is eight bits, and we refer to that as a byte.

A sequence of 32 bits could therefore be counted as four bytes.

If we count in groups of 8 along our 32 bits, we can see that we've got four bytes in total.

So let's just check that you've understood that.

Imagine I've got a binary sequence that is 16 bits long.

How many bytes is this?

Well done.

It's two bytes.

16 bits divided by 8 gives us 2 bytes.
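If it helps to see that division written out, here is a minimal Python sketch. The name `bits_to_bytes` is made up for this illustration and isn't part of the lesson.

```python
BITS_PER_BYTE = 8  # one byte is a group of eight bits


def bits_to_bytes(bit_count):
    """Return how many bytes a given number of bits makes up."""
    return bit_count / BITS_PER_BYTE


print(bits_to_bytes(32))  # 4.0
print(bits_to_bytes(16))  # 2.0
```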

Unfortunately, sequences of data can become too large to express meaningfully even if we use bytes.

And let's think of this as an example.

Jacob says he's got a homework file stored on his computer that is 3,290,000 bytes long, and he's wondering, is that a lot of data?

And what Jacob might do is ask someone else to compare.

He asks Sam, and Sam replies that he has a photo on his phone, and that photo is reporting that it's 4.21 megabytes.

So the units are different and the comparison is not immediately straightforward.

Sam isn't sure whether Jacob's file is larger, or if his photo is larger.

Number prefixes help us express multiples of a number, and this makes it easier to understand and compare sizes.

The number prefixes we use a lot in computing are kilo, mega, giga, tera, and peta.

And you can see there that they're multiples of a thousand.

So kilo is a thousand of something, mega is a million, and so on.

In computing, bytes are summarised in multiples of a thousand using those same number prefixes.

That would make one kilobyte a thousand bytes, one megabyte a million bytes, one gigabyte a billion bytes, one terabyte a trillion bytes, and one petabyte a quadrillion bytes.
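One way to picture those prefixes is as a lookup table. This is only a sketch; the name `PREFIX_MULTIPLIERS` is invented for the example.

```python
# Each prefix is one thousand times larger than the one before it.
PREFIX_MULTIPLIERS = {
    "kilo": 1_000,
    "mega": 1_000_000,
    "giga": 1_000_000_000,
    "tera": 1_000_000_000_000,
    "peta": 1_000_000_000_000_000,
}

for prefix, multiplier in PREFIX_MULTIPLIERS.items():
    print(f"1 {prefix}byte = {multiplier:,} bytes")
```

Writing the numbers out like this shows how many zeroes each prefix is hiding.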

We can see, by using a number prefix, we can hide the fact that a number has so many zeroes.

Now, before we're able to compare, we've got to remember what those number prefixes are in order.

So the following quantities of bytes are in ascending size order, which means it goes from the smallest to the largest.

Fill in the gaps of the ones that are missing.

Our correct answers are megabyte, and the second answer is terabyte.

So in ascending order, the number prefixes applied to bytes are kilobyte, megabyte, gigabyte, terabyte, petabyte.

Let's try and apply that to what Jacob and Sam were discussing before.

So with these two measurements, we've got to get them to the same number prefix.

Once those prefixes are the same, comparison is a lot easier.

Converting to megabytes firstly would make it easier for Jacob to compare his data with Sam's.

He can convert his 3,290,000 bytes into megabytes.

To do that, we need to remember that one megabyte is one million bytes, so that's six zeroes on the end.

And if we take that value, divided by one million, we can convert 3,290,000 bytes into an equivalent of 3.29 megabytes.

Those numbers aren't different, they're just expressed in a different way to make comparison easier.

And we can see that's 3.29 megabytes, which is almost as large as a 4.21 megabyte photo.
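Jacob's conversion and the comparison with Sam's photo could be sketched in Python like this. The variable names are made up for the illustration.

```python
BYTES_PER_MEGABYTE = 1_000_000  # one megabyte is one million bytes

jacob_file_bytes = 3_290_000
sam_photo_megabytes = 4.21

# Divide by one million to move from bytes to megabytes.
jacob_file_megabytes = jacob_file_bytes / BYTES_PER_MEGABYTE

print(jacob_file_megabytes)                        # 3.29
print(jacob_file_megabytes < sam_photo_megabytes)  # True
```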

We could have done this a different way as well.

Jacob could have converted to kilobytes first and then to megabytes, making the maths that little bit easier.

We know that one kilobyte is a thousand bytes.

So if we take Jacob's homework file, 3,290,000 bytes, and divide that by a thousand, we'd be able to convert it firstly to 3,290 kilobytes.

We could have then divided by a thousand again to get to our 3.29 megabytes.

More steps, but we are dealing with easier calculations each time.
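The two-step route can be sketched the same way, dividing by a thousand each time. Again, the variable names are only illustrative.

```python
jacob_file_bytes = 3_290_000

# Step 1: bytes to kilobytes.
jacob_file_kilobytes = jacob_file_bytes / 1_000
# Step 2: kilobytes to megabytes.
jacob_file_megabytes = jacob_file_kilobytes / 1_000

print(jacob_file_kilobytes)  # 3290.0
print(jacob_file_megabytes)  # 3.29
```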

There's also a third, entirely different way we could have done it.

This time, Sam could also have converted his megabyte value of the photo into bytes to compare the sizes.

As long as both people are working in the same units, our comparison will be straightforward.

So again, we know that one megabyte is one million bytes.

Sam's photo size is 4.21 megabytes.

So if you multiply that by one million, we know that Sam's file is 4,210,000 bytes.

So we would have our two values for comparison.
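Going the other way, from megabytes to bytes, might look like this in Python. The `round` call is an assumption of this sketch; it tidies up any tiny decimal error so we get a whole number of bytes.

```python
BYTES_PER_MEGABYTE = 1_000_000

sam_photo_megabytes = 4.21
jacob_file_bytes = 3_290_000

# Multiply by one million to move from megabytes to bytes.
sam_photo_bytes = round(sam_photo_megabytes * BYTES_PER_MEGABYTE)

print(sam_photo_bytes)                     # 4210000
print(sam_photo_bytes > jacob_file_bytes)  # True
```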

However, larger numbers are more difficult to work with.

It's not as easy to compare two sets of large numbers.

Right, you are going to practise some of that now.

For the first task, I want you to use either the less than, the greater than, or the equal sign to make each of these four statements correct.

Then I want you to complete the table to compare the total amount of bytes and the different number prefixes.

Lastly, I want you to help out Jacob and Sam.

Jacob is saying he wants to download a playlist onto his phone, and it's saying that the file size of that playlist is 48,800,000,000 bits.

Sam is helping him out and is saying that the phone says it has five gigabytes of free storage.

So Sam thinks there should be enough storage for Jacob's playlist.

I want you to do some calculations and check to see if Sam is correct.

Explain your answer by comparing the quantity of data of the playlist with the amount of data storage that is available.

Pause the video and have a go now.

Well done.

There was a lot of calculation to do there.

So in our first task, we had to use either the less than, the greater than, or the equal sign to make each of the statements correct.

Firstly, one byte is less than 10 bits because we know that one byte is actually eight bits, and 8 bits is less than 10.

Two bytes, which is two groups of eight, and 16 bits means that the second statement needs an equal sign.

For the third statement, 3 bytes would be 24 bits, three times eight, so we know that that's greater than 20 bits.

And lastly, 8 bytes is 64 bits.

And again, that needs a greater than sign because that's greater than 63.
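Those four comparisons can be checked with a short Python sketch. The function name `compare_bytes_to_bits` is made up for this example.

```python
def compare_bytes_to_bits(byte_count, bit_count):
    """Return '<', '=' or '>' when comparing a byte count with a bit count."""
    bits_in_bytes = byte_count * 8  # put both values into bits first
    if bits_in_bytes < bit_count:
        return "<"
    if bits_in_bytes > bit_count:
        return ">"
    return "="


for byte_count, bit_count in [(1, 10), (2, 16), (3, 20), (8, 63)]:
    sign = compare_bytes_to_bits(byte_count, bit_count)
    print(f"{byte_count} bytes {sign} {bit_count} bits")
```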

For our table, on the first row, we know that one million bytes is one megabyte, so we put a one in the first row.

We know for the second row that two billion bytes times eight is 16 billion bits.

On the third row, we could take our megabyte value and multiply it by a million to give us 40 billion bytes.

And on the last row, 12 million bits divided by 8 gives us 1,500,000 bytes.

That value is the same as 1.5 megabytes, which is also then the equivalent of 0.0015 gigabytes.
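The last row of the table can be worked through step by step in Python. This is just a sketch with illustrative variable names.

```python
row_bits = 12_000_000

row_bytes = row_bits // 8                  # bits to bytes
row_megabytes = row_bytes / 1_000_000      # bytes to megabytes
row_gigabytes = row_bytes / 1_000_000_000  # bytes to gigabytes

print(row_bytes)      # 1500000
print(row_megabytes)  # 1.5
print(row_gigabytes)  # 0.0015
```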

And lastly, we had to convert Jacob's playlist size, which was 48,800,000,000 bits, to gigabytes.

That would be the easier way to compare it to the storage, which was five gigabytes.

To convert that playlist size into gigabytes, we'd first have to divide that value by eight to get the number of bytes.

And then to convert that value into gigabytes, we would have to divide it by one billion, giving us a result of 6.1 gigabytes.

Therefore, we can see that Jacob's playlist size is larger than the available storage on the phone.

He won't be able to download it.
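Here is a small Python sketch of that check, using the numbers from the task. The variable names are only for illustration.

```python
playlist_bits = 48_800_000_000
free_storage_gigabytes = 5

playlist_bytes = playlist_bits // 8                  # bits to bytes
playlist_gigabytes = playlist_bytes / 1_000_000_000  # bytes to gigabytes

print(playlist_gigabytes)                            # 6.1
print(playlist_gigabytes <= free_storage_gigabytes)  # False, so it won't fit
```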

Let's try and apply some of this on the second learning cycle where we're going to calculate and compare text file sizes.

Either ASCII or Unicode standards can be used to encode text characters.

ASCII uses seven bits to represent each character, but in comparison, Unicode uses either 16 or 32 bits depending on the version being used.

We know that we use Unicode because it's able to represent a wider range of characters than ASCII, but this has the effect of increasing the storage required for identical characters.

Let's try an example.

I'm gonna consider some text that needs to be stored in a text file.

And the text is the sentence, "ASCII and Unicode encode text."

In that sentence, there are 30 characters to represent, including the spaces and the full stop.

If we use ASCII to encode that sentence, each character is going to require seven bits for its representation.

Therefore, the total file size to store this text would be 7 times 30, which is 210 bits, or 26.25 bytes if we divide 210 by 8.
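The same working can be sketched in Python, assuming the lesson's model of seven bits per ASCII character. Python's `len` counts every character, including the spaces and the full stop.

```python
ASCII_BITS_PER_CHARACTER = 7  # the lesson's model of ASCII

text = "ASCII and Unicode encode text."

character_count = len(text)
total_bits = character_count * ASCII_BITS_PER_CHARACTER
total_bytes = total_bits / 8

print(character_count)  # 30
print(total_bits)       # 210
print(total_bytes)      # 26.25
```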

So let's check that you've understood that.

In bytes, what is the file size of an ASCII text file with the text "password" in it?

Well done, it's seven.

So there are eight characters in the word "password".

8 times 7 gives us 56 bits.

And if we divide 56 by 8, that tells us that the word "password" using ASCII would require seven bytes.

So let's look at that same sentence again.

If 32-bit Unicode encoding is used, each character requires 32 bits for its representation.

Therefore, the total file size to store this text would be 32 multiplied by 30 to give us 960 bits.

And if we divide that by 8, that's 120 bytes.

Aisha spotted that 32-bit Unicode requires four bytes per character.

She says she can count just the characters and multiply by four to find the file size.

There's no need to work out the number of bits.

So 4 times 30 characters gives us 120 bytes for 32-bit Unicode representation.
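Aisha's shortcut can be checked against the longer method with a quick Python sketch. The variable names are invented for the example.

```python
text = "ASCII and Unicode encode text."

# Long way: count the bits, then divide by eight.
size_via_bits = len(text) * 32 / 8
# Aisha's shortcut: four bytes per character.
size_via_shortcut = len(text) * 4

print(size_via_bits)      # 120.0
print(size_via_shortcut)  # 120
```

Both routes give the same answer because 32 bits is exactly four bytes.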

So let's use that same example text as before, but this time, let's apply it to 16-bit Unicode.

Can you tell me in bytes what is the file size of a 16-bit Unicode text file with the text "password" in it?

Well done, it's 16.

16-bit Unicode uses 2 bytes per character.

So if we've got eight characters, we multiply that by two to give us 16 bytes of storage.
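Both "password" answers can be reproduced with one small function. This is a sketch: `text_file_size_bytes` is a made-up name, and the last line assumes that Python's `utf-16-le` encoding is a fair stand-in for what the lesson calls 16-bit Unicode, which holds for plain letters like these.

```python
def text_file_size_bytes(text, bits_per_character):
    """File size in bytes when every character uses the same number of bits."""
    return len(text) * bits_per_character / 8


print(text_file_size_bytes("password", 7))   # 7.0  (ASCII, seven bits each)
print(text_file_size_bytes("password", 16))  # 16.0 (16-bit Unicode)

# Cross-check: encode the text for real and count the bytes produced.
print(len("password".encode("utf-16-le")))   # 16
```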

ASCII encoding results in smaller file sizes for text files.

Unicode encoding in contrast results in larger file sizes for text files.

ASCII encoding has a smaller character set so can only represent a limited number of characters, whereas Unicode encoding has a larger character set, so is able to represent a wide range of characters.

Let's put some of what we've learned into practice.

For this table, I want you to show how the file sizes of text files change when different encoding methods are used.

Once you've done that, I'm gonna give you a scenario where I want you to compare different ways of encoding data.

Now imagine you've got a low cost smartwatch, which is being developed.

The developers want to include an English dictionary on it.

First thing I want you to consider is the dictionary has 126 million characters in the file itself.

I want you to calculate for me the file size of the dictionary in megabytes if we use ASCII encoding.

Then to compare this against other methods, for the second part of this, I want you to also work out the file size of the dictionary in megabytes if a 16-bit Unicode encoding scheme is used.

By comparing those two encoding methods, you should then be able to judge what the effect is of using those.

Now for the third part, the amount of storage the smartwatch has affects the final cost.

I want you to explain why encoding the dictionary using ASCII is the best choice in this particular scenario.

Once you've done that, I then want you to consider what would happen if we need to add to that dictionary.

The creators of the smartwatch would like to include an appendix to the dictionary providing definitions for different emojis.

Explain why ASCII would no longer be suitable.

And for the last question, the developers decide to use 32-bit Unicode.

How much additional storage is needed compared to ASCII?

Describe what this will mean for the cost of the smartwatch.

Pause the video and have a go at all of those tasks.

How did you get on?

Have a look at the table values and compare them against your own answers.

And for question two of task B, remember that we were looking at a low-cost smartwatch being developed, with an English dictionary made available on it.

We know that with ASCII, one character is represented by seven bits, therefore 126 million characters, if we multiply by seven, gives us 882 million bits.

And if we divide that by eight, that's 110,250,000 bytes.

That's still quite difficult to comprehend, so we convert to megabytes, remembering that one megabyte is a million bytes.

And if we divide the two numbers, we can work out the dictionary, if represented in ASCII, would be 110.25 megabytes.

But if instead we used 16-bit Unicode encoding, we remember that one character is two bytes.

Therefore, 126 million characters would be 252 million bytes.

Again, dividing by a million, we can convert that number of bytes into 252 megabytes.
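The two dictionary calculations could be sketched in Python like this. The function name `size_in_megabytes` is an assumption made for the example.

```python
DICTIONARY_CHARACTERS = 126_000_000


def size_in_megabytes(character_count, bits_per_character):
    """Text file size in megabytes for a fixed number of bits per character."""
    total_bytes = character_count * bits_per_character / 8
    return total_bytes / 1_000_000  # one megabyte is one million bytes


print(size_in_megabytes(DICTIONARY_CHARACTERS, 7))   # 110.25 (ASCII)
print(size_in_megabytes(DICTIONARY_CHARACTERS, 16))  # 252.0  (16-bit Unicode)
```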

For Part C, we're considering the cost of the smartwatch.

If we use ASCII encoding, because ASCII uses fewer bits per character than Unicode, we're going to need less storage on the smartwatch.

In this case, it's the cheaper option because we won't need as much.

So whilst ASCII is better for storage because we need less of it, it doesn't give us as many options as Unicode in the characters we can represent.

That is important for Part D, where the creators of the smartwatch want to add emojis to the dictionary.

The ASCII character set is limited to 128 characters.

Emojis aren't included in this.

So if we are going to add emojis to the dictionary, then we're going to need Unicode because Unicode provides more characters including emoji symbols.
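A quick way to see why an emoji falls outside ASCII is to look at its character code. In this sketch, `fits_in_ascii` is a made-up helper, and the grinning face emoji is just one example chosen for illustration.

```python
ASCII_CHARACTER_COUNT = 128  # ASCII codes run from 0 to 127


def fits_in_ascii(character):
    """True if the character's code is inside the 128-character ASCII set."""
    return ord(character) < ASCII_CHARACTER_COUNT


grinning_face = "\U0001F600"  # an example emoji, written as an escape code

print(fits_in_ascii("A"))            # True
print(fits_in_ascii(grinning_face))  # False
print(ord(grinning_face))            # 128512
```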

And lastly, we need to compare against 32-bit Unicode and how much additional storage would be needed compared to ASCII.

ASCII uses seven bits per character, whereas 32-bit Unicode uses four bytes per character.

ASCII will require 110.25 megabytes, whereas Unicode will require 504 megabytes.

Unicode therefore requires approximately 4.6 times the amount of storage space, and that's gonna add to the cost of the smartwatch.
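To see where those figures come from, here is a short Python sketch of the comparison. The variable names are illustrative only.

```python
DICTIONARY_CHARACTERS = 126_000_000

# ASCII: seven bits per character, so divide by eight to get bytes.
ascii_megabytes = DICTIONARY_CHARACTERS * 7 / 8 / 1_000_000
# 32-bit Unicode: four bytes per character.
unicode32_megabytes = DICTIONARY_CHARACTERS * 4 / 1_000_000

print(ascii_megabytes)                                  # 110.25
print(unicode32_megabytes)                              # 504.0
print(unicode32_megabytes - ascii_megabytes)            # 393.75 extra megabytes
print(round(unicode32_megabytes / ascii_megabytes, 2))  # 4.57
```

The subtraction answers the "how much additional storage" part of the question directly.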

Well done.

We covered a lot today and did lots of calculations.

Let's just see what we've learned.

Remember, a byte is equivalent to eight bits.

Number prefixes are used to summarise multiples of a number for easier comparison.

Different text encoding schemes use different numbers of bits to represent each character.

And file size measures the amount of binary data and is summarised using bytes and number prefixes.
