Loading...
Databases play a surprisingly important role in our daily lives.
For example, every time you log onto your school's IT system, the chances are you're interacting with the database.
Now if databases are so important, how do we extract data, manipulate them, or even create a database using code? Well, in this unit, that's exactly what we're going to learn how to do.
We're going to use a language designed specifically to interact with databases called Structured Query Language, or most people referred to it as SQL or SEQUEL.
Now, in this lesson, we're going to gain a foundation understanding of the structure of databases.
So moving forward, we can learn how to use SQL appropriately and pick the correct functions to solve the problems that we have.
So in this lesson, all you'll need is a computer and a web browser.
You'll also need a free bit of open source software called DB Browser for SQLite.
And you'll find the link to that on task three on your worksheet for this lesson.
Now, please ask your parents or carer to do this for you in advance of this lesson, whether or not you're downloading it for a PC or a Mac.
So once you've done all of that, if you can clear away any distractions that you might have.
And when you're ready, let's get started.
Okay, so in this lesson, you will describe what are databases.
We're going to be able to define some key terms associated with databases.
And then we'll look at what flat file databases are and relational databases, and compare the two.
Okay, so our first task for this lesson is for you to get your teeth into a real database, okay? Which is to go to task one on your worksheet and search for vehicle information on the DVLA website.
So, I'd like you straight away to pause this video, head over to your worksheet, and you'll find a link to that website and some tasks for you to complete on your worksheet, okay? So pause the video and unpause it when you've done that.
Okay, so how do you get on with that task? Now, there are some questions on that worksheet that I wanted you to answer.
So let's go through what the answers might have been.
So first question is how do you think the vehicle data has been stored? Well, do you think that maybe it was in a database? Do you think maybe it was in a table? Or do you think maybe it was all just in a structure that is easy to search? Well, all those answers would have been correct, because the answer was it's stored in a database.
All of that data that you searched for was retrieved from a database, and probably which is from a table or multiple tables, and therefore, it's in a structures easy to search.
So if you had any of those three answers and they would have been broadly correct answers, okay? And hopefully later on in this lesson, you'll fully understand what that means and the structure of databases as well.
Now, the next question was, how many vehicles do you think that DVLA have in their records? Okay, so do you think it's every registered car in the country? Well, it is, every, it absolutely is every registered car in the country, but the question is, how many do you think that is? Do you think it's in the millions, maybe? Well, the actual answer is at the time of recording this, was 37.
9 million cars.
So that database that you searched there was a collection of 37.
9 million pieces of data that you then you searched for and it was able to retrieve and display it to you, okay? So there's some important terms that we're learning about already.
So, when large amounts of data need to be collected and organised, it is often stored in a database.
Now, a database is a structured set of data.
And it's structured in tables, that are therefore easy to search and update.
Okay, so here you can see, in the example that we looked at, here, you can see a single record of the database.
Now a record is a collection of data for one object, person, or thing.
And this case, you can see it's a record for a vehicle, okay? So I've highlighted on that image there.
You can see if we were searching for those cars, we could see the record is that complete set of data about that one individual car.
So you can see it's Volkswagen.
It was first registered in March, 2009.
Year of manufacture 2009.
And you can see cylinder capacity, CO2 emissions, fuel type with diesel, and et cetera, like that.
Okay, so that's a record.
Now we also use this term fields as well.
Now, fields are used in the database to provide category headings for each piece of data in the database.
So whilst the one complete piece of info set of data about one individual car was a record, the field are those category title.
So we have vehicle make, date of first registration, year of manufacture.
Those are all fields.
So we have already just explored some of those key terms. So, let's look at all the key terms we need to be aware of and spend a little bit more time exploring those as well.
So, here, you can see an image, which is a flat file database.
Now a flat file database is a database that contains a single table.
So all the data is stored in a single table, okay? Now it's called tblTracks.
So tbl is a short name for table.
So we can see here that this table holds a collection of records for a particular theme.
And in this case, you can see it's a collection of music tracks, okay? So as each table contains a collection of records as well.
So each record, much like the car example, contains data for one object, person, or place.
So in the example that we're looking at here.
In this example, each record is a single song track.
So we can see, we have four records here.
We have record number one, which you see it's TrackID is one, Float away, The Springs, pop.
That makes up a single record.
And the table we're looking at here has four examples of those records.
Now the table also has a number of fields.
And the fields are categories for each piece of data in the table.
So, again, like the car example, the fields like data registration, and things like that, and CO2 emissions.
In this example here, how many fields can we see? Well, the answer is four again.
So we've got TrackID, Title, Artists, and Genre.
And those are all the field titles that we've got there, okay, or the titles.
So, what we have there, you can see is the ones going horizontally, the rows make up the records, whereas this column headings, they're called fields.
Now really importantly about databases, a record in the table needs a unique identifier.
Now a primary key is used to give each record a unique code.
Now, the key thing about a primary key is that a primary key can never be repeated.
And therefore that ensures that each record in the table is unique.
Now this is really important when dealing with thousands or even millions of records, because if we're trying to find something from a database, we need to make sure we're finding the right thing.
So there has to be something unique about it.
Now, if we look at this particular example with tblTracks, TrackID is the only thing that could possibly be unique in this table.
So for example, if we look at genre, we can see pop being repeated more than once in the genre column.
Therefore, that is not unique.
We know that the artist is The Springs.
Well, there's only one, The Springs, in these four records that we see.
If The Springs ever recorded another song, and therefore, The Springs will be repeated and therefore that's not unique either.
And then finally, if we look at title, again, Float away is unique in this table, but there are songs out there where the same song name has been used by different artists.
Now it might be a completely different song, or it might be that they've made a different version of the same song.
So it would have the same song title, even though there's something else different about it.
So the only thing we've used that is TrackID by giving me a unique number there.
And you often see this, for example, if you ever purchased something online, you normally get an order ID, an order number, and that will be unique.
Or if you have a tried to parcel online, for example, you've got a unique code that you put in.
And when you type that into the database, it'll tell you all the information about the parcel and where it is in the country right now, okay? Now let's just do a quick key word recap.
So I'd like to pause this video now and see if you can work out the three words that we've talked about, the field, table, and records, where would they be on this data, on this labelled on this image that you can see on the screen there? So what word would fit at position A? What word would fit at position B? And what word would fit at position C? So pause the video now and see if you can identify them and unpause when you've done it.
Okay, so let's quickly go through the answers.
So let's start off with fields, will that position at position A, B, or C? So you can just shout that out to me.
So three, two, one, it was position B, okay? So fields was that at position B, 'cause you can see that there, the title, four columns, the TrackID, title, artists, and genre.
Okay, so the next one is table.
So was table at position A or C? Okay, so shout out to me.
Three, two, one, it was at position A, okay? So well done if you've got that.
And then finally, you don't shout this out for me because there's only one place it can be.
And therefore, the whole record that complete set of data about that one track is called a record.
So that's at up position C.
Okay, so now let's look at what we call flat file databases.
Now here is a flat file database.
Now we spoke about this earlier, that a flat file database is where all the data stored in a single table, okay? So you can see here, here's a flat file database that has been used to record data about the downloads of tracks from a music website, okay? Now, your task is to go to task two on your worksheet and highlight any potential issues that might be from storing data in this way.
Now, what obviously highlighting to you that flat file databases can cause problems. And you can see that from the title of this slide, it says an inefficient flat file database.
So if we're storing all the data about all the downloads that would be made from our organisation here, that we've got the tracks, we've got the artists, what genre is, we've got the download, the date and time that's been downloaded and who's downloaded it.
I'd like to spend a little bit of time looking through the data in this table and see if you can work out why this might be problematic or where are the inefficiencies in this database, okay? So I'd like you to pause the video now, and then if you can highlight the things that are, use the worksheet, highlight where you think the inefficiencies are and unpause when you're ready.
Okay, so how did you get on with that exercise? Were you able to identify some of the inefficiencies or inconsistencies or problems that a flat file database might cause? Now, one of the problems that I identified was that, and you might have noticed that as Sara Bibi's details were entered twice.
Now in a flat file database, her data would need to be entered twice, because she has downloaded two different tracks.
So her data details are entered in twice.
And that would lead to what we call data redundancy, which means that because we're repeating the exact same data twice, then that's more storage space than we need to use, and it can lead to problems later on when we're having to reenter that data.
And it's obviously, very time consuming as well, potentially.
Now, another example that you might seen of this data redundancy is the track Float away has been downloaded three times.
That means that all the data for Float away has to be entered three times.
So that means the TrackID, the title, the artist, the genre, that's all been entered in three times.
And again, that's more data than we need to be using in our database, okay? So like I said, that's an example of data redundancy.
Now, also having to enter data multiple times can mean that there's an increased risk of having inaccurate data.
Now you may have noticed with Float away, although it's entered three times, we have two different artists.
We have The Springs and The Spings.
Now we know that they're not separate artists, for example, a cover band doing The Springs Float away.
We know that it's the same, because we have that unique primary key.
So the TrackID is the same.
So we've got TrackID is one for all three.
So we know that they're same, but we've got a problem in that The Spings and The Springs are different.
So which one is correct? And because we don't know which one is correct, that can actually lead to like was inconsistent data, but lacks trust in the data as well, because now we don't know which one's correct.
And once we found out which one's correct, we then need to go and correct the whole database.
So every time there's an entry that says The Springs, we need to then correct it, which can obviously be problematic.
And we can avoid these problems too.
So this nicely brings us on to how we might avoid these problems by creating things called relational databases.
So, let's explore what is meant by a relational database.
Now, the most common model for a database is a relational model.
Now, a relational database contains more than one table.
Now, the data and the tables are linked using relationships, which is why it's called a relational database.
So let's explore some of those relationships.
Now you can have three relationships, you can have a one to one relationship, you can have a many to many relationship, and a one to many relationship.
So let's look at each one of those in turns to understand what is meant by those terms. Now, a one to one relationships means that one record in a table relates to a single record in another table.
Now a potential example of this might be that, you as a student in school might have in one table, student details, such as your first name and a surname, and another table, we might have your contact details, for example, so your address and telephone number.
Now one student would have one contact detail, and that contact detail relates to one student, okay? So that would be an example.
It might not be perfect.
For example, if you had a brother or sister in the school or a sibling or somebody else who have the same address as you, then that might become problematic, but it could in theory work if only one student lived at one address, okay.
And there was only you, one student at that address.
That would be a one to one relationship.
A one to many relationship means that one record in a table relates to multiple records in another table, okay? Like the songs table that we looked at before, it may well be that we have one person.
So one member can download many tracks.
Okay, So that'd be a one to many relationship.
And then many to many relationships means multiple records in a table might relate to multiple records in another table.
So again, think about the school scenario.
It may well be that one student can take many lessons.
So you have a table for students and a table for lessons.
So one student can take many lessons and a lesson contain many students, okay? So that will be a many to many relationship.
Now, the downloads database that we looked at before, it needs to be stored in a relational database.
So you can see on this diagram here, the data now is being split into three separate tables.
Now, keeping the database as a flat file database, as we discussed earlier, would cause issues with inconsistency and redundancy.
Now this is because each track can be downloaded multiple times.
So by splitting the data into three tables, we've got a track there.
Therefore, we're storing the data once in that track table.
So each track that's created by an artist, it's only ever recorded once in the table, okay? However, that can be downloaded many times or multiple times.
And the same with members.
So we will store a member once.
So Sarah Bibi's details would only appear on our database once on the members table, however, she can download multiple songs.
Okay, so using the language that I've used there, can you identify which one of the three relationships am I describing here, okay? So is it a one to one relationship? Is it a one to many relationship? Or is it a many to many relationship? Have you got an answer for that? So you can shout out on the screen for me three seconds.
So three, two, one, it was a many to many relationship.
And you can see that the way this is notated, actually on the diagram.
This is what you call crow's feet notation.
Now you don't have to worry too much about this, but it's called crow's foot, 'cause if you look on that diagram, you can see a dot and that link from the track's table and that links down to something that spreads out and it looks like a crow's foot.
That's why it's called crow's feet notation.
But those three kind of lines means many.
So if you see a one on one side, that's a one.
If you see the three dots, that means a many, so one to many relationship, okay? Now the data doesn't need to be repeated and relationships can be made using the source table's unique identifier.
So you'll notice that we've got download ID, TrackID, MemberID, date, and time.
We have a primary key in here.
So the download ID is the primary key, and that's unique to the download table.
And that cannot be repeated.
However, you'll notice that we have other numbers in here for TrackID and MemberID that are being repeated.
Now this is called a foreign key.
So when you link to a source table's primary key, you use a foreign key.
So the TrackID one, if we look at the first row here, we have the first record, download ID number one, links to TrackID number one and MemberID number one.
So whilst the TrackID, MemberID are not unique in this table, they are unique in another table, that parent table or the source table.
So number one TrackID relates to something unique in the track's table, okay? Now, a foreign key can be repeated because it's a link back to the primary key in the source table.
So using a foreign key, you can go to tblTracks table and find that linked track.
So let's explore what is TrackID number one.
Our TrackID number one, using the foreign key, you can go to tblTracks and find that the linked track is Float away by The Springs, and the genre is pop.
So, that data, Float away, The Springs, and pop, is never being repeated.
However, the TrackID, only that one digit there, number one, is being repeated in another table as a foreign key.
You also see the foreign key for the members table to find that who downloaded that track and at what time of day, okay? So we've got MemberID number one.
So that's a foreign key.
So let's go to the source key and find out who that was.
Well, the source table was the member's table.
So the MemberID number one is the primary key in the members table, and that was our Sarah Bibi, as you can see in her details that only been stored once in the members table, okay.
So the music database now has three tables, all linking together using these relationships.
So data isn't repeated, because it's been linked to the source table's primary key using that foreign key.
Okay, and hopefully, this diagram makes it a little bit more clear, okay? So just to clarify, the TrackID, that's unique in this table and the track, so for example, The Springs, and Float away, they're only ever going to appear once in the tracks table.
However, that song can be downloaded multiple times, but the only thing being repeated.
So the title, artist, and genre will never been repeated, but the TruckID is the only thing being repeated, because it's a foreign key.
And that makes a link between the tables.
Okay, so that brings us again nicely onto something called DBMS. Okay, so let's explore what is meant by DBMS. Now, DBMS is a database management system.
So we're going to be using DB Browser for SQL as your database management system for this unit.
Now a DBS allows you to define manipulate, retrieve, and manage data in a database.
Now, this is a free download.
So if you haven't downloaded it, you'll need to, for this lesson and the remainder of the unit.
And it is a free download whether or not whichever operating system that you're using.
But please do ask your parents or carers for permission before doing this, before installing this onto your computer, okay? So what we'd like you to do is I'd like to go ahead and use that using more of the downloads that we've also got for this unit, which is a database that we pre-populated for you about this song and downloads data.
So we're going to explore that.
So you need to download the software.
You also need to download the zip for this folder and extract it so that you have the database.
And I'm just going to demonstrate how you would then use this DBMS before you start doing that, okay? So let's head over to the software now so we can explore it together.
Okay, so I've downloaded DB Browser for SQLite, and have installed it on my machine.
And I've also downloaded the zip folder that comes with this lesson.
So once you've downloaded that if you can extract that, because inside there, there's music database in there.
Okay, and we need to use that.
So once you've opened up the software, it looks like this.
What you need to do is click on this Open Database at the top, okay? You can also access it by going to File and open database.
And I'm going to click on that.
It will open up a browser window for me.
And I'm going to navigate to that dbMusic.
db file that I extracted from this zip file that I downloaded.
Okay, so I'm going to click on Open.
And that will open up my database.
And you'll see in a second that there's all my files that I've got there.
So all the tables, sorry, that I've got there.
And you can see the structure I've got.
So we've got tblDownloads, tblMembers, and tblTracks.
Okay, now if we wanted to explore those in a little bit more detail, we can click on these little arrows on the left hand side of it.
And that would like show me a bit more detail about the fields that are stored that make up that table, sorry, okay.
It also tells us a little bit more about the type.
So the data type that we've got there.
Now, we're going to explore databases in a little bit more detail in a future lesson, but we can see that we've got integers and text fields there as well.
Okay, now, if we wanted to explore this table a bit more, we can't click on the table.
So highlight it and click on Modify Table.
And you can see a little bit more information that in a kind of user friendly way, but also allows us to see some of the code that made up the table.
So in this here, we've got the fields on the left hand side, we've got the data types there, but we've also got these little tick boxes.
So each one stands for not null, which means that you can't leave it blank.
PK stands for primary key.
And if you put your mouse over it, by the way, it does also tell you what it means.
AI means auto increments.
So you can do this on integer types, which means that you'll, next time you create a record, automatically that field will increment to the next number along.
So let's say we've got two downloads.
You may expect that to be ID number one, ID number two.
So if you make a new download, automatically, the download ID would change to three without you having to input that, okay? And then next one is unique.
So you take that to make sure that it is uniqueness.
Since it's primary key, it has to be unique, okay.
Now you don't need to do anything with this section here, it's just there so you can see it, okay.
And also if we have a look here again, we're going to explore SQL next lesson, but do have a look at the SQL that was used to make this table that was generated automatically, okay? Now, although it is code, I'll Imagine that you are able to just read through and pick out some key details from there as well, okay.
For example, you should be able to look at this and work out what the foreign keys are from looking at what the SQL is, okay? So, I'm just going to cancel that to make sure I don't save any changes to my database.
So the other thing we're going to require you to do in the next task in this lesson is to look and explore the data in this database.
So what I'd like to do is click on Browse Data.
You see the way it currently opens up by default on the database structure where my mouse is now.
But if click on Browse Data here, we can see that we've got our three tables, but it's showing me how many records are in each table.
So we can see tblDownloads has a 1,000 downloads.
So let's look at that one as well.
So we click on this dropdown and click on tblDownloads.
It will show us all the data in my database.
I can scroll down and see all these 1,000 records.
Now this is just a sample data set of a 1,000 records.
So just imagine for one second, when we looked how many million records are inside the DVLA website or database, sorry, that we looked at right at the beginning of this lesson.
So just think how many downloads there are, or download records there are in some of the more popular download music streaming services, for example, that you might have used.
So those databases are going to be absolutely massive.
So we can see this.
If you wanted to search for something, for example, we wanted to find out how many songs have been downloaded at 12 o'clock, let's say, noon.
So if I put 12, and it.
Yeah, so you have 12, it filters it and it'll search for which songs have been downloaded at that exact time.
So I can get rid of that filter.
It should show me everything in the database again.
Okay, so that's that.
So your next task, which I'm going to explain a second requires you to just explore this software, explore the database and answer the questions on your worksheet.
So heading over to the slides now, what I'd like you to do is complete task three on your worksheet, which will allow to explore this database and familiarise yourself with this new application, okay? So I'd like you to pause this video now, complete task three on your worksheet, which will allow you to explore the database and familiarise yourself with the new application.
So answer the questions, have fun with it, take your time, and I'll be here when you get back.
Okay, so the final task of this lesson is for you to look at what are the data that's being repeated.
Now we try to avoid some of the problems that we had in our database by making it from a flat file database in a relational database.
So we did reduce quite a lot of that issue that we had with data redundancy, but I'd like you to take another look at the music database and see where you can see some other repetition of data that might cause redundancy, okay.
and what could you do to reduce this repetition.
So pause the video, go back to your database and see if you can identify those places.
Okay, so, were you able to spot where there is still some repetition of data in this database? Okay, so let's go to the answers.
So where can you still see repetition? Well, you'll notice that genre and artists are repeated in the tracks table, because genre, for example, pop, there are many pop songs and they're all in that same table.
So pop has been repeated lots of times.
And the same with artist as well.
So artist is also being repeated.
So, The Springs, for example, released more than one track.
So, therefore, their data is being repeated lots of times in the tracks table, okay.
So the next question is, how can we reduce the chance of repetition? Well, all we need to do is remove them and separate that data into separate tables.
So you create two new tables, for you to have a new table for genres and a new table for artists.
And all you're doing that tracks table is put foreign keys in, that would relate to the primary keys in the genre table and the artist's table, okay.
Okay, so that's all for this lesson.
And I really hope that you've enjoyed getting to grips with some of the database theory and the concepts behind what makes the difference between a relational database and a flat file database, and why we use relational databases.
Now, we also got a small glimpse of some SQL.
So from lesson two to lesson five of this unit, that's what we're going to be learning how to use.
We're going to be learning how to use SQL to manipulate data and extract data from databases, okay? So we'd love to see some of the work that you've done in this lesson.
So if you'd like to share that with us, please do.
And please ask your parents or carer to share your work on Instagram, Facebook, or Twitter, tagging @OakNational and using the hashtag LearnwithOak.
Okay, so I'm looking forward to seeing you next lesson where we're going to get to grips with some SQL.
So I'll see you then.