In this post I will explain what database normalization is and describe an example of how to develop a relational database schema at 3rd normal form. I recommend reading this post on a large screen.
What is a relational database?
A well designed relational database will consist of structured tables that are related to one another through the use of foreign keys. Each table maps directly to a real world entity and will consist of columns that correspond to that entities attributes. It will also contain a primary key (one or more columns that uniquely identify a row), and foreign keys to map a row in that table to rows in other tables.
What is normalization?
Normalization of a database table is done to reduce data redundancy and duplication of attribute data. By normalizing a table in to more manageable smaller tables, it becomes easier to manage changes to attribute data, by making the change in a single table, and having that change propagate through the other tables via foreign keys.
Establishing the first normal form (1NF)
Above is an invoice containing customer orders from a company that sells fish. You will notice that there is a large amount of duplication in many of the rows. First normal form is achieved when all attributes become atomic (contain only a single value), any repeating groups are eliminated and a primary key is identified to uniquely identify a single row of related data. The primary key can be made up of multiple attributes provided they combine to uniquely identify a given row in the table, this is called a composite primary key. In its current un-normalised form (UNF) the primary key is made up of two fields: Invoice Number and Product Code. It takes the combination of these two attributes to uniquely identify a row in the table.
First normal form dependency diagram
1NF (INVOICE_NUMBER, PRODUCT_CODE → CUSTOMER_NUMBER, CUSTOMER_NAME, INVOICE_DATE, LINE_NUMBER, COMMON_NAME, SCIENTIFIC_NAME, CATEGORY, QUANTITY, PRICE_CHARGED) PARTIAL DEPENDENCIES: (PRODUCT_CODE → COMMON_NAME, SCIENTIFIC_NAME) (INVOICE_NUMBER → LINE_NUMBER, QUANTITY, PRICE_CHARGED, INVOICE_DATE) TRANSITIVE DEPENDENCY (CUSTOMER_NUMBER → CUSTOMER_NAME)
What are transitive and partial dependencies?
A partial dependency exists when an attribute in a table relies on part or all of an existing primary key.
A transitive dependency exists when an attribute in a tables relies on one or more attributes that are not primary keys.
Establishing the second normal form (2NF)
At this stage you must separate the 1NF diagram to illustrate the separate tables based on the partial relationships that were identified, and identify the primary key for each table.
Establishing the third normal form (3NF)
Finally you need to extract any transitive dependencies that reside within a table that was created during 2NF. The existing partial relationship tables are retained and a new table is created to represent the transitive dependency. The determinants in the transitive relationship become the primary key in the new table.
This database of tables can now be considered normalised.
3NF Entity-relationship diagram