Sunday, April 11, 2010

Codds commandments/rules


The Relational Model was propounded by Dr. E.F. Codd of IBM in 1972. For a package to qualify as an RDBMS, compliance with the 12 rules is required. In reality, all the 12 rules don’t carry the same degree of importance and some very good RDBMS products which exist today cannot even claim to obey more than 8 or so of these rules.

RULE 0
Any truly relational database must be manageable entirely through its relational capabilities
This is a single overall rule which tries to cover all others in a single sentence. It simply means that a RDBMS must be Relational, Wholly Relational and nothing but Relational.

Rule 1 : The Information Rule
All information is explicitly and logically represented in exactly one way – by data values in tables
It means that if an item of data does not reside somewhere in a table in the database, then it does not exist and this should be  extended even to information such as tables, columns, views, column names, constraints and all other objects should be contained in a table-form

Rule 2: The rule of guaranteed access
Every item of data must be logically addressable by restoring to a combination of table name, primary key value and column name.
Though it is possible to retrieve individual data item in many different ways in a Relational/SQL environment, this rule means that any item can be retrieved by supplying the table name, the primary key values of the rows to easily search the row and the column name which is to be found or propogated.

Rule 3 : The systematic treatment of null values
It means that NULL values are representation of missing and inapplicable information. This support for null values must be consistent throughout the RDBMS and it should be independent of data type( a null value in a CHAR field must mean the same as null in an integer field)

Rule 4 : The database description rule
The description of the database is held and maintained using the same logical structures used to define the data, thus allowing users with appropriate authority to query such information in the same way and using the same language as they would query any other data in the database.






Rule 5 : The comprehensive sub-language rule
This means that the RDBMS must be completely manageable though it’s own sub-categories of SQL-language commands for enabling
            Data-Definition(DDL)
            Data-Manipulation(DML)
            Data Authorisation(DCL)
            Transaction boundaries(TCL)
            Retrieval of data(DQL)
            View definition(Describe)
            Integrity constraints(Data Definition/Data Alteration)

Rule 6: The view updating rule
All views that can be updated in theory, can also be updated by the system
It simply means that only simple views are updatable, Complex views are non-updatable,(but can be made updateable through the use of INSTEAD OF TRIGGERS in ORACLE).  A simple view is a view which is based on a single table, has non calculated columns,no group function, no group by clause, no distinct, no rownum, no rowid or any such pseudocolumns. Simple view are again updatable subject to non-violation of constraints at the base table level.

Rule 7 : The Insert and Update rule
A RDBMS has to be capable of Inserting, Updating and Deleting data as a relational set(that is more than one row in a single go). Many RDBMS fail to achieve this and hence fall-back to a single-record-at-a-time procedural technique when it comes to data manipulation.

Rule 8 : The Physical independence rule
Users access to the database, via application programs, must remain logically consistent whenever changes to the storage representation or access methods to the data, are changed.
Example, if an index is built and destroyed by the DBA on a table, any user should still retrieve the same data from that table, although a bit slowly.

Rule 9 : The Logical data independence rule
Application programs must be independent of changes made to the base tables, with minor changes in the corresponding queries or use of views(makes changes in the query of the view)

TAB1                                                              FRAG1                                   FRAG2
A         B         C         D                                 A         B                     A         C         D
1          A         C         E                                 1          A                     1          C         E
4          A         C         F                                  4          A                     4          C         F
6          B         D         G                                 6          B                     6          D         G
2          B         D         H                                 2          B                     2          D         H

It should be possible to split a table vertically into more than one fragment as long as such splitting preserves all the original data(is non loss) and maintain the primary key in each and every segment


FRAG1                                   FRAG2                                               TAB1
A         B                                 A         C         D                     A         B         C         D
1          A                                 1          C         E                     1          A         C         E
4          A                                 4          C         D                     4          A         C         D
6          B                                 6          D         G                     6          B         D         G        
2          B                                 2          D         H                     2          B         D         H

It should be possible to combine base tables into one by way of a non-loss join


Rule 10 :  INTEGRITY RULES
The relational model includes two general integrity rules
Integrity Rule 1 (Entity Integrity):
If the attribute A of a relation  R is a primary attribute of R, then A cannot accept null values

Integrity Rule 1 (Referential Integrity):
It is concerned with foreign key, that is, with attributes of a relation having domains that are those of the primary key of another/same relation.

Rule 11 : Distribution rule
A RDBMS must have distribution independence
This is one of the attractive aspects of RDBMS packages. It simply means that databases built on relational model are suited to client-server architecture.


Rule 12 : No Subversion rule :
If an RDBMS supports a lower level language that permits for example, row-at-a-time processing, then this language must not be able to bypass any integrity rules or constraints of the relational language.

Basic RDBMS TERMS

1) RELATION
A relationship is an association among several entities
eg:- A cusst_Acct relationship associates a customer with each account that she or he has.
Also, a table or an entity set is sometimes called a relation.
In both cases, a relation is a rectangular output consisting of row(tuples or records) and columns(attributes of fields)

2) REDUNDANCY
If same piece of information is stored in database for number of times the database is said to be redundant. We should check our database should not be redundant as it wastes disk space, reduced efficiency of database, require more processing time, and their are chances of inconsistency due to it in our database.
eg:-If we have to tables emp_details (contains details of employee) and Payroll(contains Payment details to employee), than if we include details of employee in payroll table, than it is said to be redundancy as same piece of information is repeated.

3) INCONSISTENCY
Inconsistency is various copies of the same data which is not consistent. Inconsistency occurs due to redundancy, so redundancy should be reduced.
eg:- If we have details of employee stored in emp_details and payroll table than while updating information we should check that both tables are updated or not, if we update the address of one employee in emp_details and same details is not updated in payroll table, than database is said to be in inconsistent state.

4) PROPAGATING UPDATES
Propagating updates ensures that changes made to records/data of one relation or tables, are automatically made to other tables or relations. This process is known as Propagating updates. Where the term "Updates" is used to cover all the operations of insertion, deletion and modification.
We can avoid inconsistency by using propagating update technique.

5)INSTANCES
The collection of information stored in the database at a particular moment in time is called an instances of the database.

6)SCHEMES
The overall design of the database is called the database scheme. Schemes are changed infrequently, if at all.

7)USERS
There are broadly three different types of database system users.

End Users
They are users who interact with the system by invoking one of the application programs. Thus, they are persons who use the information generated by a computer based system. Retrieval is the most common function for this class of user. Generally application user, data entry operators and report generators are come under this category.

Application Programmers
They are who prepare or code the application. Application programs operates on the data in all the usual ways: retrieving information, inserting new information, deleting or changing existing information.

Database Administrator
These are the highly skilled users responsible for the overall performance and maintenance of the database. They design the database schema, create the database, create users, grant and revoke rights from users and manage the database. Their responsibility includes recovery of database in case of failure and tuning the database for optimum efficiency.

D)KEYS concept in DBMS
A Key is a single attribute or combination of two or more attributes of an
entity set that is used to identify one or more instances of the set.

PRIMARY KEY:-A primary key is a field that uniquely identifies each record in a table. As it uniquely identify each entity, it cannot contain null value and duplicate value.

SUPER KEY :- If we add additional attributes to a primary key, the resulting combination would still uniquely identify an instance of the entity set. Such augmented keys are called superkey.
A primary key is therefore a minimum superkey.

CANDIDATE KEY:-A nominee's for primary key field are known as candidate key.

ALTERNATE KEY:-A candidate key that is not the primary key is called an Alternate key.

COMPOSITE KEY:- Creating a primary key are jointly from more than one attribute is known as composite key.

FOREIGN KEY:- Foreign key is a primary key of master table, which is reference in the current table, so it is known as foreign key in the current table. A foreign key is one or more columns whose value must exist in the primary key of another table or same table

Domain

A domain defines the possible values of an attribute. Domain Integrity rules govern these values. In a database system, the domain integrity is defined by:
The datatype and the length
The NULL value acceptance
The allowable values, through techniques like constraints or rules
The default value

For example, if you define the attribute of Age, of an Employee entity, is an integer, the value of every instance of that attribute must always be numeric and an integer. If you also define that this attribute must always be positive, the a negative value is forbidden. The value of this attribute being compulsory indicates that the attribute cannot be NULL. All of these characteristics form the domain integrity of this attribute.


Integrity
Attribute Constraints
There are two major Integrity rules/constraints
Entity Integrity Constraint: States that “Primary key cannot have NULL value”

Referential Integrity Constraint: States that “Foreign Key can be either a NULL value or should be Primary Key value of some relation.

Also
Semantic Integrity Constraint :General restrictions on data and changes to it. Example Salary must be greater than 20000


Domain Integrity Constraint : Each column contains same type of data thus when you select a data type for a particular domain then DBMS will not accept any value of other data type.


Database Constraints
They are constraints on the database that require relations to satisfy certain properties. Relations that satisfy all such constraints are legal relations.

The basic concept in a relational model is that of a relation(table or output of a Query). A relation can be viewed as a table which has the following properties :

1)It is column homogeneous. That is, in any given column of a table all items are of the same kind.
2)Each item is a simple number or a character string. That is, a table must be in the First Normal Form.
3)All the rows of a table are distinct
4)The ordering of rows within a table is immaterial
5)The columns in a table are assigned distinct names and the ordering of these columns is immaterial

Database Management System


It is a collection of programs that enables user to create and maintain a database. In other words it is general-purpose software that provides the users with the processes of defining, constructing and manipulating the database for various applications.
Database management systems frees the programmer from the need to worry about the organization and location of data. The primary goal of a DBMS is to provide an environment which is both convenient and efficient to use in retrieving and storing information.

DATABASE STORAGE HIERARCHY
1) DATABASE
A collection of data files integrated and organised into a simple comprehensive file system, which is arranged to minimize duplication of data and to provide convenient access to information within that system to satisfy a wide variety of user needs.

2) DATA FILES
Datafile is the file which actually physically keeps the data in secondary storage device. It represents the physical representation of the database.

3) DATA OBJECTS(Tables)
Objects such as tables, procedures, functions, etc, especially TABLES logically keep the data stored together. These objects may me spread across one or more physical data files. Tables or Relations are Entity Sets.

4) RECORDS
A collection of related items of data that are treated as a unit. It is the information about an entity.
eg:- An employee record would be collection of all fields of one employee.
Record is sometimes referred as tuple.

5) FIELD
Individual element of data is called Field.
eg:- Bank cheque consist of following field cheque no, date, payee, numeric amt, signature, bank,etc.
Field is sometimes referred as Data item Or column or attribute.


Introducing Relational Databases

Relational Model Paradigm
1)         In a DBMS, relation between two files is created and maintained programmatically. Hence the relation is temporary.
In a RDBMS, relation between the two tables can be created at the time of table creation or later on, and hence it is permanent

2)         In a DBMS, we have the concept of Fields, Records and Database Files
In a RDBMS, we have the concept of Column(attribute), Row(Tuple or Entity) and Table(Relation or Entity Class/Set or Applet)

3)         IN a DBMS, Network traffic is high, since most of the processing is done at the client or requestors side.
In a RDBMS, network traffic is low, since most of the processing is done at the Server side

4)         DBMS does not support the Client-Server architecture. It is a point-to-point communication between two machines
            Most of the RDBMSs support the Client-Server Architecture
Example MS Access is a RDBMS, which does not support the Client-Server Architecture.

5)         In a DBMS, we have file –level locking.
            In a RDBMS, we have table, as well as Row-level locking

6)         In a DBMS, there is no support for Distributed-Database Concept
            In a RDBMS, there is support for a Distributed-Database Concept

7)         In a DBMS. There is no Security of Data.
In a RDBMS, we have multiple levels of Data-Security like i)Logging in Security ii) Command Level Security iii)Object Level Security

8)         In a RDBMS, we have the concept of Data-Dictionary(or Meta-Data or Data about Data) implemented in the form of System Tables and Views.
                        In a DBMS, there is no Data-Dictionary concept.








Object Oriented DBMS Model


IRIS is an Object Oriented DBMS under development at Hewlett Packard. It supports object model and normalized data.

A class is called a type and represents a collection of objects that share the same properties. A method is called a function and objects belonging to the same type share common functions. Objects respond only to their functions. Objects can be arranged in a hierarchy and inherit properties and functions

Object-Relational Model
Object-relational database management systems (ORDBMSs) add new object storage capabilities to the relational systems at the core of modern information systems. By encapsulating methods with data structures, an ORDBMS server can execute complex analytical and data manipulation operations to search and transform multimedia and other complex objects.

Relational Algebra
Relational Algebra is a procedural language which specifies how to get the required information and how to build a relation(set) from one or more other relations(sets). It specifies operations to be performed on existing relations(sets). The basic operations are the traditional set operations : UNION, DIFFERENCE, INTERSECTION and CARTESIAN PRODUCT. The first three operations, except Cartesian product, require that the operands(sets or relations or tables) be compatible(structurally same number, order and data-type of attributes) for the operation.

Consider the following two Operands(sets or relations or tables) which are Union compatible.

P                                                                                 Q
ID                    NAME                                                            ID                    NAME
101                 Jones                                                 103                 Smith
103                 Smith                                                 104                 Loly
104                 Loly                                                     106                 Byron
107                 Even                                                   110                 Drew
110                 Drew
112                 Smith






UNION operation

Hence                            R=P U(Union)  Q
                                    ID                    Name 
                                    101                 Jones            
103                 Smith            
104                 Loly                
107                 Even                                                  
110                 Drew
112                 Smith
                                    106                 Byron

DIFFERENCE operation
Hence                            R=P-Q
                                    ID                    Name 
                                    101                 Jones            
107                 Even                                                  
112                 Smith
                                   

                                         R=Q-P
                                    ID                    Name
106                 Byron

INTERSECTION operation
     R=P           Q

ID                    Name
103                 Smith
104                 Loly
110                 Drew

CARTESIAN PRODUCT
P                                                                                 Q
ID                    NAME                                                            DEPTNAME
101                 Jones                                                 Accounts
103                 Smith                                                 Medical
104                 Loly                                                     IT
107                 Even                                                  

                                    R=P X Q
                        ID                    NAME                                    DEPTNAME
                        101                 Jones                         Accounts
                        101                 Jones                         Medical
101                 Jones                         IT
                        103                 Smith                         Accounts
103                 Smith                         Medical
103                 Smith                         IT
104                 Loly                             Accounts
104                 Loly                             Medical
104                 Loly                             IT
107                 Even                           Accounts
107                 Even                           Medical
107                 Even                           IT

Additional Relational Algebraic Operations
Besides the above 4 basic set operations, PROJECTION, SELECTION, JOIN and DIVISION.

Projection(Л)
The projection of a relation(table or set) is defined as a projection of all its tuples(rows) over some set of attributes, i.e., yields a vertical subset of the relation. The projection operation is used to either reduce the number of attributes in the relation under consideration or to reorder its attributes

PERSONAL                                                                          Projection
ID                    NAME                        AGE                            ID                    AGE
1                      A                     23                                1                      23
2                      B                     56                                2                      56
3                      C                     34                                3                      34                               
4                      D                     87                                4                      87
5                      E                     25                                5                      27

The projection operation can also be used to reduce the cardinality of the resultant relation, that is, due to deletion of duplicate rows, if any.

Selection(σ)
The selection operation yields a horizontal subset of a given relation, that is, it selects only some tuples(rows) of a given relation under consideration. To have a tuple(row) included in the resultant relation, the specified selection conditions or predicates must be satisfied.

PERSONAL                                                              Selection(Age<=50)
ID                    NAME                        AGE                            ID        NAME                        AGE
1                      A                     23                                1          A                     23
2                      B                     56                                3          C                     34
3                      C                     34                                5          E                     25                   
4                      D                     87                               
5                      E                     25                               






Join(∞)
It allows combining of two or more relations to form a single new relation. The tuples(rows) from the operand relations that participate in the join operation and contribute to the resultant relation are related. Thus, the join operation allows the processing of relationships existing between the operand relations.

DEPT                                                                                     EMP
Deptno           Dname                       Empno           Ename           Age     Deptno
10                    Accounts                   1                      A                     23        20
20                    Medical                      2                      B                     43        20
30                    IT                                 3                      C                     65        30
                                                            4                      D                     42        10
                                                            5                      E                     45        30
                                                            6                      F                      62        10
Considering the above two operand relations(tables), suppose we want to respond to the query “Get the Empno of all employees whose department name is Accounts”.

This requires first computing the Cartesian product of the DEPT and EMP relations. Let us name this Cartesian product as TEMP. This is followed by selecting those tuples(rows) of TEMP where the attribute Dname has the value “Accounts” and the value of the attribute Deptno of DEPT relation is equal to the value of the attribute Deptno of EMP relation. The required result is obtained by projecting these tuples on the attribute Empno. The operations are shown below :

Cartesian Product    :           TEMP=(DEPT      X      EMP)
Projection      Empno(σ Dname=’Accounts’ ^  DEPT.deptno=EMP.deptno(TEMP))

Division(+)
Consider the following relation(table) P:

            P
A                     B
a1                    b1
a1                    b2
a2                    b1
a3                    b1
a4                    b2
a5                    b1
a5                    b2





Scenario 1
Now, if the relation Q is
            Q
            B
            b1
            b2

Hence,                       R= P  ÷   Q
                                                A
                                                a1
                                                a5

Scenario 2
Now, if the relation Q is
            Q
            B
            b1

Hence,                       R= P  ÷   Q
                                                A
                                                a1
                                                a2
                                                a3
                                                a5

Scenario 3
Now, if the relation Q is
            Q
            B
            b1
            b2
            b3

Hence,                       R= P  ÷   Q
                                                A
                                                Empty

Scenario 4
Now, if the relation Q is
            Q
            B
Empty

Hence,                       R= P  ÷   Q
                                                A
                                                a1
                                                a2
                                                a3
                                                a4
                                                a5

In scenario 1, the result of dividing P by Q is the relation R and it has two tuples. For each tuple in R, its product with the tuples of Q must be in P. In our scenario, (a1,b1), (a1,b2),(a5,b1) and (a5,b2) must all be tuples in P.

In scenario 2, the Cartesian product of Q and R gives a resulting relation which is again a subset of P.

In scenario 3, since there are no tuples in P with a value b3 for the attribute B(that is selectionB=b3(P)=0), we have an empty relation R, which has a cardinality of zero

In scenario 4, the relation Q is empty. It is unusual to allow division by an empty relation. The resultant relation is projection of P on the attributes in P-Q.

Consider Q as set of properties. Each tuple in P represents an object with some given property. The resultant relation R is a set of entities/tuples that posses all of the properties specified in Q. The two entities a1 and a5 posses all the properties, b1 and b2. The other entities/tuples in P, a2, a3 and a4 only posses one, not both, of the properties.

The Division operation is useful when a query involves the phrase “for all the objects/tuples having all the specified properties”

Relational calculus simply provides a definition of a relation in terms of one or more other relations. In other words, it says what data is required, but not how it is actually retrieved. Hence its is non-procedural, as it is a query system wherein queries are expressed as variables and formulas on these variables.

Relation Operators

In  a relational operator is a  construct that tests some kind of relation between two entities . These include numerical equality (e.g., 5=5) and inequality (e.g., 4≥3). In programming languages that include a distinct boolean type in their type system, like java, these operators return true or false, depending on whether the conditional relationship between the two entities holds or not. In other languages such as C, relational operators return the integers 0 or 1.


Relational Operator                                     Symbol
EQ   Equal to                                                =
NE   Not equal to                                          <>  or #  or !=
GT   Greater than                             >
GE   Greater than or equal to                     >=
LT   Less than                                               <
LE   Less than or equal to                          <=

An  expression created using a relational operator forms what is known as a relational expression or a condition.

Database Design / Application Lifecycle
Before a database design is set into motion, planning is an essential stage. This is typically the responsibility of management within the organisation. Good planning will enable the design and implementation to be successfully completed with the desired results. A general definition of the desired system should also be drawn up at this stage. This would include information such as what the database application should be able to do, to what areas it will be applied, and who will be using it.

Three main factors that should be analysed in the planning are
What work will need to be done
The resources needed to complete the work
      How much it will all cost

Requirements Analysis and Collection
Information can be gathered using various techniques. Some of these are listed below:
Interviewing individuals
Observation
Examining documents
Using questionnaires
Using expert knowledge and experience from other design work

Interviewing many individuals can be time consuming but can result in quality feedback which can be used in the system design. Using questionnaires is a quicker and easier way to gain feedback from relevant people. It is hard to ensure that the answers people give are accurate though. Looking at the current documents and paperwork in circulation can be useful in determining Database Design

The database design stage will result in a database model that will hopefully support the organization’s goals. The two design approaches that can be used are Bottom Up or Top Down. The Top Down approach starts with a very general overview of the system and details are added in an iterative manner. Bottom Up design involves getting all the details first then constructing the overall design from the smaller more detailed parts.

The aim of a database design should be to represent all relevant data and the relationships that exist between data items. The database model should also allow for all necessary transactions on the data to be possible.

The following stages follow the database design stage:
DBMS Selection
Application Design
Prototyping
Implementation
Data Conversion and Loading (if data from an existing system needs to be put into the new database)
Testing
Operational Maintenance (follows the installation of the system)