MCS-023 : INTRODUCTION TO DATABASE MANAGEMENT SYSTEMS December, 2016
Q1.a) Compute the closure of the following set F of functional dependencies for relation schema R = {A, B, C, D, E}.
A -> BC
CD -> E
B -> D
E -> A
List the candidate keys for R.
Answer:
A -> BC and B -> D give A -> D; with A -> C this gives A -> CD, and CD -> E gives A -> E.
Therefore A -> ABCDE (A is a superkey).
E -> A and A -> ABCDE, so E -> ABCDE.
CD -> E and E -> ABCDE, so CD -> ABCDE.
B -> D gives BC -> CD, and CD -> ABCDE, so BC -> ABCDE.
Attribute closures:
A -> ABCDE
B -> BD
C -> C
D -> D
E -> ABCDE
AB -> ABCDE
AC -> ABCDE
AD -> ABCDE
AE -> ABCDE
BC -> ABCDE
BD -> BD
BE -> ABCDE
CD -> ABCDE
CE -> ABCDE
DE -> ABCDE
ABC -> ABCDE
ABD -> ABCDE
ABE -> ABCDE
ACD -> ABCDE
ACE -> ABCDE
ADE -> ABCDE
BCD -> ABCDE
BCE -> ABCDE
BDE -> ABCDE
CDE -> ABCDE
ABCD -> ABCDE
ABCE -> ABCDE
ABDE -> ABCDE
ACDE -> ABCDE
BCDE -> ABCDE
Therefore the candidate keys of R are A, E, BC, and CD.
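The closure computation above can be mechanized. The following is a minimal sketch (the function name `closure` and the FD encoding as (LHS, RHS) string pairs are my own) that computes attribute closures and enumerates the candidate keys by brute force:

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of a set of attributes under a list of (LHS, RHS) FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole LHS is in the closure, add the RHS attributes
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

F = [("A", "BC"), ("CD", "E"), ("B", "D"), ("E", "A")]
R = "ABCDE"

# Superkeys: attribute sets whose closure is all of R
superkeys = [set(c) for n in range(1, len(R) + 1)
             for c in combinations(R, n) if closure(c, F) == set(R)]
# Candidate keys: superkeys with no proper subset that is also a superkey
candidate_keys = [k for k in superkeys
                  if not any(s < k for s in superkeys)]

print(sorted("".join(sorted(k)) for k in candidate_keys))
# ['A', 'BC', 'CD', 'E']
```

Brute-force enumeration is fine here because R has only five attributes; for larger schemas one would prune supersets of known keys instead.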
(b) Justify the following statements :
(i) Relation must have a key.
(ii) Weak entities do not have their own key attributes.
Ans. i) A relation is defined as a set of tuples, and the elements of a set are distinct; hence no two tuples of a relation can be identical in every attribute. Consequently, the combination of all attributes always distinguishes each tuple, so at least one superkey (in the worst case, the entire set of attributes) must exist, from which a minimal candidate key can always be chosen. One candidate key is designated the primary key; as with any candidate key, it must contain unique values, must never be null, and must uniquely identify each record in the table.
ii) In a relational database, a weak entity is an entity that cannot be uniquely identified by its own attributes alone; it has at most a partial key (discriminator). To identify its instances uniquely, the partial key must be combined with the primary key of the owner (identifying) entity type, which is brought in as a foreign key. Hence weak entities, by definition, do not have key attributes of their own.
(c) Compare primary, secondary and clustering indexes. Which of these indexes are dense and which are not? How is implementation of clustering indexes performed?
Ans. Similarities
· Both index structures are implemented as separate first-class objects in the database: a table and its corresponding index (primary or secondary) exist as two separate structures.
· Both add a level of indirection: queries first look up the index, and use the result of this lookup to fetch the record it points to directly.
· Index blocks in both types of indexes keep their entries sorted; i.e., whatever the actual index entry is (typically a ⟨search-key, row-locator⟩ pair), the entries in an index block are always sorted on the index/search key.
Differences
Primary Index
· A primary index impacts the storage and organization of rows in data blocks. Data blocks are disk blocks that store the actual row data (multiple columns).
· Primary Index requires the rows in data blocks to be ordered on the index key.
· So apart from the index entries being themselves sorted in the index block, primary index also enforces an ordering of rows in the data blocks.
· In the corresponding figure in “Database System Implementation” by Garcia-Molina et al. (figure omitted here), the index entries in the index blocks contain pointers (row locators, in database terminology) to the corresponding rows in the data blocks, and each data block holds its rows in sorted order on the index key.
· A primary index can be created on both key and non-key columns; a primary index is not restricted to the primary key, though it is usually created on the primary key of the table.
· Since primary index alters the way (needs to keep the rows sorted) data is organized in a table, there can be at most 1 primary index on a given table.
Secondary Index
· Secondary Index does not have any impact on how the rows are actually organized in data blocks.
· They can be in any order. The only ordering is w.r.t the index key in index blocks.
· In the corresponding figure in “Database System Implementation” by Garcia-Molina et al. (figure omitted here), the index entries again contain pointers (row locators) to the corresponding rows in the data blocks, but the data blocks do not have their rows sorted on the index key.
Comparison
· Firstly, the user can define multiple secondary indexes, because they have no impact on the organization of rows in the table. But there can be only one primary index.
· Because a primary index does not have to be on the primary key, a primary index can also have duplicate index keys; duplicate keys are thus possible for both primary and secondary indexes.
o Obviously, if the primary index is created on the primary key, there cannot be duplicate index keys, since a primary key enforces a UNIQUE constraint.
· Both primary and secondary indexes can be used for point lookups and range queries. Range queries are expected to be faster with a primary index in both cases (unique and duplicate index keys). Point lookups on a NON-UNIQUE index are also expected to be faster with a primary index; but if the index is UNIQUE, a point lookup with a primary or a secondary index should ideally take the same amount of time, or at least the same amount of I/O.
o The reason is primary index forces the rows in data blocks to be ordered. So if the user is interested in finding rows for WHERE KEY>=20 AND KEY<=40, there are high chances of reading fewer disk blocks and thus less I/O. It could be possible that rows corresponding to these keys are in the same data block (even if the index is not unique).
o On the other hand, secondary index does not have any control over the organization of rows. So for the same example of finding all records between 20 and 40, it is possible that record(s) corresponding to each key is sitting in its own data block. This obviously implies that there will be more I/O and thus queries can be less efficient with secondary index.
o For point lookups with unique index keys, it does not really matter whether the index is primary or secondary: the result of the lookup is a row locator, and the database must follow it to reach the actual record in either case, so a single data-block I/O is needed.
· Because a primary index forces the rows in data blocks to be ordered, DML statements are less efficient: to keep rows sorted within a data block, INSERT/UPDATE will cause frequent row movement unless the user takes care that inserts arrive in sequential key order.
o Any DML that results in a row movement of a row within a data block will also require an update to corresponding primary index structure since the index entry now needs to be updated with new row locator (because the row was moved).
o Row movement requires an update to a secondary index structure as well, but the probability of an INSERT causing a row movement is relatively low for a secondary index, since the INSERT does not need to keep rows sorted within a data block. A row growing in size and having to be moved to an altogether different block is one of the few causes of row movement under a secondary index.
Dense vs. sparse, and clustering indexes
Because a primary (or clustering) index is built over a file that is ordered on the index field, it can be sparse: one index entry per data block (or per distinct value) suffices, since neighbouring records lie in the same or adjacent blocks. A secondary index, built over an unordered file, must be dense: it needs one entry for every record (or every distinct value, with a bucket of row locators). A clustering index is implemented like a primary index on a non-key ordering field: the file is kept ordered on the clustering field, and the index holds one entry per distinct value of that field, pointing to the first block containing records with that value; records with the same value are stored in contiguous blocks.
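The range-query I/O argument above can be illustrated with a toy simulation (my own sketch, not real index internals): it counts how many distinct data blocks a range query must touch when rows are stored in key order versus scattered arbitrarily.

```python
# Toy illustration: count disk blocks touched by a range query when rows
# are clustered on the key vs. stored in arbitrary (scattered) order.
import random

BLOCK_SIZE = 10          # rows per data block
keys = list(range(100))  # one row per key, key values 0..99

def blocks_touched(row_order, lo, hi):
    """Set of block numbers read to fetch all rows with lo <= key <= hi."""
    position = {key: i for i, key in enumerate(row_order)}  # row locator
    return {position[k] // BLOCK_SIZE for k in keys if lo <= k <= hi}

clustered = keys                            # rows stored in key order
scattered = random.sample(keys, len(keys))  # rows in arbitrary order

print(len(blocks_touched(clustered, 20, 40)))  # 3 blocks
print(len(blocks_touched(scattered, 20, 40)))  # up to 21 blocks
```

With 21 matching keys and 10 rows per block, the clustered layout reads only 3 contiguous blocks, while the scattered layout may read a different block for almost every matching row.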
(d) Consider the following relations:

A: Pid PName
001 abc
002 cde
011 efg
014 ghi
015 ijk
016 klm

B: Pid PName
002 cde
011 efg
015 ijk
016 klm

Find the following: (i) A ∪ B (ii) A − B (iii) A ∩ B (iv) A × B
Ans. BLOCK 1 PAGE NO.31
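The four operations can also be worked out directly with Python sets (an illustrative sketch; the garbled entries “PNarne” and “016 Um” in the question are read as “PName” and “016 klm”, and Pids are kept as strings to preserve leading zeros):

```python
# The relations from the question as sets of (Pid, PName) tuples.
A = {("001", "abc"), ("002", "cde"), ("011", "efg"),
     ("014", "ghi"), ("015", "ijk"), ("016", "klm")}
B = {("002", "cde"), ("011", "efg"), ("015", "ijk"), ("016", "klm")}

union        = A | B   # A ∪ B : every tuple in A or B
difference   = A - B   # A − B : tuples in A but not in B
intersection = A & B   # A ∩ B : tuples in both
cartesian    = {(a, b) for a in A for b in B}  # A × B : all pairs

print(len(union), len(difference), len(intersection), len(cartesian))
# 6 2 4 24
```

Since B ⊆ A here, A ∪ B = A, A ∩ B = B, and A − B leaves only the two tuples (001, abc) and (014, ghi); A × B pairs each of the 6 tuples of A with each of the 4 tuples of B, giving 24 tuples.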
(e) Explain briefly about Data Replication. Give its disadvantages.
Ans. Database replication is the frequent electronic copying of data from a database on one computer or server to a database on another, so that all users share the same level of information. The result is a distributed database in which users can access data relevant to their tasks without interfering with the work of others.
Advantages of Data Replication
Replication has the following advantages:
Availability
If one of the sites containing relation R fails, then relation R can be obtained from another site. Thus, queries involving relation R can continue to be processed in spite of the failure of one site.
Increased parallelism
The sites containing relation R can process queries involving relation R in parallel. This leads to faster query execution.
Less Data Movement over Network
The more replicas of a relation there are, the greater the chances that the required data is found where the transaction is executing. Hence, data replication reduces the movement of data among sites and increases the speed of processing.
Disadvantages of Data Replication
Replication has the following disadvantages:
Increased overhead on update.
Whenever an update is required, the database system must ensure that all replicas are updated.
Require more disk space
Storing replicas of the same data at different sites consumes more disk space.
Expensive
Concurrency control and recovery techniques become more complex and hence more expensive.
In general, replication enhances the performance of read operations and increases the availability of data to read-only transactions. However, update transactions incur greater overhead. Controlling concurrent updates by several transactions to replicated data is more complex than using the centralized approach to concurrency control. We can simplify the management of replicas of a relation r by choosing one of them as the primary copy of r. For example, in a banking system, an account can be associated with the site at which the account was opened. Similarly, in an airline-reservation system, a flight can be associated with the site at which the flight originates.
Question 2 b: Consider the two sets F and G with their FDs as below:

F: A → C, AC → D, E → AD, E → H
G: A → CD, E → AH

Check whether the two sets are equivalent or not.
Solution :
Step 1: Take set F and check whether G is covered by F.
(A)+ = {ACD}
(E)+ = {EADHC}
Hence, both A → CD and E → AH are covered.
⇒ G is derived from F. Hence G is covered by F.
⇒ F ⊇ G . ....(1)
Step 2: Take set G and check whether F is covered by G.
(A)+ = {ACD}
(AC)+ = {ACD}
(E)+ = {EAHCD}
Hence F = {A → C, AC → D, E → AD, E → H} is covered.
⇒ F is derived from G. Hence F is covered by G.
⇒ G ⊇ F. ....(2)
From (1) and (2), F and G are equivalent.
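The two-step check above can be automated with the attribute-closure algorithm (a minimal sketch; the function names are my own):

```python
def closure(attrs, fds):
    """Closure of a set of attributes under a list of (LHS, RHS) FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def covers(f, g):
    """True if every FD in g is implied by f (i.e. f ⊇ g)."""
    return all(set(rhs) <= closure(lhs, f) for lhs, rhs in g)

F = [("A", "C"), ("AC", "D"), ("E", "AD"), ("E", "H")]
G = [("A", "CD"), ("E", "AH")]

print(covers(F, G) and covers(G, F))  # True -> F and G are equivalent
```

`covers(F, G)` implements Step 1 (each FD of G checked against closures under F), and `covers(G, F)` implements Step 2; equivalence is the conjunction of the two.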
3(b) Discuss the wait-die and wound-wait protocols for deadlock prevention.
Ans. Wait-die scheme: It is a non-preemptive technique for deadlock prevention. When transaction Ti requests a data item currently held by Tj, Ti is allowed to wait only if it has a timestamp smaller than that of Tj (That is Ti is older than Tj), otherwise Ti is rolled back (dies).
In this scheme, if a transaction requests to lock a resource (data item), which is already held with a conflicting lock by another transaction, then one of the two possibilities may occur −
(1) If TS(Ti) < TS(Tj) − that is Ti, which is requesting a conflicting lock, is older than Tj − then Ti is allowed to wait until the data-item is available.
(2) If TS(Ti) > TS(Tj) − that is, Ti is younger than Tj − then Ti dies. Ti is restarted later with a random delay but with the same timestamp.
This scheme allows the older transaction to wait but kills the younger one.
For example:
Suppose that transactions T22, T23 and T24 have timestamps 5, 10 and 15 respectively. If T22 requests a data item held by T23, then T22 will wait. If T24 requests a data item held by T23, then T24 will be rolled back.
Wound-wait scheme: It is a preemptive technique for deadlock prevention. It is a counterpart to the wait-die scheme. When Transaction Ti requests a data item currently held by Tj, Ti is allowed to wait only if it has a timestamp larger than that of Tj, otherwise Tj is rolled back (Tj is wounded by Ti).
In this scheme, if a transaction requests to lock a resource (data item) which is already held with a conflicting lock by another transaction, one of two possibilities may occur −
(1) If TS(Ti) < TS(Tj), then Ti forces Tj to be rolled back − that is Ti wounds Tj. Tj is restarted later with a random delay but with the same timestamp.
(2) If TS(Ti) > TS(Tj), then Ti is forced to wait until the resource is available.
This scheme allows the younger transaction to wait; but when an older transaction requests an item held by a younger one, the older transaction forces the younger one to abort and release the item.
For example:
Suppose that transactions T22, T23 and T24 have timestamps 5, 10 and 15 respectively. If T22 requests a data item held by T23, then the data item will be preempted from T23 and T23 will be rolled back. If T24 requests a data item held by T23, then T24 will wait.
In both the cases, the transaction that enters the system at a later stage is aborted.
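The two decision rules can be captured in a few lines (a sketch; the return strings are my own labels, and smaller timestamp = older transaction):

```python
def wait_die(ts_requester, ts_holder):
    """Wait-die: an older requester waits, a younger requester dies."""
    return "wait" if ts_requester < ts_holder else "abort requester"

def wound_wait(ts_requester, ts_holder):
    """Wound-wait: an older requester wounds (aborts) the holder,
    a younger requester waits."""
    return "abort holder" if ts_requester < ts_holder else "wait"

# T22, T23, T24 with timestamps 5, 10, 15, as in the examples above.
print(wait_die(5, 10))     # T22 requests from T23 -> wait
print(wait_die(15, 10))    # T24 requests from T23 -> abort requester
print(wound_wait(5, 10))   # T22 requests from T23 -> abort holder
print(wound_wait(15, 10))  # T24 requests from T23 -> wait
```

In both schemes only the younger of the two transactions is ever aborted, which is why neither scheme can create a cycle of waiting transactions.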
3(c) Distinguish between deferred update and immediate update log based recovery techniques.
Ans. In deferred update (NO-UNDO/REDO), all database writes of a transaction are deferred until it reaches its commit point; during execution only the log is updated. If the transaction commits, its logged after-images are applied to the database (REDO); if it fails before commit, nothing has touched the database, so no UNDO is ever needed. In immediate update (UNDO/REDO), writes may be applied to the database before the transaction commits, so the log must also record the old values (before-images), following the write-ahead logging rule. On failure, uncommitted transactions are rolled back using the before-images (UNDO), and committed transactions are redone using the after-images (REDO). Deferred update thus needs a simpler recovery algorithm but delays all writes; immediate update allows earlier writes at the cost of a larger log and an UNDO pass.
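As an illustration of log-based recovery (a hand-rolled toy with made-up log records and item names, not any real DBMS format): under immediate update the log carries before-images so uncommitted work can be undone, while committed work is redone from after-images; under deferred update the undo branch would never be needed, because uncommitted writes never reach the database.

```python
# Each log record: (txn, item, before_image, after_image); plus commit marks.
log = [
    ("T1", "X", 1, 10),
    ("T2", "Y", 2, 20),
    ("COMMIT", "T1"),
    # crash happens here: T1 committed, T2 did not
]

committed = {rec[1] for rec in log if rec[0] == "COMMIT"}

db = {"X": 1, "Y": 2}      # state on disk before the transactions
db["X"], db["Y"] = 10, 20  # immediate update: both writes hit disk pre-crash

# Recovery: UNDO uncommitted transactions using before-images,
# REDO committed transactions using after-images.
for txn, item, before, after in [r for r in log if r[0] != "COMMIT"]:
    if txn in committed:
        db[item] = after   # REDO
    else:
        db[item] = before  # UNDO

print(db)  # {'X': 10, 'Y': 2}
```

T1's write survives (redone) and T2's write is rolled back, restoring a state consistent with exactly the committed transactions.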
4(b) Discuss the following relational constraints :
(i) Domain
Each table has a certain set of columns, and each column allows only a single type of data, based on its data type. The column does not accept values of any other data type.
Definition: Domain constraints are user-defined data types, and we can define them as:
Domain Constraint = data type + Constraints (NOT NULL / UNIQUE / PRIMARY KEY / FOREIGN KEY / CHECK / DEFAULT)
(ii) Entity
Entity integrity is a basic constraint of the relational model (RM) that concerns the form of the primary key; the same format is subsequently applied to the foreign key and to any simple component of either.[1] The relational model states that every relation (or table) must have an identifier, called the primary key (PK), such that every row of the relation is identifiable by its content, that is, by a unique and minimal value. The PK is a non-empty set of attributes (or columns). The same format applies to the foreign key (FK), because each FK matches a pre-existent PK. Each attribute that is part of a PK (or of an FK) must have data values (such as numbers, letters or typographic symbols), not data marks (known as NULL marks in the SQL world). Morphologically, a composite primary key is in a “steady state”: if it is reduced, the PK loses its property of identifying every row of its relation, but if it is extended, the PK becomes redundant. Hence, a primary key is irreducible and inextendable.[2]
(iii) Referential Integrity
Referential integrity is a property of data which, when satisfied, requires every value of one attribute (column) of a relation (table) to exist as a value of another attribute (column) in a different (or the same) relation (table).[1]
For referential integrity to hold in a relational database, any column in a base table that is declared a foreign key can contain either a null value, or only values from a parent table’s primary key or a candidate key.[2] In other words, when a foreign key value is used it must reference a valid, existing primary key in the parent table. For instance, deleting a record that contains a value referred to by a foreign key in another table would break referential integrity. Some relational database management systems (RDBMS) can enforce referential integrity, normally either by deleting the foreign key rows as well to maintain integrity, or by returning an error and not performing the delete. Which method is used may be determined by a referential integrity constraint defined in a data dictionary.
The adjective ‘referential’ describes the action that a foreign key performs, ‘referring’ to a link column in another table. In simple terms, ‘referential integrity’ is a guarantee that the target it ‘refers’ to will be found. A lack of referential integrity in a database can lead relational databases to return incomplete data, usually with no indication of an error.
(iv) Key Constraint
Three types of key constraints are most common.
Primary Key constraint
The PRIMARY KEY constraint uniquely identifies each record in a database table.
Primary keys must contain UNIQUE values, and cannot contain NULL values.
A table can have only one primary key, which may consist of single or multiple fields.
Foreign Key constraint
A FOREIGN KEY is a key used to link two tables together.
A FOREIGN KEY is a field (or collection of fields) in one table that refers to the PRIMARY KEY in another table.
The table containing the foreign key is called the child table, and the table containing the candidate key is called the referenced or parent table.
Unique Key constraint
The UNIQUE constraint ensures that all values in a column are different.
Both the UNIQUE and PRIMARY KEY constraints provide a guarantee for uniqueness for a column or set of columns.
A PRIMARY KEY constraint automatically has a UNIQUE constraint.
However, you can have many UNIQUE constraints per table, but only one PRIMARY KEY constraint per table.
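The three key constraints can be demonstrated with Python's built-in sqlite3 module (the table names dept/emp are invented for this example; note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON` is issued on the connection):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # required for FK enforcement
con.execute("""CREATE TABLE dept (
    dept_id INTEGER PRIMARY KEY,   -- primary key constraint
    name    TEXT UNIQUE            -- unique key constraint
)""")
con.execute("""CREATE TABLE emp (
    emp_id  INTEGER PRIMARY KEY,
    dept_id INTEGER REFERENCES dept(dept_id)  -- foreign key constraint
)""")

con.execute("INSERT INTO dept VALUES (1, 'Sales')")
con.execute("INSERT INTO emp VALUES (100, 1)")  # valid FK -> accepted

try:
    con.execute("INSERT INTO emp VALUES (101, 99)")  # no dept 99
except sqlite3.IntegrityError as e:
    print("foreign key violated:", e)

try:
    con.execute("INSERT INTO dept VALUES (2, 'Sales')")  # duplicate name
except sqlite3.IntegrityError as e:
    print("unique constraint violated:", e)
```

Both bad inserts raise `IntegrityError` instead of corrupting the data, showing the DBMS (rather than application code) enforcing the constraints.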
5(b) Write short notes on the following :
(i) Web Databases A web database is a broad term for a database that is managed and accessed online. A web database gives you the ability to build your own databases/data storage without being a database expert or even a technical person. Hosted solutions such as FormLogix let you easily create database-driven web pages and host your database online.
(ii) Distributed Databases A distributed database is a database in which portions of the data are stored in multiple physical locations and processing is distributed among multiple database nodes.
(iii) Shadow Paging
In computer science, shadow paging is a technique for providing atomicity and durability (two of the ACID properties) in database systems. A page in this context refers to a unit of physical storage (typically on a hard disk), usually of the order of 2^10 to 2^16 bytes.
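A minimal sketch of the idea (an illustrative toy, not a real storage engine): updates write to fresh physical pages through a copied page table, and commit is a single atomic replacement of the current page table, so a crash before commit leaves the shadow copy untouched.

```python
pages = {0: "A", 1: "B"}     # "disk": physical page number -> contents
shadow_table = {0: 0, 1: 1}  # logical page -> physical page (durable copy)

def start_txn():
    return dict(shadow_table)    # current table starts as a copy

def write(current, logical, data):
    new_phys = max(pages) + 1    # never overwrite an existing page
    pages[new_phys] = data
    current[logical] = new_phys  # only the current table sees the change

def commit(current):
    global shadow_table
    shadow_table = current       # atomic pointer swap = the commit point

t = start_txn()
write(t, 1, "B'")
# A crash here loses nothing: shadow_table still maps page 1 to "B"
print(pages[shadow_table[1]])    # B
commit(t)
print(pages[shadow_table[1]])    # B'
```

Atomicity comes from the single swap (the transaction is all-or-nothing), and durability from never modifying pages reachable from the shadow table in place.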