Skip to main content

Hashed files | DBMS - Computer science fundamentals tutorial

 Hashed Files
l Hashing for disk files is called External Hashing

l  The file blocks are divided into M equal-sized buckets, numbered bucket0, bucket1, ...,bucket M-1. Typically, a bucket corresponds to one (or a fixed number of) disk block.

l  One of the file fields is designated to be the hash key of the file.

l  The record with hash key value K is stored in bucket i, where i=h(K), and h is the hashing function.

l  Search is very efficient on the hash key.

l  Collisions occur when a new record hashes to a bucket that is already full. An overflow file is kept for storing such records. Overflow records that hash to each bucket can be linked together.

There are numerous methods for collision resolution, including the following:

l  Open addressing: Proceeding from the occupied position specified by the hash address, the program checks the subsequent positions in order until an unused (empty) position is found.

l  Chaining: For this method, various overflow locations are kept, usually by extending the array with a number of overflow positions. In addition, a pointer field is added to each record location. A collision is resolved by placing the new record in an unused overflow location and setting the pointer of the occupied hash address location to the address of that overflow location.

l  Multiple hashing: The program applies a second hash function if the first results in a collision. If another collision results, the program uses open addressing or applies a third hash function and then uses open addressing if necessary.
Hashed Files

                                                              Figure 1 Hashed Files

l  To reduce overflow records, a hash file is typically kept 70-80% full.

l  The hash function h should distribute the records uniformly among the buckets; otherwise, search time will be increased because many overflow records will exist.

l  Main disadvantages of static external hashing:

         - Fixed number of buckets M is a problem if the number of records in the file grows or shrinks.

         - Ordered access on the hash key is quite inefficient (requires  sorting the records).

Hashed Files - Overflow handling
                                                   Figure 2   Hashed Files - Overflow handling

Dynamic And Extendible Hashed Files

   Dynamic and Extendible Hashing Techniques

l  Hashing techniques are adapted to allow the dynamic growth and shrinking of the number of file records.

l  These techniques include the following: dynamic hashing , extendible hashing , and linear hashing .

l  Both dynamic and extendible hashing use the binary representation  of the hash value h(K) in order to access a directory. In dynamic hashing the directory is a binary tree. In extendible hashing the directory is an array of size 2d where d is called the global depth.

l  The directories can be stored on disk, and they expand or shrink dynamically. Directory entries point to the disk blocks that contain the stored records.

l  An insertion in a disk block that is full causes the block to split into two blocks and the records are redistributed among the two blocks. The directory is updated appropriately.

l  Dynamic and extendible hashing do not require an overflow area.

l  Linear hashing does require an overflow area but does not use a directory. Blocks are split in linear order  as the file expands.

                                                                   Figure 3 
 Extendible Hashing
 Extendible Hashing

Comments

Popular posts from this blog

Minterm and maxterm in Digital logic design - Computer Science fundamentals tutorial

Minterm and Maxterm First thing to know before we proceed   towards   what is ‘minterm’ and ‘maxterm’ we have to know the sum of product and product of sum. Sum of product: - The logical sum of two or more logical product term is called sum of products expression. It is basically an OR operation of AND operated variables such as Y = AB+BC+ABC Product of Sum: - The logical product of two or more logical sum term is called product of sums expressions. It is basically an AND operation of OR operated variables such as Y = (A+B).(B+C).(A+B+C) Minterm: - Product term containing all the k variables of the functions is either complimented or uncomplimented form is Minterm. Canonical form of sum of product: - It is defined as the logical sum of all the minterms derived from the rows of a truth table for which value of the function is 1. It is called a minterm canonical form. The canonical sum of product expression can be given in a compact form by lis...

Solve-write C program to find grade of student by using nested else-if statement

C program to find grade of student by using nested else-if statement Problem Description This program take input as your number then show your grade. Problem Solution 1. enter your marks as input. 2. then check your marks with 'If' block's condition. if it satisfied then show your grade as output. 3. if it not satisfied then it checks with all else-if block's condition repeatedly. 4. print the grade according to your given marks as input and exit. Program codes:-   #include<stdio.h> main() { int n; printf("\n enter the marks:"); scanf("%d",&n); if(n>89) printf("O"); else if(n>79) printf("E"); else if(n>69) printf("A"); else if(n>59) printf("B"); else if(n>49) printf("C"); else if(n>39) printf("D"); else printf("F"); } Program explanation:- 1. enter your marks. for example we take 70 . ...

Important MCQ of RDBMS( Relational database management system)-FCST

Important MCQ of RDBMS  1. A RDBMS consists a collection of ? a. Tables b. Fields c. Records d. Keys  ANS/- a. table 2. The term attribute refers to a ___________ of a table a. Record b. Tuple c. Column d. Key   ans/- c. Column 3. In relational model, the row of table is known to be ?  a. Relation b. Entity field c. Tuple d. Attribute  ans/- C. Tuple 4. . Address field of a person should not be part of primary key, since it is likely to ? a. Dependent b. Too long c. Changed d. Not changed  ans/- c. Changed 5. The relational model is concerned with ? a. Data structure and Data integrity b. Data Manipulation c. Both a and b d. None of these  ans/- c. Both a and b 6. Which is the false statement from the following ? a. A veiw is a named derived table b. A name relation is variable c. A veiw is a named reation and is virtual d. None of these  ans/- d. None of these 7. The union of primary key...