Hashed Files
l Hashing for disk files is called External Hashing
l The file blocks are divided into M equal-sized buckets, numbered bucket0, bucket1, ...,bucket M-1. Typically, a bucket corresponds to one (or a fixed number of) disk block.
l One of the file fields is designated to be the hash key of the file.
l The record with hash key value K is stored in bucket i, where i=h(K), and h is the hashing function.
l Search is very efficient on the hash key.
l Collisions occur when a new record hashes to a bucket that is already full. An overflow file is kept for storing such records. Overflow records that hash to each bucket can be linked together.
There are numerous methods for collision resolution, including the following:
l Open addressing: Proceeding from the occupied position specified by the hash address, the program checks the subsequent positions in order until an unused (empty) position is found.
l Chaining: For this method, various overflow locations are kept, usually by extending the array with a number of overflow positions. In addition, a pointer field is added to each record location. A collision is resolved by placing the new record in an unused overflow location and setting the pointer of the occupied hash address location to the address of that overflow location.
l Multiple hashing: The program applies a second hash function if the first results in a collision. If another collision results, the program uses open addressing or applies a third hash function and then uses open addressing if necessary.
Figure 1 Hashed Files
l To reduce overflow records, a hash file is typically kept 70-80% full.
l The hash function h should distribute the records uniformly among the buckets; otherwise, search time will be increased because many overflow records will exist.
l Main disadvantages of static external hashing:
- Fixed number of buckets M is a problem if the number of records in the file grows or shrinks.
- Ordered access on the hash key is quite inefficient (requires sorting the records).
Figure 2 Hashed Files - Overflow handling
Dynamic And Extendible Hashed Files
Dynamic and Extendible Hashing Techniques
l Hashing techniques are adapted to allow the dynamic growth and shrinking of the number of file records.
l These techniques include the following: dynamic hashing , extendible hashing , and linear hashing .
l Both dynamic and extendible hashing use the binary representation of the hash value h(K) in order to access a directory. In dynamic hashing the directory is a binary tree. In extendible hashing the directory is an array of size 2d where d is called the global depth.
l The directories can be stored on disk, and they expand or shrink dynamically. Directory entries point to the disk blocks that contain the stored records.
l An insertion in a disk block that is full causes the block to split into two blocks and the records are redistributed among the two blocks. The directory is updated appropriately.
l Dynamic and extendible hashing do not require an overflow area.
l Linear hashing does require an overflow area but does not use a directory. Blocks are split in linear order as the file expands.
Comments
Post a Comment
please subscribe my blog and let me suggest how I improve this site