Hashing
Hashing
What is Hashing
- Hashing is a method of sorting and indexing data. The idea behind hasing is to allow large amounts of data to be indexed using keys commonly created by formulas.
- 해시함수란 데이터의 효율적 관리를 목적으로 임의의 데이터를 고정된 길이의 데이터로 매핑하는 함수이다. 매핑 전 원래 데이터의 값을 키(key), 매핑 후 데이터의 값을 해시값(hash value), 매핑하는 과정 자체를 해싱(hashing) 이라고 한다.
Why we need Hashing
- Time efficient
Terminologies
Hash Function : A hash function is any function that can be used to map data of arbitrary size to data of fixed size.
Key : Input data given by user
Hash Value : The values returned by a hash function are called hash values, hash codes, digests, or simply hashes.
Hash Tables : It is a data structure which implements an associative array abstract data type, a structure that can map keys to values.
Collision : A collision occurs when two different key to a hash function produce the same output called hash values.
Characteristics of good Hash function
- It distributes hash values uniformly across the hash table.
- The hash function uses all the input data.
Collision Resolution Techniques

Direct Chaining
- Implements the buckets as linked lists. Colliding elements are stored in these lists.
- Implements the buckets as linked lists. Colliding elements are stored in these lists.
Open Addressing
- Colliding elements are stored in other vacant buckets. During storage and lookup, there are found through so called “probing”
Linear Probing :
- Linear probing is a strategy for resolving collisions by replacing the new key into the closest following empty cell.
- Linear probing is a strategy for resolving collisions by replacing the new key into the closest following empty cell.
Quadratic Probing :
- Qudratic probing operates by taking the original hash index and adding successive values of an arbitrary quadratic polynomial until an open slot is found.
- Qudratic probing operates by taking the original hash index and adding successive values of an arbitrary quadratic polynomial until an open slot is found.
Double Hashing :
- Interval between probes is computed by another hash function.
- Interval between probes is computed by another hash function.
- Colliding elements are stored in other vacant buckets. During storage and lookup, there are found through so called “probing”
What happens when Hash Table is full?
Direct Chaining
- This situation will never arise.
Open Addressing
- Need to create 2x size of current table and redo Hashing for existing keys.
- Need to create 2x size of current table and redo Hashing for existing keys.
Pros & Cons of Collision Resolution Technique
Direct Chaining
- No fear of exhausting Hash Table buckets.
- Fear of big Linked Lists(can effect performance big time).
Open Addressing
- Easy implementation
- Fear of exhasuting Hash Table buckets.
If input size is known then always use “Open Addressing”, else can use any of the two.
If Deletion is very high, then we should always go for ‘Direct Chaining’. Because when we delet a lot on ‘Open Addressing’, there’s gonna have lots of Hole and it will make problem. We can do “restruction”, but it’s not that efficient way.
Practical Use of Hashing
- Password Verification
- File System : File path is mapped to physical location on disk.
Pros & Cons of Hashing
Pros
- On an average Insertion/Deletion/Search operation takes O(1) time.
Cons
- In the worst case Insertion/Deletion/Search might take O(n) time(when hash function is not good enough)
출처 : "Data Structures & Algorithms" by DS GUY