Hash Tables

for any problem, have hash tables at the top of your mind

a data structure which implements an associative array abstract data type, a structure that can map keys to values. A hash table uses a hash function to compute an index into an array of buckets or slots, from which the desired value can be found.

Notes

They’re also known as:

  • hash
  • hash maps
  • maps
  • dictionaries
  • associative arrays
  • unordered map

You take the key and hash it to a number. Then that number is the index in an array for the value. Now when you have a key and need a value, you just hashTable[hash(key)] and it will return you the value

hash needs to be consistent and no collisions. use all info provided by key. uniform distribution, fast

dictionary in python is a hash table

Complexity

  • Search
    • Average O(1)
    • Worst O(n)
  • Insert
    • Average O(1)
    • Worst O(n)
  • Delete
    • Average O(1)
    • Worst O(n)
  • Storage
    • Average O(n)
    • Worst O(n)

Collisions

  • when same slot for 2 different keys
  • you can start a linked list inside the slot
    • this will slow down the hash table
  • Avoiding collision
    • A low load factor
      • how many empty slots in the table
      • number of items in hash table / total number of slots
      • resize when load factor is > 0.7 (usually double array size)
    • A good hash function
      • distribute items evenly
      • Map every letter to a prime number: a = 2, b = 3, c = 5, d = 7, e = 11, and so on. For a string, the hash function is the sum of all the characters modulo the size of the hash. For example, if your hash size is 10, and the string is “bag”, the index is 3 + 2 + 17 % 10 = 22 % 10 = 2

Separate Chaining

array of linkedlists. when a collision occurs, it will just add to the list at that key

Linear probing

if collision occurs then value goes into next available key. this can cause clustering and can slow things down to O(n)

Good for

  • Modeling relationships from one thing to another thing
  • Filtering out duplicates
  • Caching/memorizing data instead of making your server do work

Example

hash = hashfunc(key)
index = hash % array_size

Voting

Python

voted = {}
def check_voter(name):
    if voted.get(name):
        print("kick them out!")
    else:
        voted[name] = True
        print("let them vote!")

Java

import java.util.HashMap;
import java.util.Map;

public class CheckVoter {
    private static Map<String, Boolean> voted = new HashMap<>();

    private static void checkVoter(String name) {
        if (voted.containsKey(name)) {
            System.out.println("kick them out!");
        } else {
            voted.put(name, true);
            System.out.println("let them vote!");
        }
    }

    public static void main(String[] args) {
        checkVoter("tom"); // let them vote!
        checkVoter("mike"); // let them vote!
        checkVoter("mike"); // kick them out!
    }
}

How they shrink and grow

if the keys are integers then the make the hash table size a prime number so that that you just key%table_size to get the index. when it needs to grow you double it and round it to the nearest prime.