Hash Table

#include "libstephen/ht.h"

The hash table associates keys with values. It uses a “hash” to determine the location of a key inside an array, and uses that to achieve nearly constant time lookup. A hash table needs enough memory to ensure that the table is about half empty, in order to work properly. So, the table basically trades off memory for access speed.

Infrastructure

Hash tables need a good deal more infrastructure than a simple list needs. A list can store values without knowing or caring about their type. On the other hand, a hash table needs to know the data type of its keys for these reasons:

  • Hashing! A hash function takes a key and produces a number, which can hopefully be used to index into the table. You need to know the data type in order to come up with a good hash table.
  • Equality testing! A hash function is bound to have collisions. So, a hash table needs to be able to check whether keys are equal, and that means knowing the type of the keys.

To handle this, the hash table takes function pointers of two different types. The first is called HASH_FUNCTION:

typedef unsigned int (*HASH_FUNCTION)(DATA toHash);
unsigned int my_hash(DATA toHash) {
  //compute hash value
  return hash_value;
}

That typedef is a bit cryptic, but it is a function pointer to a function that takes a DATA and returns an unsigned int. The function below that is a hash function that implements the HASH_FUNCTION signature. The only hash functions in libstephen are ht_string_hash for strings.

The other type of function pointer that is defined is DATA_COMPARE:

typedef int (*DATA_COMPARE)(DATA d1, DATA d2);
int compare_int(DATA d1, DATA d2)
{
  return d1.data_llint - d2.data_llint;
}

This function compares two DATA. It returns 0 if they are equal. It returns > 0 if d1 > d2, and < 0 if d1 < d2. These can also be used for sorting, if I decide to implement it. Comparisons are implemented for integers, doubles, pointer values, and strings.

One other function pointer is declared, in order to make it easier for users to free values from the table. If you wanted to remove a (key, value) pair from the table, where the value was a pointer to a data structure, chances are you’d need to free it first. So, you’d end up looking up the value, freeing it, then removing it from the table. Instead, I’ve made a function pointer type that will act on a DATA, allowing you to specify an action to happen before removing an item from the table!

typedef void (*DATA_ACTION)(DATA d);
void freer(DATA d)
{
  free(d.data_ptr);
}

Operations

With those function pointer types in hand, we can do a lot of fun hash table stuff. Here is a list of operations, straight from the header:

void ht_init(smb_ht *pTable, HASH_FUNCTION hash_func, DATA_COMPARE equal);
smb_ht *ht_create(HASH_FUNCTION hash_func, DATA_COMPARE equal);
void ht_destroy_act(smb_ht *pTable, DATA_ACTION deleter);
void ht_destroy(smb_ht *pTable);
void ht_delete_act(smb_ht *pTable, DATA_ACTION deleter);
void ht_delete(smb_ht *pTable);

void ht_insert(smb_ht *pTable, DATA dKey, DATA dValue);
void ht_remove_act(smb_ht *pTable, DATA dKey, DATA_ACTION deleter,
                   smb_status *status);
void ht_remove(smb_ht *pTable, DATA dKey, smb_status *status);
DATA ht_get(smb_ht const *pTable, DATA dKey, smb_status *status);
void ht_print(smb_ht const *pTable, int full_mode);

I’ve included the init/create/delete/destroy functions, because the delete and destroy ones have _act variants that apply an action to every value in the table, before destroying the table.

We have insertion, removal, and retrieval. There’s also a printing function, which is really more of a debugging print function. It reveals some of the structure of the hash table, which is good for debugging.

Sample Usage

Here’s an example of a using a hash table:

smb_status status = SMB_SUCCESS;
smb_ht *ht = ht_create(&ht_string_hash, &data_compare_string);

char *key = "stephen";
char *value = "brennan";
ht_insert(ht, PTR(key), PTR(value));
brennan = ht_get(ht, PTR(key), &status).data_ptr;
assert(status == SMB_SUCCESS);

printf("%s: %s\n", key, value);
//STDOUT: stephen: brennan

Structure

The hash table structure I use could definitely be improved, and I may improve it in the future. Currently, it uses the simplest possible method for resolving collisions: chaining. Basically, each bucket is a linked list. In the future, I might change to quadratic probing. Anyhow, here is the hash table itself:

typedef struct smb_ht
{
  int length;
  int allocated;
  HASH_FUNCTION hash;
  DATA_COMPARE equal;
  struct smb_ht_bckt **table;
} smb_ht;

And here is the structure of the hash table bucket:

typedef struct smb_ht_bckt
{
  DATA key;
  DATA value;
  struct smb_ht_bckt *next;
} smb_ht_bckt;