Skip to main content
comment on load factor
Source Link
greybeard
  • 7.8k
  • 3
  • 21
  • 56

With hindsight, I think it would have merit to proceed in stages:
a) special case K == 1, no spills (just grow table & rehash) - private?
b) special case K == 1
c) general K (see question)
d) elaborate "free" management for spill? May well prove not worth it

  1. special case K == 1, no spills (just grow table & rehash) - private?
  2. special case K == 1
  3. general K (see question)
  4. elaborate "free" management for spill? May well prove not worth it

To have spelled out in an answer what indy's comment highlights:
Keeping Ts instead of (shared(?)) pointers(/references?) uses a lot of space, especially where items are significantly bigger than their keys. And it initialises a lot of Ts.
Having table size exceed nominal capacity is conventional, their quotient designated load factor. M looks the latter's reciprocal, but even 2 is a lot: conventionally, a floating point value is used here. Likewise, a spill area the size of nominal capacity is anything but shy.


With hindsight, I think it would have merit to proceed in stages:

  1. special case K == 1, no spills (just grow table & rehash) - private?
  2. special case K == 1
  3. general K (see question)
  4. elaborate "free" management for spill? May well prove not worth it

With hindsight, I think it would have merit to proceed in stages:
a) special case K == 1, no spills (just grow table & rehash) - private?
b) special case K == 1
c) general K (see question)
d) elaborate "free" management for spill? May well prove not worth it

To have spelled out in an answer what indy's comment highlights:
Keeping Ts instead of (shared(?)) pointers(/references?) uses a lot of space, especially where items are significantly bigger than their keys. And it initialises a lot of Ts.
Having table size exceed nominal capacity is conventional, their quotient designated load factor. M looks the latter's reciprocal, but even 2 is a lot: conventionally, a floating point value is used here. Likewise, a spill area the size of nominal capacity is anything but shy.


added 5 characters in body
Source Link
greybeard
  • 7.8k
  • 3
  • 21
  • 56

To further reduce searching code, I shy from consolidating bucket & spill.
So how about a consolidated function?
I tried to code it (bottom of post - I find myself way out of my depth with C++, more so with modern C++.)

To further reduce searching code, I shy from consolidating bucket & spill.
So how about a consolidated function?
I tried to code it (bottom of post - I find myself way out of my depth with C++, more so modern C++.)

To further reduce searching code, I shy from consolidating bucket & spill.
So how about a consolidated function?
I tried to code it (bottom of post - I find myself way out of my depth with C++, more so with modern C++.)

English
Source Link
greybeard
  • 7.8k
  • 3
  • 21
  • 56
// just hacked in to eyeball feasibility
    /// search for an item.
    /// return is_contained when dont_insert;
    /// otherwise false if is_contained,
    /// else insert & return true
    bool checkInsertIf(const T& item, bool dont_insert) {
        auto thisHash = myHash(item);
        size_t index = thisHash % nBuckets();
        auto& bucket = buckets[index];
        auto bucketSize = bucket.size;
        auto& values = bucket.values;
        auto range = std::min(bucketSize, K);
        auto begin = values.begin(),
             beyond = begin + range,
             found = std::find(begin, beyond, item);
        
        if (found != beyond)  // item found in bucket
            return dont_insert;

        if (bucketSize <= K) {
            if (!dont_insert && bucketSize < K)
                values[bucketSize] = item;
        }
        else /* (K < bucketSize): handle spilled */ {
            begin = spill.begin() + bucket.spillIndex;
            range = bucketSize - K;
            beyond = begin + range;  // given references this calls for factoring out
            found = std::find(begin, beyond, item);
            if (found != beyond)  // item found in spill
                return dont_insert;
        }
        // not found nowhereanywhere
        if (dont_insert)
            return false;
        if (K <= bucketSize) {
            reserveBucketSpill(index, bucketSize + 1);
            spill[bucket.spillIndex + bucketSize - K] = item;
        }
        bucket.size = bucketSize + 1;
        numElements++;
        return true;
    }

    bool contains(const T& item) const {
        return checkInsertIf(item, true);
    }
    
    bool insert(const T& item) {
        if (capacity <= numElements) {
            reserve(capacity <= 1 ? 2 : (capacity * 5) / 3);
        }
        return checkInsertIf(item, false);
    }
// just hacked in to eyeball feasibility
    /// search for an item.
    /// return is_contained when dont_insert;
    /// otherwise false if is_contained,
    /// else insert & return true
    bool checkInsertIf(const T& item, bool dont_insert) {
        auto thisHash = myHash(item);
        size_t index = thisHash % nBuckets();
        auto& bucket = buckets[index];
        auto bucketSize = bucket.size;
        auto& values = bucket.values;
        auto range = std::min(bucketSize, K);
        auto begin = values.begin(),
             beyond = begin + range,
             found = std::find(begin, beyond, item);
        
        if (found != beyond)  // item found in bucket
            return dont_insert;

        if (bucketSize <= K) {
            if (!dont_insert && bucketSize < K)
                values[bucketSize] = item;
        }
        else /* (K < bucketSize): handle spilled */ {
            begin = spill.begin() + bucket.spillIndex;
            range = bucketSize - K;
            beyond = begin + range;  // given references this calls for factoring out
            found = std::find(begin, beyond, item);
            if (found != beyond)  // item found in spill
                return dont_insert;
        }
        // found nowhere
        if (dont_insert)
            return false;
        if (K <= bucketSize) {
            reserveBucketSpill(index, bucketSize + 1);
            spill[bucket.spillIndex + bucketSize - K] = item;
        }
        bucket.size = bucketSize + 1;
        numElements++;
        return true;
    }

    bool contains(const T& item) const {
        return checkInsertIf(item, true);
    }
    
    bool insert(const T& item) {
        if (capacity <= numElements) {
            reserve(capacity <= 1 ? 2 : capacity * 5 / 3);
        }
        return checkInsertIf(item, false);
    }
// just hacked in to eyeball feasibility
    /// search for an item.
    /// return is_contained when dont_insert;
    /// otherwise false if is_contained,
    /// else insert & return true
    bool checkInsertIf(const T& item, bool dont_insert) {
        auto thisHash = myHash(item);
        size_t index = thisHash % nBuckets();
        auto& bucket = buckets[index];
        auto bucketSize = bucket.size;
        auto& values = bucket.values;
        auto range = std::min(bucketSize, K);
        auto begin = values.begin(),
             beyond = begin + range,
             found = std::find(begin, beyond, item);
        
        if (found != beyond)  // item found in bucket
            return dont_insert;

        if (bucketSize <= K) {
            if (!dont_insert && bucketSize < K)
                values[bucketSize] = item;
        }
        else /* (K < bucketSize): handle spilled */ {
            begin = spill.begin() + bucket.spillIndex;
            range = bucketSize - K;
            beyond = begin + range;  // given references this calls for factoring out
            found = std::find(begin, beyond, item);
            if (found != beyond)  // item found in spill
                return dont_insert;
        }
        // not found anywhere
        if (dont_insert)
            return false;
        if (K <= bucketSize) {
            reserveBucketSpill(index, bucketSize + 1);
            spill[bucket.spillIndex + bucketSize - K] = item;
        }
        bucket.size = bucketSize + 1;
        numElements++;
        return true;
    }

    bool contains(const T& item) const {
        return checkInsertIf(item, true);
    }
    
    bool insert(const T& item) {
        if (capacity <= numElements) {
            reserve(capacity <= 1 ? 2 : (capacity * 5) / 3);
        }
        return checkInsertIf(item, false);
    }
Source Link
greybeard
  • 7.8k
  • 3
  • 21
  • 56
Loading