7
\$\begingroup\$

The main problem that I tried to tackle here was trying not to have the erase method to be \$O(n)\$. I kept two data structures: a linked-list of pairs and a map of a key and a pair containing a value and a pointer that points to the node in the linked-list.

Am I on the right track? Right now the code is all rough, ugly and all over the place but I tested it and it works properly.

#include <iostream>
#include <list>
#include <map>
#include <string>

using std::cout;
using std::endl;

template <typename key, typename value>
class table {
    using node = std::pair<key, value>;
    using iterator = typename std::list<std::pair<key, value>>::iterator;
    using value_node = std::pair<value, iterator>;

    std::list<node> list;
    std::map<key, value_node> data;

public:
    table & push(key const & k, value const & v) {
        list.emplace_back(k, v);
        data.emplace(k, value_node {v, std::prev(std::end(list))});
        return *this;
    }

    iterator begin() {
        return std::begin(list);
    }

    iterator end() {
        return std::end(list);
    }

    table & erase(key const & k) {
        list.erase(data[k].second);
        data.erase(k);
        return *this;
    }

    // Just to check if the list items are actually being erased.
    size_t length() const {
        return list.size();
    }

    value & operator [](key const & k) {
        return data[k].first;
    }
};

int main() {

    table<std::string, std::string> t;

    t.push("hello", "world");
    t.push("blabla", "hello");

    cout << t["hello"] << endl;
    cout << t["blabla"] << endl;
    cout << t.length() << endl;

    for (auto const & e : t) {
        cout << e.first << " " << e.second << endl;
    }

    t.erase("hello");
    t.erase("blabla");
    cout << t.length() << endl;

    cout << "Size: " << sizeof(table<int, int>) << endl;

    return 0;
}
\$\endgroup\$
17
  • \$\begingroup\$ Is there a reason an unordered_map wouldn't do the same job? \$\endgroup\$ Commented Jan 19, 2017 at 2:28
  • 4
    \$\begingroup\$ The idea is to keep the order of insertion. So when you iterate, you iterate the elements by the order that you've inserted them. Also, one of the challenges was to make the erase method fast because of the linked list. The unordered_map uses a hash function that randomizes the order of the elements and the map sorts them using <; none of them keeps the order of insertion. \$\endgroup\$ Commented Jan 19, 2017 at 2:33
  • \$\begingroup\$ If item is not inserted, then data[k] is erroneous call, therefore your erase function is wrong. You should to use std::map::at and get exception or std::map::find and compare its result against std::end(data), then in case of failure is is better to return bool(false) instead of *this. \$\endgroup\$ Commented Jan 19, 2017 at 18:17
  • \$\begingroup\$ You don't need to duplicate neither key nor value at all. 1st option is to use std::map< std::reference_wrapper< key const >, iterator const, std::less< key > > instead of your highly expensive way. Another approach is to use std::set< iterator > and use your custom comparator. It just adds one level of indirection in operator [] and others (it is cheap). Also it is better to use iterator const it = list.insert(std::cend(list), {k, v}); instead of ad-hoc std::prev(std::end(list)) in both cases. insert is more universal way of adding an elements to containers. \$\endgroup\$ Commented Jan 19, 2017 at 18:18
  • \$\begingroup\$ @Orient Thanks, I've changed the code and now it's much much better. (I still haven't updated it here) I am, however, interested in your approach std::reference_wrapper< key const >, iterator const, std::less< key > >, could you please explain it a bit better? Are you saying that I should make the map key and value a reference to the first and second element of the list pair? And I'm a still bit confused by your idea of using the set, since my map already has a "reference" to the iterator. iterator const it = list.insert(std::cend(list), {k, v}); was a nice one ;) \$\endgroup\$ Commented Jan 21, 2017 at 23:44

2 Answers 2

3
\$\begingroup\$

algorithm, datastructure

not to have the erase method to be 𝑂(𝑛)

The tombstone technique makes this very simple.

Allocate a single valid bit for each entry. Clear this bit to delete the entry. Accessors will skip over invalid entries, rather than returning them to caller.

After a bunch of deletions the map will be bloated with an inconveniently large number of dead entries. Pick some fraction, perhaps \$F = 0.25\$. Track the number of live and dead entries. When the \$\frac{dead}{total}\$ fraction exceeds \$F\$, copy the live entries into a new map and discard the old one. This gives us amortized \$O(\log N)\$ deletion cost, similar to how growing a vector with multiplicative increases will give amortized cost of \$O(\log N)\$ for adding each entry.

If long runs of adjacent invalid entries are a problem, we could trigger cleanups on that, as well. For access patterns like "delete many items in the order we added them", it will be important to cache a pointer to most recently accessed entry; else we risk quadratic complexity in repeatedly skipping over a run.

\$\endgroup\$
2
\$\begingroup\$

The idea certainly has merit. It could be useful for a least-recently-created cache, such as one that holds time-dependent data samples. Or it could form the basis of an LRU cache, if we add code to splice each element to the end of the list when it's accessed.

There are a few problems with it, and to explore those I'm going to replace the main() function with some unit tests:

#include <gtest/gtest.h>

TEST(table, push)
{
    table<std::string, int> t;
    EXPECT_TRUE(t.empty());

    t.emplace("two", 2);
    EXPECT_FALSE(t.empty());
    EXPECT_EQ(t.size(), 1uz);
    EXPECT_EQ(t["two"], 2);
}

If our class were a drop-in replacement for std::map (and I think it should be), then we would have size() rather than length() and emplace() rather than push(). And we'd have an empty() accessor.

Having addressed those, let's add the next test:

TEST(table, push_twice)
{
    table<int, std::string> t;

    t.emplace(2, "two");
    t.emplace(2, "TWO");
    EXPECT_EQ(t.size(), 1uz);
    EXPECT_EQ(t[2], "two");
}

This fails, because we add to the list even when the map already has an entry for the specified key (and because we use the list's size() rather than the map's one). What's happened here is that we now have an inconsistency between the two collections, and we've created a value element that cannot be addressed by any key.

To be consistent with std::map, we should not add to the list if the key was already present. To fix this, we need to change our map so that its values are iterators into the list (this also has a benefit that we no longer store two copies of the value objects; I'll return to that later).

    using node = std::pair<key, value>;
    using list_t = std::list<node>;
    using iterator = typename list_t::iterator;
    using map = std::map<key, iterator>;
    using value_node = map::value_type;

    list_t list = {};
    map data = {};
    std::pair<iterator,bool> emplace(key const& k, value const& v)
    {
        list.emplace_back(k, v);
        const iterator iter = std::prev(std::end(list));  // N.B. need to #include <iterator>
        auto [map_iter, created] = data.emplace(k, iter);
        if (created) {
            return std::pair{iter, true};
        } else {
            // existing node is mapping.first
            list.erase(iter);
            return std::pair{map_iter->second, false};
        }
    }
    value & operator[](key const& k)
    {
        return data[k]->second;
    }

While we're changing operator[], we need to add some tests that it works like std::map:

TEST(table, index_operator)
{
    table<int, std::string> t;

    t[4] = "four";
    EXPECT_EQ(t.size(), 1uz);
    EXPECT_EQ(t[4], "four");
}

This is bad - we blindly assume that it's only called for existing keys. But it's easy to fix:

    value& operator[](key const& k)
    {
        auto map_iter = data.find(k);
        auto iter = map_iter == data.end() ? emplace(k, {}).first : map_iter->second;
        return iter->second;
    }

We should probably implement at(), too:

    value& at(key const& k)
    {
        return data.at(k)->second;
    }
    const value& at(key const& k) const
    {
        return data.at(k)->second;
    }

And combine those two functions, by deducing this:

    auto& at(this auto& self, key const& k)
    {
        return self.data.at(k)->second;
    }

There's the complementary assumption in erase(), which assumes that its "key" argument is contained. It can be exposed by this test:

TEST(table, erase)
{
    table<int, std::string> t;

    EXPECT_TRUE(t.emplace(2, "two").second);

    EXPECT_NO_THROW(t.erase(2));
    EXPECT_TRUE(t.empty());
    EXPECT_THROW(t.at(2), std::out_of_range);

    EXPECT_NO_THROW(t.erase(2));
}

If the key is not present, then we're creating a default-constructed iterator, which is not acceptable to list.erase(). We need to test whether the key is present before acting:

    void erase(key const & k)
    {
        if (auto map_iter = data.find(k);  map_iter != data.end()) {
            auto list_iter = map_iter->second;
            list.erase(list_iter);
            data.erase(map_iter);
        }
    }

We have a serious bug lurking in code we didn't even write. When our class contains indirections, such as pointers, references or iterators, then we need to carefully consider whether the compiler generated copy and move operations are suitable. In our case, this test fails:

TEST(sequenced_map, copy)
{
    const sequenced_map<int, std::string> initial(2, "two", 3, "three");

    auto copy = initial;
    copy[2] = "TWO";
    EXPECT_EQ(copy[2], "TWO");
    EXPECT_EQ(initial.at(2), "two");

    copy = initial;
    copy[2] = "Two";
    EXPECT_EQ(copy[2], "Two");
    EXPECT_EQ(initial.at(2), "two");
}

The reason is that our copied map contains iterators into initial's list. We can copy the list safely, but need to regenerate the map to point at the destination's list:

    sequenced_map(const sequenced_map& other)
        : list{other.list},
          data{other.data}
    {
        for (auto i = list.begin();  i != list.end();  ++i) {
            data.at(i->first) = i;
        }
    }

    sequenced_map& operator=(const sequenced_map& other)
    {
        data.clear();
        list.clear();
        for (auto const& e: other) {
            emplace(e.first, e.second);
        }
        return *this;
    }

Compiler-generated move operations are fine, though:

    sequenced_map(sequenced_map&&) = default;

    sequenced_map& operator=(sequenced_map&&) = default;

I said I'd return to the issue of having two copies of data objects. Storing two copies of the value was a real problem, since it precluded use of our container with move-only types. It's also wasteful; that's true of the two copies of the key, too. We can store just one copy if we make the map use a reference as its key. That's a very simple change to the map type:

    using map = std::map<std::reference_wrapper<const key>, iterator, std::less<key>>;

We have intermediate copies in some of our functions. To help identify these, I like to test with a move-only type for keys and values:

template<typename T>
struct movable
{
    T value;

    movable()
        requires std::is_default_constructible_v<T>
        : value{}
    {};

    template<typename U>
    movable(U&& value)
        : value{std::forward<U>(value)}
    {}

    movable(const movable&) = delete;
    auto operator=(const movable&) = delete;

    movable(movable&&) = default;
    movable& operator=(movable&&) = default;

    bool operator==(const movable& other) const
    { return value == other.value; }

    auto operator<=>(const movable& other) const
    { return value <=> other.value; }
};
TEST(table, move_only)
{
    table<movable<int>, movable<std::string>> t;

    EXPECT_TRUE(t.emplace(2, "two").second);

    EXPECT_EQ(t[2], "two");
    EXPECT_EQ(t.at(2), "two");

    auto const& ct = t;
    EXPECT_EQ(ct.at(2), "two");

    EXPECT_NO_THROW(t.erase(2));
    EXPECT_TRUE(t.empty());
}

The main change provoked by this is to emplace(), which now needs to be able to accept rvalues:

    auto emplace(Key key, Value value)
    {
        list.emplace_back(std::move(key), std::move(value));
        const iterator iter = prev(list.end());
        auto [map_iter, created] = data.emplace(iter->first, iter);
        if (created) {
            // return new iterator
            return std::pair{iter, true};
        } else {
            // return existing iterator
            list.erase(iter);
            return std::pair{map_iter->second, false};
        }
    }

There are overloads and entire functions missing from the interface, which should be implemented if we want our class to be a drop-in replacement for a standard map.

A final compatibility issue is that the use of std::map requires the key type to be totally ordered. The template would be more flexible if it could use a std::unordered_map when necessary.


Improved code

I've applied the above suggestions, and renamed a few things to match standard-container conventions and my own expectations. I believe this modified code is more robust and flexible than the original.

#include <algorithm>
#include <concepts>
#include <initializer_list>
#include <iterator>
#include <list>
#include <map>
#include <memory>
#include <unordered_map>
#include <utility>

template<typename T>
constexpr bool prefers_unordered()
{
    return false;
}

template<std::equality_comparable T>
requires requires(const T& t) { std::hash(t); }
constexpr bool prefers_unordered()
{
    return true;
}

template<typename Key, typename Value, bool is_unordered = prefers_unordered<Key>()>
class sequenced_map
{
    using List = std::list<std::pair<const Key, Value>>;

    using key_type = Key;
    using mapped_type = Value;
    using value_type = typename List::value_type;
    using size_type = typename List::size_type;
    using difference_type = typename List::difference_type;
    using reference = typename List::reference;
    using const_reference = typename List::const_reference;
    using pointer = typename List::pointer;
    using const_pointer = typename List::const_pointer;
    using iterator = typename List::iterator;
    using const_iterator = typename List::const_iterator;
    using reverse_iterator = typename List::reverse_iterator;
    using const_reverse_iterator = typename List::const_reverse_iterator;

    using Map = std::conditional_t<is_unordered,
                                   std::unordered_map<std::reference_wrapper<const Key>, iterator>,
                                   std::map<std::reference_wrapper<const Key>, iterator, std::less<Key>>>;

    List list = {};
    Map map = {};

public:
    sequenced_map() = default;

    template<std::input_iterator Iter>
    sequenced_map(Iter first, Iter last) {
        for (auto i = first;  i != last;  ++i) {
            emplace(*i);
        }
    }

    template<typename T>
    sequenced_map(std::initializer_list<T> init)
        : sequenced_map(init.begin(), init.end())
    {
    }

    template<typename... Rest>
    sequenced_map(key_type k, value_type v, Rest... rest)
    {
        insert_pairs(std::move(k), std::move(v), std::move(rest)...);
    }

    sequenced_map(const sequenced_map& other)
        : list{other.list},
          map{other.map}
    {
        for (auto i = list.begin();  i != list.end();  ++i) {
            map.at(i->first) = i;
        }
    }

    sequenced_map(sequenced_map&&) = default;

    sequenced_map& operator=(const sequenced_map& other)
    {
        map.clear();
        list.clear();
        for (auto const& e: other) {
            emplace(e.first, e.second);
        }
        return *this;
    }

    sequenced_map& operator=(sequenced_map&&) = default;

    auto begin(this auto& self) { return self.list.begin(); }
    auto end(this auto& self) { return self.list.end(); }
    auto rbegin(this auto& self) { return self.list.rbegin(); }
    auto rend(this auto& self) { return self.list.rend(); }

    auto cbegin() const { return begin(); }
    auto cend() const { return end(); }
    auto crbegin() const { return rbegin(); }
    auto crend() const { return rend(); }

    size_type size() const
    {
        return list.size();
    }

    size_type empty() const
    {
        return list.empty();
    }

    size_type max_size() const
    {
        return std::min(list.max_size(), map.max_size());
    }

    size_type count (key_type const& key) const
    {
        return map.count(key);
    }

    iterator find(key_type const& key)
    {
        auto map_iter = map.find(key);
        return map_iter == map.end() ? list.end() : map_iter->second;
    }

    const_iterator find(key_type const& key) const
    {
        auto map_iter = map.find(key);
        return map_iter == map.end() ? list.end() : map_iter->second;
    }

    void clear()
    {
        map.clear();
        list.clear();
    }

    template<typename V>
    std::pair<iterator,bool> insert(V&& value)
    {
        const auto iter = list.emplace(list.end(), std::forward<V>(value));
        auto [map_iter, created] = map.emplace(iter->first, iter);
        if (created) {
            // return new iterator
            return {iter, true};
        } else {
            // remove tentatively-added item
            list.erase(iter);
            // return existing iterator
            return {map_iter->second, false};
        }
    }

    template<std::input_iterator InputIt>
    void insert(InputIt first, InputIt last)
    {
        for (auto i = first;  i != last;  i++) {
            insert(*i);
        }
    }

    void insert(std::initializer_list<value_type> ilist)
    {
        insert(ilist.begin(), ilist.end());
    }

    template <typename K, typename M>
    std::pair<iterator, bool> insert_or_assign(K&& k, M&& obj)
    {
        auto map_iter = map.find(k);
        if (map_iter == map.end()) {
            return emplace(std::forward<K>(k), std::forward<M>(obj));
        }
        // assign to existing element
        map_iter->second = std::forward<M>(obj);
        return { *map_iter, false };
    }


    template<typename... Args>
    std::pair<iterator,bool> emplace(Args&&... value)
    {
        const auto iter = list.emplace(list.end(), std::forward<Args>(value)...);
        auto [map_iter, created] = map.emplace(iter->first, iter);
        if (created) {
            // return new iterator
            return {iter, true};
        } else {
            // remove tentatively-added item
            list.erase(iter);
            // return existing iterator
            return {map_iter->second, false};
        }
    }

    template <typename K, typename... Args>
    std::pair<iterator,bool> try_emplace(K&& k, Args&&... args)
    {
        auto map_iter = map.find(k);
        if (map_iter == map.end()) {
            return emplace(std::forward<K>(k), value_type(args...));
        }
        // not already present
        return { *map_iter, false };
    }

    size_type erase(key_type const& key)
    {
        if (auto map_iter = map.find(key);  map_iter != map.end()) {
            auto list_iter = map_iter->second;
            list.erase(list_iter);
            map.erase(map_iter);
            return 1;           // one element removed
        }
        return 0;               // no elements removed
    }

    iterator erase(const_iterator it)
    {
        map.erase(map.find(it->first));
        return list.erase(it);
    }

    iterator erase(const_iterator first, const_iterator last)
    {
        for (auto it = first;  it != last;  ++it) {
            map.erase(map.find(it->first));
        }
        return list.erase(first, last);
    }

    void swap(sequenced_map& other) {
        list.swap(other.list);
        map.swap(other.map);
    }

    value_type& operator[](key_type const& key)
    {
        auto map_iter = map.find(key);
        auto iter = map_iter == map.end() ? emplace(key, value_type{}).first : map_iter->second;
        return iter->second;
    }
    value_type& operator[](key_type&& key)
    {
        auto map_iter = map.find(key);
        auto iter = map_iter == map.end() ? emplace(std::move(key), value_type{}).first : map_iter->second;
        return iter->second;
    }

    auto& at(this auto& self, key_type const& key)
    {
        return self.map.at(std::ref(key))->second;
    }

private:
    void insert_pairs()
    {}

    template<typename... Rest>
    void insert_pairs(key_type k, value_type v, Rest... rest)
    {
        emplace(std::move(k), std::move(v));
        insert_pairs(std::move(rest)...);
    }
};

#include <gtest/gtest.h>

#include <array>
#include <ranges>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <vector>

TEST(sequenced_map, construct)
{
    EXPECT_TRUE((sequenced_map<int,int>{}).empty());
    {
        auto a = std::array<std::pair<int,int>, 2>{{ std::pair{2, 4}, std::pair{3, 9} }};
        sequenced_map<int, int> seq(a.begin(), a.end());
        EXPECT_EQ(seq.size(), 2);
    }
    {
        sequenced_map<int, int> seq{{ std::pair{2, 4}, std::pair{3, 9} }};
        EXPECT_EQ(seq.size(), 2);
    }
}

TEST(sequenced_map, copy)
{
    const sequenced_map<int, std::string> initial(2, "two", 3, "three");

    auto copy = initial;
    copy[2] = "TWO";
    EXPECT_EQ(copy[2], "TWO");
    EXPECT_EQ(initial.at(2), "two");

    copy = initial;
    copy[2] = "Two";
    EXPECT_EQ(copy[2], "Two");
    EXPECT_EQ(initial.at(2), "two");
}

TEST(sequenced_map, move)
{
    sequenced_map<int, std::string> initial(2, "two", 3, "three");

    auto copy = std::move(initial);
    EXPECT_EQ(copy[2], "two");
    copy[2] = "TWO";
    EXPECT_EQ(copy[2], "TWO");

    initial = std::move(copy);
    EXPECT_EQ(initial[2], "TWO");
}

TEST(sequenced_map, emplace)
{
    sequenced_map<std::string, int> seq;
    EXPECT_TRUE(seq.empty());
    EXPECT_EQ(seq.size(), 0uz);

    seq.emplace("two", 2);
    EXPECT_FALSE(seq.empty());
    EXPECT_EQ(seq.size(), 1uz);
    EXPECT_EQ(seq["two"], 2);
}

TEST(sequenced_map, emplace_twice)
{
    sequenced_map<int, std::string> seq;

    EXPECT_TRUE(seq.emplace(2, "two").second);
    EXPECT_FALSE(seq.emplace(2, "TWO").second);
    EXPECT_EQ(seq.size(), 1uz);
    EXPECT_EQ(seq[2], "two");
}

TEST(sequenced_map, clear)
{
    sequenced_map<std::string, int> seq;

    seq.emplace("two", 2);
    EXPECT_EQ(seq.size(), 1uz);

    seq.clear();
    EXPECT_TRUE(seq.empty());
}

TEST(sequenced_map, index_operator)
{
    sequenced_map<int, std::string> seq;

    seq[4] = "four";
    EXPECT_EQ(seq.size(), 1uz);
    EXPECT_EQ(seq[4], "four");

    EXPECT_EQ(seq.at(4), "four");
    EXPECT_THROW(seq.at(5), std::out_of_range);

    auto const& ct = seq;
    EXPECT_EQ(ct.at(4), "four");
    EXPECT_THROW(ct.at(5), std::out_of_range);

    EXPECT_NO_THROW(seq.at(4) = "Four");
    EXPECT_EQ(ct.at(4), "Four");
}

TEST(sequenced_map, erase)
{
    sequenced_map<int, std::string> seq(2, "two", 3, "three");

    auto iter = seq.find(3);

    EXPECT_EQ(seq.erase(seq.begin()), iter);
    EXPECT_THROW(seq.at(2), std::out_of_range);
    EXPECT_EQ(seq.erase(3), 1);
    EXPECT_TRUE(seq.empty());

    EXPECT_EQ(seq.erase(3), 0);
}

TEST(sequenced_map, swap)
{
    sequenced_map<int, std::string> initial(2, "two", 3, "three");
    sequenced_map<int, std::string> copy;

    swap(initial, copy);
    EXPECT_TRUE(initial.empty());
    EXPECT_EQ(copy[2], "two");
}

TEST(sequenced_map, order)
{
    using Vector = std::vector<std::pair<int, int>>;
    static const auto values = Vector{
        { 5, 4}, {12, 15}, {8, 6}, {3, 15}};
    sequenced_map<int, int> seq;

    for (auto value: values) {
        seq.emplace(value.first, value.second);
    }

    EXPECT_EQ(seq | std::ranges::to<Vector>(), values);
}

template<typename T>
struct movable
{
    T value;

    movable()
        requires std::is_default_constructible_v<T>
        : value{}
    {};

    template<typename U>
    movable(U&& value)
        : value{std::forward<U>(value)}
    {}

    movable(const movable&) = delete;
    auto operator=(const movable&) = delete;

    movable(movable&&) = default;
    movable& operator=(movable&&) = default;

    bool operator==(const movable& other) const
    { return value == other.value; }

    auto operator<=>(const movable& other) const
    { return value <=> other.value; }
};

TEST(sequenced_map, move_only)
{
    sequenced_map<movable<int>, movable<std::string>> seq;

    EXPECT_TRUE(seq.emplace(2, "two").second);

    EXPECT_EQ(seq[2], "two");
    EXPECT_EQ(seq.at(2), "two");

    auto const& ct = seq;
    EXPECT_EQ(ct.at(2), "two");

    EXPECT_NO_THROW(seq.erase(2));
    EXPECT_TRUE(seq.empty());
}
\$\endgroup\$

You must log in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.