.. _sharding-hashed:
.. _sharding-hashed-sharding:

===============
Hashed Sharding
===============

.. meta::
   :description: Explore how hashed sharding partitions data using hashed indexes for even distribution across a MongoDB sharded cluster.

.. default-domain:: mongodb

Hashed sharding uses either a :ref:`single field hashed index
<index-hashed-index>` or a :ref:`compound hashed index
<index-type-compound-hashed>` as the shard key to
partition data across your sharded cluster.

Sharding on a Single Field Hashed Index
   Hashed sharding provides a more even data distribution across the
   sharded cluster at the cost of reducing
   :ref:`sharding-query-isolation`. After hashing, documents with
   "close" shard key values are unlikely to be on the same chunk or
   shard, so :binary:`~bin.mongos` is more likely to perform
   :ref:`sharding-mongos-broadcast` to fulfill a given ranged query.
   :binary:`~bin.mongos` can still target queries with equality matches
   to a single shard.

   .. include:: /images/sharding-hash-based.rst

   Hashed indexes compute the hash value of a single field as the index
   value; this value is used as your shard key. [#hashvalue]_
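
The difference between equality and ranged targeting can be seen in a
small Node.js simulation. This is only a sketch: ``standInHash`` uses MD5
as a stand-in for the hashed index function, and ``shardFor`` maps hashes
to shards with a simple modulo. Both are hypothetical simplifications,
since MongoDB assigns hashed values to chunks by range rather than by
modulo.

```javascript
const nodeCrypto = require("crypto");

// Hypothetical stand-in for a hashed index: the first 4 bytes of an MD5
// digest, read as an unsigned integer. Not MongoDB's actual hash function.
function standInHash(value) {
  const digest = nodeCrypto.createHash("md5").update(String(value)).digest();
  return digest.readUInt32BE(0);
}

// Hypothetical mapping from a shard key value to one of `shardCount` shards.
function shardFor(value, shardCount) {
  return standInHash(value) % shardCount;
}

// An equality match hashes to a single value, so it contacts one shard.
function shardsForEquality(value, shardCount) {
  return new Set([shardFor(value, shardCount)]);
}

// A ranged query must contact every shard that holds a key in [low, high].
function shardsForRange(low, high, shardCount) {
  const shards = new Set();
  for (let key = low; key <= high; key++) {
    shards.add(shardFor(key, shardCount));
  }
  return shards;
}

const equalityTargets = shardsForEquality(1000, 4);
const rangeTargets = shardsForRange(1000, 1063, 4);
console.log("equality match contacts", equalityTargets.size, "shard(s)");
console.log("range of 64 adjacent keys contacts", rangeTargets.size, "shard(s)");
```

Because adjacent key values hash to unrelated positions, the ranged query
in this sketch spreads over several shards while the equality match stays
on one.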

Sharding on a Compound Hashed Index
   MongoDB includes support for creating compound indexes with a single
   :ref:`hashed field <index-type-hashed>`. To create a compound hashed
   index, specify ``hashed`` as the value of any single index key when
   creating the index.

   A compound hashed index computes the hash value of a single field in
   the compound index; this value is used along with the other fields in
   the index as your shard key.

   Compound hashed sharding supports features like :ref:`zone sharding
   <zone-sharding>`, where the prefix (i.e. first) non-hashed field or
   fields support zone ranges while the hashed field supports more even
   distribution of the sharded data. Compound hashed sharding also
   supports shard keys with a hashed prefix for resolving data
   distribution issues related to :ref:`monotonically increasing
   fields <shard-key-monotonic>`.

.. include:: /includes/tip-applications-do-not-need-to-compute-hashes.rst

.. include:: /includes/warning-hashed-index-floating-point.rst

.. [#hashvalue]
   :binary:`~bin.mongosh` provides the :method:`convertShardKeyToHashed()`
   method. This method uses the same hashing function as the hashed index and
   can be used to see what the hashed value would be for a key.

.. _hashed-sharding-shard-key:

Hashed Sharding Shard Key
-------------------------

The field you choose as your hashed shard key should have high
:ref:`cardinality <shard-key-range>`, meaning a large number of
different values. Hashed keys are ideal for shard keys with fields that
change :ref:`monotonically <shard-key-monotonic>`, like :term:`ObjectId`
values or timestamps. A good example of this is the default ``_id``
field, assuming it only contains :term:`ObjectId` values.

To shard a collection using a hashed shard key, see
:ref:`deploy-hashed-sharded-cluster-shard-collection`.

.. _hashed-versus-ranged-sharding:

Hashed vs Ranged Sharding
-------------------------

Given a collection using a monotonically increasing value ``X`` as the
shard key, using ranged sharding results in a distribution of incoming
inserts similar to the following:

.. include:: /images/sharded-cluster-monotonic-distribution.rst

Since the value of ``X`` is always increasing, the chunk with an upper bound
of :bsontype:`MaxKey` receives the majority of incoming writes. This
restricts insert operations to the single shard containing this chunk, which
reduces or removes the advantage of distributed writes in a sharded cluster.

By using a hashed index on ``X``, the distribution of inserts is similar
to the following:

.. include:: /images/sharded-cluster-hashed-distribution.rst

Since the data is now distributed more evenly, inserts are efficiently
distributed throughout the cluster.
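
The contrast between the two distributions can be reproduced with a
short Node.js simulation. It is a sketch under stated assumptions: the
three ranged chunks and their boundaries are made up for illustration,
and ``standInHash`` (MD5 followed by a modulo) is a hypothetical
substitute for MongoDB's hashed index function.

```javascript
const nodeCrypto = require("crypto");

// Hypothetical stand-in for hashing X; not MongoDB's actual hash function.
function standInHash(value) {
  const digest = nodeCrypto.createHash("md5").update(String(value)).digest();
  return digest.readUInt32BE(0);
}

// Assumed ranged chunks: (-inf, 100) on shard 0, [100, 200) on shard 1,
// and [200, +inf) on shard 2. The last chunk plays the role of the chunk
// whose upper bound is MaxKey.
const rangeUpperBounds = [100, 200, Infinity];

function rangedShard(x) {
  return rangeUpperBounds.findIndex((upper) => x < upper);
}

function hashedShard(x, shardCount) {
  return standInHash(x) % shardCount;
}

// Insert `total` documents with X = firstX, firstX + 1, ... and count
// how many land on each shard.
function countInserts(pickShard, shardCount, firstX, total) {
  const counts = new Array(shardCount).fill(0);
  for (let x = firstX; x < firstX + total; x++) {
    counts[pickShard(x)] += 1;
  }
  return counts;
}

const rangedCounts = countInserts(rangedShard, 3, 1000, 3000);
const hashedCounts = countInserts((x) => hashedShard(x, 3), 3, 1000, 3000);
console.log("ranged:", rangedCounts);
console.log("hashed:", hashedCounts);
```

With ranged sharding every new value of ``X`` exceeds the existing
boundaries, so all inserts land on the last shard. With the stand-in
hash, each shard receives a share of the inserts.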

Shard the Collection
--------------------

Use the :method:`sh.shardCollection()` method, specifying the full namespace
of the collection and the target :ref:`hashed index <index-type-hashed>`
to use as the :term:`shard key`:

.. code-block:: javascript

   sh.shardCollection( "database.collection", { <field> : "hashed" } )

To shard a collection on a
:ref:`compound hashed index <index-type-compound-hashed>`, specify
the full namespace of the collection and the target compound
hashed index to use as the :term:`shard key`:

.. code-block:: javascript

   sh.shardCollection(
     "database.collection",
     { "fieldA" : 1, "fieldB" : 1, "fieldC" : "hashed" }
   )
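
Before calling :method:`sh.shardCollection()`, it can help to check that
the shard key document contains exactly one ``hashed`` field, since a
hashed index allows only a single hashed field. The following sketch
shows one way to do that. ``countHashedFields``, ``shardWithHashedKey``,
the ``stubSh`` object, and the ``records.*`` namespaces are all
hypothetical names used for illustration; in :binary:`~bin.mongosh` you
would pass the real ``sh`` helper instead of the stub.

```javascript
// Count how many fields in a shard key document are marked "hashed".
function countHashedFields(shardKey) {
  return Object.values(shardKey).filter((value) => value === "hashed").length;
}

// Hypothetical wrapper: validate the key, then delegate to an object that
// exposes shardCollection(namespace, key), such as the mongosh `sh` helper.
function shardWithHashedKey(shHelper, namespace, shardKey) {
  if (countHashedFields(shardKey) !== 1) {
    throw new Error("a hashed shard key must contain exactly one hashed field");
  }
  return shHelper.shardCollection(namespace, shardKey);
}

// Stub that records calls so the sketch runs outside mongosh.
const recordedCalls = [];
const stubSh = {
  shardCollection(namespace, key) {
    recordedCalls.push({ namespace, key });
    return { ok: 1 };
  },
};

// Single field hashed shard key.
shardWithHashedKey(stubSh, "records.people", { zipcode: "hashed" });

// Compound hashed shard key with a non-hashed prefix.
shardWithHashedKey(stubSh, "records.visits", { country: 1, city: 1, _id: "hashed" });

console.log(recordedCalls.length, "shardCollection call(s) recorded");
```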

.. important::

   - Starting in MongoDB 5.0, you can :ref:`reshard a collection
     <sharding-resharding>` by changing a collection's shard key.
   - You can :ref:`refine a shard key <shard-key-refine>` by adding a suffix
     field or fields to the existing shard key.

Shard a Populated Collection
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If you shard a populated collection using a hashed shard key:

- The sharding operation creates an initial chunk to cover all of the
  shard key values.

- After the initial chunk creation, the balancer moves ranges of the
  initial chunk when it needs to balance data.

.. _shard-empty-collection:

Shard an Empty Collection
~~~~~~~~~~~~~~~~~~~~~~~~~

.. include:: /includes/extracts/zoned-sharding-shard-operation-chunk-distribution.rst

Sharding Empty Collection on Single Field Hashed Shard Key
   - With no :ref:`zones and zone ranges <zone-sharding>` specified
     for the empty or non-existing collection:

     - The sharding operation creates an empty chunk to cover the entire
       range of the shard key values. Starting in version 8.0, the
       operation creates one chunk per shard by default and migrates
       the chunks across the cluster. You can use the
       ``numInitialChunks`` option to specify a different number of
       initial chunks and trigger an initial chunk distribution. This
       initial creation and distribution of chunks allows for faster
       setup of sharding.

     - After the initial distribution, the balancer manages the chunk
       distribution going forward.

   - With zones and zone ranges specified
     for the empty or non-existing collection:

     - The sharding operation creates empty chunks for the defined zone
       ranges as well as any additional chunks to cover the entire range
       of the shard key values and performs an initial chunk distribution
       based on the zone ranges. This initial creation and distribution
       of chunks allows for faster setup of zoned sharding.

     - After the initial distribution, the balancer manages the chunk
       distribution going forward.
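
The ``numInitialChunks`` option is passed in the fourth (options)
argument of :method:`sh.shardCollection()`. The sketch below only builds
that argument list so it can run outside :binary:`~bin.mongosh`. The
``hashedShardArgs`` helper, the ``test.events`` namespace, and the value
``6`` are hypothetical choices for illustration.

```javascript
// Build the arguments for sh.shardCollection(namespace, key, unique, options)
// for a single field hashed shard key with an explicit initial chunk count.
function hashedShardArgs(namespace, field, numInitialChunks) {
  if (!Number.isInteger(numInitialChunks) || numInitialChunks < 1) {
    throw new RangeError("numInitialChunks must be a positive integer");
  }
  // `unique` is false because hashed shard keys do not support
  // unique constraints.
  return [namespace, { [field]: "hashed" }, false, { numInitialChunks }];
}

const shardArgs = hashedShardArgs("test.events", "_id", 6);
console.log(JSON.stringify(shardArgs));

// In mongosh, against an empty or non-existing collection, the call
// would then be:
//   sh.shardCollection(...shardArgs);
```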

Sharding Empty Collection on Compound Hashed Shard Key with Hashed Field Prefix
   If the compound hashed shard key has the hashed field as the prefix
   (the hashed field is the first field in the shard key):

   - With no zones and zone ranges specified
     for the empty or non-existing collection:

     - The sharding operation creates empty chunks to cover the entire
       range of the shard key values and performs an initial chunk
       distribution. The value of all non-hashed fields is
       :bsontype:`MinKey` at each split point. Starting in version 8.0,
       the operation creates one chunk per shard by default and
       migrates the chunks across the cluster. You can use the
       ``numInitialChunks`` option to specify a different number of
       initial chunks and trigger an initial chunk distribution. This
       initial creation and distribution of chunks allows for faster
       setup of sharding.

     - After the initial distribution, the balancer manages the chunk
       distribution going forward.

   - With a *single* zone with a
     range from ``MinKey`` to ``MaxKey`` specified
     for the empty or non-existing collection *and*
     the ``presplitHashedZones`` option specified to
     :method:`sh.shardCollection()`:

     - The sharding operation creates empty chunks for the defined zone
       range as well as any additional chunks to cover the entire range
       of the shard key values and performs an initial chunk
       distribution based on the zone ranges. This initial creation and
       distribution of chunks allows for faster setup of zoned sharding.

     - After the initial distribution, the balancer manages the chunk
       distribution going forward.

Sharding Empty Collection on Compound Hashed Shard Key with Non-Hashed Prefix
   If the compound hashed shard key has one or more non-hashed fields as
   the prefix (i.e. the hashed field is *not* the first field in the
   shard key):

   - With no zones and zone ranges specified
     for the empty or non-existing collection *and*
     :ref:`presplitHashedZones
     <method-shard-collection-presplitHashedZones>` is ``false`` or
     omitted, MongoDB does not perform any initial chunk creation or
     distribution when sharding the collection.

   - With no zones and zone ranges specified
     for the empty or non-existing collection *and*
     :ref:`presplitHashedZones
     <method-shard-collection-presplitHashedZones>` is ``true``,
     :method:`sh.shardCollection()` / :dbcommand:`shardCollection`
     returns an error.

   - With zones and zone ranges specified
     for the empty or non-existing collection *and* the
     :ref:`presplitHashedZones
     <method-shard-collection-presplitHashedZones>` option specified to
     :method:`sh.shardCollection()`:

     - The sharding operation creates empty chunks for the defined zone
       ranges as well as any additional chunks to cover the entire range
       of the shard key values.

     - The sharding operation further subdivides the initial chunk for
       each range, such that each shard in the zone is allocated an
       equal number of chunks.

     - This initial creation and distribution of chunks allows for
       faster setup of zoned sharding. After the initial distribution,
       the balancer manages the chunk distribution going forward.

     The defined ranges for each zone *must* meet certain requirements.
     For a description of the requirements and a complete example, see
     :ref:`pre-define-zone-range-hashed-example`.
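
The three cases listed above for a non-hashed prefix can be summarized as
a small decision function. This is an illustrative restatement of the
list, not server logic: ``initialChunkBehavior`` and its return labels
are hypothetical names, and the combination the list does not describe
is reported as ``"unspecified"``.

```javascript
// Summarize what happens when sharding an empty collection on a compound
// hashed shard key whose prefix is non-hashed.
function initialChunkBehavior({ hasZoneRanges, presplitHashedZones = false }) {
  if (!hasZoneRanges && !presplitHashedZones) {
    // No initial chunk creation or distribution.
    return "none";
  }
  if (!hasZoneRanges && presplitHashedZones) {
    // shardCollection returns an error.
    return "error";
  }
  if (hasZoneRanges && presplitHashedZones) {
    // Chunks are created for the zone ranges and subdivided so each
    // shard in a zone gets an equal number of chunks.
    return "presplit";
  }
  // Zones defined without presplitHashedZones: not covered by the list.
  return "unspecified";
}

console.log(initialChunkBehavior({ hasZoneRanges: false }));
console.log(initialChunkBehavior({ hasZoneRanges: false, presplitHashedZones: true }));
console.log(initialChunkBehavior({ hasZoneRanges: true, presplitHashedZones: true }));
```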

Drop a Hashed Shard Key Index
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. include:: /includes/drop-hashed-shard-key-index.rst

.. seealso::

   To learn how to deploy a sharded cluster and implement hashed
   sharding, see :ref:`sharding-procedure-setup`.