Skip to content

Commit fcbe0cb

Browse files
AdamGoldmerwokblurb-it[bot]Fidget-Spinner
authored
bpo-42967: only use '&' as a query string separator (#24297)
bpo-42967: [security] Address a web cache-poisoning issue reported in urllib.parse.parse_qsl(). urllib.parse will only us "&" as query string separator by default instead of both ";" and "&" as allowed in earlier versions. An optional argument seperator with default value "&" is added to specify the separator. Co-authored-by: Éric Araujo <merwok@netwok.org> Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com> Co-authored-by: Ken Jin <28750310+Fidget-Spinner@users.noreply.github.com> Co-authored-by: Éric Araujo <merwok@netwok.org>
1 parent 1b57426 commit fcbe0cb

File tree

12 files changed

+186
-47
lines changed

12 files changed

+186
-47
lines changed

‎Doc/library/cgi.rst

+6-3
Original file line numberDiff line numberDiff line change
@@ -277,14 +277,14 @@ These are useful if you want more control, or if you want to employ some of the
277277
algorithms implemented in this module in other circumstances.
278278

279279

280-
.. function:: parse(fp=None, environ=os.environ, keep_blank_values=False, strict_parsing=False)
280+
.. function:: parse(fp=None, environ=os.environ, keep_blank_values=False, strict_parsing=False, separator="&")
281281

282282
Parse a query in the environment or from a file (the file defaults to
283-
``sys.stdin``). The *keep_blank_values* and *strict_parsing* parameters are
283+
``sys.stdin``). The *keep_blank_values*, *strict_parsing* and *separator* parameters are
284284
passed to :func:`urllib.parse.parse_qs` unchanged.
285285

286286

287-
.. function:: parse_multipart(fp, pdict, encoding="utf-8", errors="replace")
287+
.. function:: parse_multipart(fp, pdict, encoding="utf-8", errors="replace", separator="&")
288288

289289
Parse input of type :mimetype:`multipart/form-data` (for file uploads).
290290
Arguments are *fp* for the input file, *pdict* for a dictionary containing
@@ -303,6 +303,9 @@ algorithms implemented in this module in other circumstances.
303303
Added the *encoding* and *errors* parameters. For non-file fields, the
304304
value is now a list of strings, not bytes.
305305

306+
.. versionchanged:: 3.10
307+
Added the *separator* parameter.
308+
306309

307310
.. function:: parse_header(string)
308311

‎Doc/library/urllib.parse.rst

+14-2
Original file line numberDiff line numberDiff line change
@@ -165,7 +165,7 @@ or on combining URL components into a URL string.
165165
now raise :exc:`ValueError`.
166166

167167

168-
.. function:: parse_qs(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace', max_num_fields=None)
168+
.. function:: parse_qs(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace', max_num_fields=None, separator='&')
169169

170170
Parse a query string given as a string argument (data of type
171171
:mimetype:`application/x-www-form-urlencoded`). Data are returned as a
@@ -190,6 +190,8 @@ or on combining URL components into a URL string.
190190
read. If set, then throws a :exc:`ValueError` if there are more than
191191
*max_num_fields* fields read.
192192

193+
The optional argument *separator* is the symbol to use for separating the query arguments. It defaults to `&`.
194+
193195
Use the :func:`urllib.parse.urlencode` function (with the ``doseq``
194196
parameter set to ``True``) to convert such dictionaries into query
195197
strings.
@@ -201,8 +203,12 @@ or on combining URL components into a URL string.
201203
.. versionchanged:: 3.8
202204
Added *max_num_fields* parameter.
203205

206+
.. versionchanged:: 3.10
207+
Added *separator* parameter with the default value of `&`. Python versions earlier than Python 3.10 allowed using both ";" and "&" as
208+
query parameter separator. This has been changed to allow only a single separator key, with "&" as the default separator.
209+
204210

205-
.. function:: parse_qsl(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace', max_num_fields=None)
211+
.. function:: parse_qsl(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace', max_num_fields=None, separator='&')
206212

207213
Parse a query string given as a string argument (data of type
208214
:mimetype:`application/x-www-form-urlencoded`). Data are returned as a list of
@@ -226,6 +232,8 @@ or on combining URL components into a URL string.
226232
read. If set, then throws a :exc:`ValueError` if there are more than
227233
*max_num_fields* fields read.
228234

235+
The optional argument *separator* is the symbol to use for separating the query arguments. It defaults to `&`.
236+
229237
Use the :func:`urllib.parse.urlencode` function to convert such lists of pairs into
230238
query strings.
231239

@@ -235,6 +243,10 @@ or on combining URL components into a URL string.
235243
.. versionchanged:: 3.8
236244
Added *max_num_fields* parameter.
237245

246+
.. versionchanged:: 3.10
247+
Added *separator* parameter with the default value of `&`. Python versions earlier than Python 3.10 allowed using both ";" and "&" as
248+
query parameter separator. This has been changed to allow only a single separator key, with "&" as the default separator.
249+
238250

239251
.. function:: urlunparse(parts)
240252

‎Doc/whatsnew/3.10.rst

+13
Original file line numberDiff line numberDiff line change
@@ -546,6 +546,19 @@ Add new method :meth:`~unittest.TestCase.assertNoLogs` to complement the
546546
existing :meth:`~unittest.TestCase.assertLogs`. (Contributed by Kit Yan Choi
547547
in :issue:`39385`.)
548548
549+
urllib.parse
550+
------------
551+
552+
Python versions earlier than Python 3.10 allowed using both ``;`` and ``&`` as
553+
query parameter separators in :func:`urllib.parse.parse_qs` and
554+
:func:`urllib.parse.parse_qsl`. Due to security concerns, and to conform with
555+
newer W3C recommendations, this has been changed to allow only a single
556+
separator key, with ``&`` as the default. This change also affects
557+
:func:`cgi.parse` and :func:`cgi.parse_multipart` as they use the affected
558+
functions internally. For more details, please see their respective
559+
documentation.
560+
(Contributed by Adam Goldschmidt, Senthil Kumaran and Ken Jin in :issue:`42967`.)
561+
549562
xml
550563
---
551564

‎Doc/whatsnew/3.6.rst

+13
Original file line numberDiff line numberDiff line change
@@ -2443,3 +2443,16 @@ because of the behavior of the socket option ``SO_REUSEADDR`` in UDP. For more
24432443
details, see the documentation for ``loop.create_datagram_endpoint()``.
24442444
(Contributed by Kyle Stanley, Antoine Pitrou, and Yury Selivanov in
24452445
:issue:`37228`.)
2446+
2447+
Notable changes in Python 3.6.13
2448+
================================
2449+
2450+
Earlier Python versions allowed using both ";" and "&" as
2451+
query parameter separators in :func:`urllib.parse.parse_qs` and
2452+
:func:`urllib.parse.parse_qsl`. Due to security concerns, and to conform with
2453+
newer W3C recommendations, this has been changed to allow only a single
2454+
separator key, with "&" as the default. This change also affects
2455+
:func:`cgi.parse` and :func:`cgi.parse_multipart` as they use the affected
2456+
functions internally. For more details, please see their respective
2457+
documentation.
2458+
(Contributed by Adam Goldschmidt, Senthil Kumaran and Ken Jin in :issue:`42967`.)

‎Doc/whatsnew/3.7.rst

+13
Original file line numberDiff line numberDiff line change
@@ -2557,3 +2557,16 @@ because of the behavior of the socket option ``SO_REUSEADDR`` in UDP. For more
25572557
details, see the documentation for ``loop.create_datagram_endpoint()``.
25582558
(Contributed by Kyle Stanley, Antoine Pitrou, and Yury Selivanov in
25592559
:issue:`37228`.)
2560+
2561+
Notable changes in Python 3.7.10
2562+
================================
2563+
2564+
Earlier Python versions allowed using both ``;`` and ``&`` as
2565+
query parameter separators in :func:`urllib.parse.parse_qs` and
2566+
:func:`urllib.parse.parse_qsl`. Due to security concerns, and to conform with
2567+
newer W3C recommendations, this has been changed to allow only a single
2568+
separator key, with ``&`` as the default. This change also affects
2569+
:func:`cgi.parse` and :func:`cgi.parse_multipart` as they use the affected
2570+
functions internally. For more details, please see their respective
2571+
documentation.
2572+
(Contributed by Adam Goldschmidt, Senthil Kumaran and Ken Jin in :issue:`42967`.)

‎Doc/whatsnew/3.8.rst

+13
Original file line numberDiff line numberDiff line change
@@ -2234,3 +2234,16 @@ because of the behavior of the socket option ``SO_REUSEADDR`` in UDP. For more
22342234
details, see the documentation for ``loop.create_datagram_endpoint()``.
22352235
(Contributed by Kyle Stanley, Antoine Pitrou, and Yury Selivanov in
22362236
:issue:`37228`.)
2237+
2238+
Notable changes in Python 3.8.8
2239+
===============================
2240+
2241+
Earlier Python versions allowed using both ";" and "&" as
2242+
query parameter separators in :func:`urllib.parse.parse_qs` and
2243+
:func:`urllib.parse.parse_qsl`. Due to security concerns, and to conform with
2244+
newer W3C recommendations, this has been changed to allow only a single
2245+
separator key, with "&" as the default. This change also affects
2246+
:func:`cgi.parse` and :func:`cgi.parse_multipart` as they use the affected
2247+
functions internally. For more details, please see their respective
2248+
documentation.
2249+
(Contributed by Adam Goldschmidt, Senthil Kumaran and Ken Jin in :issue:`42967`.)

‎Doc/whatsnew/3.9.rst

+14-1
Original file line numberDiff line numberDiff line change
@@ -1515,4 +1515,17 @@ need to account for this change. A :exc:`DeprecationWarning` may be emitted for
15151515
invalid forms of parameterizing :class:`collections.abc.Callable` which may have
15161516
passed silently in Python 3.9.1. This :exc:`DeprecationWarning` will
15171517
become a :exc:`TypeError` in Python 3.10.
1518-
(Contributed by Ken Jin in :issue:`42195`.)
1518+
(Contributed by Ken Jin in :issue:`42195`.)
1519+
1520+
urllib.parse
1521+
------------
1522+
1523+
Earlier Python versions allowed using both ";" and "&" as
1524+
query parameter separators in :func:`urllib.parse.parse_qs` and
1525+
:func:`urllib.parse.parse_qsl`. Due to security concerns, and to conform with
1526+
newer W3C recommendations, this has been changed to allow only a single
1527+
separator key, with "&" as the default. This change also affects
1528+
:func:`cgi.parse` and :func:`cgi.parse_multipart` as they use the affected
1529+
functions internally. For more details, please see their respective
1530+
documentation.
1531+
(Contributed by Adam Goldschmidt, Senthil Kumaran and Ken Jin in :issue:`42967`.)

‎Lib/cgi.py

+14-9
Original file line numberDiff line numberDiff line change
@@ -115,7 +115,8 @@ def closelog():
115115
# 0 ==> unlimited input
116116
maxlen = 0
117117

118-
def parse(fp=None, environ=os.environ, keep_blank_values=0, strict_parsing=0):
118+
def parse(fp=None, environ=os.environ, keep_blank_values=0,
119+
strict_parsing=0, separator='&'):
119120
"""Parse a query in the environment or from a file (default stdin)
120121
121122
Arguments, all optional:
@@ -134,6 +135,9 @@ def parse(fp=None, environ=os.environ, keep_blank_values=0, strict_parsing=0):
134135
strict_parsing: flag indicating what to do with parsing errors.
135136
If false (the default), errors are silently ignored.
136137
If true, errors raise a ValueError exception.
138+
139+
separator: str. The symbol to use for separating the query arguments.
140+
Defaults to &.
137141
"""
138142
if fp is None:
139143
fp = sys.stdin
@@ -154,7 +158,7 @@ def parse(fp=None, environ=os.environ, keep_blank_values=0, strict_parsing=0):
154158
if environ['REQUEST_METHOD'] == 'POST':
155159
ctype, pdict = parse_header(environ['CONTENT_TYPE'])
156160
if ctype == 'multipart/form-data':
157-
return parse_multipart(fp, pdict)
161+
return parse_multipart(fp, pdict, separator=separator)
158162
elif ctype == 'application/x-www-form-urlencoded':
159163
clength = int(environ['CONTENT_LENGTH'])
160164
if maxlen and clength > maxlen:
@@ -178,10 +182,10 @@ def parse(fp=None, environ=os.environ, keep_blank_values=0, strict_parsing=0):
178182
qs = ""
179183
environ['QUERY_STRING'] = qs # XXX Shouldn't, really
180184
return urllib.parse.parse_qs(qs, keep_blank_values, strict_parsing,
181-
encoding=encoding)
185+
encoding=encoding, separator=separator)
182186

183187

184-
def parse_multipart(fp, pdict, encoding="utf-8", errors="replace"):
188+
def parse_multipart(fp, pdict, encoding="utf-8", errors="replace", separator='&'):
185189
"""Parse multipart input.
186190
187191
Arguments:
@@ -205,7 +209,7 @@ def parse_multipart(fp, pdict, encoding="utf-8", errors="replace"):
205209
except KeyError:
206210
pass
207211
fs = FieldStorage(fp, headers=headers, encoding=encoding, errors=errors,
208-
environ={'REQUEST_METHOD': 'POST'})
212+
environ={'REQUEST_METHOD': 'POST'}, separator=separator)
209213
return {k: fs.getlist(k) for k in fs}
210214

211215
def _parseparam(s):
@@ -315,7 +319,7 @@ class FieldStorage:
315319
def __init__(self, fp=None, headers=None, outerboundary=b'',
316320
environ=os.environ, keep_blank_values=0, strict_parsing=0,
317321
limit=None, encoding='utf-8', errors='replace',
318-
max_num_fields=None):
322+
max_num_fields=None, separator='&'):
319323
"""Constructor. Read multipart/* until last part.
320324
321325
Arguments, all optional:
@@ -363,6 +367,7 @@ def __init__(self, fp=None, headers=None, outerboundary=b'',
363367
self.keep_blank_values = keep_blank_values
364368
self.strict_parsing = strict_parsing
365369
self.max_num_fields = max_num_fields
370+
self.separator = separator
366371
if 'REQUEST_METHOD' in environ:
367372
method = environ['REQUEST_METHOD'].upper()
368373
self.qs_on_post = None
@@ -589,7 +594,7 @@ def read_urlencoded(self):
589594
query = urllib.parse.parse_qsl(
590595
qs, self.keep_blank_values, self.strict_parsing,
591596
encoding=self.encoding, errors=self.errors,
592-
max_num_fields=self.max_num_fields)
597+
max_num_fields=self.max_num_fields, separator=self.separator)
593598
self.list = [MiniFieldStorage(key, value) for key, value in query]
594599
self.skip_lines()
595600

@@ -605,7 +610,7 @@ def read_multi(self, environ, keep_blank_values, strict_parsing):
605610
query = urllib.parse.parse_qsl(
606611
self.qs_on_post, self.keep_blank_values, self.strict_parsing,
607612
encoding=self.encoding, errors=self.errors,
608-
max_num_fields=self.max_num_fields)
613+
max_num_fields=self.max_num_fields, separator=self.separator)
609614
self.list.extend(MiniFieldStorage(key, value) for key, value in query)
610615

611616
klass = self.FieldStorageClass or self.__class__
@@ -649,7 +654,7 @@ def read_multi(self, environ, keep_blank_values, strict_parsing):
649654
else self.limit - self.bytes_read
650655
part = klass(self.fp, headers, ib, environ, keep_blank_values,
651656
strict_parsing, limit,
652-
self.encoding, self.errors, max_num_fields)
657+
self.encoding, self.errors, max_num_fields, self.separator)
653658

654659
if max_num_fields is not None:
655660
max_num_fields -= 1

‎Lib/test/test_cgi.py

+24-5
Original file line numberDiff line numberDiff line change
@@ -53,12 +53,9 @@ def do_test(buf, method):
5353
("", ValueError("bad query field: ''")),
5454
("&", ValueError("bad query field: ''")),
5555
("&&", ValueError("bad query field: ''")),
56-
(";", ValueError("bad query field: ''")),
57-
(";&;", ValueError("bad query field: ''")),
5856
# Should the next few really be valid?
5957
("=", {}),
6058
("=&=", {}),
61-
("=;=", {}),
6259
# This rest seem to make sense
6360
("=a", {'': ['a']}),
6461
("&=a", ValueError("bad query field: ''")),
@@ -73,8 +70,6 @@ def do_test(buf, method):
7370
("a=a+b&b=b+c", {'a': ['a b'], 'b': ['b c']}),
7471
("a=a+b&a=b+a", {'a': ['a b', 'b a']}),
7572
("x=1&y=2.0&z=2-3.%2b0", {'x': ['1'], 'y': ['2.0'], 'z': ['2-3.+0']}),
76-
("x=1;y=2.0&z=2-3.%2b0", {'x': ['1'], 'y': ['2.0'], 'z': ['2-3.+0']}),
77-
("x=1;y=2.0;z=2-3.%2b0", {'x': ['1'], 'y': ['2.0'], 'z': ['2-3.+0']}),
7873
("Hbc5161168c542333633315dee1182227:key_store_seqid=400006&cuyer=r&view=bustomer&order_id=0bb2e248638833d48cb7fed300000f1b&expire=964546263&lobale=en-US&kid=130003.300038&ss=env",
7974
{'Hbc5161168c542333633315dee1182227:key_store_seqid': ['400006'],
8075
'cuyer': ['r'],
@@ -201,6 +196,30 @@ def test_strict(self):
201196
else:
202197
self.assertEqual(fs.getvalue(key), expect_val[0])
203198

199+
def test_separator(self):
200+
parse_semicolon = [
201+
("x=1;y=2.0", {'x': ['1'], 'y': ['2.0']}),
202+
("x=1;y=2.0;z=2-3.%2b0", {'x': ['1'], 'y': ['2.0'], 'z': ['2-3.+0']}),
203+
(";", ValueError("bad query field: ''")),
204+
(";;", ValueError("bad query field: ''")),
205+
("=;a", ValueError("bad query field: 'a'")),
206+
(";b=a", ValueError("bad query field: ''")),
207+
("b;=a", ValueError("bad query field: 'b'")),
208+
("a=a+b;b=b+c", {'a': ['a b'], 'b': ['b c']}),
209+
("a=a+b;a=b+a", {'a': ['a b', 'b a']}),
210+
]
211+
for orig, expect in parse_semicolon:
212+
env = {'QUERY_STRING': orig}
213+
fs = cgi.FieldStorage(separator=';', environ=env)
214+
if isinstance(expect, dict):
215+
for key in expect.keys():
216+
expect_val = expect[key]
217+
self.assertIn(key, fs)
218+
if len(expect_val) > 1:
219+
self.assertEqual(fs.getvalue(key), expect_val)
220+
else:
221+
self.assertEqual(fs.getvalue(key), expect_val[0])
222+
204223
def test_log(self):
205224
cgi.log("Testing")
206225

0 commit comments

Comments
 (0)
X