Python parser/compiler vs. interpreter, and string concatenation compile-time vs. run-time?

Question

At this spot in this article by one of the major Python people, the author notes that automatic string concatenation is a feature of the parser/compiler as opposed to the interpreter, which is why you must use + to concatenate strings at runtime.

I don't understand any of that. I know you can concatenate with +, I know two string literals side by side are auto-concatenated, and I know you of course can't do that with variables containing strings. But I have no idea what the difference is between a parser/compiler and an interpreter (for Python, or in general), and I have no idea how it ties in to this whole string concatenation thing.

Explanation???

python is an interpreted language, no compiler. so, everything is done at runtime by the interpreter.
– SnakeDoc, Commented Dec 26, 2013 at 21:41
@SnakeDoc: Python is a programming language; it is neither compiled nor interpreted, until you process the source code with a compiler or interpreter.
– Robert Harvey, Commented Dec 26, 2013 at 21:44
@SnakeDoc: As a matter of fact, the CPython implementation (the one most people use) isn't interpreted but bytecode-compiled.
– Max Noel, Commented Dec 26, 2013 at 21:46
@mgilson: My point is that compiled/interpreted is never as black and white as it seems, and the language itself is neither. However, since you asked: ironpython.net
– Robert Harvey, Commented Dec 26, 2013 at 22:07
CPU also reads bytecode and executes it instruction by instruction.
– Cat Plus Plus, Commented Dec 26, 2013 at 23:13

Tim Pietzcker · Accepted Answer · 2013-12-26 21:51:44Z

Python is an interpreted language (as opposed to languages like C++ that are compiled to machine code before execution).

Now there is an intermediate step: The source (text) files are compiled to bytecode, and that bytecode is then run by the Python interpreter.
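A minimal sketch of those two stages, assuming Python 3 on CPython. The expression and the "<demo>" filename are made up for illustration; compile() and eval() are the built-in functions:

```python
# Stage 1: compile source text to a code object (this holds the bytecode).
source = "x * 2 + 1"
code = compile(source, "<demo>", "eval")

# The bytecode itself is just a bytes object attached to the code object.
print(type(code).__name__, type(code.co_code).__name__)

# Stage 2: the interpreter executes that bytecode; x is supplied at run time.
result = eval(code, {"x": 20})
print(result)
```

Nothing is evaluated in stage 1; the value of x only matters once the compiled code is run.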

Concatenation of adjacent string literals (as in "a" "b" becoming "ab") is already done by the bytecode compiler. The same goes for "a" + "b", because the compiler can work out the result from the literal values and fold it into a single constant:

>>> import dis
>>> def s(): print "a" "b"
...
>>> dis.dis(s)
  1           0 LOAD_CONST               1 ('ab')
              3 PRINT_ITEM
              4 PRINT_NEWLINE
              5 LOAD_CONST               0 (None)
              8 RETURN_VALUE
>>> def s(): print "ab"
...
>>> dis.dis(s)
  1           0 LOAD_CONST               1 ('ab')
              3 PRINT_ITEM
              4 PRINT_NEWLINE
              5 LOAD_CONST               0 (None)
              8 RETURN_VALUE
>>> def s(): print "a"+"b"
...
>>> dis.dis(s)
  1           0 LOAD_CONST               3 ('ab')
              3 PRINT_ITEM
              4 PRINT_NEWLINE
              5 LOAD_CONST               0 (None)
              8 RETURN_VALUE
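The session above is Python 2. A rough Python 3 equivalent, as a sketch: instead of reading the disassembly, look at the constants stored in each function's code object. The function names are made up, and the folding of "a" + "b" is assumed CPython behaviour, not something the language guarantees:

```python
def adjacent():
    return "a" "b"      # adjacent literals, merged when the source is compiled

def folded():
    return "a" + "b"    # constant expression, folded by CPython's optimizer

# co_consts holds the constants the compiler stored for each function.
print(adjacent.__code__.co_consts)
print(folded.__code__.co_consts)
```

On a recent CPython both tuples should contain 'ab', which means the concatenation happened before the function was ever called.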

But for values that can't trivially be inferred at compile time, it's the interpreter's job to do the concatenation:

>>> def s(): print "a" + chr(98)
...
>>> dis.dis(s)
  1           0 LOAD_CONST               1 ('a')
              3 LOAD_GLOBAL              0 (chr)
              6 LOAD_CONST               2 (98)
              9 CALL_FUNCTION            1
             12 BINARY_ADD
             13 PRINT_ITEM
             14 PRINT_NEWLINE
             15 LOAD_CONST               0 (None)
             18 RETURN_VALUE
>>> s()
ab

Nice answer, but note that the compiler vs. interpreter distinction is completely irrelevant here. Also, the same compile-time string concatenation occurs in C/C++.
dis module leaps to the rescue yet again, explicating otherwise abstruse intricacies...
The peephole optimization for concatenating constants with the binary + operator is limited to length 20. Take a look at (lambda: 'aaaaaaaaaa' + 'bbbbbbbbbb' + 'c').__code__.co_consts.

Community · Accepted Answer · 2017-05-23 11:56:47Z


When Python code is translated into byte-code, side-by-side string literals are merged. This is done only once: as long as the precompiled .pyc file is not deleted, every run of the script reuses the already-concatenated result. Even without the precompiled file, the concatenated string is placed in the byte-code, so the result does not need to be recomputed each time this code (e.g. a function) is run.

If you use + on the other hand, the byte-code may contain both strings, and the expression is then evaluated every time this code is run. EDIT: not always, as noted by Tim Pietzcker in his answer. In that case, however, it is a matter of compiler optimization, not behaviour guaranteed by the language semantics.

Note that because syntax is part of the language definition, the differentiation between compiler and interpreter is irrelevant here.

Reference: lexical analysis in Python
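To illustrate the syntax-versus-optimization point, here is a small sketch using the standard ast module (assuming Python 3.8 or later, where string literals parse to ast.Constant). The variable names are made up:

```python
import ast

# Adjacent literals are merged by the parser: the tree holds one constant.
adjacent = ast.parse('"a" "b"', mode="eval").body

# With +, the (unoptimized) tree still holds an addition of two constants.
added = ast.parse('"a" + "b"', mode="eval").body

print(type(adjacent).__name__, repr(adjacent.value))
print(type(added).__name__)
```

The first expression is already a single string by the time parsing is done, while the second is still an addition that a later optimization step may or may not fold.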


answered Dec 26, 2013 at 21:43

BartoszKP


Robert Harvey Over a year ago

This seems like the most likely answer.

Tim Pietzcker Over a year ago

@RobertHarvey: Unfortunately, it's not correct, though, as can be seen by disassembling the bytecode.

BartoszKP Over a year ago

@TimPietzcker I don't get it - in your bytecode there is one string after having "a" "b" in the code.

mgilson Over a year ago

@TimPietzcker -- Can you elaborate on exactly what is incorrect about this answer? I'm not seeing a problem here, but maybe I haven't read it closely enough...

Eryk Sun Over a year ago

The peephole optimization for concatenating constants with the binary + operator is limited to length 20. Take a look at (lambda: 'aaaaaaaaaa' + 'bbbbbbbbbb' + 'c').__code__.co_consts.


dstromberg · Accepted Answer · 2013-12-26 23:26:36Z

A compiled language (e.g. C, C++) translates human-readable source code into machine code.

An interpreted language (e.g. old Microsoft BASIC on 6502s) recomputes what a step needs to do, each time that step is executed.

A middle ground exists. Languages like Python and Java compile, but they don't compile to machine code; instead they compile to an idealised, software-only machine's byte code. This gives great portability, and decent speed, especially if combined with a JIT (Java, PyPy, and CPython 2.[56] with psyco all JIT-compile byte code).

Confusingly, Java people often say their language is compiled and that Python is not compiled, and there was some discussion a while back of implementing a Java Runtime Environment in hardware, though I'm not sure it ever materialized.

Also, gcj compiles Java source code to machine-readable executables, as does Cython for Python, among others. But Java and Python are both mostly byte-code interpreted.
