1

At this spot in this article by one of the major Python people, the author notes that automatic string concatenation is a feature of the parser/compiler as opposed to the interpreter, which is why you must use + to concatenate strings at runtime.

I don't understand any of that. I know you can concatenate with +, I know two string literals placed side by side are concatenated automatically, and I know you can't do that with variables containing strings. But I have no idea what the difference is between a parser/compiler and an interpreter (for Python, or in general), or how it ties in to string concatenation.

Can someone explain?

11
  • 1
    Python is an interpreted language with no compiler, so everything is done at runtime by the interpreter. Commented Dec 26, 2013 at 21:41
  • 3
    @SnakeDoc: Python is a programming language; it is neither compiled nor interpreted, until you process the source code with a compiler or interpreter. Commented Dec 26, 2013 at 21:44
  • 2
    @SnakeDoc: As a matter of fact, the CPython implementation (the one most people use) isn't interpreted but bytecode-compiled. Commented Dec 26, 2013 at 21:46
  • 1
    @mgilson: My point is that compiled/interpreted is never as black and white as it seems, and the language itself is neither. However, since you asked: ironpython.net Commented Dec 26, 2013 at 22:07
  • 2
    CPU also reads bytecode and executes it instruction by instruction. Commented Dec 26, 2013 at 23:13

3 Answers

6

Python is usually described as an interpreted language (as opposed to languages like C++, which are compiled to machine code before execution).

There is an intermediate step, though: the source (text) files are first compiled to bytecode, and that bytecode is then run by the Python interpreter.

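Concatenation of adjacent string literals (as in "a" "b" becoming "ab") is already done by the bytecode compiler. In the sessions below the same goes for "a" + "b", because the compiler can work out the value of an expression made only of literals (this is an optimization, not something the language guarantees):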

>>> import dis
>>> def s(): print "a" "b"
...
>>> dis.dis(s)
  1           0 LOAD_CONST               1 ('ab')
              3 PRINT_ITEM
              4 PRINT_NEWLINE
              5 LOAD_CONST               0 (None)
              8 RETURN_VALUE
>>> def s(): print "ab"
...
>>> dis.dis(s)
  1           0 LOAD_CONST               1 ('ab')
              3 PRINT_ITEM
              4 PRINT_NEWLINE
              5 LOAD_CONST               0 (None)
              8 RETURN_VALUE
>>> def s(): print "a"+"b"
...
>>> dis.dis(s)
  1           0 LOAD_CONST               3 ('ab')
              3 PRINT_ITEM
              4 PRINT_NEWLINE
              5 LOAD_CONST               0 (None)
              8 RETURN_VALUE

But for values that can't trivially be inferred at compile time, it's the interpreter's job to do the concatenation:

>>> def s(): print "a" + chr(98)
...
>>> dis.dis(s)
  1           0 LOAD_CONST               1 ('a')
              3 LOAD_GLOBAL              0 (chr)
              6 LOAD_CONST               2 (98)
              9 CALL_FUNCTION            1
             12 BINARY_ADD
             13 PRINT_ITEM
             14 PRINT_NEWLINE
             15 LOAD_CONST               0 (None)
             18 RETURN_VALUE
>>> s()
ab
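
The sessions above are from Python 2. As a rough Python 3 sketch of the same check, one can inspect the compiled functions directly instead of reading the disassembly by eye. The names `adjacent`, `runtime_concat` and `opnames` are made up for this illustration, and the exact opcode names are an assumption about CPython that varies between releases:

```python
import dis


def adjacent():
    # Two adjacent literals: merged before any bytecode is run.
    return "a" "b"


def runtime_concat():
    # chr(98) is only known at run time, so the + has to be executed.
    return "a" + chr(98)


def opnames(func):
    """Return the opcode names of a function's bytecode, in order."""
    return [ins.opname for ins in dis.get_instructions(func)]


# The merged literal is stored as a single constant in the code object.
print("ab" in adjacent.__code__.co_consts)

# The addition shows up as an explicit instruction: BINARY_ADD on older
# CPython 3 releases, BINARY_OP on newer ones (assumption: CPython).
print([name for name in opnames(runtime_concat) if name.startswith("BINARY")])
```

Both functions return "ab" when called; the difference is only in when the concatenation happens.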

3 Comments

Nice answer, but note that the compiler-versus-interpreter distinction is completely irrelevant here. Also, the same compile-time string concatenation occurs in C/C++.
dis module leaps to the rescue yet again, explicating otherwise abstruse intricacies...
The peephole optimization for concatenating constants with the binary + operator is limited to length 20. Take a look at (lambda: 'aaaaaaaaaa' + 'bbbbbbbbbb' + 'c').__code__.co_consts.
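
To see what a particular interpreter does with the expression from this comment, it is enough to print the constants of the compiled lambda. This is only a sketch: the limit quoted in the comment was observed on the CPython versions current at the time, other versions may fold more or less, and so no output is shown here. The name `consts` is just for illustration.

```python
import sys

# Expression taken from the comment above; what ends up in co_consts
# depends on the optimizer of the interpreter running it.
consts = (lambda: 'aaaaaaaaaa' + 'bbbbbbbbbb' + 'c').__code__.co_consts

# Print the interpreter version next to the constants for comparison.
print(sys.version_info[:2], consts)
```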
2

When Python source is translated into bytecode, adjacent string literals are merged. This happens only once: as long as the precompiled .pyc file is not deleted, every later run reuses the concatenated result. Even without the precompiled file, the merged string is stored in the bytecode, so the concatenation is not recomputed each time the code (e.g. a function) runs.

If you use + instead, the bytecode will in general contain both strings, and the expression is evaluated every time the code runs. EDIT: not always, as Tim Pietzcker notes in his answer. When the compiler does fold the expression, that is a compiler optimization, not behaviour guaranteed by the language semantics.

Note that because syntax is part of the language definition, the differentiation between compiler and interpreter is irrelevant here.
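
One way to see that this is a matter of syntax is to look at what the parser produces, before any optimization or execution. The following is a small Python 3 sketch using the standard `ast` module; the variable names (`merged`, `added`, `names_ok`) are invented for the example:

```python
import ast

# Adjacent literals: the parser already produces one string constant.
merged = ast.parse('"a" "b"', mode="eval").body
print(type(merged).__name__, ast.literal_eval(merged))

# With +, the (unoptimized) syntax tree still holds an explicit operation.
added = ast.parse('"a" + "b"', mode="eval").body
print(type(added).__name__)

# Adjacent names are not valid syntax at all, so there is nothing to run.
try:
    compile("s1 s2", "<demo>", "eval")
except SyntaxError:
    names_ok = False
else:
    names_ok = True
print(names_ok)
```

The adjacent-literal form never exists as an operation in the program, while the + form does, whether or not a given compiler later folds it.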

Reference: lexical analysis in Python

10 Comments

This seems like the most likely answer.
@RobertHarvey: Unfortunately, it's not correct, though, as can be seen by disassembling the bytecode.
@TimPietzcker I don't get it - in your bytecode there is one string after having "a" "b" in the code.
@TimPietzcker -- Can you elaborate on exactly what is incorrect about this answer? I'm not seeing a problem here, but maybe I haven't read it closely enough...
0

A compiled language (e.g. C, C++) has its human-readable source code translated into machine code before execution.

An interpreted language (e.g. old Microsoft BASIC on the 6502) works out what a step needs to do each time that step is executed.

A middle ground exists. Languages like Python and Java are compiled, but not to machine code; instead they are compiled to bytecode for an idealised, software-only machine. This gives great portability and decent speed, especially when combined with a JIT (Java, PyPy, and CPython 2.[56] with Psyco all JIT-compile bytecode).
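
The two stages can be made visible from within Python itself. This is a minimal sketch assuming CPython 3; the names `source`, `code` and `namespace` are chosen for the example only:

```python
# Step 1: compile source text to a code object (nothing is executed yet).
source = 'greeting = "Hello, " "world"'
code = compile(source, "<demo>", "exec")

# The code object carries the bytecode and the constants it refers to.
print(type(code.co_code).__name__, len(code.co_code) > 0)
print("Hello, world" in code.co_consts)

# Step 2: the interpreter executes the compiled bytecode.
namespace = {}
exec(code, namespace)
print(namespace["greeting"])
```

The adjacent literals are already a single constant after step 1, which is the sense in which their concatenation belongs to the compiler rather than to the running program.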

Confusingly, Java people often say their language is compiled and that Python is not compiled, and there was some discussion a while back of implementing a Java Runtime Environment in hardware, though I'm not sure it ever materialized.

Also, gcj compiles Java source code to native executables, and Cython similarly turns Python code into native code by way of C, among others. But Java and Python are both mostly bytecode-interpreted.
