7

Trying to understand the behaviour of the environment in Linux (Ubuntu 13.04 specifically), I've found that environment variables are set or used in different contexts. For example, if I run locale, I get:

$ locale
LANG=en_US.UTF-8
LANGUAGE=es_ES:es_HN:es_EC:en
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC=es_ES.UTF-8
// more output

But if I look for LC_CTYPE, for example, using env | grep "LC_CTYPE", I get no output. In general, locale shows me 13 LC_* variables and env only nine:

$ locale | grep "LC_*" | wc -l
13
$ env | grep "LC_*" | wc -l
9

Another variable with a different "nature" is PS1. For example:

$ env | grep "PS1" # No output, but...
$ set | grep "PS1" | head -n 1
PS1=$'\\[\\033[1;33m\\][\\t][\\W]\342\230\233\\[\\033[0m\\] '

and of course, PS1 is a well-defined variable in my current environment, since I see my prompt changed accordingly.

Another way of viewing environment variables, in a different context, is strace, a program that shows the system calls a program makes when it runs. Sample:

$ strace -v ./a.out # a.out is a common Hello World, made in C.
execve("./a.out", ["./a.out"], ["LC_PAPER=es_ES.UTF-8", ...]) = 0
brk(0)
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
# etc, etc
write(1, "Hello World\n", 12Hello World
)           = 12
exit_group(0)                           = ?

The first thing that happens when the shell executes a program is a call to execve, which is what actually runs the program. Its first argument is the program being called, the second is the argv of the called program, and the third is its environment variables.

Neither PS1 nor LC_CTYPE, for example, appears in that third parameter.

In general, variables shown by env or set appear in the list of environment variables passed to execve. Some locale variables appear in env or set but others do not (LC_CTYPE, LC_COLLATE and LC_MESSAGES, as well as LC_ALL, which locale shows with an empty value). Lastly, other variables are not shown by env although they have a visible effect (PS1) and are listed by set.

What's going on here? What are the differences between env, set (without arguments) and locale (obviously with respect to locale variables only)?

3 Answers

15

The primary issue here -- which accounts for why, e.g., $PS1 is not reported by env -- is that env is reporting from a non-interactive environment. Processes are executed from a fork of your interactive shell, but there's a subtlety involved in how their environment is set: It's actually inherited via a native C level external variable set for all exec()'d processes (see man environ). Here's an illustration:

#include <stdio.h>

extern char **environ;

int main (void) {
    int i;
    for (i = 0; environ[i] != NULL; i++) {
        printf("%s\n", environ[i]);
    }
    return 0;
}      

What's interesting about this is that, if you compile and run it, you'll find the contents of **environ exactly match what env reports:

$ gcc test.c
$ ./a.out > aout.txt
$ env > env.txt
$ diff env.txt aout.txt
68c68
< _=/bin/env
---
> _=./a.out

The only difference is the name of the executable. So where does **environ come from and why doesn't it contain, e.g., $PS1?

The fundamental explanation is that processes are always created as children of other processes and they inherit **environ, but PS1 was never part of it. At start up, a shell may source variables from standard places, and those places differ depending on whether the shell is interactive or not; see INVOCATION in man bash. An aspect of this is that:

PS1 is set [...] if bash is interactive, allowing a shell script or a startup file to test this state.

Now, notice in /etc/bashrc something like this:

# are we an interactive shell?
if [ "$PS1" ]; then

That is where your actual (fancy) prompt is set, and neither it nor the initial value of $PS1 was ever exported. The initial value was created by the shell at invocation because it was interactive, and the shell then sourced that file -- but PS1 did not get put into **environ. You can see this if you execute:

#!/bin/sh

echo $PS1

Nothing -- even though $PS1 is defined if you echo it in your interactive shell. This is because the **environ of the executed #!/bin/sh is the same as that of the parent interactive shell, and that does NOT contain PS1. This implies each shell keeps an internal table of variables that is separate from **environ, although initially populated from it (this is confusing, since it means **environ does not include many things referred to as environment variables).
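The interactive/non-interactive difference quoted from man bash can be checked directly. This is a minimal sketch, assuming bash is on the PATH and that your env supports -u (GNU and BSD versions do); --norc and --noprofile are used only to keep your own startup files out of the picture.

```bash
# Non-interactive bash: PS1 is not set (env -u drops any inherited PS1 first).
env -u PS1 bash -c 'echo "non-interactive: ${PS1-unset}"'

# Forced-interactive bash with no startup files: the shell assigns PS1 itself.
# stderr is discarded because bash may warn about job control without a tty.
env -u PS1 bash --norc --noprofile -i -c 'echo "interactive: ${PS1-unset}"' 2>/dev/null
```

The first command should report "unset"; the second should show whatever default prompt string your bash build uses.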

The contents of **environ as they were when the process was started are in /proc/[PID]/environ, and if you check that for your current interactive shell, cat /proc/$BASHPID/environ, you'll see PS1 is not there.
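The entries in that file are separated by NUL bytes, so plain cat prints them run together. A small Linux-only sketch that makes it readable follows; demo_unexported is a hypothetical variable name used only for illustration. The file is a snapshot of the environment the process was started with, so it does not show variables exported later either.

```bash
# Hypothetical shell-only variable: assigned but never exported.
demo_unexported="shell only"

# Translate the NUL separators to newlines to get one NAME=value per line,
# then look for the variable; it is not expected to be there.
tr '\0' '\n' < "/proc/$$/environ" | grep '^demo_unexported=' \
    || echo "demo_unexported is not in /proc/$$/environ"
```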

But how does stuff get into "environ"?

The simple answer is, via library calls such as putenv(), which modify the process's own copy of the environment. For example, if we throw some stuff into the example C program from earlier:

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

extern char **environ;

int main (void) {
    int i;
    if (putenv("MYFOO=whatbar?")) {
        fprintf(stderr, "putenv() failed: %s\n", strerror(errno));
        exit(1);
    }

    for (i = 0; environ[i] != NULL; i++) {
        printf("%s\n", environ[i]);
    }

    return 0;
}           

MYFOO=whatbar? will be in the output (see man putenv). Since the shell creates processes by fork()ing (which duplicates the parent's memory, **environ included) and then calling execv() (which passes on the duplicated **environ), we can see a mechanism by which environment variables may be exported to child processes.

If you throw a fork() into that example, you'll see this is the case, and (to reiterate), this process of fork'ing and potentially exec'ing is how child processes are created and inherit **environ from their ancestors. exec calls replace the process image, but as per man execv and man environ (nb. some versions of the former do not refer to this), **environ is passed on by the system.

Here's a literal fork and exec of /usr/bin/env with MYFOO=whatbar? exported via putenv():

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

extern char **environ;

int main (void) {
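    pid_t pid;

    /* Add MYFOO to this process's environment; the child inherits it. */
    if (putenv("MYFOO=whatbar?")) {
        fprintf(stderr, "putenv() failed: %s\n", strerror(errno));
        exit(1);
    }

    pid = fork();
    if (pid == -1) {
        fprintf(stderr, "fork() failed: %s\n", strerror(errno));
        exit(1);
    }
    /* In the child, replace the process image with env. */
    if (!pid) execl("/usr/bin/env", "env", (char *)NULL);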

    return 0;
}         
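From the shell, a rough equivalent of that putenv() + fork() + exec sequence is a variable assignment placed in front of a command. This is only a sketch, reusing the MYFOO name from the C example and assuming MYFOO is not already set in your session.

```bash
# The assignment prefix puts MYFOO into the environment of this one child only.
MYFOO='whatbar?' env | grep '^MYFOO='     # prints: MYFOO=whatbar?

# The shell itself is untouched: MYFOO is neither set nor exported here.
echo "in the shell: ${MYFOO-unset}"       # prints: in the shell: unset
```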

So where's the stuff that's not in "environ"?

It's private data of a particular shell instance. Bash will show you this + the inherited environ stuff via set with no arguments. Note this output also includes sourced functions.
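To list only the variables that are private to the shell, one option in bash is to compare the names reported by the compgen builtin. This is a sketch rather than the only way to do it; demo_private and DEMO_PUBLIC are hypothetical names introduced for the illustration.

```bash
demo_private="not exported"      # hypothetical shell-only variable
export DEMO_PUBLIC="exported"    # hypothetical exported variable

# compgen -v lists all variable names; compgen -e lists only exported ones.
# comm -23 keeps the names that are in the first list but not in the second.
shell_only=$(LC_ALL=C comm -23 <(compgen -v | LC_ALL=C sort) \
                               <(compgen -e | LC_ALL=C sort))

printf '%s\n' "$shell_only" | grep -x 'demo_private'
printf '%s\n' "$shell_only" | grep -x 'DEMO_PUBLIC' \
    || echo "DEMO_PUBLIC is exported, so it is not listed"
```

Run in an interactive session, printing "$shell_only" should also show names such as PS1.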

But if I look for LC_CTYPE, for example, using env | grep "LC_CTYPE", I get no output. In general, locale shows me 13 LC_* variables and env only nine:

I get no LC_ variables at all from env (just LANG) but 13 from locale. I would presume these are variables set by a locale call and not exported; the fact that you get any from env perhaps reflects a naive error in some configuration somewhere.

6
  • Why do you draw the line around "non-interactive subshells"? PS1="foo "; bash Is the new bash not an "interactive subshell"?
    Commented Apr 6, 2014 at 14:26
  • Right, but that's not my point. My point is that interactive and non-interactive "subshells" (i.e. explicitly called shells) behave the same way.
    Commented Apr 6, 2014 at 14:30
  • I think you are looking at the wrong area here and mixing up the two. The difference between exported and non-exported variables ends at the execve(). At that point new processes are all the same, whether it's env or bash. What happens after the execve() is special to the program. And it is irrelevant whether the program has been started from a shell or whatever. Shell invocation is IMHO completely irrelevant for understanding the export mechanism.
    Commented Apr 6, 2014 at 14:41
  • @Peregring-lk : All apologies for the previous ambiguities. After being pushed by Mr. Laging, I did some deeper research and experimentation to discover where the apparent dependencies in environment come from, and have updated my answer with those, I think in-depth and definitive, findings!
    – goldilocks
    Commented Apr 6, 2014 at 18:36
  • @TAFKA'goldilocks' Thanks for your complete reply. Finally, where can I see non-exported variables? In other words, where are all variables grouped, including those not listed by env?
    – ABu
    Commented Apr 6, 2014 at 20:42
4

The shell knows two types of variables:

  1. "internal" variables which are known to the shell only (and to subshells)

  2. exported variables, the "official" ones which are seen by execve and thus by env. The shell builtin export shows you the exported variables.

If you execute

export PS1

and repeat

env | grep "PS1"

then you will see it. Variables can be exported during creation (export foo=bar instead of foo=bar), they can be exported automatically on creation or modification (set -a), they can be exported later (var=foo; ...; export var) and they can be "unexported" (export -n var).
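A short sketch of that life cycle in bash, using a hypothetical variable demo_var and a child sh to show what actually reaches the environment (export -n is bash-specific):

```bash
demo_var=bar                                        # plain shell variable
sh -c 'echo "before export: ${demo_var-unset}"'     # prints: before export: unset

export demo_var                                     # now passed to children
sh -c 'echo "after export: ${demo_var-unset}"'      # prints: after export: bar

export -n demo_var                                  # remove the export attribute
sh -c 'echo "after export -n: ${demo_var-unset}"'   # prints: after export -n: unset

echo "still set in the shell: $demo_var"            # prints: still set in the shell: bar
```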

If the shell creates "real" subshells (by a|b, (a;b), $(a) and so on), they keep the non-exported variables as well, in order to avoid chaos.
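The difference between a subshell (a fork with no exec) and an executed child can be seen with a variable that is never exported. A minimal sketch, with demo_inner as a hypothetical name:

```bash
demo_inner=42    # hypothetical variable, not exported

# Subshells are copies of the current shell, so they still see it.
( echo "subshell: ${demo_inner-unset}" )                      # prints: subshell: 42
echo "command substitution: $(echo "${demo_inner-unset}")"    # prints: command substitution: 42

# A separately executed shell only receives the exported environment.
sh -c 'echo "sh -c: ${demo_inner-unset}"'                     # prints: sh -c: unset
```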

1
  • +1 The fact that subshells inherit non-exported variables that other child processes do not is an interesting case, I guess accounted for by the fact that a subshell is a fork with no exec. This also means "subshell" does not refer to subprocess shells executed via a shebang. Thanks for arguing some of these points with me!
    – goldilocks
    Commented Apr 6, 2014 at 23:00
4

The output from the locale command is not a list of environment variables from the current environment. It is a display of that process's effective locale settings (which are influenced in part by certain environment variables) and is presented in the same key=value format that the env command uses.

You can see the source for the eglibc locale command implementation here: http://www.eglibc.org/cgi-bin/viewvc.cgi/branches/eglibc-2_19/libc/locale/programs/locale.c?view=markup
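One way to see this for yourself is to run locale with an almost empty environment. The sketch below assumes a locale command is installed; it passes only PATH and LANG to the child. Values that locale prints in double quotes are implied by other settings (here LANG) rather than set as variables of their own, which is also why LC_CTYPE appears quoted in the question's output.

```bash
# locale reports a full set of LC_* categories, quoting the implied ones,
# for example LC_CTYPE="C".
env -i PATH="$PATH" LANG=C locale

# The environment of that same child holds only the variables passed to it:
# PATH and LANG, and no LC_* variables at all.
env -i PATH="$PATH" LANG=C env
```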
