.\" Copyright, the authors of the Linux man-pages project
.\"
.\" SPDX-License-Identifier: Linux-man-pages-copyleft
.\"
.TH fsopen 2 (date) "Linux man-pages (unreleased)"
.SH NAME
fsopen \- create a new filesystem context
.SH LIBRARY
Standard C library
.RI ( libc ,\~ \-lc )
.SH SYNOPSIS
.nf
.B #include <sys/mount.h>
.P
.BI "int fsopen(const char *" fsname ", unsigned int " flags );
.fi
.SH DESCRIPTION
The
.BR fsopen ()
system call is part of
the suite of file-descriptor-based mount facilities in Linux.
.P
.BR fsopen ()
creates a blank filesystem configuration context within the kernel
for the filesystem named by
.I fsname
and places it into creation mode.
A new file descriptor
associated with the filesystem configuration context
is then returned.
The calling process must have the
.B \%CAP_SYS_ADMIN
capability in order to create a new filesystem configuration context.
.P
A filesystem configuration context is
an in-kernel representation of a pending transaction,
containing a set of configuration parameters that are to be applied
when creating a new instance of a filesystem
(or modifying the configuration of an existing filesystem instance,
such as when using
.BR fspick (2)).
.P
After obtaining a filesystem configuration context with
.BR fsopen (),
the general workflow for operating on the context looks like the following:
.IP (1) 5
Pass the filesystem context file descriptor to
.BR fsconfig (2)
to specify any desired filesystem parameters.
This may be done as many times as necessary.
.IP (2)
Pass the same filesystem context file descriptor to
.BR fsconfig (2)
with
.B \%FSCONFIG_CMD_CREATE
to create an instance of the configured filesystem.
.IP (3)
Pass the same filesystem context file descriptor to
.BR fsmount (2)
to create a new detached mount object for
the root of the filesystem instance,
which is then attached to a new file descriptor.
(This also places the filesystem context file descriptor into
reconfiguration mode,
similar to the mode produced by
.BR fspick (2).)
Once a mount object has been created with
.BR fsmount (2),
the filesystem context file descriptor can be safely closed.
.IP (4)
Now that a mount object has been created,
you may
.RS
.IP \[bu] 3
use the detached mount object file descriptor as a
.I dirfd
argument to "*at()" system calls;
and/or
.IP \[bu]
attach the mount object to a mount point
by passing the mount object file descriptor to
.BR move_mount (2).
This will also prevent the mount object from
being unmounted and destroyed when
the mount object file descriptor is closed.
.RE
.IP
The mount object file descriptor will
remain associated with the mount object
even after doing the above operations,
so you may repeatedly use the mount object file descriptor with
.BR move_mount (2)
and/or "*at()" system calls
as many times as necessary.
.P
A filesystem context will move between different modes
throughout its lifecycle
(such as the creation phase
when created with
.BR fsopen (),
the reconfiguration phase
when an existing filesystem instance is selected with
.BR fspick (2),
and the intermediate "awaiting-mount" phase
.\" FS_CONTEXT_AWAITING_MOUNT is the term the kernel uses for this.
between
.B \%FSCONFIG_CMD_CREATE
and
.BR fsmount (2)),
which has an impact on
what operations are permitted on the filesystem context.
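.P
The mode changes described above can be sketched as a small state machine.
This is an illustrative model only, not kernel code:
the names
.IR ctx_mode ,
.IR ctx_step ,
and
.IR ctx_apply ()
are hypothetical,
and the rule that parameters may be set only in
the creation and reconfiguration modes
is an assumption made for the sketch.
.P
.in +4n
.EX
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the modes described above; not a kernel API. */
enum ctx_mode {
    CTX_CREATION,        /* after fsopen() */
    CTX_AWAITING_MOUNT,  /* after FSCONFIG_CMD_CREATE */
    CTX_RECONFIGURATION  /* after fsmount(), or from fspick() */
};

enum ctx_step {
    STEP_SET_PARAM,      /* fsconfig() setting a parameter */
    STEP_CMD_CREATE,     /* fsconfig() with FSCONFIG_CMD_CREATE */
    STEP_FSMOUNT         /* fsmount() */
};

/*
 * Apply one workflow step to *mode.
 * Returns true (updating *mode where the step changes it) if the step
 * fits the workflow order, false (leaving *mode unchanged) otherwise.
 */
bool ctx_apply(enum ctx_mode *mode, enum ctx_step step)
{
    assert(mode != NULL);

    switch (step) {
    case STEP_SET_PARAM:
        /* Assumption: only these two modes accept parameters. */
        return *mode == CTX_CREATION || *mode == CTX_RECONFIGURATION;
    case STEP_CMD_CREATE:
        if (*mode != CTX_CREATION)
            return false;
        *mode = CTX_AWAITING_MOUNT;
        return true;
    case STEP_FSMOUNT:
        if (*mode != CTX_AWAITING_MOUNT)
            return false;
        *mode = CTX_RECONFIGURATION;
        return true;
    }
    return false;
}
```
.EE
.in
.P
Under this model,
setting parameters, creating the instance, and then mounting
is accepted in that order,
whereas mounting directly after
.BR fsopen ()
is rejected.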
.P
The file descriptor returned by
.BR fsopen ()
also acts as a channel for filesystem drivers to
provide more comprehensive diagnostic information
than is normally provided through the standard
.BR errno (3)
interface for system calls.
If an error occurs at any time during the workflow mentioned above,
calling
.BR read (2)
on the filesystem context file descriptor
will retrieve any ancillary information about the encountered errors.
(See the "Message retrieval interface" section
for more details on the message format.)
.P
.I flags
can be used to control aspects of
the creation of the filesystem configuration context file descriptor.
A value for
.I flags
is constructed by bitwise ORing
zero or more of the following constants:
.RS
.TP
.B FSOPEN_CLOEXEC
Set the close-on-exec
.RB ( FD_CLOEXEC )
flag on the new file descriptor.
See the description of the
.B O_CLOEXEC
flag in
.BR open (2)
for reasons why this may be useful.
.RE
.P
A list of filesystems supported by the running kernel
(and thus a list of valid values for
.IR fsname )
can be obtained from
.IR /proc/filesystems .
(See also
.BR proc_filesystems (5).)
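.P
As a rough illustration,
a program could check a candidate
.I fsname
against text it has already read from
.IR /proc/filesystems .
The helper
.IR fs_is_listed ()
below is hypothetical;
it assumes each line holds an optional "nodev" marker
followed by the filesystem name.
The result is only a hint:
.BR fsopen ()
itself remains the authority on whether a name is accepted.
.P
.in +4n
.EX
```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return true if fsname appears as a filesystem name in text, where
 * text holds the contents of /proc/filesystems. The helper walks the
 * whitespace-separated words and skips the "nodev" marker.
 */
bool fs_is_listed(const char *text, const char *fsname)
{
    assert(text != NULL && fsname != NULL);
    size_t want = strlen(fsname);

    while (*text != 0) {
        /* Skip leading whitespace, then measure the next word. */
        while (isspace((unsigned char)*text))
            text++;
        const char *word = text;
        while (*text != 0 && !isspace((unsigned char)*text))
            text++;
        size_t len = (size_t)(text - word);

        if (len == 5 && strncmp(word, "nodev", 5) == 0)
            continue;   /* marker, not a filesystem name */
        if (len > 0 && len == want && strncmp(word, fsname, len) == 0)
            return true;
    }
    return false;
}
```
.EE
.in
.P
Only whole words match,
so a prefix of a listed name is not treated as a listed filesystem.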
.SS Message retrieval interface
When doing operations on a filesystem configuration context,
the filesystem driver may choose to provide
ancillary information to userspace
in the form of message strings.
.P
The filesystem context file descriptors returned by
.BR fsopen ()
and
.BR fspick (2)
may be queried for message strings at any time by calling
.BR read (2)
on the file descriptor.
Each call to
.BR read (2)
will return a single message,
prefixed to indicate its class:
.RS
.TP
.BI e\~ message
An error message was logged.
This is usually associated with an error being returned
from the corresponding system call which triggered this message.
.TP
.BI w\~ message
A warning message was logged.
.TP
.BI i\~ message
An informational message was logged.
.RE
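.P
The prefixes above can be separated from the message text
with a small helper.
This is a sketch only:
the enum and function names are hypothetical,
and it assumes the class letter is followed by a single space,
as shown in the list above.
.P
.in +4n
.EX
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names; only the e/w/i prefixes come from this page. */
enum fs_msg_class {
    FS_MSG_ERROR,
    FS_MSG_WARNING,
    FS_MSG_INFO,
    FS_MSG_UNKNOWN
};

/*
 * Classify one message of len bytes as returned by read(2).
 * On a recognised prefix, *body points just past "e ", "w " or "i ".
 * Otherwise *body points at the start of the message.
 */
enum fs_msg_class fs_msg_classify(const char *msg, size_t len,
                                  const char **body)
{
    enum fs_msg_class cls;

    assert(msg != NULL && body != NULL);
    *body = msg;
    if (len < 2 || msg[1] != ' ')
        return FS_MSG_UNKNOWN;

    switch (msg[0]) {
    case 'e': cls = FS_MSG_ERROR;   break;
    case 'w': cls = FS_MSG_WARNING; break;
    case 'i': cls = FS_MSG_INFO;    break;
    default:  return FS_MSG_UNKNOWN;
    }
    *body = msg + 2;
    return cls;
}
```
.EE
.in
.P
The helper takes an explicit length
because the buffer filled by
.BR read (2)
is not necessarily null-terminated.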
.P
Messages are removed from the queue as they are read.
Note that the message queue has limited depth,
so it is possible for messages to get lost.
If there are no messages in the message queue,
.BR read (2)
will return \-1 and
.I errno
will be set to
.BR \%ENODATA .
If the
.I buf
argument to
.BR read (2)
is not large enough to contain the entire message,
.BR read (2)
will return \-1 and
.I errno
will be set to
.BR \%EMSGSIZE .
(See BUGS.)
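.P
A loop that empties the message queue might look like the following sketch.
The function name
.IR fs_msg_drain ()
and the 4096-byte buffer size are assumptions;
this page does not state a maximum message length.
The loop stops when
.BR read (2)
reports
.B \%ENODATA
and treats any other failure as an error.
.P
.in +4n
.EX
```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed size; no maximum message length is documented here. */
#define FS_MSG_BUFSZ 4096

/*
 * Read every pending message from fd and copy it to out.
 * Returns the number of messages copied, or -1 with errno set if
 * read(2) fails for a reason other than an empty queue (ENODATA).
 */
int fs_msg_drain(int fd, FILE *out)
{
    char buf[FS_MSG_BUFSZ];
    int count = 0;

    assert(out != NULL);
    for (;;) {
        ssize_t n = read(fd, buf, sizeof(buf));

        if (n > 0) {
            fwrite(buf, 1, (size_t)n, out);
            count++;
            continue;
        }
        if (n == 0 || errno == ENODATA)
            return count;   /* nothing left to read */
        if (errno == EINTR)
            continue;       /* interrupted; try again */
        return -1;          /* includes EMSGSIZE; see BUGS */
    }
}
```
.EE
.in
.P
A caller would typically invoke such a helper
after any step in the workflow fails,
passing the filesystem context file descriptor and a stream such as
.IR stderr .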
.P
If there are multiple filesystem contexts
referencing the same filesystem instance
(such as if you call
.BR fspick (2)
multiple times for the same mount),
each one gets its own independent message queue.
This does not apply to multiple file descriptors that are
tied to the same underlying open file description
(such as those created with
.BR dup (2)).
.P
Message strings will usually be prefixed by
the name of the filesystem or kernel subsystem
that logged the message,
though this may not always be the case.
See the Linux kernel source code for details.
.SH RETURN VALUE
On success, a new file descriptor is returned.
On error, \-1 is returned, and
.I errno
is set to indicate the error.
.SH ERRORS
.TP
.B EFAULT
.I fsname
is NULL
or a pointer to a location
outside the calling process's accessible address space.
.TP
.B EINVAL
.I flags
had an invalid flag set.
.TP
.B EMFILE
The calling process has too many open files to create more.
.TP
.B ENFILE
The system has too many open files to create more.
.TP
.B ENODEV
The filesystem named by
.I fsname
is not supported by the kernel.
.TP
.B ENOMEM
The kernel could not allocate sufficient memory to complete the operation.
.TP
.B EPERM
The calling process does not have the required
.B \%CAP_SYS_ADMIN
capability.
.SH STANDARDS
Linux.
.SH HISTORY
Linux 5.2.
.\" commit 24dcb3d90a1f67fe08c68a004af37df059d74005
.\" commit 400913252d09f9cfb8cce33daee43167921fc343
glibc 2.36.
.SH BUGS
.SS Message retrieval interface and EMSGSIZE
As described in the "Message retrieval interface" subsection above,
calling
.BR read (2)
with too small a buffer to contain
the next pending message in the message queue
for the filesystem configuration context
will cause
.BR read (2)
to return \-1 and set
.BR errno (3)
to
.BR \%EMSGSIZE .
.P
However,
this failed operation still
consumes the message from the message queue.
This effectively discards the message silently,
as no data is copied into the
.BR read (2)
buffer.
.P
Programs should take care to ensure that
their buffers are sufficiently large
to contain any reasonable message string,
in order to avoid silently losing valuable diagnostic information.
.\" Aleksa Sarai
.\" This unfortunate behaviour has existed since this feature was merged, but
.\" I have sent a patchset which will finally fix it.
.\" <https://lore.kernel.org/r/20250807-fscontext-log-cleanups-v3-1-8d91d6242dc3@cyphar.com/>
.SH EXAMPLES
To illustrate the workflow for creating a new mount,
the following is an example of how to mount an
.BR ext4 (5)
filesystem stored on
.I /dev/sdb1
onto
.IR /mnt .
.P
.in +4n
.EX
int fsfd, mntfd;
\&
fsfd = fsopen("ext4", FSOPEN_CLOEXEC);
fsconfig(fsfd, FSCONFIG_SET_FLAG, "ro", NULL, 0);
fsconfig(fsfd, FSCONFIG_SET_PATH, "source", "/dev/sdb1", AT_FDCWD);
fsconfig(fsfd, FSCONFIG_SET_FLAG, "noatime", NULL, 0);
fsconfig(fsfd, FSCONFIG_SET_FLAG, "acl", NULL, 0);
fsconfig(fsfd, FSCONFIG_SET_FLAG, "user_xattr", NULL, 0);
fsconfig(fsfd, FSCONFIG_SET_FLAG, "iversion", NULL, 0);
fsconfig(fsfd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
mntfd = fsmount(fsfd, FSMOUNT_CLOEXEC, MOUNT_ATTR_RELATIME);
move_mount(mntfd, "", AT_FDCWD, "/mnt", MOVE_MOUNT_F_EMPTY_PATH);
.EE
.in
.P
First,
an ext4 configuration context is created and attached to the file descriptor
.IR fsfd .
Then, a series of parameters
(such as the source of the filesystem)
are provided using
.BR fsconfig (2),
followed by the filesystem instance being created with
.BR \%FSCONFIG_CMD_CREATE .
.BR fsmount (2)
is then used to create a new mount object attached to the file descriptor
.IR mntfd ,
which is then attached to the intended mount point using
.BR move_mount (2).
.P
The above procedure is functionally equivalent to
the following mount operation using
.BR mount (2):
.P
.in +4n
.EX
mount("/dev/sdb1", "/mnt", "ext4", MS_RELATIME,
"ro,noatime,acl,user_xattr,iversion");
.EE
.in
.P
And here's an example of creating a mount object
of an NFS server share
and setting a Smack security module label.
However, instead of attaching it to a mount point,
the program uses the mount object directly
to open a file from the NFS share.
.P
.in +4n
.EX
int fsfd, mntfd, fd;
\&
fsfd = fsopen("nfs", 0);
fsconfig(fsfd, FSCONFIG_SET_STRING, "source", "example.com/pub", 0);
fsconfig(fsfd, FSCONFIG_SET_STRING, "nfsvers", "3", 0);
fsconfig(fsfd, FSCONFIG_SET_STRING, "rsize", "65536", 0);
fsconfig(fsfd, FSCONFIG_SET_STRING, "wsize", "65536", 0);
fsconfig(fsfd, FSCONFIG_SET_STRING, "smackfsdef", "foolabel", 0);
fsconfig(fsfd, FSCONFIG_SET_FLAG, "rdma", NULL, 0);
fsconfig(fsfd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
mntfd = fsmount(fsfd, 0, MOUNT_ATTR_NODEV);
fd = openat(mntfd, "src/linux-5.2.tar.xz", O_RDONLY);
.EE
.in
.P
Unlike the previous example,
this operation has no trivial equivalent with
.BR mount (2),
as it was not previously possible to create a mount object
that is not attached to any mount point.
.SH SEE ALSO
.BR fsconfig (2),
.BR fsmount (2),
.BR fspick (2),
.BR mount (2),
.BR mount_setattr (2),
.BR move_mount (2),
.BR open_tree (2),
.BR mount_namespaces (7)