17
\$\begingroup\$

Given a path to a file or a directory, detect whether the path can traverse to outside the current working directory, or attempt to traverse to the "parent directory" of the root directory.

This challenge is about implementing a loose filter of paths that may perform a directory traversal attack.

(There is a stricter variant of the filter, implemented already in programs like GNU tar, that disallows all occurrences of .. as pathname components. I chose to use a loose filter for a coding challenge to be fun. 🙂)

Input

A string representing a Unix file path. For simplicity of this challenge, the path is assumed to never contain any control character ([0x00, 0x1F] ∪ 0x7F), newline, or space (0x20). The input string would also be non-empty. A forward slash (/, 0x2F) is used as a path separator.

The input path may be relative or absolute. An absolute path is indicated by the leading slash in the path string (for example, /usr/share/misc).

  • If there are multiple consecutive slashes in the path, they are treated as if there is only one slash (footnote 1). Example: a//b///c////d///// is treated the same as a/b/c/d/.
  • A directory name of . (dot) refers to the same directory as the previous part of the file path. The . directory at the beginning of the file path refers to the working directory. The . doesn't increase any level of directories in the filesystem hierarchy. Example: ./e/./f/././g is the same as e/f/g.
  • A directory name of .. (dot-dot) refers to the parent directory of the directory previously specified in the file path. It decreases one level of directories in the filesystem hierarchy. Example: h/i/../j/../k/l/../../m is the same as h/m provided that h/i, h/j and h/k/l are all directories and not any other kind of file. POSIX has set an exception that the "parent directory" of the root directory "may refer to the root directory itself" (reference), however for this challenge you must report such path as an attack attempt (see below).

Output

A boolean value (for each input, if your program or function allows multiple path inputs):

True (or "attack") if the path would traverse to outside the working directory when resolved, or, in the case of an absolute path, would traverse to outside the root directory if there were no guard about the "parent directory of root directory" above. (In other words, assume the "chroot jail" is weak that it can break.)

False (or "no-attack") if the path would remain inside (i.e. never traverse out) the working directory for relative path, or inside the (jailed) root directory for an absolute path when resolved.

  • You may output in other data types as appropriate for your language that can represent boolean states (e.g. zero and nonzero).
  • You may swap the True and False meanings in the output of your submission. Please mention if you do swap them in your submission. (And perhaps give us a reason, such as whether it can save more bytes with the True and False meanings swapped.)

(Note: Just because the result is "false" or "no-attack" doesn't mean the path is always "safe" against other problems (e.g. spoofing).)

(There is no need to print out the resolved file path. That is another challenge.)

Additional requirements

  • Except for standard input and output streams, there should be no access to the file system during program execution. (If you use mkdir(2) or chdir(2) then your submission entry would be non-competing.) This challenge is about parsing of the path strings, not about actual traversal.
  • Assume all filenames specified in the input paths exist and are directories. (Note: In a stricter filtering scenario, a path like a/../b/c/../.. would be disallowed as that checks the presence of a, b and b/c as directories and would look like an attack attempt.)
  • Support of non-ASCII character sets (such as UTF-8) is optional.
  • Using built-in functions to resolve paths is allowed, but you must indicate so in your submission. I'm not sure if any built-in from a language would trivialize the problem, but I don't want to ban such submissions if they can provide informative examples for real-world uses.
  • . Aim for code size as small as possible.

Test data

.             -> False
...           -> False (footnote 2)
..a           -> False
./            -> False
.//           -> False
a             -> False
a/b           -> False
a//b          -> False
a/./b         -> False
a/..          -> False
a/...         -> False (footnote 2)
a/.../..      -> False (footnote 2)
a/..b         -> False
a/../b/..     -> False
a/b/../..     -> False
..            -> True
../a/         -> True
./..          -> True
a/../..       -> True
a/../../b     -> True
a/../../b/c   -> True
a/../b/../..  -> True
a/./../..     -> True
a/b/../../..  -> True

/             -> False
//            -> False (footnote 1)
/a/..         -> False
/..           -> True
/a/../..      -> True

C:/           -> False (footnote 3)
C:/..         -> False (footnote 3)
https://      -> False (footnote 3)
https://..    -> False (footnote 3)
C:/../..      -> True (footnote 3)
https://../.. -> True (footnote 3)

(Updated to add more test cases suggested by @doubleunary and @Themoonisacheese, thank you all.)

Footnotes

  1. POSIX reserves the paths with two leading slashes to be special and "may be interpreted in an implementation-defined manner" (reference). See also this question in Unix/Linux Stack Exchange. For this challenge we ignore such special interpretation.
  2. Unlike in DOS, a directory name of three dots (...) is not special in Unix systems.
  3. Unix filesystems allow colons in filenames, so these examples refer to ordinary directories, not DOS paths or URLs.
\$\endgroup\$
7
  • \$\begingroup\$ Suggested test cases: a/../../b/c and ./.. \$\endgroup\$ Commented Jan 26 at 15:15
  • \$\begingroup\$ Suggested test case: a/b/../.. -> false and a/b/../../.. ->true \$\endgroup\$ Commented Jan 26 at 15:35
  • 1
    \$\begingroup\$ I don't think it's actually possible to do this with regex (or at least, i'm not good enough to do it), but here's my attempt (notably fails on the test cases I suggested) regex101.com/r/S4yiTE/1 \$\endgroup\$ Commented Jan 26 at 16:06
  • 2
    \$\begingroup\$ Note that the text of the challenge is rather ambiguous: "False if the path would resolve to inside the working directory" suggests that a path that initially moves up & out of the working directory, but that finally descends back inside it would be Ok, but it's clear from the test cases that it is an attack. I was quite uncertain until I went through all the test cases. Suggest changing 'resolve to' into 'aways remain' or similar. \$\endgroup\$ Commented Jan 27 at 8:53
  • 1
    \$\begingroup\$ @DominicvanEssen Without knowing the name of the working directory you cannot technically traverse back after you went outside the working directory. But it's good to clarify it anyway. Thanks for suggestion. \$\endgroup\$ Commented Jan 27 at 9:48

11 Answers 11

5
\$\begingroup\$

Uiua, 27 characters

˜∊¯1\+˜--⟜¬⊜(∩⌟≍⊸˙⊂".")⊸≠@/

Explanation:

  • ⊜(∩⌟≍⊸˙⊂".")⊸≠@/ Split on slashes, ignoring empty segments, and check each segment for equality with ".." and "."
  • ˜--⟜¬ Convert ".." to negative one, "." to zero, and anything else to positive one
  • \+ Cumulative sum
  • ˜∊¯1 Is negative one part of the result?
\$\endgroup\$
5
\$\begingroup\$

R, 76 75 bytes

or 68 bytes in R≥4.1 using \ instead of function

function(x)all(cumsum((el(strsplit(paste0("/",x),"/(\\./|/)*"))=="..")-.5))

Try it online!

Outputs FALSE for attack, TRUE for non-attack.
Prepends / and splits on path elements, counting the first (empty) element as a directory, then calculates cumulative sum of up (+.5) and down moves (-.5). If this is ever zero, it's an attack.


R, 130 bytes

function(x)grepl("^((?:^|/)*([^/.][^/]*|\\.(?:\\.[^/.]|[^/.])[^/]*)(?1)*((?:^|/)*\\.\\.(?=$|/)))*((/|\\./)*\\.\\.(?:$|/))",x,pe=T)

Try it online!

Regexp-only solution.
Uses the recursive regexp ^((?:^|/)*([^/.][^/]*|\\.(?:\\.[^/.]|[^/.])[^/]*)(?1)*((?:^|/)*\\.\\.(?=$|/)))* to match balanced sets of down + up moves, and then checks using ((/|\\./)*\\.\\.(?:$|/)) whether the next move is up.

\$\endgroup\$
3
  • \$\begingroup\$ I tried writing a regex for it (see comments on main post) but i don't think this is actually possible if you allow infinite a/b/[n folders deep]/[n times ..]/../../.. \$\endgroup\$ Commented Jan 26 at 16:11
  • 2
    \$\begingroup\$ i stand corrected. +1 for regex solution \$\endgroup\$ Commented Jan 27 at 9:44
  • \$\begingroup\$ @Themoonisacheese regular regex can't balance, but recursive and balancing regex can. (In this case balancing is shorter of course, but there are challenges where recursion is shorter.) \$\endgroup\$ Commented Jan 28 at 10:57
4
\$\begingroup\$

Python using os.path.normpath, 58 bytes

-6 thanks to @Toby Speight

lambda p:"/../"in"/%s/"%os.path.normpath("./"+p)
import os

Attempt This Online!

Former Python using os.path.normpath, 67 bytes

lambda p:".."in os.path.normpath(p.strip("/")).split("/")
import os

Attempt This Online!

\$\endgroup\$
0
3
\$\begingroup\$

Retina 0.8.2, 32 bytes

S`/
A`^\.?$
%M`^\.\.$
+`0¶1¶

^1

Try it online! Link includes test cases. Explanation:

S`/

Split the path into components.

A`^\.?$

Delete empty and . components.

%M`^\.\.$

Map .. components to 1 and other components to 0.

+`0¶1¶

Remove non-.. components with their following .. component.

^1

See whether the now leading component was originally ...

I was able to do this in a single 58-byte .NET regex:

^/*((?(?!\.(/|$))(?(?=\.\.(/|$))(?<-2>)|()))[^/]+(/+|$))*$

I would like to shorten it but I can't think of a better way of handling . and ...

\$\endgroup\$
3
\$\begingroup\$

Google Sheets, 60 bytes

=min(scan(,split(A1,"/"),lambda(a,c,a+(c<>".")-2*(c=".."))))

Splits the path into components and uses Boolean arithmetic to calculate a cumulative running count of directory names. The count stays the same (±0) when a name equals ., increases (+1) when it differs from ., and decreases (-1) when it equals ... The array of cumulative running counts is then aggregated to its minimum value.

Gets the max negative depth for true and an integer ≥ 0 for false. To get a clean Boolean result, append <0.

screenshot

Prettified:

=min(
  scan(, split(A1, "/"), lambda(a, c,
    a + (c <> ".") - 2 * (c = "..")
  ))
)
\$\endgroup\$
2
  • \$\begingroup\$ This made me laugh. I don't think I've ever seen a Google Sheets answer before. Good job 👍 \$\endgroup\$ Commented Jan 28 at 22:29
  • 1
    \$\begingroup\$ The Google Sheets formula language is nowadays Turing complete thanks to the let() and lambda() functions. See How to implement recursion in a Google Sheets formula for some examples. \$\endgroup\$ Commented Jan 29 at 7:46
2
\$\begingroup\$

Jelly, 17 bytes

ṣ”/¹Ƈ“.“..”iⱮ’ÄRƇ

A monadic Link that accepts a list of characters and yields an empty list (falsey) if we remain inside, or a non-empty list (truthy) if we don't.

Try it online!

How?

ṣ”/¹Ƈ“.“..”iⱮ’ÄRƇ - Link: list of characters, S
ṣ”/               - split at '/' characters
   ¹Ƈ             - keep those which are non-empty
            Ɱ     - map across the remaining parts with:
     “.“..”i      -   1-index of part in [".", ".."], or 0 if not found
             ’    - decrement these values -> -1 (down), 0, or 1 (up)
              Ä   - cumulative sums
               RƇ - keep those which are positive
\$\endgroup\$
2
\$\begingroup\$

Haskell, 73 bytes

elem 1.scanl1(+).map h.words.map f;h".."=1;h"."=0;h _= -1;f '/'=' ';f a=a

Try it online!

Going to a parent increases the accumulator by 1 and going to a child by -1, if we see anything greater than 0 we have gone above the root and return True.

\$\endgroup\$
1
\$\begingroup\$

Dyalog APL, 41 bytes

{¯1∊+\¯1 0 1[('..')(,'.')⍳'/'(≠⊆⊢)'/',⍵]}

Try it online!

Explanation:

{¯1∊+\¯1 0 1[('..')(,'.')⍳'/'(≠⊆⊢)'/',⍵]}­⁡​‎‎⁡⁠⁡‏‏​⁡⁠⁡‌⁢​‎‎⁡⁠⁣⁢⁣‏‏​⁡⁠⁡‌⁣​‎‎⁡⁠⁣⁡⁣‏⁠‎⁡⁠⁣⁡⁤‏⁠‎⁡⁠⁣⁢⁡‏⁠‎⁡⁠⁣⁢⁢‏‏​⁡⁠⁡‌⁤​‎‎⁡⁠⁢⁤⁢‏⁠‎⁡⁠⁢⁤⁣‏⁠‎⁡⁠⁢⁤⁤‏⁠‎⁡⁠⁣⁡⁡‏⁠‎⁡⁠⁣⁡⁢‏‏​⁡⁠⁡‌⁢⁡​‎‎⁡⁠⁢⁣⁣‏⁠‎⁡⁠⁢⁣⁤‏⁠‎⁡⁠⁢⁤⁡‏‏​⁡⁠⁡‌⁢⁢​‎⁠‎⁡⁠⁢⁣‏⁠‎⁡⁠⁢⁤‏⁠‎⁡⁠⁣⁡‏⁠‎⁡⁠⁣⁢‏⁠‎⁡⁠⁣⁣‏⁠‎⁡⁠⁣⁤‏⁠‎⁡⁠⁤⁡‏⁠‎⁡⁠⁤⁢‏⁠‎⁡⁠⁤⁣‏⁠‎⁡⁠⁤⁤‏⁠‎⁡⁠⁢⁡⁡‏⁠‎⁡⁠⁢⁡⁢‏⁠‎⁡⁠⁢⁡⁣‏⁠‎⁡⁠⁢⁡⁤‏⁠‎⁡⁠⁢⁢⁡‏⁠‎⁡⁠⁢⁢⁢‏⁠‎⁡⁠⁢⁢⁣‏⁠‎⁡⁠⁢⁢⁤‏⁠‎⁡⁠⁢⁣⁡‏⁠‎⁡⁠⁢⁣⁢‏⁠⁠‎⁡⁠⁣⁢⁤‏‏​⁡⁠⁡‌⁢⁣​‎‎⁡⁠⁢⁡‏⁠‎⁡⁠⁢⁢‏‏​⁡⁠⁡‌⁢⁤​‎‎⁡⁠⁢‏⁠‎⁡⁠⁣‏⁠‎⁡⁠⁤‏‏​⁡⁠⁡‌⁣⁡​‎‎⁡⁠⁣⁣⁡‏‏​⁡⁠⁡‌­
{                                          ⍝ ‎⁡Begin function.
                                      ⍵    ⍝ ‎⁢The original input...
                                  '/',     ⍝ ‎⁣with a slash prepended.
                                           ⍝ ‎⁣This is to avoid RANK ERROR with the input '/'.
                             (≠⊆⊢)         ⍝ ‎⁤Split the string...
                          '/'              ⍝ ‎⁢⁡by slash characters.
      ¯1 0 1[('..')(,'.')⍳             ]   ⍝ ‎⁢⁢Replace '..' with -1, '.' with 0, anything else with 1.
    +\                                     ⍝ ‎⁢⁣Cumulative sum.
 ¯1∊                                       ⍝ ‎⁢⁤Does it contain -1?
                                        }  ⍝ ‎⁣⁡End of function.
💎

Created with the help of Luminespire.

\$\endgroup\$
1
\$\begingroup\$

R, 61 bytes

Using regex as suggested by Dominic.

\(x)grepl("^(\\.\\.|(/?|.*:)\\.\\.)(/|$)|.*/\\.\\./\\.\\.",x)

Attempt This Online!

\$\endgroup\$
4
  • \$\begingroup\$ Great. Would you mind posting an explanation? I'm terrible at regexes, but I'd like to become better (or at least be able to understand them better...). \$\endgroup\$ Commented Jan 26 at 18:10
  • 1
    \$\begingroup\$ It also isn't clear to me whether this works for 'deep' paths: try it... \$\endgroup\$ Commented Jan 26 at 18:16
  • \$\begingroup\$ The language is not regular, so I don't think the simple regex can work. But if there is an extended language that is capable of checking the matching brackets or parentheses it might be capable of doing this challenge. \$\endgroup\$ Commented Jan 27 at 12:12
  • \$\begingroup\$ Fail: f('a/b/../..') -> TRUE (expected FALSE) \$\endgroup\$ Commented Jan 27 at 15:53
1
\$\begingroup\$

Charcoal, 17 bytes

F⪪S/≡№..ι⁰M→¹«¬ⅈ←

Try it online! Link is to verbose version of code. Outputs a Charcoal boolean, i.e. - if an attack was detected, nothing if not. Explanation:

F⪪S/

Split the input on /s and loop over each part.

≡№..ι

Count the number of times the part appears as a substring of ...

⁰M→

If it does not appear then increment the directory depth.

¹«

If it appears once then this is a .. part, so...

¬ⅈ←

... if the directory depth is zero then set the output to true, otherwise decrement the directory depth.

\$\endgroup\$
0
\$\begingroup\$

JavaScript (ES6), 52 bytes

Returns true for safe or false for unsafe.

s=>s.split`/`.every(s=>n-=s=='..'||s!='.'&&-!!s,n=1)

Try it online!

Method

We traverse each part of the path, starting with \$n=1\$. We decrement \$n\$ on each .. and increment \$n\$ on each part that is neither . nor the empty string. If we have \$n\le0\$, we are outside the working directory. In practice, we stop as soon as \$n=0\$.

Commented

s =>          // s = input string
s.split`/`    // split the input string on slashes
.every(s =>   // for each part s:
  n -=        //   subtract from n:
    s == '..' //     1 if s is '..'
    ||        //     or
    s != '.'  //     -1 if s is neither '.'
    && -!!s,  //     nor an empty string
  n = 1       //   start with n = 1
)             // end of every()
\$\endgroup\$

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.