
zipfile: file type issues #133324

Open
@calestyo

Description

Bug report

Bug description:

Hey.

I think there are a number of issues with how zipfile handles file types (which may, depending on how the code is used, in principle even be security-relevant).

First, AFAIU, ZIP files may contain (at least) regular files, directories (as "standalone" items in the archive, like an empty directory) and symbolic links.

For example using the zip program:

$ mkdir empty
$ zip d.zip empty
  adding: empty/ (stored 0%)
$ ln -s /dev/null bar
$ zip --symlinks s.zip bar
  adding: bar (stored 0%)
$
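For anyone without the zip program at hand, roughly the same two kinds of entries can be built in memory with zipfile itself. This is only a sketch: build_sample_archive is a made-up helper name, it puts both entries into one archive instead of two, and it assumes the Unix convention that the upper 16 bits of external_attr carry the st_mode value.

```python
import io
import stat
import zipfile


def build_sample_archive() -> bytes:
    """Return the bytes of a ZIP with one directory entry and one symlink entry."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # Directory entry: by convention the member name ends with "/".
        dir_info = zipfile.ZipInfo("empty/")
        dir_info.create_system = 3  # 3 = Unix, so the upper 16 bits hold st_mode
        dir_info.external_attr = (stat.S_IFDIR | 0o755) << 16
        zf.writestr(dir_info, b"")

        # Symlink entry: the member data is the link target, and the file
        # type bits in external_attr mark the entry as a symbolic link.
        link_info = zipfile.ZipInfo("bar")
        link_info.create_system = 3
        link_info.external_attr = (stat.S_IFLNK | 0o777) << 16
        zf.writestr(link_info, b"/dev/null")
    return buf.getvalue()


with zipfile.ZipFile(io.BytesIO(build_sample_archive())) as zf:
    print(zf.namelist())
    print(zf.read("bar"))
```

Whether an archive built this way is byte-for-byte comparable to what zip produces is not checked here; it is only meant to reproduce the entry types.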

When I open the one with the symlink in Python:

>>> import zipfile

>>> z = zipfile.ZipFile("s.zip", "r")
>>> z.namelist()
['bar']

>>> f = z.open("bar", "r")
>>> f.read()
b'/dev/null'

>>> i = z.getinfo("bar")
>>> i.is_dir()
False

>>> p = zipfile.Path(z, "bar")

>>> p.filename
PosixPath('s.zip/bar')

>>> p.is_dir()
False

>>> p.is_file()
True

>>> p.is_symlink()
True

>>>
  1. It's IMO debatable whether z.open("<symlink>", "r") should succeed or not. IMO zipfile.ZipFile.open() is quite clearly ZIP's counterpart to open(), and that, AFAIK, never opens a symlink itself but only follows it.
  2. Even if that behaviour is desired (i.e. like an os.readlink() for ZIPs), it's still completely unexpected, and the ZipFile object has no is_symlink() method (only zipfile.Path has one, so one needs to create that first, which seems quite unwieldy).
  3. Speaking of which, zipfile.Path's is_dir(), is_file() and is_symlink() methods seem either buggy or semantically inconsistent and/or badly documented.
    Usually, "file" means either any type of file (directory, symlink, device, etc.) or regular files (and sometimes also symlinks if they point to regular files).
    Here, the symlink points to nothing (at least nothing in the archive), so it cannot be the latter case. Also, see below, a directory doesn't return True for is_file(), so it's not the former either.
    The docs also don't mention what "file" means.
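As a possible workaround for the missing is_symlink() on the ZipFile side, the entry type can be derived from ZipInfo directly. The sketch below is not part of zipfile: entry_kind is a hypothetical helper, and it assumes the archive was written with Unix attributes, i.e. that external_attr >> 16 is an st_mode value. Archives from other systems may leave those bits at zero, which this sketch treats as a regular file.

```python
import io
import stat
import zipfile


def entry_kind(info: zipfile.ZipInfo) -> str:
    """Classify a member as 'dir', 'symlink', 'regular' or 'other'."""
    if info.is_dir():
        return "dir"
    mode = info.external_attr >> 16  # assumed to be a Unix st_mode
    if stat.S_ISLNK(mode):
        return "symlink"
    # No file type bits recorded at all: fall back to "regular".
    if stat.S_ISREG(mode) or stat.S_IFMT(mode) == 0:
        return "regular"
    return "other"


# Small in-memory archive with one entry of each kind.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("plain.txt", b"hello")
    zf.writestr(zipfile.ZipInfo("empty/"), b"")
    link = zipfile.ZipInfo("bar")
    link.create_system = 3  # Unix
    link.external_attr = (stat.S_IFLNK | 0o777) << 16
    zf.writestr(link, b"/dev/null")

with zipfile.ZipFile(buf) as zf:
    kinds = {info.filename: entry_kind(info) for info in zf.infolist()}
print(kinds)
```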

Now the same with d.zip:

>>> import zipfile
>>> z = zipfile.ZipFile("d.zip", "r")
>>> z.namelist()
['empty/']

>>> f = z.open("empty/", "r")
>>> f.read()
b''

>>> i = z.getinfo("empty/")
>>> i.is_dir()
True

>>> p = zipfile.Path(z,"empty/")
>>> p.filename
PosixPath('d.zip/empty')

>>> p.is_dir()
True

>>> p.is_file()
False

>>> p.is_symlink()
False

>>>
  1. IMO, that zipfile.ZipFile.open() succeeds on a directory (and gives an empty bytes object) seems pretty strange at best. It does so even if the directory isn't empty but contains files.
  2. There's also the matter that directory pathnames are sometimes suffixed by / and sometimes not (namelist() gives 'empty/', while the filename of zipfile.Path gives 'd.zip/empty'). Maybe I've missed it, but that doesn't seem to be documented, though it may be crucial when e.g. matching filenames, and is IMO unexpected.
  3. As mentioned above, is_file() is False here, which would imply that the function takes a "file" to be either a regular file or a symbolic link pointing to one.
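Until the semantics are settled, callers who only want regular files could guard open() themselves. The following is a minimal sketch under the same Unix-attribute assumption as above; open_regular is a hypothetical name, not an existing or proposed zipfile API, and the choice of exceptions is arbitrary.

```python
import io
import stat
import zipfile


def open_regular(zf: zipfile.ZipFile, name: str):
    """Open a member for reading, refusing directory and symlink entries."""
    info = zf.getinfo(name)  # KeyError unless the name matches exactly
    if info.is_dir():
        raise IsADirectoryError(name)
    if stat.S_ISLNK(info.external_attr >> 16):
        raise OSError(f"{name!r} is a symbolic link entry")
    return zf.open(info, "r")


# Small in-memory archive to try the helper on.
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("plain.txt", b"hello")
    zf.writestr(zipfile.ZipInfo("empty/"), b"")
    link = zipfile.ZipInfo("bar")
    link.create_system = 3  # Unix
    link.external_attr = (stat.S_IFLNK | 0o777) << 16
    zf.writestr(link, b"/dev/null")

with zipfile.ZipFile(archive) as zf:
    with open_regular(zf, "plain.txt") as f:
        print(f.read())
    for name in ("empty/", "bar", "empty"):
        try:
            open_regular(zf, name)
        except (OSError, KeyError) as exc:
            print(name, "->", type(exc).__name__)
```

Note that "empty" without the trailing slash is not found by getinfo() at all, which illustrates why the undocumented suffix matters when matching names.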

Cheers,
Chris.

CPython versions tested on:

3.13

Operating systems tested on:

Linux


Labels: stdlib (Python modules in the Lib dir), type-bug (An unexpected behavior, bug, or error)
