3
\$\begingroup\$

This code below aim to listing all folders in one directory and get size of each in byte. It uses only pathlib module for that.

#encoding:utf-8
from pathlib import Path


# # # # # Functions # # # # #
 
def run() :
    print("Hello world")
    

def write_into_file(folder, file_name:str="stats_dossiers.txt") -> None :
    """
    __Summary__ : this function write into file_name the name of each folder in directory targeted and weight associated in byte.
    folder      : directory targeted for analysis.
    file_name   : name of file in .txt format, which will contain informations.
    """
    # Check if file_name is .txt format :
    if file_name.endswith(".txt") :

        # Write into a file name of folder and its weight.
        new_file = Path(folder, file_name)
        # Check if file already exists.
        if new_file in folder.iterdir() :
            print(f"The file {new_file} already exists. Choose another filename or delete existing file.")

            print("End of script.")
        else :
            stats_folder = open(new_file, "w", encoding="utf-8")
            print(f"The file {new_file} was created with success")
            stats_folder.write("Name of folder ; Weight (Byte) \n")
            for file in folder.iterdir() :
                name = str(file)
                size = file.stat().st_size
                stats_folder.write(f"{name} ; {size} \n")
            stats_folder.close()
    else :
        print(f"{file_name} is not a .txt format. Choose another file in .txt format.")
    

    

# # # # # GLOBAL VARIABLES # # # # #
# Define directory to analyze.
FOLDER = Path.cwd()

# Listing all folders in current directory.

    
    
if __name__ == "__main__" :
    run()
    write_into_file(folder=FOLDER)

Could you help me in order to identify bad practices in this code? For example: bad overlaps of if/else, break function into many functions instead of using many if/else statement ...

\$\endgroup\$
4
  • \$\begingroup\$ I don't think the code does what it puports to do because the comments and docstrings say that it outputs information about the subdirectories of the target directory but the code prints info about all the files and directories contained in the target dir (iterdir iterates over the whole contents of the directory). \$\endgroup\$ Commented Nov 16 at 9:36
  • \$\begingroup\$ I changed the title so that it describes what the code does per site goals: "State what your code does in your title, not your main concerns about it.". Feel free to edit and give it a different title if there is something more appropriate. \$\endgroup\$ Commented Nov 17 at 17:15
  • \$\begingroup\$ Nit: .txt is not a format, it's just a file extension sometimes used to denote "plain text" file content. You may enforce use of that extension, but it's not always desirable: for example, I often use extension-less files, especially when they are short-lived like cd "$(mktemp -d)"; do_smth >./out. It's also sometimes convenient to use other extensions (.out, .log, .temp, .tmp are the few that come to my mind) - why do you need an explicit check to reject those? \$\endgroup\$ Commented Nov 17 at 23:48
  • \$\begingroup\$ Also #encoding:utf-8 is weird. If you think of coding magic pragma, it is typically spelled as # -*- coding: utf-8 -*-, and is not needed for many years - utf8 is the default source encoding since 3.0 (circa 2007). \$\endgroup\$ Commented Nov 17 at 23:54

5 Answers 5

6
\$\begingroup\$

Notes:

  • The run function is entirely superfluous. Get rid of it.
  • Comments like # # # # # Functions # # # # # are also extraneous.
  • Use context managers when opening files.
  • Print error messages to sys.stderr. This allows the user to redirect that output if they so desire.
  • Messages like "End of script." are almost certainly extraneous, unless specifically stated as requirements.
  • You can write your checks of input as guards and return early from the function, reducing conditional nesting. This is a matter of taste.
  • Specifying the folder=FOLDER given that there's one argument, and the keyword and the argument have the same name seems superfluous.
  • The wording of your second error message, "Choose another filename or delete existing file." suggests further user interaction, but instead the program just ends. You may wish to reword this.

If we take some of these things into consideration and remove comments for brevity...

from pathlib import Path
from sys import stderr

def write_into_file(folder, file_name:str="stats_dossiers.txt") -> None:
    if not file_name.endswith(".txt"):
        print(f"{file_name} is not a .txt format. Choose another file in .txt format.", file=stderr)
        return

    new_file = Path(folder, file_name)

    if new_file in folder.iterdir():
        print(f"The file {new_file} already exists. Choose another filename or delete existing file.", file=stderr)
        return
    
    with open(new_file, "w", encoding="utf-8") as stats_folder:
        print(f"The file {new_file} was created with success")
        stats_folder.write("Name of folder ; Weight (Byte) \n")
        for file in folder.iterdir() :
            name = str(file)
            size = file.stat().st_size
            stats_folder.write(f"{name} ; {size} \n")
        
FOLDER = Path.cwd()
    
if __name__ == "__main__":
    write_into_file(FOLDER)

Further, having write_into_file directly handle printing error messages may be giving it too much to do. Rather we can have it return custom exceptions (if the standard library does not provide a suitable exception) and let the caller determine how to deal with these situations.

E.g.

from pathlib import Path
from sys import stderr

class IncorrectExtensionException(Exception):
    def __init__(self, expected_ext, filename):
        super().__init__()
        self.expected_ext = expected_ext
        self.filename = filename

class FileAlreadyExistsException(Exception):
    def __init__(self, filename):
        super().__init__()
        self.filename = filename

def write_into_file(folder, file_name:str="stats_dossiers.txt") -> None:
    if not file_name.endswith(".txt"):
        raise IncorrectExtensionException(".txt", file_name)

    new_file = Path(folder, file_name)

    if new_file in folder.iterdir():
        raise FileAlreadyExistsException(new_file)
    
    with open(new_file, "w", encoding="utf-8") as stats_folder:
        print(f"The file {new_file} was created with success.", file=stderr)
        stats_folder.write("Name of folder ; Weight (Byte) \n")
        for file in folder.iterdir() :
            name = str(file)
            size = file.stat().st_size
            stats_folder.write(f"{name} ; {size} \n")

FOLDER = Path.cwd()

if __name__ == "__main__":
    try:
        write_into_file(FOLDER)
    except IncorrectExtensionException as e:
        print(f"{e.filename} is not a {e.expected_ext} format. Choose another file in {e.expected_ext} format.", file=stderr)
    except FileAlreadyExistsException as e:
        print(f"The file {e.filename} already exists. Choose another filename or delete existing file.", file=stderr)
\$\endgroup\$
5
\$\begingroup\$

PEP8 and coding style

Since you are asking about bad practices, you can make code more PEP8-compliant, for example:

if file_name.endswith(".txt") :

should be:

if file_name.endswith(".txt"):

And:

def write_into_file(folder, file_name:str="stats_dossiers.txt") -> None :

would become:

def write_into_file(folder, file_name: str = "stats_dossiers.txt") -> None:

These are minor points, you are mostly doing file. Spacing is consistent

Logging

Rather than using print across the code, I tend to favor the logging module. It is very convenient to output results to console but also to write to a log file at the same time, to keep history. Also, you can silence messages selectively based on severity (level), which is useful for testing. Not to mention filters.

Misc

  • It's good that you are using the __name__ guard. That means your code can be imported without side effects if you want to reuse your functions elsewhere. You just need to get rid of the hello world which does not belong. Generally speaking, it is a good idea to try to import your modules as stand-alone, to verify that they do not crash (For example, relying on other parts of the app for proper functioning).
  • However, there is a hardcoded extension check in your code, which discourages reuse. If I want to process any file type, or other extensions, then I must either adapt your code, or build my own function because of that artificial limitation. To fix this, the extension(s) to parse should be an optional function argument. Make it a list. Then your code becomes more versatile and reusable at little cost.
  • Maybe you should make the extension check case-insensitive. Your function could miss files in uppercase eg TEST.TXT. This might be more common on Windows perhaps?

Suggestions

  • use Git for versioning
  • then use pre-commit hooks to perform automated checks on your code, before you commit. For example, my pre-commit recipe includes a linter to reformat code, gitleaks to detect inadvertent committing of secrets, plus a security scanner and some other custom scripts.
  • also, use a good editor, it should automatically reformat code or suggest improvements
\$\endgroup\$
4
\$\begingroup\$

Let me add some additional suggestions:

Allow Arguments to write_into_file Be Path or str

  1. For greater flexibility you should allow the arguments to write_into_file be a string or Path instance.
  2. It is also possible that the passed file_name argument refers to an exiting directory rather than a file. In this case a distinct exception should be raised.
  3. Since the second argument to write_into_file can be a Path instance or a string, a better name for it is file.
  4. The current code will adds statistics for the file being created (i.e. itself) showing a 0 byte length since it has not yet been closed. We should skip reporting for this file.

Putting it all together (based on the post by Chris):

#encoding:utf-8
from pathlib import Path
from sys import stderr

class IncorrectExtensionException(Exception):
    def __init__(self, expected_ext, filename):
        super().__init__()
        self.expected_ext = expected_ext
        self.filename = filename

class FileAlreadyExistsException(Exception):
    def __init__(self, filename):
        super().__init__()
        self.filename = filename

class FileIsADirectoryException(Exception):
    def __init__(self, filename):
        super().__init__()
        self.filename = filename

def write_into_file(folder:str|Path=".", file:str|Path="stats_dossiers.txt") -> None:
    folder = Path(folder)  # ensure it is a path
    file = Path(file)  # ensure it is a Path
    # Check if file is .txt format:
    if file.suffix != ".txt":
        raise IncorrectExtensionException(".txt", file)

    new_file = Path(folder, file)
    if new_file.is_file():
        raise FileAlreadyExistsException(new_file)
    if new_file.is_dir():
        raise FileIsADirectoryException(new_file)

    with new_file.open("w", encoding="utf-8") as stats_folder:
        print(f"The file {new_file} was created with success.", file=stderr)
        stats_folder.write("Name of folder ; Weight (Byte)\n")
        for file in folder.iterdir():
            if file == new_file:
                continue  # No point in reporting on this file
            size = file.stat().st_size
            # Write into a file name of folder and its weight:
            stats_folder.write(f"{file} ; {size}\n")

if __name__ == "__main__":
    try:
        write_into_file()
    except IncorrectExtensionException as e:
        print(f"{e.filename} is not a {e.expected_ext} format. Choose another file in {e.expected_ext} format.", file=stderr)
    except FileAlreadyExistsException as e:
        print(f"The file {e.filename} already exists. Choose another filename or delete existing file.", file=stderr)
    except FileIsADirectoryException as e:
        print(f"The file {e.filename} specifies an existing folder. Choose another filename.", file=stderr)
\$\endgroup\$
0
4
\$\begingroup\$

write_into_file doesn't tell me a whole lot about what the function does. I have to read about two thirds into the method to figure out that it compiles some kind of folder statistics and writes them into a file. If the name instead was write_folder_stats, I get the general idea behind it already by reading the method call in your main guard. Splitting a program into distinct functions is good, giving these functions descriptive names is better.

\$\endgroup\$
3
\$\begingroup\$

Here are my tips

  1. The encoding line is unnecessary as default is utf8
  2. Similarly when opening a file, utf8 is assumed by default.
  3. Instead of printing error messages, raise exceptions. Leave it to the code calling the function to decide what it does when an exception occurs.
  4. Instead of code that writes lines to a file, write code generating lines. This separates the concerns. This code does not need to handle the IO, and the output to the file can easily be done with a single call to writelines()
  5. As this is a script, add a section using the argparse module to handle command line arguments such as the target directory, and also to provide a help string for the user
  6. Use Black to reformat your code!

from pathlib import Path


def write_into_file(
    folder, file_name: str = "stats_dossiers.txt", exist_ok: bool = False
) -> None:
    """
    __Summary__ : this function write into file_name the name of each file in directory targeted and weight associated in bytes.
    folder      : directory targeted for analysis.
    file_name   : name of file in .txt format, which will contain informations.
    """

    if not file_name.endswith(".txt"):
        raise ValueError(f"Expected file name with '.txt' extension, got: {file_name}")

    new_file = Path(folder, file_name)
    # Check if file already exists.
    if (not exist_ok) and new_file.exists():
        raise FileExistsError(new_File)

    def generate_stats(files, header="Name of file ; Size (Byte)\n"):
        "generates a sequence of lines each containing information about a file"
        if header:
            yield header
        for f in files:
            yield f"{str(f)} ; {f.stat().st_size}\n"

    with new_file.open("w") as stats_file:
        stats_file.writelines(generate_stats(Path(folder).iterdir()))


# # # # # GLOBAL VARIABLES # # # # #
# Define directory to analyze.
FOLDER = Path.cwd()

# Listing all folders in current directory.


if __name__ == "__main__":
    write_into_file(folder=FOLDER)
\$\endgroup\$

You must log in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.