1

I have a program I am writing for an internal employee that takes a CSV file and searches a file server for the files listed in the CSV then copys each file to a folder in the desktop. The issue I am running into with my current code is that the CSV must hold the exact names but instead I need to regex search this and copy the files with file names like the ones in the CSV.

file name in excel looks like: D6957-QR-1452

file name on server looks like: WM_QRLabels_D6957-QR-1452_11.5x11.5_M.pdf

from tkinter import filedialog, messagebox
import openpyxl
import tkinter as tk
from pathlib import Path
import shutil
import os


desktop = Path.home() / "Desktop/Comps"

tk.messagebox.showinfo("Select a directory","Select a directory" )
folder = filedialog.askdirectory()
root = tk.Tk()


root.title("Title")


lbl = tk.Label(
    root, text="Open the excel file that includes files to search for")
lbl.pack()

frame = tk.Frame(root)
frame.pack()
scrollbar = tk.Scrollbar(frame)
scrollbar.pack(side=tk.RIGHT, fill=tk.Y)
listbox = tk.Listbox(frame, yscrollcommand=scrollbar.set)


def load_file():
    wb_path = filedialog.askopenfilename(filetypes=[('Excel files', '.xlsx')])
    wb = openpyxl.load_workbook(wb_path)
    global sheet
    sheet = wb.active

    listbox.pack()
    file_names = [cell.value for row in sheet.rows for cell in row]
    for file_name in file_names:
        listbox.insert('end', file_name)
    return file_names  # <--- return your list 
    
            



def search_folder(folder, file_name):
    # Create an empty list to store the found file paths
    found_files = []
    for root, dirs, files in os.walk(folder):
        for file in files:
            if file in file_name:
                found_files.append(os.path.join(root, file))
                shutil.copy2(file, desktop)

    return found_files


excelBtn = tk.Button(root, text="Open Excel File",
                     command=None)
excelBtn.pack()


zipBtn = tk.Button(root, text="Copy to Desktop",
                   command=search_folder(folder, load_file()))
zipBtn.pack()


root.mainloop()


Program is able to find and copy exact file names but unable to file *like* files. 
2
  • So if I got it right... your excel can have entries with wildcards like 'foo*', 'foo.*', or 'foo%', and you should be able to locate 'foobar' and 'foobaz' on the server. Is that correct?
    – Julio
    Commented Feb 1, 2023 at 15:23
  • Correct. Basically just need to strip the additional "WM_QRLabels_" and "_11.5x11.5_M.pdf" from the file name that lives on the server to get the right paths. The excel file it searches only has the D6957-QR-1452 part. Commented Feb 1, 2023 at 16:07

1 Answer 1

0

Change your search_folder method to something like this:

def search_folder(folder, file_name):
    # Create an empty list to store the found file paths
    found_files = []
    for root, dirs, files in os.walk(folder):
        for file in files:
            for file_pattern in file_name:
                if file.find(file_pattern) > -1:
                    replaced_backslashes = re.sub(r"\\+", "/", os.path.join(root, file), 0, re.MULTILINE)
                    found_files.append(replaced_backslashes)
                    shutil.copy2(file, desktop)

    return found_files

Basically, for every file we iterate over the file_name array to test all the patterns.

For the test we use find method, that in case it matches, returns the position of the found substring.

That is, if searching for 'foo', all these files would be returned:

zazfoo
foobar
zazfoobar
foo
6
  • That makes sense, thanks! However when I make the changes to the code I get the following error: Traceback (most recent call last): File "c:\Users\fretzm\Desktop\CodeProjects\ForRick\ProdSearch01.py", line 55, in search_folder pattern = re.compile(re.escape(file_name)) File "C:\Users\fretzm\AppData\Local\Programs\Python\Python310\lib\re.py", line 276, in escape pattern = str(pattern, 'latin1') TypeError: decoding to str: need a bytes-like object, list found Commented Feb 1, 2023 at 16:04
  • @SynfulAcktor I thougt file_name was a scalar :). I edited my answer to fix it. Now I basically iterate over the file_name array and test every occurrence with find
    – Julio
    Commented Feb 2, 2023 at 8:00
  • @Juilo After doing some playing with the code I notice that the path of the files trying to be copied looks like ['Y:/Walmart_WAL/WM_SwipeUp_ExtBanner_QRLabels/Comps\\WM_QRLabels_D6957-QR-0283_11.5x11.5_M.pdf'] which is causing it to fail in the shutil.copy. I know I need to grep and replace the \\ with / but also not sure if it will still error due to it being in a list and not str. Any added help would be much appreciated! Commented Feb 15, 2023 at 14:34
  • @SynfulAcktor If your excel file with patterns has regular slashes, then yes, you probably should replace all backslashes with a single slash. Something like this should work: replaced_backslashes = re.sub(r"\\+", "/", whatever, 0, re.MULTILINE)
    – Julio
    Commented Feb 16, 2023 at 7:36
  • I am having issues understanding how to work with the list returned in found_files. replaced_backslashes = re.sub(r"\\+", "/", found_files[0], 0, re.MULTILINE) This will do exactly what I need it to do on the first file but does only the first file. Do I need a for loop to pick out each file in found_files and perform the re.sub for each in the list? If so what would that look like? Commented Feb 16, 2023 at 14:25

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.