
Good day.

I wrote a script that imports two Excel files and then compares their rows. Each file contains about 13K rows, and processing takes about 3 hours, which seems too long. The cause is that for each row in filea I loop through all 13K rows of fileb.

Is there a more efficient way to do this?

Here is sample code:

#Import rows as customObject
$rowsa = Import-Excel $filea
$rowsb = Import-Excel $fileb

#Loop through each filea row
foreach ($rowa in $rowsa)
{
    #Loop through each fileb row. If the UPC code matches rowa, check if other fields match
    foreach ($rowb in $rowsb)
    {
        $rowb | Where-Object -Property "UPC Code" -Like $rowa.upc |
            ForEach-Object {
                if (($rowa.uom2 -eq 'INP') -and ($rowb.'Split Quantity' -ne $rowa.qty1in2))
                {
                    #Do Something
                }
            }
    }
}

2 Answers


It seems like you can leverage Group-Object -AsHashTable for this. See about_Hash_Tables for more information on why hashtable lookups should be faster.

$mapB = Import-Excel $fileb | Group-Object 'UPC Code' -AsHashTable -AsString
foreach($row in Import-Excel $filea) {
    if($mapB.ContainsKey($row.upc)) {
        $value = $mapB[$row.upc]
        if($row.uom2 -eq 'INP' -and $row.qty1in2 -ne $value.'Split Quantity') {
            $value # => has the row matching on UPC (FileA) / UPC Code (FileB)
            $row   # => current row in FileA
        }
    }
}
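
One caveat worth noting: with Group-Object -AsHashTable, each hashtable value is a collection of rows, so if the same UPC can appear more than once in FileB, indexing into the value directly may not do what you expect. A small sketch of how you might enumerate the matches instead (column names assumed from the question):

$mapB = Import-Excel $fileb | Group-Object 'UPC Code' -AsHashTable -AsString
foreach ($row in Import-Excel $filea) {
    if ($mapB.ContainsKey($row.upc)) {
        # Wrap in @() so single matches and multiple matches enumerate the same way
        foreach ($match in @($mapB[$row.upc])) {
            if ($row.uom2 -eq 'INP' -and $row.qty1in2 -ne $match.'Split Quantity') {
                $match # => a row from FileB matching on UPC Code
                $row   # => current row in FileA
            }
        }
    }
}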


A few tricks:

  1. The object pipeline may be easy, but it's not as fast as a statement

Try changing your code to use foreach statements rather than Where-Object and ForEach-Object. (A rough timing sketch follows this list.)

  2. Use hashtables to group

While you can use Group-Object to do this, Group-Object suffers from the same performance problems as anything else in the pipeline.

  3. Try to limit looping within looping

As a general rule, a loop within a loop is O(n^2). If you can avoid nested loops, do so: restructuring the code to loop through A once and then loop through B once is more efficient, as is exiting your loops as early as possible.

  4. Consider using a benchmarking tool

There's a little module I maintain called Benchpress that can help you test multiple approaches to see which is faster. The Benchpress docs include a number of general PowerShell performance benchmarks to help you determine the fastest way to script a given thing.
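
For a quick feel of the pipeline-versus-statement difference without installing anything, here's a rough sketch using the built-in Measure-Command cmdlet (not Benchpress itself; the 13K-row data set below is made up for illustration):

# Build a throwaway data set of roughly the question's size
$rows = 1..13000 | ForEach-Object { [pscustomobject]@{ upc = $_; qty = $_ % 7 } }

# Pipeline version: Where-Object / ForEach-Object
$pipeline = Measure-Command {
    $rows | Where-Object qty -EQ 3 | ForEach-Object { $_.upc }
}

# Statement version: a plain foreach with an if
$statement = Measure-Command {
    foreach ($row in $rows) {
        if ($row.qty -eq 3) { $row.upc }
    }
}

"Pipeline:  {0:N0} ms" -f $pipeline.TotalMilliseconds
"Statement: {0:N0} ms" -f $statement.TotalMilliseconds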

Updated Script Below:

#Import rows as customObject
$rowsa = Import-Excel $filea
$rowsb = Import-Excel $fileb
$rowAByUPC = @{}
foreach ($rowA in $rowsa) {
   # This assumes there will only be one row per UPC.
   # If there is more than one, you may need to make a list here instead
   $rowAByUPC[$rowA.UPC] = $rowA 
}

foreach ($rowB in $rowsB) {
   # Skip any rows in B that don't have a UPC code.
   $rowBUPC = $rowB."UPC Code"
   if (-not $rowBUPC) { continue }
   $RowA = $rowAByUPC[$rowBUPC]
   # It seems only rows that have 'INP' in uom2 are important
   # so continue if missing
   if ($rowA.uom2 -ne 'INP') { continue }
   if ($rowA.qty1in2 -ne $rowB.'Split Quantity') {
      # Do what you came here to do.
   }
}
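
As the comment in the first loop notes, the lookup above keeps only one row per UPC; if FileA can repeat a UPC, a variant like this (reusing $rowsa and $rowsb from above, same assumed column names) accumulates a list per key instead:

$rowAByUPC = @{}
foreach ($rowA in $rowsa) {
    # Store a list of rows per UPC rather than a single row
    if (-not $rowAByUPC.ContainsKey($rowA.UPC)) {
        $rowAByUPC[$rowA.UPC] = [System.Collections.Generic.List[object]]::new()
    }
    $rowAByUPC[$rowA.UPC].Add($rowA)
}

foreach ($rowB in $rowsb) {
    $rowBUPC = $rowB.'UPC Code'
    if (-not $rowBUPC -or -not $rowAByUPC.ContainsKey($rowBUPC)) { continue }
    foreach ($rowA in $rowAByUPC[$rowBUPC]) {
        if ($rowA.uom2 -eq 'INP' -and $rowA.qty1in2 -ne $rowB.'Split Quantity') {
            # Do what you came here to do.
        }
    }
}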

Please note that since you haven't shared the code inside the matching condition, you may need to apply the advice in this answer to that inner code as well.

1 Comment

I tried the hash table, but it does not work. Please see the screenshot. Any ideas why? Thanks.
