3

Good evening. Stack Overflow, PowerShell, and regex novice here.

The PowerShell code below returns the correct value along with a little pre- and post-context. However, I want to enhance the program so that it also captures the ID value that appears several hundred lines before the result. The problem is that the ID is not always exactly 322 lines above; depending on the JSON file, it can be 250 lines above, or 150.

For example, it captures a variable called "RatedOI", which is found on line #2028 of the JSON file.

When I select the checkbox in the PowerShell window, the result appears immediately.

However I want to enhance this program and capture the ID string which is literally on line #1706 (322 lines above the result).

I am a novice. Notice the curly braces are yellow instead of teal. I am trying to return that ID string, which happens to be a4be181b-a747-4901-a37d-fb7537fd4c19

Here is the PowerShell/regex code. I included snippets to give you just enough information. Please let me know if you have any questions or need more code. I appreciate all of you.

set-location C:\Users\A187515\Downloads\
$file = Get-ChildItem -Path C:\Users\A187515\Downloads\ -filter *.json | Sort-Object LastAccessTime -Descending | Select-Object -First 1


[xml]$xaml = @"
<Window 
xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
x:Name="Window" Title="Swagger Variable Lookup" WindowStartupLocation = "CenterScreen"
SizeToContent = "WidthAndHeight" ShowInTaskbar = "True" Background = "lightgray"> 
<StackPanel >
    
    <CheckBox x:Name="Item23" Content = 'RatedOI 1+'/>
      
    <TextBox x:Name='Item1_txt' />      
    <TextBox x:Name='Item2_txt' />      
    <TextBox x:Name='Item3_txt' />      
    
           
</StackPanel>
</Window>
"@

$reader=(New-Object System.Xml.XmlNodeReader $xaml)

#Connect to Controls
$Window=[Windows.Markup.XamlReader]::Load( $reader )


# define 1 textbox beneath the variables only
$Item1_txt = $Window.FindName('Item1_txt')
$Item2_txt = $Window.FindName('Item2_txt')
$Item3_txt = $Window.FindName('Item3_txt')


#define each item in the UI. Connect to Control
$Item23 = $Window.FindName('Item23')

#RatedOI which contains multiple lines begins here, copy and paste this
$Item23.Add_Checked({




If ($Item23.IsChecked) { $RatedOI = '"RatedOI"'
Get-Content $file | Select-String -Pattern $RatedOI -Context 1 | ForEach-Object {
$_.LineNumber
$_.Line
$_.Pattern
$_.Context.PreContext
$_.Context.PostContext
    

    
}
#I closed the ForEach-Object Loop. 
#I output my results outside the loop using [string] $_.LineNumber + ' ' +  $_.Line + ' ' + $_.Context.PostContext
#I also use -join
$Item1_txt.Text = 
(
Select-String -LiteralPath $file -Pattern $RatedOI -Context 2 | 
ForEach-Object {
  [string] $_.LineNumber + ' ' +  $_.Line + ' ' + $_.Context.PostContext
}
) -join [Environment]::NewLine


}

else {$RatedOI ="!$!"}

})
$Item23.Add_UnChecked({
$Item1_txt.Text = ""
#$Item2_txt.Text = ""
})
#close the RatedOI code


$Window.Showdialog() | Out-Null

Let's say I have a new or existing application for college admission. Every time I make a change to that application (last name, first name, etc.), a new JSON file is created and pulled from Swagger. The file has lots of variables and lines. Inside it are the two properties of interest, ID and RatedOI. ID is a unique string of 36 characters including 4 hyphens, such as a4be181b-a747-4901-a37d-fb7537fd4c19; it can sit 200 or even 400 lines above RatedOI, depending on the change made to the college application. Once that ID string is generated, the RatedOI underneath it is updated. RatedOI is a Boolean whose value depends on what changes in the front-end college application. Most of the time it is false by default, but it can be true depending on end-user input. So there are essentially several scenarios that show the relationship between the two variables of interest:

  1. A change is made to the college application. The ID is updated to a unique 36 character string (including 4 hyphens) and the RatedOI updates from false to true
  2. A change is made to the college application. The ID is updated to a unique 36 character string (including 4 hyphens) and the RatedOI updates from false to false
  3. A change is made to the college application. The ID is updated to a unique 36 character string (including 4 hyphens) and the RatedOI updates from true to false
  4. A change is made to the college application. The ID is updated to a unique 36 character string (including 4 hyphens) and the RatedOI updates from true to true

Here is a condensed sample of the JSON. There are hundreds of lines between ID and RatedOI, and the number varies. Please note that I want to capture only the ID string found immediately above the RatedOI variable (ID appears hundreds of times throughout the JSON file).

sample JSON:

    {
"id": "e0b8a6aa-2675-4d93-86ff-6a51a0daea85",
"_etag": "\"ba00b64e-0000-0100-0000-671029b20000\"",
"LOB": "A",
"PID": "9219377869",
"PNum": "9219377869",
"SC": "TX",
"RD": {
"SM": {
  "productId": "XYZ",
  "RRID": "XYZ",
  "PF": "XYZ",
  "PV": "1",
  "RR": "202401",
  "PTCH": "0.0",
  "state": "TX",
  "LOB": "A",
   "userType": ""
   },
   "PSM": {
   "productId": "XYZ",
   "RRID": "XYZ",
   "PF": "XYZ",
   "PV": "1",
   "rateRevision": "202401",
   "PTCH": "0.0",
   "state": "TX",
   "LOB": "Auto",
   "userType": ""
   },
   "resultsRequested": false,
   "resultsReceived": false,
   "entities": [
   {
   },
   {
   },
   {
   },
   {
   },
   {
   },
   {
   },
   {
   },
   {
   },
   {
   },
   {
   },
   {
   },
   {
   },
   {
   },
   {
   }
   ],
 "qId": "638ee347-42ed-441c-a6ff-d6c99cf62032",
 "totalP": 498,
 "dateRateCalculated": "2024-10-16T21:01:30.8295181Z",
 "peakData": {
 },
 "premDiff": []
},
"productSelectionTimeStamp": "2024-10-16T20:57:23.9369158Z",
"sessionId": "a451b6b2-f7e1-448f-bc44-38c6a0cfbffb",
"RK": "999999999", 
"historyId": "",
"preApplicationVerificationData": {
},
"GReject": {
},
"NR": {
},
"appliedExternalSourcedData": [],
"TFSD": {
},
"outOfSequenceStatus": {
},
"dateStarted": "2024-10-16T20:52:13.9427222Z",
"applicationEffectiveDate": "2024-10-17T04:00:00+00:00",
"dateApplied": "2024-10-16T21:01:38.1017792Z",
"applicationStatusCode": "Committed",
"entitiesData": {
"entities": [
  {
  },
  {
  },
  {
  },
  {
  },
  {
  },
  {
  },
  {
  },
  {
  },
  {
"id": "b0000000-0000-0000-0000-00000000000b",
"entityType": "IHUR",
"ordinal": 0,
"characteristics": {
"PBILR": {
"stringValue": "XYZ"
},
"PBILRC": {
"stringValue": "XYZ"
},
"PISR": {
"stringValue": "XYZ"
}
                 }
  },
  {
    "id": "7af50dbb-7ea3-4458-adfe-a82198a8d96c",
    "entityType": "Operator",
    "ordinal": 0,
    "characteristics": {
      "FirstName": {
        "stringValue": "Joe"
      },
      "ODLNUR": {
        "stringValue": "A999999"
      },
      "RatedOI": {
        "boolValue": true
      }
    }
                }
]
}
}

I want it to pull "id": "7af50dbb-7ea3-4458-adfe-a82198a8d96c", which is the id just above the RatedOI. Instead, it seems to pull all the IDs after entities. The output for id is so long that it is truncated with an ellipsis and looks like this:

id RatedOI


{f68fd8eb-bbf1-4b81-9ada-9ca5dffae120, 00000000-0000-0000-0000-000000000000, bea9da3c-4712-4522-b29c-6a03d7553c65, bfe05560-8fde-48cb-bf19-dfd09b18eb38...} True

  • 2
    It is generally bad practice to peek and poke directly in serialized files (such as JSON files) using string commands like regular expressions. Instead, deserialize the file with the ConvertFrom-Json cmdlet. To explore the resulting object graph, or even get the desired property, you might consider using this custom ObjectGraphTools cmdlet, which results in something like: Get-Content $file | ConvertFrom-Json | Get-Node ~entities.id
    – iRon
    Commented Feb 6 at 7:53
  • 1
    The sourcecode can be found under the project site link at the PowerShell Gallery.
    – iRon
    Commented Feb 6 at 20:23
  • 1
    Your problem can be reduced to querying a JSON document for specific, related values. As iRon recommends, this is best done by parsing the JSON into an object model using ConvertFrom-Json. iRon's module (which you'd have to install) simplifies such queries. Either way, if you want to know how to extract the specific information of interest, you'll have to add a minimal, but representative sample JSON document to your question.
    – mklement0
    Commented Feb 6 at 21:56
  • 1
    @mklement0 Thank you. I am about to open a new question and tag you. The coffee will come your way soon, please be patient with me.
    Commented Feb 13 at 15:30
  • 1
    @mklement0 I posted a new question here. I tried my best to condense the JSON examples. stackoverflow.com/questions/79437094/…
    Commented Feb 13 at 16:44

2 Answers

2

Use ConvertFrom-Json to parse your JSON document into an object graph (comprised of nested [pscustomobject] instances), which allows you to access the objects and properties represented in the JSON document using regular dot notation, via ., the member-access operator.

The following solution outputs a [pscustomobject] with (only) .id and .RatedOI properties for each top-level object in the input JSON, taking the values from the entity that immediately follows the 'IHUR' entity and using the Select-Object cmdlet; note that extracting a nested property value requires use of a calculated property:

# Substitute your real file name/path for 'file.json'
(Get-Content -Raw file.json | ConvertFrom-Json) | ForEach-Object {
  $useNext = $false
  foreach ($entity in $_.entitiesData.entities) {
    if ($entity.entityType -eq 'IHUR') {
      $useNext = $true # The very next object contains the data of interest.
    }
    elseif ($useNext) {
      # Extract the values of interest and output them as a 
      # [pscustomobject] instance.
      $entity | Select-Object `
        id, 
        @{ Name = 'RatedOI'; Expression = { $_.characteristics.RatedOI.boolValue } }    
      # Exit after having found the value of interest.
      # NOTE: If there can be *multiple* 'IHUR' object pairs,
      #       and you want to extract information from all of them,
      #       replace `break` with `$useNext = $false`
      break
    }
  }
}

With your sample JSON document, this yields the following display output:

id                                   RatedOI
--                                   -------
7af50dbb-7ea3-4458-adfe-a82198a8d96c    True
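
For readers following the Python-based answer further down, the same object-graph idea can be sketched with Python's standard json module. This is only a minimal sketch and not part of the solution above: it assumes the layout of the sample JSON (entitiesData.entities, characteristics.RatedOI.boolValue), the trimmed-down SAMPLE document is made up for illustration, and the helper name find_rated_oi is hypothetical. Instead of relying on the position after the 'IHUR' entity, it selects every entity whose characteristics contain RatedOI.

```python
import json

# Hypothetical, trimmed-down document that mirrors the sample JSON's layout.
SAMPLE = """
{
  "id": "e0b8a6aa-2675-4d93-86ff-6a51a0daea85",
  "entitiesData": {
    "entities": [
      {},
      {
        "id": "b0000000-0000-0000-0000-00000000000b",
        "entityType": "IHUR",
        "characteristics": {"PBILR": {"stringValue": "XYZ"}}
      },
      {
        "id": "7af50dbb-7ea3-4458-adfe-a82198a8d96c",
        "entityType": "Operator",
        "characteristics": {"RatedOI": {"boolValue": true}}
      }
    ]
  }
}
"""


def find_rated_oi(document):
    """Return (id, RatedOI) pairs for every entity that carries a RatedOI value."""
    pairs = []
    for entity in document.get("entitiesData", {}).get("entities", []):
        rated = entity.get("characteristics", {}).get("RatedOI")
        if rated is not None:
            pairs.append((entity.get("id"), rated.get("boolValue")))
    return pairs


rated_pairs = find_rated_oi(json.loads(SAMPLE))
print(rated_pairs)
```

Because the lookup follows the structure of the document, the number of lines between id and RatedOI no longer matters.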
1

I like the idea of using Regex to solve this problem. Here is what it would look like using Python, and Python's re module:

The solution is pretty straightforward, using a simple regex pattern, (?:(?!"id").)*? (explained in more detail below). That pattern makes sure that there is no literal "id" string (with another id value) between the id value that we are capturing and the literal "RatedOI", which means we capture the last "id" value before "RatedOI", which is what we want.

See Test String included at the end.

I captured the id value using Python. Please find below the links for the .NET 7.0 and PCRE2 regex flavors. PowerShell uses the .NET regex engine, so the .NET flavor is the one that applies there.

PATTERN LOGIC:

'"id":' {capture the 36-character id } { Make sure NO other '"id"' }  '"RatedOI"'

PYTHON & re module CODE:

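import re

# `text` holds the JSON document as a string (see TEXT STRING at the end).
pattern = r'"id":\s?"([^-\s]{8}-[^-\s]{4}-[^-\s]{4}-[^-\s]{4}-[^-\s]{12})(?:(?!"id").)*?"RatedOI"'

pattern_re = re.compile(pattern, flags=re.S)    # re.S (dot matches all)

# findall() returns the captured group of every match; take the first one.
# Named entity_id to avoid shadowing the built-in id().
entity_id = pattern_re.findall(text)[0]

print(entity_id)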

REGEX PATTERN DEMO: https://regex101.com/r/oe81dA/5

RESULT:

7af50dbb-7ea3-4458-adfe-a82198a8d96c
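
The code above returns only the first match. As a hedged sketch (the sample string and the name nearest_ids are made up for illustration), the same pattern can be run through finditer to collect the nearest preceding id for every "RatedOI" in a document:

```python
import re

PATTERN = re.compile(
    r'"id":\s?"([^-\s]{8}-[^-\s]{4}-[^-\s]{4}-[^-\s]{4}-[^-\s]{12})'
    r'(?:(?!"id").)*?"RatedOI"',
    flags=re.S,
)

# Made-up sample: one entity without RatedOI, followed by two entities with it.
sample = '''
{"id": "11111111-1111-1111-1111-111111111111", "entityType": "IHUR"},
{"id": "22222222-2222-2222-2222-222222222222",
 "characteristics": {"RatedOI": {"boolValue": true}}},
{"id": "33333333-3333-3333-3333-333333333333",
 "characteristics": {"RatedOI": {"boolValue": false}}}
'''

# Each match contributes the id captured by group 1.
nearest_ids = [m.group(1) for m in PATTERN.finditer(sample)]
print(nearest_ids)
```

The first id is skipped because another "id" appears before any "RatedOI" is reached.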

NOTES:
r'"id":\s?"([^-\s]{8}-[^-\s]{4}-[^-\s]{4}-[^-\s]{4}-[^-\s]{12})(?:(?!"id").)*?"RatedOI"'

  • "id": Literal '"id":'

  • \s? Optional white space character (including newline)

  • (...) Capture Group: The parenthesis enclose the character pattern to be captured.

  • [...] Character Class: Match any character listed here.

  • [^...] Negated Character Class: Match any character that is not listed here.

  • [...]{x} Quantifier {x}: x is an integer giving how many repetitions of the preceding element are matched.

  • [^-\s] Matches any character that is not a dash (-) or a whitespace character (\s, which includes \n). (Note: better practice would be to put the dash - at the end of the character class, after \s; however, it works here.)

  • [^-\s]{8}-[^-\s]{4}-[^-\s]{4}-[^-\s]{4}-[^-\s]{12} The pattern for the 36-character id, including its four dashes -.

  • [^-\s]{8}- : Match any eight(8) characters in a row that are not - or a whitespace character. Must be immediately followed with a literal dash-.

  • (?:...) Non-capturing group is signified by (?: and ends in the next ). Match characters but do not capture in a group.

  • (?!"id") Negative lookahead: the literal "id" must not start at this position.

  • . The dot (.), special character. Matches any character. When the single Line flag re.S is on, the dot . also matches the newline character \n.

  • (?:(?!"id").) Match any character (.) as long as the literal "id" does not start at that position; if it does, stop, this is not a match. This prevents another "id" expression from appearing before "RatedOI".

  • (...)*? Repeat the pattern in the parenthesis 0 or more times (* special quantifier character), but be lazy about it (*?), only match the minimum number of characters necessary to make a match.

  • (?:(?!"id").)*?"RatedOI" Keep matching every character with the dot (.), except where the literal "id" begins, until you come to the first literal "RatedOI". Match "RatedOI". This completes the match. You can now retrieve the value of the capture group, i.e. the last 'id' value before 'RatedOI'.
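
To see what the (?:(?!"id").)*? piece contributes, the following small comparison runs the pattern with and without the negative lookahead. It is only a sketch on a made-up string with shortened ids (hence \w+ instead of the 36-character pattern). Without the lookahead, the lazy dot is free to run past a second "id", so the first id in the text is captured instead of the nearest one.

```python
import re

text = '"id": "first", "x": 1, "id": "second", "RatedOI": true'

# Plain lazy dot: may cross another "id" on its way to "RatedOI".
plain = re.search(r'"id":\s?"(\w+)".*?"RatedOI"', text, flags=re.S)

# Tempered dot: a character is consumed only if "id" does not start there.
tempered = re.search(r'"id":\s?"(\w+)"(?:(?!"id").)*?"RatedOI"', text, flags=re.S)

print(plain.group(1))
print(tempered.group(1))
```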


In re.compile(pattern, flags=re.S):

  • re.S (single-line flag) means that the special character dot (.) now matches all characters, including newline.
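
A quick illustration of why the flag matters here, as a sketch on a made-up two-line string with a shortened id: JSON files span many lines, so without re.S the dot stops at the first newline and the pattern cannot reach "RatedOI".

```python
import re

text = '"id": "abc",\n"RatedOI": true'
pattern = r'"id":\s?"(\w+)".*?"RatedOI"'

without_flag = re.search(pattern, text)            # dot does not match "\n"
with_flag = re.search(pattern, text, flags=re.S)   # dot matches "\n" too

print(without_flag)
print(with_flag.group(1))
```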

.NET 7.0 (C#) Flavor regex pattern:

.NET 7.0 DEMO: https://regex101.com/r/oe81dA/7

""""id":\s?"([^-\s]{8}-[^-\s]{4}-[^-\s]{4}-[^-\s]{4}-[^-\s]{12})(?:(?!"id").)*?"RatedOI""""sg

PCRE2 (PHP) regex flavor pattern (PowerShell uses the .NET flavor above):

/"id":\s?"([^-\s]{8}-[^-\s]{4}-[^-\s]{4}-[^-\s]{4}-[^-\s]{12})(?:(?!"id").)*?"RatedOI"/sg

Flags "gs":

g = "global" (Don't return after first match)

s = "single line" (Dot matches newline)

PCRE2 DEMO: https://regex101.com/r/oe81dA/6


TEXT STRING (Python):

text = '''
 {
"id": "e0b8a6aa-2675-4d93-86ff-6a51a0daea85",
"_etag": "\"ba00b64e-0000-0100-0000-671029b20000\"",
"LOB": "A",
"PID": "9219377869",
"PNum": "9219377869",
"SC": "TX",
"RD": {
"SM": {
  "productId": "XYZ",
  "RRID": "XYZ",
  "PF": "XYZ",
  "PV": "1",
  "RR": "202401",
  "PTCH": "0.0",
  "state": "TX",
  "LOB": "A",
   "userType": ""
   },
   "PSM": {
   "productId": "XYZ",
   "RRID": "XYZ",
   "PF": "XYZ",
   "PV": "1",
   "rateRevision": "202401",
   "PTCH": "0.0",
   "state": "TX",
   "LOB": "Auto",
   "userType": ""
   },
   "resultsRequested": false,
   "resultsReceived": false,
   "entities": [
   {
   },
   {
   },
   {
   },
   {
   },
   {
   },
   {
   },
   {
   },
   {
   },
   {
   },
   {
   },
   {
   },
   {
   },
   {
   },
   {
   }
   ],
 "qId": "638ee347-42ed-441c-a6ff-d6c99cf62032",
 "totalP": 498,
 "dateRateCalculated": "2024-10-16T21:01:30.8295181Z",
 "peakData": {
 },
 "premDiff": []
},
"productSelectionTimeStamp": "2024-10-16T20:57:23.9369158Z",
"sessionId": "a451b6b2-f7e1-448f-bc44-38c6a0cfbffb",
"RK": "999999999", 
"historyId": "",
"preApplicationVerificationData": {
},
"GReject": {
},
"NR": {
},
"appliedExternalSourcedData": [],
"TFSD": {
},
"outOfSequenceStatus": {
},
"dateStarted": "2024-10-16T20:52:13.9427222Z",
"applicationEffectiveDate": "2024-10-17T04:00:00+00:00",
"dateApplied": "2024-10-16T21:01:38.1017792Z",
"applicationStatusCode": "Committed",
"entitiesData": {
"entities": [
  {
  },
  {
  },
  {
  },
  {
  },
  {
  },
  {
  },
  {
  },
  {
  },
  {
"id": "b0000000-0000-0000-0000-00000000000b",
"entityType": "IHUR",
"ordinal": 0,
"characteristics": {
"PBILR": {
"stringValue": "XYZ"
},
"PBILRC": {
"stringValue": "XYZ"
},
"PISR": {
"stringValue": "XYZ"
}
                 }
  },
  {
    "id": "7af50dbb-7ea3-4458-adfe-a82198a8d96c",
    "entityType": "Operator",
    "ordinal": 0,
    "characteristics": {
      "FirstName": {
        "stringValue": "Joe"
      },
      "ODLNUR": {
        "stringValue": "A999999"
      },
      "RatedOI": {
        "boolValue": true
      }
    }
                }
]
}
}

'''
