0

I'm trying to find the most efficient way to extract values from a large string.

EXT-X-DATERANGE:ID="PreRoll_Ident_Open",START-DATE="2016-12-14T120000.000z",DURATION=3,X-PlayHeadStart="0.000",X-AdID="AA-1QPN49M9H2112",X-TRANSACTION-VPRN-ID="1486060788",X-TrackingDefault="1",X-TrackingDefaultURI="http,//606ca.v.fwmrm.net/ad/l/1?s=g015&n=394953%3B394953&t=1485791181366184015&f=&r=394953&adid=15914070&reid=5469372&arid=0&auid=&cn=defaultImpression&et=i&_cc=15914070,5469372,,,1485791181,1&tpos=0&iw=&uxnw=394953&uxss=sg579054&uxct=4&metr=1031&init=1&vcid2=394953%3A466c5842-0cce-4a16-9f8b-a428e479b875&cr="s=0&iw=&uxnw=394953&uxss=sg579054&uxct=4&metr=1031&init=1&vcid2=394953%3A466c5842-0cce-4a16-9f8b-a428e479b875&cr="

I have the above as an example. The idea is to extract all caps string before : as object key, and everything in between quotes until next comma as its value. Then iterate entire string until this object is created.

nonParsed.substring(nonParsed.lastIndexOf('="') + 2, nonParsed.lastIndexOf('",'));

I had this concept as a start, but I would appreciate some help iterating through the string and making this more efficient.

The final output would be something like:

{
  'EXT-X-DATERANGE:ID': 'PreRoll_Ident_Open',
  'START-DATE': '2016-12-14T120000.000z',
  'DURATION': '3',
  ...
}
5
  • Maybe this helps: developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/… Commented Sep 11, 2017 at 20:49
  • It'll be a little tougher than usual since you seem to have a comma in your X-TrackingDefaultURI header value where the colon should be. This will make a naive split more difficult. Commented Sep 11, 2017 at 20:51
  • What would be the final output? Commented Sep 11, 2017 at 20:53
  • @revo I updated with sample object Commented Sep 11, 2017 at 21:02
  • The idea is to extract all caps string before : ... so why doesn't it apply to EXT-X-DATERANGE:ID which should extract EXT-X-DATERANGE part only! Commented Sep 11, 2017 at 21:06

3 Answers

2

It looks like the only property that messes up a predictable pattern is DURATION, which is followed by a number. Otherwise, you can rely on a naive pattern of alternating =" and ",.

You could do something like

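function parseAttributes(str) {
    // Quote the bare DURATION number so every pair follows the KEY="value" pattern
    str = str.replace(/DURATION=(\d+)/, 'DURATION="$1"');
    return str.split('",').reduce((acc, entry) => {
        // Split on the first =" only, since a value may itself contain ="
        const idx = entry.indexOf('="');
        const key = entry.slice(0, idx);
        // The final entry keeps its closing quote because no ", follows it
        acc[key] = entry.slice(idx + 2).replace(/"$/, '');
        return acc;
    }, {});
}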

Then add a bit of logic to the end to sort out the Duration if you needed to.
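
As one possible reading of "a bit of logic to the end", here is a minimal sketch that converts DURATION back to a number once the object has been built. The helper name coerceNumericAttributes and its numericKeys parameter are hypothetical and are not part of the answer above; the sample object is a hand-written stand-in for the parsed result.

```javascript
// Hypothetical helper (an assumption, not from the answer): converts the
// listed attributes of an already-parsed object from strings to numbers.
function coerceNumericAttributes(parsed, numericKeys = ['DURATION']) {
    const result = { ...parsed }; // shallow copy, the input is left untouched
    for (const key of numericKeys) {
        // Convert only keys that exist and hold a finite numeric string
        if (key in result && result[key] !== '' && Number.isFinite(Number(result[key]))) {
            result[key] = Number(result[key]);
        }
    }
    return result;
}

// Hand-written stand-in for the object produced by the parsing step
const coerced = coerceNumericAttributes({
    'START-DATE': '2016-12-14T120000.000z',
    'DURATION': '3',
    'X-PlayHeadStart': '0.000'
});

console.log(typeof coerced.DURATION); // prints "number"
```

Only the keys named in numericKeys are converted, so values such as X-PlayHeadStart stay as strings unless you opt them in.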

1 Comment

this is great! i can sanitize before this filter, so i can have duration follow the same string rules as the others
1

It looks like you have mixed case strings for the headers, not just uppercase. I would instead look for key-value pairs based on the = character. You can construct a regex and use the exec() method to then iterate and build your object.

var input = 'EXT-X-DATERANGE:ID="PreRoll_Ident_Open",START-DATE="2016-12-14T120000.000z",DURATION=3,X-PlayHeadStart="0.000",X-AdID="AA-1QPN49M9H2112",X-TRANSACTION-VPRN-ID="1486060788",X-TrackingDefault="1",X-TrackingDefaultURI="http,//606ca.v.fwmrm.net/ad/l/1?s=g015&n=394953%3B394953&t=1485791181366184015&f=&r=394953&adid=15914070&reid=5469372&arid=0&auid=&cn=defaultImpression&et=i&_cc=15914070,5469372,,,1485791181,1&tpos=0&iw=&uxnw=394953&uxss=sg579054&uxct=4&metr=1031&init=1&vcid2=394953%3A466c5842-0cce-4a16-9f8b-a428e479b875&cr="s=0&iw=&uxnw=394953&uxss=sg579054&uxct=4&metr=1031&init=1&vcid2=394953%3A466c5842-0cce-4a16-9f8b-a428e479b875&cr='

// Regex looks for any alpha character, colon, or hyphen before a =, then captures anything between the quotes and an optional comma after
var pattern = /([A-Za-z:-]+)="([^"]+)",?/g;

// Iterate the string using exec() and build the object along the way
var match;
var output = {};
while (match = pattern.exec(input)) {
    output[match[1]] = match[2];
}

console.dir(output);
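
The pattern above skips the unquoted DURATION=3 and stops at the first double quote inside the tracking URL. The following is a rough sketch of one way to cover both cases, not a definitive parser. It assumes that keys contain only letters, colons and hyphens, that unquoted values are plain numbers, and that a quoted value ends at a quote followed either by the next ,KEY= or by the end of the string. The sample string is a shortened version of the one in the question.

```javascript
// Shortened sample (an assumption for brevity): same shape as the original
// input, including the bare DURATION and the stray quote inside the URL.
const sample = 'EXT-X-DATERANGE:ID="PreRoll_Ident_Open",' +
    'START-DATE="2016-12-14T120000.000z",DURATION=3,' +
    'X-TrackingDefaultURI="http,//606ca.v.fwmrm.net/ad/l/1?s=g015' +
    '&_cc=15914070,5469372,,,1485791181,1&cr="s=0&cr="';

// Group 1: key. Group 2: quoted value, matched lazily up to a quote that is
// followed by ",KEY=" or the end of input. Group 3: bare number.
const attrPattern =
    /(?:^|,)([A-Za-z:-]+)=(?:"(.*?)"(?=,[A-Za-z:-]+=|$)|(\d+(?:\.\d+)?))/g;

const attrs = {};
for (const m of sample.matchAll(attrPattern)) {
    // Exactly one of the two value groups participates in each match
    attrs[m[1]] = m[2] !== undefined ? m[2] : m[3];
}

console.log(Object.keys(attrs).length); // prints 4
```

The lookahead is what lets commas and quotes inside the URL pass through. It would still misfire if a value happened to contain a quote followed by something that looks like ,KEY=.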

2 Comments

May I suggest /(?:^|,)([A-Za-z\-]+?)(?::[A-Z\-]+)?=(".+?"|\d+?)(?=,|$)/gm as a more comprehensive regex? regex101.com/r/5XLR1O/1
it allows capture of that messy URL and the unquoted integer values
1

Here is a possible solution. You split the string on the double quotes (this of course presumes that no value contains a double quote of its own). Then you cycle through the resulting array in steps of two, using the ith element as the key and the (i+1)th element as the value of that key. Here would be the code:

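// Assumes every value is quoted and contains no double quote of its own
const strings = nonparsed.split('"');
const myObj = {};
// The first key has no leading comma, so only its trailing '=' is dropped
myObj[strings[0].slice(0, -1)] = strings[1];
// Later keys look like ',KEY=': drop the comma and the '='; stop before the trailing empty piece
for (let i = 2; i + 1 < strings.length; i += 2) myObj[strings[i].slice(1, -1)] = strings[i + 1];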
