1
\$\begingroup\$

I have a parser which consumes an ordered list of tokens (based on Vim's command grammar) and returns either:

  • an error object, or
  • an object that can be directly eval-ed by a separate function (not shown)

The code that calls parse handles both of these cases. I am looking for help with the structure/architecture of the parser function below.

This cascading if-else chain works, and I personally find it easy to read.

However, it seems like encoding the grammar precedence rules into an if-else chain is very brittle. For example, it seems like it will be difficult to add the visual-mode rules to this structure.

/*  Given an array of tokens, return a multiplier (mult),
    and the command to perform (cmd),
    otherwise return an error message.

    NOTE: The order of the `has('foo')` statements reflects the
    precedence rules of the grammar.
*/
var parse=(tokens)=>{
    var cmd={verb:'g',mult:1,original:tokens.map(x=>x[1]).reduce((x,y)=>x.concat(y)),},
        err={original:tokens,error:'PARSE ERROR'},// default error message
        has=(str)=>tokens.map(x=>x.includes(str)).some(x=>x);

    // error
    if(has('UNKNOWN TOKEN')){err.error='TOKENIZER ERROR';return err;}

    // rule: [count] operator [count] modifier object
    else if(has('modifier')){
        var t=tokens.shift();
        if(t[0]==='count'){cmd.mult*=parseInt(t[1],10);t=tokens.shift();}
        if(t[0]==='operator'){cmd.verb=t[1];t=tokens.shift();}else{return err;}
        if(t[0]==='count'){cmd.mult*=parseInt(t[1],10);t=tokens.shift();}
        if(t[0]==='modifier'){cmd.mod=t[1];t=tokens.shift();}else{return err;}
        if(t[0]==='object'){cmd.noun=t[1];if(t=tokens.shift()){return err;}}else{return err;}
        return cmd;
    }

    // rule: [count] operator [count] (motion|object)
    else if(has('operator')){
        var t=tokens.shift();
        if(t[0]==='count'){cmd.mult*=parseInt(t[1],10);t=tokens.shift();}
        if(t[0]==='operator'){cmd.verb=t[1];t=tokens.shift();}else{return err;}
        if(t[0]==='count'){cmd.mult*=parseInt(t[1],10);t=tokens.shift();}
        if(t[0]==='motion'){cmd.noun=t[1];if(t=tokens.shift()){return err;};}
        else if(t[0]==='object'){cmd.noun=t[1];if(t=tokens.shift()){return err;}}else{return err;}
        return cmd;
    }

    // rule: [count] motion
    else if(has('motion')){
        var t=tokens.shift();
        if(t[0]==='count'){cmd.mult*=parseInt(t[1],10);t=tokens.shift();}
        if(t[0]==='motion'){cmd.noun=t[1];if(t=tokens.shift()){return err;}}else{return err;}
        return cmd;
    }

    // rule: [count] object
    else if(has('object')){
        var t=tokens.shift();
        if(t[0]==='count'){cmd.mult*=parseInt(t[1],10);t=tokens.shift();}
        if(t[0]==='object'){cmd.noun=t[1];if(t=tokens.shift()){return err;}}else{return err;}
        return cmd;
    }
    else{return err;}
},
\$\endgroup\$

1 Answer 1

1
\$\begingroup\$

You are using (probably abusing) arrow functions. Arrow functions are good in quick situations like map callbacks that just return the value's property, or a short expression for a filter callback and similar situations. For larger cases, you need to either use an arrow function with a block or just a regular function.

In the case of parse, one problem with arrow functions is that they're anonymous and they become hard to debug. Stack traces usually output names of functions in the trace, but since anonymous functions are well... anonymous, you will not see the name in the trace. Although you'll still see the line number, that's like half the benefit of the stack trace. So consider converting parse into a regular function.

var cmd={verb:'g',mult:1,original:tokens.map(x=>x[1]).reduce((x,y)=>x.concat(y)),},
    err={original:tokens,error:'PARSE ERROR'},// default error message
    has=(str)=>tokens.map(x=>x.includes(str)).some(x=>x);

Not very readable. Code is for humans to read, and for machines to translate. Code that's unreadable to a human is overhead. Consider formatting the objects with properties in its own line and use whitespace more. Further, it's not easily spotted that has is not just a variable, but a function. Consider rewriting it as a function and better, outside and independent of parse.

// rule: [count] operator [count] modifier object
else if(has('modifier')){
    var t=tokens.shift();
    if(t[0]==='count'){cmd.mult*=parseInt(t[1],10);t=tokens.shift();}
    if(t[0]==='operator'){cmd.verb=t[1];t=tokens.shift();}else{return err;}
    if(t[0]==='count'){cmd.mult*=parseInt(t[1],10);t=tokens.shift();}
    if(t[0]==='modifier'){cmd.mod=t[1];t=tokens.shift();}else{return err;}
    if(t[0]==='object'){cmd.noun=t[1];if(t=tokens.shift()){return err;}}else{return err;}
    return cmd;
}

Again, code is for humans. This is barely readable in its current state. Also, it appears that this section is repeated throughout the code. Consider splitting this off to another function for reuse. One way to do it is to have a map of functions whose keys are the tokens. This is usually a way to simplify multiple if statements or a huge switch block, assuming it's just an equality comparison.

var ops = {
  count: function(cmd, tokens){...},
  operator: function(cmd, tokens){...},
  ...
};

// In your function
var t=tokens.shift();
var cmd = ops[t[0]].call(null, cmd, tokens);
return cmd;

Lastly...

else{return err;}

Don't you want to throw instead?

\$\endgroup\$
1
  • \$\begingroup\$ I'm using the 'revealing module' pattern heavily; parse is a top-level module. Does that change your advice about arrow functions? \$\endgroup\$ Commented Jun 14, 2016 at 3:06

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.