Implementing String.Contains

After writing up the Finite State Machine from SoCalCodeCamp, I had CoffeeScript on the brain and decided to go back and look at my first CoffeeScript experiment to see if there was any room for improvement.  Actually, I knew there was room for improvement.  One of the times when I was stuck in my presentation I flipped over to this machine and I’m pretty sure I blanched visibly and the counted loop in the “read” function.

So lets revisit my first CoffeeScript state machine.  This machine determines if a specified substring exists within a string. In other words, it implements bool String.Contains(String).  Besides all the room for improvement, I think there is an opportunity for some meaningful use of CoffeeScript classes.

First we’ll warm up by adding a shim for alert.

alert = (value) -> console.log value 

Now we can get started with our conversion, we’ll start from the bottom up this time.

Remove Test Duplication

Here is our current test code.

test1 = "Windows is an operating system installed on many machines."
test2 = "Welcome to the operating room, the Doctor is just finishing his martini."

getMachineResult = (b) -> 
  result = ""
  if b == true 
   result = " Accepted. "
  else
   result = " Not Accepted. "
  result

machine = createMachine(rules, 0, finalStates)
machine.readAll test1
alert getMachineResult(machine.acceptedInput()) + test1

machine2 = createMachine(rules, 0, finalStates)
machine2.readAll test2
alert getMachineResult(machine2.acceptedInput()) + test2

Lets remove the duplication and the weirdo getMachineResult method.

testMachine = (input) ->
  machine = createMachine(rules, 0, finalStates)
  machine.readAll input  
  alert [machine.acceptedInput(), input].join '--'

testMachine "Windows is an operating system installed on many machines."
testMachine "Welcome to the operating room, the Doctor is just finishing his martini."

Encapsulate Machine Creation

Our testMachine function currently closes over rules and finalStates from the CoffeeScript “module” scope. We also have these five lines which represent the logic necessary for getting ready to create our machine.

source = "operating system".split '' 
rules = (createRule x, source for x in [0..source.length])
altRules = (createAltRule x, source for x in [0..source.length])
rules.concat altRules
finalStates = [ source.length ]

And the machine is actually created in testMachine

machine = createMachine(rules, 0, finalStates)

Lets encapsulate all this logic into one function, get it out of the module scope, and generalize it so that it can work with any search string.

createSearchMachine = (searchString) ->
  source = searchString.split '' 
  states = [0..source.length]
  rules = (createRule x, source for x in states)
    .concat (createAltRule x, source for x in states)
  createMachine rules, 0, [ source.length ]

testMachine = (input, searchString) ->
  machine = createSearchMachine searchString
  # ...

I did a little experimenting with this one and I’m pretty pleased with the results.  Because the character array indexes actually represent the states the search machine can take, we cant just iterate over the characters.  I think the new version makes it clear that the use of the index range is not a mistake.  I also decided to see if I could chain the call to concat right off of the list comprehension and that worked. Then the line was really long, so i decided to see if significant whitespace would prevent me from continuing the statement on the next line, but it worked just fine. Note that the () around the list comprehension are necessary both times. The parenthesis aren’t part of calling concat, they are part of the list comprehension.

I saved another line by moving finalStates inline, then updated the signature and call site in testMachine.

Creating Machines

Here is our current createMachine implementation. I don’t see any advantage to making it into a class, so I’ll just clean it up.

createMachine = (rules, initialState, finalStates) ->
 {
   currentState: initialState
   transitionRules: rules
   acceptanceStates: finalStates
   read: (c) ->
    for i in [0..this.transitionRules.length - 1]
      if this.transitionRules[i].match this.currentState, c
        return this.currentState = this.transitionRules[i].resultState
   readAll: (s) -> this.read x for x in s.split ''
   acceptedInput: () -> this.currentState in this.acceptanceStates
 }

I got rid of the object properties and captured with closure instead. I also converted the counted loop into a cleaner for..in loop. Finally, I picked some better variable names.

createMachine = (transitionRules, currentState, acceptanceStates) ->
  readAll: (input) -> this.read inputChar for inputChar in input.split ''
  read: (inputChar) ->
    for rule in transitionRules
      if rule.match currentState, inputChar
        return currentState = rule.resultState  
  acceptedInput: -> currentState in acceptanceStates

After testing I see that this still works as expected.

Creating Rules

Our rule creation is really spread out and contains a bit of duplication:

createRule = (index, source) -> {
 initialState: index
 inputChar: source[index]
 resultState: index + 1
 match: (q, c) -> true if (this.initialState == q && this.inputChar == c)
 }

createAltRule = (index, source) -> 
 if index == source.length
  {
    # trap in acceptance state
    initialState: index
    inputChar: null
    resultState: index
    match: (q, c) -> true if (this.initialState == q) 
  }
 else
  {
    # return to start state
    initialState: index
    inputChar: source[index]
    resultState: 0
    match: (q, c) -> true if (this.initialState == q && this.inputChar != c)
  }

Primarily these rules only vary by their method for evaluating match. Our POR (plain old rule) takes an index (which in this case represents a state) and a source string (of which we are only interested in one character) then computes the next state. The match function is true if the passed in state and input character match the stored values. This method can definitely be cleaned up.

createRule = (CurrentState, InputChar, NextState) ->
  getNextState: -> NextState
  match: (machineState, inputChar) ->
    machineState is CurrentState and inputChar is InputChar

With createRule defined this way, we need to make some changes to createSearchMachine so that it passes the needed data, and to createMachine so that it calls getNextState. If the machine is going to query getNextState instead of accessing a property, then the alternate rules will need functions with the same name and semantics.

createAltRule = (index, source) -> 
 if index == source.length
  {
    # trap in acceptance state
    initialState: index
    inputChar: null
    resultState: index
    getNextState: -> @resultState
    match: (q, c) -> true if (this.initialState == q) 
  }
 else
  {
    # return to start state
    initialState: index
    inputChar: source[index]
    resultState: 0
    getNextState: -> @resultState
    match: (q, c) -> true if (this.initialState == q && this.inputChar != c)
  }

So I’m beginning to sense a relationship between the POR and the alternate rules. First, I test the new code and once again find out that it works.  Lets convert createRule into a class, then use the createRule method as a shim to call the constructor.

class Rule
  constructor: (@CurrentState, @InputChar, @NextState) ->
  getNextState: => @NextState
  match: (machineState, inputChar) =>
    machineState is @CurrentState and inputChar is @InputChar

createRule = (CurrentState, InputChar, NextState) ->
  new Rule CurrentState, InputChar, NextState

The purpose of the shim is just to ensure that we don’t need to update all the call sites while we are refactoring the rules. With this new code in place, I test and everything still works. Now we can look at extending Rule to take over parts of createAltRule. Lets look at the top half of createAltRule first.

createAltRule = (index, source) -> 
 if index == source.length
  {
    # trap in acceptance state
    initialState: index
    inputChar: null
    resultState: index
    getNextState: -> @resultState
    match: (q, c) -> true if (this.initialState == q) 
  }

The purpose of this branch is to define a rule that traps the machine in a certain state. In this case, it will be the acceptance state. If we count the number of characters in the search string, we find a number that represents the machine’s acceptance state.  If we manage to reach that acceptance state, we want to stay in it.  Once we have determined that the source string contains the search string, the machine should accept the input, no matter what the rest of the string says.  The trap state implementation just compares the machine state to the acceptance state, and returns the acceptance state when the machine calls getNextState.

Here is how we can extend Rule to support this.

class TrapRule extends Rule
  constructor: (@CurrentState) ->  
    super @CurrentState, null, @CurrentState  
  match: (machineState, inputChar) =>
    machineState is @CurrentState

The TrapRule class extends Rule by replacing the match implementation with the same logic comparison implemented in the old code. I use super to call the Rule constructor with the @CurrentState value as both the current and next state. I pass null as the character to make it explicit that the input character is not used.

Now we can update the createAltRule function.

createAltRule = (index, source) -> 
 if index == source.length
    new TrapRule index
 else
  {
    # return to start state
    initialState: index
    inputChar: source[index]
    resultState: 0
    getNextState: -> @resultState
    match: (q, c) -> true if (this.initialState == q && this.inputChar != c)
  }

The bottom half of createAltRule describes a “return rule”. This rule handles any scenario where the machine reads a character that is not in the search string, and also any scenario where the character is in the search string, but not in the correct position. For example, both of our test strings start with the character W. The return rule would apply because W does not appear in operating system. As the machine reads along it will encounter o at position 5. Finding this character, it will advance from state 0 to state 1. The next character it wants to read is p. But in either test string, p is not the character at position 6. So, the return rule will apply and return the machine to state 0. Without the return rules, the machine could only accept strings that start with the search string, but we want a machine that accept strings which contain the search string. So, the match function looks at the input character passed in by the machine, and matches when the character does not match the stored character, and getNextState always returns 0.

Here is a ReturnRule class that implements the same logic.

class ReturnRule extends Rule
  constructor: (@CurrentState, @InputChar) ->
    super @CurrentState, @InputChar, 0
  match: (machineState, inputChar) =>
    machineState is @CurrentState and inputChar isnt @InputChar

And we can reduce our createAltRule method to this:

createAltRule = (index, source) -> 
 if index is source.length
    new TrapRule index
 else
    new ReturnRule index, source[index]

And this works too. Now that we have cleaned up our rules we can see that our shim is only called in one place, and does very little, so we can inline the Rule constructor and delete createRule.

createSearchMachine = (searchString) ->
  source = searchString.split '' 
  states = [0..source.length]
  rules = (new Rule x, source[x], x + 1 for x in states)
    .concat (createAltRule x, source for x in states)
  createMachine rules, 0, [ source.length ]

Lets try to get rid of the call to createAltRule with better list comprehension.

createSearchMachine = (searchString) ->
  source = searchString.split '' 
  states = [0..source.length]
  rules = (new Rule x, source[x], x + 1 for x in states)
    .concat (new ReturnRule x, source[x] for x in states when x isnt source.length)
  createMachine rules, 0, [ source.length ]

As an intermediate step, I simply filtered out the state that would have led to creating the trap rule and replaced the call to createAltRule with a direct instantiation of ReturnRule. I expect this to fail, but it doesn’t! Turns our our machine has a bug in it, since I designed it “knowing” that every possible state/character combination would have a rule, I did not handle the case in read where no rules matched. Just to be on the safe side, I’ll update read by moving the current state to -1 when no rules match.  Although it’s conventional to use non-negative integer states when simulating state machines, the truth is that you can use any symbol, positive, negative, or Celtic runes, it’s up to you.  Here is the updated Read implementation:

  read: (inputChar) ->
    for rule in transitionRules
      if rule.match currentState, inputChar
        return currentState = rule.getNextState()  
    currentState = -1

Now, the machine fails as expected. We can repair it by creating the TrapRule and pushing it into the rules collection.

createSearchMachine = (searchString) ->
  source = searchString.split '' 
  states = [0..source.length]
  rules = (new Rule x, source[x], x + 1 for x in states)
    .concat (new ReturnRule x, source[x] for x in states when x isnt source.length)
  rules.push new TrapRule source.length
  createMachine rules, 0, [ source.length ]

This works so we can get rid of the createAltRule function.  I made all these changes to the existing gist and I committed a few intermediate steps along the way.  Enjoy.