Implementing String.Contains

After writing up the Finite State Machine from SoCalCodeCamp, I had CoffeeScript on the brain and decided to go back and look at my first CoffeeScript experiment to see if there was any room for improvement.  Actually, I knew there was room for improvement.  One of the times when I was stuck in my presentation I flipped over to this machine and I’m pretty sure I blanched visibly and the counted loop in the “read” function.

So lets revisit my first CoffeeScript state machine.  This machine determines if a specified substring exists within a string. In other words, it implements bool String.Contains(String).  Besides all the room for improvement, I think there is an opportunity for some meaningful use of CoffeeScript classes.

First we’ll warm up by adding a shim for alert.

alert = (value) -> console.log value 

Now we can get started with our conversion, we’ll start from the bottom up this time.

Remove Test Duplication

Here is our current test code.

test1 = "Windows is an operating system installed on many machines."
test2 = "Welcome to the operating room, the Doctor is just finishing his martini."

getMachineResult = (b) -> 
  result = ""
  if b == true 
   result = " Accepted. "
  else
   result = " Not Accepted. "
  result

machine = createMachine(rules, 0, finalStates)
machine.readAll test1
alert getMachineResult(machine.acceptedInput()) + test1

machine2 = createMachine(rules, 0, finalStates)
machine2.readAll test2
alert getMachineResult(machine2.acceptedInput()) + test2

Lets remove the duplication and the weirdo getMachineResult method.

testMachine = (input) ->
  machine = createMachine(rules, 0, finalStates)
  machine.readAll input  
  alert [machine.acceptedInput(), input].join '--'

testMachine "Windows is an operating system installed on many machines."
testMachine "Welcome to the operating room, the Doctor is just finishing his martini."

Encapsulate Machine Creation

Our testMachine function currently closes over rules and finalStates from the CoffeeScript “module” scope. We also have these five lines which represent the logic necessary for getting ready to create our machine.

source = "operating system".split '' 
rules = (createRule x, source for x in [0..source.length])
altRules = (createAltRule x, source for x in [0..source.length])
rules.concat altRules
finalStates = [ source.length ]

And the machine is actually created in testMachine

machine = createMachine(rules, 0, finalStates)

Lets encapsulate all this logic into one function, get it out of the module scope, and generalize it so that it can work with any search string.

createSearchMachine = (searchString) ->
  source = searchString.split '' 
  states = [0..source.length]
  rules = (createRule x, source for x in states)
    .concat (createAltRule x, source for x in states)
  createMachine rules, 0, [ source.length ]

testMachine = (input, searchString) ->
  machine = createSearchMachine searchString
  # ...

I did a little experimenting with this one and I’m pretty pleased with the results.  Because the character array indexes actually represent the states the search machine can take, we cant just iterate over the characters.  I think the new version makes it clear that the use of the index range is not a mistake.  I also decided to see if I could chain the call to concat right off of the list comprehension and that worked. Then the line was really long, so i decided to see if significant whitespace would prevent me from continuing the statement on the next line, but it worked just fine. Note that the () around the list comprehension are necessary both times. The parenthesis aren’t part of calling concat, they are part of the list comprehension.

I saved another line by moving finalStates inline, then updated the signature and call site in testMachine.

Creating Machines

Here is our current createMachine implementation. I don’t see any advantage to making it into a class, so I’ll just clean it up.

createMachine = (rules, initialState, finalStates) ->
 {
   currentState: initialState
   transitionRules: rules
   acceptanceStates: finalStates
   read: (c) ->
    for i in [0..this.transitionRules.length - 1]
      if this.transitionRules[i].match this.currentState, c
        return this.currentState = this.transitionRules[i].resultState
   readAll: (s) -> this.read x for x in s.split ''
   acceptedInput: () -> this.currentState in this.acceptanceStates
 }

I got rid of the object properties and captured with closure instead. I also converted the counted loop into a cleaner for..in loop. Finally, I picked some better variable names.

createMachine = (transitionRules, currentState, acceptanceStates) ->
  readAll: (input) -> this.read inputChar for inputChar in input.split ''
  read: (inputChar) ->
    for rule in transitionRules
      if rule.match currentState, inputChar
        return currentState = rule.resultState  
  acceptedInput: -> currentState in acceptanceStates

After testing I see that this still works as expected.

Creating Rules

Our rule creation is really spread out and contains a bit of duplication:

createRule = (index, source) -> {
 initialState: index
 inputChar: source[index]
 resultState: index + 1
 match: (q, c) -> true if (this.initialState == q && this.inputChar == c)
 }

createAltRule = (index, source) -> 
 if index == source.length
  {
    # trap in acceptance state
    initialState: index
    inputChar: null
    resultState: index
    match: (q, c) -> true if (this.initialState == q) 
  }
 else
  {
    # return to start state
    initialState: index
    inputChar: source[index]
    resultState: 0
    match: (q, c) -> true if (this.initialState == q && this.inputChar != c)
  }

Primarily these rules only vary by their method for evaluating match. Our POR (plain old rule) takes an index (which in this case represents a state) and a source string (of which we are only interested in one character) then computes the next state. The match function is true if the passed in state and input character match the stored values. This method can definitely be cleaned up.

createRule = (CurrentState, InputChar, NextState) ->
  getNextState: -> NextState
  match: (machineState, inputChar) ->
    machineState is CurrentState and inputChar is InputChar

With createRule defined this way, we need to make some changes to createSearchMachine so that it passes the needed data, and to createMachine so that it calls getNextState. If the machine is going to query getNextState instead of accessing a property, then the alternate rules will need functions with the same name and semantics.

createAltRule = (index, source) -> 
 if index == source.length
  {
    # trap in acceptance state
    initialState: index
    inputChar: null
    resultState: index
    getNextState: -> @resultState
    match: (q, c) -> true if (this.initialState == q) 
  }
 else
  {
    # return to start state
    initialState: index
    inputChar: source[index]
    resultState: 0
    getNextState: -> @resultState
    match: (q, c) -> true if (this.initialState == q && this.inputChar != c)
  }

So I’m beginning to sense a relationship between the POR and the alternate rules. First, I test the new code and once again find out that it works.  Lets convert createRule into a class, then use the createRule method as a shim to call the constructor.

class Rule
  constructor: (@CurrentState, @InputChar, @NextState) ->
  getNextState: => @NextState
  match: (machineState, inputChar) =>
    machineState is @CurrentState and inputChar is @InputChar

createRule = (CurrentState, InputChar, NextState) ->
  new Rule CurrentState, InputChar, NextState

The purpose of the shim is just to ensure that we don’t need to update all the call sites while we are refactoring the rules. With this new code in place, I test and everything still works. Now we can look at extending Rule to take over parts of createAltRule. Lets look at the top half of createAltRule first.

createAltRule = (index, source) -> 
 if index == source.length
  {
    # trap in acceptance state
    initialState: index
    inputChar: null
    resultState: index
    getNextState: -> @resultState
    match: (q, c) -> true if (this.initialState == q) 
  }

The purpose of this branch is to define a rule that traps the machine in a certain state. In this case, it will be the acceptance state. If we count the number of characters in the search string, we find a number that represents the machine’s acceptance state.  If we manage to reach that acceptance state, we want to stay in it.  Once we have determined that the source string contains the search string, the machine should accept the input, no matter what the rest of the string says.  The trap state implementation just compares the machine state to the acceptance state, and returns the acceptance state when the machine calls getNextState.

Here is how we can extend Rule to support this.

class TrapRule extends Rule
  constructor: (@CurrentState) ->  
    super @CurrentState, null, @CurrentState  
  match: (machineState, inputChar) =>
    machineState is @CurrentState

The TrapRule class extends Rule by replacing the match implementation with the same logic comparison implemented in the old code. I use super to call the Rule constructor with the @CurrentState value as both the current and next state. I pass null as the character to make it explicit that the input character is not used.

Now we can update the createAltRule function.

createAltRule = (index, source) -> 
 if index == source.length
    new TrapRule index
 else
  {
    # return to start state
    initialState: index
    inputChar: source[index]
    resultState: 0
    getNextState: -> @resultState
    match: (q, c) -> true if (this.initialState == q && this.inputChar != c)
  }

The bottom half of createAltRule describes a “return rule”. This rule handles any scenario where the machine reads a character that is not in the search string, and also any scenario where the character is in the search string, but not in the correct position. For example, both of our test strings start with the character W. The return rule would apply because W does not appear in operating system. As the machine reads along it will encounter o at position 5. Finding this character, it will advance from state 0 to state 1. The next character it wants to read is p. But in either test string, p is not the character at position 6. So, the return rule will apply and return the machine to state 0. Without the return rules, the machine could only accept strings that start with the search string, but we want a machine that accept strings which contain the search string. So, the match function looks at the input character passed in by the machine, and matches when the character does not match the stored character, and getNextState always returns 0.

Here is a ReturnRule class that implements the same logic.

class ReturnRule extends Rule
  constructor: (@CurrentState, @InputChar) ->
    super @CurrentState, @InputChar, 0
  match: (machineState, inputChar) =>
    machineState is @CurrentState and inputChar isnt @InputChar

And we can reduce our createAltRule method to this:

createAltRule = (index, source) -> 
 if index is source.length
    new TrapRule index
 else
    new ReturnRule index, source[index]

And this works too. Now that we have cleaned up our rules we can see that our shim is only called in one place, and does very little, so we can inline the Rule constructor and delete createRule.

createSearchMachine = (searchString) ->
  source = searchString.split '' 
  states = [0..source.length]
  rules = (new Rule x, source[x], x + 1 for x in states)
    .concat (createAltRule x, source for x in states)
  createMachine rules, 0, [ source.length ]

Lets try to get rid of the call to createAltRule with better list comprehension.

createSearchMachine = (searchString) ->
  source = searchString.split '' 
  states = [0..source.length]
  rules = (new Rule x, source[x], x + 1 for x in states)
    .concat (new ReturnRule x, source[x] for x in states when x isnt source.length)
  createMachine rules, 0, [ source.length ]

As an intermediate step, I simply filtered out the state that would have led to creating the trap rule and replaced the call to createAltRule with a direct instantiation of ReturnRule. I expect this to fail, but it doesn’t! Turns our our machine has a bug in it, since I designed it “knowing” that every possible state/character combination would have a rule, I did not handle the case in read where no rules matched. Just to be on the safe side, I’ll update read by moving the current state to -1 when no rules match.  Although it’s conventional to use non-negative integer states when simulating state machines, the truth is that you can use any symbol, positive, negative, or Celtic runes, it’s up to you.  Here is the updated Read implementation:

  read: (inputChar) ->
    for rule in transitionRules
      if rule.match currentState, inputChar
        return currentState = rule.getNextState()  
    currentState = -1

Now, the machine fails as expected. We can repair it by creating the TrapRule and pushing it into the rules collection.

createSearchMachine = (searchString) ->
  source = searchString.split '' 
  states = [0..source.length]
  rules = (new Rule x, source[x], x + 1 for x in states)
    .concat (new ReturnRule x, source[x] for x in states when x isnt source.length)
  rules.push new TrapRule source.length
  createMachine rules, 0, [ source.length ]

This works so we can get rid of the createAltRule function.  I made all these changes to the existing gist and I committed a few intermediate steps along the way.  Enjoy.

CoffeeScript Crash Course

CoffeeScript Crash Course | SoCalCodeCamp @ UCSD | 6/24/2012

As I mentioned in recent posts, SoCalCodeCamp was last weekend.  I presented two sessions:

  • Stop Guessing About MEF Composition, Start Testing
  • CoffeeScript Crash Course

I did not record video for the MEF session, but you can find a recording of my MEF session from SoCalCodeCamp @ CSU Fullerton linked from this previous post.  The video for the CoffeeScript session is embedded above.  Below, find links to the presentation materials.

Since I’m not an expert in CoffeeScript or JavaScript, this session was targeted at beginners and was meant to be my fun or challenging session.  It was also an excuse to sneak in a Computer Science topic and talk about Finite State Machines.  During the session, I live coded a FSM that accepts even length strings.  Here’s what it looks like:

UCSD.FSM

I explain a bit about how this works in the video and I was able to work through it during the session.  But there were two bugs that prevented the machine from providing the correct results.  Oh well, people were mostly there to see CoffeeScript, getting the machine to work was secondary.  After getting home, I figured out the bugs and posted the code.  I also wanted to create this supplemental post to show a cleaner version of the working code.  In this article, I’ll also show how to implement the machine using CoffeeScript classes, which I wasn’t able to do in the demo.

The code is available as a gist, and if you are really interested in its evolution, you can step back and forth between the different commits and study the changes.

Let’s get started.

Instant Feedback With Node.js

First install node.js and the coffee-script NPM module. You can find instructions on coffeescript.org.  Watch-compile your source file with this command:

coffee --watch --compile .\coffeescriptcrashcourse.coffee 

A *.js file should appear with the same name as the CoffeeScript file. In our case, coffeescriptcrashcourse.js. Open both files in Sublime Text 2, side by side.

When you make changes in the CoffeeScript file and save, the watch compiler will update the JavaScript as soon as the file hits the disk. To see the changes, just click in the Sublime window displaying the js, and Sublime should instantly refresh (as long as you haven’t made any changes in that window.)

Shrinking the Implementation

Creating rules

My initial implementation of createRule (created live during my SoCalCodeCamp session) looked like this:

createRule = (currentState, inputCharacter, nextState) ->
 state: currentState
 char: inputCharacter
 next: nextState
 Match: (machineState, i) ->   
   @state is machineState and @char is i

This works fine, and its the most obvious way to setup a function that does what we want. Like I told my session attendees, just try what you think will work and it probably will. But if we try, we can take advantage of some additional CoffeeScript language features to make this implementation shorter:

createRule = (CurrentState, InputChar, NextState) ->
  GetNextState: () -> NextState
  Match: (machineState, inputChar) -> 
    machineState is CurrentState and inputChar is InputChar

CoffeeScript doesn’t require us to manually create fields to store the state passed into the function, the variables will be automatically visible to our object. But, they wont be available later on unless we close over them. Hence, while we removed three lines, we added one back in order to make NextState accessible. I also renamed some of the variables, and then updated the machine implementation to use the new GetNextState function instead of directly accessing data.

I test, and everything works.

Creating machines

Likewise, I can tighten up the createMachine implementation with the same technique. Here is the original:

createMachine = (initialState, rules, finalStates) ->
  currentState: initialState
  ruleSet: rules
  acceptance: finalStates 
  ReadAll: (s) ->
    @Read x for x in s.split ''
    return undefined

  Read: (c) ->
    return if @currentState == 2 
    for rule in @ruleSet
      if rule.Match @currentState, c
        return @Update rule
    @currentState = 2

  Update: (r) ->
    @currentState = r.GetNextState()
  AcceptedInput: () ->
    @currentState in @acceptance

And the new:

createMachine = (CurrentState, Rules, FinalStates) ->
  ReadAll: (value) ->
    @Read x for x in value.split ''
    return undefined
  Read: (inputChar) ->
    return if CurrentState == 2 
    for rule in Rules
      return @Update rule if rule.Match CurrentState, inputChar        
    CurrentState = 2   
  Update: (rule) ->
    CurrentState = rule.GetNextState()
  AcceptedInput: () ->
    CurrentState in FinalStates

I also eliminated one level of nesting by turning the loop’s inner conditional statement into a post fixed condition. Since nothing outside the object manipulated the object’s fields, I don’t need to provide any getters for CurrentState, Rules, or FinalStates. Although I can just take advantage of closure to capture my data members, I still need to use the @ symbol when referring to non-closure data (in this case the non-closure fields are all functions, but the same should be true if they were just data values).

I test this and everything still works.

Machine Factory

During the demo I created one machine instance by writing down the rules and then invoking createMachine. Once I had the machine, I used it to evaluate a string and query whether the machine accepted the input. Due to a couple bugs, the machine rejected input that it should have accepted. However, once I got home and debugged it, the machine worked as expected.

Creating a one off machine was fine for the demo, but not very useful when we want to test the machine with different values. This is because a new machine must be created for each input string, since there is no way to reset a machine after it has run. createMachine is (almost) a method for creating any finite state machine. I say “almost” because the “dead” state is hard coded as 2. To make createMachine useful, we must create a rule set and pass it to the machine along with the start state, final state and (ideally) the dead state.

Lets fix the dead state issue first.

createMachine = (CurrentState, Rules, FinalStates, DeadState) ->
  ReadAll: (value) ->
    @Read x for x in value.split ''
    return undefined
  Read: (inputChar) ->
    return if CurrentState == DeadState 
    for rule in Rules
      return @Update rule if rule.Match CurrentState, inputChar        
    CurrentState = DeadState   
  Update: (rule) ->
    CurrentState = rule.GetNextState()
  AcceptedInput: () ->
    CurrentState in FinalStates

By promoting the dead state to a parameter, we should be able to use this code to create just about any (deterministic) Finite State Machine. Of Course, we need to pass 2 when we call createMachine for our current example to continue to work.

myMachine = createMachine 0, rules, [0], 2 

Now, we can wrap up the code that creates the specific FSM from our demo.

createMyMachine = () ->
  rules = [
    createRule 0, '0', 1
    createRule 0, '1', 1
    createRule 1, '0', 0
    createRule 1, '1', 0
  ]
  createMachine 0, rules, [0], 2

With createMyMachine we can test with blocks like this:

myMachine = createMyMachine()
myMachine.ReadAll "00"
alert myMachine.AcceptedInput()

myMachine2 = createMyMachine()
myMachine2.ReadAll "1"
alert myMachine2.AcceptedInput()

But clearly, we have an opportunity to remove duplication, lets do that. While we’re in there we can make the output a little more informative too.

testMyMachine = (input) ->
  myMachine = createMyMachine()
  myMachine.ReadAll input
  alert [input, myMachine.AcceptedInput()].join ','

Now we have a prettier implementation that’s more useful that the starting implementation, but just happens to use the same number of lines.

Get Classy

During the demo I couldn’t show the attendees CoffeeScript’s class support. When I watched the video I realized why. I forgot to put the skinny arrow after my constructor! I should have kept a closer eye on the top right corner of coffeescript.org, there was a message up there which probably would have taken me to the right line.

Since I couldn’t do it then, lets do it now, and convert our two objects into classes. Starting with the machine.

class Machine
  constructor: (@CurrentState, @Rules, @FinalStates, @DeadState) ->
  ReadAll: (value) =>
    @Read x for x in value.split ''
    return undefined
  Read: (inputChar) => 
    return if @CurrentState is @DeadState
    for rule in @Rules
      return @Update rule if rule.Match @CurrentState, inputChar
    @CurrentState = @DeadState
    return undefined
  Update: (rule) =>
    @CurrentState = rule.GetNextState()
    return undefined
  AcceptedInput: () => 
    @CurrentState in @FinalStates

Here is our Machine class. Its not that different than our createMachine implementation. However, we do go back to using @ everywhere and we are using the fat arrow. The fat arrow seems like the right thing to do, according to my reading of the CoffeeScript documentation:

When used in a class definition, methods declared with the fat arrow will be automatically bound to each instance of the class when the instance is constructed.

coffeescript.org

However, I tried it with the skinny arrow and in our use case, it still worked. So, I still can’t claim to understand this binding. Oh well, we also need to update createMyMachine to use the new operator:

createMyMachine = () ->
  rules = [
    createRule 0, '0', 1
    createRule 0, '1', 1
    createRule 1, '0', 0
    createRule 1, '1', 0
  ]
  new Machine 0, rules, [0], 2

Now lets turn createRule into a class.

class Rule 
  constructor: (@CurrentState, @InputChar, @NextState) ->
  GetNextState: () => @NextState  
  Match: (machineState, inputChar) =>
    machineState is @CurrentState and inputChar is @InputChar

Nothing very exiting, just more of what we saw when we converted Machine. We just need to update createMyMachine to get this to work with our test.

createMyMachine = () ->
  rules = [
    new Rule 0, '0', 1
    new Rule 0, '1', 1
    new Rule 1, '0', 0
    new Rule 1, '1', 0
  ]
  new Machine 0, rules, [0], 2

In our scenario, there really isn’t a big advantage to using classes, because we have no need to extend Rule or Machine. But, I wanted to show what this looks like because I’m sure it will be interesting to someone getting started with CoffeeScript.

Someday…

Someday I’ll do one of these machines without iterating over a set of rules  The approach in this example has big O complexity of at least n*n.  If we had long strings and large rule sets, we would spend a lot of time looping in the “Read” method.

Since there was so much interest in this session, maybe I’ll do it again at the next CodeCamp, but there are still plenty of other languages I’d like to play with including .NET languages like Nemerle and F#.  Whatever crash course I do in the future, it will be for beginners and it will be for fun.