Coding

Implementing String.Contains

After writing up the Finite State Machine from SoCalCodeCamp, I had CoffeeScript on the brain and decided to go back and look at my first CoffeeScript experiment to see if there was any room for improvement.  Actually, I knew there was room for improvement.  One of the times when I was stuck in my presentation I flipped over to this machine and I’m pretty sure I blanched visibly and the counted loop in the “read” function.

So lets revisit my first CoffeeScript state machine.  This machine determines if a specified substring exists within a string. In other words, it implements bool String.Contains(String).  Besides all the room for improvement, I think there is an opportunity for some meaningful use of CoffeeScript classes.

First we’ll warm up by adding a shim for alert.

alert = (value) -> console.log value 

Now we can get started with our conversion, we’ll start from the bottom up this time.

Remove Test Duplication

Here is our current test code.

test1 = "Windows is an operating system installed on many machines."
test2 = "Welcome to the operating room, the Doctor is just finishing his martini."

getMachineResult = (b) -> 
  result = ""
  if b == true 
   result = " Accepted. "
  else
   result = " Not Accepted. "
  result

machine = createMachine(rules, 0, finalStates)
machine.readAll test1
alert getMachineResult(machine.acceptedInput()) + test1

machine2 = createMachine(rules, 0, finalStates)
machine2.readAll test2
alert getMachineResult(machine2.acceptedInput()) + test2

Lets remove the duplication and the weirdo getMachineResult method.

testMachine = (input) ->
  machine = createMachine(rules, 0, finalStates)
  machine.readAll input  
  alert [machine.acceptedInput(), input].join '--'

testMachine "Windows is an operating system installed on many machines."
testMachine "Welcome to the operating room, the Doctor is just finishing his martini."

Encapsulate Machine Creation

Our testMachine function currently closes over rules and finalStates from the CoffeeScript “module” scope. We also have these five lines which represent the logic necessary for getting ready to create our machine.

source = "operating system".split '' 
rules = (createRule x, source for x in [0..source.length])
altRules = (createAltRule x, source for x in [0..source.length])
rules.concat altRules
finalStates = [ source.length ]

And the machine is actually created in testMachine

machine = createMachine(rules, 0, finalStates)

Lets encapsulate all this logic into one function, get it out of the module scope, and generalize it so that it can work with any search string.

createSearchMachine = (searchString) ->
  source = searchString.split '' 
  states = [0..source.length]
  rules = (createRule x, source for x in states)
    .concat (createAltRule x, source for x in states)
  createMachine rules, 0, [ source.length ]

testMachine = (input, searchString) ->
  machine = createSearchMachine searchString
  # ...

I did a little experimenting with this one and I’m pretty pleased with the results.  Because the character array indexes actually represent the states the search machine can take, we cant just iterate over the characters.  I think the new version makes it clear that the use of the index range is not a mistake.  I also decided to see if I could chain the call to concat right off of the list comprehension and that worked. Then the line was really long, so i decided to see if significant whitespace would prevent me from continuing the statement on the next line, but it worked just fine. Note that the () around the list comprehension are necessary both times. The parenthesis aren’t part of calling concat, they are part of the list comprehension.

I saved another line by moving finalStates inline, then updated the signature and call site in testMachine.

Creating Machines

Here is our current createMachine implementation. I don’t see any advantage to making it into a class, so I’ll just clean it up.

createMachine = (rules, initialState, finalStates) ->
 {
   currentState: initialState
   transitionRules: rules
   acceptanceStates: finalStates
   read: (c) ->
    for i in [0..this.transitionRules.length - 1]
      if this.transitionRules[i].match this.currentState, c
        return this.currentState = this.transitionRules[i].resultState
   readAll: (s) -> this.read x for x in s.split ''
   acceptedInput: () -> this.currentState in this.acceptanceStates
 }

I got rid of the object properties and captured with closure instead. I also converted the counted loop into a cleaner for..in loop. Finally, I picked some better variable names.

createMachine = (transitionRules, currentState, acceptanceStates) ->
  readAll: (input) -> this.read inputChar for inputChar in input.split ''
  read: (inputChar) ->
    for rule in transitionRules
      if rule.match currentState, inputChar
        return currentState = rule.resultState  
  acceptedInput: -> currentState in acceptanceStates

After testing I see that this still works as expected.

Creating Rules

Our rule creation is really spread out and contains a bit of duplication:

createRule = (index, source) -> {
 initialState: index
 inputChar: source[index]
 resultState: index + 1
 match: (q, c) -> true if (this.initialState == q && this.inputChar == c)
 }

createAltRule = (index, source) -> 
 if index == source.length
  {
    # trap in acceptance state
    initialState: index
    inputChar: null
    resultState: index
    match: (q, c) -> true if (this.initialState == q) 
  }
 else
  {
    # return to start state
    initialState: index
    inputChar: source[index]
    resultState: 0
    match: (q, c) -> true if (this.initialState == q && this.inputChar != c)
  }

Primarily these rules only vary by their method for evaluating match. Our POR (plain old rule) takes an index (which in this case represents a state) and a source string (of which we are only interested in one character) then computes the next state. The match function is true if the passed in state and input character match the stored values. This method can definitely be cleaned up.

createRule = (CurrentState, InputChar, NextState) ->
  getNextState: -> NextState
  match: (machineState, inputChar) ->
    machineState is CurrentState and inputChar is InputChar

With createRule defined this way, we need to make some changes to createSearchMachine so that it passes the needed data, and to createMachine so that it calls getNextState. If the machine is going to query getNextState instead of accessing a property, then the alternate rules will need functions with the same name and semantics.

createAltRule = (index, source) -> 
 if index == source.length
  {
    # trap in acceptance state
    initialState: index
    inputChar: null
    resultState: index
    getNextState: -> @resultState
    match: (q, c) -> true if (this.initialState == q) 
  }
 else
  {
    # return to start state
    initialState: index
    inputChar: source[index]
    resultState: 0
    getNextState: -> @resultState
    match: (q, c) -> true if (this.initialState == q && this.inputChar != c)
  }

So I’m beginning to sense a relationship between the POR and the alternate rules. First, I test the new code and once again find out that it works.  Lets convert createRule into a class, then use the createRule method as a shim to call the constructor.

class Rule
  constructor: (@CurrentState, @InputChar, @NextState) ->
  getNextState: => @NextState
  match: (machineState, inputChar) =>
    machineState is @CurrentState and inputChar is @InputChar

createRule = (CurrentState, InputChar, NextState) ->
  new Rule CurrentState, InputChar, NextState

The purpose of the shim is just to ensure that we don’t need to update all the call sites while we are refactoring the rules. With this new code in place, I test and everything still works. Now we can look at extending Rule to take over parts of createAltRule. Lets look at the top half of createAltRule first.

createAltRule = (index, source) -> 
 if index == source.length
  {
    # trap in acceptance state
    initialState: index
    inputChar: null
    resultState: index
    getNextState: -> @resultState
    match: (q, c) -> true if (this.initialState == q) 
  }

The purpose of this branch is to define a rule that traps the machine in a certain state. In this case, it will be the acceptance state. If we count the number of characters in the search string, we find a number that represents the machine’s acceptance state.  If we manage to reach that acceptance state, we want to stay in it.  Once we have determined that the source string contains the search string, the machine should accept the input, no matter what the rest of the string says.  The trap state implementation just compares the machine state to the acceptance state, and returns the acceptance state when the machine calls getNextState.

Here is how we can extend Rule to support this.

class TrapRule extends Rule
  constructor: (@CurrentState) ->  
    super @CurrentState, null, @CurrentState  
  match: (machineState, inputChar) =>
    machineState is @CurrentState

The TrapRule class extends Rule by replacing the match implementation with the same logic comparison implemented in the old code. I use super to call the Rule constructor with the @CurrentState value as both the current and next state. I pass null as the character to make it explicit that the input character is not used.

Now we can update the createAltRule function.

createAltRule = (index, source) -> 
 if index == source.length
    new TrapRule index
 else
  {
    # return to start state
    initialState: index
    inputChar: source[index]
    resultState: 0
    getNextState: -> @resultState
    match: (q, c) -> true if (this.initialState == q && this.inputChar != c)
  }

The bottom half of createAltRule describes a “return rule”. This rule handles any scenario where the machine reads a character that is not in the search string, and also any scenario where the character is in the search string, but not in the correct position. For example, both of our test strings start with the character W. The return rule would apply because W does not appear in operating system. As the machine reads along it will encounter o at position 5. Finding this character, it will advance from state 0 to state 1. The next character it wants to read is p. But in either test string, p is not the character at position 6. So, the return rule will apply and return the machine to state 0. Without the return rules, the machine could only accept strings that start with the search string, but we want a machine that accept strings which contain the search string. So, the match function looks at the input character passed in by the machine, and matches when the character does not match the stored character, and getNextState always returns 0.

Here is a ReturnRule class that implements the same logic.

class ReturnRule extends Rule
  constructor: (@CurrentState, @InputChar) ->
    super @CurrentState, @InputChar, 0
  match: (machineState, inputChar) =>
    machineState is @CurrentState and inputChar isnt @InputChar

And we can reduce our createAltRule method to this:

createAltRule = (index, source) -> 
 if index is source.length
    new TrapRule index
 else
    new ReturnRule index, source[index]

And this works too. Now that we have cleaned up our rules we can see that our shim is only called in one place, and does very little, so we can inline the Rule constructor and delete createRule.

createSearchMachine = (searchString) ->
  source = searchString.split '' 
  states = [0..source.length]
  rules = (new Rule x, source[x], x + 1 for x in states)
    .concat (createAltRule x, source for x in states)
  createMachine rules, 0, [ source.length ]

Lets try to get rid of the call to createAltRule with better list comprehension.

createSearchMachine = (searchString) ->
  source = searchString.split '' 
  states = [0..source.length]
  rules = (new Rule x, source[x], x + 1 for x in states)
    .concat (new ReturnRule x, source[x] for x in states when x isnt source.length)
  createMachine rules, 0, [ source.length ]

As an intermediate step, I simply filtered out the state that would have led to creating the trap rule and replaced the call to createAltRule with a direct instantiation of ReturnRule. I expect this to fail, but it doesn’t! Turns our our machine has a bug in it, since I designed it “knowing” that every possible state/character combination would have a rule, I did not handle the case in read where no rules matched. Just to be on the safe side, I’ll update read by moving the current state to -1 when no rules match.  Although it’s conventional to use non-negative integer states when simulating state machines, the truth is that you can use any symbol, positive, negative, or Celtic runes, it’s up to you.  Here is the updated Read implementation:

  read: (inputChar) ->
    for rule in transitionRules
      if rule.match currentState, inputChar
        return currentState = rule.getNextState()  
    currentState = -1

Now, the machine fails as expected. We can repair it by creating the TrapRule and pushing it into the rules collection.

createSearchMachine = (searchString) ->
  source = searchString.split '' 
  states = [0..source.length]
  rules = (new Rule x, source[x], x + 1 for x in states)
    .concat (new ReturnRule x, source[x] for x in states when x isnt source.length)
  rules.push new TrapRule source.length
  createMachine rules, 0, [ source.length ]

This works so we can get rid of the createAltRule function.  I made all these changes to the existing gist and I committed a few intermediate steps along the way.  Enjoy.

Advertisements
Coding, Presentations

CoffeeScript Crash Course

CoffeeScript Crash Course | SoCalCodeCamp @ UCSD | 6/24/2012

As I mentioned in recent posts, SoCalCodeCamp was last weekend.  I presented two sessions:

  • Stop Guessing About MEF Composition, Start Testing
  • CoffeeScript Crash Course

I did not record video for the MEF session, but you can find a recording of my MEF session from SoCalCodeCamp @ CSU Fullerton linked from this previous post.  The video for the CoffeeScript session is embedded above.  Below, find links to the presentation materials.

Since I’m not an expert in CoffeeScript or JavaScript, this session was targeted at beginners and was meant to be my fun or challenging session.  It was also an excuse to sneak in a Computer Science topic and talk about Finite State Machines.  During the session, I live coded a FSM that accepts even length strings.  Here’s what it looks like:

UCSD.FSM

I explain a bit about how this works in the video and I was able to work through it during the session.  But there were two bugs that prevented the machine from providing the correct results.  Oh well, people were mostly there to see CoffeeScript, getting the machine to work was secondary.  After getting home, I figured out the bugs and posted the code.  I also wanted to create this supplemental post to show a cleaner version of the working code.  In this article, I’ll also show how to implement the machine using CoffeeScript classes, which I wasn’t able to do in the demo.

The code is available as a gist, and if you are really interested in its evolution, you can step back and forth between the different commits and study the changes.

Let’s get started.

Instant Feedback With Node.js

First install node.js and the coffee-script NPM module. You can find instructions on coffeescript.org.  Watch-compile your source file with this command:

coffee --watch --compile .\coffeescriptcrashcourse.coffee 

A *.js file should appear with the same name as the CoffeeScript file. In our case, coffeescriptcrashcourse.js. Open both files in Sublime Text 2, side by side.

When you make changes in the CoffeeScript file and save, the watch compiler will update the JavaScript as soon as the file hits the disk. To see the changes, just click in the Sublime window displaying the js, and Sublime should instantly refresh (as long as you haven’t made any changes in that window.)

Shrinking the Implementation

Creating rules

My initial implementation of createRule (created live during my SoCalCodeCamp session) looked like this:

createRule = (currentState, inputCharacter, nextState) ->
 state: currentState
 char: inputCharacter
 next: nextState
 Match: (machineState, i) ->   
   @state is machineState and @char is i

This works fine, and its the most obvious way to setup a function that does what we want. Like I told my session attendees, just try what you think will work and it probably will. But if we try, we can take advantage of some additional CoffeeScript language features to make this implementation shorter:

createRule = (CurrentState, InputChar, NextState) ->
  GetNextState: () -> NextState
  Match: (machineState, inputChar) -> 
    machineState is CurrentState and inputChar is InputChar

CoffeeScript doesn’t require us to manually create fields to store the state passed into the function, the variables will be automatically visible to our object. But, they wont be available later on unless we close over them. Hence, while we removed three lines, we added one back in order to make NextState accessible. I also renamed some of the variables, and then updated the machine implementation to use the new GetNextState function instead of directly accessing data.

I test, and everything works.

Creating machines

Likewise, I can tighten up the createMachine implementation with the same technique. Here is the original:

createMachine = (initialState, rules, finalStates) ->
  currentState: initialState
  ruleSet: rules
  acceptance: finalStates 
  ReadAll: (s) ->
    @Read x for x in s.split ''
    return undefined

  Read: (c) ->
    return if @currentState == 2 
    for rule in @ruleSet
      if rule.Match @currentState, c
        return @Update rule
    @currentState = 2

  Update: (r) ->
    @currentState = r.GetNextState()
  AcceptedInput: () ->
    @currentState in @acceptance

And the new:

createMachine = (CurrentState, Rules, FinalStates) ->
  ReadAll: (value) ->
    @Read x for x in value.split ''
    return undefined
  Read: (inputChar) ->
    return if CurrentState == 2 
    for rule in Rules
      return @Update rule if rule.Match CurrentState, inputChar        
    CurrentState = 2   
  Update: (rule) ->
    CurrentState = rule.GetNextState()
  AcceptedInput: () ->
    CurrentState in FinalStates

I also eliminated one level of nesting by turning the loop’s inner conditional statement into a post fixed condition. Since nothing outside the object manipulated the object’s fields, I don’t need to provide any getters for CurrentState, Rules, or FinalStates. Although I can just take advantage of closure to capture my data members, I still need to use the @ symbol when referring to non-closure data (in this case the non-closure fields are all functions, but the same should be true if they were just data values).

I test this and everything still works.

Machine Factory

During the demo I created one machine instance by writing down the rules and then invoking createMachine. Once I had the machine, I used it to evaluate a string and query whether the machine accepted the input. Due to a couple bugs, the machine rejected input that it should have accepted. However, once I got home and debugged it, the machine worked as expected.

Creating a one off machine was fine for the demo, but not very useful when we want to test the machine with different values. This is because a new machine must be created for each input string, since there is no way to reset a machine after it has run. createMachine is (almost) a method for creating any finite state machine. I say “almost” because the “dead” state is hard coded as 2. To make createMachine useful, we must create a rule set and pass it to the machine along with the start state, final state and (ideally) the dead state.

Lets fix the dead state issue first.

createMachine = (CurrentState, Rules, FinalStates, DeadState) ->
  ReadAll: (value) ->
    @Read x for x in value.split ''
    return undefined
  Read: (inputChar) ->
    return if CurrentState == DeadState 
    for rule in Rules
      return @Update rule if rule.Match CurrentState, inputChar        
    CurrentState = DeadState   
  Update: (rule) ->
    CurrentState = rule.GetNextState()
  AcceptedInput: () ->
    CurrentState in FinalStates

By promoting the dead state to a parameter, we should be able to use this code to create just about any (deterministic) Finite State Machine. Of Course, we need to pass 2 when we call createMachine for our current example to continue to work.

myMachine = createMachine 0, rules, [0], 2 

Now, we can wrap up the code that creates the specific FSM from our demo.

createMyMachine = () ->
  rules = [
    createRule 0, '0', 1
    createRule 0, '1', 1
    createRule 1, '0', 0
    createRule 1, '1', 0
  ]
  createMachine 0, rules, [0], 2

With createMyMachine we can test with blocks like this:

myMachine = createMyMachine()
myMachine.ReadAll "00"
alert myMachine.AcceptedInput()

myMachine2 = createMyMachine()
myMachine2.ReadAll "1"
alert myMachine2.AcceptedInput()

But clearly, we have an opportunity to remove duplication, lets do that. While we’re in there we can make the output a little more informative too.

testMyMachine = (input) ->
  myMachine = createMyMachine()
  myMachine.ReadAll input
  alert [input, myMachine.AcceptedInput()].join ','

Now we have a prettier implementation that’s more useful that the starting implementation, but just happens to use the same number of lines.

Get Classy

During the demo I couldn’t show the attendees CoffeeScript’s class support. When I watched the video I realized why. I forgot to put the skinny arrow after my constructor! I should have kept a closer eye on the top right corner of coffeescript.org, there was a message up there which probably would have taken me to the right line.

Since I couldn’t do it then, lets do it now, and convert our two objects into classes. Starting with the machine.

class Machine
  constructor: (@CurrentState, @Rules, @FinalStates, @DeadState) ->
  ReadAll: (value) =>
    @Read x for x in value.split ''
    return undefined
  Read: (inputChar) => 
    return if @CurrentState is @DeadState
    for rule in @Rules
      return @Update rule if rule.Match @CurrentState, inputChar
    @CurrentState = @DeadState
    return undefined
  Update: (rule) =>
    @CurrentState = rule.GetNextState()
    return undefined
  AcceptedInput: () => 
    @CurrentState in @FinalStates

Here is our Machine class. Its not that different than our createMachine implementation. However, we do go back to using @ everywhere and we are using the fat arrow. The fat arrow seems like the right thing to do, according to my reading of the CoffeeScript documentation:

When used in a class definition, methods declared with the fat arrow will be automatically bound to each instance of the class when the instance is constructed.

coffeescript.org

However, I tried it with the skinny arrow and in our use case, it still worked. So, I still can’t claim to understand this binding. Oh well, we also need to update createMyMachine to use the new operator:

createMyMachine = () ->
  rules = [
    createRule 0, '0', 1
    createRule 0, '1', 1
    createRule 1, '0', 0
    createRule 1, '1', 0
  ]
  new Machine 0, rules, [0], 2

Now lets turn createRule into a class.

class Rule 
  constructor: (@CurrentState, @InputChar, @NextState) ->
  GetNextState: () => @NextState  
  Match: (machineState, inputChar) =>
    machineState is @CurrentState and inputChar is @InputChar

Nothing very exiting, just more of what we saw when we converted Machine. We just need to update createMyMachine to get this to work with our test.

createMyMachine = () ->
  rules = [
    new Rule 0, '0', 1
    new Rule 0, '1', 1
    new Rule 1, '0', 0
    new Rule 1, '1', 0
  ]
  new Machine 0, rules, [0], 2

In our scenario, there really isn’t a big advantage to using classes, because we have no need to extend Rule or Machine. But, I wanted to show what this looks like because I’m sure it will be interesting to someone getting started with CoffeeScript.

Someday…

Someday I’ll do one of these machines without iterating over a set of rules  The approach in this example has big O complexity of at least n*n.  If we had long strings and large rule sets, we would spend a lot of time looping in the “Read” method.

Since there was so much interest in this session, maybe I’ll do it again at the next CodeCamp, but there are still plenty of other languages I’d like to play with including .NET languages like Nemerle and F#.  Whatever crash course I do in the future, it will be for beginners and it will be for fun.

Coding

Turing Machine in CoffeeScript

langsIf you’ve ever taken a formal languages class and you’ve been following along with my CoffeeScript experiments, then you probably already guessed that this post was coming.  Well, here it is, a CoffeeScript implementation of the Standard Turing Machine.

This is the last CoffeeScript post that I have on my stack.  Not that I won’t ever write about CoffeeScript again, but I don’t have any immediate ideas about what to write next, while I have plenty of ideas about other subjects.  For me, this post is very different than my last two CoffeeScript posts, because this time I am not translating from C/C++.  We haven’t had a homework assignment to implement a Turing Machine, so I don’t have a reference implementation to work from.  This machine was implemented in CoffeeScript from the start, but lessons learned from implementing the deterministic finite automaton and the pushdown automaton certainly gave me a head start.

Node.js

This implementation is also a little different because I finished the experiment using node.js.  I started out on coffeescript.org, but as the number of machines and test cases grew, I got tired of clicking OK on each alert.  Getting started with node.js and CoffeeScript was really easy.   I simply downloaded the windows installer from http://nodejs.org and ran the msi.  After installation, I started a command interpreter and typed “node” at the prompt.  On one machine, node came right up, but on two others I needed to reboot for node to show up in the path.  Maybe I just rebooted the first machine and forgot, but if you are having trouble, try a reboot.  I followed the instructions on http://coffeescript.org to install the coffee-script module using the node package manager.  After that, I was able to run scripts locally on my machine.

Turing Machine

The graphic at the top of this entry describes two languages and one function, each of which can be implemented using a Turing Machine.  I’m not qualified to assert that only a Turing Machine could accept the languages.  I’m pretty sure that the language of even-length strings could be accepted by a finite state machine, which would mean it could also be accepted by pushdown automata.  Certainly it seems likely that the function could not be computed by the more primitive machines, since they have no way to write the computation result.  The second language violates the pumping lemma for context free languages (actually violates the pumping lemma for regular languages too), so we know from an “academic” perspective that neither finite or pushdown automata could accept it.  But I spent a few minutes trying to figure out why a pushdown would fail to accept this language.

If we tried to accept L={a^nb^nc^n:n>=0} with a pushdown, we could start by counting the ‘a’ characters.  We could count them by pushing some symbol onto the pushdown’s stack, lets use X as that symbol.  So we would push some number of X onto the stack, until we encountered the first “b” character.  We would then want to remove an X from the stack, to account for the fact that the language requires the same number of as and bs.  But this wont work.  We still need to know how many as or bs there were so we can use that knowledge to verify the correct number of “c” characters.  So we should put that X back on the stack.  But if we put the X on the stack and keep reading, then we have no way of accounting for the fact that we just read a “b”.  So we are stuck.  This thought experiment might not be academically exhaustive, but it gives me enough intuitive comfort to agree with the textbook that this language cannot be accepted by a PDA.

blockquoteThe Turing Machine moves past this limitation by getting rid of the stack and replacing it with a “tape”.  You can think of a tape as a combination of a list, a stack, and a queue.  CoffeeScript arrays are well suited for emulating Turing’s tape.  You can iterate over the array or access elements at random, which allows the machine to move back and forth over it’s tape.  You have the unshift function, which lets the tape extend infinitely to the left, and the push function, which allows the machine to extend infinitely to the right.  Of course, we are bounded by the real word constraint of needing to store the array’s index, which limits us to 32-bit array sizes if I recall correctly, but for the purposes of this experiment, that boundary is irrelevant.

The tape is presented to the machine with input data already written on it and bounded on the left and right by “blank” symbols.  But, the machine is allowed to update the tape if it wants to, which sets it apart from state machines and pushdowns.  Also, the machine is allowed to move back and forth over the tape, whereas simpler machines are only allowed to read it from beginning to end.  These two differences allow the Turing Machine to keep track of all the data needed to recognize the language of equal numbers of as, bs and cs.  After working through a couple examples, I’ll finish by showing my solution for recognizing this language.  Be forewarned, this post is really long.

Implementation

Here is the gist.

As I mentioned, this implementation started in the CoffeeScript REPL, and ended on node.js.  The REPL runs as a browser and you can use the alert method to show results in message boxes, while node.js runs with a console attached and has no alert method.  The first little problem after migrating was getting output from the alerts into the console.  So we start with a little shim:

alert = (s) -> console.log s

To run the code in the REPL, just remove or comment this line.  With that out of the way, we start by defining the result object for the Turing Machine’s transition function:

createResult = (state, symbol, direction) ->
 nextState: state
 writeSymbol: symbol
 moveTo: direction
 toString: () -> [this.nextState, this.writeSymbol, this.moveTo].join ", "

When a the machine finds a rule that matches, it will need to update its state, write something to the tape and move left or right.  This object delivers that data to the machine, and includes an override of the toString method to facilitate debugging.

createRule = (state, symbol, result) ->
 inState: state
 readSymbol: symbol
 transitionResult: result
 match: (s, y) -> true if (this.inState == s && this.readSymbol == y)
 toString: () -> [this.inState, this.readSymbol, this.transitionResult].join ", "

The Turing Machine rules must match the machine’s current state and the symbol read off the tape.  The match function checks the member data against the data passed in by the machine and returns true when they are the same.  Again, toString is overridden.

createTape = (s, blank) ->
 t = [ blank ].concat s.split ''
 t.push blank
 pos = 0
 pos++ while t[pos] == blank
 pos-- while t[pos] == undefined
 return {
  currentPosition: pos
  buffer: t
  currentSym: () -> this.buffer[this.currentPosition]
  write: (s) -> this.buffer[this.currentPosition] = s
  moveRight: () ->
   p = this.currentPosition
   p++
   this.buffer.push blank if p == this.buffer.length
   this.currentPosition = p
   return null
  moveLeft: () ->
   p = this.currentPosition
   p--
   if p < 0
    this.buffer.unshift blank
    p = 0
   this.currentPosition = p
   return null
  move: (d) ->
   this.moveRight() if d == 'R'
   this.moveLeft() if d == 'L'
   return null
  toString: () -> (this.buffer.join ", ") + " pos: " + this.currentPosition
  }

Next, I’ve defined the behavior of the tape.  Although CoffeeScript arrays are well suited for use as Turing tapes, they need a little help in order to make the behavior more transparent to the machine.  We construct a tape object by providing two pieces of data, the first is a string loaded with the input that we want to pass to the machine.  The second piece of data is the symbol used to indicate to the machine that a cell is empty.  This can be whatever you want it to be, as long as it doesn’t interfere with the symbols used in the machine’s computations.  I found it very convenient to use a ‘B’ rather than null or undefined or some other option.  In C/C++ I would not hesitate to use (char)0 since this is still just a regular character.

Anyway, when we provide this data to the constructor function the function allocates a new array and puts the blank symbol in position 0.  Then, it splits the string into an array of characters and adds that to the buffer.  Finally, it adds another blank to the end of the buffer.  The conventional way to reason about Turing Machines starts the machine with the tape advanced to the leftmost non-blank character.  To support this convention, the tape constructor scans right until it finds a non-blank, or hits an undefined cell (which happens if the empty string “” is passed to the method).  If the constructor does hit the undefined area, it scans left again until it finds the initialized portion of the buffer.  Now, it is ready to construct the object to return by assigning the index and the buffer to a new object.  The methods on the tape object support the methods defined for Turing’s tape, currentSym reads the symbol at the index, write will update the buffer at the index.  The moveRight method will increase the index, and if the new index exceeds the length of the buffer, a new blank is added to the right side of the tape.  Likewise, the moveLeft method will decrease the index, and if the index becomes negative, the buffer is padded on the left side with a blank, and the index is moved back to 0 so that it points at the newly added blank.  The move method decides which of these two methods to use based on the value passed in by the machine, and finally the toString method will return the current contents of the tape along with the position of the read/write head.

Finally we come to the machine:

createMachine = (rules, initialState, blank, finalStates) ->
 transitionRules: rules
 currentState: initialState
 blankSymbol: blank
 acceptanceStates: finalStates
 tape: createTape '', blank
 isHalted: false
 toString: () -> 
  r = this.transitionRules.join "|| "
  f = this.acceptanceStates.join ", "
  [ r, this.currentState, this.blankSymbol, f, this.tape].join ":: "
 acceptedInput: () -> this.currentState in this.acceptanceStates
 loadTape: (s) -> this.tape = createTape s, this.blankSymbol
 readTape: () -> this.tape
 update: (r) ->
  this.currentState = r.nextState
  this.tape.write r.writeSymbol
  this.tape.move r.moveTo
  return null
 read: () ->
  q = this.currentState
  s = this.tape.currentSym()
  for rule in this.transitionRules
   if rule.match q, s
    return this.update rule.transitionResult
  return this.isHalted = true
 run: (s) ->
  this.loadTape s
  this.read() until this.isHalted
  return null

Like most of my machines, it takes a set of rules, the initial state and the set of acceptance states.  The Turing Machine also takes the symbol to use for blanks.  Like I mentioned above, a ‘B’ worked out just fine for me.  The tape is initialized with the empty string, and we also have a flag to indicate that the machine has halted.  Previously we simply read all the characters from the input and halted at the end, even if we entered a dead configuration before reading all the characters.  Since the Turing Machine can move back and forth, reaching the end of the input is not automatically a good place to stop processing, so this flag allows the machine to signal its done.  Like the other machines we have an acceptedInput method which allows the machine to indicate if it stopped in an acceptance state.  New to the Turing Machine is the loadTape method, which takes a string initializes a new tape instance, and a readTape method which allows us to check the contents of the tape.  We need this for machines that compute functions, because we will need to know more than whether the machine accepted the input, we want to look at the tape to see the results of the computation.

The update method applies the changes to the machine and the tape supplied by a rule result when the machine finds a matching rule.  On several of these methods I went ahead and returned null explicitly because I found it kind of annoying that CoffeeScript generated returns for methods where I don’t intend the results to convey useful information.  I could have reordered the instructions in the update method so that the assignment to current state occurred last, as I did with the pushdown automaton.  However, in methods like the run method, CoffeeScript allocates an array and generates a collection as the loop implied by the “until” expression is processed.  But it only does this because it thinks you want to automatically return the results from the last expression in the method.   By returning null, that loop is no longer the last statement, and CoffeeScript doesn’t bother generating the array.  Does this really matter?  I’m no expert, I just thought the extra array was kind of annoying.

I’ll wrap up the description of the machine functions by saying that the read method reads the current tape symbol and tries to find a matching rule.  If it can’t find a match, it flips the isHalted flag.  The run method loads the tape with the user input, then calls the read method until the halt flag is set.

I mentioned that we didn’t have a homework assignment to implement a Turing Machine, which is not to say that we did not have an assignment to describe a few machines.  It was my intention from the get-go to use this machine to work out the answers to my homework, so I create a method that would facilitate running different machine configurations with different test strings.

runMachine = (f, i) ->
 m = f()
 m.run i
 alert "input: " + i + " accepted: " + m.acceptedInput() + " tape: " + m.readTape()

This method expects a factory method that produces a fully configured machine, runs it, then alert/prints the results.

Now we are ready to build some machines.

Example

The first machine was a simple machine from our class lecture notes.  This factory method produces it by creating the necessary rule set, then returning the results from createMachine with appropriate parameters.

# Example from lecture
exampleMachine = () -> 
 exampleRuleSet = [ 
  createRule 0, 'a', createResult 0, 'b', 'R'
  createRule 0, 'b', createResult 0, 'b', 'R'
  createRule 0, 'B', createResult 1, 'B', 'L'
 ]
 createMachine exampleRuleSet, 0, 'B', [ 1 ]

This machine reads a tape full of as and bs, and converts all the as into bs.  It accepts the input if it manages to reach the rightmost blank.  We can test it with these lines.

runMachine exampleMachine, "aaa"
runMachine exampleMachine, "aba"
runMachine exampleMachine, "bbb"
runMachine exampleMachine, "caa"

With the three simple rules, the machine is able to convert the first three strings to all bs, but can’t do anything with the fourth since it has no rules for dealing with cs.

L = { w : |w| is even }

Our next machine accepts even length strings composed of as and bs.

m1 = () ->
 ruleSet = [
   createRule 0, 'B', createResult 5, 'B', 'L' # accept the empty string
   createRule 0, 'a', createResult 1, 'X', 'R' # replace the first a or b with an X
   createRule 0, 'b', createResult 1, 'X', 'R'
   createRule 0, 'Y', createResult 4, 'Y', 'R' # possibly located the tail
   createRule 1, 'a', createResult 1, 'a', 'R' # scan right until {B, Y}
   createRule 1, 'b', createResult 1, 'b', 'R'
   createRule 1, 'B', createResult 2, 'B', 'L'
   createRule 1, 'Y', createResult 2, 'Y', 'L' 
   createRule 2, 'a', createResult 3, 'Y', 'L' # replace first a or b with Y
   createRule 2, 'b', createResult 3, 'Y', 'L'   
   createRule 3, 'a', createResult 3, 'a', 'L' # scan left until X
   createRule 3, 'b', createResult 3, 'b', 'L'
   createRule 3, 'X', createResult 0, 'X', 'R'   
   createRule 4, 'Y', createResult 4, 'Y', 'R' # tail should be all Y
   createRule 4, 'B', createResult 5, 'B', 'L' # accept
 ]
 createMachine ruleSet, 0, 'B', [ 5 ]

We replace valid characters on the left side with X symbols.  Every time we write an X, we move to the right side and replace the rightmost a or b with Y, then we repeat.  If the string is even length, the X and Y string should meet in the middle.  The machine states will let us know if we have found the middle.  Lets look at a failing case:

runMachine m1, "a"

When the machine starts, the tape is positioned at this “a”, and the machine is in state 0.  The second rule in the list will match.  The machine will move to state 1, it will overwrite the “a” with an “X”, and it will move the tape to the right.  Now it is in state 1 and looking at a blank.  The matching rule will move the machine to state 2 and move the tape to the left.  Now the tape is positioned over the “X” again, and it is in state 2.  However, there is no rule that explains what to do with an X in state 2.  If the string had an even length, then there would be an a or b under the head right now, and the machine would replace that symbol with a Y.  But it found an X before finding an a or b to replace, therefore the string length must be odd.  The machine halts in state 2.  State 2 is not in the set of acceptance states, so the string is rejected.

Now consider an passing case:

runMachine m1, "ab"

We start off by replacing the “a” with an X.  Then we scan past the “b” until we hit the right side blank.  We bounce off the blank and end up looking at the “b” again, in state 2 as we were in the last example.  This time, we can still act because we have a rule that matches b in state 2.  We write a Y, go left, and change to state 3.  In state 3, we find an X, switch to sate 0 and move right again.  We are back looking at the Y we just wrote.  This time we are fine though, because in state 0, if we find a Y, it could mean we are done, we just need to check that we have all Ys to the right of our current position.  If we scan past all the Ys and reach the right blank again, we go to state 5, which is the acceptance state.  There are no transitions out of state 5, so the machine will halt.  The gist has a few more example strings that should be enough to exercise the entire rule set.

f(x) = 3x

This machine will multiply the input by 3.  The input will be expressed as strings of “1″ characters.   So 1 = “1”, 2 = “11” and so on.  This notation is called unary notation, its not particularly fun for humans, or as efficient as binary notation, but as we will see, it makes life easy on the machine.  We’re optimizing for compact rule sets when we can, and introducing more numerals just makes the number of states and rules grow.

m2 = () ->
 ruleSet = [
  createRule 0, 'B', createResult 1, 'B', 'L' # accept empty string
  createRule 0, '1', createResult 2, 'X', 'R' # replace all 1 with X
  createRule 2, '1', createResult 2, 'X', 'R'
  createRule 2, 'B', createResult 3, 'B', 'L' # Found the right side B, now return to leftside B
  createRule 3, 'X', createResult 3, 'X', 'L'
  createRule 3, '1', createResult 3, '1', 'L'
  createRule 3, 'B', createResult 4, 'B', 'R' # Found the left side B, now look for an X
  createRule 4, '1', createResult 9, '1', 'R' # Found 1 before finding X, check all '1's
  createRule 4, 'X', createResult 5, 'B', 'R' # Erase an X and look for the rightside B
  createRule 5, 'X', createResult 5, 'X', 'R' 
  createRule 5, '1', createResult 5, '1', 'R'
  createRule 5, 'B', createResult 6, '1', 'R' # Write 3 ones
  createRule 6, 'B', createResult 7, '1', 'R'
  createRule 7, 'B', createResult 8, '1', 'R'
  createRule 8, 'B', createResult 3, 'B', 'L' # Repeat search for additional X
  createRule 9, '1', createResult 9, '1', 'R' # Checking string for all '1's 
  createRule 9, 'B', createResult 1, 'B', 'L' # accept 
 ]

createMachine ruleSet, 0, 'B', [ 1 ]

The first thing we do in this algorithm is accept the empty string.  Why?  Because 0 times 3 is 0.  The tape starts with no “1”s written on it, and ends with no “1”s written on it, that’s the expected result.  But its more interesting if there is something written on the tape, so lets look at this example:

runMachine m2, "111" # 3 x 3 = 9

We replace all the 1s with X until we reach the right boundary.  This is how the machine counts.  Once it is done counting, it returns to the left boundary, then backs up a step so it is looking at the leftmost X.  During these operations it has transitioned from state 0 (replace all 1s) to state 2 (find the left most X).  Now that it found the left most X, it erases it by writing a blank, and moves right.  This time around it finds another X, and this means it is not done yet, so it keeps scanning to the right, past any more Xs or 1s, until it finds the right boundary.  When it finds the right side blank, it writes down three ones.  The tape expands to the right to allow the machine to do what it wants, always providing a new right side boundary whenever the machine decides to overwrite the current boundary.  When it is done writing the 1s, it returns to state 3, which causes the machine to scan left until it finds a blank (which is at the site of the first X we erased.)  The machine bounces off the blank and ends up looking at another X, so it erases this and moves to the right end again so that it can write down three more 1s.  It repeats this until it finds that there are no more Xs after scanning to the left boundary.  When it finds a 1 next to the left boundary, it will scan right again, verifying that the tape is filled entirely with 1s, if it scans all 1s on the way to the right boundary, it accepts the string.  We can then look at the tape and see that 9 1s are written on it.

L = { anbncn | n >= 0 }

So we are back to the textbook example of a language that a pushdown automaton cannot accept.  The Turing Machine can accept it, given the right rule set.  My solution involves the double application of the even-length string algorithm, with the erasure semantics of the multiplication algorithm.  It may not be necessary to erase as we go, but it seemed like an obvious way to compute the answer.

m3 = () ->
 ruleSet = [
  createRule 0, 'B', createResult 1, 'B', 'L' # accept empty string
  createRule 0, 'a', createResult 2, 'X', 'R' # replace all a with X
  createRule 2, 'a', createResult 2, 'X', 'R'
  createRule 2, 'b', createResult 3, 'b', 'L' # until we find a 'b'
  createRule 3, 'X', createResult 3, 'X', 'L' # scan left until Blank
  createRule 3, 'Y', createResult 3, 'Y', 'L'
  createRule 3, 'b', createResult 3, 'b', 'L'
  createRule 3, 'c', createResult 3, 'c', 'L'
  createRule 3, 'B', createResult 4, 'B', 'R'
  createRule 4, 'X', createResult 5, 'B', 'R' # Find and erase an X
  createRule 4, 'Y', createResult 7, 'B', 'R' # Find and erase a Y... done with X
  createRule 4, 'Z', createResult 9, 'Z', 'R' # Found Z, check that string is all Z
  createRule 5, 'X', createResult 5, 'X', 'R' # scan right until { c, Y }
  createRule 5, 'b', createResult 5, 'b', 'R'
  createRule 5, 'c', createResult 6, 'c', 'L'
  createRule 5, 'Y', createResult 6, 'Y', 'L'
  createRule 6, 'b', createResult 3, 'Y', 'L' # scan left to b, write Y, repeat X search
  createRule 7, 'Y', createResult 7, 'Y', 'R' # Scan right past {Y, c} until { B, Z }
  createRule 7, 'c', createResult 7, 'c', 'R'
  createRule 7, 'Z', createResult 8, 'Z', 'L'
  createRule 7, 'B', createResult 8, 'B', 'L'
  createRule 8, 'c', createResult 3, 'Z', 'L' # Scan left to c, Write Z, repeat Y search
  createRule 9, 'Z', createResult 9, 'Z', 'R' # scan right past Z until B
  createRule 9, 'B', createResult 1, 'B', 'L' # accept
 ]
 createMachine ruleSet, 0, 'B', [ 1 ]

We start by treating the c characters as a left side boundary, effectively ignoring them.  We don’t care how many cs there are unless we can first prove that there are an equal number of as and bs.  We can prove there are an equal number of as and bs by converting our “a” characters to a “stack” of Xs on the tape. Every time we pop (erase) an X, we should be able to find a b and replace it with a Y.  If we are unable to empty the stack of Xs, then we didn’t have enough bs.  If we are unable to replace all the bs with Ys, then we had too many bs.

If we are able to empty the stack of Xs, and we have no more bs, then we know that |a| == |b|, and we have a stack of Ys on the tape in the same quantity.  So, |a| == |b| == |Y|.  We will also have zero or more cs to the right of the Ys.  We can now use the same technique to start popping Ys, and converting cs to Zs.   If we cant pop all the Ys, that’s a fail.   If we run out of Ys before converting the last c, that’s a fail too.  If we pop the last Y right before converting the last remaining c, then we have a balanced string.  We scan past the Zs one last time, and accept the string once we confirm there are no stray cs.

runMachine m3, ""          # valid n = 0
runMachine m3, "a"         # invalid
runMachine m3, "ab"        # invalid
runMachine m3, "abc"       # valid n = 1

There’s a few examples, there are more in the gist.  I hope you enjoyed reading about my little CoffeeScript machines.  I had quite a bit of fun working on them, especially on the Turing Machine.  For some reason, the Turing Machine gave me the same sense of wonder that I felt when I first started programming.  I think that in working from Turing’s blueprints I felt connected in some corny way to an individual who played a critical role in making what I do every day possible.

I hope you’ll play with the machines and if you see any room for improvement, let me know.

Coding

Pushdown Automaton in CoffeeScript

grammar

I decided to continue my CoffeeScript education by translating another homework assignment from C/C++ to CoffeeScript.  For my first CoffeeScript experiment, I simulated a deterministic finite automaton.  This time the simulated machine will be a pushdown automaton that accepts the language described by the context free grammar pictured to the left.

If you are unfamiliar with reading context free grammars, don’t worry, this one is pretty easy to understand.  The language generated by this grammar is similar to selection statements in C/C++.  In other words, you can use this grammar to correctly generate pairs of “if-else” statements.  Its simplified so we don’t have to worry about the predicate that follows the “if” statements or the body of code to execute when the predicate is true or false.  The only thing we are concerned about is whether every “else” follows an “if” somewhere earlier in the input.  Here’s an example of a correct sequence:

if
  if
  else
else

But since we don’t want to worry about whitespace or newlines either, we’ll feed the input to the machine as a flattened string: “ififelseelse”.

Here’s an example of an incorrect sequence:

if
else
else

It’s easy to see that there are too many “else” tokens.  To tell the difference between the valid sequence and an invalid one, the machine will have to count the number of “if” tokens and make sure there is one for every “else”.  They don’t have to balanced though.  This is also a valid sequence:

if
  if
    if
    else

Finite automata cannot solve this problem because they have no way to keep track of how many “if” tokens were encountered.  But pushdown automata have a stack which we can use to track the “if”s.

For the sticklers out there, I’ll note that the grammar is ambiguous because there is no way to distinguish between the above sequence and one like this:

if
  if
    if
else

In other words, the grammar doesn’t capture the nesting.  Both sequences flatten to “ifififelse”.  This would make this grammar somewhat difficult to use if you actually wanted to express some logic with it.  However, it has no effect on our ability to create pushdown automata that accept “ifififelse” while rejecting “ifififelseelseelseelse”.

Implementation

You can download the source from this gist.

The transition function for deterministic finite automata only need to supply the machine with the next state to move to.  In the previous example, I built the result state right into the rule object.  For pushdown automata, the transition function must supply both a new state, and a set of symbols to add to the machine’s stack.  So, we start with a function that returns an object to hold this information for us:

createResult = (state, symbols) -> {
  resultState: state
  pushSymbols: symbols.reverse()
}

When specifying the stack symbols to push, the textbook way to do so is to write the symbols down in the opposite order they will be pushed.  In other words, in the array [ A, B, C ], “C” will be pushed, followed by “B” and finally “A”.  This function reverses the given symbols so that this convention can be kept when specifying result objects.

Next we have a function to generate rule objects:

createRule = (state, input, symbol, result) -> {
  currentState: state
  inputChar: input
  stackSymbol: symbol
  ruleResult: result
  match: (s, i, y) -> 
    true if (this.currentState == s &&
             this.inputChar == i &&
             this.stackSymbol == y)
}

The rule needs to know the machine’s current state, the character read from the input, and the symbol read from the top of the machine’s stack.  The match function compares its own values with the three values supplied by the machine and returns true when they match.  This time there is only one function to create rules, and no algorithm coded into the example for generating the rule set.  For the DFA, the set of rules was larger, and the logic for automating their generation was fairly obvious, so generating the whole set saved a lot of typing.  But the set of rules for the PDA is smaller, and the logic was not immediately obvious to me, so I didn’t bother.

Next we have a function that returns machine objects.

createMachine = (rules, startState, initialSymbol, finalStates) -> {
  transitionRules: rules
  currentState: startState
  symbolStack: [ initialSymbol ]
  acceptanceStates: finalStates
  update: (r) ->
    this.symbolStack.push x for x in r.pushSymbols
    this.currentState = r.resultState
  read: (c) ->    
    return -1 if this.currentState == -1
    y = this.symbolStack.pop()
    for rule in this.transitionRules
      if rule.match this.currentState, c, y
        return this.update rule.ruleResult
    # no rules matched, enter dead configuration
    this.symbolStack.push y
    this.currentState = -1
  readAll: (s) -> this.read x for x in s.split ''  
  acceptedInput: () -> this.currentState in this.acceptanceStates
}

The machine takes a set of rules, an initial state, the first symbol to push onto the stack and a set of final states.  Like the DFA, it has an acceptedInput method which indicates whether current state of the machine is a member of the acceptanceStates collection.  The readAll convenience function is the same as the it was in the simpler machine.  For the read function, we have the same basic logic, but with a couple of twists.

First, the machine does not require that a rule exists to match every possible input.  If it can’t match the input to a rule, it enters a “dead configuration” at the bottom of the read method.  A guard clause at the beginning of the method checks for the dead configuration and bails out if the machine is already dead.  Next, we remove the top symbol from the stack.  By always removing the symbol, it makes erasure very simple later on, just do nothing and the symbol is erased.  After retrieving the stack symbol, we have the info we need to match the transition rule.  In the DFA example, I used a counted loop to iterate over the collection, but this time I tried a for … in loop.  Not sure what the CoffeeScript lingo for that is, and I’d be interested to see if this loop could be rewritten as a list comprehension.  I think it’s a step up from the counted loop anyway.

I experimented with the CoffeeScript REPL and confirmed that JavaScript does indeed short-circuit out of the loop when it hits the return statement.  It’s a good thing too, since this time I relied on that behavior.  When no rule matches, the short circuit is not hit, the machine restores the stack and enters the dead configuration.  When a rule does match, the machine calls its own update method with the rule result as the argument.  The update method pushes the result symbols onto the stack, then updates the machine state.  While writing this method on coffeescript.org and watching the JavaScript as it generated, I noticed that every method is a function by default.  The result of the last statement is used as the return value if there is no explicit return statement.  I suppose I could simulate a void subroutine by returning null explicitly, but for this example I thought it would be neat to allow the updated state to propagate up the stack and into a collection returned from the readAll method.

Next we have a method that builds and runs the machine for a given test string.

runMachine = (s) ->
  selectStmtRules = [
   createRule 0, 'i', 'S', createResult 3, [ 'A', 'S' ]
   createRule 3, 'f', 'A', createResult 1, [ 'B' ]
   createRule 1, 'i', 'S', createResult 3, [ 'A', 'S' ]
   createRule 1, 'i', 'B', createResult 3, [ 'A', 'B' ]
   createRule 1, 'e', 'B', createResult 2, [ 'C', 'D', 'E' ]
   createRule 2, 'l', 'C', createResult 2, [ ]
   createRule 2, 's', 'D', createResult 2, [ ]
   createRule 2, 'e', 'E', createResult 1, [ ]
  ]

  machine = createMachine selectStmtRules, 0, 'S', [ 1 ]
  history = machine.readAll s
  alert "Accepted: " + machine.acceptedInput() + " States: " + history.join ", "

Here, we build the machine rules necessary to recognize our grammar, then pass the rules to the createMachine function to build the machine.  The grammar, as specified, is easy to translate into transition rules.  For example, the rules S –> iA|iAB and A->fS|f taken together mean that when we start reading we must read an “i” followed by an “f’.  Once we have read the “f” we can quit, or we can read another “if’, or process a “B”.  The B is the entry point into the rules that describe the “else” token.

Translating this into a PDA means that we should start the machine with an S on the stack.  When we start the machine, “S” is the only production rule in the grammar we can use.  The machine starts in state 0, so our first rule says that if we are in state 0, and we read an ‘S’ off the stack, then we can read an “i” from the input and transition.  Since there is only one rule for state 0, it follows that we must read an “i” or the machine will die.  If we do read an “i” then the machine goes to state 3 and pushes an ‘S’ followed by an ‘A’ onto the stack.  Now that the “A” is on the stack, we can only use production rules from the grammar that have “A” on the left side.  Both of these rules start with the letter “f”, so we only need to have one rule in order to require that the next expected character is read.  So, if we are in state 3 and we have an ‘A’ on the stack and read an “f” we transition to state 1, and push a “B” on the stack.  And so on.

The machine accepts the input whenever it reaches state 1.  After it reads the first “i”, “f”, it can stop reading and accept the input.  Once there is a “B” on the stack, the machine can read another “if”, or start reading an “else”.  It reads the else by replacing the “B” with the symbols that represent the grammar rules for the “lse” part of the “else” statement, then erasing them as each required character is read.  When the machine is done reading “else” the “B” is gone—the “if” it represented has been matched with its partner and can’t be used again.  Whenever  the number of “if” and “else” tokens are balanced, “S” will be back on the top of the stack, and so the machine is allowed to generate more “B”s from state 1.

After building the machine, the runMachine function feeds the test string into the readAll method, then lets us know whether the machine accepted the input.  It also shows us the history of the machine’s state changes.

Finally I run some test strings through the machine like this:

runMachine "ifelse"
runMachine "ifelseelse"
runMachine "ififelseelse"
runMachine "ifi"

In these examples, the first and third are correct and others aren’t.  As expected, the machine correctly identifies the acceptable strings.  This time around my implementation was a little cleaner and came in at 55 lines.  Like the DFA this didn’t take long and was easier than the C/C++ implementation.  I think its safe to say that it took me longer to write up this article than it took to write the code.

Coding

Finite State Machine in CoffeeScript

Update: I revised this machine on July 5, 2012.  Visit: Implementing String.Contains for a nicer implementation of this machine, or read on for the full story.

I’m not really a JavaScript guy.  I tend to lean heavily on Google, StackOverflow and trial and error when I need to do some work in the language.  The arrival of CoffeeScript peaked my interest, but since I spend so little time in the JavaScript domain, I had no immediate reason to jump in and give it a try.  Yesterday I had an idea that got me excited giving about CoffeeScript a try.  I decided to take a homework assignment I completed a few weeks ago for the Theory of Computing course I’m taking this semester, and implement it in CoffeeScript.

cs

I spend most of my time in C#, but for this assignment I needed to use C/C++ or Java.  So, initially I implemented the solution in C#, figuring it would be “easy” to translate to C/C++.  I haven’t really done any C/C++ since .NET came out, so it wasn’t as easy as I had hoped it would be.  I spent all of one Sunday flipping back and forth between MonoDevelop and http://www.cplusplus.com/, but it eventually worked out.

Translating to CoffeeScript from the C/C++ implementation went much more smoothly.  The REPL editor and reference material on http://coffeescript.org/ is a great resource.  Here are the results (you can also view/download the whole thing from this gist.

The machine is very simple, it accepts any input that contains the string “operating system”.  Informally, I mentally partitioned the transition function into three sets.  One set of rules advances the machine toward the acceptance state whenever it reads the next expected character.  The next rule set returns the machine to the initial state whenever an unexpected character is encountered.  The final rule traps the machine in the acceptance state after it has read all the characters in the string “operating system”

The first thing I did was to define some functions that would generate these transition rules for me based on an array of expected input characters.

createRule = (index, source) -> {
 initialState: index
 inputChar: source[index]
 resultState: index + 1
 match: (q, c) -> true if (this.initialState == q && this.inputChar == c)
 }

createAltRule = (index, source) -> 
 if index == source.length
  {
    # trap in acceptance state
    initialState: index
    inputChar: null
    resultState: index
    match: (q, c) -> true if (this.initialState == q) 
  }
 else
  {
    # return to start state
    initialState: index
    inputChar: source[index]
    resultState: 0
    match: (q, c) -> true if (this.initialState == q && this.inputChar != c)
  }

Each rule is implemented as an object with some data and a match function. For rules that advance the machine toward acceptance, the match function returns true when the rule’s state and input character match the values passed in by the machine.  For those that move the machine back to the initial state, the match function checks the state and any character other than the expected character.  Finally, the trap rule does not take the input character into account at all, it just checks that the machine has indicated that it has reached the final state.

createMachine = (rules, initialState, finalStates) ->
 {
   currentState: initialState
   transitionRules: rules
   acceptanceStates: finalStates
   read: (c) ->
    for i in [0..this.transitionRules.length - 1]
      if this.transitionRules[i].match this.currentState, c
        return this.currentState = this.transitionRules[i].resultState
   readAll: (s) -> this.read x for x in s.split ''
   acceptedInput: () -> this.currentState in this.acceptanceStates
 }

Next, we have a function that builds the state machine.  This function accepts a set of rules, the start state, and a set of final states.  However, in this example, the machine only has one final state.  When the machine has finished reading the input, clients can use the acceptedInput function to check that the machine’s current state is in the set of final states.  The readAll method takes a string and is provided for convenience, it simply splits the string into a character array and calls the read method an each character in the input.

The read method controls the machine’s state by iterating through each rule until it finds a rule that matches the machine’s current state and input character.  When it finds the match, it assigns the rule’s result state to its own currentState field.  I believe that calling return will short-circuit the loop after when a match is made, but I’m not familiar enough with JavaScript to say for sure.  In any case, it shouldn’t matter because the machine is deterministic and only one rule can match for any given combination of state and input character.

The list comprehension feature of CoffeeScript will also build a collection from the read method return values and this collection should contain each state the machine chose for each computation.  That might be interesting to look at but, it isn’t really necessary for using the machine.

 getMachineResult = (b) -> 
  if b == true 
   return " Accepted. "
  else
   return " Not Accepted. "

source = "operating system".split '' 
rules = (createRule x, source for x in [0..source.length])
altRules = (createAltRule x, source for x in [0..source.length])
rules.concat altRules
finalStates = [ source.length ]

test1 = "Windows is an operating system installed on many machines."
machine = createMachine(rules, 0, finalStates)
machine.readAll test1
alert getMachineResult(machine.acceptedInput()) + test1

test2 = "Welcome to the operating room, the Doctor is just finishing his martini."
machine2 = createMachine(rules, 0, finalStates)
machine2.readAll test2
alert getMachineResult(machine2.acceptedInput()) + test2

In this part, we create one more little function to translate Boolean results into text, and then we use have we have to create and run a couple machines.  As expected “test1” is accepted by the machine, and “test2” is not.

From scratch, translating this machine to CoffeeScript took about 3 hours and resulted in 60 lines of code.  Before this I had zero experience with CoffeeScript.  All in all, I’d say it was fun and easy.