Coding

Use Closures to Put C# on a Diet

I enjoy playing with state machines. They are simple enough to implement quickly, and complex enough to give the implementation language a little workout. Followers of this blog will know that I’ve enjoyed using Finite State Machines to explore CoffeeScript.

But, when I look at the search terms leading to my blog, I see that people want to see FSMs implemented in C#, C++, or even VB. About the only language I haven’t seen people looking for is CoffeeScript. I’m not too surprised, since most people searching for FSM implementations are probably students trying to cheat get help on their homework. I doubt any professors are accepting CoffeeScript implementations.

When I see these search results, I consider quickly posting a solution in the requested language. I don’t mind if students crib solutions from me. I’m a big believer in learning by example. Certainly if by looking at my FSMs I can save one student from writing a switch statement, then I’ll consider it a public service. On the other hand, I’m not really interested in producing solutions by language on demand. Its not so much that I’m worried about students reading my solutions, as much as it is that implanting FSMs over and over again in different languages isn’t very interesting on its own merits.

While I was refactoring the String.Contains FSM for my previous post, I saved a few lines by taking advantage of closure. My inspiration for this refactoring came from a question one of the attendee’s asked at my CoffeeScript Crash Course presentation at SoCalCodeCamp. If you watch the video you will hear him ask, “Does the @ imply closure, and if so why?” As I said in my presentation and on this blog, I’m no expert at CoffeeScript or JavaScript, so I can’t claim to have instantly grasped the context of his question. However, as I listened to it again, and listened to his comments to another attendee, I began to understand why he asked the question. The result was the new implementation I wrote about in my last post.

C# also supports closure, so I began to wonder what a C# String.Contains implementation would look like if we wanted to use as much closure as possible. I’ve implemented this machine in C# and C++ before, but I’m intrigued enough by the closure idea to try C# again.

Getting Started

This sample should be simple, Visual Studio seems like overkill, so I’m going to code it in Sublime Text 2 and use the C# compiler on the command line.

csc .\StringCompare.cs 

If StringCompare.cs contains this skeleton program, it should compile to a do-nothing program:

public class Program
{
    public static void Main(params string[] args)
    {
    }
}

Rules

In our CoffeeScript implementation we have a class called Rule with two subclasses. Lets see how this looks in C#:

public class Rule
{
    public Rule(int currentState, char inputChar, int nextState)
    {
        this.NextState = nextState;
        this.Match = (machineState, readChar) =>
        {
            return currentState == machineState && readChar == inputChar; 
        };
    }

    public int NextState { get; private set; }
    public Func<int, char, bool> Match { get; private set; }
}

Remember, our goal is to use as much closure as possible, so if we can avoid declaring data properties, we will. All of our data comes in through the constructor, so we can create a lambda expression that closes over the currentState and inputChar and use it as our Match function. We still need to declare the Match function and its signature, but there is no need to create instance properties for the data. For NextState we could define a Func<int>, but that would have no benefit compared to a simple int property, so we’ll use a property.

That’s all fine in theory, but does it work? Let’s write a little test in the Main function.

public class Program
{
    public static void Main(params string[] args)
    {
        var myRule = new Rule(0, '1', 1);
        Console.WriteLine(myRule.Match(0,'1'));
        Console.WriteLine(myRule.Match(1,'1'));
    }
}

As expected, this test prints True followed by False. It’s clear from this test that closure works just as well in constructors as anywhere else. Having satisfied my intuition, I can delete this test code If I want to and move on to implementing the specialized TrapRule and ReturnRule classes.

public class TrapRule : Rule
{
    public TrapRule(int currentState)
        : base(currentState, (char)0, currentState)
    {
        this.Match = (machineState, readChar) =>
        {
            return currentState == machineState;
        };
    }
}

Like the CoffeeScript implementation, I pass currentState to the base implementation twice. However, I can’t pass null for a char value in C#. Instead, I pass in the null character 0. Its also interesting to note that I don’t need to make Match virtual, and I don’t need to override it. All I need to do is make the setter protected so that the subclass can replace the implementation. We can update the test in Main to verify that this subclass works.

public class Program
{
    public static void Main(params string[] args)
    {
        var myRule = new TrapRule(0);
        Console.WriteLine(myRule.Match(0,'1'));
        Console.WriteLine(myRule.Match(1,'1'));
    }
}

As expected, we see True printed, followed by False. Next, we can implement the ReturnRule.

public class ReturnRule : Rule
{
    public ReturnRule(int currentState, char inputChar)
        : base(currentState, inputChar, 0)
    {
        this.Match = (machineState, readChar) =>
        {
            return currentState == machineState && readChar != inputChar;
        };
    }
}

We actually don’t need to pass currentState and inputChar to the base class, because it only needs to know the nextState. However, it doesn’t hurt to pass it either. Once again, the child class replaces the Match implementation with one appropriate for the return rule. As before, we can update our test to verify the ReturnRule behavior.

public static void Main(params string[] args)
{
    var myRule = new ReturnRule(0, '0');
    Console.WriteLine(myRule.Match(0,'0'));
    Console.WriteLine(myRule.Match(0,'1'));
}

As expected, this prints False followed by True. In theory, we’ve implemented Rule classes that should work with a String.Contains implementation. We need to see how this works with a machine implementation.

Machine

In the CoffeeScript implementation, I saw no benefit in converting the createMachine function into a class. In C#, I thought I might be able to do something similar with dynamic. It turns out that this doesn’t work though. The C# compiler produces error CS0828 because you cannot assign a lambda expression to an anonymous type property. I don’t use dynamic very often, so this is news to me. It seems I’m already learning more about C# through this exercise.

So, with this restriction in place, I will need to create a class to implement the machine.

public class Machine 
{
    public Machine(Rule[] transitionRules, int currentState, int[] acceptanceStates)
    {
        this.ReadAll = (input) =>
        {
            foreach(var inputChar in input)
            {
                Read(inputChar);
            }
        };
        this.Read = (inputChar) => 
        {
            foreach(var rule in transitionRules) 
            {
                if (rule.Match(currentState, inputChar))
                {
                    currentState = rule.NextState;
                    return;
                }
            }
            currentState = -1;
        };
        this.AcceptedInput = () => Array.IndexOf(acceptanceStates, currentState) >= 0;
    }

    public Action<string> ReadAll { get; private set; }
    public Action<char> Read { get; private set; }
    public Func<bool> AcceptedInput { get; private set; }
}

This implementation compiles, but does it work? We’ll need some test code to determine for sure. To do that we’ll need to implement createSearchMachine and testMachine. These implementations should be possible using Func<> instances in Main without having to create classes.

Here is createSearchMachine:

Func<string, Machine> createSearchMachine = (searchString) =>
{
    var source = searchString.ToCharArray();
    var states = Enumerable.Range(0, source.Length);
    var rules = states.Select(x => new Rule(x, source[x], x + 1))
        .Concat(states.Where(x => x != source.Length).Select(x => new ReturnRule(x, source[x])))
        .Concat(new []{ new TrapRule(source.Length) });
    return new Machine(rules.ToArray(), 0, new[]{ source.Length });
};

It’s interesting that, while slightly more verbose and noisy, C# supports a nearly identical implementation as CoffeeScript. Here is testMachine:

Action<string, string> testMachine = (input, searchString) =>
{
    var machine = createSearchMachine(searchString);
    machine.ReadAll(input);
    Console.WriteLine("{0}--{1}", machine.AcceptedInput(), input);
};

Now we can finally run the machine and see how it works.

var target = "operating system";
testMachine("Windows is an operating system installed on many machines.", target);
testMachine("Welcome to the operating room, the Doctor is just finishing his martini.", target);

And our output is just what we expect:

C:\...\gist-3063755> .\StringContains.exe 
True--Windows is an operating system installed on many machines. 
False--Welcome to the operating room, the Doctor is just finishing his martini. 

Great, Closure All The Things!

No, not really.  In CoffeeScript/JavaScript closures are the norm and expected.  In C# the question you have to ask yourself is why would you close over everything.  Keep in mind that, although you do not declare classes with data members, they still exist.  The compiler generates them for you, so its not like you’ve “saved” anything by using closure.  Maybe closures make your code harder to reflect into, since the compiler will use unspeakable names to refer to the classes it generates.  This obscurity might appeal to some, but for me it’s simply a learning exercise.  I hope you learned something too.

You can check out the finished code at this gist.

Advertisements
Coding

Implementing String.Contains

After writing up the Finite State Machine from SoCalCodeCamp, I had CoffeeScript on the brain and decided to go back and look at my first CoffeeScript experiment to see if there was any room for improvement.  Actually, I knew there was room for improvement.  One of the times when I was stuck in my presentation I flipped over to this machine and I’m pretty sure I blanched visibly and the counted loop in the “read” function.

So lets revisit my first CoffeeScript state machine.  This machine determines if a specified substring exists within a string. In other words, it implements bool String.Contains(String).  Besides all the room for improvement, I think there is an opportunity for some meaningful use of CoffeeScript classes.

First we’ll warm up by adding a shim for alert.

alert = (value) -> console.log value 

Now we can get started with our conversion, we’ll start from the bottom up this time.

Remove Test Duplication

Here is our current test code.

test1 = "Windows is an operating system installed on many machines."
test2 = "Welcome to the operating room, the Doctor is just finishing his martini."

getMachineResult = (b) -> 
  result = ""
  if b == true 
   result = " Accepted. "
  else
   result = " Not Accepted. "
  result

machine = createMachine(rules, 0, finalStates)
machine.readAll test1
alert getMachineResult(machine.acceptedInput()) + test1

machine2 = createMachine(rules, 0, finalStates)
machine2.readAll test2
alert getMachineResult(machine2.acceptedInput()) + test2

Lets remove the duplication and the weirdo getMachineResult method.

testMachine = (input) ->
  machine = createMachine(rules, 0, finalStates)
  machine.readAll input  
  alert [machine.acceptedInput(), input].join '--'

testMachine "Windows is an operating system installed on many machines."
testMachine "Welcome to the operating room, the Doctor is just finishing his martini."

Encapsulate Machine Creation

Our testMachine function currently closes over rules and finalStates from the CoffeeScript “module” scope. We also have these five lines which represent the logic necessary for getting ready to create our machine.

source = "operating system".split '' 
rules = (createRule x, source for x in [0..source.length])
altRules = (createAltRule x, source for x in [0..source.length])
rules.concat altRules
finalStates = [ source.length ]

And the machine is actually created in testMachine

machine = createMachine(rules, 0, finalStates)

Lets encapsulate all this logic into one function, get it out of the module scope, and generalize it so that it can work with any search string.

createSearchMachine = (searchString) ->
  source = searchString.split '' 
  states = [0..source.length]
  rules = (createRule x, source for x in states)
    .concat (createAltRule x, source for x in states)
  createMachine rules, 0, [ source.length ]

testMachine = (input, searchString) ->
  machine = createSearchMachine searchString
  # ...

I did a little experimenting with this one and I’m pretty pleased with the results.  Because the character array indexes actually represent the states the search machine can take, we cant just iterate over the characters.  I think the new version makes it clear that the use of the index range is not a mistake.  I also decided to see if I could chain the call to concat right off of the list comprehension and that worked. Then the line was really long, so i decided to see if significant whitespace would prevent me from continuing the statement on the next line, but it worked just fine. Note that the () around the list comprehension are necessary both times. The parenthesis aren’t part of calling concat, they are part of the list comprehension.

I saved another line by moving finalStates inline, then updated the signature and call site in testMachine.

Creating Machines

Here is our current createMachine implementation. I don’t see any advantage to making it into a class, so I’ll just clean it up.

createMachine = (rules, initialState, finalStates) ->
 {
   currentState: initialState
   transitionRules: rules
   acceptanceStates: finalStates
   read: (c) ->
    for i in [0..this.transitionRules.length - 1]
      if this.transitionRules[i].match this.currentState, c
        return this.currentState = this.transitionRules[i].resultState
   readAll: (s) -> this.read x for x in s.split ''
   acceptedInput: () -> this.currentState in this.acceptanceStates
 }

I got rid of the object properties and captured with closure instead. I also converted the counted loop into a cleaner for..in loop. Finally, I picked some better variable names.

createMachine = (transitionRules, currentState, acceptanceStates) ->
  readAll: (input) -> this.read inputChar for inputChar in input.split ''
  read: (inputChar) ->
    for rule in transitionRules
      if rule.match currentState, inputChar
        return currentState = rule.resultState  
  acceptedInput: -> currentState in acceptanceStates

After testing I see that this still works as expected.

Creating Rules

Our rule creation is really spread out and contains a bit of duplication:

createRule = (index, source) -> {
 initialState: index
 inputChar: source[index]
 resultState: index + 1
 match: (q, c) -> true if (this.initialState == q && this.inputChar == c)
 }

createAltRule = (index, source) -> 
 if index == source.length
  {
    # trap in acceptance state
    initialState: index
    inputChar: null
    resultState: index
    match: (q, c) -> true if (this.initialState == q) 
  }
 else
  {
    # return to start state
    initialState: index
    inputChar: source[index]
    resultState: 0
    match: (q, c) -> true if (this.initialState == q && this.inputChar != c)
  }

Primarily these rules only vary by their method for evaluating match. Our POR (plain old rule) takes an index (which in this case represents a state) and a source string (of which we are only interested in one character) then computes the next state. The match function is true if the passed in state and input character match the stored values. This method can definitely be cleaned up.

createRule = (CurrentState, InputChar, NextState) ->
  getNextState: -> NextState
  match: (machineState, inputChar) ->
    machineState is CurrentState and inputChar is InputChar

With createRule defined this way, we need to make some changes to createSearchMachine so that it passes the needed data, and to createMachine so that it calls getNextState. If the machine is going to query getNextState instead of accessing a property, then the alternate rules will need functions with the same name and semantics.

createAltRule = (index, source) -> 
 if index == source.length
  {
    # trap in acceptance state
    initialState: index
    inputChar: null
    resultState: index
    getNextState: -> @resultState
    match: (q, c) -> true if (this.initialState == q) 
  }
 else
  {
    # return to start state
    initialState: index
    inputChar: source[index]
    resultState: 0
    getNextState: -> @resultState
    match: (q, c) -> true if (this.initialState == q && this.inputChar != c)
  }

So I’m beginning to sense a relationship between the POR and the alternate rules. First, I test the new code and once again find out that it works.  Lets convert createRule into a class, then use the createRule method as a shim to call the constructor.

class Rule
  constructor: (@CurrentState, @InputChar, @NextState) ->
  getNextState: => @NextState
  match: (machineState, inputChar) =>
    machineState is @CurrentState and inputChar is @InputChar

createRule = (CurrentState, InputChar, NextState) ->
  new Rule CurrentState, InputChar, NextState

The purpose of the shim is just to ensure that we don’t need to update all the call sites while we are refactoring the rules. With this new code in place, I test and everything still works. Now we can look at extending Rule to take over parts of createAltRule. Lets look at the top half of createAltRule first.

createAltRule = (index, source) -> 
 if index == source.length
  {
    # trap in acceptance state
    initialState: index
    inputChar: null
    resultState: index
    getNextState: -> @resultState
    match: (q, c) -> true if (this.initialState == q) 
  }

The purpose of this branch is to define a rule that traps the machine in a certain state. In this case, it will be the acceptance state. If we count the number of characters in the search string, we find a number that represents the machine’s acceptance state.  If we manage to reach that acceptance state, we want to stay in it.  Once we have determined that the source string contains the search string, the machine should accept the input, no matter what the rest of the string says.  The trap state implementation just compares the machine state to the acceptance state, and returns the acceptance state when the machine calls getNextState.

Here is how we can extend Rule to support this.

class TrapRule extends Rule
  constructor: (@CurrentState) ->  
    super @CurrentState, null, @CurrentState  
  match: (machineState, inputChar) =>
    machineState is @CurrentState

The TrapRule class extends Rule by replacing the match implementation with the same logic comparison implemented in the old code. I use super to call the Rule constructor with the @CurrentState value as both the current and next state. I pass null as the character to make it explicit that the input character is not used.

Now we can update the createAltRule function.

createAltRule = (index, source) -> 
 if index == source.length
    new TrapRule index
 else
  {
    # return to start state
    initialState: index
    inputChar: source[index]
    resultState: 0
    getNextState: -> @resultState
    match: (q, c) -> true if (this.initialState == q && this.inputChar != c)
  }

The bottom half of createAltRule describes a “return rule”. This rule handles any scenario where the machine reads a character that is not in the search string, and also any scenario where the character is in the search string, but not in the correct position. For example, both of our test strings start with the character W. The return rule would apply because W does not appear in operating system. As the machine reads along it will encounter o at position 5. Finding this character, it will advance from state 0 to state 1. The next character it wants to read is p. But in either test string, p is not the character at position 6. So, the return rule will apply and return the machine to state 0. Without the return rules, the machine could only accept strings that start with the search string, but we want a machine that accept strings which contain the search string. So, the match function looks at the input character passed in by the machine, and matches when the character does not match the stored character, and getNextState always returns 0.

Here is a ReturnRule class that implements the same logic.

class ReturnRule extends Rule
  constructor: (@CurrentState, @InputChar) ->
    super @CurrentState, @InputChar, 0
  match: (machineState, inputChar) =>
    machineState is @CurrentState and inputChar isnt @InputChar

And we can reduce our createAltRule method to this:

createAltRule = (index, source) -> 
 if index is source.length
    new TrapRule index
 else
    new ReturnRule index, source[index]

And this works too. Now that we have cleaned up our rules we can see that our shim is only called in one place, and does very little, so we can inline the Rule constructor and delete createRule.

createSearchMachine = (searchString) ->
  source = searchString.split '' 
  states = [0..source.length]
  rules = (new Rule x, source[x], x + 1 for x in states)
    .concat (createAltRule x, source for x in states)
  createMachine rules, 0, [ source.length ]

Lets try to get rid of the call to createAltRule with better list comprehension.

createSearchMachine = (searchString) ->
  source = searchString.split '' 
  states = [0..source.length]
  rules = (new Rule x, source[x], x + 1 for x in states)
    .concat (new ReturnRule x, source[x] for x in states when x isnt source.length)
  createMachine rules, 0, [ source.length ]

As an intermediate step, I simply filtered out the state that would have led to creating the trap rule and replaced the call to createAltRule with a direct instantiation of ReturnRule. I expect this to fail, but it doesn’t! Turns our our machine has a bug in it, since I designed it “knowing” that every possible state/character combination would have a rule, I did not handle the case in read where no rules matched. Just to be on the safe side, I’ll update read by moving the current state to -1 when no rules match.  Although it’s conventional to use non-negative integer states when simulating state machines, the truth is that you can use any symbol, positive, negative, or Celtic runes, it’s up to you.  Here is the updated Read implementation:

  read: (inputChar) ->
    for rule in transitionRules
      if rule.match currentState, inputChar
        return currentState = rule.getNextState()  
    currentState = -1

Now, the machine fails as expected. We can repair it by creating the TrapRule and pushing it into the rules collection.

createSearchMachine = (searchString) ->
  source = searchString.split '' 
  states = [0..source.length]
  rules = (new Rule x, source[x], x + 1 for x in states)
    .concat (new ReturnRule x, source[x] for x in states when x isnt source.length)
  rules.push new TrapRule source.length
  createMachine rules, 0, [ source.length ]

This works so we can get rid of the createAltRule function.  I made all these changes to the existing gist and I committed a few intermediate steps along the way.  Enjoy.

Coding

Turing Machine in CoffeeScript

langsIf you’ve ever taken a formal languages class and you’ve been following along with my CoffeeScript experiments, then you probably already guessed that this post was coming.  Well, here it is, a CoffeeScript implementation of the Standard Turing Machine.

This is the last CoffeeScript post that I have on my stack.  Not that I won’t ever write about CoffeeScript again, but I don’t have any immediate ideas about what to write next, while I have plenty of ideas about other subjects.  For me, this post is very different than my last two CoffeeScript posts, because this time I am not translating from C/C++.  We haven’t had a homework assignment to implement a Turing Machine, so I don’t have a reference implementation to work from.  This machine was implemented in CoffeeScript from the start, but lessons learned from implementing the deterministic finite automaton and the pushdown automaton certainly gave me a head start.

Node.js

This implementation is also a little different because I finished the experiment using node.js.  I started out on coffeescript.org, but as the number of machines and test cases grew, I got tired of clicking OK on each alert.  Getting started with node.js and CoffeeScript was really easy.   I simply downloaded the windows installer from http://nodejs.org and ran the msi.  After installation, I started a command interpreter and typed “node” at the prompt.  On one machine, node came right up, but on two others I needed to reboot for node to show up in the path.  Maybe I just rebooted the first machine and forgot, but if you are having trouble, try a reboot.  I followed the instructions on http://coffeescript.org to install the coffee-script module using the node package manager.  After that, I was able to run scripts locally on my machine.

Turing Machine

The graphic at the top of this entry describes two languages and one function, each of which can be implemented using a Turing Machine.  I’m not qualified to assert that only a Turing Machine could accept the languages.  I’m pretty sure that the language of even-length strings could be accepted by a finite state machine, which would mean it could also be accepted by pushdown automata.  Certainly it seems likely that the function could not be computed by the more primitive machines, since they have no way to write the computation result.  The second language violates the pumping lemma for context free languages (actually violates the pumping lemma for regular languages too), so we know from an “academic” perspective that neither finite or pushdown automata could accept it.  But I spent a few minutes trying to figure out why a pushdown would fail to accept this language.

If we tried to accept L={a^nb^nc^n:n>=0} with a pushdown, we could start by counting the ‘a’ characters.  We could count them by pushing some symbol onto the pushdown’s stack, lets use X as that symbol.  So we would push some number of X onto the stack, until we encountered the first “b” character.  We would then want to remove an X from the stack, to account for the fact that the language requires the same number of as and bs.  But this wont work.  We still need to know how many as or bs there were so we can use that knowledge to verify the correct number of “c” characters.  So we should put that X back on the stack.  But if we put the X on the stack and keep reading, then we have no way of accounting for the fact that we just read a “b”.  So we are stuck.  This thought experiment might not be academically exhaustive, but it gives me enough intuitive comfort to agree with the textbook that this language cannot be accepted by a PDA.

blockquoteThe Turing Machine moves past this limitation by getting rid of the stack and replacing it with a “tape”.  You can think of a tape as a combination of a list, a stack, and a queue.  CoffeeScript arrays are well suited for emulating Turing’s tape.  You can iterate over the array or access elements at random, which allows the machine to move back and forth over it’s tape.  You have the unshift function, which lets the tape extend infinitely to the left, and the push function, which allows the machine to extend infinitely to the right.  Of course, we are bounded by the real word constraint of needing to store the array’s index, which limits us to 32-bit array sizes if I recall correctly, but for the purposes of this experiment, that boundary is irrelevant.

The tape is presented to the machine with input data already written on it and bounded on the left and right by “blank” symbols.  But, the machine is allowed to update the tape if it wants to, which sets it apart from state machines and pushdowns.  Also, the machine is allowed to move back and forth over the tape, whereas simpler machines are only allowed to read it from beginning to end.  These two differences allow the Turing Machine to keep track of all the data needed to recognize the language of equal numbers of as, bs and cs.  After working through a couple examples, I’ll finish by showing my solution for recognizing this language.  Be forewarned, this post is really long.

Implementation

Here is the gist.

As I mentioned, this implementation started in the CoffeeScript REPL, and ended on node.js.  The REPL runs as a browser and you can use the alert method to show results in message boxes, while node.js runs with a console attached and has no alert method.  The first little problem after migrating was getting output from the alerts into the console.  So we start with a little shim:

alert = (s) -> console.log s

To run the code in the REPL, just remove or comment this line.  With that out of the way, we start by defining the result object for the Turing Machine’s transition function:

createResult = (state, symbol, direction) ->
 nextState: state
 writeSymbol: symbol
 moveTo: direction
 toString: () -> [this.nextState, this.writeSymbol, this.moveTo].join ", "

When a the machine finds a rule that matches, it will need to update its state, write something to the tape and move left or right.  This object delivers that data to the machine, and includes an override of the toString method to facilitate debugging.

createRule = (state, symbol, result) ->
 inState: state
 readSymbol: symbol
 transitionResult: result
 match: (s, y) -> true if (this.inState == s && this.readSymbol == y)
 toString: () -> [this.inState, this.readSymbol, this.transitionResult].join ", "

The Turing Machine rules must match the machine’s current state and the symbol read off the tape.  The match function checks the member data against the data passed in by the machine and returns true when they are the same.  Again, toString is overridden.

createTape = (s, blank) ->
 t = [ blank ].concat s.split ''
 t.push blank
 pos = 0
 pos++ while t[pos] == blank
 pos-- while t[pos] == undefined
 return {
  currentPosition: pos
  buffer: t
  currentSym: () -> this.buffer[this.currentPosition]
  write: (s) -> this.buffer[this.currentPosition] = s
  moveRight: () ->
   p = this.currentPosition
   p++
   this.buffer.push blank if p == this.buffer.length
   this.currentPosition = p
   return null
  moveLeft: () ->
   p = this.currentPosition
   p--
   if p < 0
    this.buffer.unshift blank
    p = 0
   this.currentPosition = p
   return null
  move: (d) ->
   this.moveRight() if d == 'R'
   this.moveLeft() if d == 'L'
   return null
  toString: () -> (this.buffer.join ", ") + " pos: " + this.currentPosition
  }

Next, I’ve defined the behavior of the tape.  Although CoffeeScript arrays are well suited for use as Turing tapes, they need a little help in order to make the behavior more transparent to the machine.  We construct a tape object by providing two pieces of data, the first is a string loaded with the input that we want to pass to the machine.  The second piece of data is the symbol used to indicate to the machine that a cell is empty.  This can be whatever you want it to be, as long as it doesn’t interfere with the symbols used in the machine’s computations.  I found it very convenient to use a ‘B’ rather than null or undefined or some other option.  In C/C++ I would not hesitate to use (char)0 since this is still just a regular character.

Anyway, when we provide this data to the constructor function the function allocates a new array and puts the blank symbol in position 0.  Then, it splits the string into an array of characters and adds that to the buffer.  Finally, it adds another blank to the end of the buffer.  The conventional way to reason about Turing Machines starts the machine with the tape advanced to the leftmost non-blank character.  To support this convention, the tape constructor scans right until it finds a non-blank, or hits an undefined cell (which happens if the empty string “” is passed to the method).  If the constructor does hit the undefined area, it scans left again until it finds the initialized portion of the buffer.  Now, it is ready to construct the object to return by assigning the index and the buffer to a new object.  The methods on the tape object support the methods defined for Turing’s tape, currentSym reads the symbol at the index, write will update the buffer at the index.  The moveRight method will increase the index, and if the new index exceeds the length of the buffer, a new blank is added to the right side of the tape.  Likewise, the moveLeft method will decrease the index, and if the index becomes negative, the buffer is padded on the left side with a blank, and the index is moved back to 0 so that it points at the newly added blank.  The move method decides which of these two methods to use based on the value passed in by the machine, and finally the toString method will return the current contents of the tape along with the position of the read/write head.

Finally we come to the machine:

createMachine = (rules, initialState, blank, finalStates) ->
 transitionRules: rules
 currentState: initialState
 blankSymbol: blank
 acceptanceStates: finalStates
 tape: createTape '', blank
 isHalted: false
 toString: () -> 
  r = this.transitionRules.join "|| "
  f = this.acceptanceStates.join ", "
  [ r, this.currentState, this.blankSymbol, f, this.tape].join ":: "
 acceptedInput: () -> this.currentState in this.acceptanceStates
 loadTape: (s) -> this.tape = createTape s, this.blankSymbol
 readTape: () -> this.tape
 update: (r) ->
  this.currentState = r.nextState
  this.tape.write r.writeSymbol
  this.tape.move r.moveTo
  return null
 read: () ->
  q = this.currentState
  s = this.tape.currentSym()
  for rule in this.transitionRules
   if rule.match q, s
    return this.update rule.transitionResult
  return this.isHalted = true
 run: (s) ->
  this.loadTape s
  this.read() until this.isHalted
  return null

Like most of my machines, it takes a set of rules, the initial state and the set of acceptance states.  The Turing Machine also takes the symbol to use for blanks.  Like I mentioned above, a ‘B’ worked out just fine for me.  The tape is initialized with the empty string, and we also have a flag to indicate that the machine has halted.  Previously we simply read all the characters from the input and halted at the end, even if we entered a dead configuration before reading all the characters.  Since the Turing Machine can move back and forth, reaching the end of the input is not automatically a good place to stop processing, so this flag allows the machine to signal its done.  Like the other machines we have an acceptedInput method which allows the machine to indicate if it stopped in an acceptance state.  New to the Turing Machine is the loadTape method, which takes a string initializes a new tape instance, and a readTape method which allows us to check the contents of the tape.  We need this for machines that compute functions, because we will need to know more than whether the machine accepted the input, we want to look at the tape to see the results of the computation.

The update method applies the changes to the machine and the tape supplied by a rule result when the machine finds a matching rule.  On several of these methods I went ahead and returned null explicitly because I found it kind of annoying that CoffeeScript generated returns for methods where I don’t intend the results to convey useful information.  I could have reordered the instructions in the update method so that the assignment to current state occurred last, as I did with the pushdown automaton.  However, in methods like the run method, CoffeeScript allocates an array and generates a collection as the loop implied by the “until” expression is processed.  But it only does this because it thinks you want to automatically return the results from the last expression in the method.   By returning null, that loop is no longer the last statement, and CoffeeScript doesn’t bother generating the array.  Does this really matter?  I’m no expert, I just thought the extra array was kind of annoying.

I’ll wrap up the description of the machine functions by saying that the read method reads the current tape symbol and tries to find a matching rule.  If it can’t find a match, it flips the isHalted flag.  The run method loads the tape with the user input, then calls the read method until the halt flag is set.

I mentioned that we didn’t have a homework assignment to implement a Turing Machine, which is not to say that we did not have an assignment to describe a few machines.  It was my intention from the get-go to use this machine to work out the answers to my homework, so I create a method that would facilitate running different machine configurations with different test strings.

runMachine = (f, i) ->
 m = f()
 m.run i
 alert "input: " + i + " accepted: " + m.acceptedInput() + " tape: " + m.readTape()

This method expects a factory method that produces a fully configured machine, runs it, then alert/prints the results.

Now we are ready to build some machines.

Example

The first machine was a simple machine from our class lecture notes.  This factory method produces it by creating the necessary rule set, then returning the results from createMachine with appropriate parameters.

# Example from lecture
exampleMachine = () -> 
 exampleRuleSet = [ 
  createRule 0, 'a', createResult 0, 'b', 'R'
  createRule 0, 'b', createResult 0, 'b', 'R'
  createRule 0, 'B', createResult 1, 'B', 'L'
 ]
 createMachine exampleRuleSet, 0, 'B', [ 1 ]

This machine reads a tape full of as and bs, and converts all the as into bs.  It accepts the input if it manages to reach the rightmost blank.  We can test it with these lines.

runMachine exampleMachine, "aaa"
runMachine exampleMachine, "aba"
runMachine exampleMachine, "bbb"
runMachine exampleMachine, "caa"

With the three simple rules, the machine is able to convert the first three strings to all bs, but can’t do anything with the fourth since it has no rules for dealing with cs.

L = { w : |w| is even }

Our next machine accepts even length strings composed of as and bs.

m1 = () ->
 ruleSet = [
   createRule 0, 'B', createResult 5, 'B', 'L' # accept the empty string
   createRule 0, 'a', createResult 1, 'X', 'R' # replace the first a or b with an X
   createRule 0, 'b', createResult 1, 'X', 'R'
   createRule 0, 'Y', createResult 4, 'Y', 'R' # possibly located the tail
   createRule 1, 'a', createResult 1, 'a', 'R' # scan right until {B, Y}
   createRule 1, 'b', createResult 1, 'b', 'R'
   createRule 1, 'B', createResult 2, 'B', 'L'
   createRule 1, 'Y', createResult 2, 'Y', 'L' 
   createRule 2, 'a', createResult 3, 'Y', 'L' # replace first a or b with Y
   createRule 2, 'b', createResult 3, 'Y', 'L'   
   createRule 3, 'a', createResult 3, 'a', 'L' # scan left until X
   createRule 3, 'b', createResult 3, 'b', 'L'
   createRule 3, 'X', createResult 0, 'X', 'R'   
   createRule 4, 'Y', createResult 4, 'Y', 'R' # tail should be all Y
   createRule 4, 'B', createResult 5, 'B', 'L' # accept
 ]
 createMachine ruleSet, 0, 'B', [ 5 ]

We replace valid characters on the left side with X symbols.  Every time we write an X, we move to the right side and replace the rightmost a or b with Y, then we repeat.  If the string is even length, the X and Y string should meet in the middle.  The machine states will let us know if we have found the middle.  Lets look at a failing case:

runMachine m1, "a"

When the machine starts, the tape is positioned at this “a”, and the machine is in state 0.  The second rule in the list will match.  The machine will move to state 1, it will overwrite the “a” with an “X”, and it will move the tape to the right.  Now it is in state 1 and looking at a blank.  The matching rule will move the machine to state 2 and move the tape to the left.  Now the tape is positioned over the “X” again, and it is in state 2.  However, there is no rule that explains what to do with an X in state 2.  If the string had an even length, then there would be an a or b under the head right now, and the machine would replace that symbol with a Y.  But it found an X before finding an a or b to replace, therefore the string length must be odd.  The machine halts in state 2.  State 2 is not in the set of acceptance states, so the string is rejected.

Now consider an passing case:

runMachine m1, "ab"

We start off by replacing the “a” with an X.  Then we scan past the “b” until we hit the right side blank.  We bounce off the blank and end up looking at the “b” again, in state 2 as we were in the last example.  This time, we can still act because we have a rule that matches b in state 2.  We write a Y, go left, and change to state 3.  In state 3, we find an X, switch to sate 0 and move right again.  We are back looking at the Y we just wrote.  This time we are fine though, because in state 0, if we find a Y, it could mean we are done, we just need to check that we have all Ys to the right of our current position.  If we scan past all the Ys and reach the right blank again, we go to state 5, which is the acceptance state.  There are no transitions out of state 5, so the machine will halt.  The gist has a few more example strings that should be enough to exercise the entire rule set.

f(x) = 3x

This machine will multiply the input by 3.  The input will be expressed as strings of “1″ characters.   So 1 = “1”, 2 = “11” and so on.  This notation is called unary notation, its not particularly fun for humans, or as efficient as binary notation, but as we will see, it makes life easy on the machine.  We’re optimizing for compact rule sets when we can, and introducing more numerals just makes the number of states and rules grow.

m2 = () ->
 ruleSet = [
  createRule 0, 'B', createResult 1, 'B', 'L' # accept empty string
  createRule 0, '1', createResult 2, 'X', 'R' # replace all 1 with X
  createRule 2, '1', createResult 2, 'X', 'R'
  createRule 2, 'B', createResult 3, 'B', 'L' # Found the right side B, now return to leftside B
  createRule 3, 'X', createResult 3, 'X', 'L'
  createRule 3, '1', createResult 3, '1', 'L'
  createRule 3, 'B', createResult 4, 'B', 'R' # Found the left side B, now look for an X
  createRule 4, '1', createResult 9, '1', 'R' # Found 1 before finding X, check all '1's
  createRule 4, 'X', createResult 5, 'B', 'R' # Erase an X and look for the rightside B
  createRule 5, 'X', createResult 5, 'X', 'R' 
  createRule 5, '1', createResult 5, '1', 'R'
  createRule 5, 'B', createResult 6, '1', 'R' # Write 3 ones
  createRule 6, 'B', createResult 7, '1', 'R'
  createRule 7, 'B', createResult 8, '1', 'R'
  createRule 8, 'B', createResult 3, 'B', 'L' # Repeat search for additional X
  createRule 9, '1', createResult 9, '1', 'R' # Checking string for all '1's 
  createRule 9, 'B', createResult 1, 'B', 'L' # accept 
 ]

createMachine ruleSet, 0, 'B', [ 1 ]

The first thing we do in this algorithm is accept the empty string.  Why?  Because 0 times 3 is 0.  The tape starts with no “1”s written on it, and ends with no “1”s written on it, that’s the expected result.  But its more interesting if there is something written on the tape, so lets look at this example:

runMachine m2, "111" # 3 x 3 = 9

We replace all the 1s with X until we reach the right boundary.  This is how the machine counts.  Once it is done counting, it returns to the left boundary, then backs up a step so it is looking at the leftmost X.  During these operations it has transitioned from state 0 (replace all 1s) to state 2 (find the left most X).  Now that it found the left most X, it erases it by writing a blank, and moves right.  This time around it finds another X, and this means it is not done yet, so it keeps scanning to the right, past any more Xs or 1s, until it finds the right boundary.  When it finds the right side blank, it writes down three ones.  The tape expands to the right to allow the machine to do what it wants, always providing a new right side boundary whenever the machine decides to overwrite the current boundary.  When it is done writing the 1s, it returns to state 3, which causes the machine to scan left until it finds a blank (which is at the site of the first X we erased.)  The machine bounces off the blank and ends up looking at another X, so it erases this and moves to the right end again so that it can write down three more 1s.  It repeats this until it finds that there are no more Xs after scanning to the left boundary.  When it finds a 1 next to the left boundary, it will scan right again, verifying that the tape is filled entirely with 1s, if it scans all 1s on the way to the right boundary, it accepts the string.  We can then look at the tape and see that 9 1s are written on it.

L = { anbncn | n >= 0 }

So we are back to the textbook example of a language that a pushdown automaton cannot accept.  The Turing Machine can accept it, given the right rule set.  My solution involves the double application of the even-length string algorithm, with the erasure semantics of the multiplication algorithm.  It may not be necessary to erase as we go, but it seemed like an obvious way to compute the answer.

m3 = () ->
 ruleSet = [
  createRule 0, 'B', createResult 1, 'B', 'L' # accept empty string
  createRule 0, 'a', createResult 2, 'X', 'R' # replace all a with X
  createRule 2, 'a', createResult 2, 'X', 'R'
  createRule 2, 'b', createResult 3, 'b', 'L' # until we find a 'b'
  createRule 3, 'X', createResult 3, 'X', 'L' # scan left until Blank
  createRule 3, 'Y', createResult 3, 'Y', 'L'
  createRule 3, 'b', createResult 3, 'b', 'L'
  createRule 3, 'c', createResult 3, 'c', 'L'
  createRule 3, 'B', createResult 4, 'B', 'R'
  createRule 4, 'X', createResult 5, 'B', 'R' # Find and erase an X
  createRule 4, 'Y', createResult 7, 'B', 'R' # Find and erase a Y... done with X
  createRule 4, 'Z', createResult 9, 'Z', 'R' # Found Z, check that string is all Z
  createRule 5, 'X', createResult 5, 'X', 'R' # scan right until { c, Y }
  createRule 5, 'b', createResult 5, 'b', 'R'
  createRule 5, 'c', createResult 6, 'c', 'L'
  createRule 5, 'Y', createResult 6, 'Y', 'L'
  createRule 6, 'b', createResult 3, 'Y', 'L' # scan left to b, write Y, repeat X search
  createRule 7, 'Y', createResult 7, 'Y', 'R' # Scan right past {Y, c} until { B, Z }
  createRule 7, 'c', createResult 7, 'c', 'R'
  createRule 7, 'Z', createResult 8, 'Z', 'L'
  createRule 7, 'B', createResult 8, 'B', 'L'
  createRule 8, 'c', createResult 3, 'Z', 'L' # Scan left to c, Write Z, repeat Y search
  createRule 9, 'Z', createResult 9, 'Z', 'R' # scan right past Z until B
  createRule 9, 'B', createResult 1, 'B', 'L' # accept
 ]
 createMachine ruleSet, 0, 'B', [ 1 ]

We start by treating the c characters as a left side boundary, effectively ignoring them.  We don’t care how many cs there are unless we can first prove that there are an equal number of as and bs.  We can prove there are an equal number of as and bs by converting our “a” characters to a “stack” of Xs on the tape. Every time we pop (erase) an X, we should be able to find a b and replace it with a Y.  If we are unable to empty the stack of Xs, then we didn’t have enough bs.  If we are unable to replace all the bs with Ys, then we had too many bs.

If we are able to empty the stack of Xs, and we have no more bs, then we know that |a| == |b|, and we have a stack of Ys on the tape in the same quantity.  So, |a| == |b| == |Y|.  We will also have zero or more cs to the right of the Ys.  We can now use the same technique to start popping Ys, and converting cs to Zs.   If we cant pop all the Ys, that’s a fail.   If we run out of Ys before converting the last c, that’s a fail too.  If we pop the last Y right before converting the last remaining c, then we have a balanced string.  We scan past the Zs one last time, and accept the string once we confirm there are no stray cs.

runMachine m3, ""          # valid n = 0
runMachine m3, "a"         # invalid
runMachine m3, "ab"        # invalid
runMachine m3, "abc"       # valid n = 1

There’s a few examples, there are more in the gist.  I hope you enjoyed reading about my little CoffeeScript machines.  I had quite a bit of fun working on them, especially on the Turing Machine.  For some reason, the Turing Machine gave me the same sense of wonder that I felt when I first started programming.  I think that in working from Turing’s blueprints I felt connected in some corny way to an individual who played a critical role in making what I do every day possible.

I hope you’ll play with the machines and if you see any room for improvement, let me know.

Coding

Pushdown Automaton in CoffeeScript

grammar

I decided to continue my CoffeeScript education by translating another homework assignment from C/C++ to CoffeeScript.  For my first CoffeeScript experiment, I simulated a deterministic finite automaton.  This time the simulated machine will be a pushdown automaton that accepts the language described by the context free grammar pictured to the left.

If you are unfamiliar with reading context free grammars, don’t worry, this one is pretty easy to understand.  The language generated by this grammar is similar to selection statements in C/C++.  In other words, you can use this grammar to correctly generate pairs of “if-else” statements.  Its simplified so we don’t have to worry about the predicate that follows the “if” statements or the body of code to execute when the predicate is true or false.  The only thing we are concerned about is whether every “else” follows an “if” somewhere earlier in the input.  Here’s an example of a correct sequence:

if
  if
  else
else

But since we don’t want to worry about whitespace or newlines either, we’ll feed the input to the machine as a flattened string: “ififelseelse”.

Here’s an example of an incorrect sequence:

if
else
else

It’s easy to see that there are too many “else” tokens.  To tell the difference between the valid sequence and an invalid one, the machine will have to count the number of “if” tokens and make sure there is one for every “else”.  They don’t have to balanced though.  This is also a valid sequence:

if
  if
    if
    else

Finite automata cannot solve this problem because they have no way to keep track of how many “if” tokens were encountered.  But pushdown automata have a stack which we can use to track the “if”s.

For the sticklers out there, I’ll note that the grammar is ambiguous because there is no way to distinguish between the above sequence and one like this:

if
  if
    if
else

In other words, the grammar doesn’t capture the nesting.  Both sequences flatten to “ifififelse”.  This would make this grammar somewhat difficult to use if you actually wanted to express some logic with it.  However, it has no effect on our ability to create pushdown automata that accept “ifififelse” while rejecting “ifififelseelseelseelse”.

Implementation

You can download the source from this gist.

The transition function for deterministic finite automata only need to supply the machine with the next state to move to.  In the previous example, I built the result state right into the rule object.  For pushdown automata, the transition function must supply both a new state, and a set of symbols to add to the machine’s stack.  So, we start with a function that returns an object to hold this information for us:

createResult = (state, symbols) -> {
  resultState: state
  pushSymbols: symbols.reverse()
}

When specifying the stack symbols to push, the textbook way to do so is to write the symbols down in the opposite order they will be pushed.  In other words, in the array [ A, B, C ], “C” will be pushed, followed by “B” and finally “A”.  This function reverses the given symbols so that this convention can be kept when specifying result objects.

Next we have a function to generate rule objects:

createRule = (state, input, symbol, result) -> {
  currentState: state
  inputChar: input
  stackSymbol: symbol
  ruleResult: result
  match: (s, i, y) -> 
    true if (this.currentState == s &&
             this.inputChar == i &&
             this.stackSymbol == y)
}

The rule needs to know the machine’s current state, the character read from the input, and the symbol read from the top of the machine’s stack.  The match function compares its own values with the three values supplied by the machine and returns true when they match.  This time there is only one function to create rules, and no algorithm coded into the example for generating the rule set.  For the DFA, the set of rules was larger, and the logic for automating their generation was fairly obvious, so generating the whole set saved a lot of typing.  But the set of rules for the PDA is smaller, and the logic was not immediately obvious to me, so I didn’t bother.

Next we have a function that returns machine objects.

createMachine = (rules, startState, initialSymbol, finalStates) -> {
  transitionRules: rules
  currentState: startState
  symbolStack: [ initialSymbol ]
  acceptanceStates: finalStates
  update: (r) ->
    this.symbolStack.push x for x in r.pushSymbols
    this.currentState = r.resultState
  read: (c) ->    
    return -1 if this.currentState == -1
    y = this.symbolStack.pop()
    for rule in this.transitionRules
      if rule.match this.currentState, c, y
        return this.update rule.ruleResult
    # no rules matched, enter dead configuration
    this.symbolStack.push y
    this.currentState = -1
  readAll: (s) -> this.read x for x in s.split ''  
  acceptedInput: () -> this.currentState in this.acceptanceStates
}

The machine takes a set of rules, an initial state, the first symbol to push onto the stack and a set of final states.  Like the DFA, it has an acceptedInput method which indicates whether current state of the machine is a member of the acceptanceStates collection.  The readAll convenience function is the same as the it was in the simpler machine.  For the read function, we have the same basic logic, but with a couple of twists.

First, the machine does not require that a rule exists to match every possible input.  If it can’t match the input to a rule, it enters a “dead configuration” at the bottom of the read method.  A guard clause at the beginning of the method checks for the dead configuration and bails out if the machine is already dead.  Next, we remove the top symbol from the stack.  By always removing the symbol, it makes erasure very simple later on, just do nothing and the symbol is erased.  After retrieving the stack symbol, we have the info we need to match the transition rule.  In the DFA example, I used a counted loop to iterate over the collection, but this time I tried a for … in loop.  Not sure what the CoffeeScript lingo for that is, and I’d be interested to see if this loop could be rewritten as a list comprehension.  I think it’s a step up from the counted loop anyway.

I experimented with the CoffeeScript REPL and confirmed that JavaScript does indeed short-circuit out of the loop when it hits the return statement.  It’s a good thing too, since this time I relied on that behavior.  When no rule matches, the short circuit is not hit, the machine restores the stack and enters the dead configuration.  When a rule does match, the machine calls its own update method with the rule result as the argument.  The update method pushes the result symbols onto the stack, then updates the machine state.  While writing this method on coffeescript.org and watching the JavaScript as it generated, I noticed that every method is a function by default.  The result of the last statement is used as the return value if there is no explicit return statement.  I suppose I could simulate a void subroutine by returning null explicitly, but for this example I thought it would be neat to allow the updated state to propagate up the stack and into a collection returned from the readAll method.

Next we have a method that builds and runs the machine for a given test string.

runMachine = (s) ->
  selectStmtRules = [
   createRule 0, 'i', 'S', createResult 3, [ 'A', 'S' ]
   createRule 3, 'f', 'A', createResult 1, [ 'B' ]
   createRule 1, 'i', 'S', createResult 3, [ 'A', 'S' ]
   createRule 1, 'i', 'B', createResult 3, [ 'A', 'B' ]
   createRule 1, 'e', 'B', createResult 2, [ 'C', 'D', 'E' ]
   createRule 2, 'l', 'C', createResult 2, [ ]
   createRule 2, 's', 'D', createResult 2, [ ]
   createRule 2, 'e', 'E', createResult 1, [ ]
  ]

  machine = createMachine selectStmtRules, 0, 'S', [ 1 ]
  history = machine.readAll s
  alert "Accepted: " + machine.acceptedInput() + " States: " + history.join ", "

Here, we build the machine rules necessary to recognize our grammar, then pass the rules to the createMachine function to build the machine.  The grammar, as specified, is easy to translate into transition rules.  For example, the rules S –> iA|iAB and A->fS|f taken together mean that when we start reading we must read an “i” followed by an “f’.  Once we have read the “f” we can quit, or we can read another “if’, or process a “B”.  The B is the entry point into the rules that describe the “else” token.

Translating this into a PDA means that we should start the machine with an S on the stack.  When we start the machine, “S” is the only production rule in the grammar we can use.  The machine starts in state 0, so our first rule says that if we are in state 0, and we read an ‘S’ off the stack, then we can read an “i” from the input and transition.  Since there is only one rule for state 0, it follows that we must read an “i” or the machine will die.  If we do read an “i” then the machine goes to state 3 and pushes an ‘S’ followed by an ‘A’ onto the stack.  Now that the “A” is on the stack, we can only use production rules from the grammar that have “A” on the left side.  Both of these rules start with the letter “f”, so we only need to have one rule in order to require that the next expected character is read.  So, if we are in state 3 and we have an ‘A’ on the stack and read an “f” we transition to state 1, and push a “B” on the stack.  And so on.

The machine accepts the input whenever it reaches state 1.  After it reads the first “i”, “f”, it can stop reading and accept the input.  Once there is a “B” on the stack, the machine can read another “if”, or start reading an “else”.  It reads the else by replacing the “B” with the symbols that represent the grammar rules for the “lse” part of the “else” statement, then erasing them as each required character is read.  When the machine is done reading “else” the “B” is gone—the “if” it represented has been matched with its partner and can’t be used again.  Whenever  the number of “if” and “else” tokens are balanced, “S” will be back on the top of the stack, and so the machine is allowed to generate more “B”s from state 1.

After building the machine, the runMachine function feeds the test string into the readAll method, then lets us know whether the machine accepted the input.  It also shows us the history of the machine’s state changes.

Finally I run some test strings through the machine like this:

runMachine "ifelse"
runMachine "ifelseelse"
runMachine "ififelseelse"
runMachine "ifi"

In these examples, the first and third are correct and others aren’t.  As expected, the machine correctly identifies the acceptable strings.  This time around my implementation was a little cleaner and came in at 55 lines.  Like the DFA this didn’t take long and was easier than the C/C++ implementation.  I think its safe to say that it took me longer to write up this article than it took to write the code.

Coding

Finite State Machine in CoffeeScript

Update: I revised this machine on July 5, 2012.  Visit: Implementing String.Contains for a nicer implementation of this machine, or read on for the full story.

I’m not really a JavaScript guy.  I tend to lean heavily on Google, StackOverflow and trial and error when I need to do some work in the language.  The arrival of CoffeeScript peaked my interest, but since I spend so little time in the JavaScript domain, I had no immediate reason to jump in and give it a try.  Yesterday I had an idea that got me excited giving about CoffeeScript a try.  I decided to take a homework assignment I completed a few weeks ago for the Theory of Computing course I’m taking this semester, and implement it in CoffeeScript.

cs

I spend most of my time in C#, but for this assignment I needed to use C/C++ or Java.  So, initially I implemented the solution in C#, figuring it would be “easy” to translate to C/C++.  I haven’t really done any C/C++ since .NET came out, so it wasn’t as easy as I had hoped it would be.  I spent all of one Sunday flipping back and forth between MonoDevelop and http://www.cplusplus.com/, but it eventually worked out.

Translating to CoffeeScript from the C/C++ implementation went much more smoothly.  The REPL editor and reference material on http://coffeescript.org/ is a great resource.  Here are the results (you can also view/download the whole thing from this gist.

The machine is very simple, it accepts any input that contains the string “operating system”.  Informally, I mentally partitioned the transition function into three sets.  One set of rules advances the machine toward the acceptance state whenever it reads the next expected character.  The next rule set returns the machine to the initial state whenever an unexpected character is encountered.  The final rule traps the machine in the acceptance state after it has read all the characters in the string “operating system”

The first thing I did was to define some functions that would generate these transition rules for me based on an array of expected input characters.

createRule = (index, source) -> {
 initialState: index
 inputChar: source[index]
 resultState: index + 1
 match: (q, c) -> true if (this.initialState == q && this.inputChar == c)
 }

createAltRule = (index, source) -> 
 if index == source.length
  {
    # trap in acceptance state
    initialState: index
    inputChar: null
    resultState: index
    match: (q, c) -> true if (this.initialState == q) 
  }
 else
  {
    # return to start state
    initialState: index
    inputChar: source[index]
    resultState: 0
    match: (q, c) -> true if (this.initialState == q && this.inputChar != c)
  }

Each rule is implemented as an object with some data and a match function. For rules that advance the machine toward acceptance, the match function returns true when the rule’s state and input character match the values passed in by the machine.  For those that move the machine back to the initial state, the match function checks the state and any character other than the expected character.  Finally, the trap rule does not take the input character into account at all, it just checks that the machine has indicated that it has reached the final state.

createMachine = (rules, initialState, finalStates) ->
 {
   currentState: initialState
   transitionRules: rules
   acceptanceStates: finalStates
   read: (c) ->
    for i in [0..this.transitionRules.length - 1]
      if this.transitionRules[i].match this.currentState, c
        return this.currentState = this.transitionRules[i].resultState
   readAll: (s) -> this.read x for x in s.split ''
   acceptedInput: () -> this.currentState in this.acceptanceStates
 }

Next, we have a function that builds the state machine.  This function accepts a set of rules, the start state, and a set of final states.  However, in this example, the machine only has one final state.  When the machine has finished reading the input, clients can use the acceptedInput function to check that the machine’s current state is in the set of final states.  The readAll method takes a string and is provided for convenience, it simply splits the string into a character array and calls the read method an each character in the input.

The read method controls the machine’s state by iterating through each rule until it finds a rule that matches the machine’s current state and input character.  When it finds the match, it assigns the rule’s result state to its own currentState field.  I believe that calling return will short-circuit the loop after when a match is made, but I’m not familiar enough with JavaScript to say for sure.  In any case, it shouldn’t matter because the machine is deterministic and only one rule can match for any given combination of state and input character.

The list comprehension feature of CoffeeScript will also build a collection from the read method return values and this collection should contain each state the machine chose for each computation.  That might be interesting to look at but, it isn’t really necessary for using the machine.

 getMachineResult = (b) -> 
  if b == true 
   return " Accepted. "
  else
   return " Not Accepted. "

source = "operating system".split '' 
rules = (createRule x, source for x in [0..source.length])
altRules = (createAltRule x, source for x in [0..source.length])
rules.concat altRules
finalStates = [ source.length ]

test1 = "Windows is an operating system installed on many machines."
machine = createMachine(rules, 0, finalStates)
machine.readAll test1
alert getMachineResult(machine.acceptedInput()) + test1

test2 = "Welcome to the operating room, the Doctor is just finishing his martini."
machine2 = createMachine(rules, 0, finalStates)
machine2.readAll test2
alert getMachineResult(machine2.acceptedInput()) + test2

In this part, we create one more little function to translate Boolean results into text, and then we use have we have to create and run a couple machines.  As expected “test1” is accepted by the machine, and “test2” is not.

From scratch, translating this machine to CoffeeScript took about 3 hours and resulted in 60 lines of code.  Before this I had zero experience with CoffeeScript.  All in all, I’d say it was fun and easy.