Clean And Type-safe State Machine Implementation In A Statically Typed Language?

October 11, 2024 Post a Comment

I implemented a simple state machine in Python: import time def a(): print 'a()' return b def b(): print 'b()' return c def c(): print 'c()' return a i

Solution 1:

In Haskell, the idiom for this is just to go ahead and execute the next state:

type StateMachine = IO ()
a, b, c :: StateMachine
a = print"a()" >> b
b = print"b()" >> c
c = print"c()" >> a

You need not worry that this will overflow a stack or anything like that. If you insist on having states, then you should make the data type more explicit:

dataPossibleStates= A | B | C
typeStateMachine= PossibleStates -> IO PossibleStates
machineA= print "a()" >> return B
machineB= print "b()" >> return C
machineC= print "c()" >> return A

You can then get compiler warnings about any StateMachine that forgot some states.

Solution 2:

If you use newtype instead of data, you don't incur any overhead. Also, you can wrap each state's function at the point of definition, so the expressions that use them don't have to:

import Control.Monad

newtypeState= State { runState :: IO State }

a :: Statea= State $ print "a()" >> return b

b :: Stateb= State $ print "b()" >> return c

c :: Statec= State $ print "c()" >> return a

runMachine :: State -> IO ()
runMachines= runMachine =<< runState smain= runMachine a

Edit: it struck me that runMachine has a more general form; a monadic version of iterate:

iterateM :: Monad m => (a -> m a) -> a -> m [a]
iterateM f a = do { b <- f a
                  ; as <- iterateM f b
                  ; return (a:as)
                  }

main = iterateM runState a

Edit: Hmm, iterateM causes a space-leak. Maybe iterateM_ would be better.

iterateM_ :: Monadm=> (a -> m a) -> a -> m ()
iterateM_ fa= f a >>= iterateM_ fmain= iterateM_ runState a

Edit: If you want to thread some state through the state machine, you can use the same definition for State, but change the state functions to:

a :: Int -> State
a i = State $ do{ print $ "a(" ++ show i ++ ")"
                ; return$ b (i+1)
                }

b :: Int -> State
b i = State $ do{ print $ "b(" ++ show i ++ ")"
                ; return$ c (i+1)
                }

c :: Int -> State
c i = State $ do{ print $ "c(" ++ show i ++ ")"
                ; return$ a (i+1)
                }

main = iterateM_ runState $ a 1

Solution 3:

The problem with your Haskell code is, that type only introduces a synonym, which is quite similar to what typedef in C does. One important restriction is, that the expansion of the type must be finite, you can't give a finite expansion of your state machine. A solution is using a newtype: A newtype is a wrapper that does only exist for the type checker; there is absolutely zero overhead (excluded stuff that occurs because of generalization that can't be removed). Here is your signature; it typechecks:

newtype FN = FN{ unFM :: (IO FN) }

Please note, that whenever you want to use an FN, you have to unpack it first using unFN. Whenever you return a new function, use FN to pack it.

Solution 4:

In the C-like type systems functions are not first order citizens. There are certain restrictions on handling them. That was a decision for simplicity and speed of implementation/execution that stuck. To have functions behave like objects, one generally requires support for closures. Those however are not naturally supported by mosts processors' instruction sets. As C was designed to be close to the metal, there was no support for them.

When declaring recursive structures in C, the type must be fully expandable. A consequence of this is, that you can only have pointers as self-references in struct declarations:

structrec;
structrec {
    structrec *next;
};

Also every identifier we use has to be declared. One of the restrictions of function-types is, that one can not forward declare them.

A state machine in C usually works by making a mapping from integers to functions, either in a switch statement or in a jump table:

typedefint(*func_t)();

voidrun(){
    func_t table[] = {a, b, c};

    int state = 0;

    while(True) {
        state = table[state]();
    }
}

Alternatively you could profile your Python code and try to find out why your code is slow. You can port the critical parts to C/C++ and keep using Python for the state machine.

Solution 5:

As usual, despite the great answers already present, I couldn't resist trying it out for myself. One thing that bothered me about what is presented is that it ignores input. State machines--the ones that I am familiar with--choose between various possible transitions based on input.

data State vocab = State { stateId :: String
                         , possibleInputs :: [vocab]
                         , _runTrans :: (vocab -> State vocab)
                         }
                      | GoalState { stateId :: String }

instance Show (State a) where
  show = stateId

runTransition :: Eq vocab => State vocab -> vocab ->Maybe (State vocab)
runTransition (GoalState id) _                   = Nothing
runTransition s x | x `notElem` possibleInputs s = Nothing
                  | otherwise                    = Just (_runTrans s x)

Here I define a type State, which is parameterized by a vocabulary type vocab. Now let's define a way that we can trace the execution of a state machine by feeding it inputs.

traceMachine :: (Show vocab, Eq vocab) => State vocab -> [vocab] ->IO ()
traceMachine _ [] = putStrLn "End of input"
traceMachine s (x:xs) = do
  putStrLn "Current state: "
  print s
  putStrLn "Current input: "
  print x
  putStrLn "-----------------------"
  case runTransition s x of
    Nothing -> putStrLn "Invalid transition"
    Just s' -> case s' of
      goal@(GoalState _) ->do
        putStrLn "Goal state reached:"
        print s'
        putStrLn "Input remaining:"
        print xs
      _ -> traceMachine s' xs

Now let's try it out on a simple machine that ignores its inputs. Be warned: the format I have chosen is rather verbose. However, each function that follows can be viewed as a node in a state machine diagram, and I think you'll find the verbosity to be completely relevant to describing such a node. I've used stateId to encode in string format some visual information about how that state behaves.

dataSimpleVocab= A | B | C deriving(Eq, Ord, Show, Enum)

simpleMachine :: State SimpleVocabsimpleMachine= stateA

stateA :: State SimpleVocabstateA= State { stateId = "A state. * -> B"
               , possibleInputs = [A,B,C]
               , _runTrans = \_ -> stateB
               }

stateB :: State SimpleVocabstateB= State { stateId = "B state. * -> C"
               , possibleInputs = [A,B,C]
               , _runTrans = \_ -> stateC
               }

stateC :: State SimpleVocabstateC= State { stateId = "C state. * -> A"
               , possibleInputs = [A,B,C]
               , _runTrans = \_ -> stateA
               }

Since the inputs don't matter for this state machine, you can feed it anything.

ghci> traceMachine simpleMachine [A,A,A,A]

I won't include the output, which is also very verbose, but you can see it clearly moves from stateA to stateB to stateC and back to stateA again. Now let's make a slightly more complicated machine:

lessSimpleMachine :: State SimpleVocablessSimpleMachine= startNode

startNode :: State SimpleVocabstartNode= State { stateId = "Start node. A -> 1, C -> 2"
                  , possibleInputs = [A,C]
                  , _runTrans = startNodeTrans
                  }
  where startNodeTransC= node2
        startNodeTransA= node1

node1 :: State SimpleVocabnode1= State { stateId = "node1. B -> start, A -> goal"
              , possibleInputs = [B, A]
              , _runTrans = node1trans
              }
  where node1transB= startNode
        node1transA= goalNode

node2 :: State SimpleVocabnode2= State { stateId = "node2. C -> goal, A -> 1, B -> 2"
              , possibleInputs = [A,B,C]
              , _runTrans = node2trans
              }
  where node2transA= node1
        node2transB= node2
        node2transC= goalNode

goalNode :: State SimpleVocabgoalNode= GoalState "Goal. :)"

The possible inputs and transitions for each node should require no further explanation, as they are verbosely described in the code. I'll let you play with traceMachine lessSipmleMachine inputs for yourself. See what happens when inputs is invalid (does not adhere to the "possible inputs" restrictions), or when you hit a goal node before the end of input.

I suppose the verbosity of my solution sort of fails what you were basically asking, which was to cut down on the cruft. But I think it also illustrates how descriptive Haskell code can be. Even though it is very verbose, it is also very straightforward in how it represents nodes of a state machine diagram.

Python Freelancers