Keep calculations out of your effectful code

2024年12月19日 · James Haydon

副作用のあるコードから計算処理を分離しておきましょう

When discussing function purity (vs effects), we often hear that we should strive for purity because it makes for code which is easier to test and reason about. But this advice can feel impractical when the whole point of our code is to produce side-effects like printing output. We can’t just not do that.

What we should aim for is the “pure core, imperative shell” pattern, trying to seperate the effectful code from the pure code. To keep this split, the mantra I used to have in my mind while coding was “try to keep effects out of the core”, or even just “try not to use effects”. But looking back this didn’t seem to help me much and was misguided: most of the time when you want to use effects, you do actually need them¹. So these days I’ve reframed this into: “try to keep pure code out of the imperative shell”. That is, your effectful functions should focus solely on orchestrating side effects, with all pure calculations handled in separate functions. Unpacking this produces very concrete guidance, that could even be made into a linter (but I’m not sure that would be a good idea).

Effect categories

(You can skip this section, it’s just some category theory background.)

It’s worth taking a short detour through some theory, to understand what is essential to effectful programming, and what isn’t. (For more on this, see my categorical programming language.)

Pure computation (what I like to call “calculation”) is neatly modelled by a cartesian closed category C. Effectful computation is then modelled as a category E with a functor lift : C -> E which lifts pure computations into effectful ones, with certain properties. This is then called an effect category or a Freyd category over C. The category E is usually not cartesian closed, in fact it usually doesn’t even have products. But it usually does have coproducts that are compatible with the underlying category C. So when defining morphisms in E we can:

compose morphisms, because it’s a category,
lift pure computations from E,
do branching (because it has coproducts),
maybe other stuff like recursion (depending on the sort of effects).

In this style of programming, it’s very obvious where the pure calculations are coming from, they all come from lift. Of course one could also replicate to some extent these calculations in E, but since E is lacks even products, this isn’t practical.

In Haskell, the above manifests itself as the Arrow typeclass (see e.g. Freyd is Kleisli, for Arrows (by Bart Jacobs and our co-founder Ichiro Hasuo)):

arr: lifting pure functions
first and second: “wiring” functions for carrying pure values
(|||): handling coproducts (branching) via (in ArrowChoice)

class Category cat where
    id :: cat a a
    (.) :: cat b c -> cat a b -> cat a c

class Category arr => Arrow arr where
    arr :: (a -> b) -> arr a b
    first :: arr a b -> arr (a, c) (b, c)
    -- ... other methods

class Arrow a => ArrowChoice a where
    (|||) :: a b d -> a c d -> a (Either b c) d
    -- ... other methods

Writing code using these abstractions directly gives us insight into the essential constructs of effectful code:

The categorical composition corresponds to the sequencing of effects with data dependency (with ., i.e. >=> for a Kleisli category)
lifting pure computations (arr)
carrying pure values in parallel to effectful computations (first, second, ..)
branching (|||)
possibly more (see e.g. ArrowLoop)

The principle I’m advocating for is simple: when lifting calculations via using arr, we don’t define these inline, instead we refer to a defined function, which can therefore be independently tested and documented.

But writing code with these methods directly is cumbersome, which is why Haskell has arrow-notation and do-notation. So let’s see how this principle translates to do-notation next.

In Practice

Let’s look at what this means for typical monadic code using do-notation. Your effectful code should only consist of only the following:

Sequencing effects, i.e. calling other effectful function and using bind <-,
Branching (if _ then _ else _ or case _ of),
Invoking pure functions in two places:

As the scrutinee of some branching code,
In the arguments to other effectful functions.

This will make your effectful code look a certain way which is very easy to read.

A Concrete Example: Calculator

Here’s a simple calculator that reads two numbers and prints their sum. The first version does it all at once:

calculator :: IO ()
calculator = do
    putStrLn "Enter first number:"
    x <- getInt
    putStrLn "Enter second number:"
    y <- getInt
    putStrLn ("The sum is: " <> show (x + y))

According to the above rules, the illegal thing here is putStrLn ("The sum is: " ++ show (x + y)), because calculator is meant to only orchestrate effects, but it contains a pure calculation: "The sum is: " ++ show (x + y). So, we must move this calculation to another function:

calculate :: Int -> Int -> String
calculate x y = "The sum is: " <> show (x + y)

calculator :: IO ()
calculator = do
    putStrLn "Enter first number:"
    x <- getInt
    putStrLn "Enter second number:"
    y <- getInt
    putStrLn (calculate x y)

By moving all calculations out of the effectful functions, they become less noisy.

More Complex Example: User Registration

Let’s look at a more realistic example , some code to register a user to some web service. First, here’s a version of the code which mixing calculations and effects:

data User = User
  { username :: String,
    email :: String,
    passwordHash :: String,
    createdAt :: UTCTime
  }

registerUserBad :: IO ()
registerUserBad = do
  username <- question "Enter username (minimum 3 characters):"
  if length username < 3
    then putStrLn "Username must be at least 3 characters! Please try again."
    else do
      email <- question "Enter email:"
      if not (('@' `elem` email) && ('.' `elem` dropWhile (/= '@') email))
        then putStrLn "Invalid email format! Please try again."
        else do
          password <- question "Enter password (minimum 8 characters, must contain upper and lowercase):"
          if not (any isUpper password && any isLower password && length password >= 8)
            then putStrLn "Password must be at least 8 characters and contain upper and lowercase letters! Please try again."
            else do
              passwordConfirm <- question "Confirm password:"
              if password /= passwordConfirm
                then putStrLn "Passwords don't match! Please try again."
                else do
                  now <- getCurrentTime
                  let user =
                        User
                          { username = username,
                            email = email,
                            passwordHash = hashPassword password,
                            createdAt = now
                          }
                  saveUserToDB user
                  putStrLn $ "Welcome " ++ username ++ "! Registration successful!"

And here is the improved version:

data User = User
  { username :: String,
    email :: String,
    passwordHash :: String,
    createdAt :: UTCTime
  }

data UserInput = UserInput
  { username :: String,
    email :: String,
    password :: String,
    currentTime :: UTCTime
  }

validUsername :: String -> Bool
validUsername username = length username >= 3

validEmail :: String -> Bool
validEmail email = ('@' `elem` email) && ('.' `elem` dropWhile (/= '@') email)

validPassword :: String -> Bool
validPassword password =
  any isUpper password && any isLower password && length password >= 8

newUser :: UserInput -> User
newUser UserInput {..} =
  User {
    passwordHash = hashPassword password,
    createdAt = currentTime,
    ..
  }

welcomeMessage :: String -> String
welcomeMessage username =
  "Welcome " ++ username ++ "! Registration successful!"

passwordMatch :: String -> String -> Bool
passwordMatch = (==)

registerUserGood :: IO ()
registerUserGood = do
  username <- question usernamePrompt
  if validUsername username
    then do
      email <- question emailPrompt
      if validEmail email
        then do
          password <- question passwordPrompt
          if validPassword password
            then do
              passwordConfirm <- question confirmPasswordPrompt
              if passwordMatch password passwordConfirm
                then do
                  now <- getCurrentTime
                  saveUserToDB (newUser UserInput {currentTime = now, ..})
                  putStrLn (welcomeMessage username)
                else putStrLn passwordMatchErr
            else putStrLn passwordLengthErr
        else putStrLn emailFormatErr
    else putStrLn usernameFormatErr
  where
    usernamePrompt = "Enter username (minimum 3 characters):"
    emailPromp = "Enter email:"
    passwordPrompt = "Enter password (minimum 8 characters, must contain upper and lowercase):"
    confirmPasswordPrompt = "Confirm password:"
    passwordMatchErr = "Passwords don't match!"
    passwordLengthErr = "Password must be at least 8 characters and contain upper and lowercase letters! Please try again."
    emailFormatErr = "Invalid email format! Please try again."
    usernameFormatErr = "Username must be at least 3 characters! Please try again."

This example highlighted some things that crop up when trying to follow this style:

Literals: All the literal strings have been lifted out. For small literals, you might choose to keep them in the effectful code.
Does UserInput {currentTime = now, ..} beak the rules? Some functions you invoke (effectful or pure) take a product type as input. Often you have to assemble the input from various sources, this is part of the orchestration. When using Arrow and desuraging proc-notation, GHC will use a lot of tuples and tuple-manipulating functions. Similarly it’s okay to deconstruct tuples/records in patterns.
In this example the code has become quite nested, there are ways around this (e.g. The trick to avoid deeply-nested error-handling code), but I didn’t want to detract from the main point of this blog post.
You can break the rules in small places. For example sometimes you want to one branch of an if _ then _ else _ to appear before the other, and so you need to use a not in the condition.
Note also that following the principles guides us to creating a new type UserInput, for collecting all the data that needs to be passed to the newUser calculation.

Conclusion

Separating effects from calculations isn’t just theoretical advice - it’s a practical approach to writing more maintainable code. By moving calculations out of the effectful functions, we’ve created code that follows many other principles for maintainable code: small, focused functions, menaingful intermediate types, independently testable units, etc.

I’m not talking here about using a mutable reference cell when you don’t need one, of course. I’m talking about majing network requests, printing to stdout, etc. ↩︎