Implementing union types (i.e. sum types) in Go

http://www.jerf.org/iri/post/2917

 

Sum Types in Go

posted Jun 02, 2013
in Programming, Golang, Haskell

A couple of months back, I analyzed whether I wanted to propose switching to Go for work. I've still technically got the blog post with the results of that analysis in the pipeline (though who knows when I'll get it up), but there's a part of it that keeps coming up online, and I want to get this bit out faster. It's about whether Go has "sum types".


A sum type is a type in a language that can have multiple different kinds of values, which themselves may contain values of differing types.

To put it in C terms, it is a "union" in a struct with an element that says which member of the union is currently the active element. In fact the Wikipedia page on the topic lumps the two together. Different elements of the union may have significantly different memory sizes, so languages that make heavy use of this will not use a literal C union, but this gets the idea across.

A classic example is an AST type that defines what a legal abstract syntax tree expression is, pulled from the Haskell GADT Wikibook page:

data Expr a where
   I   :: Int  -> Expr Int
   B   :: Bool -> Expr Bool
   Add :: Expr Int -> Expr Int -> Expr Int
   Mul :: Expr Int -> Expr Int -> Expr Int
   Eq  :: Expr Int -> Expr Int -> Expr Bool

This says that an expression may be an I for Int type, which then contains an Int value, or it may be a B that contains a Bool, or it may be an Add or Mul expression which itself may contain some expression of type Int, or an Eq expression. This is all recursive, so Mul (Add (I 3) (I 4)) (I 5), the translation of (3 + 4) * 5 in traditional infix syntax, is legal.

Good languages support making it impossible to confuse the elements, so you have to "deconstruct" the sum type to do anything with it. Generally associated with a type like this you'll see a base set of functions that run the type through a case statement. Using the above, we can evaluate the expression with:

eval :: Expr a -> a
eval expr = case expr of
                (I n) -> n
                (B b) -> b
                (Add e1 e2) -> eval e1 + eval e2
                (Mul e1 e2) -> eval e1 * eval e2
                (Eq  e1 e2) -> eval e1 == eval e2

(I transformed the original into a case statement, for better parallels with Go.)

Sum types are a great example of how learning a different language can expand your mind. If you've never used a language with good support for sum types, they sound worthless; after all, you've made it this far without ever needing explicit support, and you can bodge together something that works if you ever need to. But once you've become fluent in a language that heavily uses them, you'll never stop seeing places where you need them, and you'll miss them badly. Two places where sum types are particularly strong are the aforementioned AST example, and any sort of protocol, where a sum type can easily represent the legal messages that can be received, while excluding by construction any message that can not be received. When this is done, the type system itself helps you correctly handle errors at the outermost level, then pass the messages along to an inner layer that need no longer worry about such basic error handling. It's a good idea anyhow, and having the type system help enforce it is even better.

AST types are presumably forever going to be a marginal case in Go. Nothing stops you from writing a compiler in Go, but it will never be a dominant use case for the language. Protocols, on the other hand, appear everywhere; every chan in Go defines a protocol. Often they are quite simple (chan bool can only be so complicated), but they can't all be that simple. A serious Go program will be steeped in protocols internally.

But, alas, as is well known, Go does not contain union types. Except, as the FAQ entry alludes to, it sort of already does. But the FAQ is a faint allusion, and my Googling can't find anyone demonstrating what I consider the correct way (you can do much better than interface{}, which seems to be the current de facto standard), so let me show it to you. Let's translate that Expression type into Go.

package main

import "fmt"

// Rather than a "type", we declare an "interface" for the type
type Expr interface {
	isExpr()
}

// We can't define an implementation of isExpr on int directly, so
// we wrap it
type I int

// And tag it as an "Expr". This function's only point is to do that.
func (i I) isExpr() {}

// And so on:
type B bool
func (b B) isExpr() {}

type Add struct { left Expr; right Expr }
func (a Add) isExpr() {}

type Mul struct { left Expr; right Expr }
func (m Mul) isExpr() {}

type Eq struct { left Expr; right Expr }
func (e Eq) isExpr() {}

func eval (e Expr) (r Expr) {
    switch exp := e.(type) {
        case I: r = exp
        case B: r = exp
        case Add: r = I(eval(exp.left).(I) + eval(exp.right).(I))
        case Mul: r = I(eval(exp.left).(I) * eval(exp.right).(I))
        case Eq: r = B(eval(exp.left).(I) == eval(exp.right).(I))
    }
    return
}

func main () {
    fmt.Println("4 + 5: ", eval(Add{I(4), I(5)}))
    fmt.Println("(3 + 2) * 2: ", eval(Mul{Add{I(3), I(2)}, I(2)}))
    // you'll never see this output:
    fmt.Println("Runtime error: ", eval(Add{I(3), B(true)}))
}

You can try that on the Go Playground, or if that is ever removed, paste that into a file and run with "go run". (I deliberately condensed the code for blog-readability, hit "Format" on the playground if you want to see the official formatting.)

There are a couple of differences in the translation. eval cannot return either a "true" int or a "true" bool (that is, values of the literal Go types int or bool), so we write it to return an Expr unconditionally. We've lost some type safety; Haskell would refuse to compile the third sample call, but Go compiles it and the failed type assertion panics at run time. The type system also doesn't prevent eval from returning an Add, so if we wish to verify that, we must do it by inspection. This is Go's modus operandi, though, so shrug. We do still have some type safety. You can't construct an Expr that is not one of those five things, nor can you accidentally stuff a string anywhere in there, as that will be caught at compile time, nor can you fail to unpack the I or B types out of the Expr before using them in most other code, which also turns out to be a very useful property. (It means you must explicitly unpack it, and handle any errors that occur at that time, immediately, which is good for writing robust code.) I say "most" because there are some implicit conversions, but it's much more sane and constrained than, say, JavaScript.

If your interface declares a lower-case is* function, no users of your package will be able to add to your sum type. I don't think you'd ever want to use this technique with a public tag function, because interfaces themselves cover the use case of wanting something publicly extendable.

Of course, the tag interface is only used if you have no meaningful interface to define for your sum type. If you do have a real interface you can define you don't need a fake is* function, just define the interface. I have code that sends out an "Item" over a socket, where the Item is some component of the protocol in question (may be a list, hash, string, etc), and there's a real interface there, such as the function that takes the object and serializes it to an io.Writer, returns the size in bytes it will have, pretty-prints, etc. Using private names for the interface does ensure though that nobody else can accidentally create a new element in the protocol in another module. In concrete terms, if you do use an is* private method in your interface, as soon as you add any other private method you should just remove the is* one from the code.

Another example I found I was using a lot was defining the following five elements:

  • An interface to declare a protocol type, with a private tag function.
  • All the messages that could be sent on that protocol as one type each, as private types.
  • An object that contained its own data, plus a receiving control channel of the interface type defined above.
  • This object contained public methods corresponding to the messages that could be sent, which constructed the relevant structs, sent them on the control channel, and if there was to be a return value, handled creating the return channel and returning the resulting value. If the control channel can be closed, this should also handle that case.
  • A goroutine that consisted entirely of an endless for{} loop with a switch statement switching on the request coming in, and a defer statement that guarantees we close the control channel if this scope gets exited in any way. More sophisticated protocols could replace the switch with a state machine, though I never needed one.

The end result of this is a goroutine acting as a server, providing easy, mediated access to whatever resource it was controlling, with an extremely easy-to-use external object for talking to it, while at the same time successfully sealing away the details of the chan's protocol inside the implementing module.
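The five elements above might be sketched roughly as follows. This is an illustration under assumptions, not the original code: Counter, request, addReq and getReq are hypothetical names, and shutdown handling (including the deferred close) is left out to keep it short.

```go
package main

import "fmt"

// request is the protocol type: a tag interface with a private tag method.
type request interface{ isRequest() }

// One private type per message that can be sent on the protocol.
type addReq struct{ n int }
type getReq struct{ reply chan int }

func (addReq) isRequest() {}
func (getReq) isRequest() {}

// Counter is the public object. Callers never see the channel protocol.
type Counter struct{ control chan request }

// NewCounter starts the server goroutine and returns the handle to it.
func NewCounter() *Counter {
	c := &Counter{control: make(chan request)}
	go c.serve()
	return c
}

// Add builds the message struct and sends it on the control channel.
func (c *Counter) Add(n int) { c.control <- addReq{n} }

// Get creates the return channel, sends the request, and waits for the reply.
func (c *Counter) Get() int {
	reply := make(chan int)
	c.control <- getReq{reply}
	return <-reply
}

// serve is the goroutine: a loop with a type switch over incoming requests.
// Only this goroutine touches total, so no lock is needed.
func (c *Counter) serve() {
	total := 0
	for req := range c.control {
		switch r := req.(type) {
		case addReq:
			total += r.n
		case getReq:
			r.reply <- total
		}
	}
}

func main() {
	c := NewCounter()
	c.Add(2)
	c.Add(3)
	fmt.Println(c.Get()) // 5
}
```

Because the control channel is unbuffered and a single goroutine handles requests in order, a Get issued after two Add calls from the same caller sees both additions.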

The downside of this approach is that it is awfully verbose for what it is accomplishing, and Go provides no mechanisms for factoring this pattern away. You must declare a tag interface, a type per message, an empty implementation of the tag interface per message, a method per message to send the command (themselves rather redundant, repeatedly copying the function params into a struct, creating the return channels, etc), and then a case statement per message type in the core implementation loop. That's a lot of repetitions of the phrase "per message". But it seemed to be a pretty solid pattern in Go, and this really is a great way to safely create a non-trivial protocol that is easy-to-use for using code.

(I've fiddled a bit with using the go language packages to auto-generate this sort of code. It should be possible, though I'm having some trouble with correctly embedding comments at the moment.)

You also have no compiler support for detecting that you missed a case. Again, Go's m.o., so shrug.
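One common mitigation, shown here as a sketch rather than anything the language enforces, is a default clause that turns a forgotten case into a run-time error instead of a silent zero value. The Neg type and the describe function are hypothetical additions for illustration.

```go
package main

import "fmt"

type Expr interface{ isExpr() }

type I int
type B bool

// Neg is a hypothetical variant added later; describe was never updated.
type Neg struct{ arg Expr }

func (I) isExpr()   {}
func (B) isExpr()   {}
func (Neg) isExpr() {}

// describe handles I and B. The default clause reports any variant that
// the switch does not cover, so the omission is at least visible.
func describe(e Expr) (string, error) {
	switch v := e.(type) {
	case I:
		return fmt.Sprintf("int %d", int(v)), nil
	case B:
		return fmt.Sprintf("bool %t", bool(v)), nil
	default:
		return "", fmt.Errorf("unhandled Expr variant %T", e)
	}
}

func main() {
	fmt.Println(describe(I(4)))
	fmt.Println(describe(Neg{I(4)}))
}
```

This only moves the failure from "wrong answer" to "explicit error"; it is still a run-time check, not a compile-time one.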

So, Go may not "have sum types" but they can easily be implemented in Go, right? I don't know. No two computer languages ever seem to be able to precisely agree on what any two terms mean. Obviously what we have here is not a "Haskell sum type", but this may be the Go-e-ist "sum type" thing you could hope for, within the context of its type system. (Most Go-ic? Maximally Go-ful? What's Go's equivalent to "Pythonic"?) It's certainly stronger than interface{}. Interfaces and sum types conflict because the language already has constructs that do this, and it's always a problem when a language has two constructs trying to do almost-but-not-quite the same thing. I think this covers the major use cases of sum types, and if it doesn't cover all the possible academic variants, that seems in keeping with Go's philosophy.

Given that this is possible, I consider interface{} to be a code smell, unless it is being used to indicate a function that truly takes anything. If you have a function that takes interface{} and the first thing it does is assert that the input is one of a small number of types and panic if it isn't, it should probably be using this pattern. (A possible exception can be made for things like int, but even then I prefer defining types for things as much as possible, so the type system is as helpful as possible.)

Without having any particular ideas on what the solutions may be, though, some sort of way of indicating to godoc that this is a sum type construct would be helpful. The resulting godoc you get today is not very helpful. Also, some way of addressing the boilerplate of the sum-type-protocol description above would be helpful; it is sufficiently verbose that it will tend to inhibit people from using it, even though it's a fairly robust solution. But perhaps that's just an IDE problem.

 

Sum/Union/Variant Type in Go and Static Check Tool of switch-case handling

https://medium.com/@haya14busa/sum-union-variant-type-in-go-and-static-check-tool-of-switch-case-handling-3bfc61618b1e

First, I’ll show how to represent sum/union/variant(-like) types in Go.
Then I’ll introduce ‘gosumcheck’ (https://github.com/haya14busa/gosum), a static lint tool that checks whether a type switch handles all possible cases.

This is a post for Hatena Engineer Advent Calendar 2016 (Japanese).

Sum Type in Go

So, what is a sum/union/variant type? From Wikipedia,

In computer science, a tagged union, also called a variant, variant record, discriminated union, disjoint union, or sum type, is a data structure used to hold a value that could take on several different, but fixed, types.
https://en.wikipedia.org/wiki/Tagged_union

Example from Wikipedia: (binary tree)

datatype tree = Leaf
| Node of (int * tree * tree)

In this example, “tree” is a sum type whose value is either a “Leaf” or a “Node”. OK, then, how can we represent a sum type in Go?

Take a look at the Go FAQ entry on variant types.

We considered adding variant types to Go, but after discussion decided to leave them out because they overlap in confusing ways with interfaces. What would happen if the elements of a variant type were themselves interfaces?
Also, some of what variant types address is already covered by the language. The error example is easy to express using an interface value to hold the error and a type switch to discriminate cases. The syntax tree example is also doable, although not as elegantly.
— https://golang.org/doc/faq#variant_types

Yes, Go doesn’t have variant (sum) types, but we can use an “interface” to represent sum-like types instead. Since the FAQ mentions the syntax tree as an example, let’s take a look at the go/ast implementation.

The “Node” type represents an AST node, and all node types must implement the Node interface. The “Node” interface requires “Pos() token.Pos” and “End() token.Pos”, which return the start and end positions of the node. So, to represent a sum type as an interface, add the common methods to the interface. That is useful for users, and it prevents unexpected types from being assigned to the interface.

The implementations of the “Expr” and “Stmt” types are more interesting. They embed the “Node” interface to show that they are kinds of “Node”, and each interface also has an unexported “dummy” method (“exprNode()” and “stmtNode()”) that marks the data type. So, if a sum type doesn’t have any useful common methods, you can use dummy methods.
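The layout described above can be sketched in a self-contained form. This is modeled loosely on go/ast and is not a copy of it: Pos here is a plain int stand-in for token.Pos, and Ident and ExprStmt are cut down to the minimum. The kind helper is hypothetical.

```go
package main

import "fmt"

// Pos is a simplified stand-in for token.Pos (an assumption of this sketch).
type Pos int

// Node is the "public" part: the common methods every node provides.
type Node interface {
	Pos() Pos
	End() Pos
}

// Expr and Stmt embed Node and add an unexported dummy method each,
// which keeps expressions and statements apart.
type Expr interface {
	Node
	exprNode()
}

type Stmt interface {
	Node
	stmtNode()
}

// Ident is an expression node.
type Ident struct {
	NamePos Pos
	Name    string
}

func (x *Ident) Pos() Pos { return x.NamePos }
func (x *Ident) End() Pos { return x.NamePos + Pos(len(x.Name)) }
func (*Ident) exprNode()  {}

// ExprStmt is a statement node wrapping an expression.
type ExprStmt struct{ X Expr }

func (s *ExprStmt) Pos() Pos { return s.X.Pos() }
func (s *ExprStmt) End() Pos { return s.X.End() }
func (*ExprStmt) stmtNode()  {}

// kind reports which sum type a node belongs to.
func kind(n Node) string {
	switch n.(type) {
	case Expr:
		return "expr"
	case Stmt:
		return "stmt"
	default:
		return "node"
	}
}

func main() {
	id := &Ident{NamePos: 10, Name: "foo"}
	fmt.Println(kind(id), id.Pos(), id.End())
	fmt.Println(kind(&ExprStmt{X: id}))
}
```

Assigning an *ExprStmt to an Expr variable would be rejected at compile time, because *ExprStmt has no exprNode method.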

“internal” interface and “public” interface

And one more thing! If an interface has unexported methods, an external package cannot define types that implement the interface (short of embedding a type that already implements it). For example, external packages cannot create their own “Expr” types. Let’s call this pattern an “internal” interface. For an internal interface, we can list every type that implements the interface because they must all be in the same package.

On the other hand, the “Node” type only has public methods, so external packages can create their own nodes. Let’s call this pattern a “public” interface. In this case, if go/ast doesn’t expect nodes defined in external packages, any public function that accepts a “Node” argument is not safe to call with such a node, and unexpected behavior may happen.

Go Playground: https://play.golang.org/p/dPZ5UQU98S

Just to be on the safe side, it might be better to add an internal method to sum types even when there are public common methods. In addition, the “internal” interface is useful for checking that all possible cases are handled. I’ll explain that later.

How to use sum types (or interface types) in Go

We use an “interface” to represent sum types, so you can use sum types effectively if you know some basic ways to work with interfaces. Let me introduce some tips.

type switch

You can use a “type switch” to discover the dynamic type of an interface variable. You can also use a type assertion, but for sum types a “type switch” is more useful in most cases.

go/ast/#Inspect Example:

// Inspect the AST and print all identifiers and literals.
ast.Inspect(f, func(n ast.Node) bool {
	var s string
	switch x := n.(type) {
	case *ast.BasicLit:
		s = x.Value
	case *ast.Ident:
		s = x.Name
	}
	if s != "" {
		fmt.Printf("%s:\t%s\n", fset.Position(n.Pos()), s)
	}
	return true
})

Ensure a type implements expected interface at compile time

Sometimes we forget to add a required method to a type. In that case the type doesn’t implement the expected interface, but compilation still succeeds as long as nothing assigns the type to that interface. To check that a type implements the expected interface, we can use the following pattern.

// Ensure MyNode implements Node interface at compile time.
var _ Node = &MyNode{}
// You can also check by following line instead.
var _ Node = (*MyNode)(nil)

Go Playground: https://play.golang.org/p/2PKx_jLYGk

Decoding JSON of Sum Type

By the way, I work on the Mackerel team. Mackerel (https://mackerel.io/) is a server monitoring service. I’ll use the Mackerel API client for Go (mackerelio/mackerel-client-go) as an example of sum type usage. Of course, it’s just a practical example, so you can ignore the details of Mackerel.

Sometimes the JSON returned by a REST API contains a “type” or similar field, and you have to look at the value of that field to know the structure of the rest of the object. In such cases, we cannot decode the JSON in a single step.

Mackerel has an API for monitors (https://mackerel.io/api-docs/entry/monitors). There are several monitor types, such as “connectivity”, “host metric”, etc., and the “type” field says which one a monitor is. The monitor types are defined in Go as follows.

type Monitor interface {
	MonitorType() string
	MonitorID() string
	MonitorName() string

	isMonitor()
}

// MonitorConnectivity represents connectivity monitor.
type MonitorConnectivity struct {
	ID                   string `json:"id,omitempty"`
	Name                 string `json:"name,omitempty"`
	Type                 string `json:"type,omitempty"`
	IsMute               bool   `json:"isMute,omitempty"`
	NotificationInterval uint64 `json:"notificationInterval,omitempty"`

	Scopes        []string `json:"scopes,omitempty"`
	ExcludeScopes []string `json:"excludeScopes,omitempty"`
}
// ...

GET /api/v0/monitors returns a list of monitor configurations. Example:

{
  "monitors": [
    {
      "id": "2cSZzK3XfmG",
      "type": "connectivity",
      "isMute": false,
      "scopes": [],
      "excludeScopes": []
    },
    {
      "id": "2cSZzK3XfmG",
      "type": "host",
      "isMute": false,
      "name": "disk.aa-00.writes.delta",
      "duration": 3,
      "metric": "disk.aa-00.writes.delta",
      "operator": ">",
      "warning": 20000.0,
      "critical": 400000.0,
      "scopes": [
        "SomeService"
      ],
      "excludeScopes": [
        "SomeService: db-slave-backup"
      ],
      "notificationInterval": 60
    }
  ]
}

The corresponding method of Go client is FindMonitors()

func (c *Client) FindMonitors() ([]Monitor, error)

There are two ways to decode the JSON and get a list of Monitor values.

1) json.RawMessage

  • decode to a list of json.RawMessage
  • decode each json.RawMessage to get only the “type” value: var typeData struct { Type string `json:"type"` }
  • decode the json.RawMessage into the matching monitor type, chosen by the “type” value.

import "encoding/json"

func decodeMonitorFromRawMessage(rawmes []byte) (monitorI, error) {
	// First pass: read only the "type" field.
	var typeData struct {
		Type string `json:"type"`
	}
	if err := json.Unmarshal(rawmes, &typeData); err != nil {
		return nil, err
	}
	// Pick the concrete monitor type that matches the "type" value.
	var m monitorI
	switch typeData.Type {
	case monitorTypeConnectivity:
		m = &MonitorConnectivity{}
	case monitorTypeHostMeric:
		m = &MonitorHostMetric{}
	case monitorTypeServiceMetric:
		m = &MonitorServiceMetric{}
	case monitorTypeExternalHTTP:
		m = &MonitorExternalHTTP{}
	case monitorTypeExpression:
		m = &MonitorExpression{}
	}
	// Second pass: decode the full object. If no case matched, m is nil
	// and json.Unmarshal returns an error.
	if err := json.Unmarshal(rawmes, m); err != nil {
		return nil, err
	}
	return m, nil
}
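To show all three decoding steps in one place, including the outer step of splitting the response into json.RawMessage values, here is a self-contained sketch. The types Connectivity and HostMetric, the decodeMonitors function, and the sample JSON are simplified, hypothetical stand-ins, not the mackerel-client-go definitions. Unlike the function above, it rejects unknown types explicitly.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Monitor is a cut-down version of the sum type for this sketch.
type Monitor interface {
	MonitorType() string
	isMonitor()
}

type Connectivity struct {
	ID   string `json:"id"`
	Type string `json:"type"`
}

type HostMetric struct {
	ID     string `json:"id"`
	Type   string `json:"type"`
	Metric string `json:"metric"`
}

func (m *Connectivity) MonitorType() string { return m.Type }
func (m *Connectivity) isMonitor()          {}
func (m *HostMetric) MonitorType() string   { return m.Type }
func (m *HostMetric) isMonitor()            {}

// decodeMonitors splits the list into raw messages, peeks at each "type",
// then decodes each message into the matching concrete type.
func decodeMonitors(data []byte) ([]Monitor, error) {
	var envelope struct {
		Monitors []json.RawMessage `json:"monitors"`
	}
	if err := json.Unmarshal(data, &envelope); err != nil {
		return nil, err
	}
	monitors := make([]Monitor, 0, len(envelope.Monitors))
	for _, raw := range envelope.Monitors {
		var head struct {
			Type string `json:"type"`
		}
		if err := json.Unmarshal(raw, &head); err != nil {
			return nil, err
		}
		var m Monitor
		switch head.Type {
		case "connectivity":
			m = &Connectivity{}
		case "host":
			m = &HostMetric{}
		default:
			return nil, fmt.Errorf("unknown monitor type %q", head.Type)
		}
		if err := json.Unmarshal(raw, m); err != nil {
			return nil, err
		}
		monitors = append(monitors, m)
	}
	return monitors, nil
}

const sample = `{"monitors":[
	{"id":"a1","type":"connectivity"},
	{"id":"b2","type":"host","metric":"disk.writes"}]}`

func main() {
	ms, err := decodeMonitors([]byte(sample))
	if err != nil {
		panic(err)
	}
	for _, m := range ms {
		fmt.Printf("%T %s\n", m, m.MonitorType())
	}
}
```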

2) mitchellh/mapstructure

  • decode to list of map[string]interface{}
  • get “type” value
  • convert the map[string]interface{} to the matching monitor type by using mitchellh/mapstructure

func decodeMonitorFromMap(mmap map[string]interface{}) (monitorI, error) {
	typ, ok := mmap["type"]
	if !ok {
		return nil, errors.New("`type` field not found")
	}
	// Pick the concrete monitor type that matches the "type" value.
	var m monitorI
	switch typ {
	case monitorTypeConnectivity:
		m = &MonitorConnectivity{}
	case monitorTypeHostMeric:
		m = &MonitorHostMetric{}
	case monitorTypeServiceMetric:
		m = &MonitorServiceMetric{}
	case monitorTypeExternalHTTP:
		m = &MonitorExternalHTTP{}
	case monitorTypeExpression:
		m = &MonitorExpression{}
	}
	// Reuse the existing `json` struct tags when mapping keys to fields.
	c := &mapstructure.DecoderConfig{
		TagName: "json",
		Result:  m,
	}
	d, err := mapstructure.NewDecoder(c)
	if err != nil {
		return nil, err
	}
	if err := d.Decode(mmap); err != nil {
		return nil, err
	}
	return m, nil
}

You can see the full implementation and benchmark results here.

$ go test -v -run="^$" -bench=Monitor_JSON_ | prettybench
benchmark                               iter    time/iter
---------                               ----    ---------
BenchmarkMonitor_JSON_mapstructure-4    1000    1.98 ms/op
BenchmarkMonitor_JSON_rawmessage-4      1000    1.40 ms/op

Before running the benchmark, I suspected that the first method, json.RawMessage, might be slow because it has to decode the JSON bytes 3 times for each monitor. However, the benchmark shows that it’s faster than the second method, mapstructure, though I guess both are fast enough. Given the benchmark result and the fact that encoding/json is in the standard library, I decided to use the json.RawMessage method to decode the monitors JSON.

gosumcheck — check all cases are handled in type-switch statically

So far I have introduced what a sum type is, how to represent it in Go, some useful idioms for working with sum types as interface types, and how to decode them from JSON. However, we can go further.

One of the best parts of sum types is that, in functional languages like Haskell, Scala, etc., the compiler can verify that a pattern match on a sum type handles all possible cases. In Go we may miss some cases, and when we add a new type to a sum type we must find every pattern match (that is, every type switch statement). It’s better to have the computer check this than to hunt for such cases carefully by hand.

Yes, I created a static analysis tool to do that for Go!

The results include false positives, but it does find plausible bugs!

How It Works

The idea is basically the same as “guru implements” and the “implements” analysis of godoc. We can find the types that implement an interface with the help of the go/types package. gosumcheck gathers all possible types and reports the types that are not handled by any “case” clause. It’s simple, but it works well!
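The core step, finding every type that implements an interface, can be sketched with go/types as below. This is not gosumcheck's actual code; the implementers and typeCheck functions and the embedded demo source are assumptions made for illustration, and the sketch only looks at package-level types in a single import-free file.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
)

// src is a small demo package to analyze (hypothetical).
const src = `package demo

type Expr interface{ isExpr() }

type I int
func (I) isExpr() {}

type B bool
func (B) isExpr() {}

type Neg struct{ arg Expr }
func (*Neg) isExpr() {}

type Unrelated struct{}
`

// typeCheck parses and type-checks a single source file with no imports.
func typeCheck(source string) (*types.Package, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "demo.go", source, 0)
	if err != nil {
		return nil, err
	}
	conf := types.Config{}
	return conf.Check("demo", fset, []*ast.File{f}, nil)
}

// implementers returns the names of the package-level non-interface types
// whose value type or pointer type implements the named interface.
func implementers(pkg *types.Package, ifaceName string) []string {
	scope := pkg.Scope()
	obj := scope.Lookup(ifaceName)
	if obj == nil {
		return nil
	}
	iface, ok := obj.Type().Underlying().(*types.Interface)
	if !ok {
		return nil
	}
	var names []string
	for _, name := range scope.Names() { // Names is sorted
		tn, ok := scope.Lookup(name).(*types.TypeName)
		if !ok || types.IsInterface(tn.Type()) {
			continue
		}
		t := tn.Type()
		if types.Implements(t, iface) || types.Implements(types.NewPointer(t), iface) {
			names = append(names, name)
		}
	}
	return names
}

func main() {
	pkg, err := typeCheck(src)
	if err != nil {
		panic(err)
	}
	fmt.Println(implementers(pkg, "Expr")) // [B I Neg]
}
```

A checker would then compare this list against the types named in the case clauses of each type switch over that interface.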

In the middle of this post, I introduced the “public” interface and the “internal” interface. A “public” interface is an ordinary interface type that has only public methods, so any external package can define its own types that implement it. I guess the “error” interface is the most famous example. An “internal” interface, on the other hand, requires unexported methods, so all types that implement it must be in the same package.

You may have noticed already: an “internal” interface is well suited to “gosumcheck”, because the set of types that implement it is fixed and lives in one package.

But you can also use “gosumcheck” with a “public” interface. Although any external package may add types that implement the interface, we can load all the dependent packages when running “gosumcheck” as a linter. So we can still list the possible types, though some of them may implement the interface only by accident.

Heuristics for suppressing false positive results as much as possible

Since a type switch is not always used for sum types, and there are cases where we don’t have to handle every type, the output can get cluttered with false positives. To address this problem, “gosumcheck” uses the “cover rate” of a type switch to calculate its confidence in a report. I assumed that we usually miss just a few cases rather than lots of them. So, if the cover rate is nearly 100%, gosumcheck reports the remaining missing cases. On the other hand, if the cover rate is low, e.g. under 50%, it assumes that the programmer intentionally left out lots of cases.
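The heuristic can be illustrated with a small sketch. The exact formula gosumcheck uses is not given in this post, so the code below simply assumes that the cover rate is the handled fraction of candidate types and that a switch is reported only when that rate reaches a threshold. The function names missingCases and report are hypothetical.

```go
package main

import "fmt"

// missingCases returns the members of all that do not appear in handled.
func missingCases(all, handled []string) []string {
	seen := make(map[string]bool, len(handled))
	for _, h := range handled {
		seen[h] = true
	}
	var missing []string
	for _, t := range all {
		if !seen[t] {
			missing = append(missing, t)
		}
	}
	return missing
}

// report returns the missing cases only when the cover rate of the switch
// is at least minConfidence. A low cover rate is taken to mean that the
// programmer left the other cases out on purpose. (Assumed rule.)
func report(all, handled []string, minConfidence float64) []string {
	if len(all) == 0 {
		return nil
	}
	missing := missingCases(all, handled)
	coverRate := float64(len(all)-len(missing)) / float64(len(all))
	if coverRate < minConfidence {
		return nil
	}
	return missing
}

func main() {
	all := []string{"I", "B", "Add", "Mul", "Eq"}
	fmt.Println(report(all, []string{"I", "B", "Add", "Mul"}, 0.7)) // [Eq]
	fmt.Println(report(all, []string{"I"}, 0.7))                    // []
	fmt.Println(report(all, []string{"I"}, 0.1))                    // [B Add Mul Eq]
}
```

Lowering the threshold makes the check report switches that cover only a small part of the candidate types, which matches the trade-off described above: more findings, more false positives.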

If you want gosumcheck to report more cases, specify the -min_confidence flag (e.g. $ gosumcheck -min_confidence=0.1 ./...).

I guess we can improve the output quality with other heuristics, so if you have an idea, please open a pull request or post it to the issue tracker: https://github.com/haya14busa/gosum

 

 

posted @ 2018-04-08 14:39 by WeChat official account 共鸣圈