A talk about the core differences in using Go versus Java
Hi, Fandom fans!
Last month I did a Fandom-internal tech talk, on the topic of learning how to develop software in Go, from the perspective of an experienced Java software developer with a deep understanding of the JVM. The same talk was a hit with my own team when I first gave it earlier in the year, but that instance of the talk wasn’t recorded.
Luckily, we had the foresight to record it this time, and now it’s available on YouTube. Fully captioned by me, the presenter. Here it is:
First I’m going to paste my slides into the chat so folks can follow along if they would rather do it themselves, rather than wait for me to turn the pages.
Uh, hi! I’m Donald. I work out of the San Francisco area, so it’s 7 in the morning for me, unfortunately, but I’m happy to be here.
Basically, I put together a talk to discuss the differences between Go and Java, just as… using Java as a reference point, since it’s a relatively popular high-performance programming language. One of these days, I may also do a similar talk for NodeJS.
Um, let me go ahead and start presenting my slides.
[Host:]
Just one question, do you prefer questions straight away, or at the end?
[Me:]
If there’s any confusion, ask questions straight away, but if you have questions about anything that… you want me to go into more detail, I’d rather you hold until the end.
[Host:]
OK, thank you, go ahead.
[Me:]
Alright, um… One of the first places where there’s a noticeable difference between Go and Java, of course, is the standard library. Here are some of the things that Go has as an advantage over Java.
The Go standard library has built-in support for HTTPS servers and clients, with support for both HTTP/2 and TLS 1.3 (which are the latest versions of each, and are significantly different from the previous versions of those protocols).
I believe Java has finally added a decent HTTP client to the standard library, but to my knowledge there still isn’t a built-in HTTP server.
Another nice thing that Go has that Java does not is HTML templates. In particular HTML templates that do context-sensitive auto-escaping. That way you have a guarantee that you aren’t accidentally introducing security problems like Cross Site Scripting attacks, content injection, things like that.
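As a minimal sketch of the context-sensitive auto-escaping described above, the following uses the standard `html/template` package. The `render` helper and the one-line template are made up for illustration; only the package and its `Parse`/`Execute` calls come from the standard library.

```go
package main

import (
	"fmt"
	"html/template"
	"strings"
)

// render is a hypothetical helper: it executes a tiny template against
// untrusted input and returns the resulting HTML.
func render(comment string) (string, error) {
	tmpl, err := template.New("comment").Parse(`<p>{{.}}</p>`)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, comment); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	// The input is placed in an HTML text context, so the markup is escaped.
	out, err := render(`<script>alert(1)</script>`)
	if err != nil {
		fmt.Println("render failed:", err)
		return
	}
	fmt.Println(out) // <p>&lt;script&gt;alert(1)&lt;/script&gt;</p>
}
```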
One nice thing that Go has is a built-in JSON encoder and decoder that can do structured types. In Java, you usually end up using either Jackson or Gson, both of which are monsters to use.
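A short sketch of what encoding and decoding a structured type looks like with the standard `encoding/json` package. The `Article` type and its fields are invented for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Article is a hypothetical structured type used only for this sketch.
// The struct tags choose the JSON field names.
type Article struct {
	Title string   `json:"title"`
	Tags  []string `json:"tags"`
	Views int      `json:"views"`
}

// encode turns an Article into its JSON text.
func encode(a Article) (string, error) {
	b, err := json.Marshal(a)
	return string(b), err
}

// decode parses JSON text back into an Article.
func decode(s string) (Article, error) {
	var a Article
	err := json.Unmarshal([]byte(s), &a)
	return a, err
}

func main() {
	text, err := encode(Article{Title: "Go vs Java", Tags: []string{"go", "java"}, Views: 42})
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println(text) // {"title":"Go vs Java","tags":["go","java"],"views":42}

	back, err := decode(text)
	fmt.Println(back.Title, back.Views, err)
}
```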
Go also has very basic support for XML encoding of structured types. It’s not the sort of thing you’d use to deserialize arbitrary XML, but if you want to create an XML format for some existing structured types that you have, the Go XML encoder is pretty useful.
Go has support for modern cryptography primitives, and the Go standard library is a lot easier to use than OpenSSL, and also a lot easier to use than the equivalent cryptography standards in the Java standard library.
And Go also has robust support for low-level Unix operations: path stuff like stat, chown (“change own”), chmod (“change mod”), which Java added relatively recently, but people still aren’t using java.nio.file.Paths very much.
Go also supports subprocesses in a much more robust way than Java does; you can actually do things you would do in the subprocess module of Python, for instance.
And Go supports signal handlers, which Java does not.
Go also has built-in equivalents to ArrayList and HashMap, so you don’t have to go hunting around in the standard library for those. They’re just available without any imports.
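For illustration, here is a small sketch of the two built-ins in question: a slice playing the role of an ArrayList and a map playing the role of a HashMap. The `wordCounts` helper is hypothetical; note that the data structures themselves need no imports.

```go
package main

import "fmt"

// wordCounts tallies how often each word appears, using the built-in map type.
func wordCounts(words []string) map[string]int {
	counts := make(map[string]int)
	for _, w := range words {
		counts[w]++ // a missing key reads as the zero value, 0
	}
	return counts
}

func main() {
	var words []string // a slice, roughly an ArrayList<String>
	words = append(words, "go", "java", "go")

	counts := wordCounts(words) // roughly a HashMap<String, Integer>
	fmt.Println(len(words), counts["go"], counts["java"]) // 3 2 1
}
```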
On the other hand, Java actually has some advantages itself.
The Java Collections Library in java.util is a lot more complete than anything offered by the Go standard library.
The Java standard library has lots of support for generic types, but Go didn’t get generic types until Go 1.18 was released earlier this year.
Java supports a nice inline syntax for lambdas; Go supports anonymous functions but it doesn’t actually have a lambda syntax, so using function callbacks can be pretty obnoxious in Go.
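To make the verbosity concrete, here is a sketch of passing a callback to the standard `sort.Slice` function. Where Java could take a short lambda, Go needs a full `func` literal with parameter and return types spelled out. The `sortByLength` helper is invented for the example.

```go
package main

import (
	"fmt"
	"sort"
)

// sortByLength returns a copy of words ordered from shortest to longest.
func sortByLength(words []string) []string {
	out := append([]string(nil), words...)
	// The comparison callback is an anonymous function with full type annotations.
	sort.Slice(out, func(i, j int) bool {
		return len(out[i]) < len(out[j])
	})
	return out
}

func main() {
	fmt.Println(sortByLength([]string{"kotlin", "go", "java"})) // [go java kotlin]
}
```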
Java supports anonymous classes that implement a named interface; for Go, if you want to implement any methods on a type you have to actually give it a name, and that name has to be declared at the top level of your current package.
And Java supports custom iterable types; in Go, only built-in types can actually iterate. You can build your own iterator, but you won’t have the nice “for … range” syntax that Go provides.
Object Oriented Programming is extremely different, from Go to Java, or vice versa.
In particular, Go’s philosophy is “Interfaces, Not Inheritance”.
Go interfaces describe a collection of methods that a type has to implement. Go types do not need to declare which interfaces they implement; it’s automatically determined based on the methods you’ve implemented. When you try to use a value as a specific interface Go will check at compile time that the interface is implemented by the value’s type. And then it will automatically synthesize whatever internal bridging stuff it needs to make sure that type can be used as that interface anywhere else in the program.
(In technical terms, it’s synthesizing a vtable (“vee table”) for all the methods that are available through the interface.)
Versus Java, a huge difference is that Go does not have inheritance. Go does have something called “anonymous struct embedding”, which is not quite the same thing, but it can be used to graft default method implementations onto a type.
So, you can declare a struct that implements some methods, and then you can declare some additional types that embed that struct anonymously, and they will get those method implementations for free.
But those default methods don’t actually work like an abstract base class because they can’t actually call methods on your type, they can only refer to the fields within the embedded struct and to the methods already defined on the embedded struct.
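The limitation is easiest to see in a tiny sketch. The `Base` and `Derived` types here are hypothetical: `Derived` redefines `Name`, but the promoted `Describe` method still calls the version defined on `Base`, because there is no virtual dispatch back to the outer type.

```go
package main

import "fmt"

// Base supplies default method implementations.
type Base struct{}

func (Base) Name() string { return "base" }

// Describe calls Name on Base itself; it has no way to reach the outer type.
func (b Base) Describe() string { return "I am " + b.Name() }

// Derived embeds Base and so gains Describe for free.
type Derived struct {
	Base
}

// Name shadows Base.Name for callers of Derived, but not inside Base's own methods.
func (Derived) Name() string { return "derived" }

func main() {
	d := Derived{}
	fmt.Println(d.Name())     // derived
	fmt.Println(d.Describe()) // I am base
}
```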
Here is a worked example of a type that implements an interface and that has an abstract base class.
We have two methods, called “sum” and “product”, and the sum and product are computed over two integers. For the sake of the example, we concretely implement the “sum” method in the abstract base class, but we leave the “product” method undefined for implementations to override. And then in the derived class, we actually implement “product”.
And this is what it looks like when you actually use that code. You construct it using the public constructor, which I actually omitted from this example… Oh, actually I’m using Lombok!… and then you can call the “sum” and “product” methods on that type.
Here is something relatively similar in Go. We’re declaring an interface of “Sum” and “Product” that each return an int. We’re declaring a struct called “AbstractFoo”, which we embed anonymously and use as if it were an abstract base class, and we implement “Sum” on the fields in AbstractFoo. Then in the derived type we actually embed a copy of the AbstractFoo type into our type. So then when we implement the “Product” method we can refer to the fields inside the AbstractFoo as if they were our own fields. And likewise, any methods that were implemented by the abstract type also become our methods.
And at the bottom I have an example of a syntax that’s very commonly used as a compile-time type assertion which is a way of asking the compiler to confirm that the derived type can actually be used as the interface. That it implements all of the methods. That can also be useful as an executable way of telling people reading the source code that you’re intentionally implementing a given interface.
Here is an example of actually using that type in practice.
As you can see, the fact that AbstractFoo is embedded… does not… even though it’s anonymous doesn’t mean that it’s transparent! You actually have to construct the inner struct and then give it as an argument when constructing the outer, derived type. So it’s actually part of your visible contract if it’s a public type.
But once we’ve got the implementation of the Foo interface we can call the methods that are defined on it and everything’s hunky-dory.
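The slides are not reproduced here, so the following is a reconstruction sketch rather than the code from the talk. The names `Foo`, `AbstractFoo`, `Sum`, and `Product` come from the description above; the derived type name `FooImpl` and the field names `A` and `B` are assumptions.

```go
package main

import "fmt"

// Foo is the interface: two methods that each return an int.
type Foo interface {
	Sum() int
	Product() int
}

// AbstractFoo plays the role of the abstract base class.
type AbstractFoo struct {
	A, B int // assumed field names
}

// Sum is implemented once, on the embedded struct.
func (f AbstractFoo) Sum() int { return f.A + f.B }

// FooImpl (hypothetical name) embeds AbstractFoo and so inherits Sum.
type FooImpl struct {
	AbstractFoo
}

// Product can read the embedded fields as if they were FooImpl's own.
func (f FooImpl) Product() int { return f.A * f.B }

// Compile-time assertion: the build fails if FooImpl stops implementing Foo.
var _ Foo = FooImpl{}

func main() {
	// The embedded struct must be constructed explicitly; it is not transparent.
	var foo Foo = FooImpl{AbstractFoo{A: 3, B: 4}}
	fmt.Println(foo.Sum(), foo.Product()) // 7 12
}
```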
Pointers versus Values…
Java only has one type of object passing, which is pass-by-reference, but Go provides two: there’s pass-by-value and pass-by-pointer.
Unlike C++ pointers, Go pointers are type safe and memory safe so most of the headaches that you would have with pointers from C++ are banished. Objects passed by pointer are mutable which is what you would expect from the Java world; objects passed by value are effectively read-only because you’re making a copy of them — a shallow copy, I should note — each time you call a function that takes a pass-by-value.
Go allows you to control whether the “this” reference… “this” is not actually a keyword in Go, but you can call it whatever you want, but I’m using “this” as a signifier for the concept… so Go allows you to control whether your “this” argument is passed by value or passed by pointer on a method-by-method basis.
Methods that take “this” as a pointer can’t be called on temporary values such as a return value from a function that you didn’t store as a local variable, and that’s because Go assumes that if a function takes a pointer for “this” then the function is meant to mutate state, and if it’s meant to mutate state, then it doesn’t make sense for you to throw away the object immediately after calling a state mutator.
Here is an example of two methods implemented on a type; one of them is pass-by-value, and one of them is pass-by-reference… er, pass-by-pointer. So, the basic idea of this type is that it’s an accumulator of integers and then you can ask for the sum later. So, you would call “Record” with an integer, repeatedly, and Record will append the new integer to a list that’s internal to the type, and that requires that we can actually mutate the type so that we can update the list.
But then when we call “Sum”, we don’t actually need to be able to mutate it, so we can make that a pass-by-value instead of a pass-by-pointer. And then we can iterate over the list, and then add it up into an accumulator, and then return the accumulator.
Here is an example of that in reality.
As you can see, at the call time there is no difference between a “this” argument that is passed by pointer versus a “this” argument that is passed by value.
For other arguments that aren’t the “this” pointer… er, the “this” argument… you would actually have to use the & (“address of”) operator, where you put an ampersand before the value to actually create a pointer to it.
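A sketch along the lines of the accumulator described above. The method names `Record` and `Sum` come from the talk; the type name `Accumulator`, the field name, and the `reset` helper are assumptions added to show an explicit `&`.

```go
package main

import "fmt"

// Accumulator collects integers so they can be summed later.
type Accumulator struct {
	values []int
}

// Record has a pointer receiver because it must mutate the accumulator.
func (a *Accumulator) Record(v int) {
	a.values = append(a.values, v)
}

// Sum has a value receiver: it only reads, so a shallow copy is fine.
func (a Accumulator) Sum() int {
	total := 0
	for _, v := range a.values {
		total += v
	}
	return total
}

// reset is an ordinary function taking a pointer argument.
func reset(a *Accumulator) {
	a.values = nil
}

func main() {
	var acc Accumulator
	acc.Record(1) // the call site looks the same; Go takes &acc implicitly
	acc.Record(2)
	acc.Record(3)
	fmt.Println(acc.Sum()) // 6

	reset(&acc) // non-receiver arguments need the explicit & operator
	fmt.Println(acc.Sum()) // 0
}
```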
Error Handling… is another area that Go and Java are very different in. Go has a concept… instead of exceptions, Go uses errors.
Go technically has a mechanism called “panic-recover” that’s similar in functionality to throw-catch, but it’s deliberately hard to use, and you’re not ever really meant to recover from a panic… unless you are implementing something like an HTTP server that wants to recover from panics that happen inside of your individual handler threads.
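For the server-style case mentioned above, a minimal sketch of recovering from a panic might look like this. The `safely` wrapper is hypothetical; `recover` only has an effect when called from inside a deferred function.

```go
package main

import "fmt"

// safely runs handler and converts a panic into an ordinary error value.
func safely(handler func()) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("handler panicked: %v", r)
		}
	}()
	handler()
	return nil
}

func main() {
	err := safely(func() {
		var m map[string]int
		m["boom"] = 1 // writing to a nil map panics
	})
	fmt.Println("recovered:", err)
}
```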
Instead, Go has a simple built-in interface called “error” that functions are supposed to return if they can fail. Go doesn’t actually need an Optional or a Result type, and that’s because functions can return two or more values at once. Now that Go has generics you could actually implement such a thing, but Go got away without generics for so long because of the fact that Go can return type tuples rather than just a single type. Go code linters will automatically prevent you from ignoring an error return value, so sticking with the standard is very advantageous because it’ll mean that the linter can help you.
Here is an example of a function that actually returns an error.
I created a relatively simple function that decodes the first Unicode code point found inside a UTF-8 string. Basically, this is a wrapper around a library function called “unicode/utf8.DecodeRuneInString” (“unicode slash U T F 8 dot decode rune in string”), but the DecodeRuneInString interface is a little bit clunky so we go ahead and wrap it so we can actually return sensible error values instead of cryptic integers.
What we do is we call the library function and then we identify some error cases and then we use “fmt.Errorf” (“format dot error F”) to construct an anonymous implementation of the error interface that will provide us with a useful error message when it’s actually printed.
The error interface is actually just as simple as implementing a function called “Error” that returns a string. So, you can implement certain additional methods that can extend the functionality of a basic error but every error has that as a guarantee.
So, here we have an example of calling “firstRuneInString” and if the error is not nil, then we print it out using the standard formatter that will nicely format an error by replacing it with the actual error message.
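The slide itself is not included, so this is a reconstruction sketch under assumptions: the function name `firstRuneInString` and the use of `utf8.DecodeRuneInString` and `fmt.Errorf` come from the talk, while the exact error messages are invented. The library reports failure as `(utf8.RuneError, 0)` for an empty string and `(utf8.RuneError, 1)` for an invalid encoding.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// firstRuneInString decodes the first Unicode code point of s,
// translating the library's sentinel return values into errors.
func firstRuneInString(s string) (rune, error) {
	r, size := utf8.DecodeRuneInString(s)
	if r == utf8.RuneError {
		switch size {
		case 0:
			return 0, fmt.Errorf("string is empty")
		case 1:
			return 0, fmt.Errorf("invalid UTF-8 at start of %q", s)
		}
		// Any other size means s really does start with U+FFFD.
	}
	return r, nil
}

func main() {
	for _, s := range []string{"héllo", "", "\xffabc"} {
		r, err := firstRuneInString(s)
		if err != nil {
			// Printing an error value prints the string from its Error method.
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("first rune: %c\n", r)
	}
}
```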
Concurrency…
Concurrency is very different between Java and Go as well.
Java uses… these days Java uses a… a pure native thread model, but Go is using a hybrid M-to-N threading model. What that means is, since OS threads are relatively expensive, you don’t want to create more than a few thousand of them, at most. But Go instead uses a concept called “goroutines”, which are a special type of thread that’s implemented inside of the actual Go application rather than being implemented directly on top of an OS thread. So, these goroutines are extremely cheap because the OS doesn’t need to keep track of them.
Go multiplexes these goroutines onto the OS threads using a runtime scheduler that gets compiled into your application, and the runtime scheduler is aware of all blocking operations and concurrency primitives, so it knows when a goroutine is about to block. It can either reserve that OS thread for the blocking operation, if that operation can only block, or it can automatically use OS primitives like the “poll” system call and then switch between goroutines while keeping the number of OS threads constant.
So it’s normal for a Go program to actually have hundreds of thousands of live goroutines even though the Go program only has a few tens of OS threads. Normally, the number of OS threads that Go will allocate is proportional to the number of cores that are available on your runtime environment.
When one goroutine blocks, the OS thread switches to another one; and when one goroutine unblocks another goroutine, the OS thread of the first goroutine will automatically switch to the second goroutine. So, that effectively means that one thread can wake up another instantly and take over the time slice of the OS thread without involving the OS-level thread scheduler. That means that the OS thread scheduler doesn’t actually have to keep track of which goroutines are running. It doesn’t even have to be aware that there was any kind of a context change at all, because the goroutines are completely invisible to the OS.
Go vs Java…
Goroutines are not joinable compared to a Java thread, but the Go standard library does include a type called “sync.WaitGroup” (“sync dot wait group”) that makes it really easy to fake that. If you want to keep track of whether a thread is still alive, you create a WaitGroup, you add 1 for each thread that you’re spawning, each thread calls “Done” when it finishes, and then you call “WaitGroup.Wait” (“wait group dot wait”) to block until the WaitGroup goes down to 0.
Except for the goroutine that calls your “main” function, goroutines don’t keep your process alive — which in Java terms means they are called “daemon threads”.
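A minimal sketch of using `sync.WaitGroup` as a substitute for joining threads. The `sumSquares` function is made up for the example; each goroutine writes to its own slot of the results slice, so no further locking is needed.

```go
package main

import (
	"fmt"
	"sync"
)

// sumSquares computes 0² + 1² + … + (n-1)² with one goroutine per term.
func sumSquares(n int) int {
	var wg sync.WaitGroup
	results := make([]int, n)
	for i := 0; i < n; i++ {
		wg.Add(1) // one count per goroutine spawned
		go func(i int) {
			defer wg.Done() // decrement when this goroutine finishes
			results[i] = i * i
		}(i)
	}
	wg.Wait() // block until the count drops back to 0

	total := 0
	for _, r := range results {
		total += r
	}
	return total
}

func main() {
	fmt.Println(sumSquares(5)) // 30
}
```

Without the `wg.Wait()` call, `main` could return before the goroutines ran, and the process would exit without waiting for them.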
Java threads have a fixed stack size which typically defaults to 512 KiB or 1 MiB, depending on the platform; that means that each thread is reserving at least half a megabyte of stack whether you’re using it or not. Goroutines start with a 2 KiB stack size, but it can only get away with that because each goroutine can actually grow its stack dynamically as needed.
So, if an individual goroutine needs a deeper stack than that, it will allocate more, but it doesn’t have to plan ahead for the largest possible stack, which is what Java has to do.
One other nice thing that Go has is channels.
It’s a built-in concurrent data structure… a channel allows you to build very complex algorithms that share data only via message passing, instead of via sharing memory. That means you can write a lot of your code without ever needing to keep track of mutexes or other synchronization primitives. You can just use channels.
A channel can be buffered or unbuffered. A buffered channel works like a fixed-capacity concurrent queue. You put some work in on one end, and you pull some work out on the other end, and the queue can hold up to some number N of work items and putting work into the queue will block if the queue is already full.
An unbuffered channel works kind of like a mutex or a condition variable… you try to write to the unbuffered channel, and you block until something is ready to read from the unbuffered channel. And likewise, if you read from an unbuffered channel and there’s nothing in there, then of course you block until somebody writes. So that means that it’s a pure handoff, where it’s essentially like one thread passing an open mutex to another thread. Or a condition variable, where you write to a shared location and then you signal to the condition variable that the consumer of that variable should wake up.
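The difference can be sketched with a small probe. The `trySend` helper is hypothetical; it uses a “select” with a “default” case as a non-blocking send, so it reports whether a plain send would have blocked.

```go
package main

import "fmt"

// trySend reports whether v could be sent on ch without blocking.
func trySend(ch chan int, v int) bool {
	select {
	case ch <- v:
		return true
	default:
		return false
	}
}

func main() {
	// A buffered channel accepts sends until its capacity is used up.
	buffered := make(chan int, 2)
	fmt.Println(trySend(buffered, 1), trySend(buffered, 2), trySend(buffered, 3)) // true true false

	// An unbuffered channel has no capacity: with no receiver waiting, a send would block.
	unbuffered := make(chan int)
	fmt.Println(trySend(unbuffered, 1)) // false

	// With a receiver running, the send becomes a direct handoff.
	done := make(chan struct{})
	go func() {
		fmt.Println("received", <-unbuffered)
		close(done)
	}()
	unbuffered <- 42 // blocks until the goroutine above receives
	<-done
}
```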
You can read from or write to multiple channels concurrently in a single operation using the “select” statement. The “select” statement guarantees that only one select case will ever run, each time you run the “select” statement, so “select” is normally used in the body of a loop.
Essentially, it’ll pick the first select case that’s capable of running, [and] if multiple select cases become capable of running at the same time, it’ll pick one arbitrarily.
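A small sketch of a “select” statement waiting on several channels at once. The `firstReady` function and its return strings are invented; `time.After` is the standard library call that delivers a value on a channel after a delay.

```go
package main

import (
	"fmt"
	"time"
)

// firstReady waits for whichever happens first:
// a value on a, a value on b, or the timeout expiring.
func firstReady(a, b <-chan string, timeout time.Duration) string {
	select {
	case v := <-a:
		return "a: " + v
	case v := <-b:
		return "b: " + v
	case <-time.After(timeout):
		return "timeout"
	}
}

func main() {
	a := make(chan string, 1)
	b := make(chan string, 1)

	b <- "hello"
	fmt.Println(firstReady(a, b, time.Second)) // b: hello

	// Both channels are now empty, so only the timeout case can run.
	fmt.Println(firstReady(a, b, 10*time.Millisecond)) // timeout
}
```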
So, here’s the longest part of my presentation, which is a complete, actual, worked example of using channels and multiple threads to implement some actual functionality.
This is a type called MongoBatchWriter, and the idea of MongoBatchWriter is that we are trying to batch up writes to MongoDB so that we can reduce the amount of traffic between the client and the server using bulk operations.
Basically, we are keeping track of the Mongo client itself… We’re using a WaitGroup to keep track of all the outstanding threads so we can shut down cleanly… We keep a channel that will be used internally for incoming writes, and that’s called “writeChan”… We keep a thread that does flushes, and that thread is asynchronous so when somebody asks for a flush, we need to do the flush and then tell them that it completed, and we do that by using nested channels…
I’ll give an example of how that’s actually used in a little bit.
And then… there’s a “done” channel used internally that lets us know when an individual bulk operation completes, so that we can tell when the flush completes because the flush will require that (all bulk operations of an ID number less than or equal to when we started) have completed.
OK, so the “NewMongoBatchWriter” is a function that we’re using as a constructor wrapper… Basically all it does is populate the fields and make sure that the channels are actually allocated… And then we create a thread called the “gatherer thread” that will do the actual channel polling, and will build up batches, and then send them out to flush threads as we need them… And since we’re spawning a thread, we add 1 to the WaitGroup and then later we will block on that WaitGroup to verify that the gatherer thread has closed.
Here are the implementations of the public methods: “Write”, “Flush”, and “Close”.
For a Write, we just send it in to the write channel.
For a Flush, we create an unbuffered channel that takes no data, and then we send it in to the flush channel, and then we block until our wait channel is closed. And that will let us know that the flush that we requested has completed. So, writing to the flushChan is a request that we want to start a flush and the wait channel being closed is our confirmation that the flush has completed… so the waitChan read is expected to block until the flush is complete.
And then we Close the MongoBatchWriter by closing the write channel and the flush channel and then blocking until the gatherer thread has completed plus all flush threads.
This is a rough outline of the gatherer thread[‘s] internal code. We build up a queue, which is a pre-allocated array that has capacity for up to 1,000 items. That will automatically grow later, we’re just guessing the max number it should ever have… but… although in this case we never exceed that. We create a “Ticker” that flushes automatically every 30 seconds, even if nobody has asked for a flush, and that guarantees that if we have fewer than 1,000 writes, they will eventually make it to disk in some finite amount of time.
And then we loop on our “select” statement… As you can see, the “select” statement syntax is very similar to a “switch” statement. We read from our write channel, flush channel, and done channels… and for write and flush, those can be closed, so we take the two-argument form of a channel read where the second argument is a boolean letting you know if the channel was successfully read from, or if it was closed. “ok” will be true if we actually got data out of the channel; “ok” will be false if the read completed because the channel was closed.
I have omitted several passages of error checking and checking of the “ok” values so this is very simplified from the real code, but it should give you the flavor…
Basically, if we receive a write, we add it to the queue and we call an internal function (which is actually defined as a closure) called “flush” that will only do a flush if 1,000 items are in the queue.
And then, if we read from the “flush” channel, then we start a flush that will flush even if there’s only one item in the queue, and then we add the resulting wait channel to notify the caller to the list of blocked flushes — where a “blocked flush” means that there is a flush thread currently active and we’re waiting for the done channel to report that that flush ID has actually been completed.
And then when we receive a read from the “done” channel… the done channel is written by the individual flush threads… and then we call a closure called “done” that does the actual logic of closing all the wait channels…
And then whenever our Ticker ticks every 30 seconds we also do an unconditional flush… which is a flush that has a threshold of “1 or more items”.
And then when we’re done looping, because “ok” returned false, then we stop the Ticker, we do some cleanup work, and then we call “wg.Done” (“wait group dot done”) to let the Close operation know that this thread has completed.
“blockedFlush” is a very simple struct… We have a couple of state variables that were omitted from the last listing that are internal to the gatherer thread…
And “flush” is defined as a closure inside the gatherer thread that just does a few things that use those state variables… Basically, all we’re doing is we’re allocating a new flush identifier that uniquely identifies this flush operation… We’re adding it to the list of flushes that are currently pending… And then we create a flush thread and add it to the WaitGroup.
And then when a flush thread completes, we delete it from the list of active flushes… We increment the “done” counter to know… to the maximum that we can so that “doneFlushID” should equal the highest… the highest flush ID that has been completed so far. And then, when we go through the list of blocked flushes, if the doneFlushID is greater than the flush ID that the blocked flush is waiting on, then we go ahead and close the notification channel; otherwise we keep a copy of it for the next round… for the next time “done” gets called.
The flush thread is actually fairly simple. All it does is grab a handle to the Mongo collection, and then call BulkWrite… And it does some error checking, and we use “defer” to guarantee that some code runs: most notably writing to the “done” channel and telling the WaitGroup that this thread has completed.
(As explained in the comment, “defer” is sort of like a Java “try/finally” block.)
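To show the try/finally flavor of “defer” in isolation, here is a small sketch. The `process` function is hypothetical; it records its steps so the ordering is visible.

```go
package main

import "fmt"

// process records each step it takes so the caller can see the ordering.
func process(fail bool) (steps []string) {
	// The deferred function runs when process returns, on every return path,
	// much like the body of a Java finally block.
	defer func() { steps = append(steps, "cleanup") }()

	steps = append(steps, "start")
	if fail {
		return steps // early return: cleanup still runs
	}
	steps = append(steps, "finish")
	return steps
}

func main() {
	fmt.Println(process(true))  // [start cleanup]
	fmt.Println(process(false)) // [start finish cleanup]
}
```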
OK, that’s the entire worked example of MongoBatchWriter.
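The slides for this example are not reproduced, so what follows is a much-reduced sketch of the same shape, not the code from the talk. Assumptions: the Mongo client is replaced by a hypothetical `sink` callback, flushes run inline in the gatherer instead of on separate flush threads (so there is no “done” channel or flush ID bookkeeping), and `Close` only closes the write channel.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// BatchWriter is a simplified stand-in for the MongoBatchWriter described above.
type BatchWriter struct {
	wg        sync.WaitGroup
	writeChan chan string
	flushChan chan chan struct{} // nested channel: each request carries its own wait channel
}

func NewBatchWriter(batchSize int, interval time.Duration, sink func([]string)) *BatchWriter {
	w := &BatchWriter{
		writeChan: make(chan string),
		flushChan: make(chan chan struct{}),
	}
	w.wg.Add(1) // one count for the gatherer goroutine
	go w.gather(batchSize, interval, sink)
	return w
}

// Write hands one item to the gatherer.
func (w *BatchWriter) Write(item string) { w.writeChan <- item }

// Flush sends a fresh wait channel to the gatherer and blocks until it is closed.
func (w *BatchWriter) Flush() {
	waitChan := make(chan struct{})
	w.flushChan <- waitChan
	<-waitChan
}

// Close stops the gatherer, which flushes whatever is left before exiting.
func (w *BatchWriter) Close() {
	close(w.writeChan)
	w.wg.Wait()
}

func (w *BatchWriter) gather(batchSize int, interval time.Duration, sink func([]string)) {
	defer w.wg.Done()
	queue := make([]string, 0, batchSize)
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	// flush is a closure: it sends the queue to the sink once it holds
	// at least threshold items, then starts a new queue.
	flush := func(threshold int) {
		if len(queue) >= threshold {
			sink(queue)
			queue = make([]string, 0, batchSize)
		}
	}
	for {
		select {
		case item, ok := <-w.writeChan:
			if !ok { // channel closed: final flush, then exit
				flush(1)
				return
			}
			queue = append(queue, item)
			flush(batchSize)
		case waitChan := <-w.flushChan:
			flush(1)
			close(waitChan) // tells the caller of Flush that it completed
		case <-ticker.C:
			flush(1)
		}
	}
}

func main() {
	w := NewBatchWriter(2, time.Minute, func(batch []string) {
		fmt.Println("flushing", batch)
	})
	for _, s := range []string{"a", "b", "c"} {
		w.Write(s)
	}
	w.Flush()
	w.Close()
}
```

Running flushes on separate goroutines, as the talk describes, lets the gatherer keep accepting writes while a bulk operation is in flight, at the cost of the extra bookkeeping that this sketch leaves out.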
So… some of these features that Go has, the Java folks have actually been inspired by and they have a “JDK Enhancement Proposal” (JEP) to implement a feature called “virtual threads”. Virtual threads would basically be goroutines for the JVM, where you use a hybrid threading model and I believe they’re also trying to reduce the stack size so the stack size is growable.
There’s no word yet on whether channels or “select” will come to OpenJDK, but that will probably be a future JEP, implemented separately from the Virtual Threads JEP.
For more information, search for “Project Loom”. It’s coming as a preview to JDK 19, I believe. We’ll see if it actually makes it to production for the next stable JDK.
And my final section, “Memory Usage and Optimization”…
Go and Java are very different in this respect.
The biggest thing is that it’s easier to be memory efficient in Go. Go is a lot more aggressive at placing local variables on the stack. All allocations are subject to what’s called “escape analysis” and escape analysis determines whether or not any pointers to an object outlive the scope where the object was allocated. So if none of the pointers outlive the scope where the object is allocated, then the object can be allocated inside the stack of that scope. And that means that, at compile time, if you can prove that the memory will never be referenced after the function call ends, [then] you can just make it part of the function call stack, and stack RAM is extremely cheap to free because freeing stack RAM is just changing a pointer.
(A stack is essentially a bump allocator for an individual thread. And bump allocators are the cheapest possible allocators. The problem is that they don’t let you free things in arbitrary order.)
Go’s escape analysis is very powerful, because it works across package boundaries. The compiler records, for each function it compiles, whether that function holds on to pointers to its arguments. And that means that Go can actually see not just what functions exist in your current package; it can also see whether the functions that you’re calling in other packages do or don’t hold on to pointers to their arguments.
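As a rough illustration of the distinction, here are two hypothetical functions. In the first, no pointer outlives the call, so the value can stay on the stack; in the second, the returned pointer outlives the call, so the value is expected to escape to the heap. Building with `go build -gcflags=-m` prints the compiler's actual decisions, which can differ from this sketch once inlining is involved.

```go
package main

import "fmt"

type point struct{ x, y int }

// sumLocal's point never leaves the function,
// so the compiler is free to keep it on the stack.
func sumLocal(x, y int) int {
	p := point{x, y}
	return p.x + p.y
}

// newPoint returns a pointer that outlives the call,
// so p is expected to escape to the heap.
func newPoint(x, y int) *point {
	p := point{x, y}
	return &p
}

func main() {
	fmt.Println(sumLocal(2, 3)) // 5
	p := newPoint(2, 3)
	fmt.Println(p.x + p.y) // 5
}
```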
Go makes it easy to embed the actual bytes of one complex type inside of the bytes of another complex type. Java requires that… each “java.lang.Object” (“java lang object”) has to be allocated individually, and usually on the heap. Only primitives: boolean, int, long, short… et cetera, et cetera… can actually be embedded directly into your object. Anything that’s a class has to be allocated as a separate object, and objects have overhead.
Go has zero overhead at runtime for embedded types; but Java implements a bunch of fields that are related to GC… and each Java object also implements a mutex so that the “synchronized” keyword can work.
That adds up very fast. The overhead for a single Java object on 64-bit… on HotSpot the object header alone is 12 to 16 bytes, and each nested object adds a reference plus alignment padding on top of that. Per object.
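On the Go side, the zero-overhead claim can be checked directly with `unsafe.Sizeof`. The `Point` and `Rect` types are hypothetical: a `Rect` that embeds two `Point` values is exactly twice the size of one `Point`, with no headers or pointers in between.

```go
package main

import (
	"fmt"
	"unsafe"
)

// Point is two 32-bit integers: 8 bytes.
type Point struct{ X, Y int32 }

// Rect stores the bytes of both Points inline.
type Rect struct{ Min, Max Point }

// sizes reports the in-memory size of each type in bytes.
func sizes() (point, rect uintptr) {
	return unsafe.Sizeof(Point{}), unsafe.Sizeof(Rect{})
}

func main() {
	p, r := sizes()
	fmt.Println(p, r) // 8 16
}
```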
So comparable server programs written in idiomatic Java (versus idiomatic Go) are often ten times more bloated in RAM use, in my experience. For example, a server that takes 512 megabytes of RAM at runtime in Java can often run in 50 megabytes of RAM at runtime in Go. For the same load and the same use case.
Basically, the GC [in Go] is vastly different because Go takes advantage of 25 years of language design history to have one of the best GC systems in wide use, beating Java in most ways. That’s a very bold statement, because Java’s GC is known to be one of the most advanced GCs in the industry.
Here are some of the differences between the Go GC versus OpenJDK’s GC… or default GC algorithms…
Go never defragments or consolidates heap memory, and that’s because Go was designed for 64-bit architectures, and there’s no longer any real need to do this on a 64-bit architecture. Basically, defragmentation is important if you have a limited amount of virtual address space, but if you have 64 bits of address space, you’re never going to use it all, so there’s no worry about fragmentation… And because there’s no defragmenter, Go never moves objects around in RAM… And because Go never moves objects around in RAM, “stop the world” pauses are extremely short and extremely rare.
Go is conceptually similar to JVM’s “Incremental Concurrent Mark and Sweep” (ICMS), but ICMS mode can hit what’s called a “concurrent mode failure”, when threads are allocating memory too quickly. But in Go, goroutines that outpace the dedicated GC threads — if they’re allocating memory too fast, they get… those goroutines get temporarily rescheduled, and the OS thread gets forced into doing GC work, and that helps the GC to catch up.
That does add some latency to your program, if you have heavily allocating threads, but it does mean that it’s the threads that do the allocating that get punished. It’s not the threads… that are running… that are being nice to the GC. So… if your… if you have some threads that are “messy”, and other threads that are very “clean”, in terms of how much allocation they do, then it’s only going to be the “messy” threads that get punished. And therefore, any thread that you write that is nice to the garbage allocator… nice to the garbage collector… will also give you nice latency.
Java’s GC algorithms are tuneable, but they are too tuneable. They are notoriously fiddly… you have to tune them correctly or they will give you terrible performance… you have to tune them correctly or they will stop your application for ridiculously long periods of time…
In contrast, Go’s GC has almost no tuneables. Go does not use a generational heap; that’s what the stack is for. The environment variable “GOGC” (“go g.c.”) specifies the ratio between “objects found live during the last GC pass” versus “objects allocated since then”. Go will dynamically resize the maximum heap size, up OR down to pace with object allocation rates; so object allocation… [if] objects are being allocated too fast then the max heap will be grown and that will slow down the number of [GC cycles] needed; And if object allocation rates then slow down later it will actually decrease the heap size so that [GC cycles] happen more frequently. And it will try to maintain a specific ratio of… of objects live versus objects allocated in order to tell the GC thread how often it should be running.
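The same knob can also be set from inside a program with `debug.SetGCPercent` from the standard `runtime/debug` package, which returns the previous setting. The `setGCTarget` wrapper is a hypothetical name used only for this sketch. A value of 100 means a new GC cycle is triggered once the heap has grown by 100% over what was live after the previous cycle.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// setGCTarget applies a new GOGC-style percentage and returns the old one.
func setGCTarget(percent int) int {
	return debug.SetGCPercent(percent)
}

func main() {
	// Let the heap grow to roughly 3x the live data before collecting.
	previous := setGCTarget(200)
	fmt.Println("previous GC percent:", previous)
}
```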
There’s one noteworthy downside to the Go GC… and that’s that Go doesn’t provide an easy way to tell the GC that you would rather waste RAM than waste CPU. But there’s a solution to that, and it’s called a ballast.
A ballast is a chunk of garbage that you deliberately leak and then you never access it again. And ballast doesn’t count against your RAM usage as far as the OS is concerned because you never actually read [from] or write [to] it. It’s created zero-allocated but those zero pages are actually lazy and if you never actually read from them or write to them they never get physically allocated to physical RAM. It only tricks the GC in Go into believing that you have 1 gigabyte more of heap usage than you actually do.
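A ballast can be sketched in a few lines. The `newBallast` helper is hypothetical, and the example assumes a 64-bit system where a 1 GiB virtual allocation is unremarkable; `runtime.KeepAlive` is the standard library call that keeps the GC from considering the slice dead before that point.

```go
package main

import (
	"fmt"
	"runtime"
)

// newBallast allocates size bytes that are never read or written,
// so the OS does not need to back them with physical pages.
func newBallast(size int) []byte {
	return make([]byte, size)
}

func main() {
	ballast := newBallast(1 << 30) // 1 GiB of address space, counted as live heap by the GC

	// ... the real program would run here ...
	fmt.Println("ballast bytes:", len(ballast))

	// Keep the ballast reachable until the program is done.
	runtime.KeepAlive(ballast)
}
```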
Uh, questions?
[Host:]
Alright, we have 5 minutes for questions. Do we have any questions?
[Me:]
Going once…
Going twice…
Sold! To the beautiful Nobody.
[Host:]
Thank you very much. It was a really great presentation. I’m going to stop… my recording now…