Background: An Architecture for Millions of Things

Note: I'm working on a mind-blowing, super secret update. I don't know how long it will take so I will bridge the time in between with a series of more abstract posts, covering some of my current thoughts!

Also, soon I will be moving countries again, can you guess where I'm going?

In this blog post I will again talk about my current development of fundamental technology for Citybound. I want to make sure that Citybound will be about millions of things interacting and not just thousands.

This requires not just the quantitative kind of effort like "this will need to be optimized by X%", but it actually requires a qualitatively different architecture from the very beginning. Let me tell you how I found it and what it's all about.

This post is a long one (~10 min read), but it will teach you important things about computers using some nice metaphors and a little cynicism.

1. Laying out Things in Memory

Disclaimer: if you already know everything here, feel free to read this section faster. If you feel like nothing makes sense, feel free to skip to section 2 of this blogpost, which is much cooler. No one will judge you for it.

If you think about it, simulating as many things in Citybound as possible just means taking as little time as possible to simulate each individual thing. The time required depends on:

  1. how much extra work needs to be done because of stupid algorithms
  2. how much extra work needs to be done because of very generic solutions applied to very specialized problems
  3. The harsh physical reality of how today's computers work

Before even starting to write stupid algorithms, I decided to at least get 2. and 3. under control and focused on the problem in computing nowadays:

RAM alone is about 400 times too slow at feeding the CPU with data. The ugly kludge (that became normal) is to introduce layers of caches, which are successively tinier but faster memory units that give you a tunnel vision copy of a part of the RAM at an acceptable access speed.

As you can imagine, the challenge of still seeing important information when you have tunnel vision is to find a very good strategy for choosing where you move your center of attention next, ideally based on what you're currently seeing or what you just saw a couple moments ago.

This is in fact exactly what a modern CPU does: ahead of time, it shifts its center of attention to the parts of memory that your program probably wants to look at next.

If the CPU predicts this place correctly, your program will be able to get the information it needs very quickly and everything is good.

If the CPU mispredicts (this is called a cache-miss), it now has to fix its mistake and slooooowly bring the data that your program actually wanted into the cache/center of attention - all while your program is already waiting and can't do anything useful.

The problem is that the center-of-attention-prediction-rules of the CPU have to be very simple, else the CPU might spend too much time just thinking about these, not doing anything useful anymore. Another problem is that all the very different programs running on your computer can have very different memory access patterns and even if you tried to be very clever, it would be hard to come up with better general prediction rules than the ones we have.

This is the first example of a lowest-common-denominator general solution to a problem, where almost everybody trades off lost potential for not having to implement something in a specialized way themselves.

In the case of CPUs we are basically forced to accept and work around the tradeoff, in software we have more luck, as you will see further down.

To understand what we need to work with, let's have a look at the very few CPU center-of-attention prediction rules:

  • if your program just looked at some data it will probably also want to look at data around it
  • if your program just looked at some data it will probably want to look at that same data again very soon
  • if your program just looked at data here and then at data 135 bytes further to the right, it probably wants to do a walk through memory with a step size of 135 bytes

So if you want to write a really fast program, it all comes down to following these clichés that the CPU has about programs.

Now, if you think about it, this challenge is not even that much about your program, but mostly about how you lay out your data in memory, since that will determine the access patterns your program has to do.

In general, it seems like a good idea to keep related data (for example the properties of a building, or the list of all buildings) as close together as possible, in a contiguous chunk of memory, without gaps and laid out in fixed step sizes if you want to walk over that data.

Additionally, you really want to make sure that you store your data as compactly as feasible, not because of RAM limitations (there is plenty), but because you want to cram as much useful data into the tiny center of attention at once as possible, so you have to do less of this very expensive moving of attention. But don't try to be too clever! Your data can only be as compact as you can still read it directly without hassle - else you will just spend time trying to decipher overly compressed data instead.

Side note: this is just one example of the many almost Zen-like conflicts you encounter in high-performance computing. Often, all you can really do is trade off, rarely you can find a solution that transcends the conflict and improves both constraining metrics at the same time.

So far, this is actually very widely understood and common practice in video games, computer graphics and scientific simulation whenever you need to simulate millions of something very fast.

The more you read about the subject, the more you will notice, however, in how many cases the things that you have millions of are exactly identical to each other in one very important way: the size of the data representing each thing.

Particles are a good example: you can simulate a fluid with millions of particles that each consist of exactly a position and a mass, not more, not less. You can draw impressive game graphics consisting of millions of triangles, each consisting of exactly three points and some fixed set of properties, not more, not less. You can simulate a huge flock of birds, each having only a direction and a speed, not more not less.

You get the idea. But what am I hinting at, what's missing? And why is it relevant for Citybound?

To say it abstractly: it's already impressive that we can simulate millions of very similar things, but everything that is interesting about Citybound will come from the fact that it will simulate millions of unique individuals, having some similarities, but interacting in amazingly complicated, lifelike ways exactly because of their differences.

Like a corner shop might sell 10 different kinds of goods, but a supermarket might sell 100 different kinds of goods. They're still both shops. Even worse: the same supermarket at one point might serve 100 customers, later 15 and on a special occasion more than 300! How to fit that into any reasonable fixed amount of memory except by wasting a lot of space most of the time to accommodate the worst case?

The common practice solution to this problem is to separate out the differently-sized or growing/shrinking parts (let's call them "dynamic" parts) of a thing from the constant-size part: You simply ask the memory allocator of your system to give you a custom-tailored region somewhere in memory for each dynamic part and you just store a (constant-size) reference to that region in your constant-size part. If the dynamic part outgrows its tailored region, you simply copy it to a bigger tailored region, give the small region back to the system and update your reference accordingly.

That all sounds very neat and works great for most programs, the problem is exactly the somewhere in memory part. Memory allocators are very keen on keeping the overall memory at least somewhat clean and organized to be able to fulfill the requests of, once again, all kinds of programs, without wasting a lot of space.

The memory allocator, similarly to the CPU cache-predictor, is forced to be incredibly dumb to be fast enough. So it has basically no helpful assumptions about when your program will need how much memory, whatsoever.

This is the second example of a lowest-common-denominator general solution to a problem, but this time, we'll be able to do something about it!

The result is that the dynamic parts of things (whose const-size parts you laid out neatly in succession) actually end up all over the place in memory, out of order, or even worse, just the dynamic parts of one thing end up in very different places. (The commonly used nickname for this free-for-all system-managed area of memory is "heap". Seems appropriate, now that I think about it.)

If you now remember our CPU access-prediction clichés, you will realize that our all-over-the-place dynamic data flies completely in the face of the expectations of the CPU, leading to lots and lots of painfully slow cache misses.

But even for this bad job the allocator is (maybe forced to) doing, it needs huge amounts of bookkeeping that make asking it for memory regions even more painfully slow.

That is why high-performance programmers usually avoid asking the allocator for something like the pest and try to organize stuff themselves in just one or a couple huge regions.

If you do memory management yourself in this way, and you want something more complicated than collections of constant-size things, it looks like you have to invest a lot of work and you will just recreate the bad general-case solution of memory management.

That doesn't have to be true, however. By looking very carefully at what I need, I came up with a custom memory-management system, specialized for Citybound that offers great fit with CPU clichés with almost no bookkeeping required, that is very straightforward to implement.

Here is the simple gist of it:

  • dynamic parts of a thing always are stored right after its constant-size part, in a contiguous chunk
  • things of roughly similar total chunk sizes (constant + dynamic part) are stored in one "bucket", laid out contiguously in fixed-size slots, with a little leeway for growth inside
  • there are buckets for each existing size of a thing
  • if a thing outgrows the slot size of its bucket, it simply migrates to the bucket with the next bigger slot size

And the best thing is that thinking about this led me to the following discovery...

2. Actors + Message Passing

The way I was thinking about how to update the game state so far was pretty much in terms of loops over lists of game things that update each one and extra loops to handle interactions between things of type A and things of type B. Every thing needed in theory to be able to see every other thing, something that can get very hairy once you do parallel processing. My last post presented a kind of brute-force solution to it that still had its limits and drawbacks.

Overall, I was using a top-down approach to implementing the complexity of a city, hopelessly complicated, hard to reason about, hard to change. The motivation was performance. Deep down, I knew of a better way to do it, but it took me a while to see that it was actually feasible to use it in a fast way.

This other way of doing things is called Actor-oriented programming (it should actually be called Object-oriented programming, but this term was stolen and changed its meaning).

An Actor is like a tiny computer. It has all of its state hidden inside itself and can communicate only by sending messages to other Actors.

To be completely precise, all an actor can do is to react to a message arriving, by changing its internal state, by sending out messages itself, by spawning some new actors and by killing itself.

The best example for a programming language that builds upon Actors + Message Passing is Erlang, used with great success for three decades for all kinds of super-fault-tolerant and hugely-scalable systems. I had some great fun with it in the past!

To give an example from Citybound: a stretch of road will be an Actor. It knows its length and geometry and contains all the cars that are currently on it. It knows the Actors of the following and previous stretches of road that it is connected to. When it receives a "time passed!"-message, it updates its cars by moving them along itself according to physical and psychological models. Should it happen that a car goes over the end of the stretch of road, it simply sends the car (inside a "new car arrived!"-message) to the following stretch of road, which in turn will react by incorporating that car into its list of cars and now taking further care of it.

Now let's look at some magical properties of this approach:

  1. It makes no difference for the Actor where the recipient Actor of its message is located. It might be running on another core of your CPU. As long as someone delivers the message there, eventually, everything will work fine. The recipient actor might even be running on a different computer, on the other side of the earth. Yes, Actors + Message Passing is the profound pattern that will almost trivially allow distributed simulation of a city, with the "seam-lines" of computer boundaries between arbitrary things in the simulation.
  2. The Actor doesn't really care how exactly the recipient Actor of its messages work, or what it is. For example, the "following stretch of road" might not actually be a stretch of road at all, but a garage instead. Or a ferry. All that matters is that it can accept a "new car arrived!"-message and continue with its own behavior from there. Yes again, Actors + Message Passing is the profound pattern that allows this huge amount of flexibility and open-endedness, while still giving some guarantees. This alone makes some interactions between different kinds of city systems that felt impossibly complicated to me suddenly trivial to implement. Things like: how does a citizen go to a car, drive to a shop, do a transaction between his household and the shop's company…? Or: how does a pedestrian on a sidewalk behave completely differently than a pedestrian in a park? It's simple if you choose the right subdivision into Actors and Messages. This is also what will make modding very obvious and easy.
  3. If there's a bug in an Actor, it can just crash without bringing the rest of the simulation down. Some other part of the simulation probably knows how to restart the local situation. And: the bug should be pretty easy to find and fix, since each Actor doesn't really do that much complicated stuff.

I already hinted at the reason why my subconscious didn't let me see that opportunity earlier: actually based on my Erlang experience, I had the assumption that Actors and Message Passing were strictly linked to a high memory overhead and bookkeeping effort (from a high-performance computing standpoint).

When I combined Actors with my new ideas about memory management from the first part of this blog post, I realized, however, that yes, I could really have my cake and eat it too! I even discovered some further magic surprises!

  • If every Actor keeps its own state and all I ever do is update Actors, then all I ever need is some tiny "scratch memory" for intermediate calculation results during each actor update, which can just be completely wiped after the update is done (the results were either stored in Actor state again or sent out as Messages). No need for any kind of bookkeeping-intensive, all-over-the-place heap at all! (Only the custom memory management described above, which can actually be used for both the Actors and the Messages!)
  • The Actors just really don't care about their computational environment at all: how their state is stored, how they receive messages, etc. This means that I can experiment with and gradually optimize the Actor storage system as well as the Message delivery system, without touching the Actor behavior code at all. Things like loading or saving actor state to a savegame, using memory-mapped files to do so efficiently, employing double buffering, capturing the history of an Actor, ... can all be swapped in and out of the architecture for various performance, testing, debugging or development reasons without affecting the Actor's behavior and thus the simulation's *semantic behavior in the slightest.

This is big. Citybound is not only better off because of this discovery, it might actually only be possible because of it.

Let me know what you think!

Discussion on Reddit · on Simtropolis · on Something Awful
or follow @CityboundSim on Twitter
Donate with paypal
Every donation supports the development ♥

Background: A Tale of Two Worlds

Note: I'm working on a mindblowing, super secret update. I don't know how long it will take so I will bridge the time in between with a series of more abstract posts, covering some of my current thoughts!

This blog post is about a new approach to capture and update the state of the ingame world in Citybound and what it makes possible.

The naive approach to state handling

In the old Javascript prototype of Citybound, all of the game state was stored in memory in a pretty straightforward fashion:

Each game object existed once, in its current state.

To update the whole simulation, I would go over each object and update it individually.

This had three big problems:

  • When I want to write a savegame, I have to wait until the current simulation step finishes to end up with a constistent view of the world in the save game.
  • When I update car A, I don't know if car B, which is in front of it, was already updated in this simulation step or not.
    • This leads to for example a wrong estimate of available breaking distance (because you're essentially comparing the future with the past).
    • In this case it's not that horrible (it just causes weird jittering), but for other interactions between game objects it might even lead to inversions of causality.
  • Once I update a game object, there is no way to see the previous version of it anymore.
    • For example, if it's payday and I apply the incomes of all the members of a household to its wealth, I can't find out in a following simulation sub-step, by how much the wealth of the household increased in this timestep - because I overwrote the old value to compare to.
  • The worst: if I tried to parallelize the simulation and I'm updating car A in one thread and car B in another, car A might try to get some information about B that is currently being overwritten by the other thread (imagine trying to read a number on a blackboard that someone is currently changing) - and you will end up with garbage information about B.
    • In the best case this leads to wrong behaviour.
    • In the common case this just crashes the game.
    • The only way to solve this would be to introduce so called synchronization locks, but they bring a significant overhead with them (imagine the blackboard guy always having to put up a sign "wait a sec, I'm rewriting the number" and you having to wait until he removes the sign again).

Introducing the "double buffer"

In the world of computer graphics there exists the concept of double buffering, maybe you've already seen a setting called like this in a game's graphics settings.

Double buffering is the solution to the problem that your graphics card draws to the screen pixel by pixel, but you only want to see whole, finished pictures (else you get a visible "tear" between the old and new pictures).

Note: This is exactly equivalent to the first problem stated above, that in a savegame you only want to see a finished simulation step, even though the simulation updates the world one object at a time.

So double buffering works like this: you introduce a second ("double") buffer to hold a picture for the screen, and then you always display one of them on the screen, while letting the graphics card draw (pixel by pixel) onto the other screen. When it's done, you simply swap the roles of the two buffers, showing the recently finished picture that's in the second buffer and starting to draw the next picture into the now hidden first buffer.

What if we apply the same concept to our simulation? What are the two buffers in our case?

In our case this means that we always have exactly two representations of the game world: the past and the future.

Very nicely following intuition, the following properties hold:

  • Updating the world means: determining the future based on the past.
  • After one simulation step ("time passes") the future becomes your new past and you start determining an even newer future (this is exactly the swapping of buffer roles).
  • Inside one simulation step, the past cannot be changed and you know for sure that it is correct and consistent, even when you determine the future in a parallelized fashion.

This truly makes parallelization possible.

Even better, this brings the following surprising advantages:

  • You can at any point in time create a savegame out of the past and even start saving it in parallel to the current simulation step.
  • It makes it much easier to test the simulation in a "does this actually lead to that" way.
  • For programmers: the simulation update looks much more functional on all levels - instead of willy-nilly mutating the current simulation state, you explicitly map an immutable past to a future in an almost declarative way.
  • If there is a bug in the game, where a correct past state results in an invalid future state or even a game crash, you can at least save the past state (as an emergency measure). Then you reload the game in this state and try fixing it until it doesn't crash and produces the correct future state.
    • This means that even if someone else finds a game-crashing bug, they can just send me their emergency-auto-savegame and I can debug exactly this case of wrong behaviour that they discovered.
    • That means that I can write the game with a "let it crash" mentality, where I don't try desperately to keep the game running even under invalid circumstances, trying to sweep them under the rug.
    • This makes a lot of code much simpler and leads to earlier discovery of bugs - while still keeping your most valuable savegame! (ideally)

The only drawback: you need exactly twice as much memory.

I would say: worth it.

Discussion on Reddit · on Simtropolis · on Something Awful
or follow @CityboundSim on Twitter
Donate with paypal
Every donation supports the development ♥

How I'm getting along

Hi, just a short sign-of-life kind of blog post.

C++ is pretty cool, everything I do feels more "proper".

Apart from that, I've got the pleasant problem that my day job became pretty damn interesting and demanding. Not as interesting as Citybound though - I still manage to work on it ~2hrs per day on average.

What I did so far:

  • wrote the basic part of compass, the core geometry module and in the process greatly simplified the code compared to the old one
  • wrote whiteboard, a helper library to quickly display 2D drawings, mostly used as visual debugging for the geometry code
  • sketched out how memory-mapped game state/save games would look in C++ (surprisingly natural!)
  • wrote lzy, my own helper library for iterators in C++ - since a lot of high-level code in the JavaScript version of Citybound made use of them and I want to keep the style
  • found inspiration for a simplification of the road network representation, should affect a lot of areas

What I'm currently working on:

  • implementing monet, a custom-tailored minimal deferred & physically based rendering engine that should allow for some pretty advanced and beautiful lighting shenanigans later on

The goal I'm working towards:

"Benchmark 1"

  • reimplement basic planning mode for roads
  • reimplement traffic simulation
  • stress test traffic simulation with huge amount of randomly spawned cars, see how far I can push it
  • also stress test renderer with amount of cars

That's all, I hope that gives you an idea! Speak to you soon!

Discussion on Reddit · on Simtropolis · on Something Awful
or follow @CityboundSim on Twitter
Donate with paypal
Every donation supports the development ♥

April Fools!

Don't worry, everything will stay in English. Kinda sad and funny that I have to write this disclaimer, there were too many people who believed me!

Now in German: Eine kleine Statusberichterstattung

Dear English readers: ever since I started rewriting Citybound in C++, I noticed how much more efficient something can be - just because you express it in a different language! Learning from this experience, I will now do all communication with the community, be it blog posts, answers to comments, code documentation and code comments in German, my mother tongue. I'm sure you will all appreciate the speed boost this will give me for developing Citybound.

... April Fools!

Wo ich jetzt stehe

Nachdem ich mich langsam mit C++ angefreundet habe und auch schon die erweiterten Funktionalitäten einsetze und mehr und mehr meistere, wollte ich euch einfach mal bescheid geben, was ich schon so gemacht habe und was so mein nächster Plan ist.

Wenn ihr mal auf Github schaut, werdet ihr sehen, dass es schon drei neue Module gibt:

Klar, die Projektnamen sind noch auf Englisch, aber all die Beschreibungstexte sind schon deutsch!

Kuckt sie euch gerne selbst mal an:

Bitte bei all dem beachten: ist noch ganz früh in Entwicklung und ich muss noch so einiges lernen in C++, also nicht wundern, wenn der Code komisch aussieht!

Der weitere Plan

Zunächst werde ich mich weiter auf die Geometrie konzentrieren, die ist halt echt das Kernstück von allem. Sobald das zufriedenstellend steht, ist das Hauptziel, Grafik, Benutzerschnittstelle und Spiellogik so weit wieder zu implementieren, dass der Planungsmodus wieder so geht wie ihr ihn von früher kennt.

Wie lange das dauert kann ich leider noch nicht einschätzen, aber ich halte euch wie immer auf dem Laufenden. Jetzt eben aber auf deutsch!

Liebe Grüße und bis bald!

Diskussion auf Reddit · auf Simtropolis · auf Something Awful
oder folge @CityboundSim auf Twitter
Donate with paypal
Jede Spende unterstützt die Entwicklung ♥

A Music Update From Dane

Greetings Citybounders, this is your composer speaking. While things progress in the programing side of the project, I thought I'd get you up to speed on how the music aspect is coming along. The trolly will be around with a refreshment selection of ginger ale or scotch momentarily.

I'll give a few paragraphs of exposition. If you want to skip this part, go to the bold and read from there.

The original plan was to write a few pieces for each season of the game (about 4 pieces per season), with each piece being approximately 5 minutes in length. Chris and I got to work drafting up some pieces and ironing out the contextual direction for the game. Since a soundtrack can have an immense effect on the success of a game, we were very careful with our choices at the start - writing about 8 pieces between us. Here is a playlist containing the initial tracks I wrote for the game. You might recognize some of them from Anselm's updates. (Once I get some links from Chris, I'll include his work as well. I don't like reposting people's work without their permission)

2015 came and Chris and I had other projects we needed to work on. Anselm began the arduous task of building the entire engine by which Citybound will run. It came with a lot of technical information that I didn't understand. I can program in Kontakt script, but that's it. We took the year to work on our other projects and I began to really sharpen up my skills by putting them to the grindstone of Abelton and unthinkable pounds of coffee.

February 2016. The Citybound crew of 3ish is still kicking and we've stayed in touch. Everyone is doing their own part of the project, and things are going as smooth as ever. I saw that Anselm was making rather impressive strides in his C++ programing/port, and I had an itch to write. Cue music update.

Here is a compilation of the new music I've written (and finished) since February. The music here is much more diverse, and reflects more of the Autumn/Winter side of the game soundtrack.

The direction of the music has gone the way of many simulation games from yesteryear. There is music in numerous genres from Jazz to House to Downtempo to Hip Hop, and it is all categorized in respective seasons. We have upped the music production to the point where each season will likely have more than 4 tracks each, with a total soundtrack length nearing the 2 hour mark. We are currently at an hour and 4 minutes, which made a midway point update necessary.

The end goal is to write enough music to create a vastly immersive, diverse playing experience as well as get great pieces out into the world. Nothing beats a game with snappy music, and we will do our best to deliver that to you.

I would like to request you, the community's feedback. Do you like where the music is heading? Does it sound good to you? What do you want to hear heading into the second half of the soundtrack? Can you tell I'm on 4 hours of sleep based on how poorly this is written?

Finally, I want to thank you guys for sticking around for this project. It's looking really fun, and I can't wait to get my hands on an alpha copy sometime.

Comments? Questions? Cusswords? Opposing viewpoints?

Discussion on Reddit · on Simtropolis · on Something Awful
or follow @CityboundSim on Twitter
Donate with paypal
Every donation supports the development ♥