We do many projects where we’re helping customers lay down new infrastructure for a new initiative, a new product or sometimes even for new lines of business. Just recently we engaged with two separate customers who had several years of experience with B2B corporate computing environments but were newly entering the world of providing their service, B2C fashion, through online and mobile applications.
While our customers understood their own customers’ profiles and business problems, serving them this new way presented puzzles they had never encountered. How many people would download the app? How frequently would they access their accounts? At what time of day? When could we take the system down for maintenance and batch processing? Would our users’ access patterns coincide with nightly batch jobs? If so, how would we manage the spikes in load and demand on our system?
Fortunately, there’s a whole genre of architectures and NoSQL databases that scale horizontally, offering practically limitless scaling along with practically no downtime. How is this accomplished? Mostly through a large number of clustered servers with some sort of peer-to-peer synchronization.
The purpose of this blog is not to go into the different clustering models of the various NoSQL databases; they are well documented elsewhere. Solution architects can take comfort that many, if not most, issues related to uptime and scalability can be addressed by adding more nodes to clusters, or more clusters altogether, along with some proper configuration and tuning.
But at what cost? Our customer’s new initiative had a compelling business case ROI, provided their mobile application could be driven by a nice, tidy 3-node cluster. But what if I need more nodes? How many is the right number today? What about next year?
Rather than guessing what your AWS and/or software spend will be, wouldn’t it be better to ground that in reasonable data? Even better, make it consistent with the same assumptions that your product management and marketing teams are making, for example when your business case says something like: “By the second quarter of next year, we expect to have 100,000 users.”
In some cases, this may be an argument for using a serverless Lambda architecture, where you pay on an event-based consumption model. However, there may be a cutover point beyond which Lambdas are no longer the better cost model, particularly if you know you’ll always have some baseline of demand on your system. Furthermore, not all computational layers in an architecture are good candidates for Lambdas; databases, for instance, are stateful by their nature.
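To make that cutover point concrete, here is a minimal Python sketch of the break-even arithmetic. The prices are hypothetical placeholders, not real AWS list prices, and the function names are ours for illustration only; plug in the figures from your own cost estimates.

```python
def pay_per_event_cost(events_per_month: float, price_per_million: float) -> float:
    """Monthly cost under an event-based consumption model."""
    return events_per_month / 1_000_000 * price_per_million


def break_even_events(fixed_monthly_cost: float, price_per_million: float) -> float:
    """Monthly event volume at which pay-per-event spend equals a fixed-cost cluster."""
    return fixed_monthly_cost / price_per_million * 1_000_000


# Hypothetical placeholder prices, for illustration only.
FIXED_CLUSTER_COST = 300.0  # always-on servers, dollars per month
PRICE_PER_MILLION = 2.0     # dollars per million events

cutover = break_even_events(FIXED_CLUSTER_COST, PRICE_PER_MILLION)
print(f"Pay-per-event is cheaper below {cutover:,.0f} events per month")
```

If your forecast baseline sits well above the cutover volume, the fixed cluster is likely the cheaper option; well below it, the consumption model probably wins.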
You can then say something like: “Based on industry estimates, we will size our infrastructure based on the 1000:100:10 ratio. In other words, for every 1000 named users there are 100 active users and 10 concurrent users.”
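As a quick sanity check, the sizing rule can be sketched in a few lines of Python. The class and function names are our own, and the default ratio simply encodes the rule of thumb above (100 active and 10 concurrent users per 1000 named users); swap in whatever ratio your own estimates support.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class UserRatio:
    """Rule-of-thumb ratio of named to active to concurrent users."""
    named: int = 1000
    active: int = 100
    concurrent: int = 10


def size_population(named_users: int, ratio: UserRatio = UserRatio()) -> dict:
    """Scale a named-user forecast down to active and concurrent users.

    Results are rounded up so we never under-size the test.
    """
    return {
        "named": named_users,
        "active": math.ceil(named_users * ratio.active / ratio.named),
        "concurrent": math.ceil(named_users * ratio.concurrent / ratio.named),
    }


# The business case forecast: 100,000 users by Q2 of next year.
forecast = size_population(100_000)
print(forecast)  # {'named': 100000, 'active': 10000, 'concurrent': 1000}
```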
Given these assumptions, you know that you need to simulate 1000 concurrent users and attach some non-functional requirements (NFRs) expressed as probabilities (e.g., the response time should be less than 20 ms 95% of the time). You need to empirically derive two factors: 1) the maximum concurrent load that the system can manage while meeting the performance NFR with no failures, and 2) the concurrent load threshold at which response failures are first observed.
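An NFR phrased this way is really a statement about a percentile of the response-time distribution. The sketch below shows one way to check it in Python using the nearest-rank percentile definition; the latency samples are made-up illustrative values, not measurements, and a load-testing tool will normally report these percentiles for you.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def meets_nfr(samples_ms, limit_ms=20.0, pct=95.0):
    """True if the pct-th percentile response time is under limit_ms."""
    return percentile(samples_ms, pct) < limit_ms


# Illustrative response times in milliseconds (hypothetical data with one slow outlier).
latencies_ms = [12, 14, 9, 18, 15, 11, 13, 17, 16, 10,
                19, 12, 14, 13, 15, 11, 12, 16, 14, 45]

print(percentile(latencies_ms, 95))     # 19
print(meets_nfr(latencies_ms))          # True: the p95 is under 20 ms
print(meets_nfr(latencies_ms, pct=99))  # False: the outlier dominates the p99
```

Note how a single slow request does not break a 95% target but does break a 99% one, which is why the percentile you choose matters as much as the time limit.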
There are several tools you could use for simulating this traffic. We’ve used and quite like Gatling. You can download it and use it as a standalone bundled tool, or you can incorporate scripted tests directly into your SDLC using Maven or SBT. It also allows you to try before you buy by using the open source option before (or if) you opt for the enterprise solution.
The quickstart guide is terrific and you can record and simulate your assumptions quite easily. At the very beginning, you can capture the traffic you’d like to simulate by using Gatling’s recorder tool.
Starting up the recorder takes a simple command-line script that opens a Java app.
After the recorder is running (and the listener is started), open your browser of choice and route your browser traffic through the recorder you just started. In Chrome, you find the proxy option under Settings → System, which will open your network settings.
Point your HTTP traffic to localhost and the port you configured your recorder to listen on. (HTTPS traffic opens up some other issues that are not germane to this blog, so we’ll defer those for later.)
That’s all there is to it! Let’s start capturing our test.
In our case, we’ll just use the Gatling-provided database and run through a use case where we add an entry to the database, save it and then retrieve it.
Note that you should see HTTP events accrue in your Gatling recorder as you perform each step in the sequence as seen below. (If you don’t see events added every time you perform a step, then you probably have not proxied your browser through the same port as your recorder is listening on.)
Once you’ve created a record and retrieved it (or performed some other set of steps for your simulation), you can go ahead and stop the recorder and save the simulation. We’re now ready to run it.
Running Gatling from the command line is dead simple: it lists every simulation you’ve captured, you pick the one you want to run, and you name the run if you like.
Unsurprisingly, with a small number of requests, our application is performing quite well, with all requests being satisfied in under 800ms.
So now let’s crank up the stress. How do we do that? For each recorded session, Gatling creates a Scala simulation directory saved by default in
Now go find your simulation and open it in your favorite editor. It’s quite easy to understand what’s going on. At the top, you see the URIs and headers being built.
Next up, you’ll see the sequence of steps that you simulated. (In my case, there were only 5 steps; your simulation may have more or fewer.) Notice the sequence of steps and the pauses in between. The specifics matter only to the extent that they are a reasonable approximation of the real world you’re trying to simulate.
And then, last but not least, comes the number of concurrent sessions to run. By default, this number is 1 when you capture a session using the recorder.
But what happens when we start to ratchet this number up? Let’s find out. I ended up running a few more trials with 10, 100, 200 and 1000 concurrent sessions to see how it fared under increasing load.
As the concurrent sessions go up (10, 100, 200, 1000), you can see some of the responses begin to slow down and ultimately fail. For this backend data service, we can now deduce that at up to 200 concurrent users, we are still maintaining sub-second response times over 95% of the time.
Depending on the demands of your application (and assuming these statistics are satisfactory), you can now expect to satisfy 200 concurrent users without requiring new backend infrastructure. We do have a problem by the time we get to 1000 concurrent users, so more time and exploration should be spent between these two thresholds.
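One way to organize that exploration is to record each trial and let a small script pick out the two thresholds and suggest the next load level to try. The Python sketch below assumes a 1-second p95 limit, and the trial figures are illustrative placeholders shaped like the runs described here, not our actual measurements; the names are ours as well.

```python
from typing import NamedTuple, Optional


class Trial(NamedTuple):
    concurrent_users: int
    p95_ms: float
    failed_requests: int


def max_load_meeting_nfr(trials, p95_limit_ms) -> Optional[int]:
    """Highest load that met the NFR with no failures, stopping at the first miss."""
    best = None
    for trial in sorted(trials, key=lambda t: t.concurrent_users):
        if trial.p95_ms >= p95_limit_ms or trial.failed_requests > 0:
            break
        best = trial.concurrent_users
    return best


def first_load_with_failures(trials) -> Optional[int]:
    """Lowest load at which any request failed."""
    failing = [t.concurrent_users for t in trials if t.failed_requests > 0]
    return min(failing) if failing else None


# Placeholder results for illustration only.
trials = [
    Trial(1, 120, 0), Trial(10, 150, 0), Trial(100, 400, 0),
    Trial(200, 900, 0), Trial(1000, 4200, 37),
]

safe = max_load_meeting_nfr(trials, p95_limit_ms=1000)
broken = first_load_with_failures(trials)
next_trial = (safe + broken) // 2  # bisect the gap between the two thresholds
print(safe, broken, next_trial)    # 200 1000 600
```

Repeating the bisection with each new trial narrows the gap quickly: each run halves the range in which the real limit could be hiding.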
Admittedly, this is a single use case simulated over and over; normal workloads would be variable. Stay tuned for the next post in this blog series to see how to mix up the workload and weave these tests into a broader software development lifecycle.