## What’s a Sizing

A sizing is a recommendation on how big (or small) your server needs to be to handle an estimated workload. Will you need a server with two cores or eight? This is a difficult question to answer because it requires you to estimate the behavior of your users. How many users will be active during the day? How many times will they use a wiki, a blog, or their homepage? If you’re new to Connections, you’ll probably shrug your shoulders and say, “I have no idea.”

I completely expect this response. If you’ve read *Predictably Irrational*, you know that we humans are just plain bad at assigning value to something new or unfamiliar. How much should this cost? “I don’t know, how much is the competitor’s?” How big should the server be? “I don’t know, how big is our current server?” We need an anchor, a point of reference to begin the conversation.

## Techline

Fortunately, IBM has a team that facilitates sizings. You complete a questionnaire, answering questions such as how many users you have, how many will use each application, and how often. The Techline team runs the numbers, and you’re given a helpful report on everything from the number of processors and the memory needed to the disk space required in year one. Very helpful.

But when you look at the sizing questionnaire, you’ll find many default suggestions. I’ve read enough sizings to surmise that people overwhelmingly take the defaults. This is better than shrugging your shoulders, but still not ideal.

## Comparables

Let’s go back to our need for an anchor. If most of us simply take the defaults, which answers do change? The most obvious distinction between clients is their user population. Below are two graphs where the Connections defaults remain the same but the number of active users varies.

The first graph shows the number of requests by application for 100 to 5,000 active users. The minimum suggested size of the application server for this entire group is 2 cores.

Next, we increase the user counts, from 25,000 up to 100,000 active users. The suggestion for the 25,000 and 50,000 populations is 4 cores; for 100,000, it’s 12 cores.

The number of cores suggested excludes high availability. For clarity, let’s consider the 12-core example, which uses quad-core servers: three servers with 4 cores each. High availability means we add one more quad-core server, for a total of 16 licensed cores.
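The high-availability arithmetic above can be sketched in a few lines of Python. This is just my illustration of the example, assuming quad-core servers and one spare server for HA; the function name is mine, not anything from the sizing tooling:

```python
import math

CORES_PER_SERVER = 4  # quad-core servers, as in the 12-core example

def licensed_cores(suggested_cores, ha=True):
    """Round the suggested core count up to whole servers,
    then add one spare server for high availability."""
    servers = math.ceil(suggested_cores / CORES_PER_SERVER)
    if ha:
        servers += 1  # one extra quad-core server for HA
    return servers * CORES_PER_SERVER

print(licensed_cores(12))  # 3 servers + 1 HA spare = 16 licensed cores
```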

## Beware, Math Ahead

I used data to build the graphs above. An alternative approach is to use the same data to build a model that estimates the sizing. I’ve previously used Excel and SPSS Statistics to create such models, but this time I wanted to use SPSS Modeler. With Modeler you can 1) run multiple numeric models at the same time and select the best, and 2) refine the inputs to arrive at the most accurate model. I did the latter, since I assumed a regression would be the best model and included factors that made sense for what I was estimating. For example, consider memory. More people using the server means we need more memory, right? Yes, but the questionnaire also asks whether Connections Content Manager is used. That feature inherently requires more memory, and it made sense to include this factor in the model. Adding it increased the correlation from 65% to 83%. Statistical arguments and skepticism aside, this is a good thing.

## Creating a Model

SPSS doesn’t do everything for us. (Or maybe it does, and I’m just a novice.) I still needed to take the data I collected from the sizing questionnaires and put it into a format suitable for building my model. Recall that the questionnaire asks about user behavior: how many users, how many times they use each application, and so on. While that gives us the average load, it doesn’t tell us the peak load. We must make sure the server can handle the maximum, peak usage, or bad things (i.e., a crash) will happen.

There are two questions that create a peak-load estimate:

1. What is the typical length of a day in hours?
2. What multiplier should be applied for the load during the peak hour?

The calculation then looks like this for each of the Connections applications (homepage, blogs, activities, etc.):

| Step | Value |
| --- | ---: |
| RegisteredUsers | 10,000 |
| × ActivePercent | 10% |
| = ActiveUsersCount | 1,000 |
| × AppPercent | 75% |
| = AppUsers | 750 |
| × AppUse | 2 |
| = AppDailyUseCount | 1,500 |
| ÷ DayLength | 8 |
| = AppHourlyUseCount | 188 |
| × Multiplier | 2 |
| = AppPeakCount | 375 |

(AppHourlyUseCount is 187.5 before rounding; the peak count of 375 is computed from the unrounded value.)

I do this calculation for each of the Connections applications and then add up all of the peak counts to give me a total peak count. You could argue that all applications are not equal. The activities application is more resource intensive than blogs. You would be correct. But the goal here is to build an estimate and blending works out better than considering each application’s count individually. If I break out the individual counts (and I did investigate this), I see negative correlations as the model tries to overfit the data. So the total peak count is what I settled on.
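The per-application calculation and the summing step can be sketched in Python. The inputs in the first call are the example numbers from the table above; the function name and the per-application figures in the dictionary are placeholders of mine, not real questionnaire defaults:

```python
def app_peak_count(registered_users, active_pct, app_pct,
                   app_use, day_length, multiplier):
    """Estimate one application's peak hourly request count."""
    active_users = registered_users * active_pct   # e.g. 10,000 * 10% = 1,000
    app_users = active_users * app_pct             # e.g. 1,000 * 75% = 750
    daily_uses = app_users * app_use               # e.g. 750 * 2 = 1,500
    hourly_uses = daily_uses / day_length          # e.g. 1,500 / 8 = 187.5
    return hourly_uses * multiplier                # e.g. 187.5 * 2 = 375

# The worked example from the table above:
print(app_peak_count(10_000, 0.10, 0.75, 2, 8, 2))  # 375.0

# Repeat per application and sum for the total peak count
# (the second application's inputs are made up for illustration).
apps = {
    "homepage": app_peak_count(10_000, 0.10, 0.75, 2, 8, 2),
    "blogs":    app_peak_count(10_000, 0.10, 0.50, 1, 8, 2),
}
total_peak_count = sum(apps.values())
```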

## So What?

Assuming you’re still here after all that, you get to ask, “So what?” Well, now I can take the total peak hour request count I calculated and simply plug it into the equation Modeler produced.

SuggestedCores = TotalPeakCount × 0.00004074 + 1.213
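As a quick sketch, plugging a total peak count into that linear equation looks like this. The coefficient and intercept are from the equation above; the function name and the example peak count are mine:

```python
def suggested_cores(total_peak_count):
    """Linear model from SPSS Modeler: cores as a function of
    total peak hourly requests."""
    return total_peak_count * 0.00004074 + 1.213

# A hypothetical total peak count of 68,000 requests/hour
# works out to roughly 4 cores.
print(round(suggested_cores(68_000)))  # 4
```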

## Bringing It All Together

Forms Experience Builder (FEB) is one of my favorite IBM products. It gives you a drag-and-drop way to create a form with all the programmatic control you’d expect from IBM. I’ve recreated much of the Connections sizing questionnaire as a FEB form. This lets the user answer the questions and run a quick calculation to get an instant sizing. Behind the scenes, the calculations I showed above are performed, and the results are fed into the SPSS Modeler equation to produce the instant sizing.

You can take this concept one step further. A model is only as good as the data that built it, and with less data you typically have more error. The ability to submit authentic data back through FEB lets me augment the SPSS Modeler data set. The model is then rebuilt, and the overall accuracy of the FEB form’s suggestion improves.

Take the form for a test drive, but remember that **it does not replace working directly with IBM**. For purchasing or production decisions, continue to engage Techline for validated and accurate sizings.