Atlassian released Jira Data Center two years ago, offering users active-active clustering, high availability, scalability and performance. In this study, we focused on performance. Could a single Jira server handle 2000 active users with 10 million existing issues? How about the same volume with four nodes? What are the differences between a single Jira server and four nodes in terms of performance? What limitations will the system face when we reach that number?
In the following sections, we detail our experience with the tests we conducted on our servers.
In this blog, we won’t cover how to install and set up Jira Data Center, because that is already covered in the Atlassian documentation. To create the data in Jira, we used:
- Jira Data Generator to create the sample data in Jira.
- Jira Command Line Interface to create 2000 users with passwords from a CSV file.
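As a rough illustration of the user-creation step, a CSV of 2000 test users could be generated along these lines. This is a sketch, not the file used in this test: the column names (`userId`, `userFullName`, `userEmail`, `userPassword`), the `loadtest` prefix and the `example.com` domain are assumptions chosen for illustration, so match the headers to whatever your import tool expects.

```python
import csv
import io
import secrets


def build_user_rows(count, prefix="loadtest", domain="example.com"):
    """Return one dict per synthetic user. All names are placeholders."""
    rows = []
    for i in range(1, count + 1):
        user_id = f"{prefix}{i:04d}"
        rows.append({
            "userId": user_id,                      # assumed column name
            "userFullName": f"Load Test User {i}",  # assumed column name
            "userEmail": f"{user_id}@{domain}",     # assumed column name
            "userPassword": secrets.token_urlsafe(12),  # random password
        })
    return rows


def write_users_csv(rows, stream):
    """Write the rows to any text stream, header line first."""
    writer = csv.DictWriter(stream, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)


rows = build_user_rows(2000)
buffer = io.StringIO()
write_users_csv(rows, buffer)
csv_text = buffer.getvalue()
print(csv_text.splitlines()[0])
```

To write a real file instead of an in-memory buffer, pass `open("users.csv", "w", newline="")` as the stream.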
We installed Jira Data Center on AWS (Amazon Web Services) servers, using the instance types listed below:
|Resource||Type||vCPU||Processor||Memory (GiB)||Instance Storage (GB)||OS||Jira Allocated Memory|
|Node 1||c3.4xlarge||16||2.8 GHz, Intel Xeon E5-2680v2||30||2 x 160 SSD||Ubuntu Server||21 GB|
|Node 2||c3.4xlarge||16||2.8 GHz, Intel Xeon E5-2680v2||30||2 x 160 SSD||Ubuntu Server||21 GB|
|Node 3||c3.4xlarge||16||2.8 GHz, Intel Xeon E5-2680v2||30||2 x 160 SSD||Ubuntu Server||21 GB|
|Node 4||c3.4xlarge||16||2.8 GHz, Intel Xeon E5-2680v2||30||2 x 160 SSD||Ubuntu Server||21 GB|
|Shared Folder + Apache HTTP Server||c3.4xlarge||16||2.8 GHz, Intel Xeon E5-2680v2||30||2 x 160 SSD||Ubuntu Server||21 GB|
|Database (PostgreSQL 9.5)||c3.4xlarge||16||2.8 GHz, Intel Xeon E5-2680v2||30||2 x 160 SSD||Ubuntu Server|
|Apache JMeter||c4.2xlarge||8||2.9 GHz, Intel Xeon E5-2666v3||15||EBS-Only||Ubuntu Server|
Notes: We used NFS to set up a shared home directory between the nodes. We also moved the testing tool (Apache JMeter) to a dedicated server, given its high CPU and memory requirements.
This illustration shows how we set up Jira Data Center on AWS servers.
We used Apache JMeter to load test the instances. The test script simulates common user actions (e.g. creating an issue in Jira). Each user performs a set of actions, in the sequence below:
|Action||Description|
|Login||Log in to Jira.|
|View Dashboard||Open a dashboard page with 4 gadgets (pie chart, assigned to me, issue statistics, filter results).|
|View Kanban Board||Open a Kanban board with around 1000 issues in each column.|
|JQL Search||Run a JQL search query in the issue navigator. Each user performs this search twice: once for a specific issue, and once to view all the tickets in Jira.|
|Create Issue||Create an issue. This action is performed twice.|
|Add Comment||Add a comment to an issue.|
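To make the per-user workload concrete, the sequence in the table above can be written down as data and counted. This is only a sketch: it assumes repeated actions run back to back and that every user runs the whole sequence exactly once, neither of which is spelled out by the test description.

```python
# Ordered per-user scenario from the table above: (action, times performed).
SCENARIO = [
    ("Login", 1),
    ("View Dashboard", 1),
    ("View Kanban Board", 1),
    ("JQL Search", 2),    # one specific-issue search, one "all tickets" search
    ("Create Issue", 2),
    ("Add Comment", 1),
]


def expand(scenario):
    """Flatten (action, repeat) pairs into the ordered list of actions."""
    return [action for action, repeat in scenario for _ in range(repeat)]


def total_actions(scenario, users):
    """Total actions if every user runs the scenario once (assumption)."""
    return users * sum(repeat for _, repeat in scenario)


steps = expand(SCENARIO)
print(steps)
print(total_actions(SCENARIO, 2000))
```

Under those assumptions, one run with 2000 users amounts to 16,000 user actions, of which 4,000 create new issues.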
All tests used the same Postgres database that contains:
- 33 projects
- 10 million issues
- 107 custom fields
- 1.6 million attachments
- 3000 users in total, of which only 2000 were used in this test
We installed the “Stepping Thread Group” JMeter add-on, which provides more parameter options for thread scheduling. To keep it simple, a thread represents a single user performing the actions defined above.
After installing the plugin, we set the initial load to 100 user sessions, then added 50 users every 15 seconds, with a 10-second starting gap between users. Each user then completes their actions within 5 minutes. Finally, 20 users log out every 15 seconds. The total run time for a single test is 45 minutes, and the total number of users loaded into Jira is 2000.
Apache JMeter Thread Scheduling Parameters
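The scheduling parameters can be sanity-checked with a little arithmetic. The sketch below estimates how long each phase of the load profile lasts. It rests on two assumptions that the description above does not confirm: the 10-second gap is treated as a ramp-up added to every 15-second step, and the 5 minutes is treated as the hold time at full load. The function and parameter names are made up for this example and are not JMeter settings.

```python
import math


def stepping_schedule(total, initial, step_users, step_every_s,
                      ramp_up_s, hold_s, stop_users, stop_every_s):
    """Estimate phase lengths (in seconds) for a stepped load profile.

    Assumption: each ramp-up step occupies step_every_s + ramp_up_s seconds.
    """
    ramp_steps = math.ceil((total - initial) / step_users)
    ramp_s = ramp_steps * (step_every_s + ramp_up_s)
    stop_steps = math.ceil(total / stop_users)
    stop_s = stop_steps * stop_every_s
    return {
        "ramp_steps": ramp_steps,
        "ramp_s": ramp_s,
        "hold_s": hold_s,
        "stop_steps": stop_steps,
        "stop_s": stop_s,
        "total_s": ramp_s + hold_s + stop_s,
    }


plan = stepping_schedule(total=2000, initial=100, step_users=50,
                         step_every_s=15, ramp_up_s=10, hold_s=300,
                         stop_users=20, stop_every_s=15)
print(plan)
print(round(plan["total_s"] / 60, 1), "minutes")
```

Under these assumptions the estimate is 2750 seconds, a little under 46 minutes, which is close to the 45-minute run time quoted above. Most of that time is the ramp-down, so full load is only sustained for a short window.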
Finally, we started the test with four nodes, then removed one node for each subsequent round, collecting statistics each time.
- Significant performance and stability improvements were observed when adding a second and a third node. Adding a fourth node also improved performance, but only marginally.
Average Response Time for each node. (In milliseconds)
Highest Response Time reached by each node. (In seconds)
- During the single-node test, CPU usage reached 100% once all 2000 users were logged in and performing actions, causing a very high error rate (28%). In comparison, with four nodes, CPU usage never exceeded 50% and the error rate was 0.07%.
CPU utilisation showing the CPU usage for each test round
- During the tests we did not observe high memory usage from the Jira instances. 21 GB of memory was allocated to each instance, and the single-node Jira reached 13 GB at some point during the test.
- The index was 125 GB, so we had to ensure the database and Jira home directory had enough disk space to accommodate it.
- The full re-index took a day to complete. In order to make a copy of the index file, we restarted Jira, and the start-up time was around 2 hours. This happened because Jira was comparing the data in the index (10 million issues) with the database, sending SQL count queries to the Jira issue, file attachment and worklog tables. Once this was done, the Postgres database indexed those queries to enhance performance (most databases have this feature). Subsequent restarts were much quicker, at around 15 minutes.
- We recommend the steps below for re-indexing while users are using Jira, to avoid downtime (reference: Atlassian knowledge base):
  - Ensure the node has a ‘back-door’ connector. This is an alternate HTTP connector that is not connected to the load balancer / proxy, as detailed in the Configure Tomcat section of Integrating Jira with Apache.
  - Remove the node from the load-balancer configuration, so clients are no longer redirected to it.
  - Access the node using the ‘back-door’ connector, and perform a ‘lock JIRA and re-build index’ as per Search Indexing.
  - When the re-index is complete, access the node and verify everything is OK.
  - If so, add the node back to the load-balancer configuration.
- We noticed that the Apache load balancer was not distributing the load equally between the nodes. As shown in the CPU utilisation graph above, a specific node received more load than the others.
- Ensure you give a fair amount of disk space to your Jira_Home and shared_home directories, because you can easily run out of space once the Jira server reaches a million issues.
- AWS doesn’t provide a memory statistic similar to the CPU utilisation graph, so we installed the JavaMelody Monitoring Plugin on each Jira server to monitor memory.
- When creating data with the data generator add-on, don’t select the option to re-index issues during creation, because it slows the process down, especially with large data sets. You can always re-index after creation has completed.
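The disk-space advice above is easy to turn into a pre-flight check before a re-index or index copy. The sketch below compares free space on a path against the expected index size. The 125 GB figure comes from this test, while the safety factor of 2 (room for a second copy of the index) and the path are assumptions to adjust for your own environment.

```python
import shutil

GIB = 1024 ** 3


def has_headroom(path, required_gib, safety_factor=2.0):
    """Return (ok, free_gib) for the filesystem that holds `path`.

    ok is True when free space >= required_gib * safety_factor.
    """
    free_gib = shutil.disk_usage(path).free / GIB
    return free_gib >= required_gib * safety_factor, free_gib


# "." is a placeholder so the sketch runs anywhere; point this at your
# Jira home and shared home directories instead.
ok, free_gib = has_headroom(".", required_gib=125)
print(f"free: {free_gib:.1f} GiB, enough for the index plus a copy: {ok}")
```

Running the same check against both the local Jira home and the shared home catches the case where only one of the two volumes is undersized.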
We found that Jira Data Center can handle 2000 users with 10 million issues while delivering adequate performance. CPU usage was one of the main bottlenecks in Jira performance, especially when the single node reached 100% and users couldn’t complete their actions. RAM was not affected much during this test, but we believe that if memory-hungry third-party add-ons are used (e.g. for viewing complex reports or generating a monthly timesheet), memory usage will increase considerably. Finally, Jira Data Center can help companies scale up a large existing instance by simply adding an extra node when needed, ensuring their users always have the best experience from the application.
What’s next?
- Run the test with the most-used add-ons configured in Jira (e.g. ScriptRunner, Tempo, eazyBI)
- Use a different load balancer (e.g. AWS load balancer, NGINX)
- Use a load balancer that supports advanced routing so we can route requests based on the requester. For example, if the requester is an external application (via the API), send the request to the first node, and send normal users to the second node.
- Try upgrading Jira nodes automatically using a configuration management tool (e.g. Puppet, Ansible)