Multi-Tenant Database Strategies for Mixed Workloads and Real-Time SLAs

Multi-tenancy, in which a single instance or multiple instances of one or more applications operate in a single shared infrastructure environment, is not a new concept. It dates back to mainframe timesharing, which allowed a single massive piece of computing iron to be sliced up and used by multiple applications on a single hardware instance. This simplified both the hardware landscape and the skills and resources needed to manage environments. With the rise of software-as-a-service (SaaS) in today’s private and public cloud environments, multi-tenancy is now a common and critical function. It allows cloud companies to scale and customers to save money by essentially pooling resources. As more applications demand more and more data, multi-tenancy is now a core function at the database layer.

But when you have a lot of different applications running on the same infrastructure and accessing the same database, how do you prevent one application from impacting the data performance of another? And how do you determine from the underlying database infrastructure which tenants get what infrastructure and at what performance level?

Here’s a look at the driving forces behind today’s real-time data needs and how they impact multi-tenant data architecture management.

SLAs Are at the Heart of Multi-Tenant Data Management

Everyone wants faster performance and more resilient systems. We’re impatient, so reliable, real-time performance delivers large competitive business benefits. However, not every tenant requires the same level of performance. A response can be plenty fast for a user or machine without taking the fastest execution path the available resources allow.

Service-level agreements (SLAs) ultimately govern the resources delivered to a tenant. Real-time may need to be measured in sub-milliseconds in one instance (e.g., for a stock trade) but in sub-seconds in others (e.g., for a Zelle funds transfer). This is where developers, architects and operators partner with a business to understand its actual user or system needs and to build a system that can deliver the SLA using the minimum amount of resources possible.

Mixed Workloads and the ‘Noisy Neighbor’ Problem

As soon as workloads are shared, there’s the potential for one application to quickly grab and use resources in a way that impacts other applications. Often called a “noisy neighbor,” such an application can dramatically interfere with real-time response time SLAs when multi-tenant apps share an infrastructure base. For example, if one node in a database cluster becomes heavily loaded, the performance of other nodes can begin to degrade, causing SLAs to be missed.

Typically, multiple virtual machines (VMs) are hosted on a single physical machine or node and share its system resources (e.g., processor, network and storage) with the other VMs. It’s possible for applications to overuse those shared resources, impacting other applications and sometimes even the database itself. Such overuse of the infrastructure layer (in this case, shared compute, storage, network queues, etc.) leads to poor database and end-to-end system performance. It’s like plugging a 100-amp appliance into a 50-amp circuit. All the circuit breakers will trip, and the entire system comes to a screeching halt!

Strategies to Maximize Performance for Database Multi-Tenancy

Configuring and managing the database itself (not just infrastructure) with multi-tenancy in mind can maximize performance and avoid problems like noisy neighbors. Isolating databases in your multi-tenant architecture can help minimize non-database problems that can be the root cause of poor performance and missed SLAs.

Start by picking databases that allow applications to isolate their data access based on sets (or tables in relational parlance) that can have storage limits attached to them. When combined with rate limits for transactions on a per-user basis, applications can be guaranteed their quota of runtime resources. Using the per-app quota configurations for items like storage and TPS (transactions per second), the database can evaluate the total capacity available in the system and reserve enough resources to meet the SLAs of all applications all the time.
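As a rough illustration of how per-tenant storage and TPS quotas might be enforced, here is a minimal Python sketch built on a token bucket. It is not taken from any particular database. The names TenantQuota, QuotaEnforcer and allow_transaction are hypothetical, and the one-second burst size is an assumption chosen for simplicity.

```python
import time


class TenantQuota:
    """Hypothetical per-tenant limits; the field names are illustrative only."""

    def __init__(self, max_tps, max_storage_bytes):
        self.max_tps = max_tps                      # sustained transactions per second
        self.max_storage_bytes = max_storage_bytes  # storage ceiling for the tenant's set
        self.used_storage_bytes = 0
        self.tokens = float(max_tps)                # assumed burst: one second of traffic
        self.last_refill = 0.0


class QuotaEnforcer:
    """Token-bucket TPS limit plus a storage ceiling, tracked per tenant."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so the behavior can be tested deterministically
        self._quotas = {}

    def register(self, tenant, quota):
        quota.last_refill = self._clock()
        self._quotas[tenant] = quota

    def allow_transaction(self, tenant, write_bytes=0):
        q = self._quotas[tenant]
        now = self._clock()
        # Refill in proportion to elapsed time, capped at one second of burst.
        q.tokens = min(q.max_tps, q.tokens + (now - q.last_refill) * q.max_tps)
        q.last_refill = now
        if q.tokens < 1.0:
            return False  # TPS quota exhausted for now
        if q.used_storage_bytes + write_bytes > q.max_storage_bytes:
            return False  # write would exceed the storage quota
        q.tokens -= 1.0
        q.used_storage_bytes += write_bytes
        return True
```

A rejected transaction in this sketch consumes no tokens, so a tenant that hits its storage ceiling does not also lose TPS budget. A real system would also need to release storage on deletes and share state across nodes, both of which are left out here.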

Note that unexpected events that cause rebalancing of data within the distributed database, such as node and network failures, require additional resources. The database system will subtract that resource capacity from what is available for apps so that the system is not overcommitted during failure situations.
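The headroom idea can be sketched as a simple admission check: subtract the capacity of the nodes you want to be able to lose, then admit a new tenant only if all quotas still fit. This is a simplified model under stated assumptions. ClusterCapacity, can_admit and the tolerated_node_failures policy are hypothetical names, and the sketch ignores replication factor and the extra traffic that rebalancing itself generates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterCapacity:
    nodes: int
    tps_per_node: int
    storage_per_node_gb: int
    tolerated_node_failures: int = 1  # assumed headroom policy, not a standard

    def usable(self):
        """Return (tps, storage_gb) left after reserving failure headroom."""
        surviving = self.nodes - self.tolerated_node_failures
        if surviving <= 0:
            raise ValueError("cluster too small for the requested failure headroom")
        return surviving * self.tps_per_node, surviving * self.storage_per_node_gb


def can_admit(capacity, existing_quotas, new_quota):
    """Quotas are (tps, storage_gb) tuples; admit only if the totals fit."""
    usable_tps, usable_gb = capacity.usable()
    total_tps = sum(q[0] for q in existing_quotas) + new_quota[0]
    total_gb = sum(q[1] for q in existing_quotas) + new_quota[1]
    return total_tps <= usable_tps and total_gb <= usable_gb
```

For example, a four-node cluster that tolerates one node failure only offers three nodes' worth of capacity to tenants, so quotas that would fit on four nodes may still be refused.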

Enterprises also need very good automation and observability to fix immediate problems quickly and have the requisite visibility into long-term trends that impact performance and SLA compliance. Otherwise, by the time a human notices a problem, the SLA may already have been missed and the business impacted.

Good observability can help you understand how data is used and create those automations. As more applications access and use the same data set in different ways, you need modern observability tools integrated into (or interoperable with) the underlying database.
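One automation that observability data makes possible is a per-tenant SLA check on tail latency. The sketch below is only an illustration: the function names, the choice of the 99th percentile and the nearest-rank method are all assumptions, and the latency samples would come from whatever metrics pipeline you already run.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile; pct is an integer from 1 to 100."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def sla_breaches(latencies_ms_by_tenant, p99_target_ms_by_tenant):
    """Return {tenant: (observed_p99_ms, target_ms)} for tenants over target."""
    breaches = {}
    for tenant, samples in latencies_ms_by_tenant.items():
        observed = percentile(samples, 99)
        target = p99_target_ms_by_tenant[tenant]
        if observed > target:
            breaches[tenant] = (observed, target)
    return breaches
```

Running a check like this on a short interval, and wiring the result to throttling or alerting, is one way to react before a human would notice the problem.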

With a good understanding of how your data-rich applications interoperate with other tenants in mixed-load environments, you can begin to deploy these strategies to make sure you conform to all required SLAs and keep all that data working hard for the business.

Srini Srinivasan

Srini Srinivasan is a founder and chief technology officer at Aerospike. When it comes to databases, he is one of the recognized pioneers of Silicon Valley. He has two decades of experience designing, developing, and operating high-scale infrastructures. He also has over a dozen patents in database, web, mobile, and distributed systems technologies. He co-founded Aerospike to solve the scaling problems he experienced with Oracle databases while he was senior director of engineering at Yahoo.