How do Azure tables address scalability
This document uses the terms ETag and LMT (last-modified time) interchangeably because they refer to the same underlying data. The following example shows a simple table design to store employee and department entities.
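The original sample table isn't reproduced here, so the following is a hedged illustration of the idea (the property names and values are invented): employee and department entities share one table, with the department name as the PartitionKey and either an employee ID or a fixed marker as the RowKey.

```python
# Illustrative entities only; the names and values are assumptions,
# not the original sample data.
employee = {
    "PartitionKey": "Sales",   # department name
    "RowKey": "00123",         # employee ID
    "FirstName": "Don",
    "LastName": "Hall",
    "Age": 34,                 # integer-typed property
}
department = {
    "PartitionKey": "Sales",   # same partition as its employees
    "RowKey": "DEPARTMENT",    # fixed marker distinguishes the entity type
    "DepartmentName": "Sales",
    "EmployeeCount": 120,      # property the employee entities lack
}
```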
Many of the examples shown later in this guide are based on this simple design. So far, this data appears similar to a table in a relational database, with the key differences being the mandatory columns and the ability to store multiple entity types in the same table. Also, each of the user-defined properties such as FirstName or Age has a data type, such as integer or string, just like a column in a relational database.
Unlike in a relational database, however, the schema-less nature of the Table service means that a property need not have the same data type on each entity. For more information about the Table service, such as supported data types, supported date ranges, naming rules, and size constraints, see Understanding the Table Service Data Model.
Your choice of PartitionKey and RowKey is fundamental to good table design. Every entity stored in a table must have a unique combination of PartitionKey and RowKey. As with keys in a relational database table, the PartitionKey and RowKey values are indexed to create a clustered index to enable fast look-ups.
However, the Table service does not create any secondary indexes, so PartitionKey and RowKey are the only indexed properties. Some of the patterns described in Table design patterns illustrate how you can work around this apparent limitation.
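Because PartitionKey and RowKey are the only indexed properties, the fastest lookup is a point query that supplies both. A minimal sketch using the azure-data-tables Python SDK (the connection string, table name, and key values are placeholders):

```python
from azure.data.tables import TableClient

# Placeholder connection string and table name.
table = TableClient.from_connection_string(
    conn_str="<your-connection-string>", table_name="Employees"
)

# Point query: supplying both keys lets the service go straight to the
# clustered-index entry for this entity.
entity = table.get_entity(partition_key="Sales", row_key="00123")
print(entity["FirstName"])
```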
A table comprises one or more partitions, and many of the design decisions you make will be around choosing a suitable PartitionKey and RowKey to optimize your solution. A solution may consist of a single table that contains all your entities organized into partitions, but typically a solution has multiple tables. Tables help you to logically organize your entities and manage access to the data by using access control lists, and you can drop an entire table by using a single storage operation.
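For example, creating and dropping a table are each a single operation against the service; a sketch with the Python SDK (account details and the table name are placeholders):

```python
from azure.data.tables import TableServiceClient

service = TableServiceClient.from_connection_string("<your-connection-string>")

service.create_table("RaceRegistrations")  # create the table
# ... insert and query entities ...
service.delete_table("RaceRegistrations")  # dropping it is one storage operation
```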
The account name, table name, and PartitionKey together identify the partition within the storage service where the Table service stores the entity. As well as being part of the addressing scheme for entities, partitions define a scope for transactions (see Entity Group Transactions below), and form the basis of how the Table service scales.
For more information on partitions, see Performance and scalability checklist for Table storage. In the Table service, an individual node services one or more complete partitions, and the service scales by dynamically load-balancing partitions across nodes.
If a node is under load, the table service can split the range of partitions serviced by that node onto different nodes; when traffic subsides, the service can merge the partition ranges from quiet nodes back onto a single node. For more information about the internal details of the Table service, and in particular how the service manages partitions, see the paper Microsoft Azure Storage: A Highly Available Cloud Storage Service with Strong Consistency.
In the Table service, Entity Group Transactions (EGTs) are the only built-in mechanism for performing atomic updates across multiple entities.
EGTs are sometimes also referred to as batch transactions. The core of any table's design is its scalability, the queries used to access it, and its storage operation requirements. The PartitionKey values you choose dictate how a table is partitioned and the types of queries you can use. Storage operations, and especially inserts, can also affect your choice of PartitionKey values. The PartitionKey values can range from single values to unique values, and they can also be composed from multiple values, as the sketch below illustrates.
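A small sketch of composing a PartitionKey from several entity properties (the property names here are invented for illustration):

```python
def partition_key_for(entity: dict) -> str:
    # Compose the PartitionKey from two properties (hypothetical names);
    # the separator keeps the parts readable when inspecting keys.
    return f"{entity['Region']}_{entity['ProductLine']}"

order = {"Region": "EMEA", "ProductLine": "Bikes", "OrderId": "0042"}
order["PartitionKey"] = partition_key_for(order)  # "EMEA_Bikes"
order["RowKey"] = order["OrderId"]
```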
You can use entity properties to form the PartitionKey value. Or, the application can compute the value. The following sections discuss important considerations.
Developers should first consider whether the application will use entity group transactions (batch updates). Entity group transactions require entities to have the same PartitionKey value. Also, because batch updates apply to an entire group, the choices of PartitionKey values might be limited. For example, a banking application that maintains cash transactions must insert each cash transaction into the table atomically. Cash transactions represent both the debit and the credit sides and must net to zero.
This requirement means that the account number can't be used as any part of the PartitionKey value, because each side of the transaction uses a different account number. Instead, a transaction ID might be a better choice, as in the sketch below.
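A sketch of the banking example with the Python SDK's submit_transaction (the table name, account numbers, and amounts are invented): both sides of the cash transaction share the transaction ID as their PartitionKey, so they can be inserted atomically.

```python
import uuid
from azure.data.tables import TableClient

table = TableClient.from_connection_string(
    conn_str="<your-connection-string>", table_name="CashTransactions"
)

txn_id = str(uuid.uuid4())  # one PartitionKey value for both sides

debit = {"PartitionKey": txn_id, "RowKey": "debit",
         "AccountNumber": "111-222", "Amount": -100.00}
credit = {"PartitionKey": txn_id, "RowKey": "credit",
          "AccountNumber": "333-444", "Amount": 100.00}

# Entity group transaction: both inserts succeed or fail together,
# so the two sides always net to zero in the table.
table.submit_transaction([("create", debit), ("create", credit)])
```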
Partition numbers and sizes affect the scalability of a table under load, and they're controlled by how granular the PartitionKey values are. It can be challenging to determine the PartitionKey based on the partition size, especially if the distribution of values is hard to predict. A good rule of thumb is to use multiple, smaller partitions. Many table partitions make it easier for Azure Table storage to manage the storage nodes the partitions are served from. Choosing unique or finer values for the PartitionKey results in smaller but more numerous partitions. This is generally favorable because the system can load-balance the many partitions to distribute the load across many servers.
However, you should consider the effect of having many partitions on cross-partition range queries, which must visit multiple partitions to be satisfied. Those partitions might be distributed across many partition servers, and each time a query crosses a server boundary, a continuation token is returned. Continuation tokens specify the next PartitionKey or RowKey values from which to retrieve the next set of data for the query. In other words, each continuation token represents at least one additional request to the service, which can degrade the overall performance of the query, as the paging sketch below illustrates.
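A sketch of how continuation tokens surface in the azure-data-tables Python SDK, where each page of results can mean another round trip to the service (the table name and filter values are invented):

```python
from azure.data.tables import TableClient

table = TableClient.from_connection_string(
    conn_str="<your-connection-string>", table_name="Registrations"
)

# Cross-partition range query (filter bounds are placeholders).
query = "PartitionKey ge @lo and PartitionKey lt @hi"
params = {"lo": "2011", "hi": "2012"}

pages = table.query_entities(query_filter=query, parameters=params).by_page()
first_page = list(next(pages))

# A non-None token means the service stopped at a boundary and at least
# one more request is needed to finish the query.
token = pages.continuation_token
if token is not None:
    resumed = table.query_entities(
        query_filter=query, parameters=params
    ).by_page(continuation_token=token)
```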
Query selectivity is another factor that can affect query performance. Query selectivity is a measure of how many rows must be iterated for each partition; the more selective a query is, the more efficient it is at returning the rows you want. The overall performance of range queries therefore depends both on the number of partition servers that must be touched and on how selective the query is. You should also avoid the append-only and prepend-only patterns when you insert data into your table; even if your partitions are small and numerous, these patterns can limit the throughput of your insert operations.
The append-only and prepend-only patterns are discussed in Range partitions. Knowing the queries that you'll use can help you determine which properties are important to consider for the PartitionKey value; the properties that you use in those queries are candidates for the PartitionKey value. As a general guideline: if the entity has one key property, use it as the PartitionKey; if the entity has two key properties, use one as the PartitionKey and the other as the RowKey; if the entity has more than two key properties, use a composite key of concatenated values. If there's more than one equally dominant query, you can insert the information multiple times by using the different RowKey values that you need.
Your application will manage the secondary (or tertiary, and so on) rows. You can use this type of pattern to satisfy the performance requirements of your queries. The following example uses the data from the foot race registration example.
It has two dominant queries: query by bib number, and query by age. To serve both dominant queries, insert two rows as an entity group transaction. The RowKey values carry a prefix for the bib and for the age so that the application can distinguish between the two values, as the sketch below shows.
In this example, an entity group transaction is possible because the PartitionKey values are the same. The group transaction provides atomicity of the insert operation. Although it's possible to use this pattern with different PartitionKey values, we recommend that you use the same values to gain this benefit. Otherwise, you might have to write extra logic to ensure atomic transactions that use different PartitionKey values.
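A sketch of the two-row pattern with the Python SDK (the event name, bib, age, and exact RowKey scheme are invented to match the prefix idea described above):

```python
from azure.data.tables import TableClient

table = TableClient.from_connection_string(
    conn_str="<your-connection-string>", table_name="Registrations"
)

event = "2011 New York City Marathon"  # shared PartitionKey (hypothetical)

# One logical registration, written twice with prefixed RowKeys so that
# both dominant queries become efficient key lookups.
by_bib = {"PartitionKey": event, "RowKey": "BIB:01234",
          "Name": "John", "Age": 55}
by_age = {"PartitionKey": event, "RowKey": "AGE:055__01234",
          "Name": "John", "Bib": "01234"}

# Same PartitionKey, so both rows go in atomically as one entity
# group transaction.
table.submit_transaction([("create", by_bib), ("create", by_age)])
```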
Tables in Azure Table storage might encounter load not only from queries. They also might encounter load from storage operations like inserts, updates, and deletes.
Consider what type of storage operations you will perform on the table and at what rate. If you perform these operations infrequently, you might not need to worry about them. However, for frequent operations like performing many inserts in a short time, you must consider how those operations are served as a result of the PartitionKey values that you choose.
Important examples are the append-only and prepend-only patterns. When you use an append-only or prepend-only pattern, you use unique ascending or descending values for the PartitionKey on subsequent insertions. If you combine this pattern with frequent insert operations, your table won't be able to service the inserts at scale, because Azure can't load-balance the operation requests across other partition servers.
In that case, you might want to consider using random values, such as GUIDs, for the PartitionKey. Then your partition sizes can remain small while still maintaining load balancing during storage operations, as in the sketch below.
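A minimal sketch of randomizing the PartitionKey with GUIDs (the table and property names are placeholders):

```python
import uuid
from azure.data.tables import TableClient

table = TableClient.from_connection_string(
    conn_str="<your-connection-string>", table_name="Events"
)

# A random GUID spreads inserts across many small partitions instead of
# continually appending to one "latest" partition.
table.create_entity({
    "PartitionKey": str(uuid.uuid4()),
    "RowKey": "event",
    "Payload": "example",
})
```

Note that fully random keys trade away efficient range queries, so this choice suits insert-heavy workloads more than scan-heavy ones.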
When the PartitionKey value is complex or requires comparisons to other PartitionKey mappings, you might need to test the table's performance. The test should examine how well a partition performs under peak loads. To examine the throughput, compare the actual values to the specified limit of a single partition on a single server: partitions are limited to a target of 2,000 entities per second, and if the throughput exceeds 2,000 entities per second for a partition, the server might run too hot in a production setting.
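A rough way to compare actual throughput against the per-partition target is to time inserts against a single partition; a sketch (all names are placeholders, and a realistic test would run longer, in parallel, and from multiple clients):

```python
import time
import uuid
from azure.data.tables import TableClient

table = TableClient.from_connection_string(
    conn_str="<your-connection-string>", table_name="LoadTest"
)

N = 1000
start = time.monotonic()
for i in range(N):
    table.create_entity({
        "PartitionKey": "hot-partition",  # single partition under test
        "RowKey": str(uuid.uuid4()),
        "Seq": i,
    })
elapsed = time.monotonic() - start
print(f"{N / elapsed:.0f} entities per second against one partition")
```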
In this case, the PartitionKey values might be too coarse, so that there aren't enough partitions or the partitions are too large. You might need to modify the PartitionKey value so that the partitions are distributed among more servers. Load balancing at the partition layer occurs when a partition gets too hot; that is, when the partition, specifically the partition server, operates beyond its target scalability.
For Azure storage, each partition has a scalability target of 2,000 entities per second. Load balancing at the partition layer doesn't occur immediately after the scalability target is exceeded.
Instead, the system waits a few minutes before beginning the load-balancing process, which ensures that a partition has truly become hot. It isn't necessary to prime partitions with generated load to trigger load balancing, because the system performs the task automatically. If a table is primed with a certain load, the system will later balance the partitions based on the actual load, which can result in a significantly different distribution of the partitions.
Instead of priming partitions, write code that handles the timeout and Server Busy errors that the system returns while it is load balancing. By handling those errors with a retry strategy, your application can better handle peak loads. Retry strategies are discussed in more detail in the following section.
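A sketch of the retry pattern with exponential backoff for the Server Busy (503) and Operation Timeout (500) responses; the azure-data-tables SDK also retries such failures by default, so treat this as an illustration of the pattern rather than required code:

```python
import random
import time

from azure.core.exceptions import HttpResponseError

def with_retries(operation, max_attempts=5):
    """Retry `operation` on Server Busy (503) and Operation Timeout (500)."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except HttpResponseError as err:
            if err.status_code not in (500, 503) or attempt == max_attempts - 1:
                raise
            # Exponential backoff with jitter: ~1 s, ~2 s, ~4 s, ...
            time.sleep(2 ** attempt + random.random())

# Usage, assuming `table` is a TableClient and `entity` has its keys set:
# with_retries(lambda: table.create_entity(entity))
```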
When load balancing occurs, the partition goes offline for a few seconds. During the offline period, the system reassigns the partition to a different partition server.
It's important to note that your data isn't stored by the partition servers. Instead, the partition servers serve entities from the DFS (distributed file system) layer. Because your data isn't stored at the partition layer, moving partitions to different servers is a fast process, which greatly limits the downtime, if any, that your application might encounter.
Within Table storage, you can store metadata and flexible datasets. Table storage lets you store a virtually unlimited number of entities, and each storage account can contain as many tables as will fit within the account's capacity limit. You can also use it for structured, non-relational data.
Azure Cosmos DB for Table is a premium service that provides automatic secondary indexes, global distribution, and throughput-optimized tables. With either service, you can access data directly through the following addresses, based on the OData protocol: for Azure Table storage, https://<storage-account>.table.core.windows.net/<table>; for Azure Cosmos DB for Table, https://<cosmos-account>.table.cosmos.azure.com/<table>.
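Because both services expose the same OData-based surface, the same client code can target either endpoint; a sketch with the Python SDK (the account names and key are placeholders):

```python
from azure.core.credentials import AzureNamedKeyCredential
from azure.data.tables import TableServiceClient

credential = AzureNamedKeyCredential("<account-name>", "<account-key>")

# Azure Table storage endpoint.
storage = TableServiceClient(
    endpoint="https://<account-name>.table.core.windows.net/",
    credential=credential,
)

# Azure Cosmos DB for Table endpoint; only the address changes.
cosmos = TableServiceClient(
    endpoint="https://<account-name>.table.cosmos.azure.com/",
    credential=credential,
)
```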
Below you can learn how these services differ and the capacities of each.

Performance

When you use Azure Table storage, there is no upper bound on the latency of your operations. With Azure Table storage, throughput is limited to 20,000 operations per second, while Azure Cosmos DB supports up to 10 million operations per second.
Additionally, Azure Cosmos DB provides automatic indexing of properties, which can speed up queries.

Global distribution

You can use Azure Table storage in a single region, with an optional secondary read-only region for increased availability. In contrast, with Azure Cosmos DB you can distribute your data across up to 30 regions.
Automatic global failover is included, and you can choose among five consistency levels for your desired combination of throughput, latency, and availability. Azure Cosmos DB also offers a superset of Table storage functionality, with additional capabilities available to you.

Billing

Billing in Table storage is determined by your storage volume.
Pricing is per GB and is affected by your selected redundancy level; the per-GB rate decreases as the amount you store grows. You're also charged according to the number of operations you perform, per 10,000 operations.
With Azure Cosmos DB, your database is provisioned in increments of 100 RUs (request units) per second, and you're billed hourly for the provisioned throughput. You're also billed for storage per GB, at a higher rate than Table storage. The sketch below illustrates the two billing models.
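As a hedged illustration of the two billing models (no real prices are assumed; all rates are parameters you supply):

```python
def table_storage_monthly_cost(gb_stored, price_per_gb,
                               operations, price_per_10k_ops):
    """Consumption billing: pay per GB stored plus per 10,000 operations.
    Rates vary by region and redundancy level, so they are inputs here."""
    return gb_stored * price_per_gb + (operations / 10_000) * price_per_10k_ops

def cosmos_monthly_cost(provisioned_ru_s, hours, price_per_100ru_hour,
                        gb_stored, price_per_gb):
    """Provisioned billing: pay hourly per 100 RU/s provisioned, plus
    storage per GB (typically at a higher per-GB rate than Table storage)."""
    return ((provisioned_ru_s / 100) * hours * price_per_100ru_hour
            + gb_stored * price_per_gb)
```

When using Table storage, there are a few tips you can apply to optimize your performance.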