Wednesday 25 April 2012

SQL Azure Federation





 What is Federation?

  • Federation is a database object like other objects such as tables, views, stored procedures or triggers.
  • It is also known as Sharding.
  • There can be multiple federations in a database.
  • It is data dependent/data directed routing so it can scale out across databases.
  • It is a feature that you can only do on Azure; it is not supported in regular SQL Server. 


 Advantage of Federation

  • Elasticity 

Ø  Scale out when you need and scale back when you don’t need.
Ø  Create an elastic database tier that can expand and contract with your applications workload Ø  without downtime!

  • Unlimited Scalability 

Ø  Federations offer scale beyond the capacity limits of a single SQL Azure database.
Ø  It can horizontally scale out to many nodes.
Ø  It can go beyond the limits of a single Azure database.

  • Multi-Tenancy 

Ø  It is favorable for very small tenants and very large tenants.
Ø  Tenant management is very easy with federation.
Ø  Tenant placement can be changed at any time without downtime.

  • Online Repartitioning  

Ø  Federation allows online repartitioning of data through ALTER FEDERATION T-SQL commands.
Ø  By repartitioning SPLIT operation data can be moved to new federation members without downtime.


 




 


 Federation and Related terms difference

Federation & NoSQL - Federation and NOSQL are different. It uses SQL within each federation member. It supports Parallel Queries and Map Reduce patterns same as what NoSQL offers.

Federation & Partition - Federation and partition are similar but there exist a difference. In partitioning SQL Server gives us an automatic way to query all the partition. In federation there is no automatic way; Federation members don’t talk to each other.

Just take an example. Divide a big table into 2 tables and keeps them into a single database and query them separately that is really a federation. It means 2 tables look like to be in separated databases and they don’t talk to each other. You can’t do the join between 2 tables too.

 Scenario where Federation can be used

  • Web Scale Database Solutions 

      Ø  Gaming
      Ø  Social Networking sites
      Ø  Online electronic stores
      Ø  Travel reservation system
      Ø  NOSQL Applications

  • Applications which 

     Ø  Need big data
     Ø  Big scale
     Ø  Massive parallelism
     Ø  Eventual consistency & semi-structured data 

  • Software Vendors Implementing Multi-tenant SaaS Solutions 

  • Very Huge Partitioned Data Mart


 Federation Architecture & Design

Federation Root: It is a database that contains the federation database object. It stores federation metadata information. All application connects to federation root first. Root database is the central repository for information about distribution of scaled-out data.
Federation Member: Federation use system managed SQL Azure databases to achieve scale-out named federation members. Federation members provide the computational and storage capacity for parts of the federations workload and data. Collection of all federation members in a federation represents the collection of all data in the federation. Federation members are managed dynamically as data is repartitioned. Administrators decide how many federation members are used at any point in time using federation repartitioning operations.

Each federation member is a single database. The CREATE FEDERATION process creates a new database based on the root database. When you create a new federation you have to give the name of the federation (not the name of the physical database, which is a System-GUID), the name for the distribution key, and distribution data type on which data will be federated on. The valid distribution data types are int, bigint, uniqueidentifier and varbinary (up to 900 bytes).Varbinary can be used if you want to define string or characters federation. You need to cast string to varbinary and then you can create a string federated member. Currently only Range Distribution Type is supported. Probably in next future release we expect Azure should support HASH Distribution Type also.

 
 
 
SalesDB is the root database with many federations.

Federation Distribution Key: Each federation has a single key for federating or we can say partitioning data. In the federation definition, the distribution key represented by 3 properties:
  1. A distribution key label that is used for referring to the key,
  2. A data type to specify the valid data domain for the distribution such as uniqueidentifier or bigint, and
  3. A distribution type to specify the method for distributing the data such as ‘range’.
Federation Atomic Unit: All data for a value of a federation key stored in the same federation member. It represents all data that belongs to a single instance of a federation key.
Federation design is de-normalized rather than usual normalized design.

There are three main types of tables we have to consider while designing federation:

Entity/Federated Table: Refer to tables that contain data that is distributed by the federation. Tables that will be split based on the federation and their federation columns.
       
Ø  It is subset in each federation member.
Ø  Must Use Entity Key FEDERATED ON CLAUSE.
Ø  Federated by entity key – must be part of a primary key.

Reference table: Refer to tables that contain reference information to optimize lookup queries in federations. Reference tables are created in federation members and do not contain any FEDERATED ON annotation.

Ø  Tables that will not be split, but should be referred by the federated table.
Ø  These tables will be copied across all federation members.
Ø  Exist in each federation member in entity.

Example would be a Product table or other dimension tables.

Important Note -When you do a split of a federation member than reference table will be replicated automatically to new federation member. If you want to insert/update reference table than you have to do it individually for each reference table in each federation member. There is no way to keep reference table in sync automatically.

      Global Central Table: Refer to tables that are created in the federation root for typically low traffic or heavily cached data that does not need to be scaled-out.

Ø  Tables that will not be split, and very rarely join-select with the tables in federation members.
Ø  Exist in federation root – Good for storing metadata.

 Limitation of Federation

  • Cross-database joins are not supported in SQL Azure and it is a typical behavior of SQL Azure.
  • SQL Azure federations don’t support Identity on a column while creating new table.
  • If you shard your data into N number of federation members and then decide you need to make a schema change, you’ll have to modify the schema on each federation member database. 
  • There should be a primary key on a table of a source database if you want to do migration to SQL Azure federation.
  • Merge is not supported in SQL Azure federation currently; probably it should be accommodated in next future release. If you want to do merge than
    • First Backup the federation member database somewhere on SQL Azure.
    • Now drop the federation member.
    • Now insert backup data into specific range.
  • Sync Services of SQL Azure is not supported.
  • Azure import-export is also not supported.
  • SQL Server 2012 data tools (SSDT) is also not supported.
  • Joins, grouping, aggregation should be done by application and to do the same fan out queries can be used.
  • There is no automatic fan out query is currently supported in Azure.
  • Federation keys are immutable.


 Some Useful System Views

Standard system views use in day to day federation operation

  • sys.federations represent the federations in a database.
  • sys.federation_distributions represent the definition of the federation distribution scheme for federations.
  • sys.federation_members represent members of the federations.
  • sys.federation_member_distributions provides the distribution details per federation distribution key per member.


Standard system views use in day to day to manage federation

  • sys.dm_federation_operations returns one row per SPLIT or DROP operation on a federation. Each row contains information on the progress and any error conditions for the operation. 
  • sys.dm_federation_operation_errors returns rows containing information on errors that occur during SPLIT or DROP federation operations.
  • sys.dm_federation_operation_error_members returns the list of members involved in federation operations that failed due to errors.


Example

Connect to SQL Azure database through SSMS.

CREATE DATABASE FedDb

Above statement will create a federation root database.

Now connect to FedDB again using SSMS. You can’t switch to FedDB directly so you need to connect again.

Now you need to create a federation member with federation key. To create a federation you have to give a
name and federation key. Let us consider a scenario. Suppose you have a customer table where
CustomerKey is a primary key and table is very huge and you want to create a federation for it. As per
federation rule CustomerKey has to be a part of or whole thing of a primary key in every federated 
tables.

So if we want to create a customer federation than CustomerKey should be chosen a federation key. Let us
assume that we also have a CustomerAddress table and it has a CustomerKey as a part of primary key. 
Now assume we have Order and and OrderDetail table also. Order table has a CustomerKey as a part of 
key, so it would be fine again. OrdeDetails table hasn’t CustomerKey so in this case we have to de-
normalize it and introduce CustomerKey into it. Now assume you have a Product table that is nowhere 
related with CustomerKey, in this case product table will be created as a reference table.


CREATE FEDERATION CustomerFederation(CustId BIGINT RANGE)

Above statement will create a Federation with name CustomerFederation where CustId is a distribution key
with data type int and distribution type is range.

Before you create any table on federation member you have to execute below:

USE FEDERATION CustomerFederation (CustId=0) WITH RESET , FILTERING OFF
GO

After executing above you can to create a customer table and syntax should be like below:

SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE TABLE [dbo].[Customer](
   [CustomerKey] [int] NOT NULL,
   [FirstName] [nvarchar](50) NULL,
   [MiddleName] [nvarchar](50) NULL,
   [LastName] [nvarchar](50) NULL,
CONSTRAINT [PK_Customer_CustomerKey] PRIMARY KEY CLUSTERED
(
   [CustomerKey] ASC
)

) FEDERATED ON (CustId = CustomerKey)

SET ANSI_NULLS OFF
GO

SET QUOTED_IDENTIFIER OFF
GO

Please take a note of highlighted part. Now if you want to create a CustomerAddress table than syntax
should be like below:

SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE TABLE [dbo].[CustomerAddress](
   [CustomerKey] [int] NOT NULL,
   [AddressKey] [int] NOT NULL,
CONSTRAINT [PK_CustomerAddress_CustomerAddressKey] PRIMARY KEY CLUSTERED
(
   [CustomerKey] ASC,[AddressKey] ASC
)

) FEDERATED ON (CustId= CustomerKey)

SET ANSI_NULLS OFF
GO

SET QUOTED_IDENTIFIER OFF
GO

Assume you have data loaded into Customer and CustomerAddress table by using some insert script.

To query federation data you should use below


USE FEDERATION CustomerFederation (CustId=0) WITH RESET , FILTERING OFF
GO

SELECT * FROM Customer 

Special Note: You connect to the root and from there you can connect to multiple federation members
means multiple databases, so you don’t need a multiple connection pool. Please remember that SQL Azure
doesn’t support linking databases as it happens in the SQL Azure Gateway. 

If you want to query for a specific customer than you have to turn on FILTERING option and USE
FEDERATION statement will also change like below:

USE FEDERATION CustomerFederation (CustId=10) WITH RESET , FILTERING ON
GO

SELECT * FROM Customer 

SELECT * FROM CustomerAddress

Above query will just give you specific row with CustomerKey = 10 even though you select all from
federated tables, this is one of the beauty of federation and good example of atomic unit.

If you want to SPLIT federation you have to use following syntax:

USE FEDERATION ROOT WITH RESET
GO
ALTER FEDERATION CustomerFederation SPLIT AT (CustId = 100)
GO

BCP Syntax to load data in SQL Azure Federation

bcp.exe "<SysId of SQL Azure Federation Member>.<tablename>" in "<sourcefilepath>" -E -b 10000 -a 16384 -q -S <sql azure server name>.database.windows.net -U <SQL Azure User Name> -P <Password> -f "<format file path>" –E

Acknowledge

My sincere thanks to Bob Beauchemin, Cihan Biyikoglu, George Huey and msdn.




2 comments:

  1. are you sure that you have to have the federated column included as a part of every table's primary key? where did you get that info? that would mean that all tables except for maybe one have composite primary keys with the real primary key and then the tenantid

    ReplyDelete
  2. It is supported, but it will require you to use SQL Always ON Availability sets if you want to replicate VMM too.
    disaster recovery aws

    ReplyDelete