What is Federation?
- Federation is a database object like other objects such as tables, views, stored procedures or triggers.
- It is also known as Sharding.
- There can be multiple federations in a database.
- It is data dependent/data directed routing so it can scale out across databases.
- It is a feature that you can only do on Azure; it is not supported in regular SQL Server.
Advantage of Federation
- Elasticity
Ø Scale out when you need and scale back when you don’t need.
Ø Create an elastic database tier that can expand and
contract with your applications workload Ø without downtime!
- Unlimited Scalability
Ø Federations offer scale beyond the capacity limits of a
single SQL Azure database.
Ø It can horizontally scale out to many nodes.
Ø It can go beyond the limits of a single Azure database.
- Multi-Tenancy
Ø It is favorable for very small tenants and very large
tenants.
Ø Tenant management is very easy with federation.
Ø Tenant placement can be changed at any time without
downtime.
- Online Repartitioning
Ø Federation allows online repartitioning of data through
ALTER FEDERATION T-SQL commands.
Ø By repartitioning SPLIT operation data can be moved to new
federation members without downtime.
Federation and Related terms difference
Federation
& NoSQL - Federation and NOSQL are
different. It uses SQL within each federation member. It supports Parallel Queries
and Map Reduce patterns same as what NoSQL offers.
Federation
& Partition - Federation and partition
are similar but there exist a difference. In partitioning SQL Server gives us
an automatic way to query all the partition. In federation there is no
automatic way; Federation members don’t talk to each other.
Just take an example. Divide a big table into
2 tables and keeps them into a single database and query them separately that
is really a federation. It means 2 tables look like to be in separated databases
and they don’t talk to each other. You can’t do the join between 2 tables too.
Scenario where Federation can be used
- Web Scale
Database Solutions
Ø Gaming
Ø Social Networking sites
Ø Online electronic stores
Ø Travel reservation system
Ø NOSQL Applications
- Applications which
Ø Need big data
Ø Big scale
Ø Massive parallelism
Ø Eventual consistency & semi-structured data
- Software Vendors
Implementing Multi-tenant SaaS Solutions
- Very Huge Partitioned Data Mart
Federation Architecture & Design
Federation Root: It is
a database that contains the federation database object. It stores federation
metadata information. All application connects to federation root first. Root
database is the central repository for information about distribution of
scaled-out data.
Federation
Member: Federation use system managed SQL
Azure databases to achieve scale-out named federation members. Federation
members provide the computational and storage capacity for parts of the
federations workload and data. Collection of all federation members in a federation
represents the collection of all data in the federation. Federation members are
managed dynamically as data is repartitioned. Administrators decide how many
federation members are used at any point in time using federation
repartitioning operations.
Each federation member is a single database. The
CREATE FEDERATION process creates a new database based on the root
database. When you create a new federation
you have to give the name of the federation (not the name of the physical
database, which is a System-GUID), the name for the distribution key, and
distribution data type on which data will be federated on. The valid
distribution data types are int, bigint, uniqueidentifier and varbinary (up to 900 bytes).Varbinary can be used if you
want to define string or characters federation. You need to cast string to
varbinary and then you can create a string federated member. Currently only Range Distribution Type is supported.
Probably in next future release we expect Azure should support HASH Distribution Type also.
Federation
Distribution Key: Each federation has a
single key for federating or we can say partitioning data. In the federation
definition, the distribution key represented by 3 properties:
- A distribution key label that is used for referring to
the key,
- A data type to specify the valid data domain for the
distribution such as uniqueidentifier or bigint, and
- A distribution type to specify the method for
distributing the data such as ‘range’.
Federation Atomic
Unit: All data for a value of a
federation key stored in the same federation member. It represents all data
that belongs to a single instance of a federation key.
Federation design is de-normalized rather
than usual normalized design.
There are three main types of tables we have
to consider while designing federation:
Entity/Federated
Table: Refer to tables that contain data
that is distributed by the federation. Tables that will be split based on the
federation and their federation columns.
Ø
It is subset in each
federation member.
Ø
Must Use Entity Key
FEDERATED ON CLAUSE.
Ø
Federated by entity
key – must be part of a primary key.
Reference
table: Refer to tables that contain
reference information to optimize lookup queries in federations. Reference
tables are created in federation members and do not contain any FEDERATED ON
annotation.
Ø
Tables that will not
be split, but should be referred by the federated table.
Ø
These tables will be
copied across all federation members.
Ø
Exist in each
federation member in entity.
Example would be a Product table or other
dimension tables.
Important
Note -When you do a split of a federation
member than reference table will be replicated automatically to new federation
member. If you want to insert/update reference table than you have to do it
individually for each reference table in each federation member. There is no way to keep reference table in
sync automatically.
Global Central Table: Refer
to tables that are created in the federation root for typically low traffic or
heavily cached data that does not need to be scaled-out.
Ø
Tables that will not
be split, and very rarely join-select with the tables in federation members.
Ø
Exist in federation
root – Good for storing metadata.
Limitation of Federation
- Cross-database joins are not supported in SQL Azure and it is a typical behavior of SQL Azure.
- SQL Azure federations don’t support Identity on a column while creating new table.
- If you shard your data into N number of federation members and then decide you need to make a schema change, you’ll have to modify the schema on each federation member database.
- There should be a primary key on a table of a source database if you want to do migration to SQL Azure federation.
- Merge is not supported in SQL Azure federation currently; probably it should be accommodated in next future release. If you want to do merge than
- First Backup the federation member database somewhere on SQL Azure.
- Now drop the federation member.
- Now insert backup data into specific range.
- Sync Services of SQL Azure is not supported.
- Azure import-export is also not supported.
- SQL Server 2012 data tools (SSDT) is also not supported.
- Joins, grouping, aggregation should be done by application and to do the same fan out queries can be used.
- There is no automatic fan out query is currently supported in Azure.
- Federation keys are immutable.
Some Useful System Views
Standard system views use in day to day
federation operation
- sys.federations represent the federations in a database.
- sys.federation_distributions represent the definition of the federation distribution scheme for federations.
- sys.federation_members represent members of the federations.
- sys.federation_member_distributions provides the distribution details per federation distribution key per member.
Standard system views use in day to day to
manage federation
- sys.dm_federation_operations returns one row per SPLIT or DROP operation on a federation. Each row contains information on the progress and any error conditions for the operation.
- sys.dm_federation_operation_errors returns rows containing information on errors that occur during SPLIT or DROP federation operations.
- sys.dm_federation_operation_error_members returns the list of members involved in federation operations that failed due to errors.
Example
Connect to SQL Azure database through SSMS.
CREATE DATABASE FedDb
Above statement will create a federation root database.
Now connect to FedDB again using SSMS. You can’t switch to
FedDB directly so you need to connect again.
Now
you need to create a federation member with federation key. To create a
federation you have to give a
name
and federation key. Let us consider a scenario. Suppose you have a customer
table where
CustomerKey
is a primary key and table is very huge and you want to create a federation for
it. As per
federation
rule CustomerKey has to be a part of or
whole thing of a primary key in every
federated
tables.
So if
we want to create a customer federation than CustomerKey should be chosen a federation
key. Let us
assume
that we also have a CustomerAddress table and it has a CustomerKey as a part of
primary key.
Now assume
we have Order and and OrderDetail table also. Order table has a CustomerKey as
a part of
key, so it would
be fine again. OrdeDetails table hasn’t CustomerKey so in this case we have to
de-
normalize it and introduce
CustomerKey into it. Now assume you have a Product table that is nowhere
related with CustomerKey,
in this case product table will be created as a reference table.
CREATE FEDERATION
CustomerFederation(CustId BIGINT RANGE)
Above statement will create a Federation with name
CustomerFederation where CustId is a distribution key
with data type int and distribution type is range.
Before you create any table on federation member you have
to execute below:
USE FEDERATION
CustomerFederation (CustId=0) WITH RESET , FILTERING OFF
GO
After executing above you can to create a customer table
and syntax should be like below:
SET
ANSI_NULLS ON
GO
SET
QUOTED_IDENTIFIER ON
GO
CREATE
TABLE [dbo].[Customer](
[CustomerKey] [int] NOT NULL,
[FirstName] [nvarchar](50) NULL,
[MiddleName] [nvarchar](50) NULL,
[LastName] [nvarchar](50) NULL,
CONSTRAINT
[PK_Customer_CustomerKey] PRIMARY KEY CLUSTERED
(
[CustomerKey] ASC
)
)
FEDERATED ON (CustId = CustomerKey)
SET
ANSI_NULLS OFF
GO
SET
QUOTED_IDENTIFIER OFF
GO
Please take a note of highlighted part. Now if you want to
create a CustomerAddress table than syntax
should be like below:
SET
ANSI_NULLS ON
GO
SET
QUOTED_IDENTIFIER ON
GO
CREATE
TABLE [dbo].[CustomerAddress](
[CustomerKey] [int] NOT NULL,
[AddressKey] [int] NOT NULL,
CONSTRAINT
[PK_CustomerAddress_CustomerAddressKey] PRIMARY KEY
CLUSTERED
(
[CustomerKey] ASC,[AddressKey] ASC
)
)
FEDERATED ON (CustId=
CustomerKey)
SET
ANSI_NULLS OFF
GO
SET
QUOTED_IDENTIFIER OFF
GO
Assume you have data loaded into Customer and CustomerAddress
table by using some insert script.
To query federation data you should use below
USE FEDERATION
CustomerFederation (CustId=0) WITH RESET , FILTERING OFF
GO
SELECT * FROM Customer
Special Note: You connect to the root and from there you can connect to
multiple federation members
means multiple databases, so you don’t need a multiple
connection pool. Please remember that SQL Azure
doesn’t support linking databases as it happens in the SQL
Azure Gateway.
If you want to query for a specific customer than you have
to turn on FILTERING option and USE
FEDERATION statement will also change like below:
USE FEDERATION
CustomerFederation (CustId=10) WITH RESET , FILTERING ON
GO
SELECT * FROM Customer
SELECT * FROM
CustomerAddress
Above query will just give you specific row with CustomerKey
= 10 even though you select all from
federated tables, this is one of the beauty of federation
and good example of atomic unit.
If you want to SPLIT federation you have to use following
syntax:
USE FEDERATION
ROOT WITH RESET
GO
ALTER FEDERATION
CustomerFederation SPLIT AT (CustId = 100)
GO
BCP
Syntax to load data in SQL Azure Federation
bcp.exe
"<SysId of SQL Azure Federation Member>.<tablename>" in
"<sourcefilepath>" -E -b 10000 -a 16384 -q -S <sql azure
server name>.database.windows.net -U <SQL Azure User Name> -P
<Password> -f "<format file path>" –E
Acknowledge
My sincere thanks to Bob Beauchemin, Cihan Biyikoglu, George Huey and msdn.