Thursday, 12 March 2015

Azure table storage vs Windows Azure SQL database

What is Windows Azure SQL database?

Think of Windows Azure SQL Database as the cloud evolution of your on-premises relational database: it supports standard relational principles and practices such as ACID transactions and server-side execution through stored procedures.
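
Because SQL Azure is accessed with the same client libraries as on-premises SQL Server, existing data access code largely carries over. The following is a minimal sketch using ADO.NET; the server name, database, credentials and the usp_GetOrdersByCustomer stored procedure are hypothetical placeholders.

```csharp
using System;
using System.Data;
using System.Data.SqlClient;

class SqlAzureDemo
{
    static void Main()
    {
        // Hypothetical connection string - substitute your own server, database and credentials.
        const string connectionString =
            "Server=tcp:myserver.database.windows.net,1433;" +
            "Database=MyAppDb;User ID=myuser;Password=mypassword;Encrypt=True;";

        using (var connection = new SqlConnection(connectionString))
        {
            connection.Open();

            // Server-side execution: call a stored procedure (usp_GetOrdersByCustomer is a
            // made-up example) exactly as you would against on-premises SQL Server.
            using (var command = new SqlCommand("usp_GetOrdersByCustomer", connection))
            {
                command.CommandType = CommandType.StoredProcedure;
                command.Parameters.AddWithValue("@CustomerId", 42);

                using (var reader = command.ExecuteReader())
                {
                    while (reader.Read())
                    {
                        Console.WriteLine("{0}: {1}", reader["OrderId"], reader["OrderDate"]);
                    }
                }
            }
        }
    }
}
```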

When should I use SQL Azure? 

As a general rule, use SQL Azure when you need relational structure, your database won't grow beyond 500 GB, and you need to provide visualisation and BI reporting using out-of-the-box reporting tools.

What is Azure table storage?

Azure table storage is Microsoft's cloud offering of a NoSQL database.

When should I use Azure table storage?

As a general rule, use Azure table storage when you need to store very large data volumes, you need a flexible data schema, and you want your data geo-replicated for additional durability in case of disaster.
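
As a rough illustration of the flexible, key/attribute style of access, here is a minimal sketch using the .NET storage client library (the WindowsAzure.Storage NuGet package); the table name and the CustomerEntity class are made-up examples, and the development storage emulator stands in for a real storage account.

```csharp
using System;
using Microsoft.WindowsAzure.Storage;        // Azure Storage client library (NuGet: WindowsAzure.Storage)
using Microsoft.WindowsAzure.Storage.Table;

// Table storage is schemaless at the service level; this class simply describes
// the properties this particular application chooses to store.
public class CustomerEntity : TableEntity
{
    public CustomerEntity() { }

    public CustomerEntity(string country, string customerId)
    {
        PartitionKey = country;     // groups related rows and drives scale-out
        RowKey = customerId;        // unique within the partition
    }

    public string Email { get; set; }
}

class TableStorageDemo
{
    static void Main()
    {
        // Uses the local storage emulator; swap in CloudStorageAccount.Parse("<your connection string>")
        // for a real storage account.
        var account = CloudStorageAccount.DevelopmentStorageAccount;

        var table = account.CreateCloudTableClient().GetTableReference("customers");
        table.CreateIfNotExists();

        // Insert an entity.
        table.Execute(TableOperation.Insert(new CustomerEntity("AU", "C001") { Email = "jo@example.com" }));

        // Point lookup by partition key and row key - the cheapest query the service offers.
        var result = table.Execute(TableOperation.Retrieve<CustomerEntity>("AU", "C001"));
        var customer = (CustomerEntity)result.Result;
        Console.WriteLine(customer.Email);
    }
}
```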

Decision trees - Azure table storage, Azure SQL or on-premises

Figure 1 - Choosing Between SQL Database and On-Premise SQL Server (source Technet)

Figure 2 - SQL Database Choices (source Technet)

Figure 3 - Evaluating Compliance Factors Related to SQL Database vs. On-Premise Data Hosting (source Technet)

SQL Azure editions

*Note: The flow charts above are based on the Web and Business editions, which will be retired in September 2015 and replaced by the Basic, Standard and Premium editions.

Service Tier/Performance Level | DTU | Max DB Size | Max Concurrent Requests | Max Sessions | Benchmark Transaction Rate | Predictability
Basic | 5 | 2 GB | 30 | 300 | 16,600 transactions per hour | Good
Standard/S0 | 10 | 250 GB | 60 | 600 | 521 transactions per minute | Better
Standard/S1 | 20 | 250 GB | 90 | 900 | 934 transactions per minute | Better
Standard/S2 | 50 | 250 GB | 120 | 1,200 | 2,570 transactions per minute | Better
Standard/S3* | 100 | 250 GB | 200 | 2,400 | 5,100 transactions per minute | Better
Premium/P1 | 100 | 500 GB | 200 | 2,400 | 105 transactions per second | Best
Premium/P2 | 200 | 500 GB | 400 | 4,800 | 228 transactions per second | Best
Premium/P3 | 800 | 500 GB | 1,600 | 19,200 | 735 transactions per second | Best

References

https://msdn.microsoft.com/en-us/library/azure/jj553018.aspx

http://social.technet.microsoft.com/wiki/contents/articles/3398.windows-azure-sql-database-delivery-guide-for-business-continuity.aspx

https://msdn.microsoft.com/en-us/library/azure/ee336279.aspx

https://msdn.microsoft.com/en-us/library/azure/dn741330.aspx



Thursday, 5 March 2015

Azure Data Factory

What is Azure Data Factory?

A simplistic view is that Azure Data Factory (ADF) is the cloud evolution of SQL Server Integration Services (SSIS), the tool that ships with the on-premises SQL Server product and is traditionally used to perform Extract, Transform and Load (ETL) operations from heterogeneous data sources into an enterprise data warehouse. The ability to perform ETL, however, is where the similarities between SSIS and ADF end.

ADF can connect to structured, semi-structured and unstructured data from both cloud and on-premises sources. For example, an unstructured cloud source could be Twitter, whilst a structured on-premises source could be SQL Server. ADF is known as a composition platform - it takes other services (storage, compute and data movement services) and puts a management layer on top. ADF orchestrates and operationalises analytics and data movement services.

Azure Data Factory supports the following capabilities to enable simple consumption of the data produced:

  •     Easily move (one-time or scheduled) the produced data assets to relational data marts for consumption using existing BI tools (Excel, Tableau, etc.).
  •     Consume data assets produced by a data factory directly using Power Query in Excel.

Central Concepts of ADF

Pipelines

A pipeline is composed of a set of activities and datasets. Activities are processing steps that take one or more datasets as input and produce one or more datasets as output. Typical data transformation activities in Azure Data Factory are performed through Hive, Pig and custom C# processing running on Hadoop (Azure HDInsight). Pipelines are defined with simple JSON scripting and activated via PowerShell commands, as sketched below. Unlike SSIS, there is no fancy drag-and-drop rapid application development environment - at least not yet.
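
To give a feel for the JSON scripting, here is a minimal sketch of a pipeline containing a single Hive activity that runs daily on an HDInsight cluster. Treat it as an illustration of the general shape rather than a copy-and-paste template: the entity names (HiveSamplePipeline, InputTable, OutputTable, HDInsightLinkedService, the script path) are made up, and the exact property names may differ from the current preview schema, so check the Data Factory documentation for the authoritative definitions. Once authored, the JSON is deployed to a data factory with the Azure PowerShell Data Factory cmdlets.

```json
{
    "name": "HiveSamplePipeline",
    "properties": {
        "description": "Illustrative sketch only - entity names and property spellings are placeholders",
        "activities": [
            {
                "name": "RunHiveScript",
                "type": "HDInsightHive",
                "linkedServiceName": "HDInsightLinkedService",
                "inputs": [ { "name": "InputTable" } ],
                "outputs": [ { "name": "OutputTable" } ],
                "typeProperties": {
                    "scriptPath": "adfscripts/transform.hql"
                },
                "scheduler": {
                    "frequency": "Day",
                    "interval": 1
                }
            }
        ],
        "start": "2015-03-01T00:00:00Z",
        "end": "2015-03-08T00:00:00Z"
    }
}
```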

Useful skills for building pipelines

At a minimum you will want to be comfortable with JSON (for authoring pipeline, dataset and linked service definitions), PowerShell (for deploying and managing them), and Hive, Pig or custom C# (for the transformation logic itself).

The Hub Container

A central concept in ADF is the hub container. Data storage and processing services are collected into a hub, which facilitates and optimises computation and storage activities, enables unified resource consumption management, and provides services for data movement as needed.


Application Model

Timeslices

A slice is a logical time-based partitioning of the data produced by one or more activity runs. A table in an Azure data factory is composed of slices along the time axis. The width of a slice is determined by the schedule - hourly, daily and so on. IT professionals can view all the downstream data slices for a given time interval and rerun a slice in case of a failure. This is a handy time-saving feature when dealing with large volumes of data.

Monitoring ADF

The health of your data factories can be monitored through the Azure preview portal. Here you can see all the details of your factory, including any linked services, datasets and pipelines. The Azure preview portal also hosts the Azure Data Factory Editor - a lightweight tool that allows you to create, edit, and deploy the JSON files for all Azure Data Factory entities. This enables you to create linked services, datasets, and pipelines using the JSON templates that ship with the Data Factory service. If you are new to PowerShell, this removes the need to install and ramp up on Azure PowerShell just to create Azure data factories.

Disposable compute services

One of the many things you can do with ADF is to automate creating compute services on the fly for a particular operation and tear them down afterwards. For example, you can define an HDInsight cluster and, when the pipeline runs, the cluster is provisioned on the fly and torn down once it is no longer needed. This reduces your overall operational costs if you only need the compute service for a short period.

The Obligatory Microsoft 50,000-foot view of ADF

Here's a short, (very) high-level clip of the Azure Data Factory service.

References


http://azure.microsoft.com/en-us/documentation/articles/data-factory-introduction/

http://azure.microsoft.com/en-us/documentation/articles/data-factory-common-scenarios/

http://blogs.technet.com/b/dataplatforminsider/archive/2014/10/30/the-ins-and-outs-of-azure-data-factory-orchestration-and-management-of-diverse-data.aspx

https://github.com/Azure/azure-content/blob/master/articles/data-factory-editor.md

http://weblogs.asp.net/scottgu/azure-announcing-new-real-time-data-streaming-and-data-factory-services

http://www.bedreinnsikt.no/book/export/html/254