Microsoft will soon offer three
additional ways for enterprises to store data on Azure, making the cloud
computing platform better suited to big data analysis.
Azure will have a data warehouse
service, a "data lake" service storing large amounts of data, and an
option for running "elastic" databases that can store sets of data
that vary greatly in size, explained Scott Guthrie, Microsoft's executive vice
president of the Cloud and Enterprise group, who unveiled these new services at
the company's Build 2015 developer conference, held this week in San Francisco.
The Azure SQL Data Warehouse,
available later this year, will give organizations a way to store petabytes of
data so it can be easily ingested by data analysis software, such as the
company's Power BI tool for data visualization, the Azure Data Factory for data
orchestration, or the Azure Machine Learning service.
Unlike traditional in-house data
warehouse systems, this cloud service can quickly be adjusted to fit the amount
of data that actually needs to be stored, Guthrie said. Users can also specify
the exact amount of processing power they'll need to analyze the data. The
service builds on the massively parallel processing architecture that Microsoft
developed for its SQL Server database.
The Azure Data Lake has been designed
for those organizations that need to store very large amounts of data, so it
can be processed by Hadoop and other "big data" analysis platforms.
This service could be most useful for Internet of Things-based systems that may
amass large amounts of sensor data.
"It allows you to store
literally an infinite amount of data, and it allows you to keep data in its
original form," Guthrie said. The Data Lake uses the Hadoop Distributed File
System (HDFS), so it can be accessed by Hadoop or other big data analysis
systems.
A preview of the Azure Data Lake will
be available later this year.
In addition to these two new
products, the company has also updated its Azure SQL Database service so
customers can pool their Azure cloud databases to reduce storage costs and
prepare for bursts of database activity.
"It allows you to manage lots of
databases at lower cost," Guthrie said. "You can maintain completely
isolated databases, but [it] allows you to aggregate all of the resources necessary
to run those databases."
The new service would be particularly
useful for running public-facing software services, where the amount of
database storage needed can fluctuate greatly. Today, most
Software-as-a-Service (SaaS) offerings must over-provision their databases to
accommodate the potential peak demand, which can be financially wasteful. The
elastic option allows an organization to pool the available storage space for
all of its databases in such a way that if one database rapidly grows, it can
pull unused space from other databases.
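The cost argument behind pooling comes down to simple arithmetic, shown below in a toy model. The three tenant databases, their hourly storage figures, and the two helper functions are all hypothetical and chosen only for illustration. This is not how Azure SQL Database actually meters or allocates pooled resources. It compares sizing each database for its own peak against sizing one shared pool for the peak of the combined load.

```python
# Toy model only: hypothetical hourly storage demand (in GB) for three
# tenant databases whose busy periods happen not to overlap.
demand = {
    "tenant_a": [10, 10, 80, 10, 10, 10],
    "tenant_b": [10, 10, 10, 80, 10, 10],
    "tenant_c": [10, 10, 10, 10, 80, 10],
}


def per_database_provisioning(demand):
    """Capacity needed when every database is sized for its own peak."""
    return sum(max(series) for series in demand.values())


def pooled_provisioning(demand):
    """Capacity needed when all databases share one pool: the largest
    combined demand observed at any single time step."""
    totals = [sum(values) for values in zip(*demand.values())]
    return max(totals)


print(per_database_provisioning(demand))  # 240
print(pooled_provisioning(demand))        # 100
```

In this made-up example, over-provisioning each database separately requires 240 GB, while a shared pool only has to cover the largest combined load of 100 GB. The saving depends on tenants not peaking at the same time; if all three spiked in the same hour, the pool would need the full 240 GB.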
The new elastic pooling feature is
now available in preview mode.
[Image: Microsoft Azure's new Data Lake architecture.]