Problem
With
Microsoft SQL Server Integration Services (SSIS), you can build powerful and
flexible packages to manage your enterprise-wide ETL solutions. However, every
ETL project has some repetitive tasks: you need to import 20 different flat
file exports from the ERP system, you need to load 15 dimensions into the data
warehouse and so on. With SSIS out-of-the-box, you are forced to create
multiple individual packages which is very time consuming due to the lack of
easy code reuse. In a typical BI project, the ETL implementation can easily
take up 60% of the project time, so it is beneficial if the development time of
similar SSIS packages can be cut down.
Solution
BIML,
which stands for Business Intelligence Markup Language, offers us a solution.
With BIML you can easily generate SSIS packages based on metadata, allowing us
to effectively apply code reuse, templates and patterns in our ETL solutions.
This tip will give a brief introduction to BIML with a small example, but later
tips will delve into the powerful capabilities of BIML.
What is BIML?
BIML is a dialect of XML and can be used to specify business
intelligence and data warehouse solutions. It is 100% compatible with SSIS and SSAS.
When you generate for example an SSIS package, you can further edit it in
Visual Studio, as if you created the package manually.
The language is developed by Varigence, but an open-source
implementation is available in BIDS Helper, a free add-on for Visual Studio.
This means you can easily develop a BIML script inside BIDS and generate SSIS
packages directly into your project.
BIML supports SSIS 2005, 2008(R2) and 2012, although not every feature
of the project deployment model is currently implemented in BIDS Helper.
How do I start using BIML?
As
mentioned in the previous paragraph, you simply install BIDS Helper on your
machine, if you haven't already. To add a new BIML file to your project, simply
right-click on the project or the SSIS packages node in the solution explorer.
In the context menu, choose Add New BIML File.
A
file called BIMLScript.biml will be added to the Miscellaneous folder. This
BIML file has an empty BIML root element to start with.
As you can see in the screenshot above, Visual Studio offers color
coding for the BIML, as it is just XML. BIDS Helper also offers some basic intellisence
features:
When
you type "<", you'll get a list of all the possible child elements
for the current element. BIDS Helper will insert the corresponding closing tag
when you close an element. When typing inside an element, you'll get a list of
the attributes you can configure.
The
intellisense reports XML violations - such as not closing an element - but also
when you forget to configure required attributes.
The
BIML can be checked for errors by right-clicking the file and choosing Check
BIML for Errors. The compiler doesn't always give the most straight forward
error messages, so you might want the check for errors often when developing
large BIML files.
When
you choose Generate SSIS Packages, BIML will check for errors by validating the
XML and BIML syntax, but also by validating the resulting SSIS packages. For
example, if a destination table used in a data flow does not already exist, an
error will be thrown. Of course, packages are not generated when any error is
found during compilation.
When
packages have been generated successfully, they will be added to the SSIS
Packages node in the project.
A small BIML example
Let's take
a look at a small sample BIML file to get a feeling on how BIML works. This
example will create a SSIS package with two Execute SQL Tasks. The first task
will create a table if it does not exist yet, the second task will insert a row
in the same table.
First we need
to create the connection managers. The following code creates an OLE DB
connection manager called OLE_BIML to the database named BIML on the localhost
server.
========================================================================
<Biml xmlns="http://schemas.varigence.com/biml.xsd"> <Connections > <Connection Name ="OLE_BIML" ConnectionString="Data Source=.;Initial Catalog=BIML; Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;"/> </Connections>
========================================================================
Connections
are defined outside any package. Once you refer to a connection in a package,
it is added to this package when it is generated by BIML. The next step is to
add a package to the BIML script. Small remark: if you are working with SSIS
2005 or 2008, you might want to change the version number of the native client
in the connection string. For example, SQLNCLI10.1 for SSIS 2008.
The
following snippet will create an empty package called HelloWorld.
========================================================================
<Packages> <Package Name="HelloWorld" ConstraintMode="Linear"> </Package> </Packages>
========================================================================
The
constraint mode defines the default behavior of tasks inside the package. When
Parallel is specified, no precedence constraints are defined between tasks.
With the Linear constraint mode, each task is connected with a Success
precedence constraint to the previous task, following the order of
specification in the BIML file.
Let's add
the two Execute SQL Tasks to our package. This code will add the first Execute
SQL Task, which will check the existence of the destination table and create it
if necessary.
========================================================================
<Tasks> <ExecuteSQL Name="(SQL) Create or Drop Table" ConnectionName="OLE_BIML"> <DirectInput> IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[HelloWorld]') AND type IN (N'U')) DROP TABLE [dbo].[HelloWorld]; GO IF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[HelloWorld]') AND type IN (N'U')) BEGIN CREATE TABLE [dbo].[HelloWorld]( [ID] [int] IDENTITY(1,1) NOT NULL, [Message] [varchar](50) NOT NULL, [Timestamp] [datetime2](3) NOT NULL ); END GO </DirectInput> </ExecuteSQL>
========================================================================
The Execute
SQL Task has two attributes defined: its name and the connection it will use.
The task has a child element DirectInput, specifying the query issued against
the database. When specifying a query, watch out for special XML characters,
such as double quote or the less than sign. You need to replace these with
their escaped counterparts. For example, ampersand & becomes &.
The second
Execute SQL Task is created with similar code:
========================================================================
<ExecuteSQL Name="(SQL) Insert Hello World" ConnectionName="OLE_BIML"> <DirectInput> INSERT INTO [dbo].[HelloWorld] VALUES ('Hello World!',SYSDATETIME()); </DirectInput> </ExecuteSQL> </Tasks>
========================================================================
And that's
all the code we need to generate our package. The full script - with comments
added - looks like this:
========================================================================
<!-- BIML script created by Koen Verbeeck --> <!-- This BIML code will generate a simple SSIS package with two Execute SQL Tasks --> <Biml xmlns="http://schemas.varigence.com/biml.xsd"> <!-- First define the connection. This is defined outside the packages. Add CreateInProject="true" for project connection managers.--> <Connections > <Connection Name ="OLE_BIML" ConnectionString="Data Source=.;Initial Catalog=BIML; Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;"/> </Connections> <Packages> <!-- Linear automatically connects all tasks in order of creation. Parallel doesn't create precedence constraints. --> <!-- Using SSIS 2012, projects also have protection levels. Those of the packages must match the protection level of the project. --> <!-- DontSaveSensitive is the default package protection level (as it should be). --> <Package Name="HelloWorld" ConstraintMode="Linear"> <Tasks> <!-- When a connection is used in a task, the connection manager is added to the package. --> <ExecuteSQL Name="(SQL) Create or Drop Table" ConnectionName="OLE_BIML"> <DirectInput> IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[HelloWorld]') AND type IN (N'U')) DROP TABLE [dbo].[HelloWorld]; GO IF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[HelloWorld]') AND type IN (N'U')) BEGIN CREATE TABLE [dbo].[HelloWorld]( [ID] [int] IDENTITY(1,1) NOT NULL, [Message] [varchar](50) NOT NULL, [Timestamp] [datetime2](3) NOT NULL ); END GO </DirectInput> </ExecuteSQL> <ExecuteSQL Name="(SQL) Insert Hello World" ConnectionName="OLE_BIML"> <DirectInput> INSERT INTO [dbo].[HelloWorld] VALUES ('Hello World!',SYSDATETIME()); </DirectInput> </ExecuteSQL> </Tasks> </Package> </Packages> </Biml>
========================================================================
After
generating the package, we can admire the result:
The package
can now be edited like any other SSIS package. You can validate the package by
running it and checking the result in SQL Server.
Conclusion
This tip
gave a basic introduction to the language BIML, which you can use to generate
SSIS packages. Next tips will dig deeper into BIML to expose its powerful
capabilities and will show you how you can use metadata to generate SSIS
packages on the fly.
Try BIML
out for yourself. You can use the code used in this tip to create your own
simple BIML script. The script uses a database called BIML, so you either can
create such a database or change the connection string to work with another
database.