Wednesday, May 21, 2014

Introduction to Business Intelligence Markup Language (BIML) for SSIS

Problem


With Microsoft SQL Server Integration Services (SSIS), you can build powerful and flexible packages to manage your enterprise-wide ETL solutions. However, every ETL project has some repetitive tasks: you need to import 20 different flat file exports from the ERP system, you need to load 15 dimensions into the data warehouse and so on. With SSIS out-of-the-box, you are forced to create multiple individual packages which is very time consuming due to the lack of easy code reuse. In a typical BI project, the ETL implementation can easily take up 60% of the project time, so it is beneficial if the development time of similar SSIS packages can be cut down.

Solution


BIML, which stands for Business Intelligence Markup Language, offers us a solution. With BIML you can easily generate SSIS packages based on metadata, allowing us to effectively apply code reuse, templates and patterns in our ETL solutions. This tip will give a brief introduction to BIML with a small example, but later tips will delve into the powerful capabilities of BIML.

What is BIML?


BIML is a dialect of XML and can be used to specify business intelligence and data warehouse solutions. It is 100% compatible with SSIS and SSAS. When you generate for example an SSIS package, you can further edit it in Visual Studio, as if you created the package manually.

The language is developed by Varigence, but an open-source implementation is available in BIDS Helper, a free add-on for Visual Studio. This means you can easily develop a BIML script inside BIDS and generate SSIS packages directly into your project.

BIML supports SSIS 2005, 2008(R2) and 2012, although not every feature of the project deployment model is currently implemented in BIDS Helper.

How do I start using BIML?


As mentioned in the previous paragraph, you simply install BIDS Helper on your machine, if you haven't already. To add a new BIML file to your project, simply right-click on the project or the SSIS packages node in the solution explorer. In the context menu, choose Add New BIML File.
A file called BIMLScript.biml will be added to the Miscellaneous folder. This BIML file has an empty BIML root element to start with.
As you can see in the screenshot above, Visual Studio offers color coding for the BIML, as it is just XML. BIDS Helper also offers some basic intellisence features:
When you type "<", you'll get a list of all the possible child elements for the current element. BIDS Helper will insert the corresponding closing tag when you close an element. When typing inside an element, you'll get a list of the attributes you can configure.
The intellisense reports XML violations - such as not closing an element - but also when you forget to configure required attributes.

The BIML can be checked for errors by right-clicking the file and choosing Check BIML for Errors. The compiler doesn't always give the most straight forward error messages, so you might want the check for errors often when developing large BIML files.

When you choose Generate SSIS Packages, BIML will check for errors by validating the XML and BIML syntax, but also by validating the resulting SSIS packages. For example, if a destination table used in a data flow does not already exist, an error will be thrown. Of course, packages are not generated when any error is found during compilation.


When packages have been generated successfully, they will be added to the SSIS Packages node in the project.

A small BIML example


Let's take a look at a small sample BIML file to get a feeling on how BIML works. This example will create a SSIS package with two Execute SQL Tasks. The first task will create a table if it does not exist yet, the second task will insert a row in the same table.

First we need to create the connection managers. The following code creates an OLE DB connection manager called OLE_BIML to the database named BIML on the localhost server.
========================================================================
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
  <Connections >
    <Connection Name ="OLE_BIML"
                ConnectionString="Data Source=.;Initial Catalog=BIML;
                   Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;"/>
  </Connections>
========================================================================
Connections are defined outside any package. Once you refer to a connection in a package, it is added to this package when it is generated by BIML. The next step is to add a package to the BIML script. Small remark: if you are working with SSIS 2005 or 2008, you might want to change the version number of the native client in the connection string. For example, SQLNCLI10.1 for SSIS 2008.

The following snippet will create an empty package called HelloWorld.
========================================================================
<Packages>
    <Package Name="HelloWorld" ConstraintMode="Linear">
    </Package>
</Packages>
========================================================================
The constraint mode defines the default behavior of tasks inside the package. When Parallel is specified, no precedence constraints are defined between tasks. With the Linear constraint mode, each task is connected with a Success precedence constraint to the previous task, following the order of specification in the BIML file.

Let's add the two Execute SQL Tasks to our package. This code will add the first Execute SQL Task, which will check the existence of the destination table and create it if necessary.
========================================================================
<Tasks>
   <ExecuteSQL Name="(SQL) Create or Drop Table" ConnectionName="OLE_BIML">
      <DirectInput>
            IF EXISTS (SELECT * FROM sys.objects WHERE object_id = 
                       OBJECT_ID(N'[dbo].[HelloWorld]') AND type IN (N'U'))
                   DROP TABLE [dbo].[HelloWorld];
            GO
            IF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = 
                       OBJECT_ID(N'[dbo].[HelloWorld]') AND type IN (N'U'))
            BEGIN
                   CREATE TABLE [dbo].[HelloWorld](
                          [ID] [int] IDENTITY(1,1) NOT NULL,
                          [Message] [varchar](50) NOT NULL,
                          [Timestamp] [datetime2](3) NOT NULL
                   );
            END
            GO
      </DirectInput>
   </ExecuteSQL>
========================================================================
The Execute SQL Task has two attributes defined: its name and the connection it will use. The task has a child element DirectInput, specifying the query issued against the database. When specifying a query, watch out for special XML characters, such as double quote or the less than sign. You need to replace these with their escaped counterparts. For example, ampersand & becomes &amp;.

The second Execute SQL Task is created with similar code:
========================================================================
<ExecuteSQL Name="(SQL) Insert Hello World" ConnectionName="OLE_BIML">
      <DirectInput>
     INSERT INTO [dbo].[HelloWorld] VALUES ('Hello World!',SYSDATETIME());
      </DirectInput>
   </ExecuteSQL>
</Tasks>
========================================================================
And that's all the code we need to generate our package. The full script - with comments added - looks like this:
========================================================================
<!-- BIML script created by Koen Verbeeck -->
<!-- This BIML code will generate a simple SSIS package with two Execute SQL Tasks -->
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
  <!-- First define the connection. This is defined outside the packages.
       Add CreateInProject="true" for project connection managers.-->
  <Connections >
    <Connection Name ="OLE_BIML"
                ConnectionString="Data Source=.;Initial Catalog=BIML;
                Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;"/>
  </Connections>
  <Packages>
    <!-- Linear automatically connects all tasks in order of creation.
         Parallel doesn't create precedence constraints. -->
    <!-- Using SSIS 2012, projects also have protection levels.
         Those of the packages must match the protection level of the project. -->
    <!-- DontSaveSensitive is the default package protection level (as it should be). -->
    <Package Name="HelloWorld" ConstraintMode="Linear">
      <Tasks>
        <!-- When a connection is used in a task, the connection manager is added
             to the package. -->
        <ExecuteSQL Name="(SQL) Create or Drop Table" ConnectionName="OLE_BIML">
          <DirectInput>
            IF EXISTS (SELECT * FROM sys.objects WHERE object_id = 
                           OBJECT_ID(N'[dbo].[HelloWorld]') AND type IN (N'U'))
                   DROP TABLE [dbo].[HelloWorld];
            GO
            IF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id =
                            OBJECT_ID(N'[dbo].[HelloWorld]') AND type IN (N'U'))
            BEGIN
                   CREATE TABLE [dbo].[HelloWorld](
                          [ID] [int] IDENTITY(1,1) NOT NULL,
                          [Message] [varchar](50) NOT NULL,
                          [Timestamp] [datetime2](3) NOT NULL
                   );
            END
            GO
          </DirectInput>
        </ExecuteSQL>
        <ExecuteSQL Name="(SQL) Insert Hello World" ConnectionName="OLE_BIML">
          <DirectInput>
            INSERT INTO [dbo].[HelloWorld] VALUES ('Hello World!',SYSDATETIME());
          </DirectInput>
        </ExecuteSQL>
      </Tasks>
    </Package>
  </Packages>
</Biml>
========================================================================
After generating the package, we can admire the result:
The package can now be edited like any other SSIS package. You can validate the package by running it and checking the result in SQL Server.

Conclusion


This tip gave a basic introduction to the language BIML, which you can use to generate SSIS packages. Next tips will dig deeper into BIML to expose its powerful capabilities and will show you how you can use metadata to generate SSIS packages on the fly.

Try BIML out for yourself. You can use the code used in this tip to create your own simple BIML script. The script uses a database called BIML, so you either can create such a database or change the connection string to work with another database.


No comments:

Post a Comment