Azure Synapse Analytics is Microsoft’s unified analytics service, bringing together data warehousing and big data analytics. It’s a bit like a Swiss Army knife for data, letting you ingest, prepare, manage, and serve data for business intelligence and machine learning.

Let’s see it in action. Imagine you’ve got data in Azure Data Lake Storage Gen2 and you want to run some SQL queries on it. Here’s how you’d set up a Synapse workspace and connect it to your data.

First, you’ll need an Azure subscription. Once you’re in the Azure portal, search for "Azure Synapse Analytics" and click "Create."

You’ll need to pick a subscription and resource group. Let’s say you’re creating a new resource group called synapse-rg.

For the workspace name, something descriptive like my-synapse-workspace works well.

You’ll also need to select a region; let’s go with East US.

The next crucial part is configuring the Data Lake Storage Gen2 account. You can create a new one or select an existing one. For this example, let’s create a new one named mydatalakestoragegen2. Make sure "Hierarchical namespace" is enabled, since that is what makes a storage account Data Lake Storage Gen2. The workspace’s managed identity also needs the Storage Blob Data Contributor role on this account so it can read and write data.

For networking, you can enable a managed virtual network for enhanced security, which isolates your workspace within its own virtual network. This choice has to be made when the workspace is created; it can’t be changed afterward.

You’ll also need to set up a SQL administrator login and password. Remember these credentials; you’ll use them to connect to your dedicated SQL pools.

Once the workspace is deployed, navigate to it. You’ll see a "Launch Synapse Studio" button. Click that to open Synapse Studio, the web-based interface for managing your analytics.

In Synapse Studio, you’ll see several hubs: Data, Develop, Integrate, Monitor, and Manage.

Under the "Data" hub, the "Linked" tab shows the storage attached to your workspace. You’ll notice your Data Lake Storage Gen2 account is already there as the primary storage account. Linked services for other data sources like Azure SQL Database, Cosmos DB, and more are created under the "Manage" hub.

Let’s create a dedicated SQL pool. Go to the "Manage" hub, then "SQL pools," and click "+ New." Give it a name like my-dedicated-sql-pool. For performance, you’ll select a Data Warehouse Unit (DWU) level. Let’s start with DW100c, the smallest level. This is where you’ll run your traditional data warehousing workloads.
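
Once the pool is online, one way to confirm the performance level it is running at is to query the service objective from a SQL script. This is a sketch based on the system views documented for dedicated SQL pools. It assumes the script is connected to the master database, and the column aliases are only illustrative.

```sql
-- Assumption: run while connected to the master database of the workspace's dedicated SQL endpoint
SELECT
    db.name              AS [Database],
    ds.edition           AS [Edition],
    ds.service_objective AS [ServiceObjective]  -- for this walkthrough, expected to show the DWU level chosen above
FROM sys.database_service_objectives AS ds
JOIN sys.databases AS db
    ON ds.database_id = db.database_id;
```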

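Now, let’s go to the "Develop" hub. Here, you can create SQL scripts; use the "Connect to" dropdown to point a script at your dedicated SQL pool. Let’s write a simple query against data in your Data Lake. First, you need to create an external table that points to the data. An external table relies on two other objects in the pool: an external data source (where the files live) and an external file format (how to parse them). Assuming both exist and you have a CSV file at container/folder/mydata.csv in your Data Lake Storage, you’d write something like this: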

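CREATE EXTERNAL TABLE [dbo].[MyExternalTable] (
    [col1] [varchar](100) NULL,
    [col2] [int] NULL
)
WITH (
    -- Path is relative to the root of the external data source (here, the container)
    LOCATION = '/folder/mydata.csv',
    -- Must name an external data source object in this database, not a linked service;
    -- MyAdlsDataSource is a placeholder for one created with CREATE EXTERNAL DATA SOURCE
    DATA_SOURCE = [MyAdlsDataSource],
    -- Must name an external file format object, not a string literal;
    -- CsvFileFormat is a placeholder for one created with CREATE EXTERNAL FILE FORMAT
    FILE_FORMAT = [CsvFileFormat]
);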
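
In a dedicated SQL pool, the DATA_SOURCE and FILE_FORMAT options of an external table refer to database objects that have to exist before the CREATE EXTERNAL TABLE statement runs. Here is a minimal sketch of creating them. It assumes access through the workspace’s managed identity and a CSV file with a header row. The names MsiCredential, MyAdlsDataSource, and CsvFileFormat are hypothetical, and the table definition would need to reference whichever names you pick.

```sql
-- One-time setup per database: a master key protects stored credentials
CREATE MASTER KEY;

-- Credential that authenticates as the workspace's managed identity
CREATE DATABASE SCOPED CREDENTIAL MsiCredential
WITH IDENTITY = 'Managed Service Identity';

-- Points at the container; external table LOCATION values are relative to this root
CREATE EXTERNAL DATA SOURCE MyAdlsDataSource
WITH (
    TYPE = HADOOP,
    LOCATION = 'abfss://container@mydatalakestoragegen2.dfs.core.windows.net',
    CREDENTIAL = MsiCredential
);

-- Describes how to parse comma-delimited text
CREATE EXTERNAL FILE FORMAT CsvFileFormat
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (
        FIELD_TERMINATOR = ',',
        STRING_DELIMITER = '"',
        FIRST_ROW = 2  -- assumption: the file has a header row to skip
    )
);
```

This only works if the managed identity holds the Storage Blob Data Contributor role on the storage account, as set up during workspace creation.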

Then, you can query this external table using your dedicated SQL pool:

SELECT * FROM [dbo].[MyExternalTable];
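
Every query against an external table reads the underlying files again. If the data is queried often, a common pattern is to materialize it inside the pool with CREATE TABLE AS SELECT. The sketch below assumes the external table above exists; MyInternalTable and the round-robin distribution are illustrative choices, not recommendations for any particular workload.

```sql
-- Copy the external data into a table stored in the dedicated SQL pool
CREATE TABLE [dbo].[MyInternalTable]
WITH (
    DISTRIBUTION = ROUND_ROBIN,   -- spread rows evenly; a simple default for staging data
    CLUSTERED COLUMNSTORE INDEX   -- columnar storage suited to analytic scans
)
AS
SELECT [col1], [col2]
FROM [dbo].[MyExternalTable];
```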

This demonstrates how Synapse bridges the gap between data lake and data warehouse. You can also create Spark pools for big data processing using Apache Spark. Go to "Manage," then "Apache Spark pools," and click "+ New." Name it mysparkpool (Spark pool names can contain only letters and numbers, must start with a letter, and are limited to 15 characters) and set the Node size to Small, with 3 nodes, which is the minimum.

One of the most useful things about Synapse is that every workspace also includes a serverless SQL pool, listed as "Built-in" in Synapse Studio, which lets you query data directly from your data lake without provisioning dedicated resources for that specific task. You can query your CSV, Parquet, or JSON files directly using SQL syntax.
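
To make that concrete, here is a sketch of an ad hoc query against the example CSV file using OPENROWSET. It assumes the script is connected to the serverless pool rather than the dedicated one, that your login has read access to the storage account, and that the file has a header row.

```sql
-- Read the file in place; no external table or loaded copy is required
SELECT TOP 10 *
FROM OPENROWSET(
    BULK 'https://mydatalakestoragegen2.dfs.core.windows.net/container/folder/mydata.csv',
    FORMAT = 'CSV',
    PARSER_VERSION = '2.0',
    HEADER_ROW = TRUE  -- assumption: first line holds column names
) AS [result];
```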

The real power comes from integrating these different components. You can build data pipelines in the "Integrate" hub that copy data from various sources into your data lake or dedicated SQL pool, or trigger Spark jobs for complex transformations.

The one thing most people don’t know is that serverless SQL pools don’t run on Spark or on any compute you provision. A distributed SQL query engine managed by the service allocates resources for each query and releases them afterward, which is why it’s "serverless" from your perspective. You’re billed for the amount of data each query processes rather than for reserved capacity, which makes it a flexible and cost-effective way to explore raw data.
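
Because cost depends on how much data your queries read, it helps to keep an eye on that figure. The serverless pool exposes a system view for this; the sketch below assumes the script is connected to the serverless pool.

```sql
-- Shows how much data serverless queries have processed over the current day, week, and month
SELECT *
FROM sys.dm_external_data_processed;
```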

With your workspace and a SQL pool set up, the next step is usually to explore data governance and security features, like creating role-based access controls for your data.
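
As a starting point, here is a sketch of granting a colleague read-only access in the dedicated SQL pool. The user analyst@contoso.com is hypothetical, and the script assumes you are connected to the pool as an Azure AD administrator.

```sql
-- Create a database user mapped to an Azure AD identity (hypothetical account)
CREATE USER [analyst@contoso.com] FROM EXTERNAL PROVIDER;

-- Add the user to the built-in read-only role
EXEC sp_addrolemember 'db_datareader', 'analyst@contoso.com';
```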
