VinsData Blog

Latest Blog Posts

  • heavy-etl-on-a-budget-using-shortcuts-to-keep-data-accessible-after-pausing-your-high-power-capacity

    The real problem: compute costs do not sleep. Fabric’s pricing model is capacity-based, which means compute is your biggest lever for cost control. The typical enterprise pattern looks like this: Yet many teams keep the engineering capacity running 24/7, and so the obvious question teams ask is: if we pause a capacity, does access to the […]

  • announcement

    Update: I’m moving back! Following some unresolvable hosting issues at vinsdata.in, I have decided to return to this blog site for now. Stay tuned for new posts and data platform insights right here. This blog site has been moved to a new address. All the new blogs & articles going forward will be published directly in the […]

  • mirrored-sql-databases-in-microsoft-fabric

    Microsoft Fabric introduces a seamless way to bring your existing Azure SQL Database or SQL Server data into the unified Fabric platform through Mirrored SQL Databases. This capability enables real-time analytics and integration with Fabric services without the need for complex ETL pipelines or data duplication. What is a Mirrored SQL Database? A mirrored database […]

  • azure-private-link-for-microsoft-fabric

    Azure Private Link enhances Azure’s connectivity by allowing secure, private access to Azure-hosted services. It enables organizations to create private connections between their on-prem environments and Azure services, confining interactions to the Azure network and ensuring data remains isolated from the public internet. Private Link’s integration […]

  • overview-on-implementing-a-data-mesh-architecture-in-microsoft-fabric

    In an era where data is pivotal, traditional centralized data systems often fall short of meeting the needs for scalability, agility, and decentralization. The concept of Data Mesh emerges as a contemporary solution, advocating for decentralized data ownership and treating data as a product. This article will delve into the principles of Data Mesh and illustrate how […]

  • microsoft-fabric-create-and-load-your-data-into-the-lakehouse-table

    What is a Lakehouse? A Lakehouse is a data management architecture that combines elements of both data lakes and data warehouses. Before we jump into the definition of a Lakehouse, it is important to understand the difference between a data lake and a data warehouse individually. A Lakehouse allows you to use both the data lake and the data warehouse together […]

  • shortcuts-feature-in-microsoft-fabric

    Microsoft Fabric is a powerful and flexible data platform that enables organizations to process, analyze, and gain insights from their data at scale. One of the key features that sets Fabric apart is its ability to perform in-place IO on files using shortcuts, without the need for physical data movement or duplication. In this article, […]

  • deep-dive-on-shortcuts-feature-in-microsoft-fabric
  • mirroring-in-microsoft-fabric-enhancing-data-reliability-and-availability

    In the realm of cloud computing and data management, Microsoft Fabric stands out as a versatile platform that provides a comprehensive suite of tools for data integration, storage, analysis, and management. One of the critical features in Microsoft Fabric is the “mirroring” capability, designed to enhance data reliability and availability, ensuring that businesses can maintain […]

  • unleash-the-power-of-data-activator-feature-in-microsoft-fabric

    Microsoft Fabric offers a powerful feature called Data Activator that allows users to seamlessly integrate and activate their data for enhanced insights and decision-making. This feature enables users to connect various data sources, such as databases, APIs, and cloud services, to Fabric’s analytics platform. With Data Activator, users can easily transform raw data into actionable […]

  • microsoft-fabric-terminologies

    Following are the basic terminologies used inside the Microsoft Fabric ecosystem. These have been referenced from the official Fabric documentation to serve as a repo for all our future articles on Fabric. Generic Terms: Synapse Data Engineering, Data Factory, Synapse Data Science, Synapse Data Warehousing, Synapse Real-Time Analytics, OneLake. Shortcut: Shortcuts are embedded references within OneLake […]

  • powerbi-vs-cleanlab-studio-the-best-tool-for-your-data-cleansing-needs

    Dataset cleansing is an essential step in data analysis: it ensures your dataset’s accuracy and consistency and helps remove inconsistencies and errors. Using a dataset without proper data cleansing will result in inaccurate values and misleading insights for an organization’s data-driven decisions. In this blog post, we will see how to get […]

  • why-should-you-migrate-from-azure-synapse-analytics-to-microsoft-fabric

    Microsoft Fabric is a cloud-based data platform that provides a range of services for data engineering, data science, and business intelligence. It is an extension of Azure Synapse Analytics that integrates all analytics workloads from the data engineer to the business knowledge worker. Fabric brings together Power BI, Data Factory, and the Data Lake, on […]

  • refreshing-the-built-in-roles-available-in-the-azure-synapse-analytics

    Azure Synapse Analytics has many built-in roles that will help to manage access to Synapse resources. These roles allow you to control what users and applications can do within a Synapse workspace. Synapse RBAC Roles can be assigned by Synapse Administrators. A workspace-level Synapse Administrator can grant access to any workspace. A lower-level Synapse administrator […]

  • azure-synapse-workload-management

    Managing varied workloads with proper resource allocation in a multi-user concurrent environment is the biggest challenge a team might face when retrieving data from an Azure Synapse Analytics dedicated SQL pool database. Workload management in Azure Synapse Analytics gives you control over the workloads that are utilizing your system resources. Setting up the best […]

  • dwhdata-warehouse-units-in-synapse-dedicated-pool

    Basically, there are two types of pools in Azure Synapse Analytics: Serverless SQL Pool and Dedicated SQL Pool. In the serverless model, as you might be aware, the costing is based on a pay-per-usage model and is calculated per TB of processing consumed by the queries that are run. Whereas the costing of Dedicated SQL Pools is […]

  • implementing-change-data-capture-in-azure-data-factory

    Change Data Capture (CDC): For any ETL requirement that involves a huge amount of data, most of the problem is solved when you eliminate repeated or redundant processing in your data storage mechanism. Basically, you should not repeat the work of copying or moving data that you already have in your destination datastore. Hence […]

  • change-data-capture-in-azure-synapse-analytics

    What is Change Data Capture? In data terminology, Change Data Capture, or simply CDC, is a method to track and pick only the data that has changed since the last known point in time. CDC is a feature that was already available in SQL Server for finding the changed records in a […]
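The core idea the excerpt describes, picking only rows changed since a known point in time, can be sketched in a few lines of Python. This is only an illustration of the watermark pattern with made-up data; real CDC in SQL Server reads change tables populated from the transaction log rather than comparing timestamps.

```python
from datetime import datetime

# Hypothetical source rows; in practice these come from the source table.
rows = [
    {"id": 1, "name": "alpha", "modified": datetime(2024, 1, 1)},
    {"id": 2, "name": "beta",  "modified": datetime(2024, 3, 5)},
    {"id": 3, "name": "gamma", "modified": datetime(2024, 6, 9)},
]

def changed_since(rows, watermark):
    """Return only the rows modified after the last known point in time."""
    return [r for r in rows if r["modified"] > watermark]

# Only rows 2 and 3 changed after the last run's watermark.
delta = changed_since(rows, datetime(2024, 2, 1))
```

After each successful load, the watermark is advanced to the run time so the next run picks up only newer changes.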

  • adf-delete-files-from-azure-storage-based-on-column-value-in-excel

    In this article we are going to discuss how to pick and delete only specific files from an ADLS storage container by passing file names taken from an Excel/CSV file column value. File deletion: Recently I came across a requirement for file deletion in ADLS. Azure Data Factory’s Delete activity is enough to complete this […]
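The pattern in the excerpt, reading target file names from a CSV column and deleting only those files, can be sketched in plain Python. A local temp directory stands in for the ADLS container here, and the file and column names are invented for illustration; the post itself does this with ADF activities.

```python
import csv
import os
import tempfile

# A temp dir stands in for the ADLS container; seed it with some files.
workdir = tempfile.mkdtemp()
for name in ["a.txt", "b.txt", "c.txt"]:
    open(os.path.join(workdir, name), "w").close()

# Hypothetical manifest: a CSV whose 'filename' column lists files to delete.
manifest = os.path.join(workdir, "to_delete.csv")
with open(manifest, "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["filename"])
    writer.writeheader()
    writer.writerow({"filename": "a.txt"})
    writer.writerow({"filename": "c.txt"})

# Read the column value and delete only the listed files.
with open(manifest) as f:
    targets = {row["filename"] for row in csv.DictReader(f)}
for name in targets:
    os.remove(os.path.join(workdir, name))

remaining = sorted(p for p in os.listdir(workdir) if p.endswith(".txt"))
```

In ADF, the same flow maps to a Lookup activity over the CSV feeding a ForEach loop around the Delete activity.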

  • monitoring-azure-synapse-analytics-workloads-using-dmvs

    Introduction: In this article we will look at Dynamic Management Views and how we can leverage them to monitor the workloads in Azure Synapse Analytics. We will learn this today with a practical use case and a few examples focusing on Synapse workload monitoring. Dynamic Management Views, or simply DMVs, are nothing […]

  • configure-adf-pipeline-output-to-a-file

    At an enterprise level, every project schedules and runs multiple Azure Data Factory pipelines, but tracking their outcomes in ADF Studio is a cumbersome process. In many companies, for every pipeline activity that fails with some error, engineers must track it down by drilling into each activity until they find the failed one, and […]
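The idea of writing pipeline outcomes to a file, so failures can be found without drilling through the monitoring UI, can be sketched like this. The run records and file name are invented for illustration; the post builds the equivalent with ADF activities.

```python
import json
import os
import tempfile
from datetime import datetime, timezone

# Hypothetical pipeline run outcomes (in ADF these come from activity outputs).
runs = [
    {"pipeline": "load_sales",  "status": "Succeeded", "error": None},
    {"pipeline": "load_orders", "status": "Failed",    "error": "Sink timeout"},
]

# Append each outcome as one JSON line to a log file.
log_path = os.path.join(tempfile.mkdtemp(), "pipeline_runs.jsonl")
with open(log_path, "a") as f:
    for run in runs:
        record = {"logged_at": datetime.now(timezone.utc).isoformat(), **run}
        f.write(json.dumps(record) + "\n")

# Later, failed runs can be pulled straight from the file instead of the UI.
with open(log_path) as f:
    failed = [r for r in map(json.loads, f) if r["status"] == "Failed"]
```

One append-only log per environment gives operators a single place to grep for failures across all pipelines.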

  • azure-synapse-security-static-data-masking

    Data security is a hot topic given the data breaches we hear about every day. Though there are various specialized tools available in the market, multiple questions arise about their accessibility, sharing, and data transfers within the organization. In most organizations there might be a need to refresh (copy) sensitive production data to multiple non-production environments […]

  • azure-synapse-security-dynamic-data-masking

    Dynamic data masking is a feature available in Synapse Analytics to restrict the exposure of sensitive data to end users. We can configure data masking to hide sensitive data in the result sets that are queried by users. Using data masking we can not only restrict but also specify the amount of […]
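The "specify the amount of data exposed" idea can be illustrated with a small Python function, similar in spirit to a partial masking rule in Synapse dynamic data masking. This is only a sketch of the concept; in Synapse the masking is defined on the column and applied by the engine at query time, not in application code.

```python
def mask(value: str, visible: int = 4, pad: str = "X") -> str:
    """Hide all but the last `visible` characters of a sensitive value."""
    if len(value) <= visible:
        return pad * len(value)
    return pad * (len(value) - visible) + value[-visible:]

# e.g. a card number exposed only as its last four digits
masked = mask("4111222233334444")
```

Non-privileged users querying the column would see only the masked form, while the stored data remains unchanged.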

  • azure-synapse-analytics-link-for-sql-step-by-step-approach

    This article provides a step-by-step guide to getting started with Azure Synapse Link for Azure SQL Database. I strongly recommend you go through my previous article, which explains the basics of Synapse Link for SQL, before proceeding with creating it, for better understanding. Configure the source Azure SQL Database: create a linked service to your […]

  • The newly released feature ‘Synapse link for SQL’, enables near real-time analytics into Azure Synapse analytics over operational data from both Azure SQL and SQL Server 2022. It provides seamless integration between the SQL database and Azure Synapse analytics. The rich feature it provides enables users to run analytics, machine learning or BI workloads on […]

  • monitor-azure-synapse-analytics-using-log-analytics

    Log Analytics will monitor the Synapse pipelines and provide us more insight if a job fails. The Azure Synapse integration with Log Analytics is particularly useful in the following scenarios: you want to write complex queries on the rich set of metrics that Azure Synapse publishes to Log Analytics; custom alerts […]

  • lake-database-in-azure-synapse-analytics

    Introduction: Azure Synapse Analytics provides standard database templates for various industries to use to create a DB model per their company's needs. These are ready-made templates with rich metadata for a clear understanding, and they can be implemented at any time in a few steps. Database templates are, in simple terms, business and technical data […]

  • monitor-azure-synapse-analytics-using-azure-monitor

    Introduction: In day-to-day operations we have all faced requirements to back up and restore, or copy, an Azure Data Factory from an existing instance to a new one. In today's demo we will see how we can back up and restore Azure Data Factory using the ARM template export/import option in Azure Data Factory Studio. Steps: I will […]

  • pause-dedicated-sql-pools-with-azure-synapse-pipelines

    Introduction: One of the main objectives of any business using cloud services is to optimize resources and lower ongoing costs. Most organizations don't need access to the data warehouse layer round the clock; they use reporting dashboards to view the information. In such scenarios it is best […]

  • parameterization-using-notebooks-in-azure-synapse-analytics

    Introduction: Parameterization is very useful when you want reusable code that you can keep using for all your future requirements, getting the output simply by changing the parameters. Traditionally, while coding, you declare variables that are static (see image below), but with parameterization you can use dynamic parameters all through […]
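The static-versus-parameterized contrast in the excerpt can be shown in a few lines of Python. The path layout and function name here are made up for illustration; in a Synapse notebook the parameters would typically arrive via a designated parameter cell rather than a function call.

```python
# Static version: the value is hard-coded, so the notebook must be
# edited every time the month or dataset changes.
SOURCE_PATH = "/data/2024/01/sales.csv"

# Parameterized version: the same logic is reusable for any run
# simply by passing different arguments.
def build_source_path(year: int, month: int, dataset: str) -> str:
    return f"/data/{year}/{month:02d}/{dataset}.csv"

path = build_source_path(2024, 1, "sales")
```

The same notebook can then serve every future month or dataset without touching the code itself.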

  • create-synapse-notebook-and-run-python-and-sql-under-spark-pool

    In this article we will look into how we can run both Python and Spark SQL queries in a single notebook under the built-in Apache Spark pools to transform data in a single window. Introduction: In Azure Synapse Analytics, a notebook is where you can write live code, visualize results, and add comment text. […]

  • extract-file-names-and-copy-from-source-path-in-azure-data-factory

    We are going to see a real-time scenario of how to extract the file names from a source path and then use them for any subsequent activity based on the output. This might be useful in cases where we have to extract file names and transform or copy data from CSV, Excel, or flat files from […]

  • cetas-creating-external-table-as-select-in-azure-synapse-analytics

    Introduction: In this post we will discuss how to create an external table and store the data inside your specified Azure storage in parallel using T-SQL statements. What is CETAS? CETAS, or ‘Create External Table As Select’, can be used with both the Dedicated SQL Pool and the Serverless SQL Pool to create an external table […]

  • parameterize-pipelines-and-datasets-in-azure-data-factory-with-demo

    Introduction: In continuation of our previous article, we will look at how we can apply parameterization to datasets and pipelines. We will also implement a pipeline with a simple copy activity to see how and where we can use parameters in Azure Data Factory. Consider a scenario where you want to run numerous pipelines with […]

  • create-external-datasource-in-azure-synapse-analytics

    Today we will check how to create an external data source to access data stored in other resources. If you remember, in one of our previous articles we discussed that there is a Logical Data Warehouse (LDW) which works similarly to a database that you can see in Azure Synapse Analytics. […]

  • distributions-in-azure-synapse-analytics

    In continuation of our previous article on Azure Synapse Analytics, we will deep-dive into the sharding patterns (distributions) used in the Dedicated SQL Pool. In the background, the Dedicated SQL Pool divides a query into 60 smaller queries that run in parallel on your compute nodes. You will define the distribution […]
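The 60-way split the excerpt mentions can be illustrated with a toy hash-distribution function in Python. The hash below is a deterministic stand-in written for illustration only; the engine's actual hash function is internal to Synapse, and the point is simply that each row's distribution column maps it to one of 60 fixed buckets.

```python
DISTRIBUTIONS = 60  # a dedicated SQL pool always shards work 60 ways

def distribution_for(key: str) -> int:
    """Map a distribution-column value to one of the 60 distributions
    (illustrative hash; not the engine's real function)."""
    h = 0
    for ch in key:
        h = (h * 31 + ord(ch)) % 2**32
    return h % DISTRIBUTIONS

bucket = distribution_for("customer-42")
```

Because the mapping is deterministic, all rows sharing a distribution key land in the same bucket, which is what makes joins on that key local to a distribution.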

  • filter-real-time-error-rows-from-csv-to-sql-database-table-in-azure-data-factory-part-two

    **This is a continuation of part one; I suggest you check that first to get a clear understanding.** Once the first condition is completed, let’s check the second, which I named ValidRows as it is going to capture only the non-error values. Compared to the first condition this is very simple, as we […]

  • filter-real-time-error-rows-from-csv-to-sql-database-table-in-azure-data-factory-part-one

    Azure Data Factory is a tool with tremendous capabilities when it comes to ETL operations. It has many features that help users cleanse and transform the data loaded into it. Developers and users face many real-time issues when performing their ETL operations; one such common yet unavoidable scenario […]

  • azure-synapse-analytics-architecture

    Azure Synapse SQL is a technology that resides inside the Synapse workspace. In total there are two pools, which we discussed in detail in one of our articles a few weeks ago: the Dedicated SQL Pool and the Serverless SQL Pool. The built-in Serverless SQL Pool gets created automatically when you create the workspace, and the Dedicated SQL Pool […]

  • integrate-pipelines-with-azure-synapse-analytics

    In line with our previous articles, today we will see how to create, schedule, and monitor a pipeline in Synapse using Synapse Analytics Studio. A pipeline is an ETL workflow in which we execute activities and extract the results. A pipeline can be a single activity or a group of activities to be run. An activity is a task to […]

  • analyze-data-with-spark-pool-in-azure-synapse-analytics-part-2

    This article is a continuation of Part 1, which I posted earlier. I strongly recommend you go through Part 1 before reading this article. The demo we are going to see uses the Apache Spark serverless pool model, where we will be loading a sample Parquet data file into a Spark database (yes, we […]

  • analyze-data-with-spark-pool-in-azure-synapse-analytics-part-1-2

    This is the part-one article of a two-part series with a demo explaining analyzing data with a Spark pool in Azure Synapse Analytics. Since the topic touches Apache Spark heavily, I have decided to write a dedicated article to explain Apache Spark in Azure, hence this part one. Please make sure to read the […]

  • query-csv-file-saved-in-adls-through-sql-query-azure-synapse-analytics

    We are all aware that SQL is commonly used to query structured data, but in Synapse Analytics we can use SQL to query unstructured data saved in files like CSV, Parquet, etc., using the OPENROWSET function, one of the many features of Synapse Analytics. In this week’s article we […]

  • creating-apache-synapse-analytics-workspace

    In continuation of our previous article, we will investigate how to create our first Synapse workspace. I strongly recommend you have a look at my previous article, where we discussed the basics of Azure Synapse Analytics and what can be done with it. To get started with Azure Synapse you must […]

  • dedicate-sql-pools-vs-serverless-sql-pools

    In Azure Synapse Analytics you will frequently come across the term SQL pools. It's good to know the difference between them and how each works. No requirement will be the same as the one before, and end users may need different types of usage for each project. Microsoft has kept that in […]

  • what-does-azure-synapse-analytics-do

    Azure Synapse Analytics is a single solution for all data needs, like ingesting, processing, and serving data. It delivers a unified experience of data integration, data warehousing, and big data analytics in a single workspace environment. Azure Synapse Analytics can be easily integrated with other Azure services like Azure Machine Learning, Cosmos DB, and […]

  • incremental-file-copy-in-azure-data-factory

    Introduction: In this article we will check how we can copy new and changed files based on their last modification date. The steps have been given below with explanations and screenshots. As of this writing, Azure Data Factory supports only the following file formats, but we can be sure that more formats will be added in […]
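The copy-only-what-changed idea from the excerpt can be sketched in Python using file modification times. Two temp folders stand in for the source and sink here, and the file names and watermark are invented for illustration; the post itself implements this with ADF's copy activity and last-modified filters.

```python
import os
import shutil
import tempfile
import time

# Temp dirs stand in for the source container and the sink.
src = tempfile.mkdtemp()
dst = tempfile.mkdtemp()

# Seed the source: one file from an hour ago, one freshly written.
old_file = os.path.join(src, "old.csv")
new_file = os.path.join(src, "new.csv")
open(old_file, "w").close()
open(new_file, "w").close()
past = time.time() - 3600
os.utime(old_file, (past, past))   # pretend old.csv existed before the last run

# Copy only files modified after the last run's watermark.
watermark = time.time() - 60
for name in os.listdir(src):
    path = os.path.join(src, name)
    if os.path.getmtime(path) > watermark:
        shutil.copy2(path, dst)

copied = sorted(os.listdir(dst))
```

Persisting the watermark between runs turns a full copy into an incremental one, which is the whole cost saving.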

  • triggers-in-azure-data-factory

    Introduction: In this blog, we will look into Azure Data Factory triggers, an important feature for scheduling a pipeline to run without manual intervention each time. Apart from the regular advantage of scheduling a pipeline for future runs (which is very common), the Azure Data Factory trigger has a special feature to pick and process data from […]

  • parameterization-in-azure-data-factory-linked-services

    Introduction: Linked services in Azure Data Factory have the option to be parameterized and receive dynamic values at run time. There might be a requirement to connect to different databases on the same logical server, or to different database servers themselves. Traditionally we would create separate linked services for each database or database server, but […]

  • how-to-copy-files-using-azure-data-factory-pipeline

    Introduction: In this article we will look at our first hands-on exercise in Azure Data Factory by carrying out a simple file copy from our local machine to blob storage. The steps have been given below with explanations and screenshots. Create a storage account. After creating the storage account, create a container which will hold the data that we […]

  • how-to-get-started-with-azure-data-factory

    As we all know, data is the new oil, but it is more than that. The data projections and insights generated can make or break a company’s prospects. Every organization faces challenges, in some form, in any or all of the actions below: acquiring/procuring data, storing and archiving the […]

  • real-time-twitter-analysis-with-azure-stream-analytics-and-saving-the-results-in-to-azure-blob-storage

    Introduction A lot of consumer data is being posted on social media every minute and social media analysis has become a critical component in audience analysis, competitive research, and product research. Social media analytics and its tools are helping organizations around the world understand currently trending topics. Trending topics are those subjects and attitudes that […]

  • create-an-azure-event-hub

    Obviously, you should have an active Azure subscription. If you are testing out this feature, you can create a free account with $200 of free credit to explore Azure, plus 12 months of popular free services. Creating a resource group: all resources are deployed and managed from a resource group. A resource group is a logical collection […]

  • send-or-receive-events-from-azure-event-hub-using-python

    This article is a quickstart demo of how one can send or receive events from Azure Event Hubs using a Python script. If you are new to Event Hubs, please check my previous post, which explains the basics, before you continue. We will be using two Python scripts, ‘send.py’ and ‘recv.py’, for sending and receiving test […]

  • azure-stream-analytics

    Azure Stream Analytics is a fully managed PaaS (Platform-as-a-Service) and a real-time streaming service provided by Microsoft. It consists of a complex event processing engine designed to analyze and process vast volumes of real-time data like stock trading, credit card fraud detection, Web clickstream analysis, social media feeds & other applications. For quicker analysis of […]

  • azure-event-hubs-a-primer

    Azure Event Hubs is a highly scalable publish-subscribe PaaS service that can ingest millions of events per second with low latency and stream them into other applications. We can consider Event Hubs the starting point in an event-processing pipeline; it often represents the “front door” for an event pipeline. Event Hubs provides a […]