VinsData Blog

Latest Blog Posts

  • heavy-etl-on-a-budget-using-shortcuts-to-keep-data-accessible-after-pausing-your-high-power-capacity

    The real problem: compute costs do not sleep. Fabric’s pricing model is capacity-based, which means compute is your biggest lever for cost control. The typical enterprise pattern looks like this: Yet many teams keep the engineering capacity running 24/7, and so the obvious question teams ask is: if we pause a capacity, does access to the […]

  • announcement

    Update: I’m moving back! Following some unresolvable hosting issues at vinsdata.in, I have decided to return to this blog site for now. Stay tuned for new posts and data platform insights right here. This blog site has been moved to a new address. All the new blogs & articles going forward will be published directly in the […]

  • mirrored-sql-databases-in-microsoft-fabric

    Microsoft Fabric introduces a seamless way to bring your existing Azure SQL Database or SQL Server data into the unified Fabric platform through Mirrored SQL Databases. This capability enables real-time analytics and integration with Fabric services without the need for complex ETL pipelines or data duplication. What is a Mirrored SQL Database? A mirrored database […]

  • azure-private-link-for-microsoft-fabric

    Azure Private Link enhances Azure’s connectivity by allowing secure, private access to Azure-hosted services. It enables organizations to create private connections between their on-prem environments and Azure services, confining interactions to the Azure network and ensuring data remains isolated from the public internet. Private Link’s integration […]

  • overview-on-implementing-a-data-mesh-architecture-in-microsoft-fabric

    In an era where data is pivotal, traditional centralized data systems often fall short of meeting the needs for scalability, agility, and decentralization. The concept of Data Mesh emerges as a contemporary solution, advocating for decentralized data ownership and treating data as a product. This article will delve into the principles of Data Mesh and illustrate how […]

  • microsoft-fabric-create-and-load-your-data-into-the-lakehouse-table

    What is a Lakehouse? A Lakehouse is a data management architecture that combines elements of both data lakes and data warehouses. Before we jump into the definition of a Lakehouse, it is important to understand the difference between a data lake and a data warehouse individually. A Lakehouse allows you to use both the data lake and the data warehouse together […]

  • shortcuts-feature-in-microsoft-fabric

    Microsoft Fabric is a powerful and flexible data platform that enables organizations to process, analyze, and gain insights from their data at scale. One of the key features that sets Fabric apart is its ability to perform in-place IO on files using shortcuts, without the need for physical data movement or duplication. In this article, […]

  • deep-dive-on-shortcuts-feature-in-microsoft-fabric
  • mirroring-in-microsoft-fabric-enhancing-data-reliability-and-availability

    In the realm of cloud computing and data management, Microsoft Fabric stands out as a versatile platform that provides a comprehensive suite of tools for data integration, storage, analysis, and management. One of the critical features in Microsoft Fabric is the “mirroring” capability, designed to enhance data reliability and availability, ensuring that businesses can maintain […]

  • unleash-the-power-of-data-activator-feature-in-microsoft-fabric

    Microsoft Fabric offers a powerful feature called Data Activator that allows users to seamlessly integrate and activate their data for enhanced insights and decision-making. This feature enables users to connect various data sources, such as databases, APIs, and cloud services, to Fabric’s analytics platform. With Data Activator, users can easily transform raw data into actionable […]

  • microsoft-fabric-terminologies

    Following are the basic terminologies used inside the Microsoft Fabric ecosystem. These have been referenced from the official Fabric documentation to serve as a repo for all our future articles on Fabric. Generic Terms: Synapse Data Engineering, Data Factory, Synapse Data Science, Synapse Data Warehousing, Synapse Real-Time Analytics, OneLake. Shortcut: Shortcuts are embedded references within OneLake […]

  • powerbi-vs-cleanlab-studio-the-best-tool-for-your-data-cleansing-needs

    Dataset cleansing is an essential step in data analysis: it ensures your dataset’s accuracy and consistency and helps remove inconsistencies and errors. Using a dataset without proper data cleansing will result in inaccurate values and misleading insights for an organization’s data-driven decisions. In this blog post, we will see how to get […]

  • why-should-you-migrate-from-azure-synapse-analytics-to-microsoft-fabric

    Microsoft Fabric is a cloud-based data platform that provides a range of services for data engineering, data science, and business intelligence. It is an extension of Azure Synapse Analytics that integrates all analytics workloads from the data engineer to the business knowledge worker. Fabric brings together Power BI, Data Factory, and the Data Lake, on […]

  • refreshing-the-built-in-roles-available-in-the-azure-synapse-analytics

    Azure Synapse Analytics has many built-in roles that will help to manage access to Synapse resources. These roles allow you to control what users and applications can do within a Synapse workspace. Synapse RBAC Roles can be assigned by Synapse Administrators. A workspace-level Synapse Administrator can grant access to any workspace. A lower-level Synapse administrator […]

  • azure-synapse-workload-management

    Managing varied workloads with proper resource allocation in a multi-user concurrent environment is the biggest challenge a team might face when retrieving data from an Azure Synapse Analytics dedicated SQL pool database. Workload management in Azure Synapse Analytics gives you control over the workloads that are utilizing your system resources. Setting up the best […]

  • dwhdata-warehouse-units-in-synapse-dedicated-pool

    Basically, there are two types of pools in Azure Synapse Analytics: Serverless SQL Pool and Dedicated SQL Pool. In the serverless model, as you might be aware, the costing is based on a pay-per-usage model and is calculated per TB of processing consumed by the queries that are run. Whereas the costing of Dedicated SQL Pools is […]

  • implementing-change-data-capture-in-azure-data-factory

    Change Data Capture (CDC): For any ETL requirement that involves a huge amount of data, most of the problem is solved when you eliminate repeated or redundant processing in your data storage mechanism. Basically, you should not repeat the work of copying or moving data that you already have in your destination datastore. Hence […]

  • change-data-capture-in-azure-synapse-analytics

    What is Change Data Capture? In data terminology, Change Data Capture, or simply CDC, is a method to track and pick only the data that has changed since the last known point in time. CDC is a feature that was already available in SQL Server for finding the changed records in a […]
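The core idea the excerpt describes, picking only rows changed since a known point in time, can be sketched in a few lines of Python. This is only an illustration of the watermark pattern with made-up data; real CDC in SQL Server reads change tables populated from the transaction log rather than comparing timestamps.

```python
from datetime import datetime

# Hypothetical source rows; in practice these come from the source table.
rows = [
    {"id": 1, "name": "alpha", "modified": datetime(2024, 1, 1)},
    {"id": 2, "name": "beta",  "modified": datetime(2024, 3, 5)},
    {"id": 3, "name": "gamma", "modified": datetime(2024, 6, 9)},
]

def changed_since(rows, watermark):
    """Return only the rows modified after the last known point in time."""
    return [r for r in rows if r["modified"] > watermark]

# Only rows 2 and 3 changed after the last run's watermark.
delta = changed_since(rows, datetime(2024, 2, 1))
```

After each successful load, the watermark is advanced to the run time so the next run picks up only newer changes.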

  • adf-delete-files-from-azure-storage-based-on-column-value-in-excel

    In this article we are going to discuss how to pick and delete only specific files from an ADLS storage container by passing file names taken from an Excel/CSV file column value. File deletion: Recently I came across a requirement for file deletion in ADLS. Azure Data Factory’s Delete activity is enough to complete this […]
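The pattern in the excerpt, reading target file names from a CSV column and deleting only those files, can be sketched in plain Python. A local temp directory stands in for the ADLS container here, and the file and column names are invented for illustration; the post itself does this with ADF activities.

```python
import csv
import os
import tempfile

# A temp dir stands in for the ADLS container; seed it with some files.
workdir = tempfile.mkdtemp()
for name in ["a.txt", "b.txt", "c.txt"]:
    open(os.path.join(workdir, name), "w").close()

# Hypothetical manifest: a CSV whose 'filename' column lists files to delete.
manifest = os.path.join(workdir, "to_delete.csv")
with open(manifest, "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["filename"])
    writer.writeheader()
    writer.writerow({"filename": "a.txt"})
    writer.writerow({"filename": "c.txt"})

# Read the column value and delete only the listed files.
with open(manifest) as f:
    targets = {row["filename"] for row in csv.DictReader(f)}
for name in targets:
    os.remove(os.path.join(workdir, name))

remaining = sorted(p for p in os.listdir(workdir) if p.endswith(".txt"))
```

In ADF, the same flow maps to a Lookup activity over the CSV feeding a ForEach loop around the Delete activity.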

  • monitoring-azure-synapse-analytics-workloads-using-dmvs

    Introduction: In this article we will look at Dynamic Management Views and how we can leverage them to monitor the workloads in Azure Synapse Analytics. We will learn this today with a practical use case and a few examples focusing on Synapse workload monitoring. Dynamic Management Views, or simply DMVs, are nothing […]

  • configure-adf-pipeline-output-to-a-file

    At an enterprise level, every project schedules and runs multiple Azure Data Factory pipelines, but tracking their outcomes in ADF Studio is a cumbersome process. In many companies, for every pipeline activity that fails with some error, engineers must track it down by drilling into each activity until they find the failed one, and […]
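The idea of writing pipeline outcomes to a file, so failures can be found without drilling through the monitoring UI, can be sketched like this. The run records and file name are invented for illustration; the post builds the equivalent with ADF activities.

```python
import json
import os
import tempfile
from datetime import datetime, timezone

# Hypothetical pipeline run outcomes (in ADF these come from activity outputs).
runs = [
    {"pipeline": "load_sales",  "status": "Succeeded", "error": None},
    {"pipeline": "load_orders", "status": "Failed",    "error": "Sink timeout"},
]

# Append each outcome as one JSON line to a log file.
log_path = os.path.join(tempfile.mkdtemp(), "pipeline_runs.jsonl")
with open(log_path, "a") as f:
    for run in runs:
        record = {"logged_at": datetime.now(timezone.utc).isoformat(), **run}
        f.write(json.dumps(record) + "\n")

# Later, failed runs can be pulled straight from the file instead of the UI.
with open(log_path) as f:
    failed = [r for r in map(json.loads, f) if r["status"] == "Failed"]
```

One append-only log per environment gives operators a single place to grep for failures across all pipelines.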

  • azure-synapse-security-static-data-masking

    Data security is a hot topic given the data breaches we hear about every day. Though there are various specialized tools available in the market, multiple questions arise about their accessibility, sharing, and data transfers within the organization. In most organizations there might be a need to refresh (copy) sensitive production data to multiple non-production environments […]

  • azure-synapse-security-dynamic-data-masking

    Dynamic data masking is a feature available in Synapse Analytics to restrict the exposure of sensitive data to end users. We can configure data masking to hide sensitive data in the result sets that are queried by users. Using data masking we can not only restrict but also specify the amount of […]
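The "specify the amount of data exposed" idea can be illustrated with a small Python function, similar in spirit to a partial masking rule in Synapse dynamic data masking. This is only a sketch of the concept; in Synapse the masking is defined on the column and applied by the engine at query time, not in application code.

```python
def mask(value: str, visible: int = 4, pad: str = "X") -> str:
    """Hide all but the last `visible` characters of a sensitive value."""
    if len(value) <= visible:
        return pad * len(value)
    return pad * (len(value) - visible) + value[-visible:]

# e.g. a card number exposed only as its last four digits
masked = mask("4111222233334444")
```

Non-privileged users querying the column would see only the masked form, while the stored data remains unchanged.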

  • azure-synapse-analytics-link-for-sql-step-by-step-approach

    This article provides a step-by-step guide to getting started with Azure Synapse Link for Azure SQL Database. I strongly recommend you go through my previous article, which explains the basics of Synapse Link for SQL, before proceeding with creating it, for better understanding. Configure the source Azure SQL Database: create a linked service to your […]

  • The newly released feature ‘Synapse link for SQL’, enables near real-time analytics into Azure Synapse analytics over operational data from both Azure SQL and SQL Server 2022. It provides seamless integration between the SQL database and Azure Synapse analytics. The rich feature it provides enables users to run analytics, machine learning or BI workloads on […]

  • monitor-azure-synapse-analytics-using-log-analytics

    Log Analytics will monitor the Synapse pipelines and provide us more insight if a job fails. The Azure Synapse integration with Log Analytics is particularly useful in the following scenarios: you want to write complex queries on the rich set of metrics that Azure Synapse publishes to Log Analytics; custom alerts […]

  • lake-database-in-azure-synapse-analytics

    Introduction: Azure Synapse Analytics provides standard database templates for various industries to use to create a DB model per their company's needs. These are ready-made templates with rich metadata for a clear understanding, and they can be implemented at any time in a few steps. Database templates are, in simple terms, business and technical data […]

  • monitor-azure-synapse-analytics-using-azure-monitor

    Introduction: In day-to-day operations we have all faced requirements to back up and restore, or copy, an Azure Data Factory from an existing instance to a new one. In today's demo we will see how we can back up and restore Azure Data Factory using the ARM template export/import option in Azure Data Factory Studio. Steps: I will […]

  • pause-dedicated-sql-pools-with-azure-synapse-pipelines

    Introduction: One of the main objectives of any business using cloud services is to optimize resources and lower ongoing costs. Most organizations don't need access to the data warehouse layer round the clock; they use reporting dashboards to view the information. In such scenarios it is best […]

  • parameterization-using-notebooks-in-azure-synapse-analytics

    Introduction: Parameterization is very useful when you want reusable code that you can keep using for all your future requirements, getting the output simply by changing the parameters. Traditionally, while coding, you declare variables that are static (see image below), but with parameterization you can use dynamic parameters all through […]
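The static-versus-parameterized contrast in the excerpt can be shown in a few lines of Python. The path layout and function name here are made up for illustration; in a Synapse notebook the parameters would typically arrive via a designated parameter cell rather than a function call.

```python
# Static version: the value is hard-coded, so the notebook must be
# edited every time the month or dataset changes.
SOURCE_PATH = "/data/2024/01/sales.csv"

# Parameterized version: the same logic is reusable for any run
# simply by passing different arguments.
def build_source_path(year: int, month: int, dataset: str) -> str:
    return f"/data/{year}/{month:02d}/{dataset}.csv"

path = build_source_path(2024, 1, "sales")
```

The same notebook can then serve every future month or dataset without touching the code itself.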

  • create-synapse-notebook-and-run-python-and-sql-under-spark-pool

    In this article we will look into how we can run both Python and Spark SQL queries in a single notebook under the built-in Apache Spark pools to transform data in a single window. Introduction: In Azure Synapse Analytics, a notebook is where you can write live code, visualize results, and add comment text. […]

  • extract-file-names-and-copy-from-source-path-in-azure-data-factory

    We are going to see a real-time scenario of how to extract the file names from a source path and then use them for any subsequent activity based on the output. This might be useful in cases where we have to extract file names and transform or copy data from CSV, Excel, or flat files from […]

  • cetas-creating-external-table-as-select-in-azure-synapse-analytics

    Introduction: In this post we will discuss how to create an external table and store the data inside your specified Azure storage in parallel using T-SQL statements. What is CETAS? CETAS, or ‘Create External Table As Select’, can be used with both the Dedicated SQL Pool and the Serverless SQL Pool to create an external table […]

  • parameterize-pipelines-and-datasets-in-azure-data-factory-with-demo

    Introduction: In continuation of our previous article, we will look at how we can apply parameterization to datasets and pipelines. We will also implement a pipeline with a simple copy activity to see how and where we can use parameters in Azure Data Factory. Consider a scenario where you want to run numerous pipelines with […]

  • create-external-datasource-in-azure-synapse-analytics

    Today we will check how to create an external data source to access data stored in other resources. If you remember, in one of our previous articles we discussed that there is a Logical Data Warehouse (LDW) which works similarly to a database that you can see in Azure Synapse Analytics. […]

  • distributions-in-azure-synapse-analytics

    In continuation of our previous article on Azure Synapse Analytics, we will deep-dive into the sharding patterns (distributions) used in the Dedicated SQL Pool. In the background, the Dedicated SQL Pool divides a query into 60 smaller queries that run in parallel on your compute nodes. You will define the distribution […]
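The 60-way split the excerpt mentions can be illustrated with a toy hash-distribution function in Python. The hash below is a deterministic stand-in written for illustration only; the engine's actual hash function is internal to Synapse, and the point is simply that each row's distribution column maps it to one of 60 fixed buckets.

```python
DISTRIBUTIONS = 60  # a dedicated SQL pool always shards work 60 ways

def distribution_for(key: str) -> int:
    """Map a distribution-column value to one of the 60 distributions
    (illustrative hash; not the engine's real function)."""
    h = 0
    for ch in key:
        h = (h * 31 + ord(ch)) % 2**32
    return h % DISTRIBUTIONS

bucket = distribution_for("customer-42")
```

Because the mapping is deterministic, all rows sharing a distribution key land in the same bucket, which is what makes joins on that key local to a distribution.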

  • filter-real-time-error-rows-from-csv-to-sql-database-table-in-azure-data-factory-part-two

    **This is a continuation of part one; I suggest you check that first to get a clear understanding.** Once the first condition is completed, let’s check the second, which I named ValidRows as it is going to capture only the non-error values. Compared to the first condition this is very simple, as we […]

  • filter-real-time-error-rows-from-csv-to-sql-database-table-in-azure-data-factory-part-one

    Azure Data Factory is a tool with tremendous capabilities when it comes to ETL operations. It has many features that help users cleanse and transform the data loaded into it. Developers and users face many real-time issues when performing their ETL operations; one such common yet unavoidable scenario […]

  • azure-synapse-analytics-architecture

    Azure Synapse SQL is a technology that resides inside the Synapse workspace. In total there are two pools, which we discussed in detail in one of our articles a few weeks ago: the Dedicated SQL Pool and the Serverless SQL Pool. The built-in Serverless SQL Pool gets created automatically when you create the workspace, and the Dedicated SQL Pool […]

  • integrate-pipelines-with-azure-synapse-analytics

    In line with our previous articles, today we will see how to create, schedule, and monitor a pipeline in Synapse using Synapse Analytics Studio. A pipeline is an ETL workflow in which we execute activities and extract the results. A pipeline can be a single activity or a group of activities to be run. An activity is a task to […]

  • analyze-data-with-spark-pool-in-azure-synapse-analytics-part-2

    This article is a continuation of Part 1, which I posted earlier. I strongly recommend you go through Part 1 before reading this article. The demo we are going to see uses the Apache Spark serverless pool model, where we will be loading a sample Parquet data file into a Spark database (yes, we […]

  • analyze-data-with-spark-pool-in-azure-synapse-analytics-part-1-2

    This is the part-one article of a two-part series with a demo explaining analyzing data with a Spark pool in Azure Synapse Analytics. Since the topic touches Apache Spark heavily, I have decided to write a dedicated article to explain Apache Spark in Azure, hence this part one. Please make sure to read the […]

  • query-csv-file-saved-in-adls-through-sql-query-azure-synapse-analytics

    We are all aware that SQL is commonly used to query structured data, but in Synapse Analytics we can use SQL to query unstructured data saved in files like CSV, Parquet, etc., using the OPENROWSET function, one of the many features of Synapse Analytics. In this week’s article we […]

  • creating-apache-synapse-analytics-workspace

    In continuation of our previous article, we will investigate how to create our first Synapse workspace. I strongly recommend you have a look at my previous article, where we discussed the basics of Azure Synapse Analytics and what can be done with it. To get started with Azure Synapse you must […]

  • dedicate-sql-pools-vs-serverless-sql-pools

    In Azure Synapse Analytics you will frequently come across the term SQL pools. It's good to know the difference between them and how each works. No requirement will be the same as the one before, and end users may need different types of usage for each project. Microsoft has kept that in […]

  • what-does-azure-synapse-analytics-do

    Azure Synapse Analytics is a single solution for all data needs, like ingesting, processing, and serving data. It delivers a unified experience of data integration, data warehousing, and big data analytics in a single workspace environment. Azure Synapse Analytics can be easily integrated with other Azure services like Azure Machine Learning, Cosmos DB, and […]

  • incremental-file-copy-in-azure-data-factory

    Introduction: In this article we will check how we can copy new and changed files based on their last modification date. The steps have been given below with explanations and screenshots. As of this writing, Azure Data Factory supports only the following file formats, but we can be sure that more formats will be added in […]
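The copy-only-what-changed idea from the excerpt can be sketched in Python using file modification times. Two temp folders stand in for the source and sink here, and the file names and watermark are invented for illustration; the post itself implements this with ADF's copy activity and last-modified filters.

```python
import os
import shutil
import tempfile
import time

# Temp dirs stand in for the source container and the sink.
src = tempfile.mkdtemp()
dst = tempfile.mkdtemp()

# Seed the source: one file from an hour ago, one freshly written.
old_file = os.path.join(src, "old.csv")
new_file = os.path.join(src, "new.csv")
open(old_file, "w").close()
open(new_file, "w").close()
past = time.time() - 3600
os.utime(old_file, (past, past))   # pretend old.csv existed before the last run

# Copy only files modified after the last run's watermark.
watermark = time.time() - 60
for name in os.listdir(src):
    path = os.path.join(src, name)
    if os.path.getmtime(path) > watermark:
        shutil.copy2(path, dst)

copied = sorted(os.listdir(dst))
```

Persisting the watermark between runs turns a full copy into an incremental one, which is the whole cost saving.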

  • triggers-in-azure-data-factory

    Introduction: In this blog, we will look into Azure Data Factory triggers, an important feature for scheduling a pipeline to run without manual intervention each time. Apart from the regular advantage of scheduling a pipeline for future runs (which is very common), the Azure Data Factory trigger has a special feature to pick and process data from […]

  • parameterization-in-azure-data-factory-linked-services

    Introduction: Linked services in Azure Data Factory have the option to be parameterized and receive dynamic values at run time. There might be a requirement to connect to different databases on the same logical server, or to different database servers themselves. Traditionally we would create separate linked services for each database or database server, but […]

  • how-to-copy-files-using-azure-data-factory-pipeline

    Introduction: In this article we will look at our first hands-on exercise in Azure Data Factory by carrying out a simple file copy from our local machine to blob storage. The steps have been given below with explanations and screenshots. Create a storage account. After creating the storage account, create a container which will hold the data that we […]

  • how-to-get-started-with-azure-data-factory

    As we all know, data is the new oil, but it is more than that. The data projections and insights generated can make or break a company’s prospects. Every organization faces challenges, in some form, in any or all of the actions below: acquiring/procuring data, storing and archiving the […]

  • real-time-twitter-analysis-with-azure-stream-analytics-and-saving-the-results-in-to-azure-blob-storage

    Introduction A lot of consumer data is being posted on social media every minute and social media analysis has become a critical component in audience analysis, competitive research, and product research. Social media analytics and its tools are helping organizations around the world understand currently trending topics. Trending topics are those subjects and attitudes that […]

  • create-an-azure-event-hub

    Obviously, you should have an active Azure subscription. If you are testing out this feature, you can create a free account with $200 of free credit to explore Azure, plus 12 months of popular free services. Creating a resource group: all resources are deployed and managed from a resource group. A resource group is a logical collection […]

  • send-or-receive-events-from-azure-event-hub-using-python

    This article is a quickstart demo of how one can send or receive events from Azure Event Hubs using a Python script. If you are new to Event Hubs, please check my previous post, which explains the basics, before you continue. We will be using two Python scripts, ‘send.py’ and ‘recv.py’, for sending and receiving test […]

  • azure-stream-analytics

    Azure Stream Analytics is a fully managed PaaS (Platform-as-a-Service) and a real-time streaming service provided by Microsoft. It consists of a complex event processing engine designed to analyze and process vast volumes of real-time data like stock trading, credit card fraud detection, Web clickstream analysis, social media feeds & other applications. For quicker analysis of […]

  • azure-event-hubs-a-primer

    Azure Event Hubs is a highly scalable publish-subscribe PaaS service that can ingest millions of events per second with low latency and stream them into other applications. We can consider Event Hubs the starting point in an event-processing pipeline; it often represents the “front door” for an event pipeline. Event Hubs provides a […]