Transform Your Data Warehouse with Azure Synapse Analytics
On Nov. 4, 2019, Microsoft unveiled Azure Synapse Analytics, the next evolutionary stage of Azure SQL Data Warehouse. On Dec. 3, 2020, this powerful solution finally became generally available. In this blog post, we will take a closer look at what Azure Synapse Analytics is, how it works, and how you can leverage it in your organization.
What is Azure Synapse Analytics?
As advertised, Azure Synapse Analytics is “a limitless analytics service that brings together enterprise data warehousing and Big Data analytics.”
But what does that mean? Azure Synapse Analytics (Synapse) is a petabyte-scale, cloud-based data warehouse much the same as Azure SQL Data Warehouse. The key difference between Synapse and its predecessor is how it seamlessly consolidates multiple technologies, workloads, and roles within a single service.
Synapse is organized around the Data Lake as a pseudo source of record for all organizational data. Synapse brings three flavors of compute to this architecture:
- Large-scale provisioned compute akin to Azure SQL Data Warehouse, which provides horsepower for analytics
- Integrated Apache Spark, which delivers a massively parallel processing (MPP) engine capable of processing Big Data and machine learning workloads at scale
- An on-demand SQL engine for exploration, Extract, Load, Transform (ELT), and fueling Synapse’s powerful embedded Pipeline features
With this solution, Microsoft has realized its vision for a consolidated platform that combines a sleek, unified user interface (UI) with deep integration across the entire Azure ecosystem. Synapse represents how the evolution of modern data has empowered organizations to unleash new insights in a way that was never before possible.
To see Azure Synapse Analytics in action, check out the video below:
Video: Intro to Synapse Studio
This session provides an overview of Synapse Studio, Microsoft’s Modern Data and Analytics Platform. This session will provide listeners with the knowledge they need to make strategic decisions about utilizing Azure Synapse in their organizations. https://play.vidyard.com/GatpbXyJRDUNo7PmCpXrFV.html?
What’s New in Azure Synapse Analytics?
Synapse is loaded with innovative features — so many, in fact, that we couldn’t possibly cover them all in a single article. That said, four components, in particular, deserve closer analysis:
Synapse Studio
Synapse Studio is the service’s UI, which you can open from the Azure Portal by clicking the workspace web URL. It presents a unified design surface organized around functional areas called hubs. You can find primary activities in the Data, Develop, and Integrate hubs, and configuration activities in the Monitor and Manage hubs.
Studio excels by bringing a broad selection of complex technologies together on a single surface.
Let us take a closer look:
- The Data hub is dedicated to navigating and linking various resources, both within the workspace and externally. Directly access and manage your Data Lake from this area, explore Spark tables or access integration datasets.
- The Develop hub is where the various data engineering, data science, and data exploration activities take place; this is where the majority of users will likely spend their time. Some highlights from the Develop hub include embedded IntelliSense editors for code artifacts (Notebooks and SQL scripts), Power BI integration for the manipulation of Power BI reports, and Data Flows. The new Git integration, long awaited during the preview, brings source control and Data Factory-like patterns to the development workflow for easy integration into CI/CD processes.
- The Integrate hub is where you can access Synapse Pipelines functionality. Data Factory users will feel very at home in this environment. It is easy to visually manipulate pipelines, and the selection of activities and logic building blocks enables you to construct complex orchestration tasks, integrate them with Logic Apps, and schedule them to trigger automatically as needed. For a quick start, check out the Gallery, which contains helpful starter pipelines, as well as more sophisticated examples for common tasks such as slowly changing dimensions. Like the Develop hub, the Integrate hub features seamless Git integration.
- Finally, you can access administrative tasks via the Manage hub. Using this hub, you can provision and manipulate pools, as well as configure sundry functionality such as Integration Runtimes. Administrative tasks are complemented by the Monitor hub, where pipeline runs and other activities can be monitored and analyzed.
Synapse SQL (SQL Pools) & SQL On-demand
Synapse SQL is the solution’s industry-standard T-SQL-based analytics engine, designed for high-performance manipulation of structured data. New in Synapse, this engine comes in both the traditional, provisioned flavor and as a new on-demand offering.
Provisioned compute in SQL pools is the next generation of SQL Data Warehouse. This feature has arguably not received as much press as the rest of the solution, possibly because its pedigree as one of the industry’s most robust and reliable Data Warehouse solutions is already well known. Despite this, Synapse has brought a range of improvements to SQL pools that should not be overlooked, starting with its sophisticated workload management capabilities, which enable users to fine-tune resource allocation across different workload groups. There is also the high-performance COPY feature for loading data from external storage accounts. Finally, enhancements such as the PREDICT clause integrate AI and machine learning by enabling native model scoring from within Transact-SQL. The theme of a unified platform continues with specialized Spark Notebooks capabilities, which enable high-speed loads of staged data (PolyBase) and simplified security.
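To make the COPY feature a little more concrete, here is a minimal Python sketch that assembles a COPY INTO statement as text. The table name `dbo.Trips`, the storage URL, and the helper `build_copy_statement` are hypothetical and exist only for illustration. Authentication options are left out, and the option list is an assumption worth checking against the current T-SQL reference before you rely on it.

```python
def build_copy_statement(table, source_url, file_type="CSV",
                         first_row=2, field_terminator=","):
    """Return a T-SQL COPY INTO statement as a string.

    Illustration only: values are interpolated directly, so do not
    pass untrusted input to this helper.
    """
    options = [f"FILE_TYPE = '{file_type}'"]
    # Delimiter and header-skipping options only make sense for text files.
    if file_type.upper() == "CSV":
        options.append(f"FIELDTERMINATOR = '{field_terminator}'")
        options.append(f"FIRSTROW = {first_row}")
    option_block = ",\n    ".join(options)
    return (
        f"COPY INTO {table}\n"
        f"FROM '{source_url}'\n"
        f"WITH (\n    {option_block}\n)"
    )


# Hypothetical table and storage location.
statement = build_copy_statement(
    table="dbo.Trips",
    source_url="https://examplestorage.blob.core.windows.net/landing/trips/*.csv",
)
print(statement)
```

You would run the resulting statement against a provisioned SQL pool from a SQL script in the Develop hub or from any T-SQL client.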
The announcement of SQL on-demand is significant because it addresses what has historically been an inherent tradeoff in the design of enterprise data systems. In complex data ecosystems, demand patterns vary by workload, by user, and in many other ways. This is challenging to manage because you must decide how much provisioned compute you need to run analytics and to handle auxiliary tasks such as data cleansing, data engineering, and data exploration, and where to place those tasks in the architecture. Provision too much, and you could end up overpaying; provision too little, and you could experience unpredictable performance and other quality issues.
On-demand compute addresses spiky, unpredictable workloads like these by being always available, and it provides another toolset within the data architecture. Exploring the Data Lake, whether the data is stored as Parquet, ORC, or CSV (comma-separated values), is as easy as a right-click, without the need for additional work or tooling. SQL On-demand also includes new enhancements for ELT and ETL (extract, transform, load) tasks, such as performance-optimized delimited text parsers; this adds yet another capability to the already-powerful Synapse Pipelines functionality. You can leverage the raw power and familiarity of SQL Server when prototyping queries or conducting other ad hoc tasks, without having to approximate the anticipated load on primary compute. SQL On-demand is billed based on the amount of data processed, and you can control spending with daily, weekly, and monthly limits as necessary.
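As a rough illustration of these two ideas, the Python sketch below builds the kind of exploratory query you might run against a file in the lake, and checks whether a query would push usage past a data-processed limit. The storage URL, the helper names, and all of the usage figures are hypothetical. The sketch only handles Parquet and CSV, and it assumes limits are expressed in terabytes of data processed.

```python
def build_exploration_query(file_url, file_format="PARQUET", top=100):
    """Return a T-SQL query that reads a lake file in place via OPENROWSET."""
    file_format = file_format.upper()
    if file_format not in ("PARQUET", "CSV"):
        raise ValueError("this sketch only handles PARQUET and CSV")
    options = [f"BULK '{file_url}'", f"FORMAT = '{file_format}'"]
    if file_format == "CSV":
        # Assumption: opt in to the newer delimited text parser.
        options.append("PARSER_VERSION = '2.0'")
    option_block = ",\n    ".join(options)
    return (
        f"SELECT TOP {top} *\n"
        f"FROM OPENROWSET(\n    {option_block}\n) AS [result]"
    )


def exceeded_limits(usage_tb, next_query_tb, limits_tb):
    """List the periods whose limit would be exceeded by the next query."""
    return [
        period
        for period, limit in limits_tb.items()
        if usage_tb.get(period, 0.0) + next_query_tb > limit
    ]


# Hypothetical file location.
query = build_exploration_query(
    "https://examplelake.dfs.core.windows.net/raw/trips/2020/*.parquet"
)
print(query)

# Hypothetical limits and usage so far, in TB of data processed.
limits = {"daily": 1.0, "weekly": 5.0, "monthly": 20.0}
usage = {"daily": 0.9, "weekly": 2.0, "monthly": 5.0}
blocked = exceeded_limits(usage, next_query_tb=0.2, limits_tb=limits)
print(blocked)
```

In the service itself, the limits are configured on the workspace rather than enforced by your own code; the check here simply shows the arithmetic behind them.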
Apache Spark Pools
Apache Spark pools round out Azure Synapse Analytics’ list of compute options with a powerful MPP engine designed for in-memory processing of Big Data. MPP systems enable you to leverage compute in parallel and are ideally suited for semi-structured or unstructured workloads typical of internet of things (IoT) and machine learning use cases.
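The core idea of parallel processing, splitting data into partitions, working on each partition independently, and then combining the partial results, can be illustrated without Spark at all. The sketch below is plain Python, not Spark code: threads stand in for cluster executors, and the device readings are made-up sample data.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_by_device(partition):
    """Map step: count readings per device within a single partition."""
    return Counter(reading["device"] for reading in partition)


def parallel_count(partitions, max_workers=4):
    """Process each partition independently, then merge the partial counts."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(count_by_device, partitions))
    total = Counter()
    for partial in partials:
        total.update(partial)  # Reduce step: add the partial counts together.
    return total


# Made-up IoT readings, already split into three partitions.
partitions = [
    [{"device": "sensor-a", "temp": 20.1}, {"device": "sensor-b", "temp": 19.4}],
    [{"device": "sensor-a", "temp": 20.3}],
    [{"device": "sensor-c", "temp": 22.0}, {"device": "sensor-a", "temp": 20.2}],
]
totals = parallel_count(partitions)
print(dict(totals))
```

A real Spark pool applies the same pattern across many machines and keeps intermediate data in memory, which is what makes it suitable for much larger workloads than this toy example.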
Synapse’s implementation is natively available from the Develop hub, where you can directly author Notebooks using a rich editor. Cognitive services and machine learning are natively integrated, too. With a right-click in the Data hub, which is populated by wizards intelligently configured from Linked Services and other configuration artifacts, you can create starter Notebooks that use these services.
It is clear that, with Synapse, Microsoft has put a great deal of thought into how to increase productivity, regardless of whether the end user is a data scientist, a data engineer, or a business user. Synapse facilitates basic data exploration through integrated charting and aggregation. IntelliSense is threaded through all editors, and you can use multiple languages within the same Notebook, including Python (PySpark), C#, Scala, and Spark SQL.
Synapse Pipelines
Synapse Pipelines is analogous to Azure Data Factory, Azure’s well-known hybrid data integration service. Pipelines logically resides at the heart of the Synapse ecosystem and provides a data orchestration and movement framework that ingests data from many different sources and lands it in the appropriate data store, ready for compute activities. You can access Pipelines via its own hub in the Synapse UI.
As one might expect, you can move data into the Data Lake from almost any source system using more than 90 service connectors, from Oracle to REST APIs, or use Pipelines to orchestrate ingestion into the Data Warehouse. You can also integrate Spark Notebooks with any pipeline, rounding out Pipelines’ data engineering options.
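Orchestration comes down to running activities in an order that respects their dependencies. The following Python sketch models that idea with the standard library's `graphlib` module (Python 3.9 or later). It is not the Pipelines API, and the activity names are hypothetical; it only shows how "ingest, then transform, then load" can be expressed as a dependency graph.

```python
from graphlib import TopologicalSorter  # Available in Python 3.9+

# Each activity maps to the set of activities that must finish first.
# All names are hypothetical.
dependencies = {
    "copy_oracle_to_lake": set(),
    "copy_rest_api_to_lake": set(),
    "run_spark_notebook": {"copy_oracle_to_lake", "copy_rest_api_to_lake"},
    "load_data_warehouse": {"run_spark_notebook"},
}


def execution_order(deps):
    """Return one valid run order in which every dependency comes first."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

The two copy activities have no dependency on each other, so an orchestrator is free to run them in parallel before the notebook starts.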
For those familiar with Azure Data Factory, Data Flows are notably absent from Pipelines. Rest assured that this data enrichment functionality is available in Synapse and can be found in the Develop hub.
Azure Synapse Analytics Feature Spotlight: Synapse Link
As we have mentioned, Azure Synapse Analytics has an expansive list of features and capabilities, and we expect many more exciting innovations in the coming months. One key feature that is currently available is Synapse Link.
One challenge commonly associated with data warehousing solutions is the disconnect between hot path and cold path data. It takes an incredible amount of work to engineer large quantities of data into a semantic model appropriate for analytics, and this process imposes latency on the design. When the data in question flows into a system at scale, as is the case with IoT workloads, this architecture often struggles to keep up. Synapse introduces a new feature called Synapse Link that leverages hybrid transactional/analytical processing (HTAP) to surface operational data in an analytical store without affecting the performance of the source data system. In effect, Synapse Link provides a near-real-time window into data as it flows in, enabling timely analytics akin to change data capture technologies.
Synapse Link is currently available for Cosmos DB only, though we will undoubtedly see it added to other stores in the future.
How Can You Leverage Azure Synapse Analytics in Your Organization?
Azure Synapse Analytics is, in a sense, both a reinvigoration of key products and services within the Azure Data Services ecosystem and a reinvention. Any organization looking to modernize its data approach would benefit from implementing Synapse, whether it is new to the Modern Data Estate or already far along in its journey.
For mid-stage organizations that have already invested in Data Lakes, have built SQL Data Warehouses, or that have sizeable Power BI assets, you can radically democratize these investments thanks to Synapse’s extensive and tight integration with the existing Azure ecosystem. In general, the migration path is straightforward, and Synapse can leverage these investments almost out of the box. Organizations further along the data maturity curve can benefit from Synapse, too, because it integrates with Azure’s most compelling capabilities, including machine learning and cognitive services. You can deploy Synapse as a natural evolution to these investments, as a way of rationalizing your architecture down to fewer components, and as a way of simplifying access.
Synapse is a major asset to organizations that are relative newcomers to the cloud because it can simplify the cloud adoption process and reduce much of the complexity associated with the traditional approach of multiple siloed applications. By tying different data sources and technologies together, Synapse not only eliminates these impediments but also inspires an agile approach to experimentation.
If you’re ready to start realizing the benefits of Azure Synapse Analytics, Hitachi Solutions can help. Our close relationship with Microsoft gives us a deep understanding of these tools, making us ideally suited to provide long-term visions and actionable solutions for the entire Azure stack. We’ve worked with countless clients on similar Modern Data Estate projects. From introducing new users to Azure tools to spearheading the most complex Azure migrations, we’ve seen and done it all. As an added bonus, our industry IP and accelerators can help you execute your vision faster and with less risk.
So, what are you waiting for? Contact Hitachi Solutions to get started with Azure Synapse Analytics today.