Things You Need to Know About Using External Tables in Azure SQL

This post was jointly authored by Josh Reaves and Masood Pehlvi

What happens when the source table definition is changed?

Introduction

The elastic database query feature in Azure SQL allows you to run t-SQL statements that incorporate tables from other Azure SQL databases, meaning that you are able to run queries that span multiple databases. The elastic query feature allows you to perform cross-database queries to access remote tables and to connect BI tools (Excel, Power BI) to query across those multiple databases. You can also pull data from other databases to prepare (integrate) the data for analytics purposes. Using this feature, you can scale out queries to large data tiers in SQL Database and visualize the results in reports.

In addition, you can use the concept of external tables to:

  • query Hadoop or Azure blob storage data with t-SQL statements
  • import and store data from Hadoop or Azure blob storage into SQL Server database
  • for integration into BYOD staging environment for data from Dynamics 365 Finance & Operations or for data from Dynamics 365 Customer Engagement (D365 CE) via the Data Export Service (DES)
  • import and store data from Azure Data Lake Store into Azure SQL Data Warehouse

In order to do any of this, you first have to define the other database as an External Data Source, and using that source, you define the DDL for the tables that you will use in your queries as External Tables. The DDL must be known and once defined in your database, that definition is static and unchanged even when the source definition changes. But what happens to your queries when the source table definition is changed?

Let’s run some tests

In this blog, we present a series of tests where we have already defined external tables in one Azure SQL database, with the external data source being another, different Azure SQL database. We then modify the source table’s structure in the other database without modifying the external table definition in our database, and we run 2 types of SELECT statements to observe what happens: “SELECT *”, and “SELECT <col1>, <col2>, …”. We ran these SELECT statements in the following scenarios:

  • Adding a new column in the Source table without updating the External table DDL
  • Deleting a column in the Source table without updating the External table DDL
  • Changing a column data type in the Source table without updating the External table DDL

Adding a new column in the Source table without updating the External table DDL

  • SELECT * results:SELECT * FROM [SCHEMA].[EXTERNAL_TABLE_LINK];- Query results successfully returned but the new field is not included in the result-set. This is expected since the External table does not define the new column and hence, did not include that column in the SELECT statement.
  • SELECT <col1>, <col2>, … results:SELECT <col1>, <col2> FROM [SCHEMA].[EXTERNAL_TABLE_LINK];- Query results successfully returned but the new field is not included in the result-set. This is expected since we did not include that column in the SELECT statement.

Conclusions

  • Running SELECT * has no errors, but new column is not part of the result-set. This is expected and confirms that that External table DDL is used to formulate the SELECT statement.
  • Running SELECT <col1>, <col2> has no errors if new column is not included in the SELECT statement.
  • Running SELECT <col1>, <col2> does have errors if new column is included in the SELECT statement. This is expected since the External table DDL is missing the reference.

In both cases, the result-sets returned were successful (as long as we did not explicitly reference the new column), meaning the new column added to the Source table does not (necessarily) affect the SELECT statement of the External table.

Deleting a column in the Source table without updating the External table DDL

  • SELECT * results:SELECT * FROM [SCHEMA].[EXTERNAL_TABLE_LINK];- Msg 46823, Level 16, State 1, Line 4- Error retrieving data from <AZURE_SERVER_NAME>.database.windows.net.<AZURE_SOURCE_DATABASE>. ‘Invalid column name ‘COLUMN_NAME’.
  • SELECT <col1>, <col2>, … results:SELECT <col1>, <col2> FROM [SCHEMA].[EXTERNAL_TABLE_LINK];If the deleted column is included in the SELECT:- Msg 46823, Level 16, State 1, Line 4- Error retrieving data from <AZURE_SERVER_NAME>.database.windows.net.<AZURE_SOURCE_DATABASE>. ‘Invalid column name ‘COLUMN_NAME’.If the deleted column is NOT included in the SELECT:- Query results successfully returned but the new field is not included in the result-set. This is expected since we did not include that column in the SELECT statement.

Conclusions

  • Running SELECT * has errors, because the deleted column is part of the External table DDL and is used to formulate the SELECT statement.
  • Running SELECT <col1>, <col2> has no errors if deleted column is not included in the SELECT statement.
  • Running SELECT <col1>, <col2> does have errors if deleted column is included in the SELECT statement. This is expected since the External table DDL references a column that no longer exists in the source.

The SELECT was successful in the case where as we did not explicitly reference the deleted column, meaning the deleted column (from the Source table) does affect the SELECT statement of the External table.

Changing a column data type in the Source table without updating the External table DDL

  • SELECT * results:SELECT * FROM [SCHEMA].[EXTERNAL_TABLE_LINK];- Msg 46825, Level 16, State 1, Line 12- The data type of the column ‘COLUMN_NAME’ in the external table is different than the column’s data type in the underlying standalone or shared table present on the external source.
  • SELECT , … results:SELECT , FROM [SCHEMA].[EXTERNAL_TABLE_LINK];- Msg 46825, Level 16, State 1, Line 12- The data type of the column ‘COLUMN_NAME’ in the external table is different than the column’s data type in the underlying standalone or sharded table present on the external source.
  • SELECT <col1>, <col2>, … results:

 Conclusions

  • Running SELECT * has errors, because the changed column is part of the External table DDL and is used to formulate the SELECT statement.
  • Running SELECT <col1>, <col2> has no errors if changed column is not included in the SELECT statement.
  • Running SELECT <col1>, <col2> does have errors if changed column is included in the SELECT statement. This is expected since the External table DDL for the column is incompatible with the source.

The SELECT was successful in the case where as we did not explicitly reference the changed column, meaning the changed column (from the Source table) does affect the SELECT statement of the External table.  

NOTE: We did not test changing a column name, but we feel confident the behavior would be the same as the tests above.