識別狀況不良的擴充功能
在 Azure 入口網站中使用內建擴充功能健康情況儀錶板
您可以使用 Azure 入口網站中的
內建延伸模組健康情況儀錶板
,顯示所有已部署的 SQL Server Azure 擴充功能的健康情況。
從 sql-server-samples GitHub 儲存庫建立您自己的自訂儀表板:
支援 Arc 的 SQL Server Health.json
。
使用 Azure Resource Graph 查詢狀況不良的擴充功能
使用 Azure Resource Graph 來識別已啟用 Azure Arc 之伺服器上 SQL Server 的 Azure 擴充功能狀態。
如果您還不熟悉,請參閱 Azure Resource Graph:
什麼是 Azure Resource Graph
快速入門:使用 Azure 入口網站執行 Resource Graph 查詢
此查詢會從已安裝擴充功能但狀況不良的伺服器傳回 SQL Server 執行個體。
resources
| where type == "microsoft.hybridcompute/machines/extensions"
| where properties.type in ("WindowsAgent.SqlServer", "LinuxAgent.SqlServer")
| extend targetMachineName = tolower(tostring(split(id, '/')[8])) // Extract the machine name from the extension's id
| join kind=leftouter (
resources
| where type == "microsoft.hybridcompute/machines"
| project machineId = id, MachineName = name, subscriptionId, LowerMachineName = tolower(name), resourceGroup , MachineStatus= properties.status , MachineProvisioningStatus= properties.provisioningState, MachineErrors = properties.errorDetails //Project relevant machine health information.
) on $left.targetMachineName == $right.LowerMachineName and $left.resourceGroup == $right.resourceGroup and $left.subscriptionId == $right.subscriptionId // Join Based on MachineName in the id and the machine's name, the resource group, and the subscription. This join allows us to present the data of the machine as well as the extension in the final output.
| extend statusExpirationLengthRange = 3d // Change this value to change the acceptable range for the last time an extension should have reported its status.
| extend startDate = startofday(now() - statusExpirationLengthRange), endDate = startofday(now()) // Get the start and end position for the given range.
| extend extractedDateString = extract("timestampUTC : (\\d{4}\\W\\d{2}\\W\\d{2})", 1, tostring(properties.instanceView.status.message)) // Extracting the date string for the LastUploadTimestamp. Is empty if none is found.
| extend extractedDateStringYear = split(extractedDateString, '/')[0], extractedDateStringMonth = split(extractedDateString, '/')[1], extractedDateStringDay = split(extractedDateString, '/')[2] // Identifying each of the parts of the date that was extracted from the message.
| extend extractedDate = todatetime(strcat(extractedDateStringYear,"-",extractedDateStringMonth,"-",extractedDateStringDay,"T00:00:00Z")) // Converting to a datetime object and rewriting string into ISO format because todatetime() does not work using the previous format.
| extend isNotInDateRange = not(extractedDate >= startDate and extractedDate <= endDate) // Created bool which is true if the date we extracted from the message is not within the specified range. This bool will also be true if the date was not found in the message.
| where properties.instanceView.status.message !contains "SQL Server Extension Agent: Healthy" // Begin searching for unhealthy extensions using the following 1. Does extension report being healthy. 2. Is last upload within the given range. 3. Is the upload status in an OK state. 4. Is provisioning state not in a succeeded state.
or isNotInDateRange
or properties.instanceView.status.message !contains "uploadStatus : OK"
or properties.provisioningState != "Succeeded"
or MachineStatus != "Connected"
| extend FailureReasons = strcat( // Makes a String to list all the reason that this resource got flagged for
iif(MachineStatus != "Connected",strcat("- Machine's status is ", MachineStatus," -"),"") ,
iif(MachineErrors != "[]","- Machine reports errors -", ""),
iif(properties.instanceView.status.message !contains "SQL Server Extension Agent: Healthy","- Extension reported unhealthy -",""),
iif(isNotInDateRange,"- Last upload outside acceptable range -",""),
iif(properties.instanceView.status.message !contains "uploadStatus : OK","- Upload status is not reported OK -",""),
iif(properties.provisioningState != "Succeeded",strcat("- Extension provisiong state is ", properties.provisioningState," -"),"")
| extend RecommendedAction = //Attempt to Identify RootCause based on information gathered, and point customer to what they should investigate first.
iif(MachineStatus == "Disconnected", "Machine is disconnected. Please reconnect the machine.",
iif(MachineStatus == "Expired", "Machine cert is expired. Go to the machine on the Azure portal for more information on how to resolve this issue.",
iif(MachineStatus != "Connected", strcat("Machine status is ", MachineStatus,". Investigate and resolve this issue."),
iif(MachineProvisioningStatus != "Succeeded", strcat("Machine provisioning status is ", MachineProvisioningStatus, ". Investigate and resolve machine provisioning status"),
iff(MachineErrors != "[]", "Machine is reporting errors. Investigate and resolve machine errors",
iif(properties.provisioningState != "Succeeded", strcat("Extension provisioning status is ", properties.provisioningState,". Investigate and resolve extension provisioning state."),
iff(properties.instanceView.status.message !contains "SQL Server Extension Agent:" and properties.instanceView.status.message contains "SQL Server Extension Agent Deployer", "SQL Server extension employer ran. However, SQL Server extension seems to not be running. Verify that the extension is currently running.",
iff(properties.instanceView.status.message !contains "uploadStatus : OK" or isNotInDateRange or properties.instanceView.status.message !contains "SQL Server Extension Agent: Healthy", "Extension reported as unhealthy. View FailureReasons and LastExtensionStatusMessage for more information as to the cause of the failure.",
"Unable to recommend actions. Please view FailureReasons."
| project ID = id, MachineName, ResourceGroup = resourceGroup, SubscriptionID = subscriptionId, Location = location, RecommendedAction, FailureReasons, LicenseType = properties.settings.LicenseType,
LastReportedExtensionHealth = iif(properties.instanceView.status.message !contains "SQL Server Extension Agent: Healthy", "Unhealthy", "Healthy"),
LastExtensionUploadTimestamp = iif(indexof(properties.instanceView.status.message, "timestampUTC : ") > 0,
substring(properties.instanceView.status.message, indexof(properties.instanceView.status.message, "timestampUTC : ") + 15, 10),
"no timestamp"),
LastExtensionUploadStatus = iif(indexof(properties.instanceView.status.message, "uploadStatus : OK") > 0, "OK", "Unhealthy"),
ExtensionProvisioningState = properties.provisioningState,
MachineStatus, MachineErrors, MachineProvisioningStatus,MachineId = machineId,
LastExtensionStatusMessage = properties.instanceView.status.message
若要識別可能的問題,請檢閱 RecommendedAction 或 FailureReasons 欄位中的值。 RecommendedAction 欄位提供了解決問題的可能第一步或應首先檢查的線索。 FailureReasons 欄列出了資源被視為不健康的原因。 最後,檢查 LastExtensionStatusMessage,以查看代理程式最後回報的訊息。
建議的動作
動作詳細數據
計算機憑證已過期。
如需如何解決此問題的詳細資訊,請移至 Azure 入口網站上的電腦。
已啟用 Arc 的電腦必須重新上線至 Arc,因為用來向 Azure 驗證的憑證已過期。 在 Azure 入口網站中,Arc 機器狀態是 已過期。 請遵循此處的文件 來卸載代理程式, 然後重新加入 。 如果您重新上線,就不需要刪除入口網站中已啟用Arc的SQL Server資源。 只要啟用 自動上線,SQL 擴充功能就會再次自動安裝(預設值)。
計算機已中斷連線。
重新連線電腦。
Arc 機器位於 state = Disconnected
中。 此狀態可能基於各種原因:
已停止、停用或持續當機的Arc連線機器代理程式
或
代理程式與 Azure 之間的連線遭到封鎖。
檢查Arc連線機器服務/精靈的狀態,以確定它們已啟用並執行。
檢查連線。
使用詳細資訊記錄檔對代理程式進行疑難解答。
擴充功能回報為不正常。
如需失敗原因的詳細資訊,請檢視 FailureReasons 和 LastExtensionStatusMessage。
上次上傳超出可接受的範圍(在過去三天內)。
檢查 LastExtensionUploadTimestamp 欄位。 如果 不是時間戳,則永遠不會向 Azure 回報清查或使用量數據。
排除資料處理服務和遙測端點的連接問題。
如果最後一個上傳超出可接受的範圍(在過去三天內),其他所有專案看起來都沒問題,例如 LastExtensionUploadStatus、 ExtensionProvisioningState 和 MachineStatus,則可能會停止適用於 SQL Server 服務的 Azure 擴充功能/精靈。 找出它停止的原因,並再次啟動它。 檢查 LastExtensionStatusMessage 以尋找與問題相關的其他線索。
擴充套件配置狀態 失敗。
調查並解決延伸模組布建狀態。
SQL 擴充功能的初始安裝或更新失敗。針對適用於 SQL Server 部署的 Azure 擴充功能進行疑難解答。
檢查 LastExtensionStatusMessage中的值。
上傳狀態未報告正常
檢查儀錶板中的 LastExtensionMessage 數據行,並查看 uploadStatus 值,以及 uploadMessage 值(如果有,視版本而定)。
uploadStatus 值通常是 HTTP 錯誤碼。 檢閱 針對錯誤碼進行疑難解答。
uploadMessage 可能會有更具體的資訊。
解決數據處理服務和遙測端點的連線問題。
擴充功能配置狀態 更新
或
擴充元件配布狀態 建立中
或
擴充功能布建狀態 失敗
或
擴充功能設定狀態 刪除
如果指定的擴充功能保留在狀態中的一個超過 30 分鐘,則有可能布建會發生問題。 卸載延伸模組,然後使用 CLI 或入口網站重新安裝。 如果問題持續發生,檢查部署器和擴充套件日誌。
如果刪除失敗,請嘗試卸載代理程式,並視需要在入口網站中刪除Arc機器資源,然後重新部署。
請遵循此處的文件 來卸載代理程式, 然後重新加入 。
識別狀況不良的擴充功能 (PowerShell)
此範例在 PowerShell 執行。 此範例會傳回與上一個查詢相同的結果,但透過PowerShell腳本傳回。
# PowerShell script to execute an Azure Resource Graph query using Azure CLI
# where the extension status is unhealthy or the extension last upload time isn't in this month or the previous month.
# Requires the Az.ResourceGraph PowerShell module
# Login to Azure if needed
#az login
# Define the Azure Resource Graph query
$query = @"
resources
| where type == "microsoft.hybridcompute/machines/extensions"
| where properties.type in ("WindowsAgent.SqlServer", "LinuxAgent.SqlServer")
| extend targetMachineName = tolower(tostring(split(id, '/')[8])) // Extract the machine name from the extension's id
| join kind=leftouter (
resources
| where type == "microsoft.hybridcompute/machines"
| project machineId = id, MachineName = name, subscriptionId, LowerMachineName = tolower(name), resourceGroup , MachineStatus= properties.status , MachineProvisioningStatus= properties.provisioningState, MachineErrors = properties.errorDetails //Project relevant machine health information.
) on $left.targetMachineName == $right.LowerMachineName and $left.resourceGroup == $right.resourceGroup and $left.subscriptionId == $right.subscriptionId // Join Based on MachineName in the id and the machine's name, the resource group, and the subscription. This join allows us to present the data of the machine as well as the extension in the final output.
| extend statusExpirationLengthRange = 3d // Change this value to change the acceptable range for the last time an extension should have reported its status.
| extend startDate = startofday(now() - statusExpirationLengthRange), endDate = startofday(now()) // Get the start and end position for the given range.
| extend extractedDateString = extract("timestampUTC : (\\d{4}\\W\\d{2}\\W\\d{2})", 1, tostring(properties.instanceView.status.message)) // Extracting the date string for the LastUploadTimestamp. Is empty if none is found.
| extend extractedDateStringYear = split(extractedDateString, '/')[0], extractedDateStringMonth = split(extractedDateString, '/')[1], extractedDateStringDay = split(extractedDateString, '/')[2] // Identifying each of the parts of the date that was extracted from the message.
| extend extractedDate = todatetime(strcat(extractedDateStringYear,"-",extractedDateStringMonth,"-",extractedDateStringDay,"T00:00:00Z")) // Converting to a datetime object and rewriting string into ISO format because todatetime() does not work using the previous format.
| extend isNotInDateRange = not(extractedDate >= startDate and extractedDate <= endDate) // Created bool which is true if the date we extracted from the message is not within the specified range. This bool will also be true if the date was not found in the message.
| where properties.instanceView.status.message !contains "SQL Server Extension Agent: Healthy" // Begin searching for unhealthy extensions using the following 1. Does extension report being healthy. 2. Is last upload within the given range. 3. Is the upload status in an OK state. 4. Is provisioning state not in a succeeded state.
or isNotInDateRange
or properties.instanceView.status.message !contains "uploadStatus : OK"
or properties.provisioningState != "Succeeded"
or MachineStatus != "Connected"
| extend FailureReasons = strcat( // Makes a String to list all the reason that this resource got flagged for
iif(MachineStatus != "Connected",strcat("- Machine's status is ", MachineStatus," -"),"") ,
iif(MachineErrors != "[]","- Machine reports errors -", ""),
iif(properties.instanceView.status.message !contains "SQL Server Extension Agent: Healthy","- Extension reported unhealthy -",""),
iif(isNotInDateRange,"- Last upload outside acceptable range -",""),
iif(properties.instanceView.status.message !contains "uploadStatus : OK","- Upload status is not reported OK -",""),
iif(properties.provisioningState != "Succeeded",strcat("- Extension provisiong state is ", properties.provisioningState," -"),"")
| extend RecommendedAction = //Attempt to Identify RootCause based on information gathered, and point customer to what they should investigate first.
iif(MachineStatus == "Disconnected", "Machine is disconnected. Please reconnect the machine.",
iif(MachineStatus == "Expired", "Machine cert is expired. Go to the machine on the Azure portal for more information on how to resolve this issue.",
iif(MachineStatus != "Connected", strcat("Machine status is ", MachineStatus,". Investigate and resolve this issue."),
iif(MachineProvisioningStatus != "Succeeded", strcat("Machine provisioning status is ", MachineProvisioningStatus, ". Investigate and resolve machine provisioning status"),
iff(MachineErrors != "[]", "Machine is reporting errors. Investigate and resolve machine errors",
iif(properties.provisioningState != "Succeeded", strcat("Extension provisioning status is ", properties.provisioningState,". Investigate and resolve extension provisioning state."),
iff(properties.instanceView.status.message !contains "SQL Server Extension Agent:" and properties.instanceView.status.message contains "SQL Server Extension Agent Deployer", "SQL Server extension employer ran. However, SQL Server extension seems to not be running. Verify that the extension is currently running.",
iff(properties.instanceView.status.message !contains "uploadStatus : OK" or isNotInDateRange or properties.instanceView.status.message !contains "SQL Server Extension Agent: Healthy", "Extension reported as unhealthy. View FailureReasons and LastExtensionStatusMessage for more information as to the cause of the failure.",
"Unable to recommend actions. Please view FailureReasons."
| project ID = id, MachineName, ResourceGroup = resourceGroup, SubscriptionID = subscriptionId, Location = location, RecommendedAction, FailureReasons, LicenseType = properties.settings.LicenseType,
LastReportedExtensionHealth = iif(properties.instanceView.status.message !contains "SQL Server Extension Agent: Healthy", "Unhealthy", "Healthy"),
LastExtensionUploadTimestamp = iif(indexof(properties.instanceView.status.message, "timestampUTC : ") > 0,
substring(properties.instanceView.status.message, indexof(properties.instanceView.status.message, "timestampUTC : ") + 15, 10),
"no timestamp"),
LastExtensionUploadStatus = iif(indexof(properties.instanceView.status.message, "uploadStatus : OK") > 0, "OK", "Unhealthy"),
ExtensionProvisioningState = properties.provisioningState,
MachineStatus, MachineErrors, MachineProvisioningStatus,MachineId = machineId,
LastExtensionStatusMessage = properties.instanceView.status.message
# Execute the Azure Resource Graph query
$result = Search-AzGraph -Query $query
# Output the results
$result | Format-Table -Property ExtensionHealth, LastUploadTimestamp, LastUploadStatus, Message
若要識別可能的問題,請檢閱 RecommendedAction 或 FailureReasons 欄位中的值。 RecommendedAction 欄位提供了解決問題的可能第一步或應首先檢查的線索。 FailureReasons 欄列出了資源被視為不健康的原因。 最後,檢查 LastExtensionStatusMessage,以查看代理程式最後回報的訊息。
識別遺漏更新的延伸項目
識別沒有最新狀態更新的擴充功能。 此查詢會傳回 SQL Server 的 Azure 延伸項目清單,而此清單會依延伸項目上次更新其狀態後的天數排序。 值 '-1' 表示延伸模組當機,且在延伸模組狀態中有呼叫堆疊。
// Show the timestamp extracted
// If an extension has crashed (i.e. no heartbeat), fill timestamp with "1900/01/01, 00:00:00.000"
resources
| where type =~ 'microsoft.hybridcompute/machines/extensions'
| extend extensionStatus = parse_json(properties).instanceView.status.message
| extend timestampExtracted = extract(@"timestampUTC\s*:\s*(\d{4}/\d{2}/\d{2}, \d{2}:\d{2}:\d{2}\.\d{3})", 1, tostring(extensionStatus))
| extend timestampNullFilled = iff(isnull(timestampExtracted) or timestampExtracted == "", "1900/01/01, 00:00:00.000", timestampExtracted)
| extend timestampKustoFormattedString = strcat(replace(",", "", replace("/", "-", replace("/", "-", timestampNullFilled))), "Z")
| extend agentHeartbeatUtcTimestamp = todatetime(timestampKustoFormattedString)
| extend agentHeartbeatLagInDays = datetime_diff('day', now(), agentHeartbeatUtcTimestamp)
| project id, extensionStatus, agentHeartbeatUtcTimestamp, agentHeartbeatLagInDays
| limit 100
| order by ['agentHeartbeatLagInDays'] asc
此查詢會傳回依延伸項目上次更新其狀態後的天數分組的延伸項目的計數。 值為 '-1' 表示延伸模組崩潰,且延伸模組狀態中有呼叫堆疊。
// Aggregate by timestamp
// -1: Crashed extension with no heartbeat, we got a stacktrace instead
// 0: Healthy
// >1: Stale/Offline
resources
| where type =~ 'microsoft.hybridcompute/machines/extensions'
| extend extensionStatus = parse_json(properties).instanceView.status.message
| extend timestampExtracted = extract(@"timestampUTC\s*:\s*(\d{4}/\d{2}/\d{2}, \d{2}:\d{2}:\d{2}\.\d{3})", 1, tostring(extensionStatus))
| extend timestampNullFilled = iff(isnull(timestampExtracted) or timestampExtracted == "", "1900/01/01, 00:00:00.000", timestampExtracted)
| extend timestampKustoFormattedString = strcat(replace(",", "", replace("/", "-", replace("/", "-", timestampNullFilled))), "Z")
| extend agentHeartbeatUtcTimestamp = todatetime(timestampKustoFormattedString)
| extend agentHeartbeatLagInDays = iff(agentHeartbeatUtcTimestamp == todatetime("1900/01/01, 00:00:00.000Z"), -1, datetime_diff('day', now(), agentHeartbeatUtcTimestamp))
| summarize numExtensions = count() by agentHeartbeatLagInDays
| order by numExtensions desc
升級延伸模組版本
如需升級 SQL Server Azure 擴充功能的詳細資訊,請參閱 升級擴充功能。
若要取得目前的擴充功能版本,請檢閱 版本資訊 - 透過 Azure Arc 啟用的 SQL Server。