未加星标

Azure Virtual Machine Internals Part 1

字体大小 | |
[系统(windows) 所属分类 系统(windows) | 发布者 店小二05 | 时间 2017 | 作者 红领巾 ] 0人收藏点击收藏
Introduction

The Azure cloud services are composed of elements from Compute, Storage, and Networking. The compute building block is a Virtual Machine (VM), which is the subject of discussion in this post. Web search will yield large amounts of documentation regarding the commands, APIs and UX for creating and managing VMs. This is not a 101 or ‘How to’ and the reader is for the most part expected to already be familiar with the topics of VM creation and management. The goal of this series is to look at what is happening under the covers as a VM goes thru its various states.

Azure provides IaaS and PaaS VMs; in this post when we refer to a VM we mean the IaaS VM. There are two control plane stacks in Azure, Azure Service Management (ASM) and Azure Resource Manager (ARM). We will be limiting ourselves to ARM since it is the forward looking control plane.

ARM exposes resources like VM, NIC but in reality ARM is a thin frontend layer and the resources themselves are exposed by lower level resource providers like Compute Resource Provider (CRP), Network Resource Provider (NRP) and Storage Resource Provider (SRP). Portal calls ARM which in turn calls the resource providers.

Getting Started

For most of the customers, their first experience creating a VM is in the Azure Portal . I did the same and created a VM of size ‘Standard DS1 v2’ in the West US region. I mostly stayed with the defaults that the UI presented but chose to add a ‘CustomScript’ extension. When prompted I provided a local file ‘Sample.ps’ as the PowerShell script for the ‘CustomScript’ extension. The PS script itself is a single line Get-Process.

The VM provisioned successfully but the overall ARM template deployment failed (bright red on my Portal dashboard). Couple clicks showed that the ‘CustomScript’ extension had failed and the Portal showed this message:

{
"status": "Failed",
"error": {
"code": "ResourceDeploymentFailure",
"message": "The resource operation completed with terminal provisioning state 'Failed'.",
"details": [
{
"code": "DeploymentFailed",
"message": "At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/arm-debug for usage details.",
"details": [
{"code": "Conflict","message": "{\r\n \"status\": \"Failed\",\r\n \"error\": {\r\n \"code\": \"ResourceDeploymentFailure\",\r\n \"message\": \"The resource operation completed with terminal provisioning state 'Failed'.\",\r\n \"details\": [\r\n {\r\n \"code\": \"VMExtensionProvisioningError\",\r\n \"message\": \"VM has reported a failure when processing extension 'CustomScriptExtension'. Error message: \\\"Finished executing command\\\".\"\r\n }\r\n ]\r\n }\r\n}"
}
]
}
]
}
}

It wasn’t immediately clear what had gone wrong. We can dig from here and as is often true, failures teach us more than successes.

I RDPed to the just provisioned VM. The logs for the VM Agent are in C:\windowsAzure\Logs. The VM Agent is a system agent that runs in all IaaS VMs (customers can opt out if they would like). The VM Agent is necessary to run extensions. Let’s peek into the logs for the CustomScript Extension:

C:\WindowsAzure\Logs\Plugins\Microsoft.Compute.CustomScriptExtension\1.8\CustomScriptHandler
[1732+00000001] [08/14/2016 06:19:17.77] [INFO] Command execution task started. Awaiting completion...
[1732+00000001] [08/14/2016 06:19:18.80] [ERROR] Command execution finished. Command exited with code: -196608

The fact that the failure logs are cryptic hinted that something catastrophic had happened. So I re-looked at my input and realized that I had the file extension for the PS script wrong. I had it as Sample.ps when it should have been Sample.ps1. I updated the VM this time specifying the script file with the right extension. This succeeded as shown by more records appended to the log file mentioned above.

[3732+00000001] [08/14/2016 08:42:24.04] [INFO] HandlerSettings = ProtectedSettingsCertThumbprint: , ProtectedSettings: {}, PublicSettings: {FileUris: [https://iaasv2tempstorewestus.blob.core.windows.net/vmextensionstemporary-10033fff801becb5-20160814084130535/simple.ps1?sv=2015-04-05&sr=c&sig=M3qa7lS%2BZwp%2B8Tytqf1VEew4YaAKvvYn1yzGrPfSwyw%3D&se=2016-08-15T08%3A41%3A30Z&sp=rw], CommandToExecute: powershell -ExecutionPolicy Unrestricted -File simple.ps1 }
[3732+00000001] [08/14/2016 08:42:24.04] [INFO] Downloading files specified in configuration...
[3732+00000001] [08/14/2016 08:42:24.05] [INFO] DownloadFiles: fileUri = https://iaasv2tempstorewestus.blob.core.windows.net/vmextensionstemporary-10033fff801becb5-20160814084130535/simple.ps1?sv=2015-04-05&sr=c&sig=M3qa7lS+Zwp+8Tytqf1VEew4YaAKvvYn1yzGrPfSwyw=&se=2016-08-15T08:41:30Z&sp=rw
[3732+00000001] [08/14/2016 08:42:24.05] [INFO] DownloadFiles: Initializing CloudBlobClient with baseUri = https://iaasv2tempstorewestus.blob.core.windows.net/
[3732+00000001] [08/14/2016 08:42:24.22] [INFO] DownloadFiles: fileDownloadPath = Downloads\0
[3732+00000001] [08/14/2016 08:42:24.22] [INFO] DownloadFiles: asynchronously downloading file to fileDownloadLocation = Downloads\0\simple.ps1
[3732+00000001] [08/14/2016 08:42:24.24] [INFO] Waiting for all async file download tasks to complete...
[3732+00000001] [08/14/2016 08:42:24.29] [INFO] Files downloaded. Asynchronously executing command: 'powershell -ExecutionPolicy Unrestricted -File simple.ps1 '
[3732+00000001] [08/14/2016 08:42:24.29] [INFO] Command execution task started. Awaiting completion...
[3732+00000001] [08/14/2016 08:42:25.29] [INFO] Command execution finished. Command exited with code: 0

The CustomScript extension takes a script file which can be provided as a file in a Storage blob. Portal offers a convenience where it accepts a file from the local machine. I had provided Simple.ps1 which was in my \temp folder. Behind the scenes Portal uploads the file to a blob, generates a shared access signature (SAS) and passes it on to CRP. From the logs above you can see that URI .

This URI is worth understanding. It is a Storage blob SAS with the following attributes for an account in West US (which is the same region where my VM is deployed):

se=2016-08-15T08:41:30Z means that the SAS is valid until that time (UTC). Comparing it to the timestamp on the corresponding record in log (08/14/2016 08:42:24.05) it is clear that the SAS is being generated for a period of 24 hours. Sr=c means that this is container level policy. Sp=rw means that the access is for both read and write. The shared access signature (SAS) has the full descriptions

I asserted above that this is a storage account in West US. That may be apparent from the naming of the storage account (iaasv2tempstorewestus) but is not a guarantee. So how can you verify that this storage account (or any other storage account) is in the region it claims to be in?

A simple nslookup on the blob DNS URL reveals this

C:\Users\yunusm>nslookup iaasv2tempstorewestus.blob.core.windows.net
Server: PK5001Z.PK5001Z
Address: 192.168.0.1
Non-authoritative answer:
Name: blob.by4prdstr03a.store.core.windows.net
Address: 40.78.112.72
Aliases: iaasv2tempstorewestus.blob.core.windows.net

The blob URL is a CNAME to a canonical DNS blob.by4prdstr03a.store.core.windows.net. Experimentation will show that more than one storage accounts maps to a single canonical DNS URL. The ‘by4’ in the name gives a hint to what region it is located. As per theAzure Regions page, the West US region is in California. Looking up the geo location of the IP address (40.78.112.72) indicates a more specific area within California.

Understanding the VM

Now that we have a healthy VM, let’s understand it more. As per the Azure VM Sizes page, this is the VM that I just created:

Size

CPU cores

Memory

NICs (Max)

Max. disk size

Max. data disks (1023 GB each)

Max. IOPS (500 per disk)

Max network bandwidth

Standard_D1_v2

1

3.5 GB

1

Temporary (SSD) =50 GB

2

2x500

moderate

This information can be fetched programmatically by doing aGET.

Returns this:

{
"name": "Standard_DS1_v2",
"numberOfCores": 1,
"osDiskSizeInMB": 1047552,
"resourceDiskSizeInMB": 7168,
"memoryInMB": 3584,
"maxDataDiskCount": 2
}
Doing a GET on the VM we created
Returns the following. Let’s understand this response in some detail. I have annotated inline comments preceded and followed by //
{
"properties": {
"vmId": "694733ec-46a0-4e0b-a73b-ee0863a0f12c",
"hardwareProfile": {
"vmSize": "Standard_DS1_v2"
},
"storageProfile": {
"imageReference": {
"publisher": "MicrosoftWindowsServer",
"offer": "WindowsServer",
"sku": "2012-R2-Datacenter",
"version": "latest"

The interesting field here is the version. Publishers can have multiple versions of the same image at any point of time. Popular images are revved typically on a monthly frequency with the security patches. Major new versions are released as new SKUs. The Portal has defaulted me to the latest version. As a customer, I can chose to pick a specific version as well, whether I deploy thru Portal or thru an ARM template using the CLI or REST API; the latter being the preferred method for automated scenarios. The problem with specifying a particular version is that it can render the ARM template fragile. The deployment will break if the publisher unpublishes that specific version in one or more regions, as a publisher can do. So unless there is a good reason not to, the preferred value for the version setting is latest. As an example, the following images of the SKU 2012-R2-Datacenter are currently in the WestUS region, as returned by the CLI command azure vm image list.

MicrosoftWindowsServer WindowsServer 2012-R2-Datacenter Windows 4.0.20151120 westus MicrosoftWindowsServer:WindowsServer:2012-R2-Datacenter:4.0.20151120

MicrosoftWindowsServer WindowsServer 2012-R2-Datacenter Windows 4.0.20151214 westus MicrosoftWindowsServer:WindowsServer:2012-R2-Datacenter:4.0.20151214

MicrosoftWindowsServer WindowsServer 2012-R2-Datacenter Windows 4.0.20160126 westus MicrosoftWindowsServer:WindowsServer:2012-R2-Datacenter:4.0.20160126

MicrosoftWindowsServer WindowsServer 2012-R2-Datacenter Windows 4.0.20160229 westus MicrosoftWindowsServer:WindowsServer:2012-R2-Datacenter:4.0.20160229

MicrosoftWindowsServer WindowsServer 2012-R2-Datacenter Windows 4.0.20160430 westus MicrosoftWindowsServer:WindowsServer:2012-R2-Datacenter:4.0.20160430

MicrosoftWindowsServer WindowsServer 2012-R2-Datacenter Windows 4.0.20160617 westus MicrosoftWindowsServer:WindowsServer:2012-R2-Datacenter:4.0.20160617

MicrosoftWindowsServer WindowsServer 2012-R2-Datacenter Windows 4.0.20160721 westus MicrosoftWindowsServer:WindowsServer:2012-R2-Datacenter:4.0.20160721

},
"osDisk": {
"osType": "Windows",
"name": "BlogWindowsVM",
"createOption": "FromImage",
"vhd": {
"uri": https://blogrgdisks562.blob.core.windows.net/vhds/BlogWindowsVM2016713231120.vhd

The OS disk is a page blob and starts out as a copy of the source image that the Publisher has published. Looking at the meta data of this blob and correlating it to what the VM itself has is instructive. Using the Cloud Explorer in Microsoft Visual Studio the blob’s property window:


Azure Virtual Machine Internals   Part 1

This is a regular page blob that is functioning as an OS disk over the network. You will observe that the Last Modified date pretty much stays with NOW() most of the time the reason being as long as the VM is running there are some flushes happening to the disk regularly. The size of the OS disk is 127 GB; the max allowed OS disk in Azure is 1 TB.

Azure Storage Explorer shows more properties for the same blob than the VS Cloud Explorer.


Azure Virtual Machine Internals   Part 1

The interesting properties are the Lease properties. It shows the blob as leased with an infinite duration. Internally to VM creation, when a page blob is configured to be an OS/data disk for a VM, we take a lease on that blob before attaching it to the VM. This is so that the blob for a running VM is not deleted out of band. If you see a disk-backing blob has no lease while it shows as attached to a VM then that would be an inconsistent state and will need to be repaired.

RDPing in the VM itself, we can see two drives mounted and the OS drive is about the same size as the page blob in Storage. The pagefile is on D drive; so that faulted pages are fetched locally rather than over the network from Blob Storage. The temporary storage can be lost in case of events that case a VM to be relocated to a different node.


Azure Virtual Machine Internals   Part 1
},
"caching": "ReadWrite"
},
"dataDisks": []

there are no data disks yet but we will add some soon

},
"osProfile": {
"computerName": "BlogWindowsVM",

The name we chose for the VM in Portal is the hostname as well. The VM is DHCP enabled and gains its DIP address thru DHCP. The VM is registered in an internal DNS and has a generated FQDN.

C:\Users\yunusm>ipconfig /all
Windows IP Configuration
Host Name . . . . . . . . . . . . : BlogWindowsVM
Primary Dns Suffix . . . . . . . :
Node Type . . . . . . . . . . . . : Hybrid
IP Routing Enabled. . . . . . . . : N

本文系统(windows)相关术语:三级网络技术 计算机三级网络技术 网络技术基础 计算机网络技术

分页:12
转载请注明
本文标题:Azure Virtual Machine Internals Part 1
本站链接:http://www.codesec.net/view/524134.html
分享请点击:


1.凡CodeSecTeam转载的文章,均出自其它媒体或其他官网介绍,目的在于传递更多的信息,并不代表本站赞同其观点和其真实性负责;
2.转载的文章仅代表原创作者观点,与本站无关。其原创性以及文中陈述文字和内容未经本站证实,本站对该文以及其中全部或者部分内容、文字的真实性、完整性、及时性,不作出任何保证或承若;
3.如本站转载稿涉及版权等问题,请作者及时联系本站,我们会及时处理。
登录后可拥有收藏文章、关注作者等权限...
技术大类 技术大类 | 系统(windows) | 评论(0) | 阅读(13)