A Battle of Persistence: Triumphing over 73 Incremental Failures, From a Windows Developer

Have you ever been coerced into a task with the promise of "oh, don't worry, it'll be simple"? An opportunity to take on more responsibility? Absolutely! Time to tackle this task with all the energy and enthusiasm in the world. But oh man, did this task blow up. Not only on me, but on the entire team, and potentially on countless organizations that use AWS to host their products. With a set-in-stone go-live date two weeks out, buckle up to find out how this simple task almost crippled a product launch schedule.

Defining the Problem

I know web application software is all the rage these days, but believe it or not, other software stacks do exist. In this case, we're talking about Windows driver software, so how do we sell it to customers? The old-fashioned way is to buy a software license and download the driver onto your machine, but in the era of cloud computing, some customers want to spin up virtual machines with the software already installed. This is where cloud marketplaces come into play: online stores that let end users purchase virtual machines with specific software preinstalled.

For our purposes, we'll be putting the software on fresh installs of Windows Server 2016, 2019, and 2022 using AWS CloudFormation templates. When a customer buys our product in the marketplace, the marketplace creates a virtual machine with the chosen OS and runs the series of steps defined in our YAML template to build the software environment to the end user's specification.

So the overall goal was pretty simple: given our new driver version, update the existing CloudFormation templates to deploy, say... MyDriver v3.1.0 instead of MyDriver v3.0.0.

To my colleague's credit, this is normally a pretty trivial task... however... there was one sneaky trick lurking that would make this update much... more... difficult: an AWS service software update.

Getting Started -- Deploying Software

Deploying stuff onto AWS is new to me. I don't come from a DevOps background, and when it comes to cloud platforms, my professional experience is almost exclusively with Azure. But thanks to YAML syntax and some ChatGPT, I picked it up pretty quickly. Let me walk you through how I started getting our products ready for launch:

POV Me:

  • Step 1: Alright, let's install our driver on fresh instances of Windows Server 2016, 2019, and 2022.

  • Step 2: Make sure Windows is up to date.

  • Step 3: Check the Event Log... make sure the software is running okay... Awesome, alright!

  • Step 4: Configure some settings in EC2Launch so that the next time the machine boots up, it gets a random password, an administrator account, and a few other niche settings.

  • Step 5: Create a new AMI for each of the three machines above.

  • Step 6: Associate the AMIs with our AWS Marketplace products...

  • Step 7: Test our CloudFormation stack deployments with the new AMIs and... ERROR: AWS::SSMDocument, Failed to Initialize Secondary Disk 'D:\' for Windows Server 2022

Oh no... that wasn't in the deployment manual. Let me try it one more time just to make sure I'm not losing my sanit---

ERROR: AWS::SSMDocument, Failed to Initialize Secondary Disk 'D:\' for Windows Server 2022

Hmm... okay, ladies and gents. This is the point of critical divergence. An unanticipated technical issue. The moment you realize, 'I do not fully understand the scope of the problem.' How are we going to proceed?

Information Gathering -- Creating a Context

It's 2024, so naturally, I copy + paste my error message into Google. There's a great video on YouTube about the philosophy of gathering information on technical issues.

I find out that Windows Server 2022 has an updated version of the EC2Launch tool mentioned earlier.

EC2Launch is software that comes with Windows Server virtual machines created on AWS. It performs a variety of tasks to prepare an instance for use, making Windows instances easier to manage and configure in the cloud. And it just received a major update that replaced the old scripts with a configuration GUI.

This tool is a vital part of setting up Windows servers with custom software, and it lets developers tell AWS how the virtual machines should boot up. As driver developers, we need to perform some pretty niche tasks on these AWS machines. It was now my job to decide how we were going to support deploying AWS EC2 Windows Server 2022 instances going forward.

Decision Making Time!

So we now know why we were hitting that error. A PowerShell script we relied on (part of EC2Launch v1) was no longer present on the Windows Server 2022 box (which uses EC2Launch v2). So naturally, we have two options:

  1. Conform to the new EC2Launch v2.

  2. Handle the logic ourselves.

Choice 1 sounded appealing; after all, additional code on our end is additional technical debt. I read the supplied documentation and was surprised when the tool didn't work as expected. I would have preferred to take the time to learn how it worked (I was most likely using it wrong), but with limited time on the clock, I didn't want to risk the tool falling short.

Which leads me to option 2. What are we actually trying to do? According to the error message above, we're trying to initialize our D:\ drive (an attached EBS volume) on Windows startup.

Let's open up one of those PowerShell scripts on Windows Server 2019 and see what it does. Below is pseudocode for InitializeDisk.ps1:

##### DISCLAIMER: This is pseudocode summarizing what the entire script does.
#####   $driveLetter, $IsEphemeral, $EphemeralCount and Write-Log are set up in the elided (...) parts.
...
foreach ($disk in (Get-CimInstance -ClassName Win32_DiskDrive)){

  $DiskIndex = $disk.Index
  $disk = Get-Disk -Number $DiskIndex

  # Skip disks that already have a partition table (e.g. the boot disk);
  # Initialize-Disk fails on any disk that is not RAW.
  if ($disk.PartitionStyle -ne 'RAW') { continue }
  Initialize-Disk -Number $DiskIndex -PartitionStyle MBR | Out-Null

  # Create a partition with the given drive index and letter.
  $partition = New-Partition -DiskNumber $DiskIndex -MbrType IFS -DriveLetter $driveLetter -UseMaximumSize -IsActive

  # Check if the volume is formatted for the disk.
  # If the volume is not in OK status, format it with the given parameters.
  $formatted = Get-Volume -Partition $partition
  if (-not $formatted -or $formatted.OperationalStatus -ne "OK")
  {
    Write-Log "Formatting the volume ..."
    # Format the volume on the newly created partition using the partition reference.
    if ($IsEphemeral)
    {
      $formatted = Format-Volume -Partition $partition -FileSystem NTFS -NewFileSystemLabel "Temporary Storage $($EphemeralCount)" -Confirm:$False
    }
    else
    {
      $formatted = Format-Volume -Partition $partition -FileSystem NTFS -Confirm:$False
    }
  }

}
...

That's a summarized version of what's happening, but you can see the logic is pretty sound when it comes to Windows systems.

Get each drive, initialize the disk as MBR, create a partition, and format the volume on that partition. I don't know how EC2Launch v2 works, but I definitely know how these PowerShell scripts work, and that is some solid logic I can get behind.
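For comparison, here's how those same four steps might look as a small standalone function. Treat it as a minimal sketch rather than the EC2Launch script: `Initialize-RawDisks` is a hypothetical name, it swaps the explicit `$driveLetter` for `New-Partition -AssignDriveLetter` so Windows picks the next free letter, and it assumes an elevated session on Windows Server with the built-in Storage cmdlets.

```powershell
# Hypothetical helper (not part of EC2Launch): for every RAW disk, bring it
# online if needed, initialize it as MBR, create one partition spanning the
# whole disk, and format that partition as NTFS.
# Assumes an elevated session and the Windows Storage module.
function Initialize-RawDisks {
    [CmdletBinding()]
    param(
        # Volume label for each newly formatted disk (an assumption; pick your own).
        [string]$Label = 'Data'
    )

    # RAW means the disk has no partition table yet, so the boot disk and any
    # disk that was already set up are skipped.
    $rawDisks = @(Get-Disk | Where-Object { $_.PartitionStyle -eq 'RAW' })

    foreach ($disk in $rawDisks) {
        Write-Verbose "Initializing disk $($disk.Number)"

        # A freshly attached volume can show up offline; bring it online first.
        if ($disk.IsOffline) {
            Set-Disk -Number $disk.Number -IsOffline $false
        }

        Initialize-Disk -Number $disk.Number -PartitionStyle MBR

        # -AssignDriveLetter lets Windows choose the next free letter (D:, E:, ...).
        $partition = New-Partition -DiskNumber $disk.Number -UseMaximumSize -AssignDriveLetter

        Format-Volume -Partition $partition -FileSystem NTFS `
            -NewFileSystemLabel $Label -Confirm:$false | Out-Null
    }

    # Emit the numbers of the disks that were initialized.
    $rawDisks | ForEach-Object { $_.Number }
}
```

Filtering on `PartitionStyle -eq 'RAW'` is what makes a sketch like this safe to rerun on every boot: once a disk has been initialized it is no longer RAW, so a second run leaves it alone. To use it from an SSM document, the function would need to be defined in the same `commands:` script that calls `Initialize-RawDisks -Verbose`.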

TL;DR: we attempted choice 1 and had no initial luck, checked the feasibility of choice 2, and were satisfied that the logic was simple enough to handle ourselves.

Solving the Problem

We have a clear goal in mind: getting disks to initialize automatically on Windows Server 2022. There's an intuitive way to run PowerShell commands from YAML: an AWS::SSM::Document resource in a CloudFormation template.

...  
  MySSMDocument:
    Type: AWS::SSM::Document
    Properties:
      DocumentType: Automation
      Content:
        schemaVersion: "0.3"
        description: Run PowerShell commands on Windows instance
        mainSteps:
          - name: InitializeDisk
            action: aws:runCommand
            inputs:
              DocumentName: AWS-RunPowerShellScript
              InstanceIds:
                - '{{ Instance ID of my Windows Server 2022 resource}}'
              Parameters:
                commands:
                  - |
                    # < Insert PowerShell commands here >, for example:
                    ./myInitializeDiskScript.ps1

Did I mention I'm new to creating resources on AWS? This... this is where the war began... and where the title got its name.

If I had been confident in my AWS logic, I could have copied and pasted this code into our deployment stack and tested it that way. However, with that method it takes approximately 2 hours to find out whether or not the script passes.

That's how long our production-level CloudFormation template takes to deploy. I needed something quick. Something I could iterate on to get results and know I was on the right track. This led me to build my own AWS CloudFormation template with the following resources:

  • An EC2 instance configured to use Windows Server 2022

  • An EBS volume to attach to the instance

  • An SSM document to automate launching my PowerShell script

What I didn't anticipate was that I would also need...

  • A VPC, subnet, and security group

  • IAM::Role

  • IAM::InstanceProfile

  • SSMWaitCondition

  • SSMWaitHandle

Trial and error... trial and error... Finally, all the resources spun up successfully. I was then able to repeatedly review the results of the automation executions with this simple CLI command:

aws ssm describe-automation-executions --profile <profile> --region <region>  --max-items 5

It's a pretty handy command: it lists your most recent SSM Automation executions along with their status and any failure message. It took a lot, but finally... we had a successful deployment with a second disk initialized.
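If you're running that CLI command from PowerShell, its JSON output can be condensed into something easier to skim between iterations. A minimal sketch, assuming the command is run with `--output json`: `Get-AutomationSummary` is a hypothetical helper, and it only reads fields returned in `AutomationExecutionMetadataList` (`AutomationExecutionId`, `DocumentName`, `AutomationExecutionStatus`, `FailureMessage`).

```powershell
# Hypothetical helper: condense the JSON printed by
#   aws ssm describe-automation-executions --output json
# into one row per execution, with failed runs listed first.
function Get-AutomationSummary {
    param(
        [Parameter(Mandatory)]
        [string]$Json
    )

    $response = $Json | ConvertFrom-Json

    $rows = foreach ($exec in $response.AutomationExecutionMetadataList) {
        [pscustomobject]@{
            Id       = $exec.AutomationExecutionId
            Document = $exec.DocumentName
            Status   = $exec.AutomationExecutionStatus
            # FailureMessage is normally only populated when a run did not succeed.
            Failure  = $exec.FailureMessage
        }
    }

    # Failed runs first, then everything else; Where-Object keeps the CLI's order within each group.
    @($rows | Where-Object Status -eq 'Failed') + @($rows | Where-Object Status -ne 'Failed')
}

# Usage (fill in your own profile and region):
#   $json = aws ssm describe-automation-executions --profile <profile> --region <region> --max-items 5 --output json | Out-String
#   Get-AutomationSummary -Json $json | Format-Table -AutoSize
```

`Out-String` is there because PowerShell receives a native command's output as an array of lines; joining it back into one string keeps the JSON intact before parsing.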

[Image: K and Roy Batty from Blade Runner]

(Literally me after successful disk initialization).

Conclusion

So why take the time to write a blog post about a seemingly straightforward task like creating a CloudFormation template? Because it's about more than the task itself: it's about embodying a principle crucial to those of us in software, DevOps, and security engineering. The Virtue of Persistence. Every day, our teams face challenges where we don't know the exact solution. We have ideas... principles... and foundational knowledge in our domains. By applying these three things toward a clear end goal, we can endure and battle our way to a solution with persistence.

I wasn't expecting to run into an issue with EC2Launch... I wasn't ready to have a PowerShell script disappear on me. However, with a deadline approaching and some decisive decision-making, we were able to conjure a solution that fit our business needs. This is ultimately the software development lifecycle. I share this story in hopes of connecting with others who understand that while we may not have ALL the answers, with a little persistence, we can surmount unforeseen challenges.

Cheers! Till next time!