It’s almost midnight on Wednesday and you are logged into a server, waiting for the seconds to tick by so you can make a change to your production environment. It had been decided that the best window for the server outage was midnight to two in the morning. You are tired after working a full day, even though you were supposed to have the afternoon off because you would be working late. An issue popped up at three in the afternoon and kept you working until almost six in the evening. You spent a few precious hours with your family before putting them all to bed, and you have been pounding coffee since 10 PM to stay awake. You are hoping for a quick change at midnight that will last maybe ten minutes, so you can go to bed and get enough sleep to make the 8 AM conference call. The change was approved even though little testing information was provided to the Change Board. The project team had “tested” the change in their local environment, but none of that testing closely mirrored production. They convinced the Change Board that the change carried little risk, so it was approved.
The clock ticks to midnight and you prep the server to make the change. Everything looks like it is configured correctly, so you click the Apply button and then trigger a restart of the server. The server boots back up and you can log on. You access the web site and see the sign-on screen. You don’t have a password for the application itself, so you cannot test any further and assume that everything is working. You consider the change completed, log off the server, and go to bed hoping you can get some sleep.
Two hours later, your phone starts ringing. The Project Manager is on the phone, frantically telling you that the web site is not working and asking what changes you made. After some discussion, you determine that users can log in to the site, but once they do, they get an error saying the data cannot be found. So, at 2:30 in the morning, you hop back on the server and search frantically for something that might have caused the issue. There really isn’t any documentation on changes, so you must resort to the logs to see what has happened. Another two hours pass and now your manager is involved. They determine that you need to roll back the change you made at midnight. You roll back the change and reboot the server, and the application is still broken. After digging through the event logs, you see that quite a few patches were installed two days earlier but the server was never restarted. One of those patches updated the version of the .NET Framework on the server, and it didn’t take effect until you restarted the server the first time. You decide to remove that patch and restart the server and, bingo, the application starts working. It is now 6 AM and you have a call in two hours to discuss the change. You realize you won’t be getting any more sleep tonight, so you start pounding more coffee to make it through the next day.
As an IT Pro or sysadmin, you can easily replace this story with something similar that you have personally experienced. Over the past few years, the industry has started learning from major technology companies like Microsoft, Amazon, Netflix, and Google, as well as many Internet startups, that must manage hundreds or thousands of servers with very few IT staff. These lessons have brought about the DevOps movement and, more specifically, Infrastructure as Code. With Infrastructure as Code, you turn your Infrastructure changes from mouse clicks into scripts that can be versioned and repeated as necessary. Infrastructure as Code allows Infrastructure changes to be:
Michael Greene from Microsoft and Steve Murawski from Chef Software have put together an excellent white paper called The Release Pipeline Model that describes some practical methods of using Infrastructure as Code as an IT Pro or sysadmin. Let’s dive into The Release Pipeline Model.
Overview of The Release Pipeline Model
The Release Pipeline Model takes the DevOps concept of a Continuous Integration/Continuous Delivery (CI/CD) pipeline and translates it into an Infrastructure as Code delivery mechanism. The core concept is that ALL changes to your Infrastructure environment should be delivered through some sort of code or command line interface.
The Release Pipeline Model consists of four primary stages (source, build, test, and release) and answers the following questions about our Infrastructure:
- Who changed the environment?
- What did they change, exactly?
- When did the change occur?
- How will I catch problems at the earliest possible moment?
- Can elements be combined cleanly to produce the correct results?
- How will I be notified of a problem?
- How do we check for regulatory issues?
- How do I know this change will not cause an outage?
- Will this change work across every variation I have in my environment?
- Does this configuration meet your business requirements?
- How do I make changes without granting long term administrative access?
- Does anyone need to sign-off before deployment?
- How do I keep services consistent across my environments?
- Can I integrate service management?
Source control is extremely important for IT Operations. There is a great quote in The DevOps Handbook: “In Puppet Labs’ 2014 State of DevOps report, the use of version control by Operations was the highest predictor of both IT performance and organizational performance. In fact, whether Ops used version control was a higher predictor for both IT performance and organizational performance than whether Dev used version control.”
Let’s face it: left to our own devices, we often end up with source control that looks something like this:
Looking at those three files, which one should you run?
Source control would let us know which version should be used and give us descriptions of all the changes to that script.
There are many great source control systems available. Chances are very good that your organization already has source control in its development environment that you might be able to use. I would highly encourage you to use a Git-based source control system. If you want to learn more about Git and source control, I put together this Sway presentation for our local user group in Austin. If you need to create your own source control environment, there are several that you should consider, including Visual Studio Team Foundation Server, Visual Studio Team Services, GitHub, and GitLab.
So, what should you include in your source control? EVERYTHING except for passwords or certificates. Every PowerShell script, DSC Configuration file, Cisco configuration text file, etc. should be in your source control.
Going back to our story from earlier, if source control had been in place, we would have known that the patches were installed, who installed them, and why. That would have saved several hours of troubleshooting when the failure occurred.
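As a minimal sketch of what this looks like day to day, the commands below put a script under Git and then ask who changed it, when, and why. The folder, file name, and commit message are hypothetical, and the sketch assumes Git is installed, on your path, and already configured with a user name and email.

```powershell
# Assumes Git is installed, on the PATH, and configured with user.name and user.email.
# The folder, file name, and commit message are hypothetical.
$repo = Join-Path ([System.IO.Path]::GetTempPath()) 'infra-scripts-demo'
New-Item -ItemType Directory -Path $repo -Force | Out-Null
Set-Location $repo

git init

# Keep certificate files that carry private keys out of the repository.
Set-Content -Path .gitignore -Value '*.pfx'

# A placeholder script standing in for a real change.
Set-Content -Path Install-Patches.ps1 -Value 'Write-Output "Installing patches"'

git add .gitignore Install-Patches.ps1
git commit -m "Add patch installation script for web servers"

# Who changed the script, when, and why.
git log --format="%an | %ad | %s" -- Install-Patches.ps1
```

The last command prints one line per commit that touched the file: the author, the date, and the commit message.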
The concept of a Build is a little bit more difficult for an IT Pro to understand. When I think of a build, I think about a process that returns an EXE or MSI file that can then be executed on a machine. In this case, a build is essentially an orchestration process that takes your source as it is checked in, runs tests on that source, and then runs that source against an environment.
There are several great build systems available, including Visual Studio Team Services, Jenkins, and TeamCity. However, all our build system really needs to do is read from our source control environment and run a PowerShell script. There is a community-driven solution for creating build scripts called psake (pronounced like the drink sake). Your build system just needs to be able to read the output from psake to determine if the build completed successfully.
The following are items that you should include in your build script:
- Linting – Checking your code for formatting issues or code that does not meet your organization’s standards
- Testing – Both unit tests of your scripts and integration tests to deploy a test environment and verify the output
- Deploy/Release – Call a deployment script if the tests pass
An example build script is available here:
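Separately from the example referenced above, a minimal sketch of a psake build script covering the three items might look like the following. It assumes the psake and Pester (4 or later) modules are installed. The task names, the scripts and tests folders, and deploy.ps1 are hypothetical, and the lint step here only checks for syntax errors using the PowerShell parser.

```powershell
# build.ps1: a minimal psake build script sketch.
# Assumptions: psake and Pester (4 or later) are installed; the 'scripts' folder,
# 'tests' folder, and deploy.ps1 are hypothetical names.

Properties {
    $scriptPath = Join-Path $PSScriptRoot 'scripts'
    $testPath   = Join-Path $PSScriptRoot 'tests'
    $deployFile = Join-Path $PSScriptRoot 'deploy.ps1'
}

# The default task has no action of its own; it only pulls in the chain below.
Task Default -Depends Deploy

Task Lint {
    # Parse every script and fail the build if any file has syntax errors.
    foreach ($file in Get-ChildItem -Path $scriptPath -Filter *.ps1 -Recurse) {
        $tokens = $null
        $errors = $null
        $null = [System.Management.Automation.Language.Parser]::ParseFile($file.FullName, [ref]$tokens, [ref]$errors)
        if ($errors.Count -gt 0) {
            throw "Syntax errors found in $($file.FullName)"
        }
    }
}

Task Test -Depends Lint {
    # -PassThru returns a result object so the build can inspect failures.
    $results = Invoke-Pester -Path $testPath -PassThru
    if ($results.FailedCount -gt 0) {
        throw "$($results.FailedCount) test(s) failed."
    }
}

Task Deploy -Depends Test {
    # Only reached when linting and testing both succeed.
    & $deployFile
}
```

A build server could run `Invoke-psake .\build.ps1` and then check `$psake.build_success` to decide whether to mark the build as passed or failed.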
Testing is the most crucial phase in The Release Pipeline. Without proper testing, you are essentially creating a more efficient way to create failures within your environment. There are four types of tests you should perform:
- Linting – Syntax checking or regulation checking to make sure you follow your organization’s standards. For example, you could check whether there are passwords inside your code or that comments are set properly for your functions.
- Unit Testing – This tests that all the code in your script performs the way you think it will. A good unit test will include both positive testing (success conditions) and negative testing (error conditions).
- Integration Testing – Builds a test environment and tests for functionality that can be automated. This would include items like authentication, accessing data, pulling data from other servers, etc.
- Acceptance Testing – This includes tests that need to be run manually by an owner of an application or service. If a manual item can be automated, it should be moved to Integration Testing.
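As one illustration of the linting check for passwords mentioned above, the sketch below flags lines that appear to assign a quoted literal to something named password. The function name and the pattern are hypothetical, and a simple regular expression like this will miss many cases, so treat it as a starting point rather than a complete check.

```powershell
function Find-HardcodedPassword {
    [CmdletBinding()]
    param(
        # The lines of a script, for example the output of Get-Content.
        [Parameter(Mandatory)]
        [AllowEmptyString()]
        [string[]] $Line
    )

    # Matches text such as  $password = 'secret'  or  -Password "secret".
    $pattern = 'password\s*=?\s*[''"][^''"]+[''"]'

    for ($i = 0; $i -lt $Line.Count; $i++) {
        if ($Line[$i] -match $pattern) {
            # Report the 1-based line number and the offending text.
            [pscustomobject]@{
                LineNumber = $i + 1
                Text       = $Line[$i].Trim()
            }
        }
    }
}
```

A build step could call `Find-HardcodedPassword -Line (Get-Content .\Deploy-Site.ps1)` (the file name is hypothetical) and fail the build if anything is returned.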
Pester is another community-provided PowerShell tool that can be used for linting, unit testing, and integration testing. Pester is included in both Windows 10 and Windows Server 2016. Pester provides a framework for running your tests and has become the standard for testing with PowerShell. Pester can even be used to verify your environment is functioning after deployment. Here is a great example of using Pester to test the health of Active Directory.
Testing is a process that will always be improving. You will not catch all potential errors when you start your release pipeline. The important thing is that once you run into a scenario that you weren’t testing, you can modify your testing code to catch that condition the next time before you deploy.
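Separately from the linked Active Directory example, here is a minimal sketch of a Pester unit test with one positive and one negative case. The function under test, Get-FreePercent, is hypothetical and is defined inside the test file only to keep the sketch self-contained. The sketch assumes Pester 4 or later; the 3.x version that ships with Windows uses the older `Should Be` syntax without the dash.

```powershell
# Get-FreePercent.Tests.ps1
# Assumes Pester 4 or later. Get-FreePercent is a hypothetical function under test.

Describe 'Get-FreePercent' {
    BeforeAll {
        function Get-FreePercent {
            param(
                [double] $FreeBytes,
                [double] $TotalBytes
            )
            if ($TotalBytes -le 0) {
                throw 'TotalBytes must be greater than zero.'
            }
            [math]::Round(($FreeBytes / $TotalBytes) * 100, 1)
        }
    }

    It 'returns the percentage of free space (positive test)' {
        Get-FreePercent -FreeBytes 25 -TotalBytes 100 | Should -Be 25
    }

    It 'throws when the total size is zero (negative test)' {
        { Get-FreePercent -FreeBytes 1 -TotalBytes 0 } | Should -Throw
    }
}
```

Running `Invoke-Pester -Path .\Get-FreePercent.Tests.ps1` executes both tests and reports which passed and which failed.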
Once all the tests have completed, you are ready to release your changes into the production environment. If you have a very high degree of confidence in your testing, you could automate this completely through your build process, so that if all the tests pass you immediately deploy into production. This process is called Continuous Deployment. Since you are making Infrastructure changes, chances are you will need to get approval before deploying into production. Your build process can stage everything for a production deployment and create a change ticket in a system like ServiceNow, including the results of all your tests in the ticket. That greatly simplifies your change board meetings because all of your changes are clearly visible in your source code and the results of your tests are easy to find. Once the change is approved, it can be scheduled for automatic deployment.
An automated release process also allows you to limit the number of administrators managing your environment. The pipeline process becomes the agent of change and you can limit admin access to just the process that runs the pipeline.
PSDeploy is another great community driven tool that allows you to automate the release phase. PSDeploy can move your source to a production location that has limited permissions and then your automation can pick up the script for performing the release.
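A minimal sketch of a PSDeploy deployment file that copies a folder of release scripts to a restricted share might look like this. The deployment name, source folder, target share, and tag are all hypothetical, and the sketch assumes the PSDeploy module is installed.

```powershell
# scripts.psdeploy.ps1
# Hypothetical deployment name, source folder, target share, and tag.
# Assumes the PSDeploy module is installed.

Deploy ReleaseScripts {
    By FileSystem {
        # Folder in source control holding the scripts to release.
        FromSource 'release'
        # Production location with limited permissions.
        To '\\fileserver01\Releases\WebServer'
        # Lets the pipeline select this deployment by tag.
        Tagged Prod
    }
}
```

The pipeline would then run something like `Invoke-PSDeploy -Path .\scripts.psdeploy.ps1 -Tags Prod -Force`, and the account that runs the pipeline is the only one that needs write access to the share.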
Revisiting the Horror Story
How could the horror story we started with have changed using a release pipeline?
- Proper integration testing could have identified the issue with the patch prior to deployment in production
- The Change Board would have had a more accurate view of the impact of the change through testing
- You would not have to be awake at midnight to make the change. The pipeline would have deployed the change at the scheduled time and sent out the results
- If a failure had occurred during deployment, the pipeline could have rolled back the change automatically and sent the results to the project team
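The automatic rollback in the last point can be sketched as a small wrapper that applies a change, verifies it, and rolls back if either step fails. The function name and its three script block parameters are hypothetical. A real pipeline would also send the result to the project team, and note that only terminating errors reach the catch block.

```powershell
function Invoke-ChangeWithRollback {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)] [scriptblock] $Apply,     # makes the change
        [Parameter(Mandatory)] [scriptblock] $Verify,    # returns $true if healthy
        [Parameter(Mandatory)] [scriptblock] $Rollback   # undoes the change
    )

    try {
        $null = & $Apply
        if (-not (& $Verify)) {
            throw 'Post-deployment verification failed.'
        }
        [pscustomobject]@{
            Succeeded  = $true
            RolledBack = $false
            Message    = 'Change applied and verified.'
        }
    }
    catch {
        # Undo the change and report why it was rolled back.
        $null = & $Rollback
        [pscustomobject]@{
            Succeeded  = $false
            RolledBack = $true
            Message    = $_.Exception.Message
        }
    }
}
```

The returned object is what the pipeline would attach to the change ticket or send to the project team.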
In summary, you would have been able to get some sleep, not worry about the change that was occurring, and be awake for the 8 AM status call to review the results of the change.
Moving from a manual release process to an automated release process may seem like a daunting task. Here are some steps that will help you get started:
- Learn PowerShell – If you haven’t already done so, learn PowerShell. It is crucial to the future of your career
- Start using source control – Pick a source control system and start using it for all of your scripts
- Learn the community tools like psake, Pester, and PSDeploy
When starting this process, start with a simple workload and completely automate the process for that workload. Develop a minimum viable product for your release pipeline and then build on top of it. You don’t need every possible test case initially. You can add more tests as they are needed.
Lastly, have fun automating your environment! With some effort, you are well on your way to getting back all of your nights and your weekends.