Introduction
Hello all and welcome yet again to a blog post about Terraform! This post is a continuation of the last one, which covered both unit and integration testing using the new Terraform "test" command. If you haven't read that post yet, I highly encourage you to take a peek at it first => https://www.codeterraform.com/post/how-to-use-the-terraform-test-command-for-use-in-unit-and-integrations-testing
Where the last post defined a lot of principles and simple use cases, this post aims to be more technical and in-depth with the Terraform testing framework. More specifically, we will go through how to define highly dynamic integration tests, where we aim to deeply test our Terraform modules in different configuration scenarios to ensure the code runs as expected. One thing to note about integration testing compared to unit testing is that these more "in-depth" tests are designed to run on our module every time something "major" happens to the codebase, like merging changes to module code in a pull request. We do not want to run these tests via CI for every single commit, as they often take multiple minutes to complete, and the value gathered does not justify the time spent in such scenarios.
With all this pretext out of the way, let's begin diving in. The list below summarizes what each section is about:
A quick recap of some core principles of the Terraform "test" command, which I talk about extensively in my first post about Terraform test (linked at the top).
Diving into the specific folder strategy using sub-folders when defining a Terraform test setup, and the reason this structure is chosen.
Getting practical, as we go through all the test code required to run an integration test on a Terraform module.
Providing a visual representation of the Terraform test setup used in this post.
Continuing with being practical by actually running a real Terraform test and following the outcome.
Deep diving into the mechanics of errors within Terraform test which includes a matrix containing concrete error scenarios and how to solve them.
Sharing direct experience using the system while also giving general advice.
Wrap-up and final thoughts.
Please be aware that all code shared below is created in the context of the Terraform module "azurerm-vm-bundle", simply to make the entire walk-through as practical as possible. ALL concepts described in this post can be directly reused in your own Terraform module code.
Furthermore - ALL code described can be cloned directly from GitHub at => https://github.com/ChristofferWin/codeterraform.git The specific folder of interest for this post is located at the path "terraform projects\blog posts\post 1 - integration tests"
Part 1 - Recap of concepts and folder-structure strategy
Let's start with a very short but important recap of the basics of the new Terraform "test" command and how such tests must be structured. We can boil it down to:
When running terraform test, Terraform will execute within the scope of the current folder and will NOT look in any sub-folders unless specific paths are specified in the test file.
This behavior is exactly the same as "terraform plan & apply"
A new file type has been introduced as part of the test framework, called .tftest.hcl. This file contains the code defining the actual tests to run. Multiple of these files can be defined, and terraform test will pick them all up and execute them in a single run.
This behavior is exactly the same as "terraform plan & apply" as these commands also allow multiple ".tf" files to be run at the same time.
Variables behave exactly the same as for any other Terraform command, as we can both define a "variables.tf", any number of .tfvars files, plus all the other possible ways of passing information to the tests being run. Adding to this, we can even scope variables to have specific values in specific tests, which only increases the level of flexibility we have available when defining our tests.
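To illustrate that last point about scoping, variables can be set both at file level and per run block in a test file. Here is a minimal sketch with placeholder names (not taken from the module used later in this post):

```hcl
//Sketch of variable scoping in a .tftest.hcl file - names are placeholders.
variables {
  location = "westeurope" //applies to every run block in this file
}

run "plan_with_override" {
  command = plan

  variables {
    location = "northeurope" //overrides the file-level value for this run only
  }
}
```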
When getting started, I highly advise against creating multiple different test files in the root directory you run the tests from, as this only increases complexity without bringing much value. The main reason for this, and why it differs from how I would recommend anyone to structure a normal Terraform folder for resource creation, is that tests are so specifically scoped that simply grouping them into one file gives a much better overview of all tests to run, their expected results, and so on.
Adding to the above, we will in some cases use a testing strategy where resources are required ahead of time for our tests. In these scenarios, we define a folder structure with sub-folders that are called from the root test folder. The reason is that during integration testing, we can define one test to run Terraform code from a sub-folder, creating resources which are then used directly by other tests. This speaks directly to the heart of integration testing and makes the new testing framework super awesome and a life-saver for continuing to secure stability in module code changes.
Let's get more practical, shall we?
Part 2 - Folder strategy using sub-folders
Let's start looking at the more practical side of the concepts described in section 1. Below is the "tree" output from the folder "post 1 - integration tests". All the different files will be explained in detail below:
./post 1 - integration tests
├── integration_test.tf
├── integration_test.tftest.hcl
├── pre-deployment
│ ├── outputs.tf
│ ├── pre_deployment.tf
│ └── variables.tf
└── variables.tf
Starting from the top with the file "integration_test.tf": it contains the actual Terraform code that will be executed when we run the test command. The resources defined within this file can be run in different contexts, which we will define in the test file.
The "integration_test.tftest.hcl" file will contain the specific instructions on how to run the code defined within the ".tf" files.
A variables file in the root folder is the absolute standard way of populating our module code. It can also be utilized by the upcoming tests to directly access the defined variables and overwrite values, adding to the flexibility of the testing framework.
With the above structure, we also need a variables file within the "pre-deployment" sub-folder to populate all parameters required by "pre_deployment.tf".
An output file is also an absolute standard for defining which specific outputs this "new" Terraform codebase can produce. We only want an output file because it makes it easier to pass values between test runs. It's not a requirement, as we can directly reference modules and their return output within the test file, but this approach makes our test cases more human-readable.
With all of that information out of the way, let's get serious and start looking at some awesome code!
Part 3 - Our first integration test (Code)
Since we use the folder structure defined in section 2, we need to define two different Terraform resource definition files. The plan is for the script inside the "pre-deployment" folder to contain a module definition calling the "azurerm-vm-bundle" module with all the different "create" switches enabled.
Inside "pre_deployment.tf" (will be executed first within the test file)
As mentioned, for the "azurerm-vm-bundle" module we need to check the different modes of operation, and we start by defining all possible "create" switches and a few VMs. The VM values are not important for the example, as the objects will only contain "name" and "os_name".
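A minimal sketch of what "pre_deployment.tf" could look like - note that the exact input names of the "azurerm-vm-bundle" module (the "create" switches and VM object lists) are illustrative assumptions, not copied from the module's source:

```hcl
//Sketch of pre-deployment/pre_deployment.tf - input names are illustrative
//assumptions, not taken verbatim from the azurerm-vm-bundle source.
module "pre_deployment" {
  source = "../path/to/azurerm-vm-bundle" //placeholder path

  location = var.location

  //All the different "create" switches enabled at once (hypothetical names)
  create_nsg       = true
  create_bastion   = true
  create_public_ip = true
  create_kv        = true

  //The VM objects only need "name" and "os_name" for this example
  vm_windows_objects = var.vm_windows_objects
  vm_linux_objects   = var.vm_linux_objects
}
```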
The variables file located together with "pre_deployment.tf" contains all the required input variables used in the module definition above. Some static values are also provided for convenience, as these will never have to change for this module definition.
Inside "variables.tf" (with all default values set directly, we do not have to worry about them in the test file later)
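A sketch of what the accompanying "variables.tf" could contain, with defaults set directly - names and defaults are illustrative assumptions:

```hcl
//Sketch of pre-deployment/variables.tf - names and defaults are assumptions.
variable "location" {
  type    = string
  default = "westeurope" //static convenience value, never changes for this test
}

variable "vm_windows_objects" {
  type = list(object({
    name    = string
    os_name = string
  }))
  default = []
}

variable "vm_linux_objects" {
  type = list(object({
    name    = string
    os_name = string
  }))
  default = []
}
```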
We also need to define the "outputs.tf" file present in the same directory. The main reason is that we want to retrieve all the resource IDs from the "pre_deployment" deployment as part of test 2, so that tests 1 and 2 are directly integrated.
Inside "outputs.tf" (do not worry about the function wrappers; they simply retrieve plain string IDs instead of objects)
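A sketch of "outputs.tf" in that spirit - the module's return objects and output names are assumptions, and the for-expressions play the role of the "function wrappers" that flatten objects down to string IDs:

```hcl
//Sketch of pre-deployment/outputs.tf - output and attribute names are
//assumptions. The for-expressions act as "function wrappers" that return
//plain string IDs instead of whole objects.
output "rg_id" {
  value = module.pre_deployment.rg_object.id
}

output "vnet_id" {
  value = one([for vnet in module.pre_deployment.vnet_objects : vnet.id])
}

output "subnet_ids" {
  value = [for subnet in module.pre_deployment.subnet_objects : subnet.id]
}
```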
With the sub-folder's three files out of the way, let's define the module definition which will end up using the IDs provided by module 1's definition.
Inside "integration_tests.tf" (all ID parameters will have their corresponding input variable values passed via the test file)
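A sketch of "integration_tests.tf" - again, input names are illustrative assumptions. Notice how the IDs deliberately arrive as plain input variables rather than as direct module references:

```hcl
//Sketch of integration_tests.tf - input names are assumptions. The ID
//variables are plain inputs on purpose; the test file wires them to the
//outputs of the pre-deployment run.
module "integration_test" {
  source = "../path/to/azurerm-vm-bundle" //placeholder path

  location   = var.location
  rg_id      = var.rg_id
  vnet_id    = var.vnet_id
  subnet_ids = var.subnet_ids

  vm_windows_objects = var.vm_windows_objects
  vm_linux_objects   = var.vm_linux_objects
}
```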
It's very important to realize from the above code that it seems unnatural to use "static" input variables to pass all the resource IDs, as we would normally either create a "local" variable capturing the return output from module call 1 or simply directly reference the module return on each of the parameters to pass to module call 2. However, specifically for this use case, this is the appropriate method, as we will see when we get into the test file.
A variables file is also defined for the Terraform file present in the root directory. We need these input variables defined; otherwise it will be impossible for us to pass the IDs from test 1 inside the test file.
Inside "variables.tf" (only "location" is not "null" in the root file; all other variables will either be passed directly or statically defined inside the test file)
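A sketch of the root "variables.tf" along those lines - names are assumptions; only "location" carries a real default, while the rest stay null until the test file fills them in:

```hcl
//Sketch of the root variables.tf - only "location" has a real default.
variable "location" {
  type    = string
  default = "westeurope"
}

variable "rg_id" {
  type    = string
  default = null
}

variable "vnet_id" {
  type    = string
  default = null
}

variable "subnet_ids" {
  type    = list(string)
  default = null
}

variable "vm_windows_objects" {
  type    = list(object({ name = string, os_name = string }))
  default = []
}

variable "vm_linux_objects" {
  type    = list(object({ name = string, os_name = string }))
  default = []
}
```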
So, to quickly summarize what we have delved into so far: A Terraform file has been created that defines two different calls to the same module. The idea is to have module 1 run first and let module 2 absorb all resource IDs from the created resources in module 1. This way, we end up testing the module in two completely different modes but also directly integrating the two different tests by letting one depend on the other.
Inside "integration_tests.tftest.hcl" (Defining our tests)
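A sketch of what the test file could look like - the run names match the test output shown later in this post, while the variable and output names are illustrative assumptions:

```hcl
//Sketch of integration_tests.tftest.hcl - run names are from the post's
//test output; variable and output names are illustrative assumptions.
run "integration_test_1_create_apply" {
  command = apply

  //Point this run at the sub-folder instead of the root module
  module {
    source = "./pre-deployment"
  }

  //Static values defined directly in the run block
  variables {
    vm_windows_objects = [{ name = "win-vm-01", os_name = "windows11" }]
    vm_linux_objects   = [{ name = "lin-vm-01", os_name = "ubuntu" }]
  }
}

run "integration_test_2_ids_apply" {
  command = apply

  variables {
    //Referencing run 1's outputs creates the dependency that keeps
    //test 1's infrastructure alive until test 2 has completed.
    rg_id      = run.integration_test_1_create_apply.rg_id
    vnet_id    = run.integration_test_1_create_apply.vnet_id
    subnet_ids = run.integration_test_1_create_apply.subnet_ids

    vm_windows_objects = [{ name = "win-vm-02", os_name = "windows11" }]
    vm_linux_objects   = [{ name = "lin-vm-02", os_name = "ubuntu" }]
  }
}
```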
As seen in the above code, two tests are defined. By directly using the return output from test 1 in test 2, Terraform understands the underlying dependencies and builds its overall dependency graph accordingly. This behavior mirrors creating relationships between resources using any other direct return output from any resource definition. It also implies that when the plan executes and applies test 1, it is not destroyed before test 2 is completed. Additionally, note that we statically define values for both VM object types directly in the 'run' block. This is simply to showcase that it's just another configuration option for passing values into tests.
Before showcasing the execution of our newly defined tests, what would the above code look like if we created a graphical representation?
Part 4 - Visual representation of our integration test
The above drawing describes not only the types of resources deployed as part of both tests but also the flow of the Terraform test engine. Further details regarding the actual deployment and destruction flow are provided in the next section, where we will execute the 'terraform test' command and follow the tests as they execute.
Part 5 - Executing the Terraform "test" Command on the described codebase
From the root folder of our test codebase, let's initialize and run our tests:
//Make sure to be in the correct folder
//terraform projects\blog posts\post 1 - integration tests
terraform init
//Init output
terraform test //No extra arguments
//First test starts and will take around 5 minutes to finish
//integration_tests.tftest.hcl... in progress
While the first test runs, we can check how it's going in Azure:
Because we didn't pass any extra arguments to the 'test' command, the next terminal output will not appear until everything from test 1 is 100% provisioned or test 1 fails. (Bastion always takes at least 8 minutes to deploy.)
After several minutes the first test completes:
run "integration_test_1_create_apply"... pass
Only because a direct dependency has been established between test 1 and test 2 is the infrastructure not destroyed immediately after test 1 completes. As depicted in the graphic above, the first test remains 'alive' so that test 2, which is now executing, can directly access all required resource IDs.
In this specific testing scenario, Test 2 does not take long and now reports:
run "integration_test_2_ids_apply"... pass
integration_tests.tftest.hcl... tearing down
Notice how, immediately after the last test has "passed", Terraform begins to destroy the infrastructure automatically. Because we have linked the tests, test 2's VM additions are destroyed first, just as shown in the graphic.
The message reported after the destruction of all resources is complete:
Notice that even for the 'tearing down' of all resources, Terraform reports whether this was successful or not. If the destruction fails, a list will be produced specifying all the deployed objects that must be manually destroyed by us. We will cover more of this error behavior in the next section.
Finally, let's just double-check that everything is gone in Azure (so we do not get billed).
Well, these tests went well, and we proved that at least two of the modes of operation for the module 'azurerm-vm-bundle' work. But what about situations where a test fails? What happens then?
Part 6 - Dealing with failed tests
Continuing from the last section, where we went through a real test in practice, we need to delve deeper into how the Terraform "test" command behaves in the event of different errors. I phrase it this way because errors within this testing system come in 'layers,' somewhat like an onion. We can divide the different layers like so:
Layer 1, the outermost layer - All defined Terraform tests will, at a minimum, run a 'terraform plan' under the hood of the test command. As part of this planning flow, Terraform will catch any syntax and condition errors. Errors are simply returned together with a status of 'failed' for the specific test.
Layer 2 - If any defined test is set to run a 'terraform apply,' all errors associated with deploying any resource to any provider will be caught here. Errors are reported the same way as in layer 1.
Layer 3 - If two or more tests depend on output from one another and the output is not as expected, the dependent test will fail. This only concerns dependencies between tests set to 'apply.' Errors are reported the same way as in layer 1.
Layer 4 - If any 'assert' block(s) are defined in a test definition. These are typically used in unit tests and validate return values against some custom logic, forcing the test to fail if the condition is not met. A custom 'error_message' is associated with each failed assertion; these custom messages can be super helpful in troubleshooting scenarios.
Layer 5 - In case one or more tests are defined with 'apply' and the destruction of resources fails. The specific test(s) which failed to destroy resources will be marked as 'failed,' and the overall test run will also fail. Furthermore, Terraform will output all the objects it was not able to destroy, which must then be handled manually.
Layer 6, the innermost layer of the 'onion' - All tests have the status 'passed,' but the final clean-up failed. The overall run will still be reported as 'passed,' and if it ran in a pipeline, the pipeline would have a status of OK. Terraform will still output the objects it was not able to destroy.
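To make layer 4 concrete, a custom assertion inside a run block could look like this minimal sketch (the output name "subnet_ids" is a placeholder, not guaranteed to exist in the module being tested):

```hcl
//Minimal sketch of a layer 4 assertion - "subnet_ids" is a placeholder
//output name. The condition references the configuration's own outputs.
run "validate_pre_deployment" {
  command = apply

  assert {
    condition     = length(output.subnet_ids) > 0
    error_message = "Pre-deployment returned no subnet IDs - check the create switches."
  }
}
```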
So we have all these so-called 'layers' of errors, but how can we deal with them effectively? In the matrix below, error scenarios are listed with descriptions of how to handle them:
Scenario | Type of error | How to | Notes |
---|---|---|---|
Specific test throws an error from within the source code of the module being tested | Layer 1 or 2; contains a specific error pointing at the area in the module that failed | Open the module source code and head to the line(s) described in the error | Most likely a syntax error, or variables have been defined in the test with unexpected values |
Same scenario | Layer 3; will most likely also contain an error from within the module being tested | Take a close look at the error provided. Make sure it is not simply caused by a null or wrong-data-type return output from another test | It's very important to note that errors at this layer are most likely NOT caused by "bad" code in the tested module. Changing the code is the last resort. |
Specific test throws an error from an assert block | Layer 4; spawns from one or more failed conditions. The error message is user-defined | Validate the condition logic within the assert blocks that failed; it might simply be written incorrectly | When testing custom conditions for the first time, use a new and simple main.tf script with an output block to validate the behavior of the logic |
One or more tests fail to destroy resources | Layer 5; a list is produced defining which specific resource objects could not be destroyed | Can have multiple causes, such as misconfiguration of resources causing them to "lock". Check ALL values used for the specific tests and verify that they are correct | Again, at this layer it will most likely NOT be the module source code at fault. Only change it as a last resort. |
The final clean-up fails after test completion. This error can occur regardless of the status of the overall test run | Layer 6; same error type as layer 5, the only difference being that at this stage the overall terraform test status won't change | Most likely caused by Terraform not being able to build a valid dependency graph for the entire test | Terraform test can fail to create a valid dependency graph when tests are created in specific configurations. This is described in more detail later in this post |
Errors within layers 1-4 typically stem from issues in the source code or the defined tests. In such cases, it falls upon us, the ones familiar with the module and its tests, to pinpoint the problematic code. However, as we move to layers 5-6, we step into territory where issues may be beyond our control.
These more 'complex' errors may arise due to the Terraform 'test' command still being in its developmental stages. While not a significant concern, it's essential to recognize the potential pitfalls we may encounter. These pitfalls will be defined in the next section.
Part 7 - What to be aware of when testing
This section is important as it will go through the more behavioral side of the Terraform 'test' command. We can use this to better solve the more complex errors described at the bottom of the last section.
Within the so-called "test file," as it has been mentioned multiple times so far, we have what's called "run" blocks. We won't go through all the basics, as these are already described in my last post about test basics => https://www.codeterraform.com/post/how-to-use-the-terraform-test-command-for-use-in-unit-and-integrations-testing
These "run" blocks can have many different arguments, and two very important ones for behavior are:
"plan_options"
"module"
The first block type is not used as part of our example, where we define the code for the "azurerm-vm-bundle" module tests. The reason is that we would have created a very different folder structure for the entire test setup had we used it. We will explore that in more detail a little later.
The second block type is used within our example, as it fits our need to separate the 'pre-deployment' test from the integration test that consumes values from the 'pre-deployment'.
Now you might wonder: instead of using sub-folders, could we simply merge the two Terraform files defining resources for both tests and then use 'plan_options' within the test file to specify that test 1 should only run on module 1, and define it again for test 2? Why can't we just do that? Well, we can, BUT it will fail... Remember the "layer 6" errors we talked about in the last section? Yes, we simply hit exactly that. The fix was to separate the deployments and remove the "plan_options" block from both tests.
This is NOT to say you can't make a 'flat' folder structure when defining tests; instead, it's simply to clarify that there will be situations where tests should only run on one out of multiple resource definition calls inside of a '.tf' file. In cases where the module code being tested is very complex in nature, we may need to separate the '.tf' files associated with each test.
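For reference, the flat-structure attempt described above would have looked roughly like the sketch below - shown only to illustrate the 'plan_options' syntax, not as a recommendation, since this is the pattern that ran into the layer 6 error in my setup (the module address is a placeholder):

```hcl
//Sketch only - this flat-structure pattern is what triggered the layer 6
//error in my setup. plan_options restricts the run to one module call.
run "integration_test_1_create_apply" {
  command = apply

  plan_options {
    //Only plan/apply the first module call in the shared .tf file
    target = [module.pre_deployment]
  }
}
```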
Please see the below list of more areas to be aware of when using Terraform test:
Before creating complex conditions inside 'assertion' blocks, ensure they work as intended. Otherwise, you might mistakenly attribute issues to the source code when it's actually the logic within the test condition.
Have a really good understanding of the module being tested; you will then know potential error causes right away, which makes for more efficient testing.
If you're new to Terraform testing, start by defining simple unit tests. This approach will facilitate your learning process in understanding how the Terraform 'test' command operates.
Make use of direct dependencies wherever possible instead of static values. This scales better as a codebase develops and accumulates many tests over its life-cycle. In contrast, static values tend to create rigid code that requires a lot of manual maintenance.
Create a pipeline to automatically run the Terraform "test" command on newly changed code; this is especially valuable for continuous integration.
As stated in earlier sections, due to current limitations within the Terraform "test" command, I highly advise using a sub-folder strategy for more complex integration tests where potential "pre-deployments" / "post-deployments" are to run within a test setup. For simpler tests, and especially unit testing, use a flat structure to keep the test configuration as simple as possible.
Part 8 - Wrapping-up
So, to summarize: in this post we have gone through the entire cycle of developing and running integration tests on a very advanced Terraform module. We have explored both Terraform's behavior when it comes to errors and how the general testing flow works. Additionally, a very concrete matrix has been provided to help us pinpoint which layer a given error occurs in, assisting us in understanding where the error spawns from. Don't be fooled: Terraform errors come in great variety, and sometimes an error does not seem to make sense at first. It can even be caused by something else failing first, which THEN causes something else to fail - something like a 'domino effect'.
Going forward, I plan to talk more about both testing strategies and also more test scenarios. Terraform 'test' is still fairly new, so it's a very exciting time to explore these new features that come with it.
Well, that was all for today. Thank you so much for reading along; I really, really appreciate it. Also, HAPPY Easter! I hope you and your family & friends will have a fantastic holiday.
See you soon :)