Datasources in Terraform play a crucial role when you need to reference existing infrastructure that is not directly managed by your Terraform configuration. Whether you’re working with resources provisioned manually, via other tools, or in different Terraform directories, datasources allow you to bring that external information into your configuration seamlessly.
This guide will walk you through how datasources work in Terraform, why they matter, and how to implement them effectively across projects of all sizes.
What Are Datasources in Terraform?
Datasources in Terraform allow your configuration to read external resource information without managing the lifecycle (create, update, delete) of those resources.
They are especially useful when:
- You want to refer to infrastructure provisioned manually.
- You’re consuming outputs from another Terraform configuration.
- You need dynamic values from cloud services, like an AMI ID from AWS.
Unlike managed resources, which Terraform fully controls, datasources are read-only references. This allows you to reuse data without altering the original resource.
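To illustrate the AMI case from the list above, a data block can look up an image ID at plan time instead of hardcoding it. This is a minimal sketch, assuming the AWS provider is already configured with a region and credentials. The owner ID (Canonical's published account), the name pattern, the local names ubuntu and web, and the instance type are illustrative choices and may need adjusting for your account.

```hcl
# Read-only lookup: Terraform queries AWS for a matching image but never
# creates, changes, or deletes the AMI itself.
data "aws_ami" "ubuntu" {
  most_recent = true
  owners      = ["099720109477"] # assumption: Canonical's AWS account ID

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"] # assumed name pattern
  }
}

# Managed resource that consumes the value returned by the data source.
resource "aws_instance" "web" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t3.micro" # illustrative size
}
```

Because most_recent is set, the resolved ID can change between runs when a newer image is published, which is usually the point of the lookup but worth knowing before applying to long-lived instances.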
Why Use Datasources in Terraform?
There are several practical reasons to leverage datasources:
- Cross-environment integration: You may want to link different layers of your infrastructure (e.g., a backend database created in one configuration and a frontend that consumes it in another).
- Access existing resources: Resources created via Ansible, CloudFormation, or manual provisioning can be read and utilized.
- Reduce duplication: Instead of hardcoding values or duplicating configuration, use datasources to dynamically fetch the latest information.
In short, datasources make Terraform more modular, flexible, and reliable.
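One common way to consume values from another Terraform configuration is the built-in terraform_remote_state data source. The sketch below assumes the other configuration keeps its state in a local file at ../network/terraform.tfstate and declares a root-level string output named subnet_id; both the path and the output name are hypothetical.

```hcl
# Reads the root-level outputs of another configuration's state.
# Nothing in that configuration is modified.
data "terraform_remote_state" "network" {
  backend = "local"

  config = {
    path = "../network/terraform.tfstate" # hypothetical path to the other state file
  }
}

# Use one of the other configuration's outputs in this configuration.
resource "local_file" "subnet_note" {
  content  = data.terraform_remote_state.network.outputs.subnet_id # hypothetical output
  filename = "${path.module}/subnet_id.txt"
}
```

Only outputs declared in the root module of the other configuration are visible this way; values nested inside its child modules have to be re-exported as root outputs first.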
Understanding the Datasource Block Syntax
A datasource is defined using the data block in Terraform. Its syntax closely resembles that of the resource block, making it easy for anyone familiar with Terraform to adopt.
Example Structure
data "local_file" "dog" {
  filename = "/root/dogs.txt"
}
Here’s how it breaks down:
- data: Indicates this is a data source block.
- "local_file": Specifies the type of data source. In this case, it reads content from a local file.
- "dog": A logical name used to reference the data elsewhere.
You can then reference the data using:
data.local_file.dog.content
This can be plugged into other resource definitions. For example, the content of one file can be used as the input of another.
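The same reference also works outside resource blocks. As a small sketch, an output can surface the file's content after a plan or apply; the output name dog_content is made up for illustration, and the data block is the one shown earlier.

```hcl
data "local_file" "dog" {
  filename = "/root/dogs.txt"
}

# Expose the value read by the data source so it appears in
# `terraform output` and can be consumed by other tooling.
output "dog_content" {
  value = data.local_file.dog.content
}
```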
Real-World Use Case of Datasources
Let’s say you’ve created a file called dogs.txt using a shell script. This file contains the line:
“Dogs are awesome.”
Now, you want another resource managed by Terraform, such as petstore.txt, to use this file’s content. Since Terraform didn’t create dogs.txt, it cannot manage it directly. But with a datasource, you can still read it.
Implementing the Data Reference
data "local_file" "dog" {
  filename = "/root/dogs.txt"
}

resource "local_file" "petstore" {
  content  = data.local_file.dog.content
  filename = "/root/petstore.txt"
}
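Terraform reads the content of dogs.txt when it evaluates the data source (normally during planning) and writes it into petstore.txt when the plan is applied. If dogs.txt does not exist at that point, the read fails and nothing is written.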
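To run this example as a standalone configuration, it helps to declare the provider that supplies the local_file data source and resource. The block below is a sketch under that assumption; the version constraint is an illustrative choice rather than a requirement of the example.

```hcl
terraform {
  required_providers {
    # The local provider implements both the local_file data source
    # and the local_file resource used above.
    local = {
      source  = "hashicorp/local"
      version = "~> 2.0" # assumption: any 2.x release
    }
  }
}
```

With this in place, terraform init downloads the provider, terraform plan shows the content read from dogs.txt, and terraform apply writes petstore.txt.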
Datasources vs Resources in Terraform
Understanding the difference between these two is vital:
| Feature | Resources | Datasources |
|---|---|---|
| Purpose | Create, update, delete resources | Read existing resource data |
| Block keyword | resource | data |
| Management | Fully managed by Terraform | Read-only reference |
| Examples | AWS EC2 (aws_instance), S3 (aws_s3_bucket), local_file | AWS AMI (aws_ami), local_file (read), outputs from other configs (terraform_remote_state) |
Think of resources as builders, and datasources as readers.
Supported Datasources Across Providers
Every major provider on the Terraform Registry offers various datasources. For example:
- AWS: aws_ami, aws_vpc, aws_security_group
- Azure: azurerm_resource_group, azurerm_storage_account
- GCP: google_compute_image, google_dns_managed_zone
- Kubernetes: kubernetes_namespace, kubernetes_secret
Each datasource has its required and optional arguments, as well as a set of attributes it returns. Always refer to the provider’s documentation to get the exact schema.
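As one example of arguments going in and attributes coming out, the aws_vpc data source can select an existing VPC by tag and return its ID and CIDR range. This is a sketch that assumes a configured AWS provider and exactly one VPC tagged Name = "main"; the tag value and the local names selected and app are hypothetical.

```hcl
# Arguments narrow the search; the lookup fails if no VPC (or more than one) matches.
data "aws_vpc" "selected" {
  tags = {
    Name = "main" # hypothetical tag on a VPC created outside this configuration
  }
}

# Returned attributes (id, cidr_block) feed a managed resource.
resource "aws_subnet" "app" {
  vpc_id     = data.aws_vpc.selected.id
  cidr_block = cidrsubnet(data.aws_vpc.selected.cidr_block, 4, 1)
}
```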
When and When Not to Use Datasources
Ideal Times to Use Datasources
- To fetch values from another tool’s infrastructure (e.g., Ansible-created instances).
- To use outputs from different Terraform configurations.
- To avoid hardcoded values and ensure more dynamic setups.
When to Avoid
- If the resource should be managed by Terraform (then use a resource block).
- If the external resource is unreliable or frequently changes format, avoid referencing unstable sources directly.
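When an external source cannot be avoided but may not always be in the expected shape, one option is to validate what the data source returns before anything depends on it. The sketch below is a variant of the earlier dog block with a postcondition added. It assumes Terraform 1.2 or later, and the emptiness check is only an illustrative rule.

```hcl
data "local_file" "dog" {
  filename = "/root/dogs.txt"

  lifecycle {
    # Stop the run with a clear message if the file was read but is empty.
    postcondition {
      condition     = length(trimspace(self.content)) > 0
      error_message = "dogs.txt exists but is empty; refusing to continue."
    }
  }
}
```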
Conclusion
Datasources in Terraform offer a powerful way to bridge the gap between managed and unmanaged infrastructure. By integrating existing resources into your configuration logic, you gain the flexibility to work across tools, teams, and environments without losing consistency.
They’re especially useful when scaling Terraform projects and splitting them into modules, because values are looked up when Terraform runs instead of being hardcoded and left to go stale.
Whether you’re fetching the latest AMI ID or referencing a file created outside Terraform, datasources enable smarter infrastructure provisioning.
Frequently Asked Questions (FAQs)
1. What is the main purpose of datasources in Terraform?
Datasources allow Terraform to read external resource attributes without managing their lifecycle, enabling modular and flexible configurations.
2. How do datasources differ from resources?
Resources manage the entire lifecycle (create, update, delete), while datasources are read-only and cannot modify external infrastructure.
3. Can I use datasources for cloud providers?
Yes, most Terraform providers support datasources. AWS, Azure, and GCP all offer extensive data blocks for existing infrastructure.
4. Can Terraform data blocks refer to outputs from another module?
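Not directly. Outputs of a module called within the same configuration are referenced as module.<NAME>.<OUTPUT>, with no data block involved. To read outputs from a separate Terraform configuration, use the terraform_remote_state data source, which exposes that configuration's root-level outputs.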
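For a child module called from the same configuration, the output is read with a module reference rather than a data block. A minimal sketch, assuming a hypothetical local module at ./modules/network that declares an output named subnet_id:

```hcl
module "network" {
  source = "./modules/network" # hypothetical module path
}

# Module outputs are addressed as module.<NAME>.<OUTPUT>.
output "subnet_id" {
  value = module.network.subnet_id # hypothetical output declared by the module
}
```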
5. Is it mandatory to use datasources in every project?
No. Use them when needed, especially when interacting with infrastructure outside your current Terraform control or module scope.