Ansible's set_fact module lets infrastructure-as-code engineers set variables while a playbook runs, using values computed from facts, task results, or external data sources. This makes it straightforward to tailor automation to each managed node.
In this guide, we will dig into set_fact, from basic usage to advanced applications, best practices, troubleshooting, and more.
What Makes Set_Fact Powerful?
Traditional variables in Ansible depend on predefined static values in vars, vars_files, or inventory sources. This suffices for simple usage, but poses challenges:
- Repeating static values leads to duplication
- Changes require modifying code or config
- Per-host differences complicate workflows
Set_fact overcomes these limits by assigning variables at runtime from dynamic data. Engineers can query external sources such as infrastructure APIs, monitoring systems, or a CMDB in one task, then use set_fact to turn the results into variables for the rest of the run.
For example, rather than manually maintaining inventory info, use AWS APIs to pull current data on EC2 instances. Or reference libraries like NetBox to align with latest network configs and IPAM records.
By relying on canonical sources, variables stay up-to-date automatically. This improves consistency while enabling unprecedented flexibility to customize automation.
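As a starting point, here is a minimal sketch of basic set_fact usage that derives a per-host value from gathered facts. The variable name worker_count and the "two workers per vCPU" rule are assumptions made for illustration only.

```yaml
- hosts: all
  gather_facts: true
  tasks:
    - name: Derive a per-host setting from gathered facts
      set_fact:
        # worker_count is a hypothetical variable; falls back to 1 vCPU if the fact is missing
        worker_count: "{{ ansible_facts['processor_vcpus'] | default(1) | int * 2 }}"

    - name: Show the computed value for each host
      debug:
        var: worker_count
```

Each host gets its own value, which is the per-host customization that static vars make awkward.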
Ansible Set_Fact Parameters and Features
The set_fact module includes several key configuration parameters:
cacheable – By default, a variable created with set_fact lives in memory for the current host until the playbook run ends, including later plays in the same run. Setting cacheable: yes also stores the value as an Ansible fact in the fact cache, so that future runs can read it. This requires that fact caching is enabled in ansible.cfg.
Fact caching can speed up runs by avoiding repeated fact gathering, although the gain depends on the environment. The trade-off is that cached values can be stale until the cache entry expires.
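A minimal sketch of a cacheable fact follows. It assumes fact caching is already configured in ansible.cfg (for example with fact_caching = jsonfile and a writable fact_caching_connection directory); the variable name app_release and its value are hypothetical.

```yaml
- hosts: all
  gather_facts: false
  tasks:
    - name: Store a value that later runs can read from the fact cache
      set_fact:
        app_release: "2024.1"   # hypothetical value, for illustration only
        cacheable: true
```

Without a configured cache plugin, the variable still works for the current run but is not persisted.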
Complex data types – set_fact values are not limited to strings or booleans. You can build lists, dictionaries, and nested structures, either in plain YAML or with a Jinja2 expression:
- set_fact:
    my_servers: "{{ [{'name': 'web01', 'ip': '1.1.1.1'}, {'name': 'web02', 'ip': '2.2.2.2'}] }}"
This provides flexibility for custom variable payloads.
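The same structure can also be written as native YAML, which is often easier to read than an inline Jinja2 literal. This is a sketch that reuses the my_servers name from the example above and loops over the result.

```yaml
- hosts: localhost
  gather_facts: false
  tasks:
    - name: Define a list of dictionaries in plain YAML
      set_fact:
        my_servers:
          - name: web01
            ip: "1.1.1.1"
          - name: web02
            ip: "2.2.2.2"

    - name: Print one line per server
      debug:
        msg: "{{ item.name }} -> {{ item.ip }}"
      loop: "{{ my_servers }}"
```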
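Variable precedence – According to Ansible's documentation, set_fact variables sit near the top of the precedence order, so they override most other sources rather than being overridden by them.
For reference, a simplified version of the order, from highest to lowest, is:
- Extra vars (-e on the command line)
- Include params and role params
- Set_facts and registered vars
- include_vars
- Task vars and block vars
- Role vars
- Play vars_files, vars_prompt, and vars
- Host facts and cached set_facts
- Inventory and playbook host_vars and group_vars
- Role defaults (lowest)
So a set_fact variable overrides a same-named variable defined in group_vars/web.yml; only role or include parameters and extra vars override the set_fact. A value restored from the fact cache in a later run has the lower precedence of host facts. This becomes important when troubleshooting unexpected values.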
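Precedence is easy to check empirically. The following sketch, with a hypothetical file name and variable, defines the same variable as a play var and as a set_fact, then prints whichever value wins.

```yaml
# precedence_demo.yml (hypothetical file name)
- hosts: localhost
  gather_facts: false
  vars:
    color: from_play_vars
  tasks:
    - name: Set the same variable at runtime
      set_fact:
        color: from_set_fact

    - name: Show which value wins
      debug:
        var: color
```

Running it as-is should report from_set_fact, since set_fact outranks play vars. Running it with -e color=from_extra_vars should report the extra var instead, because extra vars always take precedence.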
Integrating Infrastructure APIs and External Data
One of set_fact's most powerful applications involves integrating live infrastructure APIs and databases into playbook runtime. For instance:
- name: Lookup instance data
  uri:
    url: "https://api.mycloud.com/v1/instances/{{ inventory_hostname }}"
    return_content: yes
  register: instance_data
- set_fact:
    my_zone: "{{ instance_data.json.zone }}"
    my_type: "{{ instance_data.json.type }}"
This allows dynamically setting vars based on the latest details for each instance. By relying on a canonical source, the variables will accurately reflect current properties.
For even more flexibility, use blocks and error handling to build robust API integrations:
- block:
    - name: Call instance API
      uri:
        ...
      register: instance_data
    - set_fact:
        my_zone: "{{ instance_data.json.zone }}"
  rescue:
    - set_fact:
        my_zone: "unknown"
This graceful degradation ensures playbooks don't fail due to an API outage. Mixing conditionals, registered results, and error handling unlocks new integration possibilities.
Dynamic Inventory Integration
Set_fact also shines for syncing Ansible dynamic inventory with configuration management databases (CMDBs) like ServiceNow or cloud provider inventory.
For example, populate EC2 inventory like:
plugin: aws_ec2
regions:
  - us-east-1
keyed_groups:
  # Populates groups such as tag_Type_web and tag_Env_prod
  - prefix: tag
    key: tags
Then align live groups with CMDB data:
- name: Enrich web server CMDB data
  uri:
    url: "http://cmdb.company.com/api/servers/web/{{ inventory_hostname }}"
  register: cmdb_output
- set_fact:
    cmdb_id: "{{ cmdb_output.json.cmdb_id }}"
    owner: "{{ cmdb_output.json.owner }}"
    # "class" is a Python keyword and not a valid Ansible variable name
    cmdb_class: "{{ cmdb_output.json.class }}"
Now additional CMDB variables relate to the inventory without manual upkeep. Integrations like these customize automation and reporting based on accurate, external data.
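One way to act on such variables is to build groups from them with the group_by module. This is a sketch only: it assumes the owner variable has already been set for each host, and the owner value "platform" in the second play is hypothetical.

```yaml
- hosts: all
  gather_facts: false
  tasks:
    - name: Group hosts by CMDB owner
      group_by:
        # falls back to owner_unknown when no owner was set for the host
        key: "owner_{{ owner | default('unknown') }}"

- hosts: owner_platform   # hypothetical group, created above when owner == "platform"
  gather_facts: false
  tasks:
    - name: Report hosts owned by the platform team
      debug:
        msg: "{{ inventory_hostname }} is owned by the platform team"
```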
Idempotence and Security Considerations
As a best practice when using set_fact:
Check for existing vars – Avoid overwriting predefined inventory variables, as this could cause issues. First check if vars are already defined:
- name: Set region
  set_fact:
    region: "{{ my_region | default(ec2_region) }}"
Idempotence – For repeatability, set a variable only when it has not already been populated:
- set_fact:
    apps: "{{ my_apps }}"
  when: apps is not defined
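Lookup secrets at runtime – Rather than hardcoding passwords or API keys, use lookup plugins to fetch them from a secret store. Assigning the result with set_fact keeps the secret in memory for the duration of the playbook run. Avoid cacheable: yes for secrets, since that would write them to the fact cache.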
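A minimal sketch of this pattern follows, using the built-in env lookup as a stand-in for a real secret store. The environment variable name MYCLOUD_API_TOKEN and the fact name api_token are hypothetical.

```yaml
- name: Read an API token from the controller's environment
  set_fact:
    # the env lookup runs on the control node; it returns an empty string if the variable is unset
    api_token: "{{ lookup('env', 'MYCLOUD_API_TOKEN') }}"
  no_log: true   # keep the value out of task output and logs
```

A lookup plugin for a dedicated secret manager could replace the env lookup without changing the rest of the task.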
Adopting patterns like these helps manage risks when relying extensively on set_fact. Consult Ansible's security best practices for additional guidance.
Troubleshooting Set_Fact Issues
When investigating set_fact problems, start by validating precedence and data types:
- Use the debug module to output the variable – is it the expected type and value?
- Examine other variable sources like group_vars – could they override set_fact?
- Review set_fact syntax – are quotes, brackets, etc. correct for the data structure?
Increase verbosity with the -v or -vv flags on the command line, or with the verbosity setting in ansible.cfg. This reveals more details on variable evaluation and origins.
Finally, trace the dependency chain backwards – if a play fails using a set_fact var, determine what populates that variable in the first place. Issues in registered tasks or integrations can manifest as set_fact bugs.
Thoroughly probing these areas helps isolate the root cause when dealing with stubborn issues.
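The checks above can be sketched as two tasks placed right after the set_fact in question. The sketch reuses the my_zone and instance_data names from the earlier API example and assumes they are the variables under investigation.

```yaml
- name: Show the value and its type
  debug:
    msg: "my_zone={{ my_zone | default('UNDEFINED') }} type={{ my_zone | default('') | type_debug }}"

- name: Fail early with a clear message if the API result is missing a key
  assert:
    that:
      - instance_data.json is defined
      - instance_data.json.zone is defined
    fail_msg: "instance_data has no json.zone; check the uri task output"
```

The assert task points at the upstream task rather than at set_fact, which helps when the real fault lies in the registered result.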
Alternative Techniques
Beyond set_fact, Ansible offers additional approaches for dynamic variables:
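register – Well suited to temporary vars based on task output. Registered values live only in memory for the current playbook run and cannot be stored in the fact cache.
include_vars – Load vars from YAML or JSON files at task time, with the file name optionally chosen by a variable or condition. However, the file is re-read every time the task runs instead of being cached.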
Inventory plugins – As shown earlier for AWS EC2, expose inventory sources as ansible variables automatically. Avoid constantly redefining common cloud metadata.
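For comparison, here is a short sketch of the first two alternatives next to set_fact. The vars/ path and the env_name and kernel_version names are hypothetical.

```yaml
- name: Load environment-specific variables from a file
  include_vars:
    file: "vars/{{ env_name | default('dev') }}.yml"   # hypothetical path and variable

- name: Capture command output for later tasks in this run
  command: uname -r
  register: kernel_release
  changed_when: false   # a read-only command should not report a change

- name: Promote the registered output to a simple variable
  set_fact:
    kernel_version: "{{ kernel_release.stdout }}"
```

The registered result carries the full task output (return code, stdout, stderr), while the set_fact keeps only the piece later tasks need.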
Each approach suits different use cases – consult the dynamic inventory guide to learn more.
In summary, while related options exist, set_fact uniquely offers flexibility plus fact caching for optimized performance.
Takeaways for Infrastructure Automation Engineers
Ansible set_fact unlocks new possibilities for infrastructure-as-code by bridging playbooks to external data sources. With greater insight into current system state and configurations, automation adapts more easily to disparate environments.
To recap key insights for engineers designing enterprise Ansible architectures:
- Integrate CMDBs and cloud APIs – Centralize and reuse canonical data to improve consistency, accuracy, and development velocity.
- Build modular, dynamic roles – Parameterize logic around set_fact variables for reusability across environments.
- Debug using a data-driven methodology – Trace each variable back to its source to fix the root issue.
- Secure secrets access at runtime – Storing credentials in playbooks risks exposure; a lookup plugin combined with set_fact can fetch them on demand instead.
Mastering these set_fact techniques gives far more flexibility than traditional imperative scripts. Infrastructure-as-code engineers gain practical mechanisms for simplifying and scaling automation across the entire estate.


