by Joseph Brady, Director of Business Development at Treehouse Software, Inc.; Dan Vimont, Director of Innovation at Treehouse Software, Inc.; and Ram Dhakne, Staff Solutions Engineer at Confluent

Customers who are planning to modernize their enterprise mainframe systems in Cloud, Multi-Cloud, and Hybrid Cloud environments often face decades of mission-critical and historical legacy mainframe data in disparate databases, as well as a variety of other data stores inherited through mergers, acquisitions, and other company growth scenarios.
Customers are stating their needs clearly…
“We want to modernize our mainframe data without disrupting the existing critical work on our legacy systems. We also want to bring together, view, and manage data from applications, databases, data warehouses, etc. that have been spread over many vastly different systems.”
Enterprises also want to tap into today’s advanced data analytics platforms, such as Amazon Redshift, Snowflake, and Amazon Athena/S3, where an ever-expanding array of machine learning and artificial intelligence (ML/AI) tools are available to generate vital insights from their enterprise’s data. The customers’ data science teams are eagerly awaiting the arrival of critical data from their mainframes to supercharge their predictive analytics and generative AI frameworks.
The Solution = Mainframe CDC Data Replication + Unlimited Scaling and Storage

For those customers looking to move mainframe data to the Cloud, Rocket Data Replicate and Sync (RDRS) is the mainframe data replication tool that performs real-time synchronization of data sources, allowing for rapid data movement to newer data sinks/target platforms on AWS, Azure, Google Cloud, and other services. RDRS supports data replication from many mainframe data sources, including Db2 z/OS, Db2 z/VSE, VSAM, IMS/DB, IDMS, DATACOM, Adabas, and flat files.
RDRS allows customers’ legacy mainframe environment to operate normally while replicating data to a variety of Cloud and Hybrid Cloud environments. The technology focuses on change data capture (CDC) when transferring information between mainframe data sources and Cloud-based databases and applications. Through an innovative set of technologies, changes occurring in any mainframe datastore are tracked, captured, and published to various Cloud targets.

Additionally, Treehouse Software offers the Treehouse Dataflow Toolkit (TDT), a set of Lambda-based microservices that greatly enhances the architecture’s connectivity to high-performance, non-relational, massively parallel processing data stores (Amazon Redshift, Snowflake, Amazon Athena/S3) that are primed to supply the most advanced ML/AI tools to data science teams.
To your data scientists, enterprise data history is GOLD…
TDT not only keeps target tables up to date faster than any conceivable ODBC-based solution, but the “delta tables” into which it loads data also inherently retain the entire history of source data since mainframe-to-target synchronization began. So, for example, after TDT has been syncing a target table for 5 years, a data scientist has 5 years’ worth of historical data to work with for trend analysis, predictive analytics, prescriptive analytics, ML, etc.
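To make the idea of history-retaining delta tables concrete, here is a minimal Python sketch. It assumes a hypothetical layout in which each change record carries a key, an operation code, a commit timestamp, and a row image; these field names are illustrative only and are not TDT's actual schema. Replaying the records up to a chosen timestamp reconstructs the table as it looked at that moment, which is the property that makes accumulated change history useful for trend analysis.

```python
from datetime import datetime

# Hypothetical append-only change records (NOT TDT's real schema).
# op: "I" = insert, "U" = update, "D" = delete.
DELTA_ROWS = [
    {"key": 1, "op": "I", "ts": datetime(2020, 1, 5), "row": {"balance": 100}},
    {"key": 2, "op": "I", "ts": datetime(2020, 3, 1), "row": {"balance": 50}},
    {"key": 1, "op": "U", "ts": datetime(2021, 6, 9), "row": {"balance": 175}},
    {"key": 2, "op": "D", "ts": datetime(2022, 2, 2), "row": None},
]


def snapshot_as_of(delta_rows, as_of):
    """Replay changes in timestamp order and return {key: row} as of `as_of`."""
    state = {}
    for change in sorted(delta_rows, key=lambda c: c["ts"]):
        if change["ts"] > as_of:
            break  # everything after this point is in the "future"
        if change["op"] == "D":
            state.pop(change["key"], None)
        else:
            state[change["key"]] = change["row"]
    return state


def history_of(delta_rows, key):
    """Return every recorded change for one key, oldest first."""
    matching = (c for c in delta_rows if c["key"] == key)
    return sorted(matching, key=lambda c: c["ts"])
```

Calling `snapshot_as_of(DELTA_ROWS, datetime(2020, 12, 31))` yields both customers with their original balances, while a later cut-off reflects the update to key 1 and the deletion of key 2. Because no change record is ever overwritten, both views remain available.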

Confluent Cloud offers enhanced productivity, improved scalability, and minimized downtime, all while reducing total cost of ownership. Its features include:
- Elastic scaling: Scale up and down quickly to meet fluctuating customer demand, without the ops burden that comes with scaling your data infrastructure
- Infinite Storage: Enable powerful use cases by never having to worry about Kafka retention limits again, while only paying for the storage used
- Built-in Resiliency: Ensure high availability and offload Kafka ops with 99.99% uptime SLA, multi-AZ clusters, and no-touch Kafka patches
How does it all work?
Figure 1: An enterprise can now keep its options open by propagating data to the highly reliable, very scalable Confluent Cloud that can be “subscribed to” by any number of current or yet-to-be-invented ETL toolsets and target data stores.

- We start at the source – the mainframe – where an agent (with a very small footprint) extracts data (in the context of either bulk-load or CDC processing).
- The raw data is securely passed from the mainframe to RDRS, which quickly transforms mainframe-formatted data into Unicode/JSON and publishes the results to a Kafka topic in Confluent Cloud.
- TDT functions consume the data from Confluent and land it in S3 buckets, where Treehouse’s proprietary crawler technology automatically prepares landing tables, views, and additional infrastructure for various analytics-friendly targets. The mainframe data is then loaded into Redshift, Snowflake, or S3, following AWS’s and Snowflake’s recommended best practices for massive data loading to keep loads fast and reliable. The inherent reliability and scalability of the entire pipeline infrastructure ensure near-real-time synchronization between mainframe sources and the target tables.
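The transformation step described above (mainframe-formatted data into Unicode/JSON) can be illustrated with a small Python sketch. This is not how RDRS is implemented; it only shows the general idea under stated assumptions: the record is fixed-width, contains character data only, and is encoded in EBCDIC code page 037 (`cp037`, a codec that ships with Python). The `LAYOUT` table and field names are hypothetical. Real mainframe records often also contain packed-decimal or binary fields, which need separate handling and are not covered here.

```python
import json

# Hypothetical fixed-width layout: (field name, start offset, length).
LAYOUT = [("cust_id", 0, 6), ("name", 6, 10), ("state", 16, 2)]


def ebcdic_record_to_json(raw, layout=LAYOUT, codec="cp037"):
    """Decode a character-only EBCDIC record and emit it as a JSON string."""
    # cp037 is a single-byte code page, so byte offsets equal character offsets.
    text = raw.decode(codec)
    doc = {
        name: text[start:start + length].strip()
        for name, start, length in layout
    }
    return json.dumps(doc, ensure_ascii=False)


# Build a sample EBCDIC record so the example is self-contained.
sample = "000042JANE DOE  NY".encode("cp037")
message = ebcdic_record_to_json(sample)
```

The resulting `message` string is the kind of payload that could then be published to a Kafka topic; the publishing call itself is omitted because it depends on the client library and cluster configuration in use.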
With this Treehouse/Confluent framework, staging tables constantly accrue the most current data, which makes them ideally suited to data scientists doing trend analysis, predictive analytics, ML, and AI work. For business analysts and others who prefer structured representations of potentially complex hierarchical data, the framework also automatically provides structured user-views with the look and feel of a SQL database.
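As a rough illustration of what a structured view over hierarchical data involves, the following Python sketch splits one nested record into a parent row and a set of child rows, much as a repeating group might be exposed as a separate SQL table. The function name, the `seq` column, and the sample record are all hypothetical; the views that the framework actually generates may be organized differently.

```python
def flatten_for_views(record, key_field, group_field):
    """Split one hierarchical record into a parent row and child rows."""
    # Parent row: every field except the repeating group.
    parent_row = {k: v for k, v in record.items() if k != group_field}
    # Child rows: one per group entry, carrying the parent key and a sequence number.
    child_rows = [
        {key_field: record[key_field], "seq": seq, **item}
        for seq, item in enumerate(record.get(group_field, []), start=1)
    ]
    return parent_row, child_rows


customer = {
    "cust_id": "000042",
    "name": "JANE DOE",
    "orders": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}],
}
parent, children = flatten_for_views(customer, "cust_id", "orders")
```

Here `parent` holds the customer-level columns and `children` holds one row per order, each joinable back to the customer through `cust_id`.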
Contact Treehouse Software today to schedule a product demonstration.