Comprehensive Connectivity and Rapid Data Flow for Enterprise Customers with Treehouse Software and Confluent

by Joseph Brady, Director of Business Development at Treehouse Software, Inc., Dan Vimont, Director of Innovation at Treehouse Software, Inc., and Ram Dhakne, Staff Solutions Engineer at Confluent

Enterprise customers who are planning to modernize their data in Cloud environments are stating their needs clearly: “We want a way to unify and manage data from our applications, databases, data warehouses, etc., which have long operated in silos.”

These customers also have a crucial need to tap into today’s advanced data analytics platforms, such as Snowflake, Amazon Redshift, and Amazon Athena/S3, where an ever-expanding array of machine learning and artificial intelligence (ML/AI) tools are available to generate vital insights from their enterprise’s data.  Data science teams are eagerly awaiting the arrival of critical data from their enterprise’s data sources to supercharge their predictive analytics and generative AI frameworks.

Data Transfer + Unlimited Scaling and Storage

To address the need for rapid, high-volume data transfer from source DBs to Analytics/ML/AI-friendly platforms, Treehouse Software has recently gone to market with two powerful new offerings: Treehouse Dataflow Toolkit (TDT) for Mainframe Data Sources and TDT-DIRECT for Non-Mainframe Data Sources. These Cloud-native, fully automated, turn-key solutions work hand-in-hand with the premier data streaming platform, Confluent, to empower enterprise customers to rapidly migrate data – both bulk-load and change data capture (CDC) – to Snowflake, Amazon Redshift, Amazon Athena/S3, and Amazon S3 Express One Zone.

The TDT offerings are much more than mere “connectors”, providing an innovative and robust Lambda-based microservices infrastructure that automatically generates all target resources required for data transfer. Without TDT-DIRECT’s fully automated approach, a customer can spend months designing and creating target resources, such as delta tables, views, schemas, etc.

TDT-DIRECT extracts data directly from a source DB and loads it via Confluent into Snowflake’s “delta tables”, which inherently retain the entire history of source data ever since the source-to-target synchronization began (perfect for time-based trend/predictive/prescriptive analytics).
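To make the idea of a history-retaining delta table concrete, here is a minimal Python sketch of how a "current view" could be derived from a table that keeps every change. The record layout (`key`, `op`, `ts`, `data`) and the function name are assumptions made for illustration; they are not TDT's actual schema or implementation.

```python
from typing import Any, Optional

# Hypothetical change records as they might accumulate in a history table.
# Field names (key, op, ts, data) are illustrative assumptions.
HISTORY: list[dict[str, Any]] = [
    {"key": 1, "op": "insert", "ts": 1, "data": {"name": "Ann", "city": "Pittsburgh"}},
    {"key": 2, "op": "insert", "ts": 2, "data": {"name": "Bob", "city": "Denver"}},
    {"key": 1, "op": "update", "ts": 3, "data": {"name": "Ann", "city": "Austin"}},
    {"key": 2, "op": "delete", "ts": 4, "data": None},
]


def current_view(history: list[dict[str, Any]], as_of: Optional[int] = None) -> dict:
    """Return the latest surviving row per key, optionally as of a timestamp."""
    latest: dict[Any, dict[str, Any]] = {}
    for rec in sorted(history, key=lambda r: r["ts"]):
        if as_of is not None and rec["ts"] > as_of:
            break  # ignore changes made after the requested point in time
        latest[rec["key"]] = rec
    # Keys whose most recent change is a delete drop out of the view.
    return {k: r["data"] for k, r in latest.items() if r["op"] != "delete"}


print(current_view(HISTORY))           # state after all changes
print(current_view(HISTORY, as_of=2))  # state as it stood at ts=2
```

Because no change is ever overwritten, the same table answers both "what does the data look like now?" and "what did it look like at an earlier point?", which is what makes it suitable for time-based analytics.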

Figure 1: TDT-DIRECT automatically creates all Snowflake target structures (schemas, history tables, current views, user views, stages, and file formats), and Confluent delivers the data (e.g., insert, update, delete transactions) via bulk-load and CDC.

Leveraging AWS CloudFormation for ease of implementation…

For ease of implementation, TDT is delivered via CloudFormation templates, allowing customer sites to be up and running with a fully preconfigured implementation of a new data transfer pipeline in minutes. The TDT CloudFormation Templates create stacks consisting of all principal framework components, along with related IAM policies and roles which are carefully engineered to comply with “best practices” (such as a “least privileges” approach to permissions).

The TDT CloudFormation Templates also optionally provide for automatic creation of a VPC, its subnets, and all required standard VPC-oriented resources, as well as optional creation of a source database cluster (consisting of either a sample database provided by Treehouse for a quick trial/POC, or your own database and data).

The Confluent Advantage…

Treehouse Software’s TDT solutions fully support data transfers from mainframe and non-mainframe data sources to Confluent Cloud, which offers enhanced productivity, improved scalability, minimized downtime, and much more—all while reducing total cost of ownership. Confluent Cloud brings customers a Fully Managed Kafka Service and Complete Pre-Built Ecosystem that includes:

  • Elastic Scaling: Scale up and down quickly to meet fluctuating customer demand, without the ops burden that comes with scaling your data infrastructure.
  • Infinite Storage: Enable powerful use cases by never having to worry about Kafka retention limits again, while only paying for the storage used
  • Built-in Resiliency: Ensure high availability and offload Kafka ops with 99.99% uptime SLA, multi-AZ clusters, and no-touch Kafka patches
  • Serverless stream processing for Apache Flink®: Flink is the de facto industry standard for stream processing. Confluent Cloud for Apache Flink provides a cloud-native, serverless service for Flink that enables simple, scalable, and secure stream processing that integrates seamlessly with Apache Kafka®. Your Kafka topics appear automatically as queryable Flink tables, with schemas and metadata attached by Confluent Cloud.

A Powerful, Combined Solution…

Treehouse Software and Confluent provide a comprehensive framework that allows the target platform to constantly accrue the most current source data, which is ideally suited for data scientists looking to do trend analysis, predictive analytics, ML, and AI work. 

Treehouse Dataflow Toolkit (TDT) and TDT-DIRECT are Copyright ©Treehouse Software, Inc. All rights reserved.

Contact Treehouse Software for a TDT Demo Today!

Treehouse Software offers SIs and consulting companies free “deep dive” learning sessions to educate your team on the value of bringing these turn-key data transfer solutions to your customers.

Contact us today to schedule your session! 

Treehouse Dataflow Toolkit (TDT) Brings Added Value to Systems Integrators and Enterprise Consulting Companies

With decades of experience, Treehouse Software has helped systems integrators (SIs) and enterprise consulting companies streamline the migration of mainframe data to modern Cloud and Open Systems platforms—leveraging automation and innovation to accelerate time to value.

Treehouse Software is excited to introduce two powerful new offerings: Treehouse Dataflow Toolkit (TDT) for Mainframe Data Sources and TDT-DIRECT for Non-Mainframe Data Sources. These Cloud-native, fully automated, turn-key solutions empower enterprise customers to rapidly migrate data – both bulk-load and change data capture (CDC) – to advanced cloud and analytics targets such as Amazon Redshift, Snowflake, Amazon Athena/S3, Amazon S3 Express One Zone, and Amazon Aurora PostgreSQL.

Diagram: TDT for Mainframe Data Sources

Diagram: TDT-DIRECT for Non-Mainframe Data Sources

With TDT and TDT-DIRECT, migrations take weeks – not months or years – supported by Treehouse Software’s 40+ years of leadership in data replication.

For SIs and consulting firms, TDT solutions act as critical accelerators – moving enterprise modernization initiatives swiftly into the value capture phase with Cloud and analytics platforms.

Substantial value of solutions that are more than merely “connectors”

  • TDT and TDT-DIRECT are ready to go: Customers can start pumping data into data analytics targets in days, rather than months or years.
  • TDT and TDT-DIRECT are massively scalable through an efficient, event-driven AWS Lambda-based architecture.
  • TDT’s intelligent crawlers automatically generate JSON-based views and infrastructure – saving developers time and simplifying deployment to analytics environments where SQL-based handling is cumbersome.
  • TDT and TDT-DIRECT are delivered as robust CloudFormation Templates, automating the setup of the full TDT stack (including Lambda functions and other AWS components) within your AWS environment.
  • Treehouse Software provides dedicated technical expertise to ensure fast implementation and continuous support.
  • We say “NO!” to using only generic ODBC connections for data transmission, because:
    • To load large volumes of data, TDT and TDT-DIRECT use native bulk load utilities from target vendors – delivering superior scalability compared to ODBC, which relies on a narrow, transaction-based pipeline.
    • It is important to recognize that Snowflake and Redshift are analytical platforms – NOT OLTP systems – making ODBC-based CDC transfers both inefficient and misaligned with vendor best practices, often causing significant performance bottlenecks.
    • For Snowflake’s bulk-load functionality to operate effectively, proprietary objects beyond basic tables and views are required. TDT’s crawler automatically generates the necessary DDL to provision these components – saving time and preventing errors.

Challenges and impact of building a custom solution

A decision by an enterprise not to use TDT, but instead to build its own Kafka-to-Analytics/ML/AI-friendly targets solution, could result in any, or all, of the following:

  • accumulation of technical debt
  • extensive/unpredictable time to production (6 to 12 months of upfront development on average)
  • ongoing resource planning to maintain home-grown technologies (administrative and development)
  • vendor lock-in for maintenance of custom-made technologies designed and developed by consultants
  • managing a mix of manual and automated functions (requiring additional ongoing manpower)
  • difficulty in tracking cobbled-together components created by multiple staff and consultants
  • limited agility for future customization and innovation (as technologies continue to rapidly evolve)
  • problems adhering to rapidly evolving best practices over time
  • high costs for future growth/scaling
  • potential lack of proper security/ongoing security updates
  • your organization, or your customer, has now become an enterprise software development company, with all of the associated costs!

Simply put, TDT and TDT-DIRECT are comprehensive, turn-key solutions that eliminate the need for months or even years of in-house development and associated costs.

Treehouse Dataflow Toolkit (TDT) and TDT-DIRECT are Copyright © 2024 Treehouse Software, Inc. All rights reserved.

Using AWS CloudFormation Templates for rapid solutions configuration and deployment

by Joseph Brady, Director of Business Development at Treehouse Software, Inc., and Dan Vimont, Director of Innovation at Treehouse Software, Inc.

This blog focuses on the value of delivering an AWS application offering via AWS CloudFormation Templates.  For those already familiar with CloudFormation, we invite you to skip to the next section of this blog entry (to the heading, “How TDT Leverages AWS CloudFormation”), where we describe how we use CloudFormation Templates for delivery and rapid deployment of the Treehouse Dataflow Toolkit (TDT), our turn-key solution for loading massive quantities of mainframe and non-mainframe data into Analytics/AI/ML-friendly targets on AWS.

Simply put, a CloudFormation Template is a formatted file (written in either JSON or YAML) that acts as a blueprint for automatically defining and deploying infrastructure in AWS by specifying the different resources needed, such as EC2 instances, databases, and security groups. A CloudFormation Template allows you to create and manage an entire application infrastructure as a single unit (called a “stack”) with a single command. Whenever you create a stack, you designate a template that CloudFormation uses to create the components described in that template.

CloudFormation Templates permit not only the building of complex sets of resources, but also the reuse of those templates in multiple contexts. To heighten flexibility and reusability, input parameters allow options to be specified when you create a CloudFormation stack. For example, you can specify an optional value like the type of an EC2 instance when you create a stack, making the template easier to reuse in different situations. For those familiar only with laboriously architecting and building applications one component at a time via the AWS console, the value of instead using CloudFormation Templates for more streamlined and assured deployment and management of your Cloud infrastructure cannot be overstated.
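As a rough illustration of the template-plus-parameters idea, the Python sketch below builds a very small template as a dictionary and serializes it to JSON. The logical names (`InstanceTypeParam`, `ImageIdParam`, `ExampleInstance`) and the list of allowed instance types are hypothetical and chosen only for this example; this is not one of the TDT templates.

```python
import json


def build_template(default_instance_type: str = "t3.micro") -> dict:
    """Build a minimal CloudFormation template with one parameterized EC2 instance."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Illustrative stack with one parameterized EC2 instance",
        "Parameters": {
            # The caller may override this value when the stack is created.
            "InstanceTypeParam": {
                "Type": "String",
                "Default": default_instance_type,
                "AllowedValues": ["t3.micro", "t3.small", "t3.medium"],
                "Description": "EC2 instance type chosen at stack creation",
            },
            # No default: the caller must supply an AMI ID valid in their region.
            "ImageIdParam": {
                "Type": "AWS::EC2::Image::Id",
                "Description": "AMI ID to launch",
            },
        },
        "Resources": {
            "ExampleInstance": {
                "Type": "AWS::EC2::Instance",
                "Properties": {
                    # Ref resolves to the parameter value supplied for the stack.
                    "InstanceType": {"Ref": "InstanceTypeParam"},
                    "ImageId": {"Ref": "ImageIdParam"},
                },
            }
        },
    }


template_json = json.dumps(build_template(), indent=2)
print(template_json)
```

The same JSON can be used to create several stacks that differ only in the parameter values supplied at creation time, which is the reuse the paragraph above describes.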

Advantages of using CloudFormation for managing infrastructure include…

  • Automating infrastructure provisioning
  • Defining infrastructure as code
  • Enabling reusability across environments
  • Facilitating easy deployment
  • Cost control through resource management
  • Scalability by quickly scaling up or down resources
  • Seamless integration with the AWS ecosystem
  • Reducing manual configuration errors

Video: Introduction to AWS CloudFormation…

How TDT leverages AWS CloudFormation

In the case of our TDT offering, Treehouse provides highly-detailed CloudFormation Templates which automate and accelerate the process of installing and configuring the complete TDT application (including AWS Lambda functions and a number of other AWS resources) in your AWS account(s). The TDT CloudFormation Templates create stacks consisting of all principal framework components, along with related IAM policies and roles which are carefully engineered to comply with “best practices” (such as a “least privileges” approach to permissions).

The TDT CloudFormation Templates also optionally provide for automatic creation of a VPC, its subnets, and all required standard VPC-oriented resources, as well as optional creation of a source database cluster (consisting of either a sample database provided by Treehouse for a quick trial/POC, or your own database and data).

Our customers and partners appreciate the delivery of a self-contained TDT solution via CloudFormation Templates. This eliminates weeks (potentially months) of engineering work and associated deployment time and costs, and allows a site to be up and running with TDT in minutes. Additionally, customers can optionally customize TDT’s CloudFormation Templates in order to adhere to enterprise architectural, security, and naming standards.

Download: TDT AWS Partner Solution Brief…

Treehouse Dataflow Toolkit (TDT) is Copyright © 2024 Treehouse Software, Inc. All rights reserved.


Contact Treehouse Software for a Demo Today!

Contact Treehouse Software today for more information or to schedule a product demonstration.

Enterprise Mainframe Customers Can Tap Into Today’s Most Advanced Data Analytics Platforms with Treehouse Software and Confluent

by Joseph Brady, Director of Business Development at Treehouse Software, Inc.; Dan Vimont, Director of Innovation at Treehouse Software, Inc.; and Ram Dhakne, Staff Solutions Engineer at Confluent

Customers who are planning to modernize their enterprise mainframe systems on Cloud, Multi-Cloud, and Hybrid Cloud environments can be faced with decades of mission-critical and historical legacy mainframe data in disparate databases, as well as a variety of other data stores inherited through mergers, acquisitions, and other company growth scenarios.

Customers are stating their needs clearly…

“We want to modernize our mainframe data without disrupting the existing critical work on our legacy systems. We also want to bring together, view, and manage data from applications, databases, data warehouses, etc. that have been spread over many vastly different systems.”

Enterprises also want to tap into today’s advanced data analytics platforms, such as Amazon Redshift, Snowflake, and Amazon Athena/S3, where an ever-expanding array of machine learning and artificial intelligence (ML/AI) tools are available to generate vital insights from their enterprise’s data.  The customers’ data science teams are eagerly awaiting the arrival of critical data from their mainframes to supercharge their predictive analytics and generative AI frameworks.

The Solution = Mainframe CDC Data Replication + Unlimited Scaling and Storage

For those customers looking to move mainframe data to the Cloud, Rocket Data Replicate and Sync (RDRS) is the mainframe data replication tool that performs real-time synchronization of data sources, allowing for rapid data movement to newer data sinks/target platforms on AWS, Azure, Google Cloud, and other services. RDRS supports data replication from many mainframe data sources, including Db2 z/OS, Db2 z/VSE, VSAM, IMS/DB, IDMS, DATACOM, Adabas, and flat files.

RDRS allows customers’ legacy mainframe environments to operate normally while replicating data to a variety of Cloud and Hybrid Cloud environments. The technology focuses on change data capture (CDC) when transferring information between mainframe data sources and Cloud-based databases and applications. Through an innovative set of technologies, changes occurring in any mainframe datastore are tracked, captured, and published to various Cloud targets.

Additionally, Treehouse Software offers the Treehouse Dataflow Toolkit (TDT), a set of Lambda-based microservices that greatly enhances the architecture’s connectivity to high-performance, non-relational, massively parallel processing data stores (Amazon Redshift, Snowflake, Amazon Athena/S3) that are primed to supply the most advanced ML/AI tools to data science teams.

To your data scientists, enterprise data history is GOLD…

TDT not only keeps things up to date faster than any conceivable ODBC-based solution, but the “delta tables” into which it loads data also inherently retain the entire history of source data ever since mainframe-to-target synchronization began.  So, for example, after TDT has been syncing a target table for 5 years, a data scientist now has 5 years’ worth of historical data to work with for trend analysis, predictive analytics, prescriptive analytics, ML, etc.

Confluent Cloud offers enhanced productivity, improved scalability, minimized downtime, and much more—all while reducing total cost of ownership. Confluent Cloud offers:

  • Elastic scaling: Scale up and down quickly to meet fluctuating customer demand, without the ops burden that comes with scaling your data infrastructure
  • Infinite Storage: Enable powerful use cases by never having to worry about Kafka retention limits again, while only paying for the storage used
  • Built-in Resiliency: Ensure high availability and offload Kafka ops with 99.99% uptime SLA, multi-AZ clusters, and no-touch Kafka patches

How does it all work?

Figure 1: An enterprise can now keep its options open by propagating data to the highly reliable, very scalable Confluent Cloud that can be “subscribed to” by any number of current or yet-to-be-invented ETL toolsets and target data stores.

  1. We start at the source – the mainframe – where an agent (with a very small footprint) extracts data (in the context of either bulk-load or CDC processing).
  2. The raw data is securely passed from the mainframe to RDRS which speedily transforms mainframe-formatted data into Unicode/JSON and publishes the results to a Kafka topic in Confluent Cloud.
  3. TDT functions consume the data from Confluent and land it in S3 buckets, where Treehouse’s proprietary crawler technology is used to automatically prepare landing tables, views, and additional infrastructure for various analytics-friendly targets. Then the mainframe data is loaded into Redshift, Snowflake, or S3 (all the while adhering to AWS’s and Snowflake’s recommended “best practices” for massive data loading, thus assuring the shortest and surest loads).  The inherent reliability and scalability of the entire pipeline infrastructure assure near-real-time synchronization between mainframe sources and the target tables.
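The landing part of step 3 can be pictured with a short Python sketch that groups consumed messages into newline-delimited JSON payloads, one per object to be written. This is only an illustration of the general batching idea behind file-based bulk loading; the function name, the object key pattern, and the `max_records` threshold are assumptions, and no Kafka or S3 calls are made here.

```python
import json
from typing import Iterable, Iterator


def batch_to_ndjson(
    messages: Iterable[dict], topic: str, max_records: int = 1000
) -> Iterator[tuple[str, str]]:
    """Yield (object_key, ndjson_body) pairs, each holding up to max_records messages."""
    batch: list[dict] = []
    seq = 0
    for msg in messages:
        batch.append(msg)
        if len(batch) >= max_records:
            # One JSON document per line, ready to be written as a single object.
            yield f"{topic}/batch-{seq:06d}.json", "\n".join(json.dumps(m) for m in batch)
            batch, seq = [], seq + 1
    if batch:  # flush the final, partially filled batch
        yield f"{topic}/batch-{seq:06d}.json", "\n".join(json.dumps(m) for m in batch)


# Example: five change events split into objects of at most two records each.
events = [{"id": i} for i in range(5)]
for key, body in batch_to_ndjson(events, "customers", max_records=2):
    print(key, body.splitlines())
```

Writing a few large objects rather than issuing many single-row statements is the pattern that file-oriented bulk-load utilities are designed around.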

This Treehouse/Confluent framework allows the staging tables to constantly accrue the most current data, which is ideally suited for data scientists looking to do trend analysis, predictive analytics, ML, and AI work.  For business analysts and others who prefer structured data representations of potentially complex hierarchical data, this framework also automatically provides structured user-views, providing the look and feel of a SQL database.


Contact Treehouse Software today to schedule a product demonstration.

Leveraging the Application of Generative AI/LLM/Machine Learning tools on Mainframe data

by Joseph Brady, Director of Business Development at Treehouse Software; Eleonora Savova, Cloud Solutions Architect at Treehouse Software; and Dan Vimont, Director of Innovation at Treehouse Software

This blog focuses on the value of the Rocket Data Replicate and Sync (RDRS) data replication tool in conjunction with Treehouse Dataflow Toolkit (TDT) as a turn-key solution to leverage the application of Generative AI/LLM/Machine Learning tools on Mainframe data.

There is a growing demand among enterprises to run Large Language Models (LLMs) on their Mainframe data to unlock new insights and historical patterns that were previously unattainable. However, training generative AI and Machine Learning (ML) models at scale requires massively parallel processing, which cannot be practicably achieved on the mainframe. To train LLMs on Mainframe data, one must first transfer that data from the mainframe to a location where generative AI frameworks can access it. Data scientists and ML specialists working in environments like Snowflake or Amazon Redshift are eagerly awaiting mainframe data to process. RDRS and TDT offer the turn-key solution to deliver the Mainframe data to the appropriate target for ML processing in as little as a few days!

Taking it apart, only to put it all back together again!

When massive quantities of mainframe data, especially hierarchical data, are loaded into relational databases, the data needs to be broken down into complex arrays of parent, child, and grandchild tables. The challenge for developers is stitching these arrays back together to produce the equivalent hierarchical structures present on the mainframe. TDT effectively maintains the mainframe’s hierarchies in the target because all the supported end targets are JSON-friendly – Snowflake, Amazon Redshift, Amazon Athena/S3, Amazon S3 Express One Zone Buckets, and Amazon Aurora PostgreSQL. While many developers favor JSON, TDT also offers relational database-like Views for those who prefer to leverage SQL instead. These Views are generated from the parent, child, and grandchild constructs behind the underlying original JSON.
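The relationship between a hierarchical JSON record and its relational-style parent and child rows can be sketched in a few lines of Python. This is a simplified illustration that handles only one level of nesting; the table naming scheme, the `pos` column, and the `id` key field are assumptions for the example and do not describe how TDT generates its Views.

```python
from typing import Any


def flatten(record: dict[str, Any], table: str, key_field: str = "id") -> dict[str, list[dict]]:
    """Split one nested record into a parent row plus child rows per array field."""
    tables: dict[str, list[dict]] = {table: []}
    parent_row: dict[str, Any] = {}
    for field, value in record.items():
        if isinstance(value, list):
            # Each array becomes its own child table, linked back by the parent key.
            rows = tables.setdefault(f"{table}_{field}", [])
            for pos, item in enumerate(value):
                row = item if isinstance(item, dict) else {"value": item}
                rows.append({f"{table}_{key_field}": record[key_field], "pos": pos, **row})
        else:
            parent_row[field] = value
    tables[table].append(parent_row)
    return tables


order = {
    "id": 7,
    "customer": "Acme",
    "lines": [{"sku": "A", "qty": 2}, {"sku": "B", "qty": 1}],
}
for name, rows in flatten(order, "orders").items():
    print(name, rows)
```

The JSON document keeps the hierarchy intact in a single record, while the derived rows give SQL users the familiar parent/child shape; handling grandchild levels would require applying the same split recursively.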

RDRS moves data from the mainframe into Kafka pipelines. From there, TDT moves it to Amazon Athena/S3 (a generic data storage solution), making it available for leading AWS ML tools, such as Amazon Bedrock and Amazon SageMaker, to perform their processing. Alternatively, TDT can move the data into the native data stores of Amazon Redshift or Snowflake, with many Treehouse customers expressing a strong preference for targeting Snowflake.

Now that the mainframe data has been quickly loaded into the desired target(s) using RDRS and TDT, ML tools can begin their intensive, complex mathematical computations to generate ML models. Recall that this is the part that was impossible to do on the mainframe because it does not support distributed processing – hence the need to move the mainframe data out, perform complex mathematical training and learning on the data, and generate ML models. Overall, our customers and partners are increasingly recognizing Treehouse as a skilled player in the AI space. Our RDRS and TDT solutions efficiently transfer mainframe data to the necessary targets for generating Machine Learning models, seamlessly maintaining hierarchies out of the box, and accomplishing all of this in just a few days.

Partners tell us how they seek to use AI…

Treehouse technology and systems integrator partners are increasingly encouraged by their leaders to explore new data analytics opportunities after transferring data from the mainframe. They are also urged to explore how AI can expedite both the data loading process and the subsequent validation of data accuracy. Due to the large volumes involved, migration often requires multiple transfer iterations from mainframe to target, which can be time-consuming.

The leading Analytics/AI/ML technologies are now available, and Treehouse Software has the tools to quickly get you there…

Our partners appreciate our self-contained, turn-key solutions that can eliminate months (or even years) of research and development time and costs, allowing customers to be up and running in minutes. RDRS focuses on change data capture (CDC) when transferring information between mainframe data sources and Cloud databases and applications. Through its innovative technology, RDRS tracks and captures changes occurring in any mainframe application data, then publishes them to a variety of targets. With TDT, customers benefit from high-speed, large scale data movement that strictly adheres to AWS’s recommended use of massively scalable bulk load utilities. This adherence to AWS’s best practices is a key differentiator of TDT from other “connector” offerings on the market. TDT provides the turn-key solution for rapidly transferring data from Kafka into advanced analytics, AI, and ML-friendly targets.

Download: TDT AWS Partner Solution Brief to share with your team…

Treehouse Software and Snowflake can help you build custom AI models, provide near-infinite scale data storage, or just have a chat with your enterprise data

by Joseph Brady, Director of Business Development at Treehouse Software, Inc.; Dan Vimont, Director of Innovation at Treehouse Software, Inc.; and Eleonora Savova, Cloud Solutions Architect at Treehouse Software

Today’s infectious enthusiasm about Machine Learning (ML) and Artificial Intelligence (AI) has become one of the prime motivators for customers wanting to move enterprise data to the Cloud. Snowflake is the platform of choice for many of these enterprises looking for a Cloud platform onto which they can mobilize data at near-unlimited scale and performance, and tap into advanced ML/AI.

Snowflake’s unified platform is designed for secure development and deployment of Machine Learning (ML) and Large Language Models (LLMs). Here are some of the most exciting new Snowflake AI/ML technologies on offer… 

  • Snowflake Cortex AI: Efficiently process data at scale using cutting-edge models in Snowflake Cortex AI. Using serverless SQL and Python functions, it is easy to process data using fine-tuned models or foundation models such as Snowflake Arctic, Meta Llama 3, Mistral Large, Reka Core, and more. 
  • Snowflake Cortex AI Chat Services: Get answers from analytical tables such as sales transactions without writing any SQL with Cortex Analyst (public preview) text-to-answer service. Quickly find answers hidden among a large set of documents using Cortex Search (public preview) fully managed hybrid search and retrieval service.
  • Snowflake ML: Run distributed feature engineering and custom model training using popular Python libraries. Manage features and models at scale with Snowflake Feature Store (public preview) and Model Registry. Democratize predictive model development by using SQL functions that abstract away the complexity of ML algorithms.

Snowflake also provides optimized storage that includes unstructured, semi-structured, and structured data together with near-infinite scale. The platform gives fast and efficient access, optimized compression, and secure data — all automated. Customers can work with data on-premises, or in open table formats, thus removing lock-in, which allows adaptation to any current or future architectural patterns.

Video: Using AI Within Snowflake For Everyday Analytics…

First things first…

Before you can start using Snowflake’s AI, ML, and LLM capabilities, you must first load your data into Snowflake. This is where Treehouse Dataflow Toolkit (TDT) comes in with a state-of-the-art, fully automated offering. Working in tandem with Rocket Data Replicate and Sync (RDRS), TDT assures highly-available, auto-scalable, and event-driven data transfers from Kafka pipes to Snowflake, delivered as a set of proprietary microservices. TDT stands out with its strict adherence to Snowflake’s and AWS’s recommended best practices for massive data loading, making it MUCH more than just a “connector”.

The greatest “value added” our customers see among all TDT’s functions and features is our Snowflake connectivity. TDT’s innovative Lambda-based microservices approach loads data into Snowflake tables, which are architected to retain the entire history of source data ever since the source-to-target synchronization began (perfect for time-based trend/predictive/prescriptive analytics).  

Simply put, TDT is the self-contained, turn-key solution that gets your valuable data into Snowflake today, eliminating months, or even years, of research and development time/costs. TDT’s high-speed, massive data movement to Snowflake only takes minutes to ramp up.


Further Reading… 

AWS TDT Product Brief

TDT: Much more than a mere “data connector” for Snowflake

Just What is the New Treehouse Dataflow Toolkit?

Treehouse Software and Confluent offer High-Speed Mainframe Dataflow for Cloud-based Advanced Analytics

Getting Started with Rocket Data Replicate and Sync [RDRS] and Treehouse Dataflow Toolkit [TDT]

by Joseph Brady, Director of Business Development at Treehouse Software, Inc. and Dan Vimont, Director of Innovation at Treehouse Software, Inc.

Introduction
Treehouse Software offers customers and partners a hands-on opportunity to configure and execute data transfers using Rocket Data Replicate and Sync (RDRS) and the Treehouse Dataflow Toolkit (TDT) in the form of “Getting Started” CloudFormation templates in an AWS Virtual Private Cloud (VPC).

About RDRS
RDRS offers a multi-platform solution for real-time, continuous, and bidirectional data synchronization and replication. Transfer mainframe data to your AWS targets continuously and in real-time for Data Analytics, Business Intelligence, ERP, CRM, or for application modernization, mainframe offload initiatives, or mainframe migration. RDRS considerably simplifies and accelerates data exchange configurations through its intuitive Dashboard interface, which presents source, target, and mapping metadata in a straightforward, user-friendly format – even for users with no mainframe knowledge. Supported sources and targets include a wide range of relational and non-relational databases: DB2, IMS/DB, VSAM, sequential files, Adabas, Datacom DB, IDMS/DB, Oracle, SQL Server, PostgreSQL, and others. Additionally, RDRS can publish transactions in JSON and Avro format to pipelines such as Kafka and Kinesis.

About the Getting Started CloudFormation stack
Your Getting Started session covers the deployment and operation of a fully preinstalled RDRS implementation, generated as a CloudFormation stack via a template provided by Treehouse Software. As shown in the diagram below, the chief components of RDRS come preinstalled on two EC2 instances launched within a new or existing securely-provisioned VPC, with the main engine (the RDRS Agent) running on a Linux instance and the user interface (the RDRS Dashboard) running on a Windows instance.

The intent of the Getting Started stack is to allow new users to quickly skip past the complexities of designing and building the VPC shown above, and to bypass some of the initial installation/configuration steps of RDRS. The Getting Started stack allows the user to work with the product’s data transfer functionality using the sample pre-loaded databases within just minutes.

In addition to the RDRS product components, the stack includes three database instances for the purpose of the user’s initial experimentation with data transfers. They consist of a pre-loaded SOURCE database (the SQL Server AdventureWorks sample database), a TARGET database (PostgreSQL), and a REPOSITORY database for RDRS’s repository tables (PostgreSQL). The sample data in the SOURCE database tables may also be transferred to an MSK cluster, which is also created in the Getting Started stack.

TDT components provided in the Getting Started stack
The Getting Started stack includes components of TDT, which provide event-driven Lambda-based functions to automatically consume messages from MSK/Kafka (messages produced by RDRS, as shown in the diagram above), to land data in JSON format in S3 buckets, and to perform advanced crawling of the data to derive target table and view structures. Separate documentation is provided for configuration and operation of TDT for targeting Athena/S3.
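The consume-and-land step described above can be pictured with a minimal sketch. This is not TDT's proprietary code. It only assumes the standard event shape that AWS Lambda delivers for an MSK trigger (a `records` map whose entries carry base64-encoded Kafka values). The bucket name `example-landing-bucket` and the function names are hypothetical, and the crawling step that derives table and view structures is not shown.

```python
import base64
import json
from collections import defaultdict

LANDING_BUCKET = "example-landing-bucket"  # hypothetical bucket name


def group_records_by_topic(event):
    """Decode the base64 Kafka values of an MSK-triggered event into {topic: [dict, ...]}."""
    grouped = defaultdict(list)
    # Lambda keys each batch by "topic-partition"; the record itself carries the topic.
    for batch in event.get("records", {}).values():
        for record in batch:
            payload = base64.b64decode(record["value"]).decode("utf-8")
            grouped[record["topic"]].append(json.loads(payload))
    return dict(grouped)


def to_json_lines(rows):
    """Serialize rows as newline-delimited JSON, a layout Athena can query on S3."""
    return "\n".join(json.dumps(row, sort_keys=True) for row in rows)


def handler(event, context):
    """Illustrative Lambda entry point: one S3 object per topic per invocation."""
    import boto3  # imported here so the helpers above can be tried without AWS access

    s3 = boto3.client("s3")
    for topic, rows in group_records_by_topic(event).items():
        s3.put_object(
            Bucket=LANDING_BUCKET,
            Key=f"{topic}/{context.aws_request_id}.json",
            Body=to_json_lines(rows).encode("utf-8"),
        )
```

Because the decoding helpers are separate from the S3 write, they can be exercised locally with a hand-built event before anything is deployed.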

Adding mainframe-based data sources
If you wish to add mainframe sources to the mix, you are encouraged to contact Treehouse Software to obtain RDRS’s mainframe components. Note that the usage of SQL Server as a source in this Getting Started stack very strongly resembles the usage of a mainframe source; the RDRS Dashboard is designed to provide a consistent interface, regardless of whether the data source is mainframe or non-mainframe based, and regardless of whether the source datastore is relational or nonrelational in nature.



Schedule your session today…

Contact us to schedule a Getting Started with RDRS and TDT session.

Copyright 2024 by Treehouse Software, Inc.

A Treehouse Dataflow Toolkit Treetip: Amazon RDS Proxy

by Joseph Brady, Director of Business Development at Treehouse Software, Inc. and Dan Vimont, Director of Innovation at Treehouse Software, Inc.


As mentioned in several Treehouse blogs, our innovative offering, Treehouse Dataflow Toolkit (TDT), provides the turn-key solution for rapidly transferring data from Kafka into advanced Analytics/AI/ML-friendly targets, such as Snowflake, Amazon Redshift, Amazon Athena/S3, and Amazon S3 Express One Zone Buckets, as well as Amazon Aurora PostgreSQL.

This blog focuses on TDT’s utilization of Amazon RDS Proxy, a brilliantly designed, fully managed, and highly available database proxy for Amazon Relational Database Service (RDS). RDS Proxy is the ultimate “traffic cop” that makes applications more scalable, more resilient to database failures, and more secure.

When TDT targets Amazon PostgreSQL, RDS Proxy auto-maintains a pool of connections to PostgreSQL, which ensures the target does not become overwhelmed during times of massive data flow. Additionally, there is no need to provision or manage any additional infrastructure to begin using RDS Proxy.


Amazon RDS Proxy Use Cases…

Applications that support highly variable workloads may attempt to open a burst of new database connections. RDS Proxy’s connection governance allows customers to gracefully scale applications dealing with unpredictable workloads by efficiently reusing database connections.

    • In the TDT context, our Lambda-based infrastructure autoscales up and down in alignment with dataflows, and the RDS Proxy responds by brilliantly managing any increased and decreased connection requirements.

Applications that frequently open and close database connections. RDS Proxy allows customers to maintain a pool of database connections to avoid unnecessary stress on database compute and memory for establishing new connections.

    • As TDT’s autoscaling results in Lambda instances being spun up or shut down, the RDS Proxy maintains stability with its well-managed connection pool.

Applications that can transparently tolerate database failures without needing to write complex failure handling code. RDS Proxy automatically routes traffic to a new database instance while preserving application connections.

    • Treehouse strongly recommends that customers take advantage of multi-AZ configurations of their RDS databases, which can be fully leveraged by RDS Proxy to assure continuity of service in the event of AZ-specific outages.

Applications that need extra security, including the option to enforce IAM-based authentication with relational databases. RDS Proxy also enables customers to centrally manage database credentials through AWS Secrets Manager.

    • This is a great feature of RDS Proxy: TDT requires no access whatsoever to database security credentials, instead letting RDS Proxy (in concert with IAM security and AWS Secrets Manager) manage everything to provide state-of-the-art, best-practices security for your target database.
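To make the credential-free idea concrete, here is a minimal sketch of how a Lambda function in general can connect through RDS Proxy with IAM authentication. It is an illustration under stated assumptions, not TDT's implementation: the helper name `proxy_connection_kwargs`, the proxy endpoint, the database name, and the user are all hypothetical. The only AWS call used is boto3's `generate_db_auth_token`, which returns a short-lived token that stands in for a stored password.

```python
def proxy_connection_kwargs(proxy_host, db_name, db_user, region,
                            port=5432, token_provider=None):
    """Build connection arguments for a PostgreSQL driver such as psycopg2.

    No database password is stored anywhere: the "password" is a short-lived
    IAM token generated at connect time.
    """
    if token_provider is None:
        import boto3  # imported lazily so the helper can be tried without AWS access

        rds = boto3.client("rds", region_name=region)

        def token_provider():
            # The token is signed with the caller's IAM role credentials.
            return rds.generate_db_auth_token(
                DBHostname=proxy_host, Port=port, DBUsername=db_user, Region=region
            )

    return {
        "host": proxy_host,   # the RDS Proxy endpoint, not the database instance
        "port": port,
        "dbname": db_name,
        "user": db_user,
        "password": token_provider(),
        "sslmode": "require",  # IAM authentication through RDS Proxy requires TLS
    }
```

With psycopg2 installed, a caller would pass the result straight to the driver, for example `psycopg2.connect(**proxy_connection_kwargs(host, "analytics", "loader", "us-east-1"))`. The `token_provider` parameter exists only so the helper can be tested with a stand-in token.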


Conclusion
TDT is a self-contained, turn-key solution that eliminates months (possibly years) of research and development time and costs, and customers can be up and running in minutes. With TDT, customers are assured of high-speed and massive data movement that strictly adheres to AWS’s recommended use of massively scalable bulk load utilities, as well as Amazon RDS Proxy for the most efficient and secure connectivity. This adherence to AWS’s best practices is one of TDT’s primary differentiators from other “connector” offerings on the market.

Treehouse Dataflow Toolkit (TDT) is Copyright © 2024 Treehouse Software, Inc. All rights reserved.

Download: TDT AWS Partner Solution Brief to share with your team…


Contact Treehouse Software for a Demo Today!

Contact Treehouse Software today for more information or to schedule a product demonstration.

TDT: Much more than a mere “data connector” for Snowflake

by Joseph Brady, Director of Business Development at Treehouse Software, Inc. and Dan Vimont, Director of Innovation at Treehouse Software, Inc.


Over the past few months, we have been rolling out information on Treehouse Dataflow Toolkit (TDT), a state-of-the-art, fully automated offering for data transfer from Kafka pipes to Analytics/ML/AI frameworks. TDT is a set of proprietary microservices that assures highly available, auto-scalable, and event-driven data transfers to your data science teams’ favorite analytics frameworks, such as Snowflake, Amazon Redshift, Amazon Athena/S3, and Amazon S3 Express One Zone Buckets, as well as Amazon Aurora PostgreSQL, all the while adhering to AWS’s and Snowflake’s recommended best practices for massive data loading. Make no mistake, TDT is MUCH more than merely a “connector”.

In this blog, we will focus on how TDT handles data transfers to perhaps the most complex environment: Snowflake.  Out of all TDT functions and features, our Snowflake connectivity offers the biggest “value added” to customers, because Snowflake has quickly become a top choice for enterprises looking for a Cloud platform onto which they can mobilize data at near-unlimited scale and performance, and bring advanced ML/AI capabilities.


Connectivity using Snowflake’s best practices vs. traditional ODBC…

TDT’s innovative Lambda-based (microservices) approach enables faster data flow than any conceivable solution based on ODBC, the standard tool used for most “roll your own” approaches or “we have a connector for that” offerings.

To load massive quantities of data to a target, TDT uses Snowflake’s (hugely scalable) bulk load utilities, not ODBC. It is vital to note that Snowflake is NOT a transactional (OLTP) database, so doing CDC transfers to this target via ODBC (with row-by-row update, insert, and delete transactions) goes directly against “best practices” advice from Snowflake, and would almost assuredly result in unwieldy bottlenecks.


TDT loads data into “delta tables” in Snowflake, which inherently retain the entire history of source data ever since the source-to-target synchronization began (perfect for time-based trend/predictive/prescriptive analytics). Again, TDT adheres to Snowflake’s best-practice recommendation of pulling data from S3 when bulk loading massive quantities of data.


Publishing both bulk-load and CDC data to a reliable and scalable framework like Kafka allows you to maintain a broad array of options to ultimately feed your legacy data to any number of JSON-friendly ETL tools, target data stores, and data analytics packages (some of which have not even been invented yet!). 

The “build vs buy” question is put to rest…

The Snowflake-proprietary target DDL/metadata/resources that TDT automatically produces for the staging of data in Snowflake are of such complexity that it is easy to justify the “buy” option in the “build vs buy” conversations customers have. A decision by an enterprise not to use TDT, but instead to build its own Kafka-to-Snowflake solution, could result in any or all of the following:

  • accumulation of technical debt
  • extensive/unpredictable time to production
  • ongoing resource planning to maintain home-grown technologies
  • potential vendor lock-in for maintenance of custom-made technologies designed and developed by consultants
  • managing a mix of manual and automated functions
  • tracking cobbled-together components created by multiple staff and consultants
  • limited agility for future customization and innovation
  • problems adhering to evolving best practices over time
  • higher costs for future growth/scaling
  • potential lack of proper security/ongoing security updates
  • unintentionally turning your organization into an enterprise software development company, whether or not you realized it!

Simply put, TDT is a self-contained, turn-key solution that can eliminate months, or years, of research and development time and costs. With TDT, high-speed and massive data movement to Snowflake takes minutes to ramp up.


So, You’ve Managed to Start Streaming Your Legacy Data into Kafka Pipelines… Now What?

by Joseph Brady, Director of Business Development at Treehouse Software, Inc. and Dan Vimont, Director of Innovation at Treehouse Software, Inc.


Treehouse Software is helping customers modernize their valuable enterprise data on Cloud and Hybrid Cloud environments without disrupting the existing critical work on their legacy systems. However, a new strategic imperative has been added to the modernization game: the requirement to utilize today’s advanced Analytics/AI/ML-friendly platforms, such as Amazon Redshift, Snowflake, Amazon Athena/S3, and Amazon S3 Express One Zone Buckets, as well as Amazon Aurora PostgreSQL, where an ever-expanding array of AI/ML tools is available to generate vital insights from the customer’s data. Many of these customers are already using software tools provided by Treehouse or other vendors to replicate their data into various target data stores and, more crucially, into Kafka pipelines (e.g., Amazon MSK or Confluent). Kafka is now the top choice for high-speed streaming of massive volumes of mission-critical data, providing stable performance under extreme loads. This is especially valuable for enterprises that require up-to-the-second data delivery for use cases that include e-commerce, financial services, logistics, telecommunications, and government IT.

Traditionally, Treehouse customers utilized our data replication technologies to load legacy data into Kafka pipelines, and that was where our involvement generally ended…


However, once Kafka is designated as a target in the customer’s architecture, we have increasingly become involved in two questions: “What now?”, and/or “What is the best mechanism for us to rapidly transfer data from Kafka to advanced analytics platforms?” Our answer: Look no further than Treehouse Software!

Treehouse Software brings a state-of-the-art, fully automated offering for data transfer from Kafka pipes to Analytics/ML/AI frameworks: the Treehouse Dataflow Toolkit (TDT). TDT is a set of proprietary microservices that assures highly available, auto-scalable, and event-driven data transfers to your data science teams’ favorite analytics frameworks, all the while adhering to AWS’s and Snowflake’s recommended best practices for massive data loading, thus assuring the shortest and surest loads. Additionally, TDT provides a frictionless and instant implementation, accelerating your path to deep data insights for optimizing business processes.

Why do AWS’s and Snowflake’s best practices recommend against using ODBC?

Your data science teams need large quantities of the very latest data in near-real-time, and ODBC doesn’t really do the job, offering only single-threaded, difficult-to-scale pipes. By contrast, TDT’s approach not only keeps things up-to-date faster than any conceivable ODBC-based solution, but the “delta tables” into which it loads data also inherently retain the entire history of source data ever since the source-to-target synchronization began (perfect for time-based trend/predictive/prescriptive analytics). To load massive quantities of data to a target, TDT uses the target vendors’ (massively scalable) bulk load utilities, not ODBC. It’s vital to note that Snowflake and Redshift are NOT transactional (OLTP) databases, so doing CDC transfers to these targets via ODBC (with row-by-row update, insert, and delete transactions) goes directly against “best practices” advice from the vendors, and would almost assuredly result in unwieldy bottlenecks.
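The “delta table” idea can be illustrated with a small, self-contained sketch. It is a conceptual model only, not TDT's actual table layout: the column names `seq`, `id`, `op`, and `data`, and the operation codes `I`/`U`/`D`, are assumptions made for the example. Every change is appended as a new row, so history is never overwritten, and the current state is derived on demand (much as a view over the delta table would do).

```python
def current_state(delta_rows, key="id"):
    """Derive the latest image of each source row from an append-only change log."""
    latest = {}
    # Replay changes in arrival order; later rows supersede earlier ones per key.
    for row in sorted(delta_rows, key=lambda r: r["seq"]):
        latest[row[key]] = row
    # A key whose most recent change is a delete is absent from the current state.
    return {k: r["data"] for k, r in latest.items() if r["op"] != "D"}


def history_of(delta_rows, key_value, key="id"):
    """Return every recorded change for one source row, oldest first."""
    ordered = sorted(delta_rows, key=lambda r: r["seq"])
    return [r for r in ordered if r[key] == key_value]


# Hypothetical change stream: two inserts, one update, one delete.
changes = [
    {"seq": 1, "id": 1, "op": "I", "data": {"name": "Ada"}},
    {"seq": 2, "id": 2, "op": "I", "data": {"name": "Bob"}},
    {"seq": 3, "id": 1, "op": "U", "data": {"name": "Ada L."}},
    {"seq": 4, "id": 2, "op": "D", "data": None},
]
```

Here `current_state(changes)` yields only the updated row for key 1, while all four change rows remain available for time-based analysis. Because loading is append-only, new changes can arrive in large batches through a bulk loader rather than as individual update and delete statements.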

What if my data is not on a mainframe?

No worries. Treehouse Software’s messaging is primarily mainframe-centric, since that has been our area of expertise and bread-and-butter for over 40 years. However, data movement is data movement, and if your mainframe, or non-mainframe, data is being pumped to a Kafka pipeline, TDT will take it from there. When a data replication tool publishes both bulk-load and CDC data in JSON format to a reliable and scalable framework like Kafka, it sets the stage for TDT to feed legacy data to any number of JSON-friendly ETL tools, target data stores, and the latest (or yet to be invented) data analytics packages. TDT is the turn-key solution for the easiest and fastest implementation of Kafka data transfer.


TDT allows you to quickly ramp up your data analytics game by providing a rapid flow of data fresh off your enterprise data systems.
