Visual Flow is an ETL/ELT tool designed for effective data management through a convenient, user-friendly interface. The tool has the following capabilities:
- Integrates data from heterogeneous sources:
  - Azure Blob Storage
  - AWS S3
  - Cassandra
  - ClickHouse
  - DB2
  - Databricks JDBC (global configuration)
  - Databricks (Databricks configuration)
  - Dataframe (for reading)
  - Google Cloud Storage
  - Elasticsearch
  - IBM COS
  - Kafka
  - Local File
  - MS SQL
  - MongoDB
  - MySQL/MariaDB
  - Oracle
  - PostgreSQL
  - Redis
  - Redshift
  - REST API
- Supports the following file formats:
  - Delta Lake
  - Parquet
  - JSON
  - CSV
  - ORC
  - Avro
  - Text
  - Binary (PDF, DOC, audio files)
- Leverages direct connectivity to enterprise applications as sources and targets
- Performs data processing and transformation
- Runs custom code
- Leverages metadata for analysis and maintenance
- Can be deployed in two configurations, running jobs in Spark/Kubernetes or Databricks environments, respectively
- Leverages Generative AI capabilities through tasks such as Parse text, Generate data, Transcribe, and Generic task
The Visual Flow application is divided into the following repositories:
- Visual-Flow-frontend
- Visual-Flow-backend (current)
- Visual-Flow-jobs
- Visual-Flow-deploy
- Visual-Flow-backend-db-service
- Visual-Flow-backend-history-service
The Visual Flow backend is a REST API application that serves as middleware between the frontend application and Kubernetes-like orchestration environments that run jobs with Spark. It lets you manage Visual Flow entities (projects, jobs, pipelines) in the following ways:
- Create or delete a project, which serves as a namespace for jobs and/or pipelines
- Manage project settings
- Manage user access
- Create/maintain a job
- Execute jobs and analyze logs
- Create/maintain a pipeline
- Execute pipelines
- Schedule pipelines with cron
- Import/export jobs and pipelines
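
To make the relationship between these entities concrete, here is a minimal in-memory sketch of a project acting as a namespace for jobs and pipelines, with a JSON export/import round trip. It is an illustration only, not the backend's actual data model or REST API: the `Project` class, its method names, and the shape of the job and pipeline definitions are all hypothetical.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Project:
    """Hypothetical stand-in for a project: a namespace for jobs and pipelines."""
    name: str
    jobs: dict = field(default_factory=dict)       # job name -> job definition
    pipelines: dict = field(default_factory=dict)  # pipeline name -> pipeline definition

    def add_job(self, name, definition):
        # Job names are unique within a project (assumed for this sketch).
        if name in self.jobs:
            raise ValueError(f"job {name!r} already exists in project {self.name!r}")
        self.jobs[name] = definition

    def add_pipeline(self, name, job_names, cron=None):
        # A pipeline may only reference jobs from the same project.
        missing = [j for j in job_names if j not in self.jobs]
        if missing:
            raise KeyError(f"unknown jobs: {missing}")
        self.pipelines[name] = {"jobs": list(job_names), "cron": cron}

    def export(self):
        # Serialize the whole project, including its jobs and pipelines.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def import_(cls, payload):
        # Rebuild a project from the JSON produced by export().
        return cls(**json.loads(payload))


if __name__ == "__main__":
    project = Project("demo")
    project.add_job(
        "load_orders",
        {"source": "PostgreSQL", "target": "AWS S3", "format": "Parquet"},
    )
    project.add_pipeline("nightly", ["load_orders"], cron="0 2 * * *")
    print(project.export())
```

In this sketch, a pipeline holds only the names of jobs in the same project plus an optional cron expression, which is one simple way to picture scheduled pipelines; the real backend may model this differently.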
Visual Flow is open-source software licensed under the Apache-2.0 license.