As organizations unlock value from ever-growing datasets, sharing and managing access to data at scale is crucial. Amazon Redshift's datashares capability allows producers to share live data with consumers without manually copying or syncing it.
In this comprehensive guide, we dive deep into ALTER DATASHARE, the command for modifying datashares. We'll analyze use cases, technical configuration details, usage metrics, access control best practices, and more from the lens of a full-stack developer.
Datashare Overview
Launched in 2020, Redshift datashares establish live connections between producer databases and consumer clusters. Consumers can directly query the producer's data without copying or transforming it.

Redshift datashare architecture (Source: AWS)
Producer databases share data at schema- and object-level granularity. Consumers use the shared datasets through remote read-only connections.
Compared to traditional ETL pipelines, datashares provide low-latency access, reduced operational complexity, and flexibility as data needs evolve. Datashares also log all usage metrics for observability and audit purposes.
Use Cases Driving Datashare Adoption
Datashares unlock analytics use cases like:
Centralized Data Hub
Companies can consolidate data from applications, pipelines and databases into a production-grade analytics-optimized datashare. Business units get performant access to clean, timely data.
Self-Service Analytics
Data teams build and manage datashares that are discoverable enterprise-wide. Stakeholders can directly analyze shared data without IT queue bottlenecks.
Value Chain Analytics
Manufacturers can share inventory and supply-chain data with resellers and suppliers to coordinate and optimize planning.
Analytics-as-a-Service
Data providers like financial data firms can monetize data products for clients to consume on-demand.
Compliance Reporting
Banks use datashares to securely share financial data with regulators to demonstrate compliance with reporting mandates.
AI/ML Data Access
Models needing reliable access to the latest, production datasets can leverage datashares.
The above use cases demand flexible control over shared data as business needs evolve – driving adoption of ALTER DATASHARE.
Alter Datashare Syntax
The ALTER DATASHARE command allows modifying datashares by:
- Adding/Removing objects like tables, schemas
- Configuring settings like access control, automatic refresh
Here is the syntax:
ALTER DATASHARE datashare_name
{ ADD | REMOVE }
{ TABLE schema.table
| SCHEMA schema
| FUNCTION schema.function_name()
| ALL TABLES IN SCHEMA schema
| ALL FUNCTIONS IN SCHEMA schema }

ALTER DATASHARE datashare_name
{ SET PUBLICACCESSIBLE [=] TRUE | FALSE
| SET INCLUDENEW [=] TRUE | FALSE FOR SCHEMA schema }
Key notes on ALTER DATASHARE:
- Modifications take effect instantly; there is no need to rebuild the datashare
- Control access by granting USAGE (for consumption) or ALTER/SHARE (for modification) privileges
- Manage costs by sharing at schema- or object-level granularity
Now let's analyze some example use cases.
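As a worked illustration of the syntax above, here is a small Python sketch that assembles ADD/REMOVE statements. Note that `build_alter_statement` is a hypothetical helper, not part of any AWS SDK, and the identifier check is only a minimal guard, not full Redshift identifier validation:

```python
# Illustrative sketch: build_alter_statement is a hypothetical helper that
# assembles ALTER DATASHARE statements and rejects identifiers that are not
# plain (optionally schema-qualified) names.
import re

_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?$")

def build_alter_statement(datashare, action, object_type, object_name):
    """Return an ALTER DATASHARE ADD/REMOVE statement as a string."""
    if action not in ("ADD", "REMOVE"):
        raise ValueError("action must be ADD or REMOVE")
    if not _IDENT.match(object_name):
        raise ValueError(f"suspicious identifier: {object_name}")
    return f"ALTER DATASHARE {datashare} {action} {object_type} {object_name};"

print(build_alter_statement("sales_share", "ADD", "TABLE", "public.orders"))
# ALTER DATASHARE sales_share ADD TABLE public.orders;
```

The generated string would then be run against the producer cluster like any other SQL statement.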
Compliance Reporting Use Case
Banks using Redshift to generate regulatory reports need to share sensitive financial data with external agencies frequently.
For instance, the Securities and Exchange Commission (SEC) in the USA requires periodic sharing of trading activity to detect fraud or insider trading.
Banks have to balance making timely data available to the SEC while closely controlling access to sensitive information.
Redshift datashares can achieve this securely:
1. Create the base datashare
CREATE DATASHARE financial_datashare;
ALTER DATASHARE financial_datashare
ADD SCHEMA financial;
This adds the entire financial schema containing all trading data.
2. Grant access to the SEC
GRANT USAGE
ON DATASHARE financial_datashare
TO ACCOUNT 'sec-account-id';
For an external consumer, USAGE is granted to the consumer's AWS account or cluster namespace rather than to individual users; 'sec-account-id' stands in for the SEC's 12-digit account ID. This gives the SEC read-only access to all shared trade data.
3. Alter datashare for compliance
ALTER DATASHARE financial_datashare
REMOVE TABLE financial.trades;
ALTER DATASHARE financial_datashare
ADD TABLE financial.regulatory_trades;
Here the bank modifies the datashare to remove the core trades table containing sensitive transaction details.
It adds a new regulatory_trades table with aggregated data sufficient for the SEC to run compliance reports.
This way the bank can securely share timely data with regulators without exposing unnecessary sensitive information.
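The aggregate-before-sharing pattern behind `regulatory_trades` can be sketched locally. In this illustration SQLite stands in for Redshift, and all table and column names (including the sensitive `counterparty` column) are invented for the example:

```python
# Sketch of the aggregation behind regulatory_trades: instead of exposing
# row-level trades, the producer publishes per-symbol daily aggregates.
# SQLite stands in for Redshift here; names are illustrative only.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE trades (
    trade_date TEXT, symbol TEXT, quantity INTEGER,
    price REAL, counterparty TEXT)""")  # counterparty is sensitive
conn.executemany(
    "INSERT INTO trades VALUES (?, ?, ?, ?, ?)",
    [("2022-12-01", "ACME", 100, 10.0, "Client A"),
     ("2022-12-01", "ACME", 200, 10.5, "Client B"),
     ("2022-12-01", "GLOBX", 50, 42.0, "Client C")])

# regulatory_trades drops counterparty and aggregates per symbol and day
conn.execute("""CREATE TABLE regulatory_trades AS
    SELECT trade_date, symbol,
           SUM(quantity) AS total_quantity,
           COUNT(*)      AS trade_count
    FROM trades GROUP BY trade_date, symbol""")

rows = conn.execute(
    "SELECT symbol, total_quantity, trade_count "
    "FROM regulatory_trades ORDER BY symbol").fetchall()
print(rows)  # [('ACME', 300, 2), ('GLOBX', 50, 1)]
```

Only the aggregated table would then be added to the datashare, keeping row-level detail private.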
Next let's look at a machine learning use case.
Machine Learning Use Case
AI teams need reliable access to the freshest datasets for model training and inference. Data drift from outdated training data leads to inaccurate predictions.
Redshift datashares connect models seamlessly to production data. The steps are:
1. Create ML datashare
CREATE DATASHARE ml_datashare;
ALTER DATASHARE ml_datashare
ADD SCHEMA production;
This shares the production schema containing application data like user activity events.
2. Data science teams connect their Redshift instances
Data scientists configure IAM access to query the datashare from their analytic Redshift clusters.
Most AI tools, such as SageMaker and Databricks, connect natively to Redshift, making adoption frictionless.
3. Retrain models incrementally
Data teams rebuild models on schedules using the latest datashare data instead of stale copies.
# Pseudocode: unload_redshift_datashare_to_dataframe and retrain_model are
# hypothetical helpers standing in for your extract and training logic.
daily_data = unload_redshift_datashare_to_dataframe()
retrain_model(daily_data)
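A more concrete version of that extract step can work against any DB-API 2.0 connection, so the same function serves a real consumer-cluster connection (e.g. via the redshift_connector package) or a local stand-in. Here sqlite3 plays the datashare so the sketch runs anywhere; the table and `fetch_training_rows` helper are invented for illustration:

```python
# fetch_training_rows works with any DB-API 2.0 connection. In production
# this would be a connection to the consumer cluster; sqlite3 stands in here.
import sqlite3

def fetch_training_rows(conn, query):
    """Run a query and return rows as a list of dicts keyed by column name."""
    cur = conn.cursor()
    cur.execute(query)
    cols = [d[0] for d in cur.description]
    return [dict(zip(cols, row)) for row in cur.fetchall()]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_events (user_id INTEGER, clicks INTEGER)")
conn.execute("INSERT INTO user_events VALUES (1, 5), (2, 3)")
daily_data = fetch_training_rows(conn, "SELECT * FROM user_events")
print(daily_data)  # [{'user_id': 1, 'clicks': 5}, {'user_id': 2, 'clicks': 3}]
```

Returning plain dicts keeps the training code decoupled from the database driver.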
4. Alter datashare as data evolves
Over time, as new features are logged or the schema changes, ALTER DATASHARE handles the evolution with a single statement.
For example, to share a new user_engagement table capturing user feedback:
ALTER DATASHARE ml_datashare
ADD TABLE production.user_engagement;
Now let's analyze datashare access control and security considerations.
Fine-grained Access Control
Datashares provide fine-grained control over permissions through IAM policies and SQL GRANT privileges.
For example, a media company sharing customer engagement data with external analytics vendors can configure:
IAM Policy
The policy grants the DatashareConsumer role permission to work with datashares in its account (datashare actions use the redshift: service prefix):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "redshift:DescribeDataSharesForConsumer",
        "redshift:AssociateDataShareConsumer",
        ...
      ],
      "Resource": "*"
    }
  ]
}
Table-level Privileges
Further lock down visibility using SQL:
REVOKE SELECT ON shared_table FROM PUBLIC; -- Revoke access
GRANT SELECT
ON shared_table
TO analytics_vendor_group; -- Allow access
IAM Condition Keys
Granular sharing can also be enforced using IAM condition keys.
For example, restrict access to business hours:
"Condition": {
"DateGreaterThan": {"aws:CurrentTime": "2022-07-04T09:00:00Z"},
"DateLessThan": {"aws:CurrentTime": "2022-07-04T17:00:00Z"}
}
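For intuition, the time-window condition above can be mirrored in application code. This sketch checks a fixed 09:00–17:00 UTC window and is illustrative only (IAM evaluates the real condition server-side; `within_business_hours` is a hypothetical helper):

```python
# Mirrors the IAM DateGreaterThan/DateLessThan condition: access is allowed
# only when the current UTC time of day falls inside the business window.
from datetime import datetime, time, timezone

def within_business_hours(ts, start=time(9, 0), end=time(17, 0)):
    """True if the timestamp's UTC time of day falls in [start, end)."""
    return start <= ts.astimezone(timezone.utc).time() < end

print(within_business_hours(datetime(2022, 7, 4, 12, 30, tzinfo=timezone.utc)))  # True
print(within_business_hours(datetime(2022, 7, 4, 18, 0, tzinfo=timezone.utc)))   # False
```

Unlike the hard-coded dates in the IAM example, this version applies the window to any day.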
These authorization mechanisms allow implementing least-privilege and need-to-know access controls when sharing sensitive data.
Now let's look at usage metrics and statistics.
Analyzing Datashare Usage
Redshift provides detailed metrics on datashare usage – essential for monitoring costs and performance.
Admins can view hourly/daily metrics like:
- Bytes scanned
- Number of rows returned
- Query run times
- Table/Schema access patterns
For example, aggregate usage statistics can be retrieved with SQL along these lines (Redshift's monitoring views are SVL_DATASHARE_USAGE_PRODUCER and SVL_DATASHARE_USAGE_CONSUMER; the view and column names below are simplified for illustration):
SELECT
  date_trunc('hour', usage_timestamp) AS hour,
  SUM(rows_accessed) AS rows_returned,
  SUM(bytes_scanned) AS bytes_scanned,
  COUNT(DISTINCT query_id) AS query_count
FROM svv_datashare_usage
GROUP BY 1
ORDER BY 1;
| hour | rows_returned | bytes_scanned | query_count |
|---|---|---|---|
| 2022-12-01 13:00:00+00 | 562,123 | 97 GB | 342 |
| 2022-12-01 14:00:00+00 | 781,247 | 112 GB | 512 |
Sample datashare usage metrics
These producer-side metrics, combined with consumer-side visibility from STL system tables, offer end-to-end insight into usage across accounts:

Analyzing usage across accounts (Source: AWS)
Trends like surging scan volumes or cross-account activity can inform decisions to alter datashares:
- Add/Remove objects to optimize costs
- Apply filters to limit result size
- Adjust replication and refresh settings
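One way to operationalize the "surging scan volume" trend above is a simple baseline comparison: flag any hour whose bytes scanned exceed the trailing average by some factor. The sample data and threshold are arbitrary, chosen only for the sketch:

```python
# Flag hours whose scan volume is more than `factor` times the mean of all
# earlier hours. Input mirrors the hourly usage table shown above.
def surging_hours(hourly_bytes, factor=2.0):
    """Return hours whose scan volume exceeds factor x trailing mean."""
    flagged = []
    for i, (hour, scanned) in enumerate(hourly_bytes):
        if i == 0:
            continue  # no baseline yet for the first hour
        baseline = sum(b for _, b in hourly_bytes[:i]) / i
        if scanned > factor * baseline:
            flagged.append(hour)
    return flagged

usage = [("13:00", 97), ("14:00", 112), ("15:00", 450)]  # GB scanned per hour
print(surging_hours(usage))  # ['15:00']
```

A flagged hour might prompt removing objects from the datashare or tightening consumer privileges.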
Next, let's compare datashares to other sharing options.
Comparison to Other Data Sharing Methods
Beyond datashares, Redshift also enables sharing using:
- Amazon Redshift Spectrum, which runs SQL queries directly against exabytes of data in S3 with no data loading.
- Federated Query, which analyzes data in place across Redshift and operational databases like Aurora and RDS MySQL.
How do datashares compare?
| Dimension | Datashare | Spectrum | Federated Query |
|---|---|---|---|
| Architecture | Producer-consumer clusters | External S3 tables | Distributed databases |
| Performance | Optimized for analytics | Lower, varies by file format | Depends on endpoints |
| Access control | SQL GRANT commands | S3 + IAM policies | Per database users/roles |
| Use cases | Analytics, reporting | Ad-hoc exploration | Consolidated dashboards |
| Cost | No separate charge; consumer pays for its own compute | Pay per TB scanned | Standard query rates |
Datashares uniquely enable optimized joint analytics by producer and consumer Redshift clusters. This high performance motivates use for production reporting and machine learning.
Advanced Topics and Best Practices
Now that we've covered basic datashare usage, let's discuss some advanced considerations when sharing analytics datasets:
Schema and Table Optimization
When sharing large datasets, optimize their organization for analytics:
✅ Well-partitioned schemas: Split big tables by time or product so queries touch only the relevant slices. Adjust the split as data grows.
✅ Sort keys: Define sort keys on common join and filter columns to improve scan performance.
✅ Distribution keys: Choose the right distribution style (AUTO, EVEN, KEY, or ALL) to minimize data movement during joins and aggregations.
❌ Avoid over-normalization: Excessively granular tables slow down analytic queries. Avoid complexity beyond what the analytics need.
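To make the sort/distribution-key advice concrete, here is a small sketch that assembles Redshift-style DDL. `build_ddl` is a hypothetical helper and the table and column names are invented; real DDL would also consider encodings and constraints:

```python
# Hypothetical helper assembling Redshift-style CREATE TABLE DDL with the
# distribution and sort keys discussed above. Names are illustrative only.
def build_ddl(table, columns, distkey=None, sortkeys=()):
    """Return a CREATE TABLE statement with optional DISTKEY/SORTKEY clauses."""
    cols = ", ".join(f"{name} {ctype}" for name, ctype in columns)
    ddl = f"CREATE TABLE {table} ({cols})"
    if distkey:
        ddl += f" DISTSTYLE KEY DISTKEY({distkey})"
    if sortkeys:
        ddl += f" SORTKEY({', '.join(sortkeys)})"
    return ddl + ";"

print(build_ddl("analytics.page_views",
                [("event_time", "TIMESTAMP"), ("user_id", "BIGINT")],
                distkey="user_id", sortkeys=["event_time"]))
```

Here `user_id` as DISTKEY co-locates each user's rows for joins, while sorting on `event_time` lets time-range filters skip blocks.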
Query Isolation and Prioritization
Use Workload Management (WLM) tools like queues, concurrency scaling, and monitoring rules to achieve:
- Isolating production workloads from analytics queries
- Preventing shared-data queries from degrading cluster performance
- Prioritizing queries when a datashare has multiple consumers
Caching and Refresh Strategies
Balance the trade-off between data freshness and query costs:
- Configure refresh intervals through automation based on usage patterns
- Cache common drill-down reports and refresh periodically instead of per-query
- For applications needing 100% real-time data, use other integration mechanisms
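The "cache and refresh periodically" strategy above can be sketched as a small TTL cache: a report result is reused until its age exceeds the refresh interval, then recomputed. `ReportCache` is a hypothetical helper and the lambda stands in for a real query against the datashare; the injectable clock exists only so the sketch is testable:

```python
# Minimal TTL cache: results are reused until ttl_seconds elapse, then the
# fetch callable (standing in for a datashare query) runs again.
import time

class ReportCache:
    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self._fetch, self._ttl, self._clock = fetch, ttl_seconds, clock
        self._value, self._stamp = None, None

    def get(self):
        now = self._clock()
        if self._stamp is None or now - self._stamp >= self._ttl:
            self._value, self._stamp = self._fetch(), now  # refresh
        return self._value

calls = []
cache = ReportCache(lambda: calls.append(1) or len(calls), ttl_seconds=3600)
print(cache.get(), cache.get())  # second call is served from cache
```

Tuning `ttl_seconds` is exactly the freshness-versus-cost trade-off discussed above.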
Key Takeaways
We covered a lot of ground discussing Redshift's ALTER DATASHARE, including:
- Datashare architecture and typical use cases
- Syntax for modifying datashares using ALTER commands
- Usage examples spanning analytics, machine learning and compliance
- Fine-grained access control best practices
- Query performance optimization considerations
- Tools for monitoring datashare usage
The ability to effortlessly share live, analytics-optimized data at scale unlocks tremendous innovation across organizations. Mastering alter datashare opens up this collaborative potential while maintaining world-class performance, security and governance.


