Scatter plots are one of the most common and useful plots for visualizing relationships between two continuous variables. The plotly Python library provides a powerful graph objects framework for building customizable interactive scatter plots.

In this comprehensive guide, we will explore how to:

  • Create basic scatter plots with graph objects
  • Customize markers, colors, scales, and styles
  • Animate scatter plot traces over time
  • Add interactivity with hover text, click events, selections
  • Plot geographic data on scatter mapbox plots
  • Leverage datasets to quickly generate plots
  • Combine multiple traces for advanced use cases
  • Perform statistical analysis and model fitting
  • Apply best practices for scatter plot design

Getting Started with Basic Scatter Plots

To get started, import plotly.graph_objects and instantiate a Figure containing a Scatter trace:

import plotly.graph_objects as go

fig = go.Figure(data=go.Scatter(
    x=[0, 1, 2, 3],
    y=[2, 1, 4, 3]  
))

fig.show()

By default, Plotly connects the points with lines (traces with fewer than 20 points also get markers). To show markers only, set the mode parameter to "markers":

fig = go.Figure(data=go.Scatter(
    x=[0, 1, 2, 3],
    y=[2, 1, 4, 3],
    mode="markers"
))

Customizing Markers, Colors, and Styles

We have extensive control over the visual styling of the markers. Here are some of the customizations we can apply:

Size, Symbol, Color

fig = go.Figure(data=go.Scatter(
    x=[0, 1, 2, 3],
    y=[2, 1, 4, 3], 
    mode="markers", 
    marker=dict(
        size=30,
        symbol='pentagon',
        color='rgb(200, 0, 0)',
    )
))

Border Width, Color

fig = go.Figure(data=go.Scatter(
    x=[0, 1, 2, 3],
    y=[2, 1, 4, 3],
    mode="markers",
    marker=dict(
        size=30,
        # Outline drawn around each marker
        line=dict(
            width=4,
            color='rgb(0, 0, 0)'
        )
    )
))

Opacity

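fig = go.Figure(data=go.Scatter(
    x=[0, 1, 2, 3],
    y=[2, 1, 4, 3],
    mode="markers",
    marker=dict(
        size=30,
        opacity=0.5  # 0 is fully transparent, 1 is fully opaque
    )
))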

We can also set these properties to arrays to visually encode an extra variable.

For example, mapping the marker size to a z-dimension:

fig = go.Figure(data=go.Scatter(
    x=[0, 1, 2, 3],
    y=[2, 1, 4, 3],
    marker=dict(
        size=[10, 20, 30, 40], 
    )
))

Customizing Colorscales

To visualize a third continuous variable, we can map a colorscale to the markers.

First we import Plotly Express, which exposes the built-in colorscales under px.colors. We then pass a numeric color array and a colorscale to the marker parameter along with showscale=True:

import plotly.express as px
fig = go.Figure(data=go.Scatter(
    x=[0, 1, 2, 3],
    y=[2, 1, 4, 3],
    mode="markers", 
    marker=dict(
        color=[180, 220, 280, 340],
        colorscale=px.colors.sequential.Viridis, 
        showscale=True
    )
))

Colorscale   Description
Viridis      Perceptually uniform, print friendly
Cividis      Friendly to color vision deficiency
Turbo        Distinct colors

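We can also color the markers using a categorical column from a dataset. Because marker.color expects numbers or valid color strings, we convert the categories to integer codes first:

import pandas as pd
df = pd.DataFrame({
    'x': [0, 1, 2, 3],
    'y': [2, 1, 4, 3],
    'category': ['a', 'b', 'a', 'b']
})

# Map each category to an integer code: 'a' -> 0, 'b' -> 1
codes = df['category'].astype('category').cat.codes

fig = go.Figure(data=go.Scatter(
    x=df['x'],
    y=df['y'],
    mode='markers',
    marker=dict(
        color=codes,
        colorscale=['blue', 'red'],
        showscale=True
    )
))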

This makes it easy to see clusters and categories at a glance.

Animating Scatter Plot Traces Over Time

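To create animated scatter plots, we add frames defining the data for each timestep, plus a Play button to start the animation:

import numpy as np
t = np.linspace(0, 20, 100)
x = np.sin(t) + np.random.randn(100)*0.2
y = np.cos(t) + np.random.randn(100)*0.2

fig = go.Figure(
    data=go.Scatter(
        x=[x[0]],
        y=[y[0]],
        mode="markers+lines"
    ),
    # Frame k shows the first k+1 points
    frames=[go.Frame(
        data=[go.Scatter(
            x=x[:k+1],
            y=y[:k+1]
        )]
    ) for k in range(len(x))]
)

fig.update_layout(
    # Fix the axis ranges so the view does not rescale between frames
    xaxis=dict(range=[-2, 2]),
    yaxis=dict(range=[-2, 2]),
    # Frames only play when triggered, so add a Play button
    updatemenus=[dict(
        type="buttons",
        buttons=[dict(label="Play", method="animate", args=[None])]
    )]
)

fig.show()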

This plots the points sequentially over time. We can also parameterize the styles so that colors, sizes, and other properties change over time.

Some common animation examples include:

  • Simulating model predictions
  • Visualizing movement over time
  • Showing temporal patterns and seasonality

Adding Interactivity to Scatter Plots

Plotly figures have built-in support for hover tooltips, click events, and selections.

Hover Text

To set the text displayed when hovering over a point, use the hovertext or text parameter, supplying one label per point:

import numpy as np

n = 10
fig = go.Figure(data=go.Scatter(
    x=np.random.rand(n),
    y=np.random.rand(n),
    # One label per point: 'Point A', 'Point B', ...
    hovertext=[f'Point {chr(65 + i)}' for i in range(n)],
    mode='markers'
))

Click Events

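We can also execute Python callbacks when points are clicked. Callbacks registered with on_click only fire when the figure is a go.FigureWidget displayed in a Jupyter environment (with the widget dependencies installed). Note that clickmode is a layout property, not a trace property:

import numpy as np

def click_handler(trace, points, state):
    # points.point_inds lists the indices of the clicked points
    if points.point_inds:
        ind = points.point_inds[0]
        print(f'Clicked on Point {ind}')

fig = go.FigureWidget(data=go.Scatter(
    x=np.random.rand(10),
    y=np.random.rand(10),
    mode='markers'
))
fig.update_layout(clickmode='event+select')

fig.data[0].on_click(click_handler)
fig  # as the last expression in a notebook cell, this displays the widget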

Selections

There is also built-in support for selecting points with rectangles or lasso shapes:

fig = go.Figure(data=go.Scatter(
    x=np.random.rand(10),
    y=np.random.rand(10),
    mode='markers'
))

# Make lasso selection the default drag tool ('select' gives a rectangle)
fig.update_layout(
    dragmode='lasso'
)

These interactivity features make Plotly scatter plots much more engaging and usable for things like data cleaning, identification of outliers, and dynamic linking to other plots.

Geographic Scatter Plots with Mapbox

For geographic data, we can create interactive scatter plots mapped onto Mapbox maps.

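Create a go.Scattermapbox trace and pass in the latitude and longitude data; no extra packages are needed. Recent Plotly releases also provide go.Scattermap, a MapLibre-based successor to this trace: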

import plotly.graph_objects as go
fig = go.Figure(data=go.Scattermapbox(
    lat=[45, 53, 38],
    lon=[-75, -3, -97], 
    mode = "markers",
    marker = go.scattermapbox.Marker(
        size = 14
    )
))

Then configure the Mapbox layout:

fig.update_layout(
    mapbox=dict(
        style="open-street-map",
        center=dict(lat=40, lon=-20),
        zoom=3
    )
)

A Mapbox access token is required only for Mapbox-hosted styles such as "streets", "dark", or "satellite". Tile styles such as "open-street-map" and "carto-positron" work without a token.

Common use cases include:

  • Plotting travel paths over time
  • Visualizing spatial point patterns
  • Linking geographic regions to multivariate data

Statistical Analysis and Model Fitting

Scatter plots provide an intuitive visualization for statistical analysis between two variables.

We can fit models to assess the correlation and relationship. For example, using numpy polyfit to fit a polynomial:

import numpy as np
x = np.array([0, 1, 2, 3, 4, 5])
y = np.array([1, 3, 2, 4, 7, 10])

z = np.polyfit(x, y, 2)
f = np.poly1d(z)

xp = np.linspace(0, 5, 100)
yp = f(xp)

fig = go.Figure(data=go.Scatter(
    x=x, y=y,
    mode='markers',
    name='data'
))
# Overlay the fitted quadratic as a line
fig.add_trace(go.Scatter(x=xp, y=yp,
                mode='lines',
                name='quadratic fit',
                line=dict(color='darkblue', width=2)))

fig.show()

We can also overlay a linear regression line, draw confidence intervals, compute correlation coefficients, and more. These modeling capabilities make Plotly scatter plots useful for quantitative analysis.

Leveraging Datasets for Quick Plots

For rapid data exploration, Plotly Express lets you instantly visualize DataFrames:

import plotly.express as px
df = px.data.iris()

fig = px.scatter(df, x="sepal_width", y="sepal_length") 
fig.show() 

Plotly Express handles details like axes labels, hovers, colors, and legends automatically.

We can then customize the generated figure as needed with the graph objects API:

import plotly.express as px

df = px.data.iris()
# custom_data makes the species column available to the hover template
fig = px.scatter(df, x="sepal_width", y="sepal_length",
                 custom_data=["species"])

fig.update_traces(
    hovertemplate="Species: %{customdata[0]}<extra></extra>"
)
fig.update_layout(
    title='My Custom Iris Plot',
    width=800,
    height=500
)

This gives the best of both worlds: quick exploration with Express then customization with graph objects.

Combining Multiple Trace Types

By layering traces with different modes, we can build richer scatter plots:

import numpy as np
import plotly.graph_objects as go

t = np.linspace(0, 10, 200)

fig = go.Figure() 

fig.add_trace(go.Scatter(
    x=t, y=np.sin(t),
    mode='lines',
    name='sin function'
))

fig.add_trace(go.Scatter(
    x=t, y=np.cos(t),
    mode='markers',
    name='cos samples',
    marker=dict(size=10)
))

fig.show()

Some ideas for multiple traces:

  • Regressions + data points
  • Fitted models + confidence intervals
  • Data points + smoothed trends
  • Timeseries forecast vs actual
  • Geographic regions + categories

By overlaying scatter traces, you can build rich, customized data visualizations.

Design Best Practices for Scatter Plots

Here are some key tips for effective scatter plot design:

Label Clearly

  • Give the plots and axes clear descriptive labels
  • Include units of measurement where applicable
  • Use plot subtitles and captions if needed for clarity

Show Origins

  • Anchor axes at zero when possible
  • If an axis is zoomed or transformed, indicate this clearly on the axis

Visualize Distributions

  • Use marginal histograms or kde plots to show distribution of each variable
  • These can help identify patterns such as gaps and natural clusters

Color and Symbol Encode

  • Use color or symbols to visually encode categories
  • Apply distinct, easily differentiable colors and symbols

Control Density

  • Adjust marker size, opacity, and jitter to control overplotting
  • Show marker density with heatmaps

By following these kinds of best practices, you can create scatter plots that clearly communicate the relationships in your data.

Conclusion

In this guide, we explored how to use Plotly's graph objects API to build customizable scatter plots in Python.

Key topics included:

  • Basic scatter plot configuration
  • Customizing markers, colors, styles, and scales
  • Adding animations, interactivity, geographic mapping
  • Statistical analysis and model fitting
  • Quick plotting with datasets
  • Combining multiple traces
  • Best practices for effective visualization

With the graph objects framework, you have extensive control to create interactive publication-quality scatter plots tailored to your specific needs.

This flexibility makes Plotly a strong choice for advanced statistical data visualization in Python.
