State of Context Parallel & Flex attention

Hi, I am trying to catch up on the state of Context Parallel (CP) for Flex attention.

According to https://github.com/pytorch/torchtitan/pull/2145, is CP + Flex attention now supported (for Llama 3 and Llama 4)? If so, does that mean Flex attention works with CP in general, for example with sliding window attention?
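To make the sliding-window part of the question concrete, here is a minimal pure-Python sketch. It is not torchtitan or PyTorch code. It writes a sliding-window causal mask using the `mask_mod(b, h, q_idx, kv_idx)` argument order that FlexAttention mask functions use, but evaluates it on plain ints. It then checks which ranks' KV shards each rank's queries attend to, assuming the sequence is split into equal contiguous shards across CP ranks. That layout is an assumption; torchtitan's CP may use a different, load-balanced layout. `WINDOW`, `causal`, `sliding_window_causal`, and `kv_ranks_needed` are hypothetical names for illustration.

```python
WINDOW = 4  # hypothetical sliding-window size, chosen only for illustration


def causal(b, h, q_idx, kv_idx):
    # FlexAttention-style mask_mod argument order, evaluated on plain ints here.
    return q_idx >= kv_idx


def sliding_window_causal(b, h, q_idx, kv_idx):
    # Causal, and each query only sees the previous WINDOW positions (itself included).
    return q_idx >= kv_idx and q_idx - kv_idx < WINDOW


def kv_ranks_needed(mask_mod, seq_len, cp_size):
    """For each CP rank, list the ranks whose KV shard its queries attend to.

    Assumes (hypothetically) that the sequence is split into cp_size equal,
    contiguous shards, so rank r owns positions [r * shard, (r + 1) * shard).
    """
    assert seq_len % cp_size == 0, "seq_len must divide evenly across CP ranks"
    shard = seq_len // cp_size
    needed = {}
    for q_rank in range(cp_size):
        ranks = set()
        for q_idx in range(q_rank * shard, (q_rank + 1) * shard):
            for kv_idx in range(seq_len):
                # The mask is evaluated on global positions, not shard-local ones.
                if mask_mod(0, 0, q_idx, kv_idx):
                    ranks.add(kv_idx // shard)
        needed[q_rank] = sorted(ranks)
    return needed


print(kv_ranks_needed(causal, 16, 4))
# {0: [0], 1: [0, 1], 2: [0, 1, 2], 3: [0, 1, 2, 3]}
print(kv_ranks_needed(sliding_window_causal, 16, 4))
# {0: [0], 1: [0, 1], 2: [1, 2], 3: [2, 3]}
```

This only shows that the KV exchange pattern depends on the mask (plain causal needs every earlier shard, the sliding window only needs the neighboring one); it says nothing about what torchtitan actually implements.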

Also, is the rope_cache issue (https://github.com/pytorch/torchtitan/issues/2077) what is blocking Flex attention CP for Qwen and GPT-OSS, or are there other reasons?

What about [DeepSeek](https://github.com/pytorch/torchtitan/blob/main/torchtitan/models/deepseek_v3/infra/parallelize.py#L71)?

state of Context Parallel & Flex attention #2417