Hi, I am trying to catch up on the state of Context Parallel (CP) for Flex attention.
Per #2145, is CP + Flex attention now supported (Llama 3 and Llama 4)? And does that mean Flex attention works with CP "in general", for example with sliding window attention?
Is rope_cache (#2077) what is blocking Flex attention + CP for Qwen and GPT-OSS, or are there other reasons as well?
What about DeepSeek?