Which architectural modifications does LCT introduce?
Question
Answers ( 1 )
Key architectural modifications include:
- Long-context MMDiT blocks with full attention over all text and video tokens
- Interleaved 3D Rotary Position Embedding (RoPE) to distinguish between different shots
- Asynchronous timestep strategy supporting both joint denoising and conditional generation
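
To make the last point more concrete, here is a minimal sketch of how per-shot timesteps could be assigned. It is an illustration only, not the authors' implementation: the function names `sample_shot_timesteps` and `expand_to_tokens`, the `[0, 1]` timestep range, and the convention that conditioning shots get timestep 0 (treated as clean) are all assumptions made for this example.

```python
import random
from typing import List, Optional, Set


def sample_shot_timesteps(
    num_shots: int,
    mode: str,
    condition_shots: Optional[Set[int]] = None,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Return one timestep in [0, 1] per shot (assumed: 0 = clean, 1 = pure noise).

    Hypothetical helper for illustration; the name and conventions are assumptions.
    """
    rng = rng or random.Random()
    t = rng.random()
    if mode == "joint":
        # Joint denoising: every shot shares the same noise level.
        return [t] * num_shots
    if mode == "conditional":
        # Conditional generation: conditioning shots are kept clean (t = 0),
        # while the remaining shots are denoised from a shared noise level.
        condition_shots = condition_shots or set()
        return [0.0 if i in condition_shots else t for i in range(num_shots)]
    raise ValueError(f"unknown mode: {mode!r}")


def expand_to_tokens(shot_timesteps: List[float], tokens_per_shot: List[int]) -> List[float]:
    """Broadcast each shot's timestep to all of that shot's video tokens."""
    return [t for t, n in zip(shot_timesteps, tokens_per_shot) for _ in range(n)]


# Example: three shots, with shot 0 used as the clean conditioning shot.
shot_ts = sample_shot_timesteps(3, "conditional", condition_shots={0})
token_ts = expand_to_tokens(shot_ts, tokens_per_shot=[4, 4, 4])
```

The idea the sketch tries to capture is that each shot carries its own timestep, so the same model can either denoise all shots together or generate new shots given existing ones, depending only on how the timesteps are set.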