Four Ways to Improve Verbo-visual Fusion for Dense 3D Visual Grounding
