📝 Publications

NeurIPS 2024

Neural Residual Diffusion Models for Deep Scalable Vision Generation

Zhiyuan Ma, Liangliang Zhao, Biqing Qi, Bowen Zhou

[Code] [Dataset]

  • A simple yet meaningful change to the common architecture of deep generative networks: a series of learnable gated residual parameters that conform to the generative dynamics, facilitating effective denoising and dynamical isometry and enabling stable training of extremely deep networks.
ACM MM 2024

Safe-SD: Safe and Traceable Stable Diffusion with Text Prompt Trigger for Invisible Generative Watermarking

Zhiyuan Ma, Guoli Jia, Biqing Qi, Bowen Zhou

[Code] [Dataset]

  • A safe and highly traceable Stable Diffusion framework, Safe-SD, that adaptively implants graphical watermarks (e.g., QR codes) into imperceptible structure-related pixels.
AAAI 2024

LMD: Faster Image Reconstruction with Latent Masking Diffusion

Zhiyuan Ma, Zhihuan Yu, Jianjun Li, Bowen Zhou

[Code] [Dataset]

  • A simple framework for faster image reconstruction, Latent Masking Diffusion (LMD), which stands on the shoulders of diffusion probabilistic models (DPMs) and masked autoencoders (MAEs).
AAAI 2024

AdapEdit: Spatio-Temporal Guided Adaptive Editing Algorithm for Text-Based Continuity-Sensitive Image Editing

Zhiyuan Ma, Guoli Jia, Bowen Zhou

[Code] [Dataset]

  • A spatio-temporal guided adaptive editing algorithm that realizes adaptive image editing through a soft-attention strategy, dynamically varying the degree of guidance from the editing conditions to visual pixels from both temporal and spatial perspectives.
AAAI 2024

Generative Multi-Modal Knowledge Retrieval with Large Language Models

Xinwei Long, Jiali Zeng, Fandong Meng, Zhiyuan Ma, et al.

[Code] [Dataset]

  • An end-to-end generative framework for multi-modal knowledge retrieval that takes advantage of the fact that LLMs can effectively serve as virtual knowledge bases, even when trained with limited data.
AAAI 2023

HybridPrompt: Bridging Language Models and Human Priors in Prompt Tuning for Visual Question Answering

Zhiyuan Ma, Zhihuan Yu, Jianjun Li, Guohui Li

[Code] [Dataset]

  • A cloze- and verify-style hybrid prompt framework that bridges language models and human priors in prompt tuning for VQA.
ACM MM 2022

CMAL: A Novel Cross-Modal Associative Learning Framework for Vision-Language Pre-Training

Zhiyuan Ma, Zhihuan Yu, Jianjun Li, Guohui Li

[Code] [Dataset]

  • A novel cross-modal associative learning framework that combines anchor point detection with cross-modal associative learning for vision-language pre-training.
COLING 2022

GLAF: Global-to-Local Aggregation and Fission Network for Semantic Level Fact Verification

Zhiyuan Ma, Zhihuan Yu, Jianjun Li, Guohui Li

[Code] [Dataset]

  • A fresh perspective on the fact verification task: a novel Global-to-Local Aggregation and Fission Network (GLAF) that captures latent logical relations hidden in evidence clues for more accurate fact verification.
ACL 2022

UniTranSeR: A Unified Transformer Semantic Representation Framework for Multimodal Task-Oriented Dialog System

Zhiyuan Ma, Jianjun Li, Guohui Li, Yongjing Cheng

[Code] [Dataset]

  • A unified Transformer semantic representation framework (vision, language, knowledge, ...) with feature alignment and intention reasoning, referred to as UniTranSeR, for multimodal task-oriented dialog systems.
EMNLP 2021

Intention Reasoning Network for Multi-Domain End-to-End Task-Oriented Dialogue

Zhiyuan Ma, Jianjun Li, Zezheng Zhang, Guohui Li, Yongjing Cheng

[Code] [Dataset]

  • A novel intention mechanism to better model deterministic entity knowledge for joint and multi-hop reasoning in multi-domain end-to-end task-oriented dialogue.

📝 Selected Papers