📝 Publications

NeurIPS 2024

Neural Residual Diffusion Models for Deep Scalable Vision Generation

Zhiyuan Ma, Liangliang Zhao, Biqing Qi, Bowen Zhou

[Code] [Dataset]

A simple yet meaningful change to the common architecture of deep generative networks: a series of learnable gated residual parameters that conform to the generative dynamics, facilitating effective denoising and dynamical isometry and enabling the stable training of extremely deep networks.

ACM MM 2024

Safe-SD: Safe and Traceable Stable Diffusion with Text Prompt Trigger for Invisible Generative Watermarking

Zhiyuan Ma, Guoli Jia, Biqing Qi, Bowen Zhou

[Code] [Dataset]

A safe and highly traceable Stable Diffusion framework, Safe-SD, that adaptively implants graphical watermarks (e.g., a QR code) into imperceptible structure-related pixels.

AAAI 2024

LMD: faster image reconstruction with latent masking diffusion

Zhiyuan Ma, Zhihuan Yu, Jianjun Li, Bowen Zhou

[Code] [Dataset]

A simple yet faster image reconstruction framework, Latent Masking Diffusion (LMD), which stands on the shoulders of DPMs and MAEs.

AAAI 2024

AdapEdit: Spatio-Temporal Guided Adaptive Editing Algorithm for Text-Based Continuity-Sensitive Image Editing

Zhiyuan Ma, Guoli Jia, Bowen Zhou

[Code] [Dataset]

A spatio-temporal guided adaptive editing algorithm, which realizes adaptive image editing by introducing a soft-attention strategy to dynamically vary the guiding degree from the editing conditions to visual pixels from both temporal and spatial perspectives.

AAAI 2024

Generative multi-modal knowledge retrieval with large language models

Xinwei Long, Jiali Zeng, Fandong Meng, Zhiyuan Ma, et al.

[Code] [Dataset]

An end-to-end generative framework for multi-modal knowledge retrieval that takes advantage of the fact that LLMs can effectively serve as virtual knowledge bases, even when trained with limited data.

AAAI 2023

HybridPrompt: bridging language models and human priors in prompt tuning for visual question answering

Zhiyuan Ma, Zhihuan Yu, Jianjun Li, Guohui Li

[Code] [Dataset]

A cloze- and verify-style hybrid prompt framework that bridges language models and human priors in prompt tuning for VQA.

ACM MM 2022

CMAL: A novel cross-modal associative learning framework for vision-language pre-training

Zhiyuan Ma, Zhihuan Yu, Jianjun Li, Guohui Li

[Code] [Dataset]

A novel framework for vision-language pre-training that combines anchor point detection with cross-modal associative learning.

COLING 2022

GLAF: global-to-local aggregation and fission network for semantic level fact verification

Zhiyuan Ma, Zhihuan Yu, Jianjun Li, Guohui Li

[Code] [Dataset]

A fresh perspective on the fact verification task: a novel Global-to-Local Aggregation and Fission Network (GLAF) that captures latent logical relations hidden in evidence clues for more accurate fact verification.

ACL 2022

UniTranSeR: A unified transformer semantic representation framework for multimodal task-oriented dialog system

Zhiyuan Ma, Jianjun Li, Guohui Li, Yongjing Cheng

[Code] [Dataset]

A unified (vision, language, knowledge, ...) Transformer semantic representation framework with feature alignment and intention reasoning, referred to as UniTranSeR, for multimodal task-oriented dialog systems.

EMNLP 2021

Intention reasoning network for multi-domain end-to-end task-oriented dialogue

Zhiyuan Ma, Jianjun Li, Zezheng Zhang, Guohui Li, Yongjing Cheng

[Code] [Dataset]

A novel intention mechanism to better model deterministic entity knowledge for joint and multi-hop reasoning in multi-domain end-to-end task-oriented dialogue.

📝 Selected Papers

NeurIPS 2024 Neural Residual Diffusion Models for Deep Scalable Vision Generation. Zhiyuan Ma, Liangliang Zhao, Biqing Qi, Bowen Zhou.
NeurIPS 2024 (Spotlight) Ultramedical: Building specialized generalists in biomedicine. Kaiyan Zhang, Sihang Zeng, Ermo Hua, Ning Ding, Zhang-Ren Chen, Zhiyuan Ma, et al.
NeurIPS 2024 Exploring Adversarial Robustness of Deep State Space Models. Biqing Qi, Yang Luo, Junqi Gao, Pengfei Li, Kai Tian, Zhiyuan Ma, et al.
ACM MM 2024 Safe-SD: Safe and Traceable Stable Diffusion with Text Prompt Trigger for Invisible Generative Watermarking. Zhiyuan Ma, Guoli Jia, et al.
AAAI 2024 LMD: faster image reconstruction with latent masking diffusion. Zhiyuan Ma, Zhihuan Yu, Jianjun Li, Bowen Zhou.
AAAI 2024 AdapEdit: Spatio-Temporal Guided Adaptive Editing Algorithm for Text-Based Continuity-Sensitive Image Editing. Zhiyuan Ma, Guoli Jia, Bowen Zhou.
AAAI 2024 Generative multi-modal knowledge retrieval with large language models. Xinwei Long, Jiali Zeng, Fandong Meng, Zhiyuan Ma, et al.
AAAI 2023 (Oral) HybridPrompt: bridging language models and human priors in prompt tuning for visual question answering. Zhiyuan Ma, Zhihuan Yu, Jianjun Li, Guohui Li.
ACM MM 2022 (Oral) CMAL: A novel cross-modal associative learning framework for vision-language pre-training. Zhiyuan Ma, Zhihuan Yu, Jianjun Li, Guohui Li.
COLING 2022 GLAF: global-to-local aggregation and fission network for semantic level fact verification. Zhiyuan Ma, Zhihuan Yu, Jianjun Li, Guohui Li.
ACL 2022 UniTranSeR: A unified transformer semantic representation framework for multimodal task-oriented dialog system. Zhiyuan Ma, Jianjun Li, Guohui Li, Yongjing Cheng.
EMNLP 2021 Intention reasoning network for multi-domain end-to-end task-oriented dialogue. Zhiyuan Ma, Jianjun Li, Zezheng Zhang, Guohui Li, Yongjing Cheng.