4 Diffusion Models - Post Category - fariver

[PaperReading] Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets

Summary: Contents: Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets | TL;DR | Data | Stage I: Image Pretraining | Stage II: Curating a Video Pretr Read more

posted @ 2025-07-28 22:24 fariver Views (125) Comments (0) Recommended (0)

[PaperReading] FoundHand: Large-Scale Domain-Specific Learning for Controllable Hand Image Generation

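Summary: Contents: FoundHand: Large-Scale Domain-Specific Learning for Controllable Hand Image Generation | TL;DR | Method | Data | Training process | Inference process | Experiment | Summary and reflections | Contribution | Writing | FoundHand Read more

posted @ 2025-05-21 19:11 fariver Views (33) Comments (0) Recommended (0)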

[Paper Reading] HOIDiffusion: Generating Realistic 3D Hand-Object Interaction Data

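Summary: Contents: HOIDiffusion: Generating Realistic 3D Hand-Object Interaction Data | TL;DR | Method | Stage 1 | Stage 2 | Training | Code | Differences from LDM | Differences from ControlNet | Algorithm implementation | CoAdapter | UNetModel | Experiment Read more

posted @ 2024-10-23 14:01 fariver Views (80) Comments (0) Recommended (0)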

[Paper Reading] ControlNet: Adding Conditional Control to Text-to-Image Diffusion Models

Summary: ControlNet: Adding Conditional Control to Text-to-Image Diffusion Models link Date: 23.11 Institution: Stanford TL;DR Proposes the ControlNet model, which is used to give a pretrained text2image diffus Read more

posted @ 2024-08-30 22:10 fariver Views (251) Comments (0) Recommended (0)

[Paper Reading] Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model

Summary: Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model link Date: 24.08 Institution: Waymo & University of Southern California TL;DR Proposes a Read more

posted @ 2024-08-28 15:46 fariver Views (793) Comments (0) Recommended (0)

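[Thoughts] Diffusion Model

Summary: Timeline. Below are some important milestones in the development of diffusion-based image generation methods: Date & Institution | Name | Brief description | - | VAE | Variational AutoEncoder, used for image generation | 2020.12 | VQ-VAE | Vector Quantized-Variational AutoEnc Read more

posted @ 2024-08-23 20:24 fariver Views (200) Comments (0) Recommended (0)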

[Paper Reading] VQ-GAN: Taming Transformers for High-Resolution Image Synthesis

Summary: Name link [VQ-GAN](Taming Transformers for High-Resolution Image Synthesis) Date: CVPR2021 oral 21.06 Institution: Heidelberg Collaboratory for Image Processing, IWR Read more

posted @ 2024-04-01 23:08 fariver Views (822) Comments (0) Recommended (0)

[Paper Reading] LVM: Sequential Modeling Enables Scalable Learning for Large Vision Models

Summary: LVM: Sequential Modeling Enables Scalable Learning for Large Vision Models Read more

posted @ 2024-03-28 14:03 fariver Views (139) Comments (0) Recommended (0)

[Paper Reading] VQ-VAE: Neural Discrete Representation Learning

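Summary: Name VQ-VAE: Neural Discrete Representation Learning Date: 17.11 Institution: Google TL;DR VQ stands for Vector Quantised; as the name suggests, the main improvement of this paper over VAE is changing the VAE's latent representation from continuous to discrete Read more

posted @ 2024-03-26 00:12 fariver Views (598) Comments (0) Recommended (0)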

[Paper Reading] Imagen: Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding

Summary: Name Imagen: Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding Date: 22/05 Institution: Google TL;DR Finds that an LLM (T5) can serve as the text2image task's text en Read more

posted @ 2024-03-22 20:31 fariver Views (203) Comments (0) Recommended (0)

[Basics] DiT: Scalable Diffusion Models with Transformers

Summary: Name DiT: Scalable Diffusion Models with Transformers Date: 23/03 Institution: UC Berkeley & NYU TL;DR Proposes the first Transformer-based Diffusion Model, outperforming SD; moreover, on image generation tasks, as Flops Read more

posted @ 2024-03-21 23:35 fariver Views (2164) Comments (0) Recommended (0)

[Paper Reading] DALLE3: Improving Image Generation with Better Captions

Summary: DALLE3: Improving Image Generation with Better Captions Date: 23/10 Institution: OpenAI TL;DR This paper argues that text-imag Read more

posted @ 2024-03-20 23:34 fariver Views (301) Comments (0) Recommended (0)

[Paper Reading] DALLE2: Hierarchical Text-Conditional Image Generation with CLIP Latents

Summary: Name DALLE2: Hierarchical Text-Conditional Image Generation with CLIP Latents, also known as UnCLIP Date: 22.04 Institution: OpenAI TL;DR OpenAI's first method for generating images from CLIP's image embedding, Read more

posted @ 2024-03-19 23:42 fariver Views (352) Comments (0) Recommended (0)

[Paper Reading] GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models

Summary: GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models GLIDE (Guided Language to Image Diffusion for Generation a Read more

posted @ 2024-03-18 23:46 fariver Views (443) Comments (0) Recommended (0)

[Paper Reading] DALLE: Zero-Shot Text-to-Image Generation

Summary: DALLE: Zero-Shot Text-to-Image Generation Date: 21.02 (contemporaneous with the CLIP paper) Institution: OpenAI TL;DR Proposes treating text and images as tokens and using a Transforme Read more

posted @ 2024-03-16 23:45 fariver Views (287) Comments (0) Recommended (0)

[Basics] Latent Diffusion Model: High-Resolution Image Synthesis with Latent Diffusion Models

Summary: Name Latent Diffusion Model, High-Resolution Image Synthesis with Latent Diffusion Models Date: 21.12 Institution: runway TL;DR This article introduces a model called the latent diffusion model (Latent Diffusion Mo Read more

posted @ 2024-03-14 21:35 fariver Views (2214) Comments (0) Recommended (0)

[Paper Reading] DDIM: DENOISING DIFFUSION IMPLICIT MODELS

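Summary: Name DDIM DENOISING DIFFUSION IMPLICIT MODELS TL;DR This article introduces a new type of generative model called Denoising Diffusion Implicit Models (DDIMs), an improved version of Denoising Diffusion Probabilistic Models (DDPMs). DDIM Read more

posted @ 2024-03-12 00:12 fariver Views (688) Comments (0) Recommended (0)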

[Basics] DDPM Principles

Summary: Name DDPM: Denoising Diffusion Probabilistic Models Date: 2020.12 TL;DR This article introduces a new type of generative model called Denoising Diffusion Probabilistic Models (DDPM). DDPM works by Read more

posted @ 2024-03-11 00:11 fariver Views (2148) Comments (0) Recommended (0)

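[Basics] VAE Principles

Summary: Name VAE original paper TL;DR This article introduces an algorithm called Auto-Encoding Variational Bayes (AEVB). By introducing a stochastic variational inference and learning algorithm, AEVB addresses inference and learning with large datasets and intractable posterior distributions. The article makes two main contributions: first, it proposes a way to directly use standard stochastic gradient Read more

posted @ 2024-03-10 20:40 fariver Views (317) Comments (0) Recommended (0)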

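[Basics] Vision Transformer

Summary: VIT: AN IMAGE IS WORTH 16X16 WORDS link TL;DR The first paper to use a pure Transformer for CV tasks. Method: The image is first split into multiple patches; each patch is turned into an embedding feature via a Linear Projection, using a Transformer Read more

posted @ 2024-03-05 23:46 fariver Views (127) Comments (0) Recommended (0)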
