Paper Title
Novel View Synthesis with Diffusion Models
Paper Authors
Paper Abstract
We present 3DiM, a diffusion model for 3D novel view synthesis, which is able to translate a single input view into consistent and sharp completions across many views. The core component of 3DiM is a pose-conditional image-to-image diffusion model, which takes a source view and its pose as inputs, and generates a novel view for a target pose as output. 3DiM can generate multiple views that are 3D consistent using a novel technique called stochastic conditioning. The output views are generated autoregressively, and during the generation of each novel view, one selects a random conditioning view from the set of available views at each denoising step. We demonstrate that stochastic conditioning significantly improves the 3D consistency of a naive sampler for an image-to-image diffusion model, which involves conditioning on a single fixed view. We compare 3DiM to prior work on the SRN ShapeNet dataset, demonstrating that 3DiM's generated completions from a single view achieve much higher fidelity, while being approximately 3D consistent. We also introduce a new evaluation methodology, 3D consistency scoring, to measure the 3D consistency of a generated object by training a neural field on the model's output views. 3DiM is geometry free, does not rely on hyper-networks or test-time optimization for novel view synthesis, and allows a single model to easily scale to a large number of scenes.
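The stochastic conditioning sampler described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: `toy_denoise_step` is a hypothetical stand-in for one reverse step of the trained pose-conditional model, poses are represented as plain numbers, and the sketch assumes the "set of available views" consists of the input view plus every view generated so far.

```python
import numpy as np


def toy_denoise_step(z, t, cond_view, cond_pose, target_pose):
    """Hypothetical placeholder for one reverse-diffusion step.

    A real model would run a trained network on (z, t, cond_view,
    cond_pose, target_pose). Here the sample is simply pulled toward
    the conditioning view so that the sketch runs end to end.
    """
    return z + (cond_view - z) / (t + 1)


def sample_views(input_view, input_pose, target_poses,
                 num_steps=8, denoise_step=toy_denoise_step, seed=0):
    """Generate one view per target pose, autoregressively.

    At every denoising step a conditioning view is drawn uniformly at
    random from the views available so far (assumed here to be the
    input view plus all previously generated views).
    """
    rng = np.random.default_rng(seed)
    views = [(input_view, input_pose)]  # available (image, pose) pairs
    picks = []                          # chosen indices, kept for inspection

    for target_pose in target_poses:
        z = rng.standard_normal(input_view.shape)  # start from pure noise
        frame_picks = []
        for t in reversed(range(num_steps)):
            # Stochastic conditioning: re-draw the conditioning view
            # at each denoising step instead of fixing a single view.
            k = int(rng.integers(len(views)))
            cond_view, cond_pose = views[k]
            z = denoise_step(z, t, cond_view, cond_pose, target_pose)
            frame_picks.append(k)
        views.append((z, target_pose))  # the new view becomes available
        picks.append(frame_picks)

    return views, picks
```

Calling `sample_views(np.zeros((4, 4, 3)), 0.0, [30.0, 60.0, 90.0])` returns four `(image, pose)` pairs, the input plus three generated views. The naive sampler mentioned in the abstract corresponds to always choosing `k = 0`, that is, conditioning on the single fixed input view.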
