Vibe Coding: From 2D Image to an HTML5 3D Heavy Equipment Simulator

Recently I came across some impressive AI-powered interactive 3D projects by Dilum Sanjaya on X. They immediately made me think of digital twins and construction-site scenarios: perhaps monitoring systems for large construction machinery could use the same workflow to build more immersive, real-time visualizations quickly with AI.

Exploration Direction

Taking a piece of SANY heavy equipment as the example, the idea was to pick a real product, combine its product images and manuals, and experiment with rapidly generating both the UI and the 3D interactive experience.

Workflow

Text to UI

Used GPT-IMAGE-2 to generate UI concepts and interface visuals.

Image to 3D

Tested three open-source Image-to-3D models:

  • TencentARC Pixal3D, https://huggingface.co/TencentARC/Pixal3D
  • Tencent Hunyuan3D-2, https://huggingface.co/tencent/Hunyuan3D-2
  • Microsoft TRELLIS.2-4B, https://huggingface.co/microsoft/TRELLIS.2-4B

After comparing the three, Hunyuan3D-2 delivered the best overall results.
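Beyond judging the results visually, one structural property of each output can be checked mechanically: how the exported GLB is organized. The sketch below assumes the generated model was saved as binary glTF (`.glb`) and follows the GLB layout from the glTF 2.0 specification (a 12-byte header, then a JSON chunk). `inspectGlb` is a hypothetical helper, not part of Pixal3D, Hunyuan3D-2 or TRELLIS.2.

```typescript
// Hypothetical helper: summarize the scene structure of a binary glTF 2.0 file
// by reading its 12-byte header and its first (JSON) chunk.

interface GlbSummary {
  version: number;
  meshCount: number;
  nodeCount: number;
  namedNodes: string[];
}

const GLB_MAGIC = 0x46546c67; // ASCII "glTF" read as a little-endian uint32
const CHUNK_JSON = 0x4e4f534a; // ASCII "JSON" read as a little-endian uint32

function inspectGlb(bytes: Uint8Array): GlbSummary {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  if (bytes.byteLength < 20 || view.getUint32(0, true) !== GLB_MAGIC) {
    throw new Error("Not a GLB file");
  }
  const version = view.getUint32(4, true);
  // Bytes 8..11 hold the total file length; the first chunk header starts at 12.
  const chunkLength = view.getUint32(12, true);
  const chunkType = view.getUint32(16, true);
  if (chunkType !== CHUNK_JSON) {
    throw new Error("First chunk is not JSON");
  }
  // The JSON chunk may be padded with trailing spaces, which JSON.parse ignores.
  const jsonText = new TextDecoder().decode(bytes.subarray(20, 20 + chunkLength));
  const gltf = JSON.parse(jsonText) as {
    meshes?: unknown[];
    nodes?: { name?: string }[];
  };
  const nodes = gltf.nodes ?? [];
  return {
    version,
    meshCount: gltf.meshes?.length ?? 0,
    nodeCount: nodes.length,
    namedNodes: nodes
      .map((n) => n.name)
      .filter((n): n is string => typeof n === "string"),
  };
}
```

In Node, `inspectGlb(new Uint8Array(readFileSync("model.glb")))` would print the summary for a file named `model.glb` (a placeholder name). A result with a single mesh on a single node has no separately addressable parts, which is the merged-mesh problem discussed in the conclusion.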

Vibe Coding

The development process combined both Gemini 3.5 and GPT-5.5:

  • Gemini 3.5
    Handled 3D GLB model interaction relatively well and generated usable interaction logic quickly, but the UI fidelity was not ideal.
  • GPT-5.5
    Reproduced the UI design far more faithfully and with better styling, but struggled with GLB model positioning, camera alignment, and spatial interaction. Even after multiple rounds of correction, the result was still inconsistent.

In the end, the approach that worked best was to have GPT-5.5 reference the 3D interaction code generated by Gemini 3.5, and to use GPT-5.5 primarily for UI and visual refinement.
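Much of the GLB positioning and camera alignment work comes down to a little geometry: move the model so its bounding box is centered on the origin, then back the camera off until the box fits inside the view frustum. The framework-agnostic sketch below assumes a perspective camera described by a vertical field of view in degrees and a viewport aspect ratio, plus an axis-aligned bounding box that has already been computed (in three.js it could come from `new THREE.Box3().setFromObject(model)`). `fitCameraToBox` and its 10% default margin are illustrative choices, not code from either model.

```typescript
type Vec3 = { x: number; y: number; z: number };

interface Aabb {
  min: Vec3;
  max: Vec3;
}

interface CameraFit {
  modelOffset: Vec3; // translation that moves the box center to the origin
  distance: number; // camera distance from the origin along its view axis
}

// Enclose the box in a sphere, then find the distance at which that sphere
// just touches the frustum planes of the narrower field of view.
function fitCameraToBox(box: Aabb, fovYDeg: number, aspect: number, margin = 1.1): CameraFit {
  const center: Vec3 = {
    x: (box.min.x + box.max.x) / 2,
    y: (box.min.y + box.max.y) / 2,
    z: (box.min.z + box.max.z) / 2,
  };
  const dx = box.max.x - box.min.x;
  const dy = box.max.y - box.min.y;
  const dz = box.max.z - box.min.z;
  const radius = Math.sqrt(dx * dx + dy * dy + dz * dz) / 2;

  const fovY = (fovYDeg * Math.PI) / 180;
  // Horizontal field of view derived from the vertical one and the aspect ratio.
  const fovX = 2 * Math.atan(Math.tan(fovY / 2) * aspect);
  const limitingFov = Math.min(fovX, fovY);
  // A sphere of radius r is tangent to the frustum planes at d = r / sin(fov / 2).
  const distance = (radius / Math.sin(limitingFov / 2)) * margin;

  return {
    modelOffset: { x: -center.x, y: -center.y, z: -center.z },
    distance,
  };
}
```

In a three.js scene, the result would be applied by translating the loaded model by `modelOffset`, placing the camera `distance` units from the origin along its viewing direction, and calling `camera.lookAt(0, 0, 0)`. Using a bounding sphere rather than the box itself is slightly conservative, but it keeps the model fully in view from any orbit angle.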

Conclusion

The biggest bottleneck is still the current state of Image-to-3D generation.

For real-world heavy machinery, there are still several major limitations:

  • Incomplete structural recognition of complex mechanical equipment
  • Rough material quality, especially for glass and metallic surfaces
  • Generated models are usually merged into a single mesh, making it difficult to animate or control individual components separately
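The merged-mesh limitation matters because machinery motion is hierarchical: rotating one member has to carry everything attached after it. The planar forward-kinematics sketch below illustrates this with a hypothetical excavator-style boom, arm and bucket chain; the link names, lengths and angles are made-up values, not taken from any SANY product. Each link needs its own transform, which is exactly what a single merged mesh cannot provide.

```typescript
// Hypothetical linkage description; all values are illustrative.
interface Link {
  name: string;
  length: number; // link length in metres
  angleDeg: number; // rotation relative to the parent link
}

interface JointTip {
  name: string;
  x: number;
  y: number;
}

// Planar forward kinematics: each link's world angle is the sum of its own
// angle and all of its parents' angles, so changing one joint moves every
// link attached after it.
function forwardKinematics(links: Link[], baseX = 0, baseY = 0): JointTip[] {
  const tips: JointTip[] = [];
  let x = baseX;
  let y = baseY;
  let worldAngle = 0;
  for (const link of links) {
    worldAngle += (link.angleDeg * Math.PI) / 180;
    x += link.length * Math.cos(worldAngle);
    y += link.length * Math.sin(worldAngle);
    tips.push({ name: link.name, x, y });
  }
  return tips;
}
```

With a boom of length 5 at 90°, an arm of length 3 at -90° and a bucket of length 1 at -90°, the bucket tip lands at (3, 4); changing only the boom angle to 0° moves it to (4, -3). In a three.js scene graph the same behavior comes from parenting separate `Object3D` nodes and setting each one's `rotation`, provided the model actually has separate parts to parent.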

Because of these limitations, it is still difficult to deploy this workflow directly in production business scenarios today.

That said, if high-quality 3D assets already exist, combining them with AI-powered vibe coding is already a very promising way to rapidly build interactive 3D monitoring platforms with real-time data visualization and modern UI experiences.
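For the real-time data side, one common pattern is to decouple the telemetry rate from the render loop: each incoming reading only sets a target, and every frame the displayed value eases toward it. The sketch below assumes a hypothetical boom-angle reading in degrees with an assumed valid range; `SmoothedValue` and its parameters are illustrative, not part of any SANY or three.js API. It uses frame-rate-independent exponential smoothing expressed as a half-life.

```typescript
// Eases a displayed value toward the latest sensor reading so the 3D model
// moves smoothly between (possibly infrequent) telemetry updates.
class SmoothedValue {
  private current: number;
  private target: number;
  private readonly min: number;
  private readonly max: number;
  private readonly halfLifeMs: number;

  constructor(initial: number, min: number, max: number, halfLifeMs: number) {
    this.min = min;
    this.max = max;
    this.halfLifeMs = halfLifeMs;
    this.current = this.clamp(initial);
    this.target = this.current;
  }

  // Reject out-of-range readings by clamping them to the assumed valid range.
  private clamp(value: number): number {
    return Math.min(this.max, Math.max(this.min, value));
  }

  // Call whenever a new reading arrives (e.g. from a WebSocket message handler).
  setTarget(value: number): void {
    this.target = this.clamp(value);
  }

  // Call once per rendered frame with the time elapsed since the previous frame.
  update(dtMs: number): number {
    // After each halfLifeMs of elapsed time, the remaining gap to the target halves.
    const alpha = 1 - Math.pow(0.5, dtMs / this.halfLifeMs);
    this.current += (this.target - this.current) * alpha;
    return this.current;
  }
}
```

Inside a `requestAnimationFrame` loop, the value returned by `update` could be converted to radians and assigned to the rotation of the corresponding model part. Expressing the smoothing as a half-life keeps the motion the same speed whether the page renders at 30 or 120 frames per second.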