
I recently came across some impressive AI-powered interactive 3D projects by Dilum Sanjaya on X. They immediately made me think of digital twins and construction sites: perhaps large-scale monitoring systems for construction machinery could use the same workflow to rapidly build more immersive, real-time visualization experiences with AI.
To try it, I picked a real piece of heavy equipment from SANY, gathered its product images and manuals, and experimented with rapidly generating both the UI and the interactive 3D experience.

I used GPT-IMAGE-2 to generate UI concepts and interface visuals.

I tested three open-source Image-to-3D models.
After comparison, Hunyuan3D delivered the best overall results.
The development process combined Gemini 3.5 and GPT-5.5.
In the end, the best approach was to let GPT-5.5 reference the 3D interaction code generated by Gemini 3.5, and to use GPT-5.5 primarily for UI and visual refinement.
The biggest bottleneck is still the current state of Image-to-3D generation.
For real-world heavy machinery, the generated models still have several major limitations.
Because of these limitations, it is still difficult to deploy this workflow directly in production-level business scenarios today.
That said, if high-quality 3D assets already exist, combining them with AI-powered vibe coding is a very promising way to rapidly build interactive 3D monitoring platforms with real-time data visualization and a modern UI.
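To make the "real-time data visualization" part more concrete, here is a minimal sketch of how live telemetry might be mapped onto the parts of an existing 3D asset. It is not the code from this experiment: the `TelemetryReading` shape, the part names, the thresholds, and the colors are all hypothetical, and the rendering step is deliberately left out.

```typescript
// Hypothetical telemetry sample for one machine part (all field names are assumptions).
interface TelemetryReading {
  partId: string;  // assumed to match a mesh/node name in the 3D asset
  value: number;   // e.g. a temperature or pressure reading
  warnAt: number;  // threshold at which the part is flagged as "warning"
  alarmAt: number; // threshold at which the part is flagged as "alarm"
}

type Status = "normal" | "warning" | "alarm";

// Illustrative hex colors for each status; any palette would do.
const STATUS_COLOR: Record<Status, number> = {
  normal: 0x2ecc71,
  warning: 0xf1c40f,
  alarm: 0xe74c3c,
};

// Classify one reading against its own thresholds.
function classify(r: TelemetryReading): Status {
  if (r.value >= r.alarmAt) return "alarm";
  if (r.value >= r.warnAt) return "warning";
  return "normal";
}

// Turn a batch of readings into a partId -> highlight color lookup
// that a 3D scene could apply to the matching meshes.
function toHighlights(readings: TelemetryReading[]): Map<string, number> {
  const out = new Map<string, number>();
  for (const r of readings) {
    out.set(r.partId, STATUS_COLOR[classify(r)]);
  }
  return out;
}

// Example batch with made-up values.
const highlights = toHighlights([
  { partId: "boom", value: 62, warnAt: 80, alarmAt: 95 },
  { partId: "engine", value: 97, warnAt: 80, alarmAt: 95 },
]);
```

In a real scene, each entry in `highlights` would be applied to the material of the mesh with the same name whenever a new batch of readings arrives. Keeping the classification separate from the renderer means the same logic can drive both the 3D view and the surrounding UI panels.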