How Rosie Works

A text prompt becomes a 3D world. Here's the architecture, the model choices, and the engineering behind it.

Rosie is the AI engine behind world creation in Highrise. A creator types a description — or uploads a reference image — and minutes later, a fully realized 3D environment appears. Here's how it works under the hood.

The pipeline

At a high level, Rosie takes a creative prompt through several stages:

  1. Intent understanding — We parse the creator's input to understand the type of world, the mood, the spatial layout, and key objects.
  2. Asset generation — Individual 3D assets (furniture, walls, decorations) are generated or selected from our library based on the understood intent.
  3. Scene composition — Assets are arranged in a coherent spatial layout that respects both aesthetics and Highrise's technical constraints.
  4. Refinement — Lighting, textures, and final touches are applied to make the world feel polished.
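
To make the flow concrete, here is a minimal sketch of the four stages chained together. It is an illustration only, not Rosie's actual code: every name in it (Intent, Asset, World, build_world, the LIBRARY set) is hypothetical, the keyword scan stands in for a language model, and the one-row layout stands in for the real composition system.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Intent:
    world_type: str
    mood: str
    key_objects: list[str]


@dataclass
class Asset:
    name: str
    source: str  # "library" or "generated"


@dataclass
class Placement:
    asset: Asset
    x: float
    y: float


@dataclass
class World:
    intent: Intent
    placements: list[Placement]
    lighting: str = "unset"


# Hypothetical stand-in for an existing asset library.
LIBRARY = {"sofa", "lamp", "wall"}
# Hypothetical vocabulary of objects the toy parser recognizes.
KNOWN_OBJECTS = {"sofa", "lamp", "fountain"}


def understand_intent(prompt: str) -> Intent:
    """Stage 1: a keyword scan standing in for a language model."""
    words = prompt.lower().replace(",", " ").split()
    mood = "cozy" if "cozy" in words else "neutral"
    world_type = "cafe" if "cafe" in words else "room"
    key_objects = [w for w in words if w in KNOWN_OBJECTS]
    return Intent(world_type, mood, key_objects)


def select_or_generate_assets(intent: Intent) -> list[Asset]:
    """Stage 2: reuse a library asset when one exists, else mark it for generation."""
    return [
        Asset(name, "library" if name in LIBRARY else "generated")
        for name in intent.key_objects
    ]


def compose_scene(intent: Intent, assets: list[Asset]) -> World:
    """Stage 3: a toy layout that places assets in a row, two units apart."""
    placements = [Placement(a, x=i * 2.0, y=0.0) for i, a in enumerate(assets)]
    return World(intent, placements)


def refine(world: World) -> World:
    """Stage 4: pick lighting from the mood as a stand-in for final polish."""
    world.lighting = "warm" if world.intent.mood == "cozy" else "daylight"
    return world


def build_world(prompt: str) -> World:
    """Run the four stages in order, each consuming the previous stage's output."""
    intent = understand_intent(prompt)
    assets = select_or_generate_assets(intent)
    world = compose_scene(intent, assets)
    return refine(world)


if __name__ == "__main__":
    world = build_world("A cozy cafe with a sofa, a lamp and a fountain")
    for p in world.placements:
        print(p.asset.name, p.asset.source, p.x)
    print("lighting:", world.lighting)
```

The point of the sketch is the shape: each stage has a narrow input and output, so any one of them could be swapped for a different model without touching the others.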

Model choices

We use a combination of proprietary and open models. The intent understanding layer leverages large language models for their reasoning capabilities. Asset generation uses diffusion-based approaches fine-tuned on Highrise's art style. Scene composition is a custom system trained on millions of existing Highrise worlds.

The goal isn't to impress AI researchers. It's to make a 14-year-old feel like a professional world designer.

Engineering constraints

Highrise runs on mobile devices, so every world Rosie generates has to work within the limits of mobile hardware.

These constraints actually make the problem more interesting, not less. Unconstrained generation is easy. Generation within tight, real-world limits is where the engineering challenge lives.

What's next for Rosie

We're working on real-time collaborative editing with AI assistance, multi-room world generation, and better creator controls. More on all of that soon.