2026-02-24

How Rosie Works

A text prompt becomes a 3D world. Here's the architecture, the model choices, and the engineering behind it.

Rosie is the AI engine behind world creation in Highrise. A creator types a description or uploads a reference image, and minutes later a fully realized 3D environment appears. This post walks through what happens in between.

The pipeline

At a high level, Rosie takes a creative prompt through four stages:

Intent understanding — We parse the creator's input to understand the type of world, the mood, the spatial layout, and key objects.
Asset generation — Individual 3D assets (furniture, walls, decorations) are generated or selected from our library based on the understood intent.
Scene composition — Assets are arranged in a coherent spatial layout that respects both aesthetics and Highrise's technical constraints.
Refinement — Lighting, textures, and final touches are applied to make the world feel polished.
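
To make the flow concrete, here is a toy sketch of the four stages chained together. It is not Rosie's actual code: every name in it (Intent, Asset, Scene, LIBRARY, build_world and the stage functions) is hypothetical, keyword matching stands in for the language model, and a simple grid stands in for real scene composition.

```python
from dataclasses import dataclass, field


@dataclass
class Intent:
    world_type: str
    mood: str
    key_objects: list[str]


@dataclass
class Asset:
    name: str
    source: str  # "library" or "generated"


@dataclass
class Scene:
    placements: dict[str, tuple[int, int]] = field(default_factory=dict)
    lighting: str = "unset"


# Hypothetical stand-in for an asset library; not Rosie's real catalogue.
LIBRARY = {"sofa", "lamp", "table"}


def understand_intent(prompt: str) -> Intent:
    # Stage 1 (toy version): keyword matching stands in for an LLM call.
    words = prompt.lower().replace(",", " ").split()
    mood = "cozy" if "cozy" in words else "neutral"
    world_type = "cafe" if "cafe" in words else "room"
    known = ["sofa", "lamp", "table", "fountain"]
    return Intent(world_type, mood, [w for w in known if w in words])


def gather_assets(intent: Intent) -> list[Asset]:
    # Stage 2: reuse library assets where possible, otherwise mark for generation.
    return [
        Asset(name, "library" if name in LIBRARY else "generated")
        for name in intent.key_objects
    ]


def compose_scene(assets: list[Asset], width: int = 4) -> Scene:
    # Stage 3: place assets on a simple grid, left to right, row by row.
    scene = Scene()
    for i, asset in enumerate(assets):
        scene.placements[asset.name] = (i % width, i // width)
    return scene


def refine(scene: Scene, intent: Intent) -> Scene:
    # Stage 4: choose lighting from the mood.
    scene.lighting = "warm" if intent.mood == "cozy" else "daylight"
    return scene


def build_world(prompt: str) -> Scene:
    # Run the four stages in order and return the finished scene.
    intent = understand_intent(prompt)
    assets = gather_assets(intent)
    return refine(compose_scene(assets), intent)
```

Calling build_world("a cozy cafe with a sofa, a lamp and a fountain") walks the prompt through all four stages. The point of the sketch is the shape: each stage has a narrow input and output, so any one of them can be swapped out without touching the others.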

Model choices

We use a combination of proprietary and open models. The intent understanding layer leverages large language models for their reasoning capabilities. Asset generation uses diffusion-based approaches fine-tuned on Highrise's art style. Scene composition is a custom system trained on millions of existing Highrise worlds.

The goal isn't to impress AI researchers. It's to make a 14-year-old feel like a professional world designer.

Engineering constraints

Highrise runs on mobile devices. That means every world Rosie generates needs to:

Load quickly on low-end hardware
Stay within polygon and texture budgets
Feel consistent with the Highrise art style
Be editable by the creator after generation
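
The budget requirement is the easiest of these to picture in code. Below is a minimal sketch of a budget check, assuming a world can be summarized as per-asset triangle counts and texture sizes. The Budget and AssetStats types, the check_budget function, and any numbers used with them are illustrative assumptions, not Highrise's real limits or Rosie's real validator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Budget:
    max_triangles: int
    max_texture_bytes: int


@dataclass(frozen=True)
class AssetStats:
    name: str
    triangles: int
    texture_bytes: int


def check_budget(assets, budget):
    """Return a list of human-readable violations; empty means the world fits."""
    total_tris = sum(a.triangles for a in assets)
    total_tex = sum(a.texture_bytes for a in assets)
    problems = []
    if total_tris > budget.max_triangles:
        problems.append(f"triangles: {total_tris} > {budget.max_triangles}")
    if total_tex > budget.max_texture_bytes:
        problems.append(f"texture bytes: {total_tex} > {budget.max_texture_bytes}")
    return problems
```

A check like this would run after scene composition. Returning a list of violations, rather than a single pass/fail flag, lets a later step decide what to simplify.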

These constraints make the problem more interesting, not less. Unconstrained generation is the easy part. Generation within tight, real-world limits is where the engineering challenge lives.

What's next for Rosie

We're working on real-time collaborative editing with AI assistance, multi-room world generation, and better creator controls. More on all of that soon.