People are busy, and large meetings are hard to set up and time-consuming. Certainly, there must be a better way to collaborate with today’s technologies. Enter Visual Vocal, a new AR/VR software technology company that aims to streamline AEC collaboration.
“Our investors refer to us as the Google Docs of AR,” says John SanGiovanni, CEO and co-founder of Visual Vocal, “because we are a lightweight, cloud-based AR/VR communication platform.” It is a platform that could dramatically change the way people in AEC and other industries collaborate and communicate. SanGiovanni is telling me about the advantages of Visual Vocal over other VR/AR options on the market while leading me through a demo on my iPhone 6, which I popped into the Google Cardboard device I conveniently keep near my desk.
Our investors refer to us as the Google Docs of AR because we are a lightweight, cloud-based AR/VR communications platform.
“The best way to think of Visual Vocal is we are this general purpose document and communication platform for a new future of AR and VR,” he adds. And Visual Vocal’s technology does not require an expensive VR headset to work—you use your iPhone or Android smartphone. “Everything we do works at a perfect frame-rate over cellular—our system is optimized for cellular,” SanGiovanni says. Their goal, he explains, is to democratize AR and VR, and the best way to do that is to build the system around the device everybody already has: a smartphone.
Incubated at NBBJ—A Different Kind of AR/VR Company
Visual Vocal is a different kind of AR/VR software company, and this difference goes beyond its product vision. The company was founded in 2015 in partnership with global architectural leader NBBJ, which helped incubate the startup and served as a critical real-world context for use-case-based development.
In the fall of 2017 Visual Vocal raised $3.6 million in a seed fundraising round led by Eniac Ventures, a VC firm that has also funded well-known innovators like Airbnb, brightwheel, SoundCloud, and ELEVATE. The Eniac team believes that VR, AR and AI technologies—all emerging technologies (emTech) in many respects—will “push the boundaries on how businesses operate, collaborate and go to market in the next five to 10 years.”
John SanGiovanni worked at Microsoft Research for many years, where he was responsible for worldwide research in advanced UI, mobile devices, and AR. He has also previously developed successful mobile apps with co-founder and CTO Sean B. House, who has deep knowledge of the entire technology stack behind Visual Vocal.
I had this insight that visualization was kind of table stakes and also not the most interesting or difficult thing to do with VR, so my co-founder and I decided to attack a much harder challenge in communication.
NBBJ’s role in the venture was to serve as an incubator and to use its real-life projects for testing and refining the technology. “I had this insight that visualization was kind of table stakes and also not the most interesting or difficult thing to do with VR,” said SanGiovanni, “so my co-founder and I decided to attack a much harder challenge in communication.” But to do that well, the pair needed a third co-founder who could clearly articulate the day-to-day tactical needs of a very large architectural enterprise doing very large projects—“the sort of multi-billion dollar construction projects.”
“Fortunately we were here in Seattle, and we got introduced through a mutual friend to Steve McConnell [NBBJ Managing Partner],” SanGiovanni recalls. In a 2016 press release, McConnell stated that the firm’s decision to launch Visual Vocal was representative of its “ongoing mission to find more informative and inspiring ways to engage clients in the design process.” NBBJ found that Visual Vocal radically shifted the way “design feedback was sourced and integrated into projects.”
“We sit right here inside the NBBJ offices [in Seattle], and it has been an amazing way for us to collaborate with a wide array of other architecture firms,” says SanGiovanni. He adds that the customer base now extends far beyond NBBJ and beyond the architecture vertical altogether. “In fact, at this moment,” he notes, “most of our revenue comes from the construction vertical.”
Collaboration at a Distance—Cloud-based and Multiparty
As John SanGiovanni leads me through a demo, I realize that what sets Visual Vocal apart from the competition is largely encapsulated in its product name. The word “vocal” is significant throughout the app, from its R2D2-like sonic pairing technology to the voice messages tied to annotations on VR/AR imagery. Visual Vocal uses sonic pairing technology from its partner Chirp, which transmits a small amount of data over sound. The robotic-sounding signal pairs you to a meeting session, so there is no need to enter one of those nine-digit codes to get started with collaboration. We all know what a pain those are.
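Visual Vocal has not published the details of Chirp’s encoding, but as a rough illustration of the data-over-sound idea, the Python sketch below maps a made-up session code onto a short sequence of audio tones, one tone per character. Every frequency, alphabet, code, and file name here is an assumption for illustration, not Chirp’s actual protocol.

```python
import numpy as np
from scipy.io import wavfile

SAMPLE_RATE = 44100
TONE_SECONDS = 0.12
BASE_FREQ = 1760.0   # Hz -- hypothetical, not Chirp's real frequency plan
FREQ_STEP = 120.0    # spacing between symbol tones (also hypothetical)
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"

def tone(freq, seconds=TONE_SECONDS, rate=SAMPLE_RATE):
    """Generate one sine tone with short fades to avoid clicks."""
    t = np.linspace(0, seconds, int(rate * seconds), endpoint=False)
    samples = np.sin(2 * np.pi * freq * t)
    fade = min(len(samples) // 10, 500)
    envelope = np.ones_like(samples)
    envelope[:fade] = np.linspace(0, 1, fade)
    envelope[-fade:] = np.linspace(1, 0, fade)
    return samples * envelope

def encode_session_code(code):
    """Map each character of a short session code to its own tone."""
    tones = [tone(BASE_FREQ + FREQ_STEP * ALPHABET.index(c)) for c in code.lower()]
    return np.concatenate(tones)

if __name__ == "__main__":
    audio = encode_session_code("vv42x")  # made-up session code
    wavfile.write("pairing_tone.wav", SAMPLE_RATE, (audio * 32767).astype(np.int16))
```

A real system like Chirp’s adds error correction and robust detection on the listening side; the point of the sketch is simply that a short identifier can be carried by audible tones and picked up by any phone in the room.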
We sit right here inside of the NBBJ offices [in Seattle], and it has been an amazing way for us to collaborate with a wide array of other architecture firms.
Once inside a Visual Vocal session, the app uses its multi-user messaging technology to support up to 20 people simultaneously. Each person is assigned a color, so any markup on the VR/AR imagery clearly indicates who made it. Because the app was designed to be lightweight, there is no need to hunt for a WiFi signal at a job site; it works just fine over a cellular connection.
Visual Vocal is designed for smartphones, ideally with the phone placed into a Google Cardboard viewer. You can also use it Pokemon Go style, without the app splitting the screen into left and right images for stereo viewing. This suggests, I believe, that you could use the app on an iPad just as well, but we didn’t test that.
SanGiovanni has loaded a skyscraper project into the Architosh demo. He is showing me how to annotate, leave voice messages, and teleport to other areas of the building. “You can draw too with your own color using your Cardboard, just hold down the button [on the Cardboard] and move your head around,” he says. Using your head to draw is an interesting exercise in neck muscle control, but it is workable.
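Head-drawing of this kind boils down to sampling the head-orientation ray while the Cardboard button is held and projecting each sample onto the panorama sphere. The minimal Python sketch below illustrates that pattern; the class, names, and units are assumptions, not Visual Vocal’s implementation.

```python
import math

SPHERE_RADIUS = 10.0  # assumed radius of the panorama sphere the stroke is drawn on

class HeadStroke:
    """Collect a 3D polyline by sampling the gaze ray while the button is held."""

    def __init__(self, color):
        self.color = color   # the participant's assigned markup color
        self.points = []

    def sample(self, yaw_deg, pitch_deg, button_down):
        """Project the current head orientation onto the sphere and record it."""
        if not button_down:
            return
        yaw, pitch = math.radians(yaw_deg), math.radians(pitch_deg)
        x = SPHERE_RADIUS * math.cos(pitch) * math.sin(yaw)
        y = SPHERE_RADIUS * math.sin(pitch)
        z = -SPHERE_RADIUS * math.cos(pitch) * math.cos(yaw)
        self.points.append((x, y, z))

# Feed head orientation once per frame while the Cardboard button is held down.
stroke = HeadStroke(color="#2d9cdb")
stroke.sample(yaw_deg=12.0, pitch_deg=-3.5, button_down=True)
stroke.sample(yaw_deg=12.8, pitch_deg=-3.1, button_down=True)
print(len(stroke.points), "stroke points recorded")
```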
The signature features in Visual Vocal have more to do with how you can leave voice messages attached to annotation moments. “I am a huge fan of asynchronous communication,” he chimes in, “because often times on large projects it is death by meetings that take a long time to coordinate and schedule.” SaaS software, in general, is philosophically anti-meeting—that’s the whole point of tools like Asana and, originally, 37signals’ Basecamp. Remoteness is another issue the cloud addresses and a place where Visual Vocal shines. Collaboration at a distance is critical for firms like NBBJ doing projects all over the world, but it also accommodates people’s busy, non-aligning schedules.
SanGiovanni quickly shows me how easy it is for him to record a voice note inside VR and then send it to me as if it were an email. Visual Vocal has something called the Visual Vocal (VV) Inbox. There I received his message and opened it, and it took me straight to the spot in the building he wanted to talk about. This works whether an architect sends a collaborating engineer or client a view inside a BIM model or a general contractor sends an architect a view taken from a construction site. So how does that process actually work?
Getting It Done in Visual Vocal
Visual Vocal works with standard stereo panoramic imagery: stereo spherical maps and stereo cube maps. There are a few different types of these, but 3D rendering packages like Autodesk 3ds Max and Chaos Group’s V-Ray easily produce them as render options. You can also render Revit BIM models via Autodesk 360 cloud rendering, which produces a stereo cube map. The images are saved as JPEG files, which you then load into Visual Vocal in the cloud.
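The exact upload format is Visual Vocal’s own, but assuming a stereo cube map arrives as twelve JPEG faces (six per eye), a small pre-upload sanity check might look like the hypothetical Python sketch below, which verifies that every face exists, is square, and shares one size. The file-naming convention and folder are invented for illustration.

```python
from pathlib import Path
from PIL import Image  # Pillow

# Hypothetical naming convention: left_px.jpg, left_nx.jpg, ..., right_nz.jpg
EYES = ("left", "right")
FACES = ("px", "nx", "py", "ny", "pz", "nz")

def check_stereo_cubemap(folder):
    """Verify all twelve cube-map faces exist, are square, and share one size."""
    sizes = set()
    for eye in EYES:
        for face in FACES:
            path = Path(folder) / f"{eye}_{face}.jpg"
            if not path.exists():
                raise FileNotFoundError(f"Missing cube-map face: {path.name}")
            with Image.open(path) as img:
                width, height = img.size
            if width != height:
                raise ValueError(f"{path.name} is not square ({width}x{height})")
            sizes.add((width, height))
    if len(sizes) != 1:
        raise ValueError(f"Faces have inconsistent sizes: {sizes}")
    return sizes.pop()

if __name__ == "__main__":
    width, height = check_stereo_cubemap("renders/atrium_option_a")  # made-up folder
    print(f"Cube map looks good; each face is {width} x {height} px")
```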
Rendering quality is not limited by Visual Vocal. The app works differently than other VR solutions we’ve discussed here on Architosh: you can render a lower-quality set of images or a high-quality V-Ray Next render, depending on your needs and desires.
Another method applies to existing environments, whether completed or under construction. Using your smartphone again, simply produce a spherical image using something like Google’s free Cardboard Camera app, which creates the correct files for Visual Vocal. For higher quality and perfect spherical panos, Google and Lenovo make cameras for taking them on tripods—good technology worth investing in for AEC pros working in the field.
Inside the cloud-based app, users can upload their images and create hot spots for teleporting from one image to another. With a series of images taken from different vantage points around the building, Visual Vocal attendees can experience a virtual site visit, teleporting around the building by simply aiming the Cardboard device at a hot spot in a scene. There is no need to click anything; keeping your gaze on the teleport square takes you there (think mouse-over effects in web browsers).
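This gaze-dwell pattern is simple to reason about: a hot spot fires once the gaze direction has stayed within its angular radius for a set dwell time. The short Python sketch below illustrates the idea; the class name, radius, and dwell threshold are assumptions, not Visual Vocal’s actual values.

```python
import math
import time

DWELL_SECONDS = 1.0        # how long the gaze must stay on a hot spot (assumed)
HOTSPOT_RADIUS_DEG = 5.0   # angular size of a hot spot (assumed)

class Hotspot:
    def __init__(self, name, direction):
        self.name = name
        # Unit vector from the viewer toward the hot spot in the panorama.
        length = math.sqrt(sum(c * c for c in direction))
        self.direction = tuple(c / length for c in direction)
        self._gaze_started = None

    def _angle_to(self, gaze):
        """Angle in degrees between the gaze direction and this hot spot."""
        dot = sum(a * b for a, b in zip(self.direction, gaze))
        return math.degrees(math.acos(max(-1.0, min(1.0, dot))))

    def update(self, gaze, now):
        """Call once per frame; returns True when the dwell time is reached."""
        if self._angle_to(gaze) <= HOTSPOT_RADIUS_DEG:
            if self._gaze_started is None:
                self._gaze_started = now
            return now - self._gaze_started >= DWELL_SECONDS
        self._gaze_started = None
        return False

# Usage: feed the per-frame gaze vector; teleport when update() returns True.
atrium = Hotspot("atrium", direction=(0.0, 0.0, -1.0))
if atrium.update(gaze=(0.02, 0.01, -0.999), now=time.monotonic()):
    print("Teleporting to:", atrium.name)
```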
Looking Down the Road
Visual Vocal is still benefitting from its work with NBBJ. The tower project shown in the Architosh demo was an NBBJ project. “They are working with a very expensive acoustical diffuser material for the atrium,” he says, “and the way the content is generated they push out from Revit into 3ds Max, add cinematic quality lighting, and then bake a rendering from this perspective without the screen, then one with the screen.” Using the hot spot technology, design options like these can be built right into the Visual Vocal experience for optioneering. “Just moving the Cardboard with your head you can experience the two different variations,” he says.
I’m a huge fan of asynchronous communication because often times on large projects it is death by meetings that take a long time to coordinate and schedule.
SanGiovanni says firms like NBBJ, HOK, Perkins + Will, and others are using Visual Vocal for architectural optioneering, while construction companies and other industries are using it for safety and training.
As the technologies behind general 3D rendering get better, the imagery will get better and faster. The cloud continues to get faster, but more importantly, this app is built for cellular connections, and the near future is 5G. That means Visual Vocal will gain even more power and speed on faster networks, empowering AEC workers across building sites, offices, and FM (facility management) locations. The ability to attach data to hot spots means facility managers can provide useful data for maintaining building equipment, as well as a kind of X-ray vision: the ability to overlay 3D BIM data on imagery of the actual building.
“Some firms are very sophisticated already with AR and VR,” he says. “So those users want to dig in and hack around a bit. So we just launched what we call Visual Vocal for Unity. This is a legit architectural VR SDK, which means someone can build their own VR/AR app that is branded around their firm and can reflect their visual language, branding, et cetera but has all the Visual Vocal features.”
The company will charge those users a per-app yearly fee. Users of the regular Visual Vocal app are charged per user per year, just like a typical SaaS tool. However, no one is charged to join a Visual Vocal meeting session—only those who create and host them. This makes Visual Vocal an inclusive tool that anyone with a smartphone can enjoy.
Today, AR and VR in AEC are used in disciplinary silos—architects using them with their clients or internally for design review. But Visual Vocal’s real-time capacity for up to 20 users on ordinary smartphones makes AR and VR far more democratic. There is no need to pass a VR headset around, taking turns looking at a scene. Moreover, its asynchronous voice-messaging technology means AEC stakeholders can weave in and out of collaborative AR/VR sessions on their own schedules, offering powerful flexibility. It’s this combination of features that lends Visual Vocal the disruptor status it deserves.