Introduction

We, the former members of the Beam Research Team, held a competition to investigate generative AI for making images. When discussing the criteria, we agreed that realistic outdoor scenes would be more challenging than indoor scenes, and that achieving spatial consistency between images would be difficult.

The Challenge

Create photo-realistic images of an outdoor scene. You must generate at least 2 views of the same scene.

Rules

  1. Images must be AI generated; use any generator you like
  2. You must submit your prompt and which generator was used
  3. Prompts must be text; no images or other data; you cannot instruct a generator to download an image in your prompt
  4. You may submit up to 5 entries, each with 2 or more views of the scene

You may chain generators together and use images in intermediate stages, e.g. generate an image from a text prompt and pass that output image to another generator, as long as you don't input anything other than text. You may also provide an additional text prompt to the next generator.
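To make the chaining rule concrete, here is a minimal Python sketch. Everything in it is hypothetical: `Image`, `run_chain` and `fake_generator` are illustrative names, and the stub stands in for a real generator rather than calling any actual service. The point it shows is that the only external inputs are text prompts, while each later stage may also receive the previous stage's output image.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Image:
    """Hypothetical placeholder for generated image data."""
    source: str       # name of the generator that produced it
    description: str  # stand-in for the pixel data


# A stage takes a text prompt and, optionally, the previous stage's image.
Stage = Callable[[str, Optional[Image]], Image]


def run_chain(stages: list[tuple[Stage, str]]) -> list[Image]:
    """Run the stages in order; nothing but text enters from outside."""
    outputs: list[Image] = []
    previous: Optional[Image] = None
    for index, (generate, prompt) in enumerate(stages):
        if index == 0 and not prompt.strip():
            raise ValueError("the first stage must start from a text prompt")
        # Later stages may reuse the intermediate image plus extra text.
        previous = generate(prompt, previous)
        outputs.append(previous)
    return outputs


def fake_generator(name: str) -> Stage:
    """Build a stub generator (assumption: real generators are not called)."""
    def generate(prompt: str, image: Optional[Image] = None) -> Image:
        basis = "text only" if image is None else f"image from {image.source}"
        return Image(source=name, description=f"{prompt} [{basis}]")
    return generate


views = run_chain([
    (fake_generator("A"), "first view of the scene"),
    (fake_generator("B"), "second view of the same scene"),
])
```

Here the second stub receives the first stub's output, which mirrors an entry where one generator produces the first view and another is asked for a second viewpoint.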

Results

We held a vote to decide whose images were the best, with the restriction that you could not vote for your own images.
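As an illustration of the voting restriction, the following Python sketch tallies one vote per person and discards any vote cast for the voter's own entry. It is not the procedure we actually used: the function name `tally_votes`, the data layout, and the placeholder names and ballots are all assumptions made up for the example.

```python
from collections import Counter


def tally_votes(ballots: dict[str, str], owners: dict[str, str]) -> list[tuple[str, int]]:
    """Count votes per entry, skipping votes for the voter's own entry.

    ballots maps voter -> entry voted for; owners maps entry -> entrant.
    """
    counts: Counter[str] = Counter()
    for voter, entry in ballots.items():
        if owners.get(entry) == voter:
            continue  # self-votes are not allowed, so they are ignored
        counts[entry] += 1
    return counts.most_common()


# Placeholder data, not the real competition ballots.
owners = {"entry_a": "alice", "entry_b": "bob"}
ballots = {
    "alice": "entry_b",
    "bob": "entry_b",    # bob voting for his own entry is discarded
    "carol": "entry_a",
    "dave": "entry_b",
}
ranking = tally_votes(ballots, owners)
# ranking == [("entry_b", 2), ("entry_a", 1)]
```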

Winner

Mountain biking

Prompt: Generate an image of a downhill mountain bike race in the french Alps. The image should be photorealistic, containing one cyclist descending a steep, rocky track. There should be clear foreground/background separation in the image. The lighting should be flat, as if the weather is overcast. There should not be any lens distortion effects in the image.

Mountain biking

Prompt: Generate a second view of this scene, taken from approximately 30cm to the right. Scene geometry must be consistent between the two images. Maintain camera parameters from the input image.

Generator: Gemini

Entrant: Ben Leslie

Runner Up

Taj Mahal

Prompt: Can you create a photo realistic stereo pair image of the Taj Mahal, with a long baseline and do not include the camera in the image

Generator: Gemini

Entrant: Lyndon Hill

Entries

Arc de Triomphe

Prompt:

Generator: Gemini

Entrant: Lyndon Hill


Triathlon Swimming

Prompt: A photo realistic scene of an open-water swimming race (like a triathlon). The scene shows 5 swimmers and 2 supporting canoeists paddling alongside the swimmers. The water is quite rough and choppy but you can still see the people clearly. The camera is observing the scene from the top, a bird’s eye view of what you could get with a drone.

Triathlon Swimming

Prompt: Please see Eduardo's blog post on GenAI Multiple View Consistency. In summary, use a ChatGPT description of the image on the left, then ask Gemini to generate another viewpoint.

Generator: Gemini

Entrant: Eduardo Ruiz-Libreros


Eiffel Tower

Prompt: Create an image of two photos of the eiffel tower on the same day. One should be from the front (north), one from the side (east). The first image should show the whole tower. The second should show a closer view. The two images should be laid out side-by-side.

Generator: Gemini

Entrant: James Ross


Arc de Triomphe

Prompt:

Generator: Gemini

Entrant: Lyndon Hill


Arc de Triomphe

Prompt: A stereo photograph of the Taj Mahal; using the left-right image style of the reference image. The left and right views should have a wide baseline. Make any clouds consistent between the two views. + using this reference image

Generator: Firefly

Entrant: Lyndon Hill


Mexican Market
Mexican Market

Prompt: Create a photo-realistic photograph of a bustling outdoor Mexican flea market ("tianguis") during the Day of the Dead celebrations in late afternoon golden light. Captured with a stereo camera 50mm lenses, rich depth of field. I would like left and right images to be joined. Right camera coordinate system is perfectly horizontally aligned with left one with a separation of 30 cm to the right.

Generator: Gemini

Entrant: Abel Pacheco-Ortega


Oak Tree

Prompt: Generate a photo-realistic scene of an oak tree, in the countryside of Mexico. The tree is the only one in a grass field. I need two images from the same scene taken from different perspective exactly at the same time (no that different perspective). Give me an image that contains both perspectives as if they were a stereo pair

Generator: ChatGPT

Entrant: Abel Pacheco-Ortega


Mexican Market

Prompt: Create a photo-realistic photograph of a bustling outdoor Mexican flea market ("tianguis") during the Day of the Dead celebrations in late afternoon golden light. Captured with a stereo camera 50mm lenses, rich depth of field. I would like left and right images to be joined. Right camera coordinate system is perfectly horizontally aligned with left one with a separation of 30 cm to the right.

Generator: ChatGPT

Entrant: Abel Pacheco-Ortega


Swiss Lake

Prompt: Create a photo realistic image of a chess board set up ready to play on a small round red table on the end of a small pier by a lake in Switzerland. The water is calm and the sky has no clouds.

Swiss Lake

Prompt:

  • Now create a view from the opposite side, there is a small village of mostly wooden buildings in the background. One of the buildings has a clock.
  • That is not the opposite direction, the view should be behind the black chess pieces and the direction of the view should be back to shore; also the pier should connect to the shore.
  • OK, can we keep the same image but change the actual chessboard to the same style as the first image where it was made from wood
  • Can you keep the scene the same but make the board a little smaller so that it doesn't overhang the table; also remove the letters around the edge of the board and make the board so that it doesn't have a fold line down the middle.

Generator: Gemini

Entrant: Lyndon Hill