Aaron Gadberry


Pop up Photographs (Automatic 2D -> 3D Scene Creation)

26th January 2006 - By Paradochs

Pop up Photographs (P.u.P.)
Travis Gadberry
Christopher Skelton
CPSC 641(485) – Dr. John Keyser
December 13, 2005


Our original plan was to implement our own version of the paper “Automatic Photo Pop-up” from the 2005 SIGGRAPH proceedings. We researched ourselves into a corner, realizing that we had taken on a project that would take many more months to complete than the time we had available. We decided, then, to plan out and research what we would have done if we had more time. This paper explains the approach that we took in our attempt to implement the paper. Even the simplified version that we had planned turned out to be too much to tackle in one semester.


The user loads the image by passing the image filename as a parameter to the executable file. The program then reads the file and sets the height and width accordingly. The image is then broken into regions, each with a region code. Each region makes up one polygon of the final scene. Once all the regions are created and coded, the region map is created. This is an image with the same dimensions as the original that contains the regions scan converted using a parallel view. Each region code has a different color: 0 = red, 1 = blue, and 2 = green. The points of each region are then given world coordinates based on the pixels vertically below them in the region map. Once all the regions are converted to world space, they are drawn in the final scene and texture mapped. The user can then use the fly-through ability to view the scene from any angle.

Loading an Image

Since we chose OpenGL to develop this project, loading a picture became an issue. We chose to load the image for processing and display as a texture and paste it onto a quad. Most pictures are 3×5 or 4×6, which is a problem because OpenGL, without extensions, only supports textures whose dimensions are powers of two. So the problem was finding a way to load these pictures. You can “pad” a picture by adding extra data until it reaches a supported size, but we chose not to do that and instead looked for an image loading library.
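A minimal sketch of the padding option mentioned above, assuming the texture must have power-of-two dimensions and that the picture is placed in one corner of the padded texture. The function names and the 600 by 400 example size are hypothetical and are not taken from the project.

```python
def next_power_of_two(n):
    """Smallest power of two that is greater than or equal to n."""
    if n < 1:
        raise ValueError("dimension must be positive")
    p = 1
    while p < n:
        p *= 2
    return p


def padded_texture_layout(width, height):
    """Return (tex_w, tex_h, max_u, max_v) for a padded texture.

    Assumption: the picture occupies the corner at texture coordinate
    (0, 0), so the quad should be mapped from (0, 0) to (max_u, max_v)
    to hide the padding.
    """
    tex_w = next_power_of_two(width)
    tex_h = next_power_of_two(height)
    return tex_w, tex_h, width / tex_w, height / tex_h


tex_w, tex_h, max_u, max_v = padded_texture_layout(600, 400)
print(tex_w, tex_h, max_u, max_v)
```

The cost of this approach is wasted texture memory: in the example, well under half of the padded texture holds actual picture data.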

There is an open source image loading library called DevIL (DEVelopers Image Library) that supports loading images of arbitrary dimensions as textures, in a wide variety of image formats. This seemed to be the golden ticket. Not only did it take care of the texture dimension problem, but we also no longer had to worry about writing our own code to load a specific file format. There were a few problems (mainly with VS .NET not knowing where to look for the library files at compile and link time), but in the end it was up and running.

Happy with DevIL, but still curious to see what else was out there, we kept digging around. We discovered that OpenGL has extensions of its own that allow textures with non-power-of-two dimensions, provided your video card is recent enough to support them. They let you use your OpenGL code exactly as you would for power-of-two textures, except that you specify the actual dimensions when creating your texture (in glTexImage2D). The only problem was that we would have to write our own code to load an image. This is easy enough for 24-bit bitmaps, since the code is fairly straightforward, but bitmaps take up a lot of space and we had hoped to use JPEGs or some other compressed format. That is why DevIL is so wonderful: it supports those.

Creating Regions Manually

The most effective way to manually input the regions was to use a mouse capture function. Using OpenGL, we displayed the picture in a parallel view to the screen. We were able to save the points the mouse moved over while the button was held down, using built-in functions of OpenGL. Every time another region is input using the mouse, the user is prompted to input the code for that region. A code of zero means that the region is flat, like the ground in the front of a picture. A code of one means that the region should be displayed vertically and usually indicates buildings or walls. A code of two means that the region is above the code one regions, whether it is a ceiling or the sky.
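One way to hold the captured input is sketched below. The `Region` class, its field names, and the rule of skipping repeated mouse positions are assumptions made for illustration. Only the meaning of codes zero, one, and two comes from the description above.

```python
from dataclasses import dataclass, field

# Region codes as described in the text.
GROUND, WALL, SKY = 0, 1, 2
CODE_NAMES = {GROUND: "ground", WALL: "wall", SKY: "sky"}


@dataclass
class Region:
    code: int
    perimeter: list = field(default_factory=list)  # (x, y) image points

    def __post_init__(self):
        if self.code not in CODE_NAMES:
            raise ValueError(f"unknown region code: {self.code}")

    def add_point(self, x, y):
        """Append one captured mouse position, skipping immediate repeats."""
        if not self.perimeter or self.perimeter[-1] != (x, y):
            self.perimeter.append((x, y))


# Hypothetical mouse samples recorded while the button was held down.
wall = Region(code=WALL)
for pos in [(10, 40), (10, 40), (10, 90), (60, 90)]:
    wall.add_point(*pos)
print(CODE_NAMES[wall.code], wall.perimeter)
```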

Creating Regions Automatically

In the SIGGRAPH paper, the authors used an algorithm to detect “regions” that would be folded to make the 3D image. Over several passes, they would create “super pixels” that were collections of similarly colored pixels. These “super pixels” were then grouped into what they called constellations. A statistical algorithm was then run on the constellations to find the regions to fold.

We had designed an algorithm that was significantly less complex, using basic edge detection algorithms, which will be explained in brief below.

Edge detection

Originally, we had planned for automation across all aspects of our pop up process. In order to do so, we needed to be able to detect the edges along which we wanted to “fold” our image. There are various edge detection algorithms out there, such as detection of sudden color differences, contrast differences, or a blend of both.
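As an illustration of the “sudden color difference” style of detection, the sketch below marks a pixel as an edge when its right or lower neighbor differs from it by more than a threshold. This is a generic neighbor-difference test, not the detector planned for the project. The distance measure, the threshold, and the tiny test image are all assumptions.

```python
def color_distance(a, b):
    """Sum of absolute channel differences between two RGB pixels."""
    return sum(abs(ca - cb) for ca, cb in zip(a, b))


def detect_edges(image, threshold):
    """Mark pixels whose right or lower neighbor differs by more than threshold.

    image is a list of rows (top to bottom), each a list of (r, g, b) tuples.
    Returns a grid of booleans with the same dimensions.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    edges = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            here = image[y][x]
            if x + 1 < width and color_distance(here, image[y][x + 1]) > threshold:
                edges[y][x] = True
            if y + 1 < height and color_distance(here, image[y + 1][x]) > threshold:
                edges[y][x] = True
    return edges


# Hypothetical 3x3 picture: one row of sky above two rows of wall.
sky = (120, 180, 255)
wall = (140, 100, 60)
image = [[sky] * 3, [wall] * 3, [wall] * 3]
edges = detect_edges(image, threshold=100)
for row in edges:
    print(row)
```

Only the sky row is marked, because that is where the color changes. Two adjoining walls of nearly the same color would produce no edge at all with this test.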

We never could decide on what detection would be best due to the differences in finding edges within a single image. The edges to detect would have to be between ground-wall, wall-sky, and wall-wall. The first two would be relatively simple to implement with the same edge detection technique, but the wall-wall edge would be different. The reason behind this is that the colors on each side of the edge are almost exactly the same. This would make it very difficult to implement an effective technique just by doing edge detection.

The best option using edge detection was to use edges in conjunction with colors. To do this we would implement an edge detection algorithm and find the major edges in the resulting image. Major edges are the edges in an image that are most likely to be the ones folded on. We would then use the color information to find the ground-wall edges and the wall-sky edges, and search the vertical space between those edges for the wall-wall edges in a separate routine. This is the only way that we could think of to use edges in determining where to fold. However, we are not positive that it would work, because we did not run any tests or simulations to see how it behaves across different images.

Region Creation

Creating the regions based on these techniques is relatively simple compared to finding where to create them. Once the folding edges are found, each region is defined simply by connecting the edges together and taking the perimeter of the enclosed polygon. The next step is to set the region code of each region. The first step is to set the lowest polygon to code zero, because one of the requirements of the input image is that there is ground in the foreground. The remaining part is assigning the other region codes. For each region, starting with the lowest and ending with the highest, a code is set based on the codes of the regions directly below it. If there are only regions with code zero below the current region, then it is assigned a code of one. If there are regions with code zero and regions with code one directly below the current region, then it is also assigned a code of one. If there are only regions with code one directly below the current region, then it is assigned a code of two. These codes are the only way that we can determine which regions are ground, wall, and sky.
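The assignment rules above can be sketched as a small function. The function name and the example regions are hypothetical. The final fallback handles cases the rules do not mention (for example, a code two region below the current one) and is purely an assumption.

```python
def assign_region_code(codes_below):
    """Pick a region code from the codes of the regions directly below."""
    below = set(codes_below)
    if not below:
        return 0  # lowest region: the ground required in the foreground
    if below == {0} or below == {0, 1}:
        return 1  # wall
    if below == {1}:
        return 2  # sky or ceiling
    # Not covered by the rules in the text; treating it as sky is an assumption.
    return 2


# Regions listed from lowest to highest, with the names of the regions below.
regions_bottom_up = [("lawn", []), ("house", ["lawn"]), ("sky", ["house"])]
codes = {}
for name, below_names in regions_bottom_up:
    codes[name] = assign_region_code(codes[b] for b in below_names)
print(codes)
```

Processing from lowest to highest matters, because each region's code depends on codes that must already have been assigned.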

Region Map

The region map is a necessity due to the way we decided to display the regions. We researched different options for handling the display. We looked into using a simple conversion from texture coordinates and region codes into world space, but could not find a suitable function to do the calculations correctly. We then tried to use OpenGL functions to translate and rotate, but had difficulty figuring out how to extract the axis of rotation from the vertices of a region, among other complications. We finally settled on the concept of the region map. The region map is an image with the same dimensions as the original image. We render the regions into a buffer using a parallel view to maintain the proportions, and set the color of each region using the associated region code. The region map is simply a scan conversion of the polygons, so that the interior points also have region codes associated with them.
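The project renders the region map with OpenGL. As a stand-in, the sketch below does the same scan conversion in software, using an even-odd point-in-polygon test at each pixel center. The function names, the use of -1 for uncovered pixels, and the 4 by 4 example are assumptions.

```python
def point_in_polygon(px, py, polygon):
    """Even-odd rule test of a point against a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > py) != (y2 > py):  # edge crosses the horizontal line at py
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


def build_region_map(width, height, regions):
    """Fill a grid (rows top to bottom) with region codes; -1 means no region.

    regions is a list of (code, polygon) pairs in pixel coordinates.
    """
    region_map = [[-1] * width for _ in range(height)]
    for code, polygon in regions:
        for y in range(height):
            for x in range(width):
                if point_in_polygon(x + 0.5, y + 0.5, polygon):
                    region_map[y][x] = code
    return region_map


# Hypothetical 4x4 picture: wall in the top half, ground in the bottom half.
wall_poly = [(0, 0), (4, 0), (4, 2), (0, 2)]
ground_poly = [(0, 2), (4, 2), (4, 4), (0, 4)]
region_map = build_region_map(4, 4, [(1, wall_poly), (0, ground_poly)])
for row in region_map:
    print(row)
```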

Converting Regions to World Space

The regions at this point are still in texture space, and we had to come up with a way to convert them to world space. We used the region map to do this. We converted the points on the perimeters of the regions one at a time. To convert a single point, we used the vertical line under the point in the region map and counted the number of pixels on that line belonging to each region code, producing a total for region code zero and a total for region code one. Once those totals were calculated, we used formulas to change them into world space coordinates. The number of pixels with region code zero is the rough amount that the point is moved down the Z axis. The number of pixels with region code one is the rough amount that the point is moved up the Y axis. We ignored region code two because it should never occur below a point, and when it does the scene will inevitably be wrong anyway. Once this conversion process is done for every point on every region’s perimeter, the regions are ready to be displayed.

The following code is the pseudo code to convert the texture coordinates into world space coordinates. For each region code, there are different x, y, and z values calculated.

For each region {
    for each point on region j {
        if region is code 0 {
            x = 7*(u *2.0 -1);
            y = -4;
            z = 7*(( v *2.0) -1)-7;
        }
        if region is code 1 {
            ZeroCount = countBelow(u, v, 0);
            OneCount = countBelow(u, v, 1);
            x = 7*(( ZeroCount/ImageH *2.0) -1);
            y = -4+7*(( OneCount /ImageH *2.0) -1)-7;
            if(OneCount == 0) y = -4;
            z = 7*(( v *2.0) -1)-7;
        }
        if region is code 2 {
            SKY processing
        }
    }
}
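The pseudo code calls countBelow without defining it. Based on the description of counting pixels on the vertical line under a point, it might look like the sketch below. The argument order with an explicit region map, the assumption that v = 1 is the top of the image, the clamping of indices, and the choice to exclude the point's own pixel are all assumptions rather than details from the project.

```python
def count_below(region_map, u, v, code):
    """Count pixels with the given code on the vertical line under (u, v).

    region_map is a list of rows, top to bottom. u and v are texture
    coordinates in [0, 1], with v = 1 assumed to be the top of the image.
    The pixel containing the point itself is not counted.
    """
    height = len(region_map)
    width = len(region_map[0])
    col = min(int(u * width), width - 1)
    row = min(int((1.0 - v) * height), height - 1)
    return sum(1 for r in range(row + 1, height) if region_map[r][col] == code)


# Hypothetical 4x4 region map: two rows of wall above two rows of ground.
example_map = [[1] * 4, [1] * 4, [0] * 4, [0] * 4]
zero_count = count_below(example_map, 0.5, 1.0, 0)
one_count = count_below(example_map, 0.5, 1.0, 1)
print(zero_count, one_count)
```

For a point on the top edge of the wall, the counts are two ground pixels and one wall pixel below it. These are the ZeroCount and OneCount values that feed the formulas for code one regions.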

Displaying the Scene

We used OpenGL for all the display purposes. To display the scene we simply used a vertex array built from the perimeters of the regions. The regions are initially stored using texture coordinates, so texture mapping is not difficult either. We simply used the converted data as the world space coordinates and the original data as the texture coordinates. To implement the fly-through, we used preexisting code that we had written as part of an earlier homework assignment.

Sky Processing

There is a lot of room to develop this area of the project. The paper we discussed eliminated the sky processing and did not include the sky in the display. We worked on possible ways to handle the sky, including eliminating it. We came up with three other ways to produce a reasonable representation of the sky. The first was the easiest of the three, and simply forces the program to display the polygons created by the sky regions. This outcome might look odd, because the sky would seem broken and present only in certain places around the tops of the buildings. The second was a little more in depth. The program would average the values of all of the pixels in code two regions. It would then create a polygon that connects to the tops of all of the buildings and color it with the average. Finding the tops of all of the buildings wouldn’t be difficult. They could be found by saving the highest points of all of the code one regions. The third option that we came up with was to use a program that we read about in our research. The program takes a sample image and generates a texture which is essentially tiled with the image. The program does not simply tile the sample over and over, but generates an extrapolation of what a continued picture would look like. This would give a much better sky and would be able to create clouds, birds, and other such objects that might be in the sky. We would then use that image to texture the polygon that spans the entire top of the scene. Including the sky would have to be optional, because users might not want it there, especially if the modifications are made to turn multiple images into a model (see Future Research).
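The averaging step of the second option is simple enough to sketch. The function name, the image layout as rows of RGB tuples, and the example pixel values are assumptions. The sketch covers only the color average, not the construction of the sky polygon.

```python
def average_sky_color(image, region_map, sky_code=2):
    """Average the RGB values of every pixel whose region code is sky_code.

    image and region_map are grids of the same size; image holds
    (r, g, b) tuples and region_map holds region codes.
    Returns None when the picture has no sky pixels.
    """
    totals = [0, 0, 0]
    count = 0
    for image_row, map_row in zip(image, region_map):
        for pixel, code in zip(image_row, map_row):
            if code == sky_code:
                for channel in range(3):
                    totals[channel] += pixel[channel]
                count += 1
    if count == 0:
        return None
    return tuple(total / count for total in totals)


# Hypothetical 2x2 picture: a row of sky above a row of wall.
picture = [[(100, 150, 200), (110, 160, 210)],
           [(50, 50, 50), (60, 60, 60)]]
picture_map = [[2, 2], [1, 1]]
sky_color = average_sky_color(picture, picture_map)
print(sky_color)
```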

Future Research

The most direct use of this work is to generate a fly-through of an area and building. It could be used professionally in the portfolios of architects and designers, as well as personally for showing off vacation pictures or homes. It adds aesthetic value to simple photographs by providing a third dimension.

More technical applications are also possible, such as extrapolating data about a building from a photo. Given enough photos, it would be relatively simple to create a full model of a building and the surrounding area. The same techniques could be used to recreate towns with groups of buildings.

This technique could also be modified to handle two or more images and create something like a layered depth image. Multiple objects at each point would make many things possible. A user could load multiple images and the program could automatically create the scenes, find similarities between them, and combine them into a single scene. The ability to create an entire texture mapped model by simply loading images of the building into the program would have many possible advantages. Architecture planning and displays would have the most obvious benefit. Other possibilities include personal use and simple model creation for use in larger scenes.


We took on this project with a simple concept in our heads. We assumed that we knew everything we needed to complete this project in its entirety. We began designing the program based on what we knew, but we soon discovered that our approach would not even come close to satisfying the requirements. We began researching the concepts that we discovered we would need to implement. It soon became clear that we would not be able to code this in the time we had. There is an amazing amount of coding involved in the creation of this program, and we did not have the resources or the time to complete it. At that point we decided not to attempt the coding at all, because we would not be able to complete anything that would have visible results. Instead we decided to research and learn about everything that we would need to know to code the project. This project is a prime example of not judging a book by its cover. The output is deceptively simple, and it conceals a large amount of brilliant work on the part of the paper’s authors.
