Here at Expero, we have an expert User Experience design team that builds beautiful user interfaces for projects of varying scope, such as visualizing large data sets. Our folks make state-of-the-art tech easy to use, slick, and modern. I, on the other hand, have absolutely no eye for building out slippy, cool-looking web stuff. Give me a text entry box and a cup of coffee; I’m good.
But what if I could write a system that would draw some type of UI design for me automatically? Not at the level of complexity our UX folks would typically tackle, mind you, but a UI for a simple website. Enter generative AI. Let’s see if we can build a bot that rivals the Expero design experts!
To train such a system, we’ll need website examples. No problem, we’ll just scrape a bunch of images from Awwwards.com. Easy. Next, we need to build a machine learning algorithm which learns about images and generates new, unique examples. This problem, though difficult, is totally tractable. This is the realm of the deep convolutional generative adversarial network (DCGAN).
Generative adversarial networks (GANs), as described by Goodfellow et al. (2014), are a pair of learning algorithms squished back to back and trained against one another. The first learner, say, a neural network, is called the generator. Its job is to produce output that resembles real data. The second learner, maybe another neural network, is called the discriminator. The discriminator’s job is to pick out the fake examples built by the generator when shown a random sampling of real data mixed with fake (generated) data.
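To make the tug-of-war concrete, here is a minimal sketch of the two loss functions in plain Python. It assumes the discriminator outputs a probability that a sample is real, and it uses the non-saturating generator loss suggested by Goodfellow et al. The function names and the example probabilities are hypothetical, and this post does not say which loss variant the actual network used.

```python
import math

def bce(prediction, label):
    """Binary cross-entropy for one probability and a 0/1 label."""
    eps = 1e-12  # keeps log() away from zero
    p = min(max(prediction, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))

def discriminator_loss(d_real, d_fake):
    """Low when real samples score near 1 and generated samples near 0."""
    real_term = sum(bce(p, 1) for p in d_real) / len(d_real)
    fake_term = sum(bce(p, 0) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_loss(d_fake):
    """Low when the discriminator scores generated samples as real."""
    return sum(bce(p, 1) for p in d_fake) / len(d_fake)

# Hypothetical discriminator outputs (probability that a sample is real).
sharp = discriminator_loss(d_real=[0.9, 0.8], d_fake=[0.1, 0.2])
unsure = discriminator_loss(d_real=[0.6, 0.6], d_fake=[0.4, 0.4])
print(f"sharp discriminator loss:  {sharp:.3f}")
print(f"unsure discriminator loss: {unsure:.3f}")
print(f"generator loss vs. sharp discriminator: {generator_loss([0.1, 0.2]):.3f}")
```

Each training step nudges the discriminator to lower its loss and the generator to lower its own, so one network's improvement raises the other's loss.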
As learning progresses, the generator gets better and better at fooling the discriminator by generating ever more realistic-looking output, and the discriminator gets ever better at picking out fake data samples. We terminate training when the system reaches some stability threshold (a good-enough stopping point). Ideally, training should stop when the discriminator is only 50% accurate, no better than a random guess, because at that point it can no longer tell generated data from real data.
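One way to see where the 50% figure comes from is the minimax objective in Goodfellow et al. (2014). The symbols below follow that paper rather than anything defined in this post: p_data is the distribution of real data, p_g is the distribution of generated samples, p_z is the noise fed to the generator, and D(x) is the discriminator's estimated probability that x is real.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)]
  + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]

D^*(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}

p_g = p_{\text{data}} \;\Rightarrow\; D^*(x) = \tfrac{1}{2},
\qquad V(D^*, G) = \log\tfrac{1}{2} + \log\tfrac{1}{2} = -\log 4
```

For a fixed generator, the best possible discriminator is D*. Once the generator matches the real data exactly, even that best discriminator can only answer one half for every input, which is the coin flip described above.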
Okay, that’s a GAN, but we still need to add the “DC” part to make the DCGAN. Remember, we’re playing with images of websites here. Because we want to learn about images, we need a GAN that is good at learning about 2D arrays. Since convolutional neural networks (CNNs) are exceptional learners with respect to image data, let’s define both our generator and our discriminator as CNNs. And because CNNs tend to learn more accurately when they contain many layers of neurons, we’ll make our CNNs deeply layered.
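If the convolution part is unfamiliar, the following plain-Python sketch shows the core operation: sliding a small kernel over a 2D array and summing the element-wise products. The image and the edge-detecting kernel are made up for illustration. In a real CNN the kernel values are learned, and a framework would do this far more efficiently.

```python
def conv2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

# Hypothetical 4x6 "page": bright content on the left, dark margin on the right.
image = [[1, 1, 1, 0, 0, 0] for _ in range(4)]

# Hand-written kernel that responds to bright-to-dark vertical edges.
vertical_edge = [[1, 0, -1],
                 [1, 0, -1],
                 [1, 0, -1]]

edges = conv2d(image, vertical_edge)
for row in edges:
    print(row)
```

The output is nonzero only where the window straddles the boundary between the bright and dark regions, which is the kind of local structure (edges, blocks, white space) a stack of such layers can build on.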
Hence, deep (many layers) convolutional (image related) generative (creates data) adversarial (opposition learning) network. After the system is trained, we simply chop off the discriminator half of the network. We’re then left with only a generator, which will produce realistic looking images for us.
No, but it did create some neat looking websites.
Above is a selection of network input images. Below is a selection of output images. Given images like the ones above, the system learned how to paint out the images you see below.
The system outputs images which sort of look like websites, but they aren’t anything I’d use as a template. The system needs more training (see the appendix for details) to be able to create perfect websites. It’s fascinating, though, that it learned how to write words, and was able to distinguish the need for white-space-partitioned areas.
Yep. Generative AI can literally write you a symphony. And it can do a bunch of other really cool stuff, too. Generative machine learning is one of the most captivating disciplines under active research. Currently, Expero is producing systems which write music, automatically produce noise-free versions of noisy data, and speak in natural human language. The eventual scope of generative AI is unbounded, meaning with enough compute, these systems will eventually be able to create anything.
I’ll leave you with an exercise for the reader: Build a bot which writes an actual website. Like actually writes out the HTML code. Here’s a hint: Swap the CNNs for LSTMs in the generative adversarial network and train on source code from real websites.
Thanks for tuning in. Remember Skynet; always code a kill switch...
For those hungry for details, here’s the loss summary over a few thousand training epochs using the Adam optimization algorithm:
This network was trained on a heavily filtered data set that started at 998 images and was pared down according to image entropy (to increase white space and cleanliness in the output). Since the entropy filter left a data set of only 336 images, I built the training set back up by duplication and translation. The generator network used standard ReLU activation functions, but the discriminator used leaky ReLUs. Here’s the graph with more architectural detail in the gradient calculation, generator network, and discriminator network:
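For readers who want to try the data preparation step, here is a rough sketch of an entropy filter followed by translation-based augmentation, in plain Python. It makes several assumptions that the post does not confirm: entropy is the Shannon entropy of the grayscale intensity histogram, low-entropy images are the ones kept, and translation is a horizontal shift padded with white. The function names, the threshold, and the tiny example images are all hypothetical.

```python
import math
from collections import Counter

def image_entropy(pixels):
    """Shannon entropy (in bits) of the intensity histogram of a 2D grayscale image."""
    flat = [value for row in pixels for value in row]
    total = len(flat)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(flat).values())

def translate(pixels, dx, fill=255):
    """Shift an image |dx| pixels right (dx > 0) or left (dx < 0), padding with `fill`.

    Assumes |dx| is no larger than the image width.
    """
    if dx >= 0:
        return [[fill] * dx + row[:len(row) - dx] for row in pixels]
    return [row[-dx:] + [fill] * (-dx) for row in pixels]

def filter_and_augment(images, max_entropy, shifts=(0, 1, -1)):
    """Keep low-entropy images, then emit one shifted copy per entry in `shifts`."""
    kept = [img for img in images if image_entropy(img) <= max_entropy]
    return [translate(img, dx) for img in kept for dx in shifts]

# Two hypothetical 2x4 grayscale "screenshots".
clean = [[255, 255, 255, 0], [255, 255, 255, 0]]    # mostly white space
busy = [[0, 64, 128, 255], [32, 96, 160, 224]]      # every pixel different

training_set = filter_and_augment([clean, busy], max_entropy=1.0)
print(len(training_set), "training images after filtering and augmentation")
```

Only the mostly white image survives the filter here, and the three shifts turn it into three training examples. The same mechanism, with more shifts, is one way a few hundred images could be built back up into a few thousand.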
This network is fully capable of accomplishing the task we asked of it. Because of time constraints, however, I had to shrink the input images to 104×74 pixels. Also, as you can see in the loss summary, I stopped training early, before convergence. Just a couple hundred training epochs at 4,000 input images took 12 hours on one NVIDIA Tesla K80.
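To get a feel for what that image size means inside a convolutional network, here is a small arithmetic sketch. It assumes three stride-2 convolutions with "same" padding, where each layer's output length is the input length divided by the stride, rounded up. The layer count and strides are illustrative guesses, not the architecture used here.

```python
import math

def same_conv_output(size, stride):
    """Output length of a 'same'-padded strided convolution: ceil(size / stride)."""
    return math.ceil(size / stride)

def feature_map_sizes(width, height, strides):
    """Track (width, height) through a stack of strided 'same' convolutions."""
    sizes = [(width, height)]
    for stride in strides:
        width = same_conv_output(width, stride)
        height = same_conv_output(height, stride)
        sizes.append((width, height))
    return sizes

# 104x74 input from the post; three stride-2 layers are an assumption.
sizes = feature_map_sizes(104, 74, strides=[2, 2, 2])
for layer, (w, h) in enumerate(sizes):
    print(f"after {layer} strided layers: {w} x {h}")
```

Note that 74 stops halving cleanly after the first layer, so a generator that simply doubles its feature maps at each step would need some cropping or padding to land exactly on 104×74.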
Yes, the system is overfitting right now. That’s because the training data didn’t span the input space. Increasing the variety of training examples and letting it eat for a few days on a P100 cluster would produce stellar output. Output worthy of web design? You bet.