What is it?

Photosynthernet is an image hosting application that allows users to upload, organize, search, and share images with a central focus on their color palette. Photosynthernet takes any uploaded image, extracts the dominant colors, runs some corrective algorithms, and returns a palette of 6-10 colors.

How did it start?

The idea for photosynthernet originated while randomly thinking up domain names. You know, back when all the wacky TLDs started gaining traction and people were claiming domains such as “myname.rocks” and “cool.io”. This spurred an unprompted and quite silly burst of creativity and I thought of the domain photosynther.net. Ehhh? …..

Well, I thought it was clever

At any rate, I held onto this domain for quite some time, and even used it to host various personal projects that had nothing to do with the photosynthernet application of today! Now, the origin of the domain and the technology behind photosynthernet are two different stories. The core application started as a messy 100-line Python script I put together one night at a friend's house about 4-5 years ago. All it did was loop over every pixel in an image, find the ten most common colors, and return them.
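That original script is long gone, but the idea is simple enough to sketch. This is a minimal reconstruction, not the original code; the function name and the plain list-of-tuples input are my own assumptions (with Pillow, `Image.getdata()` would produce such a list):

```python
from collections import Counter

def top_colors(pixels, n=10):
    """pixels: iterable of (r, g, b) tuples, e.g. from Pillow's Image.getdata().

    Counts every pixel and returns the n most common colors, which is
    essentially all the first version of the script did."""
    counts = Counter()
    for px in pixels:  # loop over every pixel in the image
        counts[px] += 1
    return [color for color, _ in counts.most_common(n)]
```

With no clustering or correction, near-identical shades are counted as distinct colors, which is why this only looked believable on small, simple images.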

For smaller images this surprisingly worked to a somewhat believable degree of accuracy. So that night I signed off, content with my program working on 100×100 images of landscapes.

Over the next couple of years, image manipulation, and specifically color extraction, came up more and more often in my day job; and by that I mean a couple more times. The most notable instance, and the most important to the photosynthernet algorithm, was being tasked with automatic logo detection in product images…

Logo Detection

I promise this all ties into photosynthernet eventually.

Logo detection was initially a fairly simple problem: look in the top left of the image, count the occurrences of each color in that area, and compare those counts against the source logo image; if the counts matched within a degree of error, we found the logo and flagged the image as such. Again, this worked fairly well when the images were smaller, when the background didn't share colors with the logo, and when the logo was always in the same area, which meant the acceptable margin of error could be higher.
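A sketch of that first approach, under my own assumptions about the details (a flat, row-major pixel list, a 50-pixel corner region, and a 10% tolerance; the function names are hypothetical):

```python
from collections import Counter

def region_color_counts(pixels, width, region=50):
    """Count colors in the top-left region x region corner of a
    flat, row-major pixel list."""
    counts = Counter()
    for y in range(region):
        for x in range(region):
            counts[pixels[y * width + x]] += 1
    return counts

def matches_logo(image_counts, logo_counts, tolerance=0.1):
    """True if every logo color appears in the corner region in
    roughly the same amount as in the source logo."""
    total = sum(logo_counts.values())
    for color, n in logo_counts.items():
        if abs(image_counts.get(color, 0) - n) > tolerance * total:
            return False
    return True
```

The fragility is visible right in the signature: the approach only works if the logo actually sits in that fixed corner and the background doesn't contribute matching colors.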

This algorithm, in hindsight unsurprisingly, did not perform well against real images.

There were two main problems: the logo wasn't always in the same spot, and there was too much noise when it was. To address a dynamically placed logo, I decided to move my analysis to the entire image. But this meant that the previous algorithm, which relied on counting the number of colored pixels that matched the logo, would not work, since it would introduce even MORE noise.

My first idea for cutting out the noise was to take the ratio of the logo colors to the image colors. This meant that if the logo was cyan, and the rest of the image contained barely any cyan, the ratio of cyan to not-cyan in an image with a logo would be approximately 1/200, while the ratio in any image without a logo would be closer to 1/10000. This was fairly solid in theory, and I knew it would have only one downside: when the image itself had an abundance of cyan. We will discuss this issue a little later.
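The ratio test itself is a one-liner; this sketch (the function name and exact-match comparison are my simplification) shows the idea:

```python
def color_ratio(pixels, target):
    """Fraction of pixels equal to the target color.

    An image containing the logo scores around 1/200 for the logo
    color; an image without it scores closer to 1/10000."""
    hits = sum(1 for px in pixels if px == target)
    return hits / len(pixels)
```

The exact-equality check here is precisely what breaks down next: compression artifacts mean the logo's cyan is rarely the same cyan byte-for-byte.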

The first issue I came across, since nothing in my test set appeared to be cyan, was that due to image compression artifacts, the cyan in the logo was not always the same cyan represented in the compressed image. This gave me the first hint toward something that would become an integral part of the photosynthernet algorithm: finding a color's nearest neighbors, i.e. programmatically determining the shades and tints near a color in RGB format. For logo detection I could get away with something quick and dirty, since I only had a single logo to check for; so I wrote a script that took a few hand-picked colors from the logo and built an array of colors by incrementing and decrementing the original values. It has been a while since then, a few years at least, so I may be misremembering, but I believe it was done by bit-shifting the hex value and then converting hex to RGB at a later time.

For those more interested in programmatically changing the tints and shades of a color, please take a look at this exceptionally well-crafted Stack Overflow answer – https://stackoverflow.com/a/13542669/11579046
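I no longer have the bit-shifting version, so here is a simpler sketch of the same idea, working directly on RGB channels rather than hex values (the step size and spread are illustrative, tuned to produce a few dozen neighbors like the original list of ~50):

```python
def neighbor_colors(rgb, step=8, spread=1):
    """Build the set of colors near rgb by nudging each channel up and
    down in increments of `step`, clamped to the 0-255 range."""
    r, g, b = rgb
    clamp = lambda v: min(255, max(0, v))
    return {
        (clamp(r + dr * step), clamp(g + dg * step), clamp(b + db * step))
        for dr in range(-spread, spread + 1)
        for dg in range(-spread, spread + 1)
        for db in range(-spread, spread + 1)
    }
```

Membership in this set then replaces the exact-equality check, which is enough to absorb mild compression artifacts around the logo color.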

Great, now I had a list of 50 or so RGB values that I could start scanning the entire image for. However, this approach had an additional problem I did not foresee: performance. On high-resolution images this took AGES, and I had probably close to a million images to churn through. But I had a guardian angel in the form of designer standards and consistency: for almost all images, the logo used was exactly the same, and it was placed in one of the 4 corners of the image.

This meant I could eliminate scanning ~90% of the image, and if I stepped through the image in “tiles”, I could rely more on my ratio algorithm from earlier, since I had also eliminated a lot of noise. The main takeaway from this issue, which lives on in photosynthernet today, is stepping through and analyzing images in sections, or “tiles”.
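Tiling itself is just index arithmetic over the flat pixel list. A sketch, assuming row-major pixels and a default 50-pixel tile as in the example below (the function name is my own):

```python
def iter_tiles(pixels, width, height, tile=50):
    """Yield the pixels of each tile x tile block of a flat,
    row-major pixel list, left to right, top to bottom."""
    for ty in range(0, height, tile):
        for tx in range(0, width, tile):
            yield [
                pixels[y * width + x]
                for y in range(ty, min(ty + tile, height))
                for x in range(tx, min(tx + tile, width))
            ]
```

Each yielded block can then be reduced to a small vector of color ratios, so downstream comparisons never have to touch individual pixels again.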

What this meant for the logo detection algorithm was that I could distill the original logo into a multidimensional vector of color ratios that could then be compared against the target image. For example, divided into tiles of 50×50 pixels, the logo could programmatically look like this:

color = {
    tile_0 : {cyan_ratio : 0.5, grey_ratio : 0.2},
    tile_1 : {cyan_ratio : 0.6, grey_ratio : 0.3},
    tile_2 : {cyan_ratio : 0.3, grey_ratio : 0.1},
    tile_3 : {cyan_ratio : 0.2, grey_ratio : 0.0},
};
// These per-tile vectors can be normalized to unit vectors in order to form a
// one-dimensional vector, which we can easily use in our euclidean distance
// calculations below.

Now we can iterate over the target image and use euclidean distance to determine whether the tiles we are finding are close to the tiles in the logo. This allows us to cut through noise in a performant manner, since we only need to do the distance calculations on a vector with a length of 10, instead of the possibly thousands of pixels that make up that vector.
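The distance step might look like this sketch, comparing two tiles' ratio dicts key by key (treating a missing ratio as 0; the function name is my own):

```python
import math

def tile_distance(a, b):
    """Euclidean distance between two tiles' color-ratio vectors.

    a and b are dicts like {"cyan_ratio": 0.5, "grey_ratio": 0.2};
    ratios absent from one tile count as 0."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))

logo_tile = {"cyan_ratio": 0.5, "grey_ratio": 0.2}
image_tile = {"cyan_ratio": 0.6, "grey_ratio": 0.3}
# A small distance here suggests this image tile resembles the logo tile.
```

A tile is flagged as logo-like when this distance falls under some threshold, which is far cheaper than comparing thousands of raw pixels.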

In Conclusion

What I learned from the logo detection exercise was that you can't operate on exact hex values in images, and how to conceptualize images as euclidean vectors. These two ideas form the basis for how the photosynthernet palette generation algorithm works. In PART 2 I'll go over this in more detail and cover how it all ties into photosynthernet directly, instead of forcing you to read a trip down my memory lane.

2023-03-09
