No Landlords. No Masters.

If you’re an engineer, or have ever been an engineer and have since moved on to management or some other dreadful thing, then I’m sure you’ve seen documentation live and die in the trenches of Confluence, Notion, or a company wiki. A lone reference doc describing some then-novel, esoteric service that was developed and then quickly abandoned; left to drive 8% of your business through a chunk of code only Tyler understands, if only he were still with us…

Everybody understands that documentation is important. It enables asynchronous knowledge transfer, enshrines technical decisions, reduces time to error resolution, and helps make your application more modular and maintainable. Together, these let your business pivot early when necessary, maintain first-to-market competitive advantages, and keep users and customers happy with quick (and correct) responses from your engineering team. However, I’m not here to convince you that documentation is important (if you don’t agree, this isn’t the post for you!); instead I’ll attempt to answer the question of how to make your documentation process a successful and integral piece of your business. (This post is primarily aimed at mid-market, “scale-up” phase companies. I fully understand that the process and politics of new global initiatives vary wildly at larger scales; however, some of this may still be useful for a team-wide process.)


Documentation is a team effort

First things first: who owns this documentation initiative? One of the more frustrating aspects of software development is being pushed to tighter and tighter deadlines while knowing full well that writing good, coherent documentation is time consuming. Developers aren’t all natural technical writers, and thorough documentation takes time to write, proof, edit, and rework. This leads us to the age-old point of contention: engineers don’t get enough time to write documentation, and managers don’t see the engineers they manage writing documentation, so, in turn, they don’t allot the time for it.

Is documentation a top-down or bottom-up initiative? Well, it’s both!

Me, just now

I like to call this approach to introducing concepts “sideloading”. It requires effort from all sides, just in different capacities. In my experience managers will not grant time for a process they don’t see value in, so this process starts with the top-down approach of ensuring that management understands the value gained by writing good documentation.

However, it does fall to the engineers to actually write that documentation, and my advice is: don’t ask for permission. I spent much of my early career asking my managers, “Can we set aside some time to write documentation on this?”. This was almost always met with a response along the lines of: “Let’s get the project finished and if we have time afterwards we can write something in retrospect.” However, “retrospect” never came, as the next feature had an even tighter deadline that arrived even earlier. The mindset that helped me overcome this nagging desire to ask was realizing that documentation (along with other “administrative” pieces of software development like unit tests…) is just part of developing software. Documentation and testing are no different from writing loops or creating classes; it’s all part of the process of shipping working code. And nobody ever asks their boss for permission to write a loop or create a class!


Doesn’t matter, just write it down…

Now, assuming management agrees documentation is valuable and the engineers are all writing rogue documentation, how can you ensure the documentation being written is complete and organized enough to deliver all those benefits I listed above? There are a hundred different ways to organize and taxonomize your documentation, but frankly, nobody reading it is going to care; they’ll almost always just use the search box, bypassing all of your folders, categorization, and tagging. The single most important thing is to get your thoughts out of your head, off Slack, out of meetings, and onto “paper”.

Before anyone thinks about bucketing, tagging paradigms, completeness metrics, hiring technical writers, or organization, make sure you have a culture of just getting your thoughts into some public repository. The one caveat to this approach is that all documentation should be written in the same, singular location. This improves findability and lowers the barrier to entry for someone contributing documentation for the first time (avoiding thoughts like “Well, we put design docs here, docs about business processes there, docs about our APIs in one place, but our Node.js docs live over here for such and such reason”).

Form matters much less than function at this phase. If you are having trouble finding the time to write properly edited and proofed documentation, it is perfectly okay to ping someone on Slack and ask, “Hey, how do modules x and y work?”, copy/paste their entire response into Confluence, hit “publish”, and call it a day. This is ugly, but infinitely better than nothing.

Lastly, if you have an existing, yet fragmented, knowledge repository, I have found it is worth just taking the time, cracking a few sodas, and copying the writings from the old wiki to the new location. Doing something similar took me a couple of weeks at 30-60 minutes a day, and it is one of the more critical requirements for ensuring your knowledge repository is actually used.

Have some kind of framework or process

Once you have the culture of just writing things down, it is useful to categorize your documentation in a way that encourages exploratory learning, which becomes increasingly valuable as your company expands and hires more. The important thing here is not to bikeshed over solutions or worry about ifs and buts; just find some taxonomy that works for you and document it so people know it exists. There are some formal types of documentation (like Swagger, ideation/design frameworks, Architectural Decision Records, etc.) that won’t fit into your framework anyway, so there is no point attempting to ham-fist it all in.

As an addendum to the above: allow for linking out to specialized documentation from within the central repository. It makes no sense to put diagrams in a wiki or Confluence; they belong in Lucidchart or Archimate. Just ensure a path to those documents is available through the central documentation repo.

The primary focus here is to ensure your documentation is helpful so it will actually be read and utilized. People will be hard pressed to take issue with a process they directly benefit from, so make sure it’s beneficial!

For managers: recognize contributors

As a manager you are the primary arbiter of feedback and recognition for your employees. If you recognize the value inherent in strong documentation, then I urge you to take the time to recognize the employees who are writing it. To foster a culture of documentation, publicly uplift engineers who make the effort to build technical writing into their development process. That means commending them in Slack and in meetings, and when it comes time for performance reviews, nothing speaks louder than financial compensation.

It is also your job to read the documentation that is relevant to you. The minute your lack of reading comprehension forces you to ask a question that was covered in a document an engineer JUST sent you is the moment you trash months of effort spent building a documentation-centric culture. You need to make writing documentation a positive part of the SDLC to guarantee it doesn’t become a thankless, bitter task.

Gatekeepers need not apply

This is a reiteration of the first point phrased from an organizational level.

Do not gatekeep your documentation; above all else, getting thoughts out of heads and onto paper is the most critical objective. If someone is assigned to oversee documentation, ensure they are not overly critical of things like grammar, taxonomy, or organization: that can sap the will to live from anyone just trying to do a good thing. The less rewarding and engaging your documentation process is, the less likely people are to contribute.

contributor : Hey, I just wrote this cool overview on the new authentication system! It’s several thousand words and I made some diagrams to go along with it, let me know if I missed anything!

gatekeeper : You misspelled “authentication” on line 472, position 12. Fix it.

contributor : I mean, you have edit capabilities on the document, you could have just fixed it yourself. Is there anything else wrong with the content of the document?

// This shows a poor interaction with an individual who is too hyper-focused on the presentation of a document over its content. This can discourage and demoralize otherwise enthusiastic contributors.

Lastly, make your documentation accessible in as many ways as possible. Before cordoning off a section of your organization from others, ask yourself: “Is it necessary that Bob from accounting be restricted from viewing reference material about the banking service?” It may make ontological sense to keep documentation on a need-to-know basis, but it can severely hinder cross-sectional communication. Remember: you want as much buy-in as possible on your process to ensure it doesn’t die coughing in a ditch.

Let’s recap
  • Just start writing. Engineers: don’t ask for permission. The more content that is written the more valuable the process becomes and the higher chance of success the initiative has.
  • Make sure your documentation is useful to newcomers and those otherwise unfamiliar with it by way of some organizational framework.
  • Do not over-do it on the taxonomical requirements. Make it useful but no more.
  • Managers: reward contributors. Publicly, hierarchically, and financially; put your money where your mouth is if you truly believe documentation is an important and valuable part of the SDLC.
  • Make documentation fun to engage with. Discourage pedantry and overly critical gatekeepers.
  • Make your knowledge repository as accessible as feasibly possible.

Finally, in as big of letters as my blogging platform will allow me…

DO NOT ASK FOR PERMISSION!

Read More

Welcome to the special launch edition blog post! Now that Photosynthernet has been launched to the public at https://photosynther.net (as discussed in Part 1), I figured now was the best time to write the next installment in the “Building Photosynthernet” series. After reading, feel free to sign up on the live site and see the information presented here hard at work extracting color palettes from your images!

Quick Part 2 Retro

In Part 2 we discussed things like what perceptual color space is and how superpixels are determined. If you haven’t read it, I would recommend doing so, as this part presents a deeper dive into those fundamental concepts introduced in Part 2.

Let’s talk Lab

No not this lab

In the context of colors and images, Lab is a color space that is designed to be perceptually uniform. That means that when you adjust any of its values linearly, the perceived change in the resulting color is also linear. But before we get too far, let’s get a mental image of a color represented in Lab color space.

First of all, Lab* (the asterisks are a common addition to prevent confusion with another color space designated “Hunter Lab”; I will be using the asterisks for the remainder of this post) is composed of three parts that make up the whole, very similar to RGB. The first part, L*, is the perceptual lightness value of the color; it just tells us how light or dark the color appears. The next part, a*, tells us where the color falls on the red-green range; and, similar to a*, b* tells us how blue or yellow a color is.
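To make that concrete, here is a minimal sketch converting a single sRGB pixel to Lab* (using Python and scikit-image purely for illustration, not necessarily the libraries Photosynthernet uses internally):

import numpy as np
from skimage import color

# A single burnt-orange pixel as an sRGB value in the 0-255 range.
rgb = np.array([[[204, 85, 0]]], dtype=np.float64) / 255.0  # shape (1, 1, 3)

lab = color.rgb2lab(rgb)[0, 0]
L, a, b = lab
print(f"L* = {L:.1f} (lightness), a* = {a:.1f} (red-green), b* = {b:.1f} (blue-yellow)")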

Now, there is a very long history to the Lab* color space, and it’s an accomplishment by the color industry that I don’t want to diminish by reducing it to a single paragraph in this blog post. If you want to know more, please read the Wikipedia article and work from there; if you’re a geek like me you’ll find it incredibly fascinating! For the sake of brevity, all you need to know for this post is that Lab* color space is perceptually uniform and device independent (we’ll get into the device thing later). Lastly, here is a 3D representation of Lab* color space.

If you remember from Part 2, we often perceive the range of blues to be more distinct than we do other colors, which is why the coefficient for our blue value in the sRGB Euclidean distance calculations was 4 instead of 2 or 3. This is demonstrated in the visualization by the asymmetric grouping of “blue” colors.

Better Superpixels

We aren’t done with Lab* color space just yet, as we will need that information later, but for now let’s talk about improving our superpixel algorithm.

In the previous post we defined super pixels as

… Clusters of perceptually similar pixels that have been merged to form a single polygon that represents a single color

Part 2!

And these clusters were defined as “tiles” that I broke the image up into. However, this is not very effective, as a tile can often break apart regions of an image that blend together naturally, shattering a natural “segment” of the image into an array of artificial squares. What I found was that this method of tiling often led to artificially dominant color palettes: where a single color may have been a global minority in an image, because it appeared as the dominant color in several superpixels the initial algorithm thought it was more important than it really was.

Thankfully, people much smarter than I am have developed better algorithms for determining superpixels, so I figured I should just use one of those. Again, similar to Lab* color space, the topic of superpixels has incredible depth and is deserving of its own post. In the meantime, if you want to read up on some of the different superpixel algorithms and how they operate, I would suggest starting here, where I did.

To put it tersely: after much experimentation, I settled on the SLIC (Simple Linear Iterative Clustering) superpixel algorithm. It works in a very similar manner to k-means clustering; that is, some centroids/clusters are chosen, and then, in an iterative fashion, each centroid pulls similar nearby pixels into its cluster.
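If you want to play with SLIC yourself, scikit-image ships an implementation; here is a rough sketch for illustration (not the exact code in the palette service, and the filename is made up):

from skimage import io, segmentation, color

image = io.imread("landscape.jpg")  # any RGB test image

# SLIC: k-means-like clustering in (L*, a*, b*, x, y) space.
# n_segments is roughly how many superpixels to aim for;
# compactness trades color similarity against spatial proximity.
labels = segmentation.slic(image, n_segments=400, compactness=10, start_label=1)
print(f"SLIC produced {labels.max()} superpixels")

# Visualize by painting each superpixel with its average color.
averaged = color.label2rgb(labels, image, kind="avg", bg_label=0)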

Original Image

After SLIC Transformation

As you can see above, this algorithm does a fairly good job of determining where the “perceptually natural” clusters of each color are. However, as good as the algorithm is at pulling similar colors into clusters, it’s still too granular and produces far too many shades of each color to get a reduced palette. Remember, Photosynthernet is looking to grab the distinct color palette of an image, which means highlights and lowlights. If we just grabbed the dominant palette of the above image, we would get something like 15 boring blues; we need to reduce the blues and bring out the reds/beiges. To do so, we will be using a Region Adjacency Graph (RAG)…

The Graphening!

Source : http://melvincabatuan.github.io/RAG/

A Region Adjacency Graph is formed by drawing a line between the centroids (centers) of adjacent superpixels in the clustered image above. Each line is then given a “weight” based on how similar the region at one end is to the region at the other. This graph doesn’t do anything for us on its own, other than look cool, but it does provide a framework for merging similar superpixels.

Once we have the RAG, we can work on merging superpixels. First we generate the RAG, assigning weights to the connections between each cluster (how those weights are calculated is covered in the next section). Then we iterate over all the connections and compare each weight against an arbitrary threshold; if the weight is less than the threshold, we consider the two clusters to be the same color. Once that’s done, we choose a dominant color to represent the new mega-cluster (usually just the color of the cluster with the most surface area), then delete the boundary between the two clusters, turning them into one and assigning a new centroid.

This process is run over all connections and done in cycles: once we have a set of mega-clusters after a single pass, we run it again, until no remaining clusters share a weighted connection under the threshold. When we’re done, we should get something resembling the SLIC-clustered image above, but with larger and more clearly defined boundaries around each superpixel.
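scikit-image also has RAG helpers that perform exactly this kind of threshold merge; a rough sketch follows (here the edge weights are plain distances between mean region colors, standing in for the CIEDE2000 weights discussed below, and the threshold value is arbitrary):

from skimage import io, segmentation, graph, color
# On older scikit-image releases the graph helpers live under skimage.future.graph.

image = io.imread("landscape.jpg")
labels = segmentation.slic(image, n_segments=400, compactness=10, start_label=1)

# Build the Region Adjacency Graph: one node per superpixel, edges weighted
# by the distance between the mean colors of adjacent regions.
rag = graph.rag_mean_color(image, labels)

# Merge every pair of adjacent regions whose edge weight falls below the
# threshold, repeating until no qualifying edges remain.
merged_labels = graph.cut_threshold(labels, rag, thresh=29)

# Visualize the merged regions by their average color.
averaged = color.label2rgb(merged_labels, image, kind="avg", bg_label=0)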

Original

SLIC Clustering

After RAG Merging

Calculating Color Distance

In the previous couple of sections I’ve mentioned “similar colors” and “weighted connections”. These are not determined through some magical unknown method; they are precise calculations done through a well-researched and well-known magical method called CIEDE2000.

CIEDE2000 is a color distance formula for determining the perceptual distance between two colors in Lab* color space. It takes into consideration things such as: is this color viewed on textiles or on a screen? Is it very light or very dark? Is it very blue? Think of CIEDE2000 as the basic Euclidean distance of Lab* color space. Unfortunately for us, it is anything but simple. If you are really interested, I recommend starting at the Wikipedia page for color difference formulas and working from there. I will only be covering it at a high level and sharing what helped me during the implementation phase.

CIEDE2000 for determining the distance between two colors on a screen or print media. Not applicable to textiles. Diagram made by me! 😀

There is a lot to cover here, so I’ll keep it to the highlights and describe each primary phrase of the above formula. The diagram is structured with the full formula at the top and each phrase below it, defining the components that make up the formula. At the bottom of the diagram are the two colors represented in Lab* color space (these are quickly converted to LCH* color space so we can utilize cylindrical coordinates).

  • a. This is the Lightness Phrase. It determines the difference in the lightness value of two colors. It’s the simplest of all phrases, only combining the squared distance between the L values with a single compensation constant.
  • b. This is the Chroma Phrase. Most of this phrase consists of just converting Lab to LCH color space.
  • c. This is the Hue Phrase. This phrase consists of a lot of geometry to determine how close two colors are on the LCH* color sphere.
  • d. This is the Hue Rotation Phrase. This phrase provides the compensation that is still necessary to achieve perceptual uniformity in our distance calculations.

This formula was implemented essentially term for term from the Wikipedia article in our Python palette extraction service, to serve the needs of our superpixel and RAG merging stages. I did find that the Python libraries I was using already had an implementation of CIEDE2000 included, but in the hue rotation phrase they rotated by only 30 degrees instead of the 60 degrees outlined in the formal definition of the formula. I personally found better results by strictly following the 60 degree coefficient, so I decided to re-implement CIEDE2000 myself, both to get better results and as a learning exercise.
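If you would rather not hand-roll it, scikit-image also exposes a CIEDE2000 implementation that is handy for sanity-checking a custom version; a quick illustration (not the palette service’s code, and the two colors are just example values):

import numpy as np
from skimage.color import rgb2lab, deltaE_ciede2000

# Two colors that look quite different to a human but sit close together in raw sRGB terms.
blue   = np.array([[[ 40,  60, 200]]], dtype=np.float64) / 255.0
purple = np.array([[[110,  40, 190]]], dtype=np.float64) / 255.0

lab_blue, lab_purple = rgb2lab(blue)[0, 0], rgb2lab(purple)[0, 0]

# Perceptual distance; identical colors give 0, wildly different colors approach 100.
delta_e = deltaE_ciede2000(lab_blue, lab_purple)
print(f"CIEDE2000 distance: {float(delta_e):.2f}")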

Lastly, Give Me That Palette!

So, finally, we have an image with better superpixels, and we have adopted a superior distance formula to determine how similar two colors are. At this stage, we can go back to the basics discussed in Part 2.

First, just like before, we iterate over each superpixel and compare it with all the other superpixels; if we find one that is below our new threshold for “uniqueness”, we “merge” them. You might be asking, just as I did: why don’t we use the RAG merge for this and call it a day? Well, a RAG only merges region-adjacent nodes, but an image may have two similar clusters on opposite sides, a scenario the RAG would not be aware of. So this final step is necessary to form a truly unique and accurate color palette.

Another difference here is that we start with an incredibly high threshold for similarity. If we assume that red is the polar opposite of green, with a distance value of 100, then we set our threshold to somewhere around 75. We aim for a target of at least 4-6 unique colors, and if we find only 2, we lower the threshold in steps of ~2.5 and re-run this final palette selection step until we get there. This solved a disparity we were seeing in the algorithm between visually “busy” images and visually “minimalist” images. However, some images really do have only two primary colors, and the palette extraction should reflect that reality. So, to prevent the algorithm from lowering the threshold to 0 and creating a palette of 15 of the same blue, we set a lower bound of about 35. If the algorithm only finds 2 primary colors and the threshold is already at 35, then we accept that this image has 2 primary colors and no more.
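In rough Python, the selection loop described above looks something like this (a sketch of the logic only; the list of superpixel colors and the ciede2000 function are assumed to come from the earlier stages):

def extract_palette(superpixel_colors, ciede2000,
                    start_threshold=75.0, floor=35.0, step=2.5, target=4):
    """Merge perceptually similar superpixel colors, relaxing the 'uniqueness'
    threshold until at least `target` colors survive (or the threshold hits
    its lower bound and we accept whatever we have)."""
    threshold = start_threshold
    while True:
        palette = []
        for color in superpixel_colors:
            # Keep a color only if it is not 'the same as' one we already kept.
            if all(ciede2000(color, kept) >= threshold for kept in palette):
                palette.append(color)
        if len(palette) >= target or threshold <= floor:
            return palette
        threshold -= step  # too few unique colors: tighten what counts as 'the same'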

Wrapping Up!

Hopefully this blog post was as interesting to read as it was for me to write; even more so, I hope it was informative. This concludes the primary three-part journey of building Photosynthernet. There will of course be more to come, such as how we deployed on Kubernetes, how we implemented websockets, how we managed the project, and more about the palette selection algorithm!

If you have any questions feel free to ask in the comments and I will try to improve this post with as many addendums as necessary to ensure it’s interesting and understandable.

Thanks for reading!

Read More

In the previous publication, I covered what led to the idea of Photosynthernet as well as some early research that went into building v1. So if you want more background on what Photosynthernet is and how we got here, be sure to read Part 1. If you’re just here for the cool parts, then read on!

Quick Part 1 Retro

In Part 1, we learned two significant things that would lead to a more robust and successful palette algorithm for Photosynthernet. Firstly, we discovered we could determine the difference between two colors by measuring the Euclidean distance between their sRGB values (this was used to determine whether the “tiles” in the logo were similar to the tiles in the image). Secondly, we unintentionally discovered the concept of superpixels (in Part 1, when we discussed “tiles”, these were really just basic superpixels). These two concepts are the foundation of the current palette generation algorithm used by Photosynthernet.

What are Super Pixels?

Superpixels are clusters of perceptually similar pixels that have been merged to form a single polygon that represents a single color.

For example: imagine you have three pixels, all some shade of green, represented by the Cartesian coordinates ([0, 1], green), ([0, 2], light-green), and ([0, 3], swamp-green). You could simply say that this image is just “green”, or you could say that this image contains a single superpixel spanning [0, 1] … [0, 3] whose color is “green”.

The decision to merge two pixels is made via a distance and weighting equation compared against a maximum threshold. For now, think of it in terms of Euclidean distance: if the delta between two pixels is less than, say, 20, then consider them to be the same color and merge them into a superpixel.

These are the three pixels mentioned above, before merging them into a single superpixel
This would be the resultant superpixel of the previous image.
This is a real image that has been converted into super pixels using the SLIC algorithm

Implementing Rudimentary Super Pixels

In the above image, I showed the results of a more intelligent superpixel clustering algorithm, which we will get to eventually… However, when I first started, I didn’t even know the phrase “Super Pixel” existed, so I stuck with my “tiles”, described in Part 1. I broke each image up into a grid of 50×50 pixel tiles, iterated over each pixel within each tile, and compared it against all the other pixels in that tile. If the Euclidean distance between two pixels was less than approximately 120, I incremented the “score” of the original color by 1. This gave me a map of color -> score; I would then sort it and pop the top item off, which I could assume was the dominant color in the tile. Once I ran this on all tiles I usually had an array of 50-100 of what I called “Dominant Colors”, which was just my ignorant re-branding of “Super Pixels”.

// Each tile is a key/value store of an sRGB color (as the key)
// and the count of all colors within a delta of 120 to that color (as the value)
tile_1 = {
'20,155,22' : 30,
'200,5,29' : 12,
'79,152,240' : 9
}

tile_2 = {
'56,172,46' : 60,
'200,5,29' : 2,
}

// The dominant colors array below is the result of taking the "top" item in each tile map for the entire image
dominant_colors = ['20,155,22', '56,172,46']
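A rough Python re-creation of that original tile pass might look like this (a sketch from memory of the approach, not the original code):

def dominant_color_per_tile(tiles, distance, threshold=120):
    """tiles: list of lists of (r, g, b) pixel tuples, one list per 50x50 tile.
    distance: a color distance function (plain Euclidean at this stage).
    Returns one 'dominant color' per tile."""
    dominant_colors = []
    for pixels in tiles:
        scores = {}
        for pixel in pixels:
            # Score a pixel by how many pixels in the tile sit within the
            # similarity threshold of it.
            scores[pixel] = sum(
                1 for other in pixels if distance(pixel, other) < threshold
            )
        # The highest-scoring color is treated as the tile's dominant color.
        dominant_colors.append(max(scores, key=scores.get))
    return dominant_colors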

Now that I had an array of superpixels I could begin thinking about how to extract a palette from them. The result of the following brainstorming session was a list of criteria that I decided would create the most visually appealing and fun palette:

  1. The palette should contain only 6 colors.
    1. The point of photosynthernet is to search images by the color palette and discover similar photos. If each image had a palette of 100 unique colors then a search for “green” would have a very high likelihood of returning a slew of irrelevant results.
  2. The palette shouldn’t contain any pure-white or pure-black results.
    1. Even though this may not be very accurate, I thought it would make for more interesting palettes on images with lower color diversity (such as a night sky or bright skyline)

After the above criteria were selected, I could get to work on an algorithm for eliminating some of the similarly colored tiles. This was done by running the same algorithm used to determine the dominant color of each tile, but against the tiles themselves. The only difference is that this was a recursive and incremental implementation, meaning that if the results did not satisfy the above selection criteria, the input variables would be tweaked and the step re-run. The most notable variable was the threshold for determining whether two colors were different.

Essentially the logic was: if at the end of the palette selection algorithm there were not at least 6 unique colors for the palette, then I needed to lower the requirement for two colors to be considered “unique”. So with a threshold of 120, burnt orange and bright orange may be considered the same color; lower that threshold to 80 and they are seen as two distinct colors, opening us up to a palette full of all the shades of a beautiful sunset.

Finding the Delta

Above I discussed splitting the image into rudimentary superpixels, but what I didn’t touch on was the algorithm I used to determine the distance between two colors.

Initially I started with basic 3-dimensional Euclidean distance. This can be expressed as:

Values :
c_1 = (r_1, g_1, b_1)
c_2 =  (r_2, g_2, b_2)

Euclidean Distance :
\Delta(c_1, c_2) =  \sqrt{(r_1 - r_2)^2 +  (g_1 - g_2)^2 +  (b_1 - b_2)^2  }
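Or, as a trivial Python helper:

from math import sqrt

def srgb_distance(c1, c2):
    """Plain 3D Euclidean distance between two (r, g, b) tuples."""
    r1, g1, b1 = c1
    r2, g2, b2 = c2
    return sqrt((r1 - r2) ** 2 + (g1 - g2) ** 2 + (b1 - b2) ** 2)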

And this worked fairly well for a lot of images; in fact, all my test images performed incredibly well with basic Euclidean distance. Because of this I stopped improving the palette selection algorithm and began working on the actual Photosynthernet app, a Laravel-based web application that allows you to log in and upload images. However, this blog post isn’t about that for the time being (I’ll touch on it later, I promise), so I’ll skip ahead to improving the palette algorithm and why I had to do so…

Perceptual Color Space

Once I had the app running in a capacity where I could share it with friends and get more real use out of the algorithm, I began to see things happening with the palette selection that I didn’t like.

  • Images with low color diversity were lowering the threshold so much that the resultant palette was virtually 6 of the same color.
    • By this, I mean that if an image was actually tri-chromatic, then the algorithm borked itself trying to force 6 unique colors out of it.
  • Some colors were dominating others.
    • In some images where ~50% of the photo was one color, the other secondary, though significant, colors were not being selected for the palette; instead, they were being rolled into some of the other superpixels.
    • A notable example was blues and purples being merged.

So I began doing some research. I’ve seen palette selection algorithms work better than mine, so what were they doing that I wasn’t?

After a bit of looking, I came across the following article by CompuPhase, who had built a desktop program for selecting 256-color palettes from an image (somewhat familiar, yeah?): https://www.compuphase.com/cmetric.htm

The main takeaway from this article was learning of the existence of “perceptual color space”. Simply because we perceive one color to be very different from another (take blue and purple, for example) doesn’t mean that they are as numerically distinct as we think, especially in sRGB color space. To get more technical, sRGB color space is non-linear, which means equal numeric differences in its channels do not correspond to equal perceived differences; a step of 10 in red is not perceived the same way as a step of 10 in green. Which means that our simple Euclidean distance equation wasn’t going to cut it.

Weighted Euclidean Distance

The solution presented by CompuPhase to the above problem was a weighted Euclidean distance. This meant that all I needed to do was multiply the r, g, and b terms by certain coefficients, or “weights”, to get a more perceptually uniform color distance. These weights were determined by experiments carried out by the CompuPhase team, so I suggest reading the above article if you want to get into the nitty-gritty of it all.

However, one thing I would like to restate in this post is that a simple, fixed-weight Euclidean distance wasn’t enough. CompuPhase found that the weights for the red and blue values depend on how much red is present in the colors being compared.

All it means for me is that I had to tweak my distance algorithm to use the equation below to get much more consistent results. (Keep in mind for Part 3 that I’m still using my rudimentary “tile superpixels” here.)

Red Mean :
\bar{r} = \frac{r_1 + r_2}{2}

Dynamic Weighted Euclidean Distance
\Delta{C} = \sqrt{(2 + \frac{\bar{r}}{256}) \cdot (r_1 - r_2)^2 + 4 \cdot  (g_1 - g_2)^2 + ( 2 + \frac{255 - \bar{r}}{256} ) \cdot  (b_1 - b_2)^2 }
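As a Python sketch of the formula above:

from math import sqrt

def redmean_distance(c1, c2):
    """Weighted Euclidean distance with red-dependent weights ('redmean')."""
    r1, g1, b1 = c1
    r2, g2, b2 = c2
    r_mean = (r1 + r2) / 2
    return sqrt(
        (2 + r_mean / 256) * (r1 - r2) ** 2
        + 4 * (g1 - g2) ** 2
        + (2 + (255 - r_mean) / 256) * (b1 - b2) ** 2
    )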

In Conclusion

In this part we primarily talked about superpixels and perceptual color space. This post may have seemed a little dry, and it may seem like I didn’t accomplish much (I didn’t), but learning this foundation was very important for creating the next phase of the palette selection algorithm.

In Part 3 we will ditch the sRGB color space altogether, explore Adaptive Histogram Equalization, and implement a much more intelligent superpixel clustering algorithm. With these three things I was able to create a more accurate palette selection algorithm, and in doing so I gained a much deeper understanding of color processing that gives me confidence in delivering something substantial via the Photosynthernet project.

Read More

What is it?

Photosynthernet is an image hosting application that allows users to upload, organize, search, and share images with a central focus on their color palette. Photosynthernet takes any uploaded image, extracts the dominant colors, runs some corrective algorithms, and returns a palette of 6-10 colors.

How did it start?

The idea for photosynthernet originated while randomly thinking up domain names. You know, back when all the wacky TLDs started gaining traction and people were claiming domains such as “myname.rocks” and “cool.io”. This spurred an unprompted and quite silly burst of creativity and I thought of the domain photosynther.net. Ehhh? …..

Well, I thought it was clever

At any rate, I held onto this domain for quite some time, and even used it to host various personal projects that had nothing to do with the Photosynthernet application of today! Now, the origin of the domain and the technology behind Photosynthernet are two different stories. The core application started as a messy 100-line Python script I put together one night at a friend’s house about 4-5 years ago. All it did was loop over all the pixels in an image, find the top 10 most common ones, and return them.

On smaller images this surprisingly worked with a somewhat believable degree of accuracy. So that night I signed off, content with my program working on 100×100 images of landscapes.

Over the next couple of years, image manipulation, and specifically color extraction, came up more and more often in my day job; and by that I mean a couple more times. One of the most notable instances, and one important to the Photosynthernet algorithm, was being tasked with automatic logo detection in product images…

Logo Detection

I promise this all ties into photosynthernet eventually.

Logo detection was initially a fairly simple problem: look in the top left of the image, count the number of each color in that area, and compare it against the source logo image; if the counts were the same within a degree of error, then we had found the logo and flagged the image as such. Again, this worked fairly well for images that were smaller, where the background didn’t share colors with the logo, and where the logo was always in the same area, meaning the acceptable degree of error was higher.

This algorithm, in hindsight unsurprisingly, did not perform well against real images.

There were two main problems: the logo wasn’t always in the same spot, and there was too much noise when it was. To address a dynamically placed logo I decided to move my analysis to the entire image. But this meant that the previous algorithm, which relied on counting the number of colored pixels that matched the logo, would not work, since it would introduce even MORE noise.

The first idea I had for cutting out the noise was to take the ratio of logo colors to image colors. This meant that if the logo was cyan, and the rest of the image contained barely any cyan, the ratio of cyan to not-cyan in an image with a logo would be approximately 1/200, while the ratio in any image without a logo would be more like 1/10000. This was fairly solid in theory, and I knew it would only have one downside: when the image itself had an abundance of cyan. We will discuss this issue a little later.

The first issue I came across, since nothing in my test set appeared to be cyan, was that due to image compression artifacts the cyan in the logo was not always the same cyan represented in the compressed image. This gave me the first hint towards something that would become an integral part of the Photosynthernet algorithm: finding a color’s nearest neighbors. This means programmatically determining the shades and tints near a color in RGB format. For logo detection I could get away with something quick and dirty since I only had a single logo to check for, so I wrote a script that took a few hand-picked colors from the logo and built an array of colors by incrementing and decrementing the original values. It has been a while since then, a few years at least, so I may be misremembering, but I believe it was done by bit-shifting the hex value and then converting hex to RGB at a later time.

For those more interested in programmatically changing the tints and shades of a color, please take a look at this exceptionally well-crafted Stack Overflow answer – https://stackoverflow.com/a/13542669/11579046
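These days I would generate those neighbors with something like the interpolation trick from that answer; a hedged sketch of the idea (not the original bit-shifting script, and the starting cyan is just an example value):

def shades_and_tints(color, steps=5):
    """Generate darker shades (toward black) and lighter tints (toward white)
    of an (r, g, b) color by linear interpolation."""
    r, g, b = color
    neighbors = []
    for i in range(1, steps + 1):
        t = i / (steps + 1)
        # Shade: interpolate toward (0, 0, 0).
        neighbors.append((round(r * (1 - t)), round(g * (1 - t)), round(b * (1 - t))))
        # Tint: interpolate toward (255, 255, 255).
        neighbors.append((round(r + (255 - r) * t),
                          round(g + (255 - g) * t),
                          round(b + (255 - b) * t)))
    return neighbors

cyan_neighbors = shades_and_tints((0, 183, 235))  # an example hand-picked logo cyan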

Great, now I had a list of 50 or so RGB values that I could start scanning the entire image for. However, this approach had an additional problem I did not foresee: performance. On high-resolution images this took AGES, and I had probably close to a million images to churn through. But I had a guardian angel in the form of designer standards and consistency: for almost all images, the logo used was exactly the same, and it was placed in one of the 4 corners of the image.

This meant I could eliminate scanning ~90% of the image, and if I stepped through the image in “tiles”, I could rely more on my ratio algorithm from earlier since I had also eliminated a lot of noise. The main takeaway from this issue, which lives on in Photosynthernet today, is stepping through and analyzing images in sections, or “tiles”.

What this meant for the logo detection algorithm was that I could distill the original logo into a multidimensional vector of color ratios that could then be compared against the target image. For example, the logo, programmatically, could look like the below if divided into tiles of 50×50 pixels:

color = {
tile_0 : {cyan_ratio : 0.5, grey_ratio : 0.2}, // These vectors can be normalized to unit vectors in order to form a one dimensional vector which we can easily use in our euclidean distance calculations below
tile_1 : {cyan_ratio : 0.6, grey_ratio : 0.3},
tile_2 : {cyan_ratio : 0.3, grey_ratio : 0.1},
tile_3 : {cyan_ratio : 0.2, grey_ratio : 0.0},
};

Now we can iterate over the target image and use Euclidean distance to determine whether the tiles we are finding are close to the tiles in the logo. This lets us cut through noise in a performant manner, since we only need to do the distance calculations on a vector with a length of 10 instead of the possibly thousands of pixels that make up that tile.
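Comparing a candidate tile against a logo tile then reduces to a distance check over those short ratio vectors; roughly like this (a sketch, with an arbitrary threshold and with logo_tiles/image_tiles assumed to be built as above):

from math import sqrt

def tile_distance(tile_a, tile_b):
    """Euclidean distance between two tiles expressed as color-ratio dicts,
    e.g. {'cyan_ratio': 0.5, 'grey_ratio': 0.2}."""
    keys = tile_a.keys() | tile_b.keys()
    return sqrt(sum((tile_a.get(k, 0.0) - tile_b.get(k, 0.0)) ** 2 for k in keys))

def looks_like_logo(logo_tiles, image_tiles, threshold=0.1):
    """Flag the image if every logo tile has a close-enough match among the image tiles."""
    return all(
        min(tile_distance(logo_tile, candidate) for candidate in image_tiles) < threshold
        for logo_tile in logo_tiles
    )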

In Conclusion

What I learned from the logo detection exercise was that you can’t operate on exact hex values with images, and how to conceptualize images as Euclidean vectors. These two ideas form the basis for how the Photosynthernet palette generation algorithm works. In Part 2 I’ll go over this in more detail and cover how it all ties into Photosynthernet directly, instead of forcing you to read a trip down my memory lane.

Read More

TL;DR: Use esptool.py to upload your firmware over USB, since it will automatically put your board in flash mode.

Materials
– ESP8266 Board – https://www.amazon.com/gp/product/B010N1SPRK (HiLetgo ESP8266 NodeMCU CP2102 ESP-12E)
– VSCode (optional, but recommended)
– PlatformIO
– esptool.py
– MicroUSB Cable
– Wifi

Assumptions
– Your OS is Ubuntu 18.04

The ESP8266 NodeMCU board can be a little finicky. The main gotcha is that the board needs to be put into “Flash Mode” in order to upload firmware to it. This is usually done automatically over USB by pulling one of the GPIO pins low (the one attached to the FLASH button on the board), but for some reason PlatformIO’s [upload] does not do this, so we have to use esptool.py. This blog post should act as a guide to setting up these boards with as little headache as possible, and as a bonus there will be some code at the end that enables Over-the-Air (OTA) updates for WiFi-enabled boards.

First download VSCode with PlatformIO [here] and follow the instructions for enabling the extension.

Next, install esptool.py by installing Python and pip with sudo apt install python python-pip and then installing the tool with pip install esptool. esptool should then be available on the command line via esptool.py -h

Then create a new platformio project.

Open Platformio Home and then New Project!

Use these settings for the new project. “NodeMCU 1.0 (ESP-12E Module)” is the important bit.

If you choose not to use VSCode with PlatformIO, then your platformio.ini file should look like this:

[env:nodemcuv2]
platform = espressif8266
board = nodemcuv2
framework = arduino

main.cpp is where the magic happens

Place this script in your main.cpp file for testing. It should cause the blue LED to blink

#include <Arduino.h>

void setup() {
  pinMode(0, OUTPUT);
  pinMode(2, OUTPUT);   // GPIO2 drives the onboard blue LED
  Serial.begin(9600);   // start the serial port once, rather than on every loop
}

void loop() {
  digitalWrite(2, HIGH);
  Serial.println("Turning On");
  delay(1000);
  digitalWrite(2, LOW);
  Serial.println("Turning Off");
  delay(1000);
}

Next, hit “Build” in your PlatformIO project settings. This will create a *.bin file for you in your project directory at ~/Documents/PlatformIO/Projects/{project_name}/.pioenvs/nodemcuv2/firmware.bin. Now you can upload your firmware with esptool! Plug in your board and then run the command below:

esptool.py --port /dev/ttyUSB0 write_flash 0x00000000 ~/Documents/PlatformIO/Projects/nodemcutest/.pioenvs/nodemcuv2/firmware.bin

You should expect some output like the below…

esptool.py v2.5.1
Serial port /dev/ttyUSB0
Connecting....
Detecting chip type... ESP8266
Chip is ESP8266EX
Features: WiFi
MAC: 80:7d:3a:75:de:ef
Uploading stub...
Running stub...
Stub running...
Configuring flash size...
Auto-detected Flash size: 4MB
Compressed 252016 bytes to 183741...
Wrote 252016 bytes (183741 compressed) at 0x00000000 in 16.2 seconds (effective 124.1 kbit/s)...
Hash of data verified.

Leaving...
Hard resetting via RTS pin...

Now the board should have a flashing blue light! And if you pop over to VSCode and hit “Monitor”, you should see something like the following in your console:

--- Miniterm on /dev/ttyUSB0  9600,8,N,1 ---
--- Quit: Ctrl+C | Menu: Ctrl+T | Help: Ctrl+T followed by Ctrl+H ---
Turning Off
Turning On
Turning Off
Turning On
Turning Off
Turning On
Turning Off

Now you know how to flash new firmware to your board! Also, as promised, put the below code into main.cpp and upload it using esptool to enable OTA updates.

#include <Arduino.h>
#include <ESP8266WiFi.h>
#include <ESP8266mDNS.h>
#include <WiFiUdp.h>
#include <ArduinoOTA.h>

// Change This!
const char* ssid = "SSID_changeme";
const char* password = "PASSWORD_changeme";

// Change this function to be whatever you want your board to do!
void run() {
  // Turn Blue LED On
  digitalWrite(2, HIGH);
  Serial.println("Turning On"); // Prints to serial port
  delay(1000);

  // Turn Blue LED Off
  digitalWrite(2, LOW);
  Serial.println("Turning Off");
  delay(1000);

  Serial.print("Local IP Address: ");
  Serial.println(WiFi.localIP());
}

void setup() {
  // Set Pin 2 as Output so we can signal the LED
  pinMode(2, OUTPUT);

  // Begin Serial with 9600 baud rate
  Serial.begin(9600);

  WiFi.mode(WIFI_STA);
  WiFi.begin(ssid, password);
  while (WiFi.waitForConnectResult() != WL_CONNECTED) {
    Serial.println("Connection Failed! Restarting Board...");
    delay(5000);
    ESP.restart();
  }

  // Port defaults to 8266
  // ArduinoOTA.setPort(8266);

  // Hostname defaults to esp8266-[ChipID]
  // ArduinoOTA.setHostname("myesp8266");

  // No authentication by default
  // ArduinoOTA.setPassword((const char *)"123");

  ArduinoOTA.onStart([]() {
    Serial.println("Start");
  });
  ArduinoOTA.onEnd([]() {
    Serial.println("\nEnd");
  });
  ArduinoOTA.onProgress([](unsigned int progress, unsigned int total) {
    Serial.printf("Progress: %u%%\r", (progress / (total / 100)));
  });
  ArduinoOTA.onError([](ota_error_t error) {
    Serial.printf("Error[%u]: ", error);
    if (error == OTA_AUTH_ERROR) Serial.println("Auth Failed");
    else if (error == OTA_BEGIN_ERROR) Serial.println("Begin Failed");
    else if (error == OTA_CONNECT_ERROR) Serial.println("Connect Failed");
    else if (error == OTA_RECEIVE_ERROR) Serial.println("Receive Failed");
    else if (error == OTA_END_ERROR) Serial.println("End Failed");
  });
  ArduinoOTA.begin();
  Serial.println("Ready");
  Serial.print("Local IP Address: ");
  Serial.println(WiFi.localIP());
}

void loop() {
  // Will listen for OTA Updates and Handle them
  ArduinoOTA.handle();

  run();
}

Now you can use PlatformIO to upload OTA. (When uploading OTA you do not need esptool to put the board into flash mode; for some reason PlatformIO just works now.) But in order to do so, you must change the upload_port inside your platformio.ini file to the local IP address of your board. You can get this IP address by monitoring your board after you upload the OTA code with esptool. You can also run nmap on your local network to discover your board’s IP; it should have a hostname similar to ESP_75AAAA.lan.

; PlatformIO Project Configuration File
;
; Build options: build flags, source filter
; Upload options: custom upload port, speed and extra flags
; Library options: dependencies, extra library storages
; Advanced options: extra scripting
;
; Please visit documentation for the other options and examples
; https://docs.platformio.org/page/projectconf.html

[env:nodemcuv2]
platform = espressif8266
board = nodemcuv2
framework = arduino
upload_port = 192.168.15.123

With the above platformio.ini, you should be able to press [upload] in platformio and successfully upload new firmware! Now you can feel free to tape your esp8266 to the wall with a battery pack and still push updates to it. Thanks for reading!

Troubleshooting
Sometimes my boards will get stuck with a solid blue light if I do something like cancel an upload halfway through. I’ve had good success “unsticking” these boards by holding the “FLASH” button for a second or two and then tapping the “RESET” button while FLASH is still held down.

Read More

Intro

Oliver is a small track-driven robot controlled via a web application that uses MQTT to send instructions to an ESP8266 brain. Keep in mind that the ESP8266 runs on 3.3V logic, so the usual sensors you would use for things like obstacle detection won’t work here. However, you’re more than welcome to use a 5V Arduino board and use something else for control instead of the ESP8266.

Material Requirements

  • Tank Chassis – https://www.osepp.com/robotic-kits/4-tank-mechanical-kit I used a whole tank kit from OSEPP. The chassis is high quality extruded aluminum, but the tracks are made of a pretty cheap rubber that doesn’t seem to work well on high friction surfaces like carpet. But it also includes two motors that fit perfectly with the chassis to make for a pretty solid drivetrain system; so the tradeoff was worth it.
  • Motor Driver L298N – https://www.amazon.com/Qunqi-2Packs-Controller-Stepper-Arduino/dp/B01M29YK5U This is a pretty bulky breakout board, but it includes a heatsink. I tried a smaller chip but had issues with heat when the motors were under heavy load.
  • HiLetgo ESP8266 NodeMCU – https://www.amazon.com/HiLetgo-Internet-Development-Wireless-Micropython/dp/B010N1SPRK This is the main brain of the robot. The ESP8266 comes WiFi-Enabled by default and has a fairly large ecosystem around it, which made finding libraries for sending/receiving MQTT pretty easy! I used the HiLetgo brand of ESP8266, they are considerably cheaper than the adafruit alternative and work just as well in my experience. But everything I’m doing should be translatable to just about any arduino board, you just have to change the pinout references.
  • Breadboard – https://www.amazon.com/Antrader-Solderless-Breadboard-Tie-points-Prototyping/dp/B07G731PHQ I used a solderless breadboard since I prototype quickly and change components often (like adding/removing obstacle detection and NeoPixel strips). You are more than welcome to make your final product with a solderable protoboard if this isn’t your first time attempting a project like this; if this is your first Arduino project, then I highly recommend the solderless breadboard.
  • GPIO Jumpers – https://www.amazon.com/Kuman-Breadboard-Arduino-Raspberry-Multicolored/dp/B01BV3Z342 You can make your own or buy them preassembled. Buying them saves a lot of time
  • Battery – https://www.amazon.com/GOLDBAT-1500mAh-Softcase-Battery-Airplane/dp/B07LGYSGZH The battery is really up to you, you could use 6 AAs and get a good result (Although it would be wasteful as the AAs would be drained rather quickly). So any rechargeable battery should do so long as it is between 6-12v and can fit on your chassis.
  • Neopixels Strip (Optional) – https://www.adafruit.com/product/1426 I use the neopixels strip to show when Oliver has connected to the Access Point and the MQTT broker. It can also show cool effects like a Cylon eye for added intimidation!
  • Raspberry Pi (Optional) – https://www.amazon.com/CanaKit-Raspberry-Premium-Clear-Supply/dp/B07BC7BMHY/ref=sr_1_3 Oliver was developed with DEFCON in mind; and I wanted to take Oliver to DEFCON and drive him around while I was there. In order to do this I needed to run a local, secure, MQTT broker and a wireless Access Point, and I needed it to be portable. If this isn’t the case, then you can connect to your home wifi and run an MQTT broker from your local server or Digital Ocean box or something.
  • Zipties (Optional) I used zipties to affix my battery since I bought the wrong size. You may end up not needing them depending on which battery you buy.

Step 1: Put together your Chassis

  1. Breadboard
  2. L298N Motor Driver
  3. Battery
  4. Cargo Area

The OSEPP Robot Kit comes with some pretty good instructions, and it’s fairly modular. I bought an extra set of Aluminum Beams from OSEPP to make my Oliver a little wider. All it took was replacing the step where it said “Connect the Left to the Right side” with 2 Beams instead of just one.

Next pile your components onto the chassis. I put my breadboard in a central location with the motor driver just in front of it. Then attached the second black plate to the front to provide a spot for cargo. (At DEFCON I put the raspberry pi there)

Step 2: Wire Motors to the L298N Board

Each motor has both a white and a black/white striped wire. These will be connected to the output terminal blocks on the L298N board. Just follow the diagram below if you are using the same kit as me; otherwise there may be some trial and error to determine which polarity to use for your motors. If one, or both, motors are turning in the wrong direction and you know the code is correct, then try swapping Output 1 for Output 2 or Output 3 for Output 4.

Step 3: Wiring the Inputs

Now we need to wire the PWM pins from the ESP8266 to the Motor Driver board. I’ll be referencing the below pinout diagram. Yours may be different, in that case just google “[My Arduino Board] Pinout Diagram” and go to images.

EN1 and EN2 are our enable pins: when sent a PWM value of 0-255, they tell the motors how fast to go, EN1 for Motor 1 and EN2 for Motor 2. IN1-IN4 tell the motors which direction to spin. For example, if we send a HIGH output to IN1 and LOW to IN2, then Motor 1 will spin forward as fast as we tell EN1 to.

Read More

Someone in an IRC (Internet Relay Chat) channel needed a set of data normalized; all he knew was that it fell loosely into a bell curve (or was supposed to, anyway). With that information, we can take the data set and calculate the equation needed to graph the bell curve. Once we have the equation, we compare each value ‘y’ against the corresponding value generated by the equation.

Our sample data set for now is:

[0, 0, 0, 0, 0, 0, 6, 13, 15, 18, 20, 19, 17, 15, 16, 13, 11, 11, 9, 7, 4, 2, 0, 0]

(You will find that this is quite a small data set for any kind of accurate statistical curve matching. The resulting curve will appear off due to the few datapoints. You are welcome to try a larger data set. But I digress…)

First we need to know WHAT the equation for a bell curve is. This is the form I found most fitting for normalizing small data sets:

(1 / (sigma * sqrt(2 * pi))) * e^(-((x - mu)^2 / (2 * sigma^2)))

Where ‘sigma’ is the standard deviation and ‘mu’ is the mean (simple average) of the data.

So the first step is to calculate the standard deviation. This is generally done in three steps: find the mean of the data, use that to find the variance, and then take the square root of the variance. The script below should explain it a bit better.

from math import sqrt

simple_mean = sum(data) / len(data)
squared_differences = []
for i in data:
    squared_differences.append((i - simple_mean) ** 2)
variance = sum(squared_differences) / (len(squared_differences) - 1)  # sample variance
std_deviation = sqrt(variance)

I trimmed the data set I was given by removing some outliers, in this case the leading zeros (save for one). Given the small data set and extreme outliers, the curve was far too skewed to be useful (although technically it was correctly normalized). So the new data set is:

[0, 6, 13, 15, 18, 20, 19, 17, 15, 16, 13, 11, 11, 9, 7, 4, 2, 0]

Using those numbers I arrived at the following results:

mu ~= 10.9
sigma ~= 6.5
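A quick way to double-check those numbers yourself is Python’s built-in statistics module:

import statistics

data = [0, 6, 13, 15, 18, 20, 19, 17, 15, 16, 13, 11, 11, 9, 7, 4, 2, 0]

mu = statistics.mean(data)      # ~10.9
sigma = statistics.stdev(data)  # sample standard deviation, ~6.5
print(round(mu, 1), round(sigma, 1))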

This gives us the final equation of

(1 / (6.5 * sqrt(2 * pi))) * e^(-((x - 10.9)^2 / (2 * 6.5^2)))

Desmos Link | Interactive Graph

The image above will show you a working graph should the Desmos link fail in the future.

TODO :: Write the script to compare y values. (Part 2 maybe?)

Read More

The Cool Part

In a nutshell, I got to use machine learning to profile users and recommend products to them based on browsing data and user profile analyses.

The Problem (Business Requirements)

For this project, the problem I was presented with was a catalog of 30,000+ products in the same genre: how do we show users the products they want to see without making them click “Next Page” 1,000 times? Of course, we could have bucketed users based on demographics, incoming traffic channel, geographic location, etc., but all of that required drawing very manual outlines of who our customer was, something prone to error and human bias.

The Solution

The solution was two-fold: create an interactive way to gather user interests, and then use that data to recommend products more likely to convert that user.

Our Head UX Engineer did a write up on the user story and interactive requirements [here]. I’ll try to stick to the more technical side.

Overview

At a high level, we feed product attribute vectors into a program that clusters them based on those attributes. When a user interacts with a few products, we find the nearest cluster and suggest products from that cluster (layering business logic on top, like top sellers and high-margin revenue generators) until we have enough data to profile them. Lastly, we combine product and user features (the user features are generated through collaborative filtering, popularized by Netflix) to suggest the ideal products to the customer.

Tech Stack:

  • R
  • Spark
  • PHP
  • Postgresql

The First Step – Product Tagging

The first step was to tag products in a meaningful and easy-to-consume way. We decided on creating a vector of features that best described each product within our genre. Since we sold lingerie, our attributes became things like “Lacy”, “Chest Coverage”, “Sheerness”, etc. Each product was assigned a vector with these attributes ranked from one to ten. To get these attributes we outsourced some data entry and had several people rank each product, taking the mean vector for each product as the true representation (we built a page that let our PM generate unique links to send to freelancers, allowing them to tag products and letting her track their progress). In the end, each product, to our program, looked something like this:

lace_up_bustier <- c(1.7, 6.8, 9.2, 4, 2.1, 4.7, 7.9, 10, 9.2)

Step Two – Product Clustering

After all (well, most) of our products were tagged with attributes, I set to work on clustering them. This was the first use of Spark’s ML libraries. Hooking R up to Spark made it easy to load all the products into a DataFrame, run some clustering, and tweak to my heart’s content, mostly just playing with the number of clusters. I exported the results to CSV and built a small tool to visualize the clusters to get an intuitive feel for which product images felt the most similar.

https://imgs.xkcd.com/comics/machine_learning.png

Step 3 – Collaborative Filtering (Part 1)

Product features alone aren’t enough to suggest products to people, since some users will like both bustiers and swimwear: two wildly different products, but not mutually exclusive interests. To solve this, we needed to mix in some user features. After a bunch of research (cough Googling cough), I found out about the Netflix Prize, an open contest Netflix ran for anyone who wanted to improve their recommendation algorithm. The winners used something called “Collaborative Filtering”, which essentially works from a matrix of [user_id, movie_id, rating] rows. At a high level, the algorithm works like this.

We know “Person A” likes the movies Star Trek, Gattaca, and Repo! The Genetic Opera. And we know “Person B” likes Star Trek and Gattaca.

We know that “Person A” has rated the following movies like so:

{
"Gattaca" : 4.5,
"Star Trek" : 5,
"Repo! The Genetic Opera" : 4.5
}

And “Person B” has rated their movies like this:

{
"Gattaca" : 5,
"Star Trek" : 4.5
}

Intuitively, we can see that Person B has similar enough interests to Person A that we can guess they will like “Repo! The Genetic Opera”. I just used the ALS (Alternating Least Squares) method of collaborative filtering to make sure. Which, as far as I understand it, is just a truckload of matrix factorization with a hint of magic sprinkled on top.

Step IV – Collaborative Filtering (Part 2)

So now we have the ability to pass in a movie and get a predicted rating for a given user profile. However, ratings are explicit data; the user performed an action saying “I think this thing is worth 3 out of 5”, which is a pretty solid indication. Unless the users were dishonest, in which case they deserve to have the algorithm work against them.

Anyways, in eCommerce we deal mostly with implicit data: clicks, viewing time, adding to wishlists, purchases, etc. I wanted to add some optional rating feedback feature, but nobody would let me. It totally doesn’t keep me up at night thinking about what could have been… Ahem. Sadly, working with implicit data wasn’t giving me the results I wanted, so I wrote a linear function to convert implicit data to a suggested rating. It ate implicit data, weighted each type of interaction against the others, and spit out a rating from 1-5 (and sometimes 6, if someone happened to buy a product 15 times or something else crazy).
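Conceptually, the conversion looked something like this (a reconstruction of the idea with made-up weights and event names, not the production code):

# Hypothetical weights for each implicit signal; the real values were tuned by hand.
WEIGHTS = {
    "viewed": 0.1,
    "wishlisted": 1.0,
    "added_to_cart": 1.5,
    "purchased": 3.0,
}

def implicit_to_rating(interactions, base=1.0):
    """Turn a dict of interaction counts (e.g. {'viewed': 4, 'purchased': 1})
    into a suggested explicit rating on a roughly 1-5 scale.
    Left uncapped, a truly obsessive shopper can still score a 6."""
    return base + sum(WEIGHTS.get(event, 0.0) * count
                      for event, count in interactions.items())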

After the implicit → explicit conversion, I set to transforming all of our user interaction data for the past year into a [user_id, product_id, suggested_explicit_rating] matrix. I included inactive users, since even though they no longer use the site, I felt their interactions would still help the final model. This came to several million data points; I worked on expanding it later, but the results of the final model did not change much, so I kept it to the last year of interactions.

The Fifth One – Collaborative Filtering (The Filtering!)

Lastly, I utilized Spark’s ML libraries, which include a built-in ALS collaborative filtering implementation, to build the model. With this I built a service that took a user profile and a product and returned a predicted rating. On top of that service I created a user-centric API that layered all three pieces together: business logic, product feature clustering, and user feature collaborative filtering.
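I drove Spark from R, but the same ALS interface is exposed through PySpark; here is a minimal sketch of the model-building step, with the column names and input path being illustrative assumptions:

from pyspark.sql import SparkSession
from pyspark.ml.recommendation import ALS

spark = SparkSession.builder.appName("product-recs").getOrCreate()

# The [user_id, product_id, suggested_explicit_rating] matrix from the implicit data.
ratings = spark.read.parquet("user_product_ratings.parquet")  # hypothetical path

als = ALS(
    userCol="user_id",
    itemCol="product_id",
    ratingCol="suggested_explicit_rating",
    rank=10,                   # size of the latent user/product feature vectors
    regParam=0.1,
    coldStartStrategy="drop",  # skip users/products unseen during training
)
model = als.fit(ratings)

# Predicted rating for every (user, product) pair we care to score.
predictions = model.transform(ratings)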

Read More