Too Clever for Your Own Good

Photo by Laura Ockel on Unsplash

Anti-K.I.S.S. software solutions from the real world

One of the tenets of software development is to avoid premature optimization. Premature optimization is when a developer invests time in making software faster or more efficient when there is little to no actual benefit. Doing so may make the software more difficult to maintain or debug.

I was going to suggest a new tenet about avoiding being too clever, but that likely falls under the K.I.S.S. principle (Keep it simple, stupid). I've collected some excellent real-world examples of software solutions that were clever, complex, and yet may have missed the mark.

One of the earliest examples from my career occurred back in 2007 or 2008. I was working on a website for a university. They wanted a page on the site where alums could post a message about the university. The task was simple enough. Users would enter their first name, their last name, and a plain-text message. The fields were stored in a database and rendered nicely on the screen. As I recall, the site used plain ol' PHP, which meant I had to take care to escape or encode the user-entered strings to guard against SQL injection and script injection (XSS). I created the database table and the web page that would display the messages. I added little tricks like disabling the submit button when clicked so that users couldn't unintentionally submit the form twice (a fairly common thing to do back then).

When it came to displaying the names of the alums, I wanted to display them in the correct case, so I created a helper method to lowercase the name fields and then uppercase the first letter of each. That's when I started to spiral. What about last names like "McGee" or "O'Hare"? What about names like "Cristina Fernández de Kirchner"? I really pulled the thread on the sweater with this one, and I locked up at the daunting nature of what I was trying to do.

My boss came by and I shared where I was stuck. He simplified the whole problem by directing me to just display whatever the user entered. The customer didn't ask for anything more than that, and there was no value in doing anything more. If an alum can't capitalize their own name correctly, that's on them.
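The name-casing trap is easy to reproduce. Here is a minimal Python sketch of the lowercase-then-capitalize idea. The original site used PHP and I no longer have that code, so the function name `naive_name_case` and the implementation are assumptions made purely for illustration.

```python
def naive_name_case(name: str) -> str:
    """Lowercase the whole name, then uppercase the first letter of each
    space-separated word. Hypothetical helper, not the original PHP code."""
    words = name.lower().split(" ")
    return " ".join(word[:1].upper() + word[1:] for word in words)


# Every one of these comes back "corrected" into something the user did not type.
for entered in ["McGee", "O'Hare", "Cristina Fernández de Kirchner"]:
    print(f"{entered!r} -> {naive_name_case(entered)!r}")
```

Each of the three names comes out differently from how the person would write it, which is exactly why displaying the input untouched was the better call.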

At another employer, it was my boss who built something that I thought was a bit crazy at the time. As I recall, he built a database table that could represent relationships between records in a flexible way. I don't have the original schema, so I'll do my best to explain. For example, if there was a group of users in the system and one person was designated as the team lead, a row would be added to this table for each team member, referencing the team leader's record. If there were two team leaders, a second set of rows would be added pointing to the second leader. The schema was simple and flexible, but I was concerned about the number of rows needed to represent these relationships and how many rows had to be added or removed to make a change. For instance, if you had a team of twenty members and two leaders, and you added a new leader and removed the two old ones, there was a surprising number of inserts and deletes to do from my perspective. I remember discussing this with my boss. As a SQL Server expert, he wasn't concerned about the number of inserts and deletes. I still felt that a generic record relationship table would have been awkward to implement and maintain in practice. Unfortunately, we never got to see whether his solution would have worked in production, as the project ended up being canceled.
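To make my concern concrete, here is a rough sketch using Python's built-in `sqlite3` module. Since I don't have the original schema, the table name `record_relationship`, its columns, and the ID values are all guesses for illustration, and the sketch assumes the leaders are not counted among the twenty members.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical generic relationship table: one row per (member, leader) pair.
conn.execute(
    "CREATE TABLE record_relationship ("
    " from_id INTEGER NOT NULL,"
    " to_id INTEGER NOT NULL,"
    " relation TEXT NOT NULL,"
    " PRIMARY KEY (from_id, to_id, relation))"
)

members = list(range(1, 21))  # twenty team members (made-up IDs)
old_leaders = [101, 102]      # two existing leaders (made-up IDs)
new_leader = 103


def link(member_ids, leader_id):
    """Insert one row per member pointing at the leader; return rows written."""
    conn.executemany(
        "INSERT INTO record_relationship VALUES (?, ?, 'reports_to')",
        [(member_id, leader_id) for member_id in member_ids],
    )
    return len(member_ids)


def unlink(leader_id):
    """Delete every row pointing at the leader; return rows removed."""
    cur = conn.execute(
        "DELETE FROM record_relationship WHERE to_id = ?", (leader_id,)
    )
    return cur.rowcount


# Starting state: every member points at both old leaders.
for leader in old_leaders:
    link(members, leader)

# The change: add one new leader, remove the two old ones.
inserted = link(members, new_leader)
deleted = sum(unlink(leader) for leader in old_leaders)
remaining = conn.execute("SELECT COUNT(*) FROM record_relationship").fetchone()[0]
print(inserted, deleted, remaining)
```

Under these assumptions, swapping the leadership of one team touches sixty rows. That is trivial for a database engine, which was my boss's point, but it is a lot of bookkeeping for what feels like a three-record change.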

One of my favorite clever solutions was developed by a good friend of mine. We worked at an insurance company together. In insurance speak, an "illustration" is a document or data that shows the value of an insurance policy over time so that policyholders or prospects can get an idea of how a particular product will perform based on numerous factors. The illustration system was built to provide one illustration based on the input parameters provided, such as age, state, gender, coverage amount, etc. We had a mobile-friendly web application that was built for the salespeople. It interfaced with the illustration system so it could provide the insurance premium quotes found in a full illustration. The salespeople needed a way to use one set of input parameters and get a quote for each product that made sense. Illustrations are mathematically expensive to compute and invalid request exceptions are expensive to throw, so we decided that it would be better to only send requests that we knew were good to the illustration server. For that, we created a table of input parameters that ended up being very large. There were rows for the product IDs, age ranges, genders, states, and more. The table was easy enough to understand. Loading the table is where we ran into problems. To load the table for each product, my friend made a complex SQL query with numerous cross joins. When executed, the query would generate all the allowed input values for a given insurance product. The queries ended up being difficult to maintain. Creating a new product query from scratch took three experts an hour, and we still got it wrong. In the end, my friend decided on his own that his solution was novel but unmaintainable. He regretted using a table and wished he had just done it all in code. I liked the solution in concept, but I think we needed a tool to create the rows for us. Creating the SQL by hand is what did us in. Unfortunately, the business rules for the insurance products were not in a system-consumable format, so we'd have had to build that, too. The ROI just wasn't there for all that.
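For a sense of what "a tool to create the rows" might have looked like, here is a small Python sketch. Everything in it is invented for illustration: the product ID, the rule values, and the `allowed_rows` helper are not from the real system, and the real rules had far more dimensions and exceptions than a plain cross product can express.

```python
from itertools import product

# Hypothetical, simplified rules for a single made-up product.
PRODUCT_RULES = {
    "product_id": ["TERM20"],
    "age": range(18, 66),        # ages 18 through 65
    "gender": ["F", "M"],
    "state": ["IA", "NE", "MN"],
}


def allowed_rows(rules):
    """Yield one dict per allowed combination of inputs.

    This is the in-code equivalent of cross joining one small list per column.
    """
    columns = list(rules)
    for combo in product(*(rules[column] for column in columns)):
        yield dict(zip(columns, combo))


rows = list(allowed_rows(PRODUCT_RULES))
print(len(rows))  # 1 product * 48 ages * 2 genders * 3 states = 288
print(rows[0])
```

Even this toy product yields 288 rows, so it is easy to see how the real table got very large. The hard part was never the cross product itself. It was getting the rules into a structure like `PRODUCT_RULES` in the first place.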

More recently, my current team discovered that our API request validation code had a glaring logical error in it that was allowing requests we didn't want to be processed. We fixed the rules, expecting no one would notice, but we were wrong. Another team was already used to sending "bad" requests and we were breaking their code with our fix. We met to discuss our validation rules and we ran into a possible premature optimization issue. We had set a minimum length of three characters for the first name and last name fields. The API feature is a search feature, so it returns an array of data based on the inputs provided. The team didn't want someone searching with name parameters that were too short and causing a large dataset to be generated. However, restricting each field to a minimum of three characters meant you couldn't search for "D* Dawson". The team with the user-facing app said they needed that to work. I suggested that we could allow one character in one field as long as there were three characters in the other. Then, my staff engineer mentioned all the last names in the world that are only two characters long. I was sharing all this with my VP as an amusing story and he wondered how much slower the response actually is when single-character parameters are used for the first name and last name fields. Is searching by single characters actually that slow? Is the value to the user more important? Is there a better way to solve the problem? Is the problem even real? More to come on this one! I need to take some measurements.
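The two rules we debated are simple enough to sketch in Python. This is not our production validation code. The function names are made up, and stripping a trailing `*` before counting characters is my assumption about how a wildcard term would be measured.

```python
def _term_length(term: str) -> int:
    """Length of a search term, ignoring surrounding spaces and a trailing
    wildcard. Treating '*' this way is an assumption for this sketch."""
    return len(term.strip().rstrip("*"))


def strict_rule(first: str, last: str) -> bool:
    """Original rule: both name fields need at least three characters."""
    return _term_length(first) >= 3 and _term_length(last) >= 3


def relaxed_rule(first: str, last: str) -> bool:
    """Suggested rule: one field may be a single character as long as
    the other field has at least three."""
    shorter, longer = sorted((_term_length(first), _term_length(last)))
    return shorter >= 1 and longer >= 3


for first, last in [("D*", "Dawson"), ("David", "Ng"), ("D*", "Ng")]:
    print(first, last, strict_rule(first, last), relaxed_rule(first, last))
```

The relaxed rule lets "D* Dawson" through, and it even accepts a full first name with a two-character last name, but it still rejects "D* Ng". Every patch to the rule invites another exception, which is why measuring whether short searches are actually slow has to come first.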

Effective collaboration with your customer can get you the answers you need regarding the value of a feature and the expectations of users. Everyone usually wants performance, accuracy, and timeliness, but for some features one of those may be more important than another. Instead of just asking what your customer needs functionally, you could ask them about the value the feature provides, how many users will use the feature, etc. Then, it takes some discipline for the agile dev team to deliver what was asked without trying to be too clever.

Cheers!