Your Code May Be Clean But...

Does your code clean?

I was recently reminded of a software engineering practice that I adopted years ago. I think it is a very valuable and pragmatic practice. First, let me share a few scenarios that led me to adopt it.

The PrintExpert Server Fiasco

When I worked at WoodmenLife, we had a software product called PrintExpert deployed to a virtual machine running an older version of Windows Server. The platform was fairly complex. It included the vendor's product and utilities, plus applications that were custom built for our purposes. One of the side effects of this setup was leftover working files. The platform would run a batch of print jobs for insurance certificates and letters. Data would get merged into templates and written to disk. One set of temporary processing files had a specific naming convention. It was something like XXX0123.tmp (I don't recall exactly). The software looked at the filesystem, found the tmp files, parsed the number from the name of the most recent one, and added 1 to it. That number was then used to create the next file (like XXX0124.tmp).

One day, we ran into an issue with running the batch. Some part of the platform crashed and processing stopped. After some debugging, we figured out that the code that writes those *.tmp files had hit the top of the number range (like XXX9999.tmp) and threw an exception when it couldn't increment to a new number. Why didn't it roll over back to 1? Why didn't it use a longer number? No idea. The fix that was implemented was a simple batch script or VBS script that cleaned up the files. It was scheduled to run on an interval using the Windows Task Scheduler.
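
To make the failure mode concrete, here is a minimal Python sketch of a numbering scheme like the one described above. It is not the vendor's code, which we never saw. The XXX prefix, the four-digit width, the function name next_temp_name, and the choice to treat the highest number as the "most recent" file are all assumptions for illustration.

```python
import re
from pathlib import Path

PREFIX = "XXX"   # placeholder prefix from the story, not the real one
WIDTH = 4        # assumed number of digits in the file name
PATTERN = re.compile(rf"^{PREFIX}(\d{{{WIDTH}}})\.tmp$")


def next_temp_name(folder: Path) -> str:
    """Return the next sequential temp file name, or raise when digits run out."""
    # Collect the numeric part of every file that matches the naming convention.
    numbers = [
        int(match.group(1))
        for path in folder.iterdir()
        if (match := PATTERN.match(path.name))
    ]
    next_number = max(numbers, default=0) + 1
    if next_number >= 10 ** WIDTH:
        # This is the kind of dead end the batch hit at XXX9999.tmp.
        raise OverflowError(f"no {WIDTH}-digit numbers left in {folder}")
    return f"{PREFIX}{next_number:0{WIDTH}d}.tmp"
```

With XXX0123.tmp on disk, this returns XXX0124.tmp. Once XXX9999.tmp exists, every call fails until someone removes old files, which is why a scheme like this only works if cleanup is part of the design.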

After many moons of the platform doing its job, it was time to upgrade the servers. Our technical services department had a policy against in-place Windows upgrades. They created a generic Windows Server image in VMware and deployed new servers for us. The team didn't have perfect documentation, so it was a bit of a struggle to deploy all the tools, update all the configs, wire up the databases, etc.

After all the servers were set up and everything was tested, we cut over to the new servers and everything was fine for many moons. Then, we had a batch failure. After relearning some of the lessons of the past, we rediscovered that we had filled the temp folder with *.tmp files and caused a crash. We found the old script file and rescheduled it.

This incident planted the seed that sprouted in my next project.

My New Platform Cleans Up

One of my proudest accomplishments is a platform I built single-handedly called Document Conversion Service. The platform had a singular purpose. The workflow management solution we implemented was set up to show digital documents to users as single-page TIFF files. The DCS platform accepted PDF, Word, and other document types and converted them programmatically to single-page TIFF files. The TIFF files would then get picked up and inserted into the digital document storage solution.

I designed the system to only use the file system for managing state. It would receive a file through a web API and write the file to an inbox folder. The software that did the actual conversion was CPU-intensive and needed to be a long-running process, so I created a Windows service to process the documents. The Windows service would write the converted pages to an outbox folder that was shared on the network. I used a trigger file to indicate to the client that conversion was done. I can't remember whether the DCS clients couldn't delete files from the outbox or just didn't. Either way, those files would need to be cleaned up at some point.
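
For readers who haven't used a trigger file before, here is a rough Python sketch of the pattern. The original was a .NET Windows service, so this is only an illustration. The function names, the page file naming, and the .done extension are hypothetical and not the names DCS used.

```python
from pathlib import Path
from typing import List


def publish_pages(outbox: Path, job_id: str, pages: List[bytes]) -> Path:
    """Write each converted page, then a trigger file once all pages are on disk."""
    for number, data in enumerate(pages, start=1):
        (outbox / f"{job_id}_{number:04d}.tif").write_bytes(data)
    # The trigger is written last so a client never sees a half-finished job.
    trigger = outbox / f"{job_id}.done"   # hypothetical trigger-file name
    trigger.write_text(str(len(pages)))
    return trigger


def is_ready(outbox: Path, job_id: str) -> bool:
    """Client-side check: pages are safe to read only after the trigger exists."""
    return (outbox / f"{job_id}.done").exists()
```

Note that every job leaves several files behind in the outbox: one per page plus the trigger. Nothing in this flow ever removes them.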

Recalling the issue with the print platform and the scheduled script, I decided to add another .NET Task to the Windows service to check the outbox folder and remove files that were older than some number of days. The number of days was configurable. The great thing about this solution is that when the DCS platform gets deployed to a new server, there are no extra steps for cleanup. Cleaning up the working folders is a primary feature of the platform.
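
The idea translates to most stacks. Below is a minimal Python sketch of built-in, age-based cleanup. It is not the original .NET code: the function names, the use of a background thread in place of a .NET Task, and the use of last-modified time to decide a file's age are assumptions made for illustration.

```python
import threading
import time
from pathlib import Path
from typing import List, Optional


def remove_old_files(folder: Path, max_age_days: float,
                     now: Optional[float] = None) -> List[Path]:
    """Delete regular files in `folder` last modified more than `max_age_days` ago."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86_400  # seconds per day
    removed = []
    for path in folder.iterdir():
        try:
            if path.is_file() and path.stat().st_mtime < cutoff:
                path.unlink()
                removed.append(path)
        except OSError:
            # The file is locked or already gone; try again on the next pass.
            continue
    return removed


def start_cleanup_task(folder: Path, max_age_days: float,
                       interval_seconds: float,
                       stop: threading.Event) -> threading.Thread:
    """Run remove_old_files on a timer until `stop` is set."""
    def loop() -> None:
        while not stop.is_set():
            remove_old_files(folder, max_age_days)
            stop.wait(interval_seconds)  # sleeps, but wakes early on shutdown

    thread = threading.Thread(target=loop, name="outbox-cleanup", daemon=True)
    thread.start()
    return thread
```

A service would call start_cleanup_task once at startup, with max_age_days read from configuration, and set the stop event on shutdown. Because the cleanup ships inside the service, a fresh server gets it without anyone remembering to schedule a script.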

Not Just for Files

I highly recommend adopting the practice of having your code clean up after itself. This can apply to files, folders, logs, database records, dead letters, and more. If your code receives or makes digital artifacts, take a moment to consider how the system may look after running for years. Having clean code that cleans up after itself can mean a more resilient and maintainable system that can run indefinitely.

Cheers!