On Keeping Task Descriptions Up to Date

Context

In software engineering, we often encounter work that is repetitive in nature. Examples of such tasks include setting up a new project, conducting a series of manual tests, or making a new release. While some tasks can be automated, reducing them to a mere click of a button or execution of a script, others are more complex and demand careful attention during their execution. Given that these tasks may not always be performed by the same individual, it's crucial to determine how to ensure correct execution every time.

Easy vs Right

Context

Balancing the ease of implementation with the correctness of a solution is a complex trade-off. When developing a package to be used in a CI/CD pipeline for multiple repositories, I encountered the challenge of deciding how to handle the package's versioning strategy within the CI process.

Problem

Pinning the package to a specific version in the Jenkins script for each repository ensures that the CI process is stable and predictable. However, this approach necessitates manual intervention for each repository whenever a new package version is released, which can be problematic, especially considering the following pragmatic factors:

  • Diverse repositories managed by different teams, where gaining approvals for changes can be time-consuming and laborious.
  • The package is still under active development, with new versions released frequently.

On the other hand, always installing the latest package version in CI pipelines eliminates the need for manual updates across repositories, but it exposes every pipeline to breaking changes from new releases, which can have adverse consequences:

  • Unexpected disruptions for developers in their branches or PRs.
  • Resistance from developers, possibly leading to the removal or ignoring of this CI step.

Discussion

Finding a balance between the two approaches requires a deeper understanding of the underlying problems and whether we can address them in a more fundamental way. Two hidden issues lurk behind this trade-off:

  • Why are cross-team, multi-repo changes so intimidating and slow to carry out?
  • Are there ways to automate the creation of similar changes to multiple repositories?

The first is largely a management issue: a standard procedure can be devised to guide how responsibilities are assigned and approvals are gained for cross-repo changes within the organization or team. The second calls for additional tooling to address the repetitive nature of the changes. It also suggests that this configuration might benefit from more centralized control, where a single repository manages the package version for all connected repositories.

Retrofitting

Before diving into what I would consider a better approach, I would like to discuss how we can retrofit the easy solution (always install and use the latest package version in CI pipelines):

  • Commit to backward compatibility: Avoid breaking changes at all costs.
  • Support the previous x versions:
    • Maintain backward compatibility for the previous x versions.
    • Notify users of required upgrades without breaking their current setup for a reasonable period (a sketch follows this list).
  • Provide upgrade support: Assist repositories in adapting before a breaking change ships, and help them update the package version after new releases.
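
To make the notification item concrete, here is a minimal sketch of how a deprecated parameter could keep working while warning its callers; the function and parameter names are hypothetical, not taken from any real package:

import warnings

def run_check(strict=None, fail_fast=None):
    """Hypothetical entry point that the CI pipelines invoke."""
    if fail_fast is not None:
        # The old parameter is still honored; callers are warned, not broken.
        warnings.warn(
            "'fail_fast' is deprecated; use 'strict' instead",
            DeprecationWarning,
            stacklevel=2,
        )
        if strict is None:
            strict = fail_fast
    print(f"running check (strict={bool(strict)})")

run_check(fail_fast=True)  # still works, but emits a DeprecationWarning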

Solution

The solution I propose is to pin a specific package version in CI and upgrade only when necessary. To address the issues, I would also propose the improvement items mentioned in the discussion section:

  • To deal with the troublesome manual updates:
    • Create codemod-like tools or scripts to automate the process (a sketch follows this list).
    • Move the usage model toward more centralized control, where a single repository configures the package version and the set of repositories that use this package in their CI pipelines.
  • To deal with cross-team, multi-repo changes:
    • Learn the established process for proposing cross-repo changes and gathering support; this may involve sharing the proposal in a forum or meeting, securing the owners' support, and then proceeding with a known liaison assigned for each repository.
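
For the codemod-like tooling, a minimal sketch might look like the following; the package name, version, pin pattern, and checkout layout are all assumptions made for illustration:

import re
from pathlib import Path

PACKAGE = "ci-tools"   # hypothetical package name
NEW_VERSION = "2.4.0"  # hypothetical version to roll out everywhere
PIN_PATTERN = re.compile(rf"({re.escape(PACKAGE)}==)\d+(?:\.\d+)*")

def bump_pin(repo_root: Path) -> bool:
    """Rewrite the version pin in a repository's Jenkinsfile, if present."""
    jenkinsfile = repo_root / "Jenkinsfile"
    if not jenkinsfile.is_file():
        return False
    text = jenkinsfile.read_text()
    updated, count = PIN_PATTERN.subn(rf"\g<1>{NEW_VERSION}", text)
    if count:
        jenkinsfile.write_text(updated)
    return bool(count)

# Assumes each connected repository is checked out under ./checkouts;
# every rewritten repository would still go through its normal review process.
for repo in Path("checkouts").iterdir():
    if repo.is_dir() and bump_pin(repo):
        print(f"updated pin in {repo.name}")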

The Prebound Method and Sentinel Object Pattern in Python

Motivation

This article reflects on the blog posts, The Prebound Method Pattern and The Sentinel Object Pattern, by Brandon Rhodes. I'll briefly summarize the patterns and discuss my thoughts on them.

The Prebound Method Pattern

This pattern can be observed in standard-library modules such as random and logging. Instead of creating an instance of a class ourselves, we can simply call a function at the module level. This is possible because a default instance is created within the module, and its bound methods are assigned to the module's global namespace.
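
For instance, CPython's random module constructs a hidden Random instance at import time and prebinds its methods into the module namespace, which is why random.random() works without any setup:

import random

# random.py ends with (approximately):
#   _inst = Random()
#   seed = _inst.seed
#   random = _inst.random
# so random.random is a prebound method of a default Random instance.
print(random.random())  # no instance construction needed by the caller

rng = random.Random(42)  # the class remains available for independent instances
print(rng.random())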

To build something similar ourselves, a logger could be created as follows:

class Logger:
    def __init__(self, name):
        self.name = name

    def log(self, message):
        print(f"{self.name}: {message}")

_default_logger = Logger("default")
log = _default_logger.log

The bound log method is assigned to the module's global namespace, allowing it to be called directly. It's a simple example, but the technique is useful whenever you want a ready-to-use default instance of a class without requiring callers to construct one themselves.

Assuming the code above lives in a module named logger.py, this supports the following usage:

import logger

logger.log("Hello World")

This pattern isn't super common, but it's a neat trick to know about. It feels a bit like the singleton pattern. However, since we don't restrict the number of instances that can be created, it's not truly a singleton.
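
To make that distinction concrete, nothing stops a caller from constructing additional instances alongside the prebound default; assuming the logger.py module above, which also exposes the Logger class:

import logger

logger.log("Hello World")  # goes through the prebound default instance

# Unlike a true singleton, additional instances are freely available.
worker_log = logger.Logger("worker").log
worker_log("Hello from a second instance")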

The Sentinel Object Pattern

This pattern highlights that, despite Python's support for None, we can sometimes provide a more meaningful value to represent a missing value. This is particularly useful when we need to differentiate between a missing value and a valid one.

To illustrate with a similar example from the original article, consider the context of open-source software, where we might want to specify the type of license. We could use None to represent an unspecified license type. However, a valid alternative could be to assign it to a License object that clearly indicates the type of license (e.g., "not specified" or "unlicensed", which may mean different things).

Here's an example of the pattern in action:

class License:
    def __init__(self, name):
        self.name = name

    def get_name(self):
        return self.name

class Package:
    def __init__(self, name, license):
        self.name = name
        self.license = license

# INSTEAD OF
packages = [
    Package("dummy1", None),
    Package("dummy2", License("BSD")),
]

for package in packages:
    if package.license is None:
        print("not specified")
    else:
        print(package.license.get_name())

# WE CAN USE
UNLICENSED = License("unlicensed")
NOT_SPECIFIED = License("not specified")

packages = [
    Package("dummy1", UNLICENSED),
    Package("dummy2", NOT_SPECIFIED),
    Package("dummy3", License("BSD")),
]

for package in packages:
    print(package.license.get_name())

The advantage here is replacing None with more explicit values, which documents the intent more clearly. This may also reduce the need for None checks in the code.
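
A related flavor of the pattern, not shown above, uses a plain object() instance as a sentinel default so a function can distinguish "argument omitted" from "None passed explicitly"; a minimal sketch:

_MISSING = object()  # unique sentinel; no caller can pass it by accident

def get_setting(name, default=_MISSING):
    settings = {"timeout": None}  # here None is a legitimate stored value
    if name in settings:
        return settings[name]
    if default is _MISSING:
        raise KeyError(name)
    return default

print(get_setting("timeout"))     # None: a real stored value, not "missing"
print(get_setting("retries", 3))  # falls back to the caller's default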

Conclusion

The two patterns are subtle but can be quite useful in certain cases. It's good to be aware of them and use them when appropriate. The original articles are also worth reading for more detailed explanations.