
The U in UX

· 3 min read

Unlike solving problems in an exam paper, building a product starts with defining a problem worth solving. Problems can come from pain points that we endure in our quotidian lives, or they can be astute observations of obstacles that people around us encounter. Some issues might seem mundane, while others have half-decent, duct-tape solutions (a better analogy nowadays is a combination of Excel, Google Docs, and WhatsApp). In this post, I will elaborate on three lessons in product ideation and validation that I learned at a recent UI/UX workshop.

The first key lesson I learned from the workshop (on "Beauty is more than skin deep") is to be wary of bad ideas. We want to build something that stands out from the existing competition. We want mass adoption from human beings, who have an innate tendency to prefer stability over change. If the above is true, we need to be able to answer the following: Is the problem we are describing real? Does the proposed solution solve it? It is easy to see the relevance of the first question. During the workshop, one group proposed a product to help students find mentors easily. A critical response from the audience asked how the product would provide utility to the people who avail themselves as mentors. Even assuming the product does help its target users (students), it failed to consider another significant group of involved parties (potential mentors). Going deeper into this example, the proposed solution might not address the inherent issue at hand: perhaps there is simply a lack of motivated mentors?

My second learning point is on making a product that offers a better solution. I would like to share an anecdote that resonates with the content of the workshop. I was reading blog posts by Kent C. Dodds (a software engineer and OSS educator) and came across his open-source project named "split-guide", a tool that helps generate code for his workshops. Quoting the problem and the proposed solution highlighted in the repository's README for context:

Problem: Managing workshop repositories. A great way to do a workshop repo is to have exercises and exercises-final directories. The problem is keeping these two directories in sync. Solution: This allows you to create a template for both of these in a templates directory. Based on special text in these files, you can specify what parts of the file you want in exercises and exercises-final. This allows you to co-locate what you're going to be showing workshop attendees and what the final answer should be.

Having done many coding tutorials, I can see why managing workshop repositories can be a pain. However, it turns out that "split-guide" is no longer maintained. In the latest pull request (https://github.com/kentcdodds/split-guide/pull/24), Kent remarked that he thought the solution was neat but complex. The better alternative that Kent settled for? "copy/paste/modify". This is an apt illustration of how a solution that adds more friction than it removes can go south.

The last point is on the importance of user validation. A surprising takeaway from the UX critique session is how much we can gain from iterating on a simple design with a group of target users poking at it. Feedback from users can guide us toward design choices that please the eye and lead users down the intended happy path. One of the more striking comments I remember from the session was that users will not appreciate the sophisticated algorithms underneath if they are not motivated to use the product in the first place.

P.S. This is a repost of my writing assignment for CS3216.

Data Visualization With Highcharts

· 6 min read


Motivation

I was looking through my drafts and thought this one could be salvaged. I have done some simple graph visualization projects and still think they are fun to work on. Though most of the time we are just learning the API of the graphing library of our choice, these libraries work wonders for presenting data. So here is a short walk-through of how I would use Highcharts to showcase data from the Dev.to API. As an example, the Dev.to API is used to retrieve details of 1000 articles and plot them as a "packedbubble" graph. The size of each bubble corresponds to the article's reaction count (positive_reactions_count + comments_count). When a bubble is hovered over, the article's title, URL, and reaction count are shown. Articles with reaction counts over 1000 are labeled. I have also arbitrarily chosen to display articles from only 8 categories/tags (more details in Step 2).

Initial Preparation

I have come to realize that a part of programming work is converting data from one form to another. To use a front-end graphing library, in my experience, having data in JSON format is the most convenient. However, there are times when the data source is a CSV or Excel spreadsheet. We could either write some conversion scripts in Python or do some preprocessing in JavaScript. Papa Parse is one such JS helper package that I have previously used. Even if we have APIs that return JSON-formatted data, we might still need to massage it into the format that the charting library expects.
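As a quick illustration, here is a minimal sketch of CSV-to-JSON preprocessing with Papa Parse (assuming the papaparse package is available; the CSV content is made up for the example):

import Papa from 'papaparse';

// A made-up CSV snippet standing in for a real data source
const csv = 'title,reactions\nHello Dev.to,42\nAnother Post,7';

// Papa.parse on a string runs synchronously and returns { data, errors, meta }
const { data } = Papa.parse(csv, {
  header: true,        // use the first row as object keys
  dynamicTyping: true, // convert numeric strings into numbers
});

console.log(data); // [{ title: 'Hello Dev.to', reactions: 42 }, { title: 'Another Post', reactions: 7 }]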

In this working example, I chose Highcharts for its rich features and extremely good documentation. There are many JSFiddle examples that can serve as a good reference/starting point. However, do note that a paid license is required to use the product commercially. To use it for free, note the following:

Non-profit organisations, schools and personal websites can enjoy our software for free under a Creative Commons (CC) Attribution-Non-Commercial license. In order to obtain a non-commercial license, please fill out this form.

The first thing to do is to find out what data structure Highcharts expects. Sometimes this information can be hard to pin down, given that the documentation of graph/chart libraries is filled with options and explanations. So, we look at examples. Here is one such example I found while browsing the documentation. Looking at the code, it is easy to identify where the data used in the chart is specified:

series: [{
  data: [1, 4, 3, 5],
  type: 'column',
  name: 'Fruits'
}]

So a series contains an array of individual groups of data. The actual data points live in the data attribute, in the form of an array. Upon further inspection of other examples, we can see that the data points need not be primitives like numbers or strings. They can be objects containing the data point and its metadata, such as its name or other attributes. Now we are ready to proceed.
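For instance, a series whose data points are objects might look like the sketch below (the titles, URLs, and values are made up; value drives the bubble size, while the other fields are metadata we can reference in tooltips and labels later):

series: [{
  name: 'react',
  data: [
    { title: 'Some article', url: 'https://dev.to/some-article', value: 1200 },
    { title: 'Another article', url: 'https://dev.to/another-article', value: 345 }
  ]
}]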


Step 1:

Fetch 1000 articles from Dev.to using the API:

async function makeGetRequestAndReturnJson() {
  const response = await fetch('https://dev.to/api/articles?per_page=1000');
  return await response.json();
}

Step 2:

Manipulate the data into the required format. Each individual data point is of the following format:

{
  'title': 'someTitle',
  'url': 'someUrl',
  'value': someReactionCount // a number
}

And the code to filter and consolidate the data is as follows (I might have gone overboard with the functional style in the data processing; plain for-loops would work too):

async function processData() {
  const targetTags = ['react', 'opensource', 'codenewbie', 'beginners', 'tutorial', 'webdev', 'showdev', 'productivity'];
  // One series per target tag, each starting with an empty data array
  const seriesData = targetTags.map(tag => ({ name: tag, data: [] }));
  const data = await makeGetRequestAndReturnJson();
  // Keep only the articles that carry at least one of the target tags
  const filteredData = data.filter(article => article.tag_list.some(tag => targetTags.includes(tag)));
  filteredData.forEach(article => {
    const filteredTags = article.tag_list.filter(tag => targetTags.includes(tag));
    // An article can match several target tags; add it to each matching series
    filteredTags.forEach(tag => {
      seriesData.find(series => series.name === tag).data.push({
        title: article.title,
        url: article.url,
        value: article.comments_count + article.positive_reactions_count
      });
    });
  });
  return seriesData;
}

Step 3:

Set up and use the prepared data in the graph configuration:

async function setupGraph() {
  const seriesData = await processData();
  Highcharts.chart('container', {
    chart: {
      type: 'packedbubble',
      height: '50%',
    },
    title: {
      text: 'Visualizing Dev.to articles'
    },
    tooltip: {
      useHTML: true,
      stickOnContact: true,
      pointFormat: '<b>{point.title}:</b> <br/>Reaction Count: {point.value} <br/><a target="_blank" href={point.url}>{point.url}</a>'
    },
    plotOptions: {
      packedbubble: {
        useSimulation: false, // set to true for a better animation
        minSize: '30%',
        maxSize: '100%',
        zMin: 0,
        zMax: 2000, // the max value a bubble is scaled against
        layoutAlgorithm: {
          gravitationalConstant: 0.01,
          splitSeries: false,
          seriesInteraction: true,
          dragBetweenSeries: true,
          parentNodeLimit: true,
        },
        dataLabels: {
          enabled: true,
          format: '{point.title}',
          filter: {
            // label only the articles with a reaction count above 1000
            property: 'y',
            operator: '>',
            value: 1000
          },
          style: {
            // styling for the data labels
            color: 'black',
            // textOutline: 'none',
            // fontWeight: 'normal',
          },
        },
      }
    },
    series: seriesData,
  });
}

Step 4:

Invoke the function call when ready:

// trigger the setupGraph function on document ready
document.addEventListener('DOMContentLoaded', () => {
  setupGraph();
});

Step 5:

Create a basic HTML page to run the script and display the outcome:

<!DOCTYPE html>
<html lang="en">
  <head>
    <title>DevTo Visualization</title>
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <meta charset="utf-8" />
    <!-- Load Highcharts and the modules needed for the packedbubble chart -->
    <script src="https://code.highcharts.com/highcharts.js"></script>
    <script src="https://code.highcharts.com/highcharts-more.js"></script>
    <script src="https://code.highcharts.com/modules/exporting.js"></script>
    <script src="https://code.highcharts.com/modules/accessibility.js"></script>
  </head>
  <body>
    <div id="container"></div>
    <script src="index.js"></script>
  </body>
</html>

Conclusion

Putting everything together: here is the link to see the visualization in action, and here is the link to the GitHub repo if you are interested in the code.

In terms of difficulty, most of the coding complexity lies in learning the settings and configurations of the library in use. The harder part is deciding what to visualize and which graph/chart type fits. What story should the data tell? In my quick example, I guess it shows that people really enjoy "collectible" and "bookmark for later" kinds of articles 😂.

Some further extension ideas:

  • Explore the Dev.to API for more interesting data, for example:
    • Filter the tags using the API parameter to retrieve only articles with certain tags (a sketch follows this list)
    • Retrieve articles that you authored
  • Explore other graph/chart types available
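For the first idea, here is a minimal sketch using the tag parameter of the Dev.to articles API to filter on the server side instead of inside processData:

// Fetch only the articles tagged with a given tag, e.g. 'react'
async function fetchArticlesByTag(tag) {
  const response = await fetch(`https://dev.to/api/articles?tag=${tag}&per_page=1000`);
  return await response.json();
}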

It's not a bug but a feature?

· 5 min read

P.S. This turned out to be a rant... just so you know...

Motivation

I am currently involved in a small web scraping project. My job is to retrieve information about local charity organizations from a website that only offers a web-based search interface. The complexity of the project is manageable, but I learned a thing or two in the process of making a runnable Python script that captures the data and transforms it into a readable Excel spreadsheet.


1. Web Scraping Can Be Fun

Within the bounds of laws and regulations, programmatic ways of gathering information from the web are what I would recognize as acts of web scraping. I believe it is not an unfamiliar concept to many, possibly due to the popularity of Python and how easy it is to do simple web scraping tasks with it. After working on web development projects for a while, I gained a better understanding of how websites work behind the scenes, and in this most recent web scraping project, I was able to use those insights to explore ways of gathering the required data.

When I took over, there was a working Python script written by someone else. It used basic Selenium selectors to crawl the information field by field. The issue with this approach is that the Chrome web driver has to be kept up to date with the user's Chrome browser. The script often fails to run after a month or two, and it is annoying to have to download the driver again. The problem is made worse when the script has to be run by someone else.

Change is the only constant. I got the chance to update the script because the website it scraped had a major upgrade and the logic of the selectors no longer worked. With the web development experience I have now, I decided to do some preliminary checks to see if I could find an easier way to get the data.

The first thing that came to mind was to check the network calls the site makes and see if I could make them directly. Cutting out the middleman is always a strategy worth trying. With the inspector tool open, I was able to observe the requests and responses as the website refreshed.

Whipping out an API testing tool, I was able to replicate the network calls directly. In fact, I could now gather the entire list of organization information as JSON. For the details of an individual organization, I had to find the corresponding query string that the site used to identify it. This was interesting, as the query string looks like this: M2E5M2Q1N2YtNzk2NS1lMzExLTgyZGItMDA1MDU2YjMwNDg0.

I was pretty clueless at first, but as someone who has been through the entire journey of front-end and back-end development, I know that people don't write perfect software and there are always clues hidden in the source code. Given that we can inspect the HTML of a website easily, I decided to look for hidden treasures there. After some digging, I found the piece of code used to make the query string: btoa(charityID). After googling btoa, I found out that it encodes a string into base64. With that, I was able to simplify the web scraping process by encoding the ID programmatically and using the requests package to make POST requests for what I wanted.
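To make that concrete, here is a minimal JavaScript sketch of the same trick (the actual script is in Python, and the endpoint and charity ID below are hypothetical):

// Replicate the site's own btoa(charityID) encoding,
// then request the organization's details directly
const charityID = 'some-charity-guid';  // hypothetical ID
const queryString = btoa(charityID);    // base64-encode, just like the site's code

// POST to a hypothetical details endpoint with the encoded ID
fetch(`https://example.org/charity/details?id=${queryString}`, { method: 'POST' })
  .then(response => response.json())
  .then(details => console.log(details));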


2. It's not a bug but a feature?

I thought the above experience was interesting, but the following is what actually triggered me to write this article. After I finished the script, I was informed that the resulting files had a few issues. Looking at the code again, I realized there was a mistake.

To understand the problem, let me briefly introduce the background. The information, organized in JSON format, contains primary categories and sub-categories. Thus, one combination could be:

  • Primary category: Personal
    • Sub-category: Expenditure
  • Primary category: Business
    • Sub-category: Expenditure

In the example given here, it is clear that both categories contain a sub-category called "Expenditure". This does not seem like a problem unless the JSON format is something like the following:

[
  { "key": "someOtherValue", "value": 123 },
  { "key": "expenditure", "value": 123 },
  { "key": "expenditure ", "value": 456 },
  { "key": "someMoreValue", "value": 123 }
]

It is simply an array of key-value pairs. So how did the developers who created this schema tell whether an expenditure amount belongs to the "personal" or the "business" category?

Initially, I was unaware that the same identifier was being used twice. What I did notice was that some identifiers had a trailing space. I thought these were careless mistakes and added some code to strip trailing spaces while processing the data. Later, I found out that the trailing spaces were intentional: that was how the developers differentiated one value from the other. The best part is that because trailing spaces are practically invisible, the developers could simply loop through the values in the array and display them as a normal table on the website. When I inspected the HTML, there were indeed trailing spaces in some of the identifiers. I was rather speechless to find that a trailing space was used as part of a unique identifier. This is worse than a bad name...
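Here is a small sketch of the pitfall (with made-up values), showing how the two keys look identical once rendered, and how my original trim-based cleanup silently merged them:

// Two rows whose keys differ only by a trailing space
const rows = [
  { key: 'expenditure', value: 123 },  // personal
  { key: 'expenditure ', value: 456 }, // business (note the trailing space)
];

console.log(rows[0].key === rows[1].key); // false: the space is significant

// Stripping whitespace, as my first version of the script did,
// collapses both categories into one key and silently drops a value
const byKey = Object.fromEntries(rows.map(r => [r.key.trim(), r.value]));
console.log(byKey); // { expenditure: 456 } (the personal value is lost)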

Conclusion

We all tend to take the shortest, most efficient path to make something work. This could mean copy-pasting code and making the slightest change to satisfy a new requirement. If the software is important and used by many, we ought to stop in our tracks sometimes and plan proper refactoring to make it right. Or else...