Interactive Data Visualization In Python with Pygal

By Sara A. Metwalli

Bar charts help show the overall data, but if we want to get more specific, we can choose a different type of chart, namely a treemap. Treemaps are useful for showing categories within the data. For example, our dataset lists the number of cases for each county in every state. The bar chart showed us the mean for every state, but not how the cases are distributed across the counties within each state. A treemap lets us see that distribution.

Let’s assume we want to see the detailed distribution of cases for the 10 states with the highest number of cases. To do that, we need to manipulate our data before plotting it.

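sort_by_cases = data.sort_values(by=['cases'], ascending=False).groupby(['state'])['cases'].apply(list)
# Note: groupby orders its result by state name, so [:10] takes the first 10 states in that order, not the 10 with the most cases
top_10_states = sort_by_cases[:10]
treemap = pygal.Treemap(height=400)
# Add one series per state, limited to its 10 largest values
for state, cases in top_10_states.items():
    treemap.add(state, cases[:10])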

display(HTML(base_html.format(rendered_chart=treemap.render(is_unicode=True))))
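Because pandas orders grouped results by the group key, slicing the grouped series keeps states in name order. If the goal is the states with the most cases, they have to be ranked by their totals first. Here is a minimal, library-free sketch of that ranking idea. The rows, the `top_states` helper, and the choice to sum each state's rows are assumptions made for illustration, not part of the original code.

```python
from collections import defaultdict

# Hypothetical (state, county, cases) rows standing in for the real dataset.
rows = [
    ("Alabama", "Mobile", 120),
    ("New York", "Kings", 900),
    ("New York", "Queens", 850),
    ("Alaska", "Anchorage", 40),
    ("California", "Los Angeles", 700),
]


def top_states(rows, n):
    """Return the n state names with the largest summed case counts."""
    totals = defaultdict(int)
    for state, _county, cases in rows:
        totals[state] += cases
    # Rank state names by their total, largest first, and keep the first n.
    ranked = sorted(totals, key=totals.get, reverse=True)
    return ranked[:n]


top_two = top_states(rows, 2)
alphabetical_two = sorted({state for state, _, _ in rows})[:2]
print(top_two)
print(alphabetical_two)
```

On this toy input the two selections differ, which is the point of ranking before slicing. In pandas, the same idea would be a group-by sum followed by a descending sort.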

This treemap, however, is not labeled, so we can’t see the county names when we hover over the blocks. Instead, every county block shows only the name of its state. To add the county names to our treemap, we need to label the data we’re feeding to the graph.

Unlabeled Treemap
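To make "labeling the data" concrete: pygal accepts each value either as a plain number or as a dict with `value` and `label` keys, and the label is what appears in the tooltip. The sketch below only shapes the data and does not import pygal. The county figures and the `to_labeled_values` helper are made up for illustration.

```python
# Hypothetical (county, cases) pairs for one state; the numbers are made up.
county_cases = [("Kings", 900), ("Queens", 850), ("Bronx", 600)]


def to_labeled_values(pairs):
    """Turn (label, value) pairs into the dict form pygal accepts as values."""
    return [{"value": cases, "label": county} for county, cases in pairs]


labeled = to_labeled_values(county_cases)
print(labeled[0])

# With pygal installed, these dicts could be passed to a series directly:
#   treemap.add("New York", labeled)
```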

Before we do that, note that our data is updated daily, so each county appears several times. Since we care about the overall number of cases in each county, we need to clean up our data before adding it to the treemap.

# Get the cases by county for all states
cases_by_county = data.sort_values(by=['cases'], ascending=False).groupby(['state']).apply(
    lambda x: [{"value": l, "label": c} for l, c in zip(x['cases'], x['county'])])
# Keep the first 10 states
cases_by_county = cases_by_county[:10]

# Create a new dictionary that contains the cleaned up version of the data
clean_dict = {}
start_dict = cases_by_county.to_dict()
for key in start_dict.keys():
    values = []
    labels = []
    county = []
    for item in start_dict[key]:
        if item['label'] not in labels:
            labels.append(item['label'])
            values.append(item['value'])
        else:
            i = labels.index(item['label'])
            values[i] += item['value']
    for l, v in zip(labels, values):
        county.append({'value': v, 'label': l})
    clean_dict[key] = county

# Convert the data to a pandas Series to add it to the treemap

new_series = pd.Series(clean_dict)
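As an aside, the merging step for a single state can also be written with a dictionary keyed by county name, which avoids the repeated list lookups. This is an alternative formulation, not the code used in this article. The `merge_duplicates` helper and the sample entries are hypothetical.

```python
def merge_duplicates(items):
    """Sum the 'value' of entries that share a 'label', keeping first-seen order."""
    totals = {}
    for item in items:
        totals[item["label"]] = totals.get(item["label"], 0) + item["value"]
    return [{"value": value, "label": label} for label, value in totals.items()]


# Hypothetical daily entries for one state; "Kings" appears twice.
daily = [
    {"value": 5, "label": "Kings"},
    {"value": 3, "label": "Queens"},
    {"value": 7, "label": "Kings"},
]

merged = merge_duplicates(daily)
print(merged)
```

Applied to each state's list in turn, this would yield the same kind of dictionary as the loop above, with one entry per county.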

Then we can add the series to the treemap and plot a labeled version of it.

treemap = pygal.Treemap(height=200)
# Series.iteritems() was removed in pandas 2.0; items() is the equivalent call
for state, counties in new_series.items():
    treemap.add(state, counties[:10])

display(HTML(base_html.format(rendered_chart=treemap.render(is_unicode=True))))

Awesome! Now our treemap is labeled. If we hover over the blocks now, we can see the name of the county, the state, and the number of cases in this county.

Labeled Treemap

The complete code for the treemap