Skip to content Skip to sidebar Skip to footer

Letter Frequencies: Plot A Histogram Ordering The Values Python

What I am trying to do is to analyse the frequency of the letters in a text. As an example, I will use here a small sentence, but all that is thought to analyse huge texts (so it's

Solution 1:

For counting we can use a Counter object. Counter also supports getting key-value pairs on the most common values:

from collections import Counter

import numpy as np
import matplotlib.pyplot as plt

c = Counter("quatre jutges dun jutjat mengen fetge dun penjat")
plt.bar(*zip(*c.most_common()), width=.5, color='g')
plt.show()

The most_common method returns a list of key-value tuples. The *zip(*..) is used to unpack (see this answer).

Note: I haven't updated the width or color to match your result plots.

Solution 2:

Another solution using pandas:

import pandas as pd
import matplotlib.pyplot as plt

test = "quatre jutges dun jutjat mengen fetge dun penjat"# convert input to list of chars so it is easy to get into pandas 
char_list = list(test)

# create a dataframe where each char is one rowdf = pd.DataFrame({'chars': char_list})
# drop all the space charactersdf = df[df.chars != ' ']
# add a column for aggregation laterdf['num'] = 1
# group rows by character type, count the occurences in each group# and sort by occurancedf = df.groupby('chars').sum().sort_values('num', ascending=False) / len(df)

plt.bar(df.index, df.num, width=0.5, color='g')
plt.show()

Result:

enter image description here

Edit: I timed my and ikkuh's solutions

Using counter: 10000 loops, best of 3: 21.3 µs per loop

Using pandas groupby: 10 loops, best of 3: 22.1 ms per loop

For this small dataset, Counter is definately a LOT faster. Maybe i'll time this for a bigger set when i have time.

Post a Comment for "Letter Frequencies: Plot A Histogram Ordering The Values Python"