
Regex Inside Findall Vs Regex Inside Count

This is a follow-up question to "How to count characters in a string?" and to "Find out how many times a regex matches in a string in Python". I want to count all alphabet characters i

Solution 1:

You should use the pandas Series.str.count method, which treats its argument as a regex, instead of calling count on a single string.

spam_data['text'].str.count(r'\w')

0    83
Name: text, dtype: int64

To access the first value use:

spam_data['text'].str.count(r'\w')[0]
83
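
One caveat: \w also matches digits and underscores, so it equals the letter count only when the text contains none of those. A minimal sketch of the difference using the standard re module (the sample string is made up for illustration and is not from the question):

```python
import re

# Hypothetical sample text with letters, digits and an underscore.
sample = "abc_123 xyz"

word_chars = len(re.findall(r"\w", sample))     # letters, digits, underscore
letters = len(re.findall(r"[a-zA-Z]", sample))  # ASCII letters only

print(word_chars, letters)  # 10 6
```

If only letters should be counted, the same [a-zA-Z] pattern can be passed to str.count on the Series.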

Solution 2:

How would one do that for counting any letter in the entire alphabet in a string, using the count method?

>>> wrd = 'Go until jurong point, crazy.. Available only in bugis n great world la e buffet... Cine there got amore wat...'
>>> count = sum([''.join({_ for _ in wrd if _.isalpha()}).count(w) for w in wrd])
>>> count
83

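Explanation: collect the unique letters of wrd in a set, then, for each character of wrd, count how often it appears among those unique letters (1 for a letter, 0 for anything else) and sum the results. This gives the same total as the loop below, which sums the number of occurrences of each unique letter: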

count = []
set_w = set()
for w in wrd:
    if w.isalpha():
        set_w.add(w)

for w in set_w:
    count.append(wrd.count(w))

print(sum(count))
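
If the count method is not a hard requirement, the same total can probably be had more directly by testing each character once. A short sketch, not part of the original answer; note that isalpha() also accepts non-ASCII letters:

```python
wrd = ('Go until jurong point, crazy.. Available only in bugis n great '
       'world la e buffet... Cine there got amore wat...')

# isalpha() returns True/False, and True counts as 1 inside sum().
letter_count = sum(ch.isalpha() for ch in wrd)

print(letter_count)  # 83
```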

Solution 3:

In this one:

spam_data['text'][0].count((r'[a-zA-Z]'))

count takes a plain substring, not a regex, so it searches for the literal text [a-zA-Z] and returns 0.

Use your second example.
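
A quick way to see the difference, sketched on the sample sentence quoted in Solution 2 (it is only assumed here to be the first row of spam_data['text']):

```python
import re

text = ('Go until jurong point, crazy.. Available only in bugis n great '
        'world la e buffet... Cine there got amore wat...')

literal_hits = text.count(r'[a-zA-Z]')           # searches for the 8-character literal
regex_hits = len(re.findall(r'[a-zA-Z]', text))  # interprets it as a pattern

print(literal_hits, regex_hits)  # 0 83
```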

Solution 4:

Short answer: you did not use a regex but a raw string literal, and so you counted occurrences of the string '[a-zA-Z]'.

A string of the form r'..' is not a regex; it is a raw string literal. If you write r'\n', you get a string of two characters, a backslash and an n, not a newline. Raw strings are useful for regexes because regex syntax relies heavily on backslashes as well.

For example:

>>> r'\n'
'\\n'
>>> type(r'\n')
<class 'str'>
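
As a small sketch of the same point: the raw prefix only changes how the literal is written, and the string becomes a pattern only once it is handed to the re module (the names raw, cooked and pattern are illustrative):

```python
import re

raw = r'\n'     # two characters: a backslash and the letter n
cooked = '\n'   # one character: a newline

print(len(raw), len(cooked))  # 2 1
print(raw == '\\n')           # True

# Still just a str until the re module compiles it into a pattern.
pattern = re.compile(r'[a-zA-Z]')
print(pattern.findall('a1b'))  # ['a', 'b']
```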

Here you therefore count how many times the string '[a-zA-Z]' occurs, and unless your spam_data['text'][0] literally contains a square bracket [ followed by a, etc., the count will be zero. As the documentation of str.count [Python-doc] puts it:

string.count(s, sub[, start[, end]])

Return the number of (non-overlapping) occurrences of substring sub in string s[start:end]. Defaults for start and end and interpretation of negative values are the same as for slices.

In case the string is rather large, and you do not want to construct a list of matches, you can count the number of elements with:

sum(1 for _ in re.finditer('[a-zA-Z]', 'mystring'))

It is however typically faster to simply use re.findall(..) and then calculate the number of elements.
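
Both approaches give the same number; they differ only in whether a list of matches is built. A minimal sketch on a shortened sample string (chosen here for illustration), without any timing claims:

```python
import re

text = 'Go until jurong point, crazy..'  # shortened sample for illustration

lazy_count = sum(1 for _ in re.finditer(r'[a-zA-Z]', text))  # no list is built
eager_count = len(re.findall(r'[a-zA-Z]', text))             # builds the full list first

print(lazy_count, eager_count)  # 23 23
```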
