Regex Inside Findall Vs Regex Inside Count
Solution 1:
You should use the pandas .str.count accessor, which treats its argument as a regex, instead of the plain string count method.
spam_data['text'].str.count(r'\w')
0    83
Name: text, dtype: int64
To access the first value use:
spam_data['text'].str.count(r'\w')[0]
83
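Note that \w matches digits and the underscore as well as letters, so it equals the letter count only when the text contains neither. A minimal stdlib sketch of the difference follows; the sample string is made up for illustration, and len(re.findall(...)) stands in for what .str.count does on each row:

```python
import re

# Hypothetical sample text, not taken from the original data set.
sample = "call_me at 0800 now"

word_chars = len(re.findall(r'\w', sample))     # letters, digits and underscore
letters = len(re.findall(r'[a-zA-Z]', sample))  # ASCII letters only

print(word_chars, letters)  # 16 11
```

If only letters should be counted, passing r'[a-zA-Z]' as the pattern is the safer choice.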
Solution 2:
How would one do that for counting any letter in the entire alphabet in a string, using the count method?
>>> wrd = 'Go until jurong point, crazy.. Available only in bugis n great world la e buffet... Cine there got amore wat...'
>>> count = sum([''.join({_ for _ in wrd if _.isalpha()}).count(w) for w in wrd])
>>> count
83
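Explanation: the set comprehension collects the unique letters in wrd; the list comprehension then checks every character of wrd against that set (1 for a letter, 0 for anything else) and the results are summed.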
similar to:
count = []
set_w = set()
for w in wrd:
    if w.isalpha():
        set_w.add(w)
for w in set_w:
    count.append(wrd.count(w))
print(sum(count))
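If only the total number of letters is needed, a shorter route is possible. This is a sketch, assuming wrd is the same sample string as above; str.isalpha also accepts non-ASCII letters, which makes no difference for this text:

```python
from collections import Counter

wrd = ('Go until jurong point, crazy.. Available only in bugis n great '
       'world la e buffet... Cine there got amore wat...')

# Each character contributes True (1) if it is a letter, else False (0).
total = sum(ch.isalpha() for ch in wrd)

# Per-letter breakdown; its values add up to the same total.
per_letter = Counter(ch for ch in wrd if ch.isalpha())

print(total)  # 83
```

Counter is only there for the per-letter breakdown; the generator expression alone gives the total.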
Solution 3:
In this one:
spam_data['text'][0].count((r'[a-zA-Z]'))
count takes a plain substring, not a regex, which is why it returns 0.
Use your second example.
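The difference can be checked without pandas. This is a small sketch on a made-up string, where text is a hypothetical stand-in for spam_data['text'][0]:

```python
import re

text = "Free entry in 2 a wkly comp"  # hypothetical stand-in

# str.count looks for the literal 8-character substring "[a-zA-Z]".
literal_hits = text.count(r'[a-zA-Z]')

# re.findall interprets the same characters as a regex character class.
regex_hits = len(re.findall(r'[a-zA-Z]', text))

print(literal_hits, regex_hits)  # 0 20
```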
Solution 4:
Short answer: you did not use a regex but a raw string literal, and thus counted occurrences of the literal string '[a-zA-Z]'.
A string of the form r'..' is not a regex; it is a raw string literal. If you write r'\n', you get a string of two characters, a backslash and an n, not a newline. Raw strings are useful with regexes because regex syntax relies heavily on backslash escapes as well.
For example:
>>> r'\n'
'\\n'
>>> type(r'\n')
<class 'str'>
So here you count the number of times the string '[a-zA-Z]' occurs, and unless your spam_data['text'][0] literally contains a square bracket [ followed by a, etc., the count will be zero. Or, as specified in the documentation of str.count [Python-doc]:
string.count(s, sub[, start[, end]])
Return the number of (non-overlapping) occurrences of substring sub in string s[start:end]. Defaults for start and end and interpretation of negative values are the same as for slices.
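To illustrate the quoted behaviour, here is a short sketch using the str.count method, which follows the same rules for substrings, start and end. The sample strings are invented:

```python
s = "see [a-zA-Z] here"

literal = s.count('[a-zA-Z]')           # the bracketed text occurs once, literally
no_overlap = 'aaaa'.count('aa')         # matches are not allowed to overlap
from_start = 'banana'.count('a', 2)     # searches 'banana'[2:], i.e. 'nana'
neg_end = 'banana'.count('a', 0, -1)    # searches 'banana'[0:-1], i.e. 'banan'

print(literal, no_overlap, from_start, neg_end)  # 1 2 2 2
```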
In case the string is rather large, and you do not want to construct a list of matches, you can count the number of elements with:
sum(1 for _ in re.finditer('[a-zA-Z]', 'mystring'))
It is, however, typically faster to simply use re.findall(..) and then take the length of the resulting list.
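Both approaches return the same number. A minimal sketch comparing them follows; count_lazy and count_list are names chosen here for illustration, and which one is faster depends on the input, so it is worth measuring with timeit on your own data if it matters:

```python
import re

PATTERN = re.compile(r'[a-zA-Z]')

def count_lazy(text):
    # Iterates over match objects one at a time; no list is built.
    return sum(1 for _ in PATTERN.finditer(text))

def count_list(text):
    # Builds the full list of matched strings, then takes its length.
    return len(PATTERN.findall(text))

print(count_lazy('mystring'), count_list('mystring'))  # 8 8
```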