Why Is Collections.counter Much Slower Than ''.count?
Solution 1:
Counter()
allows you to count any hashable objects, not just substrings. Both solutions are O(n)
-time. Your measurements show that the overhead of iterating and hashing individual characters by Counter()
is greater than running s.count()
4 times.
Counter()
can use C helper to count elements but it seems it doesn't special case strings and uses general algorithm applicable for any other iterable i.e., processing a single character involves multiple Python C API calls to advance the iterator, get previous value (a lookup in the hash table), increment counter, set new value (a lookup in the hash table):
while (1) {
key = PyIter_Next(it);
if (key == NULL)
break;
oldval = PyObject_GetItem(mapping, key);
if (oldval == NULL) {
if (!PyErr_Occurred() || !PyErr_ExceptionMatches(PyExc_KeyError))
break;
PyErr_Clear();
Py_INCREF(one);
newval = one;
} else {
newval = PyNumber_Add(oldval, one);
Py_DECREF(oldval);
if (newval == NULL)
break;
}
if (PyObject_SetItem(mapping, key, newval) == -1)
break;
Py_CLEAR(newval);
Py_DECREF(key);
}
Compare it to FASTSEARCH()
overhead for bytestrings:
for (i = 0; i < n; i++)
if (s[i] == p[0]) {
count++;
if (count == maxcount)
return maxcount;
}
return count;
Solution 2:
The Counter
class inherits from dict
, while string.count
is the following C-implementation (CPython 3.3):
/* stringlib: count implementation */#ifndef STRINGLIB_FASTSEARCH_H#error must include"stringlib/fastsearch.h" before including this module#endifPy_LOCAL_INLINE(Py_ssize_t)
STRINGLIB(count)(const STRINGLIB_CHAR* str, Py_ssize_t str_len,
const STRINGLIB_CHAR* sub, Py_ssize_t sub_len,
Py_ssize_t maxcount)
{
Py_ssize_t count;
if (str_len < 0)
return0; /* start > len(str) */if (sub_len == 0)
return (str_len < maxcount) ? str_len + 1 : maxcount;
count = FASTSEARCH(str, str_len, sub, sub_len, maxcount, FAST_COUNT);
if (count < 0)
return0; /* no match */return count;
}
Guess, which one is faster? :)
Post a Comment for "Why Is Collections.counter Much Slower Than ''.count?"