How To Find Duplicate List Values?
Solution 1:
I'm not completely sure I understand you correctly: you would like to get all elements (tuples) of a list whose combination of entries at certain indices occurs multiple times in the list?
A compact implementation can be realized by combining itertools.groupby with operator.itemgetter. This actually results in a one-liner expression:
from operator import itemgetter
from itertools import groupby
# how often must the pattern appear (redundancy)
# what indices determine the pattern (target_slots)
redundancy, target_slots = 2, (1, 2)
eg_data_2 = [(0, 'Boby', 'beekeeper'), (1, 'Boby', 'beekeeper'), (2, 'Boby', 'beekeeper'), (3, 'Boby', 'gardener'), (4, 'Boby', 'gardener'), (5, 'Jack', 'gardener')]
targets = [k for k, v in groupby(eg_data_2, itemgetter(*target_slots)) if sum(1 for _ in v) >= redundancy]
targets
Out[6]: [('Boby', 'beekeeper'), ('Boby', 'gardener')]
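One caveat worth noting: groupby only merges consecutive elements with equal keys, so the one-liner above relies on the data already being ordered by the pattern. If your data isn't, sort it first with the same key function. A minimal sketch (the shuffled variant of the example data is made up for illustration):

```python
from operator import itemgetter
from itertools import groupby

redundancy, target_slots = 2, (1, 2)
# deliberately unsorted: the two 'gardener' rows are not adjacent
shuffled = [(3, 'Boby', 'gardener'), (0, 'Boby', 'beekeeper'),
            (1, 'Boby', 'beekeeper'), (4, 'Boby', 'gardener')]

key = itemgetter(*target_slots)
# sorting by the same key makes equal patterns adjacent, which groupby requires
targets = [k for k, v in groupby(sorted(shuffled, key=key), key)
           if sum(1 for _ in v) >= redundancy]
print(targets)
```

Without the sorted() call, each 'gardener' row would form its own group of size 1 and neither would reach the redundancy threshold.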
For your original data (orig_data below) you would get:
target_slots = [1,3]
targets = [k for k, v in groupby(orig_data, itemgetter(*target_slots)) if sum(1 for _ in v) >= redundancy]
In [9]: targets
Out[9]: [('Aaron Paul', 'sfp_names')]
As an alternative, you can work with itemgetter alone. The idea is to use the collection of elements as a key, with the value being a list of the element indices this particular collection occurs at. Then, if this list is longer than whatever threshold you chose (the redundancy parameter below), we report this particular collection:
from operator import itemgetter
from collections import defaultdict
# how many times must the collection of elements appear
redundancy = 2
# what are the indices of the collection
target_slots = [1, 2]
# the example data:
eg_data_2 = [(0, 'Boby', 'beekeeper'), (1, 'Boby', 'beekeeper'), (2, 'Boby', 'beekeeper'), (3, 'Boby', 'gardener'), (4, 'Boby', 'gardener'), (5, 'Jack', 'gardener')]
occurrences = defaultdict(list)  # this is just convenient, you can use a normal dict as well
for i, entry in enumerate(eg_data_2):
    occurrences[itemgetter(*target_slots)(entry)].append(i)
targets = [k for k, v in occurrences.items() if len(v) >= redundancy]
targets
Out[18]: [('Boby', 'beekeeper'), ('Boby', 'gardener')]
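If you only need the counts and not the indices, a collections.Counter condenses this dictionary-based approach further. A sketch under the same assumptions (same example data, same target_slots):

```python
from operator import itemgetter
from collections import Counter

redundancy, target_slots = 2, (1, 2)
eg_data_2 = [(0, 'Boby', 'beekeeper'), (1, 'Boby', 'beekeeper'),
             (2, 'Boby', 'beekeeper'), (3, 'Boby', 'gardener'),
             (4, 'Boby', 'gardener'), (5, 'Jack', 'gardener')]

# count how often each pattern occurs; no sorting is needed here
counts = Counter(itemgetter(*target_slots)(entry) for entry in eg_data_2)
targets = [k for k, c in counts.items() if c >= redundancy]
print(targets)
```

Like the defaultdict variant, this is order-independent, so it also works on unsorted input.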
In case you want the full elements back rather than just the repeated keys, you need to slightly adapt the statement for the targets, as the sum(1 ...) would already consume the group iterator. Here is how this could look:
from operator import itemgetter
from itertools import groupby
redundancy, target_slots = 2, (1, 2)
eg_data_2 = [(0, 'Boby', 'beekeeper'), (1, 'Boby', 'beekeeper'), (2, 'Boby', 'beekeeper'), (3, 'Boby', 'gardener'), (4, 'Boby', 'gardener'), (5, 'Jack', 'gardener')]
_targets = [(k, [e for e in v]) for k, v in groupby(eg_data_2, itemgetter(*target_slots))]
targets = [tg[1] for tg in _targets if len(tg[1]) >= redundancy]
Which will give:
In [6]: targets
Out[6]:
[[(0, 'Boby', 'beekeeper'),
(1, 'Boby', 'beekeeper'),
(2, 'Boby', 'beekeeper')],
[(3, 'Boby', 'gardener'), (4, 'Boby', 'gardener')]]
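The dictionary-based approach from above can produce the same full-tuple output without any sorting requirement, by collecting the tuples themselves instead of their indices. A sketch, assuming the same data and parameters:

```python
from operator import itemgetter
from collections import defaultdict

redundancy, target_slots = 2, (1, 2)
eg_data_2 = [(0, 'Boby', 'beekeeper'), (1, 'Boby', 'beekeeper'),
             (2, 'Boby', 'beekeeper'), (3, 'Boby', 'gardener'),
             (4, 'Boby', 'gardener'), (5, 'Jack', 'gardener')]

# map each pattern to the full tuples it occurs in
groups = defaultdict(list)
for entry in eg_data_2:
    groups[itemgetter(*target_slots)(entry)].append(entry)

# keep only the groups that reach the redundancy threshold
targets = [g for g in groups.values() if len(g) >= redundancy]
print(targets)
```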
Solution 2:
If I understand your question correctly, you're looking to get all the tuples from a list of tuples that share a duplicate value for one specific element, but you only want to keep those groups of duplicates whose values vary for some other specific element?
If so, I'm sorry to say you didn't do a very good job of explaining that. I mention it because understanding a problem clearly enough to explain it in a few words also happens to be the best first step toward coding a solution.
Example data:
[('a', 1, 0), ('a', 2, 0), ('b', 1, 0), ('c', 1, 0), ('c', 1, 0)]
In this example, assuming you'd be looking at the 1st (index 0) and 2nd (index 1) elements, I would expect you want [('a', 1, 0), ('a', 2, 0)] as a result. The tuple with 'b' isn't included because there is no second one, and the tuples with 'c' aren't included because, although there is a second, it does not have a different value for the other element.
Second example:
[('d', 1, 0), ('d', 2, 0), ('d', 2, 1)]
This shows something you don't address. These tuples should be included, because the first element is the same for all and the second is not; but should all three be included, or just one (at random, or the first) of the tuples that has 2 for the second element? I'm assuming you'd want all of them, because they meet your first two criteria.
from itertools import groupby
data = [('a', 1, 0), ('a', 2, 0), ('b', 1, 0), ('c', 1, 0), ('c', 1, 0)]
def my_filter(el1, el2, xs):
    return [e for l in [list(g) for k, g in groupby(xs, lambda x: x[el1])]
            for e in l if len(set([e[el2] for e in l])) > 1]
print(my_filter(0, 1, data))
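Applied to the second example data above, the same filter keeps all three tuples, which matches the assumption that every member of a qualifying group is returned. A self-contained sketch (re-stating my_filter so it runs on its own; note it relies on the input being ordered by the grouping element, since groupby only merges consecutive equal keys):

```python
from itertools import groupby

def my_filter(el1, el2, xs):
    # group consecutive tuples by element el1, then keep every tuple of any
    # group whose values at element el2 are not all identical
    return [e for l in [list(g) for k, g in groupby(xs, lambda x: x[el1])]
            for e in l if len(set([e[el2] for e in l])) > 1]

# second example: all three tuples share 'd', but the second element varies
data_2 = [('d', 1, 0), ('d', 2, 0), ('d', 2, 1)]
result = my_filter(0, 1, data_2)
print(result)
```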