Combining Lists With Overlapping Elements
I have a collection of lists, some of which have overlapping elements: coll = [['aaaa', 'aaab', 'abaa'], ['bbbb', 'bbbb'], ['aaaa', 'bbbb'], ['dddd', 'ddd
Solution 1:
Here's a way to do it (assuming you want unique elements over the overlapping results):
defover(coll):
print('Input is:\n', coll)
# gather the lists that do overlap
overlapping = [x for x in coll ifany(x_element in [y for k in coll if k != x for y in k] for x_element in x)]
# flatten and get unique
overlapping = sorted(list(set([z for x in overlapping for z in x])))
# get the rest
non_overlapping = [x for x in coll ifall(y notin overlapping for y in x)]
# use the line bellow only if merged non-overlapping elements are desired# non_overlapping = sorted([y for x in non_overlapping for y in x]) print('Output is"\n',[overlapping, non_overlapping])
coll = [['aaaa', 'aaab', 'abaa'],
['bbbb', 'bbbb'],
['aaaa', 'bbbb'],
['dddd', 'dddd'],
['bbbb', 'bbbb', 'cccc','aaaa']]
over(coll)
coll = [['aaaa', 'aaaa'], ['bbbb', 'bbbb']]
over(coll)
output:
$ python3 over.py -- NORMAL --
Input is:
[['aaaa', 'aaab', 'abaa'], ['bbbb', 'bbbb'], ['aaaa', 'bbbb'], ['dddd', 'dddd'], ['bbbb', 'bbbb', 'cccc', 'aaaa']]
Output is"
[['aaaa', 'aaab', 'abaa', 'bbbb', 'cccc'], [['dddd', 'dddd']]]
Input is:
[['aaaa', 'aaaa'], ['bbbb', 'bbbb']]
Output is"
[[], [['aaaa', 'aaaa'], ['bbbb', 'bbbb']]]
Solution 2:
You can do this with sets using a successive merging approach:
coll = [['aaaa', 'aaab', 'abaa'],
['bbbb', 'bbbb'],
['aaaa', 'bbbb'],
['dddd', 'dddd'],
['bbbb', 'bbbb', 'cccc','aaaa'],
['eeee','eeef','gggg','gggi'],
['gggg','hhhh','iiii']]
pooled = [set(subList) for subList in coll]
merging = True
while merging:
merging=False
for i,groupinenumerate(pooled):
merged = next((g for g in pooled[i+1:] if g.intersection(group)),None)
ifnot merged: continuegroup.update(merged)
pooled.remove(merged)
merging = True
print(pooled)
# [{'aaaa', 'abaa', 'aaab', 'cccc', 'bbbb'}, {'dddd'}, {'gggg', 'eeef', 'eeee', 'hhhh', 'gggi', 'iiii'}]
Solution 3:
Working on a suggestion from alkasm in the comments, I used networkx:
import networkx as nx
coll = [['aaaa', 'aaab', 'abaa'],
['bbbb', 'bbbb'],
['aaaa', 'bbbb'],
['dddd', 'dddd'],
['bbbb', 'bbbb', 'cccc','aaaa'],
['eeee','eeef','gggg','gggi'],
['gggg','hhhh','iiii']]
edges = []
for i inrange(len(coll)):
a = coll[i]
for j inrange(len(coll)):
if i != j:
b = coll[j]
ifset(a).intersection(set(b)):
edges.append((i,j))
G = nx.Graph()
G.add_nodes_from(range(len(coll)))
G.add_edges_from(edges)
for c in nx.connected_components(G):
combined_lists = [coll[i] for i in c]
flat_list = [item for sublist in combined_lists for item in sublist]
print(set(flat_list))
Output:
{'cccc', 'bbbb', 'aaab', 'aaaa', 'abaa'}
{'dddd'}
{'eeef', 'eeee', 'hhhh', 'gggg', 'gggi', 'iiii'}
Undoubtedly this can be optimized but it seems to solve my problem for now.
Post a Comment for "Combining Lists With Overlapping Elements"