Skip to content Skip to sidebar Skip to footer

Combining Lists With Overlapping Elements

I have a collection of lists, some of which have overlapping elements: coll = [['aaaa', 'aaab', 'abaa'], ['bbbb', 'bbbb'], ['aaaa', 'bbbb'], ['dddd', 'ddd

Solution 1:

Here's a way to do it (assuming you want unique elements over the overlapping results):

defover(coll):
     print('Input is:\n', coll)
     # gather the lists that do overlap 
     overlapping = [x for x in coll ifany(x_element in [y for k in coll if k != x for y in k] for x_element in x)] 
     # flatten and get unique 
     overlapping = sorted(list(set([z for x in overlapping for z in x]))) 
     # get the rest
     non_overlapping = [x for x in coll ifall(y notin overlapping for y in x)] 
     # use the line bellow only if merged non-overlapping elements are desired# non_overlapping = sorted([y for x in non_overlapping for y in x]) print('Output is"\n',[overlapping, non_overlapping])

coll = [['aaaa', 'aaab', 'abaa'],
        ['bbbb', 'bbbb'], 
        ['aaaa', 'bbbb'], 
        ['dddd', 'dddd'],
        ['bbbb', 'bbbb', 'cccc','aaaa']]
over(coll)
coll = [['aaaa', 'aaaa'], ['bbbb', 'bbbb']]
over(coll)

output:

$ python3 over.py                                                                                                                                                              -- NORMAL --
Input is:
 [['aaaa', 'aaab', 'abaa'], ['bbbb', 'bbbb'], ['aaaa', 'bbbb'], ['dddd', 'dddd'], ['bbbb', 'bbbb', 'cccc', 'aaaa']]
Output is"
 [['aaaa', 'aaab', 'abaa', 'bbbb', 'cccc'], [['dddd', 'dddd']]]
Input is:
 [['aaaa', 'aaaa'], ['bbbb', 'bbbb']]
Output is"
 [[], [['aaaa', 'aaaa'], ['bbbb', 'bbbb']]]


Solution 2:

You can do this with sets using a successive merging approach:

coll = [['aaaa', 'aaab', 'abaa'],
        ['bbbb', 'bbbb'], 
        ['aaaa', 'bbbb'], 
        ['dddd', 'dddd'],
        ['bbbb', 'bbbb', 'cccc','aaaa'],
        ['eeee','eeef','gggg','gggi'],
        ['gggg','hhhh','iiii']]

pooled = [set(subList) for subList in coll]
merging = True
while merging:
    merging=False
    for i,groupinenumerate(pooled):
        merged = next((g for g in pooled[i+1:] if g.intersection(group)),None)
        ifnot merged: continuegroup.update(merged)
        pooled.remove(merged)
        merging = True

print(pooled)
# [{'aaaa', 'abaa', 'aaab', 'cccc', 'bbbb'}, {'dddd'}, {'gggg', 'eeef', 'eeee', 'hhhh', 'gggi', 'iiii'}]

Solution 3:

Working on a suggestion from alkasm in the comments, I used networkx:

import networkx as nx

coll = [['aaaa', 'aaab', 'abaa'],
        ['bbbb', 'bbbb'], 
        ['aaaa', 'bbbb'], 
        ['dddd', 'dddd'],
        ['bbbb', 'bbbb', 'cccc','aaaa'],
        ['eeee','eeef','gggg','gggi'],
        ['gggg','hhhh','iiii']]

edges = []
for i inrange(len(coll)):
    a = coll[i]
    for j inrange(len(coll)):
        if i != j:
            b = coll[j]
            ifset(a).intersection(set(b)):
                edges.append((i,j))

G = nx.Graph()
G.add_nodes_from(range(len(coll)))
G.add_edges_from(edges)

for c in nx.connected_components(G):
    combined_lists = [coll[i] for i in c]
    flat_list = [item for sublist in combined_lists for item in sublist]
    print(set(flat_list))

Output:

{'cccc', 'bbbb', 'aaab', 'aaaa', 'abaa'}
{'dddd'}
{'eeef', 'eeee', 'hhhh', 'gggg', 'gggi', 'iiii'}

Undoubtedly this can be optimized but it seems to solve my problem for now.

Post a Comment for "Combining Lists With Overlapping Elements"