Finding Common Id's (intersection) In Two Dictionaries
Solution 1:
Pay attention to this:
output.write(dictB[key][0]+'\t'+dictA[key][1])
It means you print the first column of file2 and then the second column of file1. That doesn't correspond with your examples and desired output.
As for the intersection routine, it looks correct, so the problem is probably in your file. Are you sure all keys are unique? What do you mean by "reduce to 150": do you mean just deleting some lines from this very file?
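One way to check whether all keys are unique is to count how often each ID occurs. This is only a minimal sketch: the helper name `find_duplicate_ids` and the sample text are hypothetical, and it assumes whitespace-separated columns with the ID in the second column and at least two columns on every non-empty line.

```python
from collections import Counter


def find_duplicate_ids(text, column=1):
    """Return {ID: count} for every ID that occurs more than once."""
    # Skip blank lines; every remaining line is assumed to have the column.
    ids = [line.split()[column] for line in text.splitlines() if line.strip()]
    return {key: n for key, n in Counter(ids).items() if n > 1}


# Hypothetical sample: the first ID appears on two lines.
sample = """contig17 GRMZM2G052619_P03 x
contig33 AT2G41790.1 x
contig40 GRMZM2G052619_P03 x
"""
duplicates = find_duplicate_ids(sample)
print(duplicates)
```

If this reports anything, keep in mind that assigning to a dictionary with an existing key overwrites the earlier entry, so duplicated IDs make the dictionary smaller than the file.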
Also, it is better to replace
for key in set(dictA).intersection(dictB):
with
for key in dictA:
    if key in dictB:
It is equivalent, but it avoids building intermediate sets, so it should be faster and use less memory.
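To see that both forms select the same keys, here is a small sketch with made-up dictionaries (the contents of `dictA` and `dictB` are placeholders, not your data). It also shows `dictA.keys() & dictB.keys()`, a further Python 3 spelling that is not part of the suggestion above.

```python
# Placeholder data: two keys are shared between the dictionaries.
dictA = {"id1": ["a1", "id1"], "id2": ["a2", "id2"], "id3": ["a3", "id3"]}
dictB = {"id2": ["b2", "id2"], "id3": ["b3", "id3"], "id4": ["b4", "id4"]}

# Set-based form: builds a set from dictA, then a second set for the result.
via_set = set(dictA).intersection(dictB)

# Loop form: one pass over dictA with a hash lookup in dictB for each key.
via_loop = [key for key in dictA if key in dictB]

# Dictionary key views support set operators directly (Python 3).
via_views = dictA.keys() & dictB.keys()

print(sorted(via_set), via_loop, sorted(via_views))
```

One practical difference: the loop visits keys in the order of dictA, while the set-based forms give no ordering guarantee, so output lines may come out in a different order.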
Solution 2:
You should narrow your problem down and experiment with some tests. I will not go into testing frameworks; instead I will show you how to use assert.
assert takes two parameters. The first is an expression that is expected to be true.
The second is optional and should contain a positively phrased statement of what is expected to be true.
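A minimal illustration of both parameters, using made-up values:

```python
ids = ["GRMZM2G052619_P03", "AT2G41790.1"]

# Passes silently: the expression is true, so the message is never used.
assert len(ids) == 2, "ids shall contain 2 entries"

# Fails: the expression is false, so AssertionError carries the message.
try:
    assert len(set(ids)) == 3, "ids shall contain 3 unique entries"
except AssertionError as exc:
    message = str(exc)

print(message)
```

Note that assert statements are skipped when Python runs with the -O option, so they suit a debugging session like this one rather than permanent input validation.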
Here is a modified example with these tests:
fileA_txt = """contig17 GRMZM2G052619_P03 x x x x x x x x x x x x x x
contig33 AT2G41790.1 x x x x x x x x x x x x x x
contig98 GRMZM5G888620_P01 x x x x x x x x x x x x x x
contig102 GRMZM5G886789_P02 x x x x x x x x x x x x x x
contig123 AT3G57470.1 x x x x x x x x x x x x x x
"""
# or read it from file
# with open("fileA.txt") as f:
#     fileA_txt = f.read()
fileB_txt = """y GRMZM2G052619_P03 y y y y y y y y
y GRMZM5G888620_P01 y y y y y y y y
y GRMZM5G886789_P02 y y y y y y y y
"""
# or read it from file
# with open("fileB.txt") as f:
#     fileB_txt = f.read()
# Key each line by its second column (the ID).
dictA = dict()
for line1 in fileA_txt.splitlines():
    listA = line1.split()
    dictA[listA[1]] = listA
assert len(dictA) == 5, "fileA_txt shall contain 5 unique IDs"
dictB = dict()
for line1 in fileB_txt.splitlines():
    listB = line1.split()
    dictB[listB[1]] = listB
assert len(dictB) == 3, "fileB_txt shall contain 3 unique IDs"
# Keys present in both dictionaries.
common_IDs = set(dictA).intersection(dictB)
assert len(common_IDs) == 3, "there shall be just 3 common keys"
You should then experiment with your files and narrow down what works and what does not.
Simply replace fileA_txt (or the alternative, reading it from a file) with the other files that surprised you before.
Add more asserts for any assumptions you can identify (for example, if your files should always have the same number of lines as unique IDs, test it; the code would have to be modified for that).
Keep running your script until assert exceptions appear.
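The suggestion about matching the number of lines against the number of unique IDs could be sketched like this. The function name `parse_ids` is hypothetical, and the sketch assumes the ID sits in the second whitespace-separated column, as in the example above.

```python
def parse_ids(text):
    """Map the ID in the second column to the full list of columns."""
    # Ignore blank lines so a trailing newline does not skew the count.
    lines = [line for line in text.splitlines() if line.strip()]
    result = {}
    for line in lines:
        columns = line.split()
        result[columns[1]] = columns
    # A repeated ID overwrites an earlier entry, so the counts would differ.
    assert len(result) == len(lines), "every line shall have a unique ID"
    return result


fileB_txt = """y GRMZM2G052619_P03 y y
y GRMZM5G888620_P01 y y
y GRMZM5G886789_P02 y y
"""
dictB = parse_ids(fileB_txt)
print(len(dictB))
```

Running the same function over each of your real files should point at the first file in which an ID is repeated.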