
Python Iteration Order On A Set

I am parsing two big files (on the order of gigabytes), each of which contains keys and corresponding values. Some keys are shared between the two files, but with differing corresponding values. F

Solution 1:

Python's dicts and sets are stable: if you iterate over them without changing them, they are guaranteed to give you the same order every time. This is from the documentation on dicts in older Python versions (since Python 3.7, dicts also preserve insertion order):

Keys and values are iterated over in an arbitrary order which is non-random, varies across Python implementations, and depends on the dictionary’s history of insertions and deletions. If keys, values and items views are iterated over with no intervening modifications to the dictionary, the order of items will directly correspond.
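As a minimal sketch of what that guarantee means in practice (the keys and values below are made up for illustration), two passes over an unmodified set give the same sequence, and a dict's keys() and values() line up pairwise:

```python
# Hypothetical sample data; the names are illustrative only.
shared_keys = {"alpha", "beta", "gamma", "delta"}

# Two passes over the same, unmodified set yield the same sequence.
first_pass = list(shared_keys)
second_pass = list(shared_keys)
print(first_pass == second_pass)  # True

values_by_key = {"alpha": 1, "beta": 2, "gamma": 3}

# With no modification in between, keys() and values() correspond pairwise.
pairs = list(zip(values_by_key.keys(), values_by_key.values()))
print(pairs == list(values_by_key.items()))  # True
```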

Solution 2:

Iteration over an unmodified set will always give you the same order. That order is determined by the values currently in the set and by their insertion history.

See "Why is the order in dictionaries and sets arbitrary?" if you are interested in why that is.
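To illustrate the insertion-history point, here is a small sketch with arbitrary integers. Two sets with the same contents, built in opposite orders, compare equal, but nothing promises that they iterate in the same order. Each one is still stable on its own. The actual orders depend on the implementation, so none are shown.

```python
# Same elements, inserted in opposite orders (illustrative values).
a = set()
for n in (0, 8, 16):
    a.add(n)

b = set()
for n in (16, 8, 0):
    b.add(n)

print(a == b)            # True: equality ignores iteration order
print(list(a), list(b))  # may or may not match; implementation-dependent

# Each set is stable on its own while it is left unmodified.
print(list(a) == list(a), list(b) == list(b))  # True True
```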

Note that if you want to modify your files in place, that will only work if your entries have a fixed size. A file cannot be updated in the middle when the new data is shorter or longer than the data it replaces.

Data in a file is like magnetic tape: to replace data in the middle you'd have to splice in a longer or shorter piece, but you can't do that with a file. You'd have to rewrite everything following the replaced key-value pair to make the rest fit.
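A rough sketch of the fixed-size case, under assumptions that are not from the question: every record is padded to exactly RECORD_SIZE bytes, and make_record and overwrite_record are hypothetical helper names. Because the replacement has the same length as the original record, it can be overwritten in place without touching the rest of the file.

```python
import os
import tempfile

# Assumed layout: every record is exactly RECORD_SIZE bytes,
# "key=value" padded with spaces and terminated by a newline.
RECORD_SIZE = 16

def make_record(key, value):
    body = f"{key}={value}".encode("ascii")
    if len(body) > RECORD_SIZE - 1:
        raise ValueError("record does not fit in the fixed width")
    return body.ljust(RECORD_SIZE - 1) + b"\n"

def overwrite_record(path, index, key, value):
    # Seek to the record's offset and overwrite exactly RECORD_SIZE bytes.
    with open(path, "r+b") as f:
        f.seek(index * RECORD_SIZE)
        f.write(make_record(key, value))

# Build a small throwaway file with three records.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    for k, v in [("a", 1), ("b", 2), ("c", 3)]:
        f.write(make_record(k, v))

overwrite_record(path, 1, "b", 200)

with open(path, "rb") as f:
    records = [line.rstrip().decode("ascii") for line in f]
os.remove(path)
print(records)  # ['a=1', 'b=200', 'c=3']
```

If a new value did not fit in the fixed width, make_record would raise, which is the situation where the rest of the file would have to be rewritten.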

Solution 3:

As already stated, dicts and sets are stable and provide the same order as long as you don't change them. If you want a specific order, you can use OrderedDict.

From the collections library docs:

>>> from collections import OrderedDict

>>> # regular unsorted dictionary
>>> d = {'banana': 3, 'apple': 4, 'pear': 1, 'orange': 2}

>>> # dictionary sorted by key -- OrderedDict(sorted(d.items())) also works
>>> OrderedDict(sorted(d.items(), key=lambda t: t[0]))
OrderedDict([('apple', 4), ('banana', 3), ('orange', 2), ('pear', 1)])

>>> # dictionary sorted by value
>>> OrderedDict(sorted(d.items(), key=lambda t: t[1]))
OrderedDict([('pear', 1), ('orange', 2), ('banana', 3), ('apple', 4)])

>>> # dictionary sorted by length of the key string
>>> OrderedDict(sorted(d.items(), key=lambda t: len(t[0])))
OrderedDict([('pear', 1), ('apple', 4), ('orange', 2), ('banana', 3)])
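
For a plain set, a specific order can also be obtained by sorting at iteration time. This is a minimal sketch with made-up keys. Sorting also makes the order reproducible between runs, which the set's own order need not be for string keys when hash randomization is enabled.

```python
# Illustrative keys only.
shared_keys = {"pear", "apple", "orange", "banana"}

# sorted() returns a new list; the set itself is left untouched.
ordered = sorted(shared_keys)
for key in ordered:
    print(key)
# apple
# banana
# orange
# pear
```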
