
Is There Any Hash Function Which Has the Following Properties

I want a hash function that is fast, collision-resistant, and gives unique output. The primary requirement is that it should be persistable, i.e. its progress (hashing progress) cou

Solution 1:

Because of the pigeonhole principle, no hash function can generate hashes that are unique or collision-proof: there are far more possible inputs than fixed-size outputs, so some inputs must share a hash. A good hash function is collision-resistant, and makes it difficult to generate a file that produces a specified hash. Designing a good hash function is an advanced topic, and I'm certainly no expert in that field. However, since my code is based on SHA-256 it should be fairly collision-resistant, and hopefully it's also difficult to generate a file that produces a specified hash, but I can make no guarantees in that regard.
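To make the pigeonhole argument concrete, here is a minimal Python 3 sketch that shrinks the output space on purpose. It keeps only the first 2 bytes of a SHA-256 digest, so there are just 65536 possible outputs and a collision must turn up within 65537 distinct inputs. The helper names `truncated_sha256` and `find_collision` are made up for this illustration; the truncation is not something the answer's script does.

```python
import hashlib


def truncated_sha256(data: bytes, nbytes: int = 2) -> bytes:
    """Return only the first `nbytes` bytes of the SHA-256 digest."""
    return hashlib.sha256(data).digest()[:nbytes]


def find_collision(nbytes: int = 2):
    """Hash distinct inputs until two of them share a truncated digest."""
    seen = {}  # truncated digest -> first message that produced it
    counter = 0
    while True:
        message = str(counter).encode("ascii")
        tag = truncated_sha256(message, nbytes)
        if tag in seen:
            return seen[tag], message, tag
        seen[tag] = message
        counter += 1


first, second, tag = find_collision()
print(first, second, tag.hex())
```

The full 32-byte digest has the same property in principle. Its output space is simply so large that no one is expected to stumble on a collision this way.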


Here's a resumable hash function based on SHA-256 that is fairly fast: it takes about 44 seconds to hash a 1.4GB file on my 2GHz machine with 2GB of RAM. The script is written for Python 2.

persistent_hash.py

#! /usr/bin/env python

''' Use SHA-256 to make a resumable hash function

    The file is divided into fixed-sized chunks, which are hashed separately.
    The hash of each chunk is combined into a hash for the whole file.

    The hashing process may be interrupted by Control-C (SIGINT) or SIGTERM.
    When a signal is received, hashing continues until the end of the 
    current chunk, then the file position and current hex digest are saved
    to a file. The name of this file is formed by appending '.hash' to the 
    name of the file being hashed.

    Just re-run the program to resume hashing. The '.hash' file will be deleted 
    once hashing is completed.

    Written by PM 2Ring 2014.11.11
'''

import sys
import os
import hashlib
import signal

quit = False

blocksize = 1 << 16  # 64kB read at a time
blocksperchunk = 1 << 10

chunksize = blocksize * blocksperchunk  # 64MB hashed between checkpoints

def handler(signum, frame):
    global quit
    print "\nGot signal %d, cleaning up." % signum
    quit = True


def do_hash(fname):
    hashname = fname + '.hash'
    if os.path.exists(hashname):
        with open(hashname, 'rt') as f:
            data = f.read().split()
        pos = int(data[0])
        current = data[1].decode('hex')
    else:
        pos = 0
        current = ''

    finished = False
    with open(fname, 'rb') as f:
        f.seek(pos)
        while not (quit or finished):
            full = hashlib.sha256(current)
            part = hashlib.sha256()
            for _ in xrange(blocksperchunk):
                block = f.read(blocksize)
                if block == '':
                    finished = True
                    break
                part.update(block)

            full.update(part.digest())
            current = full.digest()
            pos += chunksize
            print pos
            if finished or quit:
                break

    hexdigest = full.hexdigest()
    if quit:
        with open(hashname, 'wt') as f:
            f.write("%d %s\n" % (pos, hexdigest))
    elif os.path.exists(hashname):
        os.remove(hashname)    

    return (not quit), pos, hexdigest


def main():
    if len(sys.argv) != 2:
        print "Calculate resumable hash of a file."
        print "Usage:\npython %s filename\n" % sys.argv[0]
        exit(1)

    fname = sys.argv[1]

    signal.signal(signal.SIGINT, handler)
    signal.signal(signal.SIGTERM, handler)

    print do_hash(fname)


if __name__ == '__main__':
    main()
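
The chaining scheme in the script can be tried out without files or signals. The following is a minimal Python 3 sketch, not a port of the script: `hash_chunks` and its `max_chunks` parameter are hypothetical names, and `max_chunks` only stands in for an interrupt arriving after a given number of chunks. It also advances the position by the number of bytes actually read, whereas the script adds a full `chunksize` each time.

```python
import hashlib
import io


def hash_chunks(stream, chunksize, state=b"", pos=0, max_chunks=None):
    """Chain per-chunk SHA-256 digests, starting from a saved (state, pos).

    Returns (finished, pos, state). `max_chunks` simulates an interrupt:
    hashing stops after that many chunks have been processed.
    """
    stream.seek(pos)
    done = 0
    while max_chunks is None or done < max_chunks:
        chunk = stream.read(chunksize)
        part = hashlib.sha256(chunk).digest()
        # Same combination as the script: sha256(previous digest + chunk digest)
        state = hashlib.sha256(state + part).digest()
        pos += len(chunk)
        done += 1
        if len(chunk) < chunksize:  # a short or empty read means end of data
            return True, pos, state
    return False, pos, state


data = bytes(range(256)) * 40  # 10240 bytes of sample data
chunksize = 1024

# Hash everything in one go.
_, _, one_pass = hash_chunks(io.BytesIO(data), chunksize)

# Stop after 3 chunks, then resume from the saved position and digest.
finished, pos, state = hash_chunks(io.BytesIO(data), chunksize, max_chunks=3)
_, _, resumed = hash_chunks(io.BytesIO(data), chunksize, state, pos)

print(one_pass.hex() == resumed.hex())
```

Only the position and the 32-byte running digest need to be stored to resume, which is what the '.hash' file holds. The chained result is not the same value as a plain SHA-256 of the whole file, and it depends on the chunk size, so digests are only comparable between runs that use the same settings.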
