
Python: Extract Gz Files With And Honor Original Filenames And File Extensions

Under a folder, I have many .gz files, and the files compressed inside them have different extensions: some are .txt, some are .csv, some are .xml, and some have other extensions. E.g. gz (the original/compressed file in()) f

Solution 1:

import gzip
import os


# Raw strings keep the backslashes in Windows paths from being read as escapes.
INPUT_DIRECTORY = r'C:\Xiang'
OUTPUT_DIRECTORY = r'C:\UnZipGz'
GZIP_EXTENSION = '.gz'


def make_output_path(output_directory, zipped_name):
    """ Generate a path to write the unzipped file to.

    :param str output_directory: Directory to place the file in
    :param str zipped_name: Name of the zipped file
    :return str:
    """
    name_without_gzip_extension = zipped_name[:-len(GZIP_EXTENSION)]
    return os.path.join(output_directory, name_without_gzip_extension)


for file in os.scandir(INPUT_DIRECTORY):
    if not file.name.lower().endswith(GZIP_EXTENSION):
        continue

    output_path = make_output_path(OUTPUT_DIRECTORY, file.name)

    print('Decompressing', file.path, 'to', output_path)

    # Use a distinct name so the scandir entry `file` is not shadowed.
    with gzip.open(file.path, 'rb') as compressed_file:
        with open(output_path, 'wb') as output_file:
            output_file.write(compressed_file.read())

Explanation:

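  1. Iterate over the entries in the input folder, skipping any whose name does not end in .gz.
  2. Build the output path by joining the output directory with the file name minus its .gz extension, so report.csv.gz becomes report.csv.
  3. Open the compressed file with gzip.open and write its decompressed contents to that path.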
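
The three steps above read each file fully into memory before writing it out. For large archives, a streaming variant may be preferable. The following is a minimal sketch, not the answer's own code: `decompress_gz` is a hypothetical helper name, and it assumes the output directory may need to be created first.

```python
import gzip
import os
import shutil


def decompress_gz(source_path, output_directory):
    """Stream-decompress one .gz file into output_directory.

    Hypothetical helper for illustration; returns the path that was written.
    """
    # Create the output directory if it does not exist yet (assumption).
    os.makedirs(output_directory, exist_ok=True)

    # Derive the output name by dropping a trailing ".gz" (case-insensitive).
    base_name = os.path.basename(source_path)
    if base_name.lower().endswith('.gz'):
        base_name = base_name[:-len('.gz')]
    output_path = os.path.join(output_directory, base_name)

    # copyfileobj copies in chunks, so the whole file is never held in memory.
    with gzip.open(source_path, 'rb') as compressed_file:
        with open(output_path, 'wb') as output_file:
            shutil.copyfileobj(compressed_file, output_file)
    return output_path
```

Calling `decompress_gz(entry.path, OUTPUT_DIRECTORY)` for each matching `os.scandir` entry would replace the body of the loop.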

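The code above derives the output name from the .gz file's own name. To retrieve the original file name recorded in the gzip header instead, you can use gzinfo: https://github.com/PierreSelim/gzinfo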

>>> import gzinfo
>>> info = gzinfo.read_gz_info('bar.txt.gz')
>>> info.fname
'foo.txt'
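
If adding a dependency is not an option, the stored name can also be read with the standard library alone. The sketch below assumes the header layout from RFC 1952 (a 10-byte fixed header, an optional FEXTRA block, then an optional zero-terminated FNAME field encoded as Latin-1). `read_gzip_original_name` is a hypothetical helper, not part of gzinfo or the `gzip` module.

```python
import struct

FEXTRA = 0x04  # header flag: an "extra" field follows the fixed header
FNAME = 0x08   # header flag: a zero-terminated original file name is present


def read_gzip_original_name(path):
    """Return the original file name stored in a gzip header, or None."""
    with open(path, 'rb') as raw:
        header = raw.read(10)
        if len(header) < 10 or header[:2] != b'\x1f\x8b':
            raise ValueError('not a gzip file: %r' % (path,))
        flags = header[3]

        # Skip the optional extra field: 2-byte little-endian length + payload.
        if flags & FEXTRA:
            (extra_length,) = struct.unpack('<H', raw.read(2))
            raw.read(extra_length)

        if not flags & FNAME:
            return None

        # Read bytes up to (not including) the terminating zero byte.
        name = bytearray()
        while True:
            byte = raw.read(1)
            if not byte or byte == b'\x00':
                break
            name += byte
        return name.decode('latin-1')
```

Not every gzip file carries a name in its header, so a caller would likely fall back to stripping the .gz extension when this returns None.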

