Skip to content Skip to sidebar Skip to footer

Faster Azure Blob Name Search With Python?

I have a list of file names that I need to search on Azure. Right now as a noob I am looping over each blob names and comparing strings but I think there has to be easiest and fast

Solution 1:

Please use exists method in azure storage python sdk.

defifblob_exists(filename):
    try:
        container_name = '***'

        block_blob_service = BlockBlobService(account_name=accountName, account_key=accountKey,
                                              socket_timeout=10000)

        isExist = block_blob_service.exists(container_name, filename)
        if isExist:
            print("\t Blob exists :" + " " + filename)
        else:
            print("\t Blob exists :" + " " + filename)

Surely, if you have list of filename, you need to loop call above function at least.

Hope it helps you.

Solution 2:

Listing all blobs is very costly operation inside the Azure Storage infrastructure because it translates into a full scan.

Find below an example to efficiently check if the blob (e.g. filename in your case) exists or not in a given container:

from azure.storage.blob import BlockBlobService
from datetime import datetime

defcheck_if_blob_exists(container_name: str, blob_names: []):
    start_time = datetime.now()

    ifnot container_name or container_name.isspace():
        raise ValueError("Container name cannot be none, empty or whitespace.")

    ifnot blob_names:
        raise ValueError("Block blob names cannot be none.")

        block_blob_service = BlockBlobService(account_name="{Storage Account Name}", account_key="{Storage Account Key}")

    for blob_name in blob_names:
        if block_blob_service.exists(container_name, blob_name):
            print("\nBlob '{0}' found!".format(blob_name));
        else:
            print("\nBlob '{0}' NOT found!".format(blob_name));

    end_time = datetime.now()

    print("\n***** Elapsed Time => {0} *****".format(end_time - start_time))

if __name__ == "__main__":
    blob_names = []

    # Exists
    blob_names.append("eula.1028.txt")
    blob_names.append("eula.1031.txt")
    blob_names.append("eula.1033.txt")
    blob_names.append("eula.1036.txt")
    blob_names.append("eula.1040.txt")

    # Don't exist
    blob_names.append("blob1")
    blob_names.append("blob2")
    blob_names.append("blob3")
    blob_names.append("blob4")

    check_if_blob_exists("containername", blob_names)

Find below a screenshot of a quick execution test from my laptop from West US (~150 Mbps of Download, ~3.22 Mbps of Upload, per Google Speed Test) checking if 9 blobs exists in a LRS Storage Account in West US as well.

enter image description here

Post a Comment for "Faster Azure Blob Name Search With Python?"