Efficiently Populate Scipy Sparse Matrix From Subset Of Dictionary

I need to store word co-occurrence counts in several 14000 x 10000 matrices. Since I know the matrices will be sparse and I do not have enough RAM to store all of them as dense matrices …

Solution 1:

You're using LIL matrices, which (unfortunately) use a linear-time insertion algorithm, so building one element by element this way takes quadratic time overall. Try a DOK matrix instead; it stores its entries in a hash table.
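A minimal sketch of the DOK approach, assuming the counts live in a hypothetical dictionary counts keyed by (word, word) pairs and that the subset to keep is defined by two hypothetical vocabularies, row_vocab and col_vocab, mapping words to row and column indices. Entries are written into a scipy.sparse.dok_matrix and converted to CSR once at the end:

```python
import numpy as np
import scipy.sparse as sp

# Hypothetical input: co-occurrence counts keyed by (row_word, col_word).
counts = {("cat", "dog"): 3, ("cat", "fish"): 1, ("dog", "bone"): 5, ("bird", "seed"): 2}

# Hypothetical vocabularies: only these words become rows / columns.
row_vocab = {"cat": 0, "dog": 1}
col_vocab = {"dog": 0, "fish": 1, "bone": 2}


def build_cooccurrence(counts, row_vocab, col_vocab):
    """Fill a DOK matrix (hash-table storage) from the dictionary subset, return CSR."""
    m = sp.dok_matrix((len(row_vocab), len(col_vocab)), dtype=np.int64)
    for (w1, w2), c in counts.items():
        i = row_vocab.get(w1)
        j = col_vocab.get(w2)
        # Skip pairs whose words fall outside the chosen subset.
        if i is not None and j is not None:
            m[i, j] = c
    # Convert once at the end; CSR is better suited to arithmetic and row slicing.
    return m.tocsr()


M = build_cooccurrence(counts, row_vocab, col_vocab)
print(M.toarray())
```

If the dictionary is very large, a common alternative (a different technique from DOK) is to collect row indices, column indices and values in three lists and call sp.coo_matrix((data, (rows, cols)), shape=...) once, which avoids per-element indexing overhead.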

However, if you're interested in boolean term occurrences, computing the co-occurrence matrix is much faster from a sparse document-term matrix. Let A be such a matrix of shape (n_documents, n_terms); the co-occurrence matrix is then

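A.T @ A

(Here @ is matrix multiplication for both scipy.sparse matrices and sparse arrays; * means matrix multiplication only for the matrix classes and is element-wise for sparse arrays.)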
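A sketch of this approach on a tiny hypothetical corpus (the docs list and vocab mapping are illustrative assumptions). Each document contributes a 1 for every distinct term it contains, so entry (i, j) of the product counts the documents containing both term i and term j, and the diagonal holds each term's document frequency:

```python
import numpy as np
import scipy.sparse as sp

# Hypothetical toy corpus: each document is a list of tokens.
docs = [["cat", "dog"], ["dog", "bone", "dog"], ["cat", "fish", "dog"]]

# Hypothetical vocabulary mapping each term to a column index.
vocab = {"cat": 0, "dog": 1, "bone": 2, "fish": 3}


def document_term_matrix(docs, vocab):
    """Build a 0/1 document-term CSR matrix of shape (n_documents, n_terms)."""
    rows, cols = [], []
    for d, doc in enumerate(docs):
        # set() so a term repeated within one document is counted once (boolean occurrence).
        for term in set(doc):
            if term in vocab:
                rows.append(d)
                cols.append(vocab[term])
    data = np.ones(len(rows), dtype=np.int64)
    return sp.csr_matrix((data, (rows, cols)), shape=(len(docs), len(vocab)))


A = document_term_matrix(docs, vocab)
# (n_terms, n_terms) co-occurrence matrix, still sparse.
C = A.T @ A
print(C.toarray())
```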

Post a Comment for "Efficiently Populate Scipy Sparse Matrix From Subset Of Dictionary"