Transform Utilities
The TransformUtils class provides several methods that simplify the implementation of transformers. Currently, it includes the following methods:
- deep_get_size: returns the complete (deep) size of a Python object, based on https://www.askpython.com/python/built-in-methods/variables-memory-size-in-python. It supports the Python structures list, tuple, and set.
- normalize_string: normalizes a string by converting it to lowercase and removing spaces, punctuation, and CR characters.
- str_to_hash: converts a string to a 256-bit hash.
- str_to_int: returns an integer representing a string, computed from the string's hash.
- validate_columns: checks whether the required columns exist in the table.
- add_column: adds a column to the table, avoiding duplicates. If a column with the given name already exists, it is removed before the new one is added.
- validate_path: cleans up an S3 path. It removes white space from the input/output paths, removes the scheme prefix (s3://, http://, https://) if one exists, appends a "/" at the end if one is missing, and removes URL encoding.
It also defines two variables:
- RANDOM_SEED: the number used by methods that require a seed.
- LOCAL_TO_DISK: a rough ratio of local size to size on disk/S3.
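The following sketch illustrates how two such class variables might be used. The values 42 and 2.0 are hypothetical placeholders rather than the actual TransformUtils values, the helpers sample_rows and estimate_local_size are invented for illustration, and reading LOCAL_TO_DISK as "local size = size on disk multiplied by the ratio" is an assumption based on the description above.

```python
import random

# Hypothetical placeholder values; the real constants are defined in TransformUtils
RANDOM_SEED = 42
LOCAL_TO_DISK = 2.0


def sample_rows(rows: list, k: int) -> list:
    """Draw a reproducible sample: the same seed always yields the same rows."""
    # A dedicated Random instance avoids disturbing the global random state
    rng = random.Random(RANDOM_SEED)
    return rng.sample(rows, k)


def estimate_local_size(size_on_disk: int) -> float:
    """Roughly estimate the local size of data from its size on disk/S3."""
    return size_on_disk * LOCAL_TO_DISK


print(sample_rows(list(range(100)), 5))
print(estimate_local_size(1024))  # prints 2048.0 with the placeholder ratio
```

Keeping the seed in one shared constant means every transformer that samples or shuffles data produces repeatable results without each one choosing its own seed.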
Extend this class with additional methods that are generally useful across multiple transformers, and document those methods here.