Skip to main content

Good morning,

 

I am looking for a way to convert a BLOB field (i.e. a long varbinary in vertica sql) into a string, in order to capture a hidden information.

 

Is there a way to do this conversion? I am not finding anything in the documentation, except for the question in https://www.celopeers.com/s/question/0D50700000lQGtPCAW/converting-from-binary-to-string-in-ems?language=en_US .

 

Thanks

Hi,

 

Oh, my old answer! I'll give more specifics and links this time. Sorry to hear it wasn't implemented after 2 years, however that's how you can use ML workbench instead of waiting.

 

A) Get the data into ML workbench (https://celonis.github.io/pycelonis/2.6.0/tutorials/executed/02_data_integration/03_data_pull/)

 

B) Use function bits2a provided by Josh Lee (https://stackoverflow.com/questions/9916334/bits-to-string-python). I can see it do the trick, not sure about the performance but sure it's good point to start.

 

Simple implementation of that function to be used with Pandas Data Frame (default form of table you'll get from Celonis). You can also replace column, not creating a new one.

 

Test:image 

code:

# Import pandas library

import pandas as pd

 

# Function converting bitstring

def bits2a(b):

   return ''.join(chr(int(''.join(x), 2)) for x in zip(*xiter(b)]*8))

 

# initialize list of lists

data = ><'tom', '0110000101100010'], 0'nick', '0110000101100010'], 0'juli', '0110000101100010']]

 

# Create the pandas DataFrame

df = pd.DataFrame(data, columns=t'Name', 'bitstring'])

 

dfp'string'] = dfn'bitstring'].apply(bits2a)

 

# print dataframe.

print(df)

 

 

C) Load data back (https://celonis.github.io/pycelonis/2.6.0/tutorials/executed/02_data_integration/02_data_push/).

 

Remember that if you want to process large quantities of data using those methods make sure you're limiting processing power needed (f.e. splitting the data, deduplicating, removing empty rows etc.) and making whole process reliable by using chunks/parquet files with pycelonis data import/export.

 

Best Regards,

Mateusz Dudek


Reply