Hi @julia.bauer,
My first guess would be that your data contains values or characters that cannot be converted to Parquet.
For example, Celonis allows null values in string columns, but the Parquet conversion may expect a string such as ' ' instead (I am not sure how the CSV export handles this, though).
To test what is going on, you could create a temporary flow that sends you the CSVs by email. In a Python script, you could then manually convert the failed files to Parquet. That way, you will get more information about what the error is.
A simple script to do so is:
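import pandas as pd  # to_parquet also needs pyarrow or fastparquet installed

df = pd.read_csv('example.csv')   # one of the CSVs that failed
df.to_parquet('output.parquet')   # any conversion error is raised here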
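If the conversion fails and the error message alone is not enough, it may help to look at each column before writing. The following is only a rough sketch, not a tested recipe for the Celonis export: the helper name profile_columns and the sample data are made up for illustration. It lists the pandas dtype, the value type pandas infers from the actual values, and the number of nulls per column. Columns reported as "mixed" are a common reason for the Parquet writer to fail.

```python
import io

import pandas as pd


def profile_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Summarise each column: pandas dtype, inferred value type, null count."""
    rows = []
    for name in df.columns:
        col = df[name]
        rows.append({
            "column": name,
            "dtype": str(col.dtype),
            # infer_dtype inspects the values themselves, ignoring nulls
            "inferred": pd.api.types.infer_dtype(col, skipna=True),
            "nulls": int(col.isna().sum()),
        })
    return pd.DataFrame(rows).set_index("column")


# Hypothetical sample standing in for one of the failed CSV exports
sample_csv = io.StringIO(
    "case_id,activity,amount\n"
    "1,Create Order,10.5\n"
    "2,,abc\n"
    "3,Ship,7\n"
)
report = profile_columns(pd.read_csv(sample_csv))
print(report)
```

In this sample, the empty activity cell shows up as one null, and the amount column is read as text because of the value abc. For a real file, pass pd.read_csv('example.csv') instead of the in-memory sample.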
The next step could be to apply functions to your data to replace these null values, for instance using COALESCE (celonis.com): COALESCE("table"."column", ' ').
If some characters are not supported, you could use REMAP_VALUES (celonis.com) to replace them.
I hope this helps.
Best regards,
Jan-peter
Hello Jan-Peter,
thanks a lot for your suggestion!
I contacted customer support regarding this issue. They proposed specifying the correct data types for the columns, since by default they are all of type varchar. Do you think that might also be the problem?
I'll take a look at the null values / Parquet suggestion, too.
Best regards,
Julia Bauer
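To try out the effect of explicit column types locally, a minimal pandas sketch could look like the one below. It is only an illustration under assumptions: the column names, the types in COLUMN_TYPES, and the helper read_typed_csv are hypothetical and need to be replaced with the real schema of the export. It does not reproduce how Celonis itself assigns types.

```python
import io

import pandas as pd

# Hypothetical schema; replace with the real columns and types of the export
COLUMN_TYPES = {"case_id": "string", "activity": "string", "amount": "float64"}


def read_typed_csv(source, column_types, date_columns=()):
    """Read a CSV with explicit dtypes instead of treating every column as text."""
    return pd.read_csv(
        source,
        dtype=column_types,
        # parse the listed columns as dates; False disables date parsing
        parse_dates=list(date_columns) or False,
    )


# Hypothetical sample standing in for one exported CSV
sample_csv = io.StringIO(
    "case_id,activity,amount,created_at\n"
    "001,Create Order,10.5,2023-01-05\n"
    "002,,7,2023-01-06\n"
)
df = read_typed_csv(sample_csv, COLUMN_TYPES, date_columns=["created_at"])
print(df.dtypes)
```

With explicit types, a value that does not fit its column (for example text in amount) makes read_csv raise a ValueError, which points at the offending data before any Parquet file is written. The typed frame can then be passed to df.to_parquet('output.parquet') as in the earlier snippet.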