Python API hybrid - 'dataPoolId.Pool does not exist'

Hello,

I am investigating the data pool connection via the Python API.

I am following the guide at: tenant.realm.celonois.cloud/documentation/data-push-api-python/

If I then try to access the data pool as described in the documentation at
tenant.realm.celonois.cloud/documentation/data-push-api-python/
I get the following error:
‘dataPoolId.Pool does not exist’

If I try it via the curl command:
curl -X POST --header "Content-Type: application/json" --header "Authorization: Bearer <YOUR_API_KEY>" https://<YOUR_DOMAIN>.celonis.cloud/integration/api/v1/data-push/<DATA_POOL_ID>/jobs/ -d '{"targetName":"<TABLE_NAME>","type":"REPLACE"}'

I get the following error:
dataPoolId.Pool id may not be empty

For the data pool ID, I entered the data pool ID of my connection.

In the attached picture, the first marked value is the data pool ID; this is what I used as my data pool ID.
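For comparison, the same request can be built from Python with only the standard library, which makes it easy to check that the pool ID really ends up in the URL before anything is sent. This is a minimal sketch, assuming the endpoint path and JSON body from the curl command above; `build_push_job_request` is a hypothetical helper, and the `service` parameter only assumes that the `integration-hybrid` path mentioned later in the thread follows the same layout.

```python
import json
import urllib.request


def build_push_job_request(domain: str, api_key: str, pool_id: str,
                           table_name: str, service: str = "integration",
                           job_type: str = "REPLACE") -> urllib.request.Request:
    """Build (but do not send) the POST request that creates a data push job.

    Hypothetical helper: path and body mirror the curl command in this post.
    """
    # Fail early instead of letting the server answer "Pool id may not be empty"
    if not pool_id or not pool_id.strip():
        raise ValueError("pool_id must not be empty")
    url = (f"https://{domain}.celonis.cloud/{service}/api/v1/"
           f"data-push/{pool_id.strip()}/jobs/")
    body = json.dumps({"targetName": table_name, "type": job_type}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )


req = build_push_job_request("my-team", "MY_API_KEY", "1234-abcd", "MY_TABLE")
print(req.get_method(), req.full_url)
# POST https://my-team.celonis.cloud/integration/api/v1/data-push/1234-abcd/jobs/
# Sending it would be: urllib.request.urlopen(req)
```

Printing the URL like this is a quick way to spot an empty or mistyped pool ID before the server rejects the request.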

Small update:

The Push API only works for the “integration” endpoint, not for “integration-hybrid”.
Unfortunately, once I push something I cannot find my file. Moreover, if I try to push something else, or push the same thing again, I get a 400 Bad Request error.
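A bare "400 Bad Request" does not say much on its own; the reason is often in the response body (as with the `dataPoolId...` messages above). A minimal standard-library sketch for surfacing that body, assuming the request is sent with `urllib`; `send_and_report` is a hypothetical helper, and its `opener` parameter exists only so it can be demonstrated offline with a stub.

```python
import io
import urllib.error
import urllib.request


def send_and_report(req, opener=urllib.request.urlopen):
    """Send a request and return (status_code, body_text), even on HTTP errors."""
    try:
        with opener(req) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # HTTPError is also a response object: its body usually explains the 4xx
        return err.code, err.read().decode("utf-8", errors="replace")


# Offline demonstration: a stub opener that always answers 400 with a made-up body
def fake_opener(req):
    raise urllib.error.HTTPError(
        req.full_url, 400, "Bad Request", {},
        io.BytesIO(b'{"message": "example error text"}'))


status, body = send_and_report(
    urllib.request.Request("https://example.invalid/jobs/"), opener=fake_opener)
print(status, body)  # 400 {"message": "example error text"}
```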

Hi Paul,

To upload a Parquet file or DataFrame using the Python API, I would recommend following this example:
https://python.celonis.cloud/docs/pycelonis/en/latest/notebooks/02_Pushing_Data.html

In my opinion, this is the simplest way of uploading data, and it supports Hybrid too. :slight_smile:

Do you think this would be a solution for you?

Best regards,
Simon Riezebos

Hello Simon,

The challenge here is that my Hybrid pool is not a Parquet endpoint. To my understanding, the Push API pushes Parquet files, which means I cannot find my table afterwards, since the Hybrid connector is a JDBC connector that does not support Parquet.

I hope this helps clarify the challenge. Let me know if I can push something other than Parquet.

Best,
Paul

Hi Paul,

I believe the Parquet files are translated into a format that works with your database over JDBC; this is done by the extractor application running in your network. We found a small bug in pycelonis when pushing to a hybrid data pool. If you install the latest version of pycelonis (currently 1.1.9) and follow the example shown above, it should work:

https://python.celonis.cloud/docs/pycelonis/en/latest/installation.html
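To confirm that the upgrade actually took effect in the environment your notebook runs in, the installed version can be compared against 1.1.9 with the standard library. This is a rough sketch: `has_min_version` and `version_tuple` are hypothetical helpers that only compare the leading numeric parts of a version string; `packaging.version.Version` handles full PEP 440 versions if that package is available.

```python
from importlib import metadata


def version_tuple(version: str) -> tuple:
    """Turn '1.1.9' into (1, 1, 9); stops at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def has_min_version(package: str, minimum: str) -> bool:
    """True if `package` is installed and its version is at least `minimum`."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return False
    return version_tuple(installed) >= version_tuple(minimum)


print(has_min_version("pycelonis", "1.1.9"))
```

Comparing tuples of integers avoids the classic string-comparison trap where "1.1.10" would sort before "1.1.9".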

Does this solve your problem?

Best regards,
Simon Riezebos

Hello,

I tried pushing with the new API and received the following error:
HTTPError: File upload problem: The file </tmp/celonis_push_Table_NAME_1574786623.9736688_0.parquet> is not a supported parquet/csv file.
The data model I push to is connected to SAP.
Please let me know how I can fix this if possible.
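When an upload is rejected as "not a supported parquet/csv file", a cheap first check is whether the file is a structurally complete Parquet file at all. The Apache Parquet format specification requires every file to start and end with the 4-byte magic `PAR1`; the sketch below checks only that, so passing it does not prove the file is readable. `looks_like_parquet` is a hypothetical helper, and pointing it at the `/tmp/celonis_push_...parquet` file from the error assumes that file still exists when you check.

```python
import os
import tempfile

PARQUET_MAGIC = b"PAR1"


def looks_like_parquet(path: str) -> bool:
    """Cheap structural check: Parquet files start and end with b'PAR1'.

    Catches truncated files and files in another format (e.g. CSV);
    it does not guarantee the file can actually be read.
    """
    # Smallest possible layout: magic + 4-byte footer length + magic
    if os.path.getsize(path) < 12:
        return False
    with open(path, "rb") as fh:
        head = fh.read(4)
        fh.seek(-4, os.SEEK_END)
        tail = fh.read(4)
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC


# Demo: bytes with both magics vs. CSV content saved under a .parquet name
with tempfile.TemporaryDirectory() as tmp:
    fake_parquet = os.path.join(tmp, "a.parquet")
    with open(fake_parquet, "wb") as fh:
        fh.write(PARQUET_MAGIC + b"\x00" * 8 + PARQUET_MAGIC)
    csv_named_parquet = os.path.join(tmp, "b.parquet")
    with open(csv_named_parquet, "wb") as fh:
        fh.write(b"A,B\n1,2\n3,4\n")
    results = (looks_like_parquet(fake_parquet), looks_like_parquet(csv_named_parquet))

print(results)  # (True, False)
```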

I have an additional question regarding the push/pull topic. When I pull data, I cannot pull a KPI that contains a variable. Is there any way to pull KPIs with a variable in their calculation?

Hi Paul,

This may have something to do with the data. Could you try the following example? It assumes there is a data pool object in hybrid_pool.

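import logging
import pandas as pd

# Raise the pycelonis log level to DEBUG so its API requests are logged
logger = logging.getLogger('pycelonis')
logger.setLevel(logging.DEBUG)

# Small mixed-type test frame (pd.util.testing was removed in pandas 2.0)
df = pd.DataFrame({"A": [0.0, 1.0, 2.0],
                   "C": ["foo1", "foo2", "foo3"],
                   "D": pd.bdate_range("2009-01-01", periods=3)})
hybrid_pool.push_table(df, "DATA_PUSH_TEST")  # hybrid_pool: your existing pool object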

This also returns extra logging that we can use to check what happened to the uploaded file. You can find the ID of your data push job in a request that looks like:

/integration-hybrid/api/v1/data-push/{pool_id}/jobs/{job_id}
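If the DEBUG output is long, the job ID can be pulled out of it with a regular expression on that path. A minimal sketch: `find_job_ids` is a hypothetical helper, and the sample log lines are invented for illustration (real pycelonis log lines will look different; only the URL path matters).

```python
import re

# Matches .../data-push/{pool_id}/jobs/{job_id} for both integration variants
JOB_PATH = re.compile(
    r"/integration(?:-hybrid)?/api/v1/data-push/"
    r"(?P<pool_id>[^/\s]+)/jobs/(?P<job_id>[^/\s?]+)"
)


def find_job_ids(log_text: str) -> list:
    """Return (pool_id, job_id) pairs in order of first appearance, without duplicates."""
    seen = []
    for match in JOB_PATH.finditer(log_text):
        pair = (match.group("pool_id"), match.group("job_id"))
        if pair not in seen:
            seen.append(pair)
    return seen


# Invented sample lines; the first one (job creation) has no job ID yet
sample_log = """
DEBUG POST https://my-team.celonis.cloud/integration-hybrid/api/v1/data-push/pool-123/jobs/
DEBUG GET https://my-team.celonis.cloud/integration-hybrid/api/v1/data-push/pool-123/jobs/job-456
DEBUG GET https://my-team.celonis.cloud/integration-hybrid/api/v1/data-push/pool-123/jobs/job-456/chunks
"""
print(find_job_ids(sample_log))  # [('pool-123', 'job-456')]
```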

If it doesn’t work, could you come back after a few hours and run the following, where job_id is the ID of the job?

hybrid_pool.celonis.api_request(f"{hybrid_pool.url_data_push}{job_id}/chunks")

Does the output of this show any uploaded files?
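Since the chunks check is meant to be repeated after a while, it can be wrapped in a small polling loop. A sketch under stated assumptions: `wait_for_chunks` is a hypothetical helper, it assumes the chunks endpoint returns an empty (falsy) value while nothing has been uploaded, and the `{"id": "chunk-1"}` shape in the demo is made up, since the real response format is not shown in this thread.

```python
import time


def wait_for_chunks(fetch_chunks, job_id, attempts=5, delay_s=600.0, sleep=time.sleep):
    """Call fetch_chunks(job_id) until it returns a non-empty result.

    Returns the first non-empty result, or None if every attempt was empty.
    In a notebook, fetch_chunks could be, for example:
        lambda j: hybrid_pool.celonis.api_request(f"{hybrid_pool.url_data_push}{j}/chunks")
    """
    for attempt in range(attempts):
        chunks = fetch_chunks(job_id)
        if chunks:
            return chunks
        if attempt < attempts - 1:  # no need to wait after the last attempt
            sleep(delay_s)
    return None


# Offline demo: a fake fetcher that is empty twice, then reports one chunk
responses = iter([[], [], [{"id": "chunk-1"}]])
result = wait_for_chunks(lambda job_id: next(responses), "job-456",
                         attempts=5, delay_s=0, sleep=lambda s: None)
print(result)  # [{'id': 'chunk-1'}]
```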

Regards,
Simon


Hello,

I was checking out the API again, this time with the result coming from the duplicate_checker notebook built by Celonis. When I try to push data back to the SAP database here, I get the following error:

file upload problem, not a supported parquet/csv file.
from pycelonis.utils.paquet_utils import read_parquet

So I think it sometimes works when pushing a specific format, but for other formats it fails without showing any clear error.