Hi Celonis Community,
I’m working on optimizing data jobs in Celonis Data Integration for a client with an extremely heavy SAP system (Saturn). Our data jobs currently take around 1.5 hours to run, and the client wants to bring this down to 0.5 hours. Despite optimizing the SQL queries (e.g., early filtering, minimizing joins; a simplified sketch follows the list below) and disabling unnecessary objects, the massive data volume seems to be the main bottleneck. Here are some examples of the volumes we’re handling:
- SalesOrderScheduleLine Transformation: 119M rows in CDPOS, 59M rows in CDHDR, cost of 24M.
- CreditMemoItem Transformation: 123M rows in VBAP, 824M rows in VBFA, 178M rows in VBRP, cost of 83M.
- CustomerInvoice Transformation: similar volumes (824M rows in VBFA, 178M rows in VBRP), cost of 83M.
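For context, here is a simplified sketch of the kind of early filtering we apply in the CreditMemoItem transformation. The column list, the VBTYP_N value, and the temporary table names are illustrative only, not our production code, and I’m assuming the usual Vertica-style SQL used in transformations:

```sql
-- 1. Reduce the 824M-row VBFA document flow to the relevant category
--    before any join touches it (VBTYP_N value shown for illustration).
DROP TABLE IF EXISTS TMP_VBFA_CREDIT;
CREATE TABLE TMP_VBFA_CREDIT AS
SELECT VBELN, POSNN, VBELV, POSNV
FROM VBFA
WHERE VBTYP_N = 'O';  -- keep only credit-memo flow records

-- 2. Join the reduced flow table to VBRP instead of the raw VBFA.
DROP TABLE IF EXISTS TMP_CREDIT_MEMO_ITEM;
CREATE TABLE TMP_CREDIT_MEMO_ITEM AS
SELECT VBRP.VBELN, VBRP.POSNR, VBRP.MATNR, VBRP.FKIMG
FROM VBRP
JOIN TMP_VBFA_CREDIT AS F
  ON F.VBELN = VBRP.VBELN
 AND F.POSNN = VBRP.POSNR;
```

Even with this kind of pre-filtering, the transformations that read VBFA and VBRP still dominate the runtime.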
We’ve explored strategies like incremental loads and partitioning, but the sheer volume coming from Saturn (e.g., 824M rows in VBFA) still slows us down. The documentation mentions that the Payment Term Checker slows down with 100M+ invoices, which suggests infrastructure may be the limiting factor at this scale. In the Community I also saw that the Machine Learning Workbench requires dedicated resources for very large volumes (e.g., 2B rows), but I’m not sure how that translates to Data Integration.
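For reference, this is roughly how we sliced the full load when we tried partitioning; the date column, the year boundaries, and the target table name are simplified assumptions:

```sql
-- One slice of a partitioned full load, split by creation date
-- (boundaries and table name are illustrative).
DROP TABLE IF EXISTS TMP_VBFA_2023;
CREATE TABLE TMP_VBFA_2023 AS
SELECT VBELN, POSNN, VBELV, POSNV, VBTYP_N
FROM VBFA
WHERE ERDAT >= '2023-01-01' AND ERDAT < '2024-01-01';
```

Each slice runs as its own statement and the downstream joins work per slice, so no single join has to scan all 824M rows at once, but the overall runtime is still well above the 0.5-hour target.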
My question is:
- Has anyone handled similar SAP volumes (e.g., 824M rows in VBFA) and significantly reduced the processing time of a full load execution? If so, how? (Excluding delta solutions, as we’re focusing on full load scenarios.)
Any advice or experiences would be greatly appreciated!
Thanks in advance,
Juan Palencia