Skip to main content

I am trying to replicate the following in Cloud connector extraction job for a table named order_status:

 

create table `order_status_filtered` as(

SELECT

S.order_id

,S.created_dt

from `order_status` S

where exists (select * from `fact_order` O where O.id = S.order_id)

and S.STATUS NOT IN ('Created','Sale Confirmed','Not Started')

);

 

 

I am unable to add the where exists part in the extraction filters.

Is there a way to do it?

You cannot use where exists in extraction filters, suggestion would be to pull the entire table run the same in transformation job and delete the rows that are not required.


Yes, i tried that - but our source has millions of rows - since i need a subset only for around 3 months analysis - i was wondering if we can restrict the data during extraction itself. Its running into hundred of millions of rows. Takes a lot of time also.


I have raised a ticket also. Lets see - but i believe sub queries should be allowed during extraction.


Can't you use a join with fact_order?

 

You can also put some filter in the extraction of order_status to retrieve only those in status Created, Sale Confirmed and Not Started, limiting the amount of records to download

 

Last, it seems you want only a temporal subset. Maybe you can download only some months from order_status, reducing thus the number of records. Then you can do the where exists in transformations, as @Abhishek Chaturvedi said.

 

 


Thank you for the suggestions 😊

Can't you use a join with fact_order? - i did. The default join in extraction is inner join. Inner join is not the exact same as where exists unless you use a distinct in your inner join. As inner join does not take care of duplicates.

You can also put some filter in the extraction of order_status to retrieve only those in status Created, Sale Confirmed and Not Started, limiting the amount of records to download - this i have done already and it did decrease the data a bit.

Last, it seems you want only a temporal subset. Maybe you can download only some months from order_status, reducing thus the number of records. Then you can do the where exists in transformations, as @Abhishek Chaturvedi said. - I will do this as a last resort but as i mentioned zthe number of records in the source systems are in tens of millions of rows. So, was thinking of restricting the amount of data in extraction itself.

 

Let me work on it - will keep updating :)

 


Thank you for the suggestions 😊

Can't you use a join with fact_order? - i did. The default join in extraction is inner join. Inner join is not the exact same as where exists unless you use a distinct in your inner join. As inner join does not take care of duplicates.

You can also put some filter in the extraction of order_status to retrieve only those in status Created, Sale Confirmed and Not Started, limiting the amount of records to download - this i have done already and it did decrease the data a bit.

Last, it seems you want only a temporal subset. Maybe you can download only some months from order_status, reducing thus the number of records. Then you can do the where exists in transformations, as @Abhishek Chaturvedi said. - I will do this as a last resort but as i mentioned zthe number of records in the source systems are in tens of millions of rows. So, was thinking of restricting the amount of data in extraction itself.

 

Let me work on it - will keep updating :)

 

Best luck!


Reply