Join Queries Containing Aggregate Operators



  • Avatar
    Cory Singleton

    Hi Steve, 


    Yes, join is the operator you would need to leverage in order to merge the two datasets you mention. Below are two potential solutions. Depending on what you are looking to do with this query, one may be more appropriate than the other.

    The below approach leverages join on the raw data itself and can be run interactively via search whenever needed. Note because we don't do any pre-aggregation of the data, this query can't be used on dashboards. I have redacted your sourceCategory name in the below example however you can paste the following code in your query window to view the working query: _code=3Lh3p3BNT34UbWJ1pzLIBbzO6Qm8xCGkEdFvvUo1

    _sourcecategory = <redacted> (sale or ping_dupe)
    | join (parse "seller_id=*;" as seller_id
    | parse "price=*" as price
    | count(price) as sales, sum(price) as revenue by seller_id) as query1,
    (parse "duplicate_seller_id=*;" as duplicate_seller
    | count(*) as duplicates by duplicate_seller) as query2
    on query1.seller_id=query2.duplicate_seller
    | fields query1_seller_id, query1_sales, query2_duplicates


    If you need to use this query on a dashboard, for query performance reasons we need to ensure that the join operation happens after an aggregation. In this example, we aggregate first, then join the results after so it can be used on a dashboard. You can view the working search as above by pasting this code: _code=2ePQIcsHaN8M0gCvwV0eilzk6IfVdDXfDI3hCxCV

    _sourcecategory = <redacted> (sale or ping_dupe)
    | parse "seller_id=*;" as seller_id nodrop
    | parse "price=*" as price nodrop
    | parse "duplicate_seller_id=*;" as duplicate_seller_id nodrop
    | if(_raw matches "*sale*","query1","query2") as type // in order for join to work, we need to create this type field so we can properly filter each sub-query
    | count(price) as sales, sum(price) as revenue,count as duplicates by seller_id,duplicate_seller_id,type
    | join (where type="query1" | sum(sales) as sales, sum(revenue) as revenue by seller_id ) as query1,
    (where type="query2" or !isNull(duplicate_seller_id) | sum(duplicates) as duplicates by duplicate_seller_id) as query2
    on query1.seller_id=query2.duplicate_seller_id
    | fields query1_seller_id, query1_sales, query2_duplicates


    Hope this helps, 




    Comment actions Permalink
  • Avatar
    Steve Phillips

    Thanks for the help.


    Comment actions Permalink

Please sign in to leave a comment.