Does anyone know why my query is timing out?

Comments

3 comments

  • Dean

    Hey Smitha,

    I know we've corresponded on this via a support ticket, but in the interest of transparency for the community, I wanted to share a little more information. When a user submits a query (or a scheduled search runs), the first step breaks it into many "subqueries" that run in massively parallel fashion to find the log messages matching the criteria before the first pipe. This is loosely similar to MapReduce.

    Before the query moves on to the next step of processing the retrieved messages, ALL of the subqueries have to complete and respond. The particular error message you've listed is the timeout of one of those subqueries.
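As a rough illustration of the behavior described above, here is a minimal Python sketch of a fan-out step in which every subquery must respond before processing continues. It is not the service's actual implementation: `run_subquery`, `scatter_gather`, the shard layout, and the timeout value are all hypothetical names and assumptions chosen for the example.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def run_subquery(shard, keyword):
    """Hypothetical subquery: scan one shard for messages containing keyword."""
    messages, delay_s = shard
    time.sleep(delay_s)  # stands in for a slow or overloaded search node
    return [m for m in messages if keyword in m]


def scatter_gather(shards, keyword, timeout_s):
    """Run one subquery per shard in parallel and require ALL to respond."""
    pool = ThreadPoolExecutor(max_workers=len(shards))
    futures = [pool.submit(run_subquery, s, keyword) for s in shards]
    # Wait for every subquery, but only up to the timeout.
    done, not_done = wait(futures, timeout=timeout_s)
    pool.shutdown(wait=False)  # do not block on stragglers
    if not_done:
        # A single slow subquery is enough to fail the whole search.
        raise TimeoutError(
            f"{len(not_done)} of {len(futures)} subqueries timed out"
        )
    matches = []
    for f in futures:  # submission order, so results are deterministic
        matches.extend(f.result())
    return matches
```

With two fast shards the call returns the combined matches; adding one shard whose simulated delay exceeds `timeout_s` makes the whole call raise `TimeoutError`, which mirrors why one slow node can fail an otherwise healthy search.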

    We are working diligently on making individual subqueries more resilient, including better retry capabilities, so that searches do not fail.

    The second part of the question is: why would an individual subquery time out? Usually, the search node running that particular subquery is overloaded or has become unresponsive for some reason. We have THOUSANDS of these search nodes (the power of the cloud and multi-tenancy!), but individual nodes do occasionally get into a state where a subquery might time out. We are working hard to improve the resiliency of the individual nodes, as well as our monitoring capabilities for recognizing the "lazy state" that could cause a timeout, so that we can address it before any timeouts occur.

    I hope this helps you to understand what's going on behind the scenes.

  • Smitha Sriram

    Hi Dean,

    When will this be resolved?

  • Dean

    Smitha,

    We are running this query for you, broken up into smaller chunks. Going forward, you will be able to run this query against a Scheduled View (which we have created for you) that pre-aggregates some of the data (the system keeps it current in real time), so that the query you run ad hoc will not have to do the immense in-memory calculations it is doing now. This will let you run the query successfully over longer time ranges, and it will also run much faster.
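To show why pre-aggregation helps, here is a small Python sketch of the general idea: counts are updated as each message arrives, so an ad hoc query reads a few small buckets instead of rescanning raw messages. This is only an illustration of the technique, not how Scheduled Views are implemented; `ViewSketch`, `raw_count`, the per-minute bucketing, and the `status` field are all assumptions made for the example.

```python
from collections import Counter


def raw_count(logs, status, start_minute, end_minute):
    """Ad hoc approach: rescan every raw (timestamp_s, status) message."""
    return sum(
        1
        for ts, s in logs
        if s == status and start_minute <= ts // 60 <= end_minute
    )


class ViewSketch:
    """Hypothetical pre-aggregated view: per-minute counts kept current."""

    def __init__(self):
        self.counts = Counter()  # (minute, status) -> message count

    def ingest(self, timestamp_s, status):
        # Updated once per incoming message, at ingest time.
        self.counts[(timestamp_s // 60, status)] += 1

    def count(self, status, start_minute, end_minute):
        # Reads only the small aggregate, never the raw messages.
        return sum(
            n
            for (minute, s), n in self.counts.items()
            if s == status and start_minute <= minute <= end_minute
        )
```

Both paths return the same answer for the same inclusive minute range; the difference is that the view's cost grows with the number of buckets rather than with the number of raw messages in the time range.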

