Query CSV upload

Comments

8 comments

  • Kevin Keech

    In your case it sounds like you are not too concerned about the exact format of the text within each field, so the "Parse Anchor" operator should work better for you and return more quickly. I can't see the actual text of one of your log messages, but based on your original expression, the following should get you the same or similar results and should return a bit faster.

     _sourceName=/texto_file.csv | parse "*\",\"*\",\"*\",*,\"*\",\"*\",\"*\",\"*\",\"*\",\"*"  as last, ip, asn, asn_code, city, region, country, host, rdns_domain, trojan | fields ip, asn, trojan, country, rdns_domain | count_distinct (ip) group by rdns_domain

     

    The extract operator can be a bit heavy in this case. "Parse Regex" (extract) is really better suited to pulling a specific text pattern out of the logs, such as an IP address, Social Security number, etc. The Parse Regex operator requires additional processing, as each message and then each field is matched against the expression, which can add time to the query.

    Additionally, query time can be affected by the length of the time range being queried and by how selective the query is, i.e. how many keywords you use to filter the messages.
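    The difference between the two approaches can be sketched in Python (illustrative only; the field names follow the query above, and the sample line is made up):

```python
import re

# Illustrative only: a made-up line shaped like the CSV rows in the
# query above, with every field quoted for simplicity.
line = ('"doe","1.2.3.4","AS15169","15169","Lisbon",'
        '"Lisboa","PT","example.org","example.org","none"')

names = ["last", "ip", "asn", "asn_code", "city",
         "region", "country", "host", "rdns_domain", "trojan"]

# Anchored parse: the '","' delimiters are literal anchors, so a single
# linear split recovers every field in one pass over the message.
anchored = dict(zip(names, line.strip('"').split('","')))

# Regex extraction: every field becomes a capture group the engine must
# match -- the same result, but more work per message and per field.
pattern = re.compile('^"' + '","'.join('([^"]*)' for _ in names) + '"$')
regex = dict(zip(names, pattern.match(line).groups()))

print(anchored == regex)  # -> True: both approaches recover identical fields
```

    The anchored split is a linear scan for literal delimiters, while the regex engine evaluates a capture group per field, which is roughly why Parse Anchor tends to come back faster.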

  • Marco Filipe

    I'm running the query and it has been running for the past 5 hours....

    Your solution is running too, but from my estimate it's not going to be any faster...

    Thanks anyway

  • Kevin Keech

    5 hours is definitely a bit much. How long is your time range, and about how many messages are being returned? It would be interesting to know how long the results take to come back when you remove the parse altogether.

  • Marco Filipe

    Here is the screenshot.

  • Kevin Keech

    Your query is over a 10-month period, so unfortunately it is expected that this would take quite a while to return all your results. You may want to try breaking this down into a few smaller time ranges (30 - 90 days) and running the queries in parallel.
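    The splitting suggested above can be sketched in Python (the dates are hypothetical, chosen only to illustrate a 10-month span broken into 30-day windows):

```python
from datetime import date, timedelta

def split_range(start, end, chunk_days=30):
    """Split [start, end) into consecutive windows of at most chunk_days."""
    windows = []
    cur = start
    while cur < end:
        nxt = min(cur + timedelta(days=chunk_days), end)
        windows.append((cur, nxt))
        cur = nxt
    return windows

# A hypothetical 10-month range; each window could then be submitted
# as its own query and the queries run in parallel.
for lo, hi in split_range(date(2012, 9, 1), date(2013, 7, 1)):
    print(lo, "->", hi)
```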

    This length of time range is actually not our standard use case. The availability of messages is typically bounded by the retention period of the account (7 days in the case of Sumo Logic Free), and as a result most queries tend to fall within the contracted retention period.

  • Marco Filipe

    My logs only go back to July 1st. If you enter that date as the start of the query, you will get the same results.

    Maybe this product isn't for me. I'm also trying other options on which to base my future company, and for now you are losing to Treasure Data, Splunk Storm, and others.

    Why don't you parse CSV files into fields, instead of one field for everything?

  • Ben Newton

    Marco,

    If your logs are just from the 1st of July, I would enter a smaller time range:

    7/1/2013 - 7/2/2013

    Otherwise you are still searching over 90 days' worth of data.

  • HMRC Cyber Security

    Might the csv operator be faster for you, as in:

    csv _raw extract 1 as last, 2 as ip, 3 as asn, 4 as asn_code, 5 as city etc.

    Mind you, your parse statement is a way to run a field extraction on a CSV, since you can't use the csv operator in a field extraction rule, so your syntax is a lesson to me at least :-)
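    For comparison outside Sumo Logic, Python's csv module does the same positional extraction (the raw message below is made up):

```python
import csv
import io

# Made-up raw message shaped like the rows discussed in this thread.
raw = '"doe","1.2.3.4","AS15169","15169","Lisbon"'

# The csv operator's positional form ("extract 1 as last, 2 as ip, ...")
# corresponds to indexing the parsed row by column position.
row = next(csv.reader(io.StringIO(raw)))
last, ip, asn, asn_code, city = row
print(ip)  # 1.2.3.4
```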
