Analyzing app version error rates over time
Hello,
I'm having trouble bridging the gap between what I want to do and what I can figure out how to piece together.
My goal is ultimately a (multi-)line graph that shows the per-day error rate of each version of my app. I'm trying to detect non-user-generated networking events and bad network retry logic during the testing phase of development by looking at the error rate of past versions as well as the newest version.
Right now I've got
_sourceCategory=app-acceptance _index=i_app_edge*
| parse "Android/MyApp/*/Core" as version
| parse regex "(?:HTTP\/1\.1\")(?<error_code>\s\d{3} )"
| trim(error_code) as status_code
| status_code matches "2*" ? "success" : "failure" as status
| count(status) by version, status
This returns a table with the version and, for each version, the number of failed and successful requests, as shown below:
# | version | status | _count |
1 | 1.7.51.5 | success | 1 |
2 | 1.7.49-13 | success | 2 |
3 | 1.7.53.1 | failure | 2 |
4 | 1.7.53.1 | success | 61 |
5 | 1.7.51.17 | failure | 1 |
6 | 1.7.45-21 | success | 18 |
7 | 1.7.47-22 | success | 93 |
8 | 1.7.51.16 | success | 73 |
9 | 1.7.50-61 | success | 10 |
10 | 1.7.51.14 | success | 32 |
11 | 1.7.51.19 | success | 1,585 |
12 | 1.7.51.20 | failure | 2 |
13 | 1.7.51.10 | success | 28 |
14 | 1.7.49-27 | success | 92 |
15 | 1.7.51.19 | failure | 6 |
16 | 1.7.51.17 | success | 307 |
17 | 1.7.50-21 | success | 368 |
18 | 1.7.51.20 | success | 48 |
What I would like to do now is calculate the error rate on a per-version basis using something like (failure / (success + failure)) * 100. Then I'd like to slice the data on a per-day basis. But it's not clear to me where I should put the | timeslice 1d.
I came sort of close with this query:
(_sourceCategory=app-acceptance _index=i_app_edge*)
| timeslice 15m
| fillmissing timeslice(15m)
| parse "HTTP/1.1\" * " as status_code
| parse "Android/MyApp/*/Core" as Version
| count by Version, status_code, _timeslice
| transpose row _timeslice column status_code, Version
But this returns the number of times each specific status code was seen in each time slice for each version, and I want to calculate the failure rate (where 200 = OK and anything other than 200 = failed) for each version each day.
Any help y'all could provide would be appreciated; I've been sitting here trying to figure this out for hours now.
-
Hi Ethan,
You could use the following logic to generalize the error codes and then count the successes vs. the failures.
| if(status_code = "200", 1, 0) as success
| if(!(status_code = "200"), 1, 0) as failure
| sum(success) as successes, sum(failure) as failures by version, _timeslice
/Jay Schwegler
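Putting the pieces together, the full query might look something like the sketch below. It assumes the parse expressions from the question extract version and status_code correctly, and it places timeslice 1d before the aggregation so that _timeslice exists as a field to group by. The error_rate field name is chosen here for illustration only, and the query has not been run against the original data.

```
_sourceCategory=app-acceptance _index=i_app_edge*
| parse "Android/MyApp/*/Core" as version
| parse "HTTP/1.1\" * " as status_code
// bucket every log line into a one-day slice before aggregating
| timeslice 1d
// flag each request as a success (exactly 200) or a failure (anything else)
| if(status_code = "200", 1, 0) as success
| if(status_code = "200", 0, 1) as failure
| sum(success) as successes, sum(failure) as failures by version, _timeslice
// error rate as a percentage, per version per day
| (failures / (successes + failures)) * 100 as error_rate
| fields _timeslice, version, error_rate
// one column per version, which is the shape a multi-line chart needs
| transpose row _timeslice column version
```

As a sanity check on the formula, the counts in the question's table for version 1.7.53.1 (2 failures, 61 successes over the whole time range) would give 2 / (2 + 61) * 100, or roughly 3.2%.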