Problem:
The percentile operation does not generate the expected results. When I compare the percentile results to those from other tools, I see different values. How does Sumo Logic calculate percentiles?
Resolution:
Sumo Logic uses the standard percentile formula, which always maps the result to an actual value in the dataset. It switches to an approximate algorithm when the dataset has a very large number of values.
The percentile operator works in two ways:
- With fewer than 1,000 data points, the operator returns exact percentiles.
- With more than 1,000 data points, the pct operator automatically switches to the t-digest algorithm for approximate results. This approximation is more accurate near the extremes (e.g., the 99th and 1st percentiles) and less accurate closer to the median.
Other tools might interpolate the percentile value when it falls between two values in the dataset, so their result does not necessarily map to an actual value.
For example, suppose you have the following twelve values:
44, 58, 59, 71, 59, 55, 52, 27, 60, 34, 47, 39
and you want to find the following percentiles:
98th percentile
95th percentile
As per the standard percentile formula (which Sumo Logic follows), the calculation is performed as follows:
Step 1. Arrange the data in ascending order: 27, 34, 39, 44, 47, 52, 55, 58, 59, 59, 60, 71
Step 2. Compute the position of the pth percentile (index i):
i = (p / 100) * n, where p = 98 and n = 12
i = (98 / 100) * 12 = 11.76
Step 3. Because the index i is not an integer, round it up (i = 12). The 98th percentile is the value in the 12th position, which is 71.
Answer: the 98th percentile is 71
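The three steps above can be sketched in Python as follows. This is only an illustration of the round-up rule described in this article, not Sumo Logic's actual implementation. The function name `nearest_rank_percentile` is hypothetical. The steps above do not say what happens when i is already an integer, so the sketch assumes the value at position i is used in that case.

```python
import math


def nearest_rank_percentile(values, p):
    """Return the p-th percentile of values using the round-up rule above."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")

    # Step 1: arrange the data in ascending order.
    ordered = sorted(values)

    # Step 2: compute the index i = (p / 100) * n.
    # Written as p * n / 100 to reduce floating-point rounding surprises.
    i = p * len(ordered) / 100

    # Step 3: round up when i is not an integer.
    # Assumption: when i is already an integer, use the value at position i.
    position = math.ceil(i)

    # Positions in the article are 1-based; Python lists are 0-based.
    return ordered[position - 1]


data = [44, 58, 59, 71, 59, 55, 52, 27, 60, 34, 47, 39]
print(nearest_rank_percentile(data, 98))  # 71
```

Because the result is always read from the sorted list, it is always one of the original data values.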
Similarly, for the 95th percentile:
Step 1. Arrange the data in ascending order: 27, 34, 39, 44, 47, 52, 55, 58, 59, 59, 60, 71
Step 2. Compute the position of the pth percentile (index i):
i = (p / 100) * n, where p = 95 and n = 12
i = (95 / 100) * 12 = 11.4
Step 3. Because the index i is not an integer, round it up (i = 12). The 95th percentile is the value in the 12th position, which is 71.
Answer: the 95th percentile is 71
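To see why a tool that interpolates can report a different number for the same data, here is a minimal Python sketch of one common linear-interpolation convention. Which convention a given tool uses is an assumption here, and the function name `interpolated_percentile` is hypothetical. It places the percentile at the fractional position (n - 1) * p / 100 in the sorted data and blends the two neighbouring values.

```python
import math


def interpolated_percentile(values, p):
    """Linearly interpolate between the two closest ranks (0 <= p <= 100).

    Assumption: this is one common interpolation convention, shown only for
    contrast. It is not how Sumo Logic computes percentiles.
    """
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= p <= 100:
        raise ValueError("p must be in the range [0, 100]")

    ordered = sorted(values)
    h = (len(ordered) - 1) * p / 100           # fractional 0-based position
    lower = math.floor(h)                      # index of the value just below
    upper = min(lower + 1, len(ordered) - 1)   # index of the value just above
    fraction = h - lower                       # how far between the two values
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


data = [44, 58, 59, 71, 59, 55, 52, 27, 60, 34, 47, 39]
print(round(interpolated_percentile(data, 95), 2))  # 64.95
print(round(interpolated_percentile(data, 98), 2))  # 68.58
```

Neither 64.95 nor 68.58 appears in the dataset, whereas the round-up method above returns 71 for both percentiles, which is an actual data value.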
Below are the percentile results in Sumo Logic for the same set of data.