mrskew — Your Trace File Data Miner

Meet “Mister skew,” the ultimate mining tool for Oracle extended SQL trace data. Of all the trace files you’ve collected, which SQL statement is your biggest time consumer? In which files does that statement make the biggest response time contribution? In your whole directory of trace files, which line of trace data describes your longest database disk read duration? Do you have a hot file? A hot block? Which hash latch address is responsible for your worst cache buffers chains latch contention? This information is all in your extended SQL trace files, and with mrskew, you’ll be able to find it. When you use mrskew, you’ll ask better questions about performance, and—finally—you’ll be able to answer them.

mrskew is a member of the MR Tools suite of products.

Buy Now

Guided Tour

Let's jump right in by considering a common question: How are your disk reads performing? Here's the answer, mrskew style:

$ mrskew --name='db.*read' *.trc
 RANGE {min <= e < max}                  DURATION       CALLS         MEAN          MIN          MAX
-----------------------  ------------------------  ----------  -----------  -----------  -----------
   0.000000    0.000001           0.000000   0.0%        1147     0.000000     0.000000     0.000000
   0.000001    0.000010           0.000346   0.0%          40     0.000009     0.000007     0.000010
   0.000010    0.000100           0.008532   0.0%         139     0.000061     0.000011     0.000100
   0.000100    0.001000           0.280023   1.5%         766     0.000366     0.000102     0.000977
   0.001000    0.010000           0.990533   5.3%         222     0.004462     0.001112     0.009866
   0.010000    0.100000          17.320809  92.0%        1005     0.017235     0.010000     0.080000
   0.100000    1.000000           0.220000   1.2%           2     0.110000     0.110000     0.110000
   1.000000   10.000000           0.000000   0.0%           0                                       
  10.000000  100.000000           0.000000   0.0%           0                                       
 100.000000 1000.000000           0.000000   0.0%           0                                       
1000.000000    Infinity           0.000000   0.0%           0                                       
-----------------------  ------------------------  ----------  -----------  -----------  -----------
              TOTAL (7)          18.820243 100.0%        3321     0.005667     0.000000     0.110000
$

What you see here is a histogram of individual call response times, computed across about 17MiB  of trace data in the directory where I ran this command. Execution time was a little over 2 seconds. In the output, you can see exactly how many such calls there were, and you can see the response times for those calls. In this particular directory, there are lots of calls with durations between 0.010 seconds and 0.100 seconds. That's oddly high, with modern hardware, for single-block read calls. But the pattern db.*read matches both Oracle db file scattered read calls and db file sequential read calls. Of course, db file scattered read calls are multi-block reads. So, what I really want to see is the distribution of read call times grouped by the size, in number of Oracle blocks read—per OS read call. Here's how to do that with mrskew:

$ mrskew --name='db.*read' --group='$p3' *.trc
'p3' VALUE                  DURATION       CALLS         MEAN          MIN          MAX
----------  ------------------------  ----------  -----------  -----------  -----------
       128           8.660000  46.0%         374     0.023155     0.010000     0.050000
         1           4.073917  21.6%        1285     0.003170     0.000000     0.110000
         8           0.595558   3.2%         649     0.000918     0.000000     0.066967
        39           0.220000   1.2%          18     0.012222     0.010000     0.020000
         6           0.153548   0.8%          57     0.002694     0.000000     0.010000
 84 others           5.117220  27.2%         938     0.005455     0.000000     0.040000
----------  ------------------------  ----------  -----------  -----------  -----------
TOTAL (89)          18.820243 100.0%        3321     0.005667     0.000000     0.110000
$

Here you can see that most of our reading time was consumed by 128-block reads, which had an average latency of 0.023 seconds per read. That's not bad at all for reads of that size. In fact, it's fantastic. The mean latency for single-block reads is a mere 0.003 seconds, which is also excellent.

Let's explore a little bit. I wonder if there's a database file that's particularly hot:

$ mrskew --name='db.*read' --group='$p1' --glabel='FILE ID' *.trc
   FILE ID                  DURATION       CALLS         MEAN          MIN          MAX
----------  ------------------------  ----------  -----------  -----------  -----------
        38          14.150000  75.2%        1376     0.010283     0.000000     0.050000
         6           1.853270   9.8%         875     0.002118     0.000000     0.110000
         1           1.394625   7.4%         160     0.008716     0.000007     0.047417
       224           0.779243   4.1%         339     0.002299     0.000000     0.016600
         7           0.213949   1.1%         357     0.000599     0.000030     0.066967
 35 others           0.429156   2.3%         214     0.002005     0.000000     0.030000
----------  ------------------------  ----------  -----------  -----------  -----------
TOTAL (40)          18.820243 100.0%        3321     0.005667     0.000000     0.110000
$

Yep: file #38 accounted for 75.2% of all read time for these trace files. Want to know if you've got hot blocks? With mrskew, that's easy, too:

$ mrskew --name='db.*read' --group='"$p1:$p2"' --glabel='BLOCK ID' *.trc
BLOCK ID                      DURATION       CALLS         MEAN          MIN          MAX
------------  ------------------------  ----------  -----------  -----------  -----------
6:11427                0.110000   0.6%           1     0.110000     0.110000     0.110000
6:1524                 0.110000   0.6%           1     0.110000     0.110000     0.110000
38:73803               0.100000   0.5%           2     0.050000     0.050000     0.050000
6:10569                0.080000   0.4%           1     0.080000     0.080000     0.080000
1:737                  0.071930   0.4%           2     0.035965     0.035965     0.035965
2343 others           18.348313  97.5%        3314     0.005537     0.000000     0.070000
------------  ------------------------  ----------  -----------  -----------  -----------
TOTAL (2348)          18.820243 100.0%        3321     0.005667     0.000000     0.110000
$

Here you get a glimpse of something powerful: you can use full Perl expression syntax in the --group expression. In this example, I grouped by the concatenation of the $p1 value, a colon character, and the $p2 value for each call.

Let's look at some other examples. Maybe an ADDM report has alerted you that latch activity is a prominent time-consumer on your system. You have a whole directory full of trace data for key Oracle sessions, but you want to know which file represents the greatest contribution of latch activity to total response time. Here's the mrskew query to give you the answer:

$ mrskew --name='latch.*' --group='$f' --glabel='FILE' *.trc
FILE                                           DURATION       CALLS         MEAN          MIN          MAX
-----------------------------  ------------------------  ----------  -----------  -----------  -----------
80.trc                                  0.748038  97.8%         117     0.006393     0.000002     0.078776
2020-nostamp.trc                        0.005987   0.8%          19     0.000315     0.000008     0.002533
2020.trc                                0.005987   0.8%          19     0.000315     0.000008     0.002533
edwp1_ora_4774_mfl_sq_2ko.trc           0.004941   0.6%           3     0.001647     0.000052     0.004691
-----------------------------  ------------------------  ----------  -----------  -----------  -----------
TOTAL (4)                               0.764953 100.0%         158     0.004841     0.000002     0.078776
$

Once you know the files on which to focus, then of course you can use mrskew to drill into all the details: which latch names (the $name value)? which hash latch addresses (the $p1 value)? which latch id (the $p2 value)? ...and so on.

Maybe you want to know which SQL statement contributes the most response time within all the trace files in a given directory. Easy. Here's the query:

$ mrskew --name='.*' --group='$hv' *.trc
'hv' VALUE                                            DURATION       CALLS         MEAN          MIN          MAX
------------------------------------  ------------------------  ----------  -----------  -----------  -----------
2343063137                                  2310.364850  47.9%          72    32.088401     0.000000   292.974960
2723039186                                   412.306477   8.6%         466     0.884778     0.000000   187.650000
864012087                                    400.387185   8.3%        1468     0.272743     0.000000   120.000000
#2 clxdos.trc:18                             292.975240   6.1%           1   292.975240   292.975240   292.975240
#2 clxpp_ora_2502664.trc:18                  292.975240   6.1%           1   292.975240   292.975240   292.975240
5780 others                                 1109.898507  23.0%      132835     0.008355     0.000000   107.013169
------------------------------------  ------------------------  ----------  -----------  -----------  -----------
TOTAL (5785)                                4818.907499 100.0%      134843     0.035737     0.000000   292.975240
$

In which files does that statement contribute the most response time?

$ mrskew --name='.*' --group='$f' --where='$hv eq 2343063137' *.trc
'f' VALUE                              DURATION       CALLS         MEAN          MIN          MAX
---------------------  ------------------------  ----------  -----------  -----------  -----------
clxpp_ora_2502664.trc        1155.182425  50.0%          36    32.088401     0.000000   292.974960
clxdos.trc                   1155.182425  50.0%          36    32.088401     0.000000   292.974960
---------------------  ------------------------  ----------  -----------  -----------  -----------
TOTAL (2)                    2310.364850 100.0%          72    32.088401     0.000000   292.974960
$

mrskew is great fun to demo because it can do so many things that we need all the time, that—until now—we've never been able to find out. Here's another query that might pique your interest. It shows which database calls that you traced in this directory contributed the most to your response time:

$ mrskew --name=dbcall --group='$nam' *.trc
'nam' VALUE                  DURATION       CALLS         MEAN          MIN          MAX
-----------  ------------------------  ----------  -----------  -----------  -----------
FETCH               556.715272  79.9%       10294     0.054082     0.000000   187.650000
EXEC                128.357701  18.4%       16747     0.007665     0.000000   100.000000
PARSE                12.035198   1.7%       12488     0.000964     0.000000     0.430000
UNMAP                 0.000000   0.0%           1     0.000000     0.000000     0.000000
CLOSE                 0.000000   0.0%           2     0.000000     0.000000     0.000000
-----------  ------------------------  ----------  -----------  -----------  -----------
TOTAL (5)           697.108171 100.0%       39532     0.017634     0.000000   187.650000
$

We've been using mrskew for over a year internally, and we haven't yet found any end to the possibilities. It's effectively a structured query language for trace data that lets you slice and dice the response times accounted for in your trace files, quickly and in just about any way imaginable. mrskew is the fastest, most powerful way to analyze Oracle Database response times for your business tasks, even if you've never looked at an Oracle trace file in your life.

Buy Now

Documentation

Click here to see the manual page for mrskew