Monday, July 5, 2010

DBreplay AKA RAT (real application testing)

I have been spending a lot of time lately working with DBReplay.. This is an AWESOME arrow to have in your quiver to test upgrading to 11g.

I have made some intersting observations I want to share..

  • When you start your replay, remember that your cache is "cold", if you started your capture with a warm cache, it will take a bit for the cache to catch up.
  • If you started your capture during processing, you will see a lot of divergence.. This is normal
  • A single query taking longer can make a huge impact on the replay (I will explain in detail below).
  • Replaying twice in a row will not have the same results, but the results should be withing 5% of each other.

So while does a single query make sunch a huge impact ??? It all has to do with how the replay synch all the workload. You may have 100 or more different sessions all acting independently, but Oracle keeps it all in synch by SCN number.. This is good right ? Well, yes, but it can affect the replay. What happens if a query that usually takes .04 seconds, and is followed by an update, takes 5 minutes ? Well the replay gets held up by 5 minutes, because the SCN doesn't move until the query is finished.. Multiply that by12 executions, and you've lost an hour out of your replay.. WOW.

The best suggestion I have is to look at the DB time, the CPU time, and the reads. You may find that "overall" the replay used less DB time, even though it took longer to replay.