Power 7 - Local, Near & Far POWER7 Affinity Nine Conclusions



Very interesting and useful notes on SRADID
(Scheduler Resource Affinity Domain IDentifier) 

An SRAD is a group of resources; in our case, a set of CPU cores and the memory that is directly attached to them.

I thought I should summarize the long eleven-part Local, Near & Far POWER7 Affinity series.

mohisrv:user3:>lssrad -av
REF1   SRAD        MEM        CPU
0
          0    8122.05        0-3 8-11 16-19 24-27
1
          1    7221.00        4-7 12-15 20-23 28-31

  • REF1 means the first node that the Virtual machine is on
    • Small machine = motherboard, and REF1 is only ever 0.
    • Power 770/780 = the four CEC drawers of the large configuration machine (REF1 ranges from 0 to 3).
    • Power 795 = the maximum 8 CPU books (REF1 ranges from 0 to 7).
    • Why REF1? REF is short for Reference but beyond that it is a mystery to me!
  • SRAD means Scheduler Resource Affinity Domain; SRADs are the groups of processors and memory, numbered from zero.
  • MEM means memory (of course) and it is not obvious but this is reported in Megabytes!
  • CPU means the logical CPU numbers.
In the example above,
 SRAD0 = ~8 GB memory and 16 logical CPUs (assuming SMT4: 16 logical CPUs / 4 threads per core = 4 physical CPU cores)
 SRAD1 = ~7 GB memory and 16 logical CPUs (assuming SMT4: 16 logical CPUs / 4 threads per core = 4 physical CPU cores)
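The arithmetic above can be sketched in a few lines of Python. This is only an illustration: it assumes the `lssrad -av` layout shown in the listing above (a line holding just the REF1 number, followed by one line per SRAD), and the function names `count_cpus` and `parse_lssrad` are made up for this example. The SMT mode is passed in as an assumption, because `lssrad` itself does not report it.

```python
SAMPLE = """\
REF1   SRAD        MEM              CPU
0
             0                8122.05           0-3 8-11 16-19 24-27
1
             1                7221.00           4-7 12-15 20-23 28-31
"""


def count_cpus(ranges):
    """Count logical CPUs in a list such as ['0-3', '8-11', '40']."""
    total = 0
    for item in ranges:
        first, _, last = item.partition("-")
        # A single CPU number has no '-', so 'last' is empty in that case.
        total += int(last or first) - int(first) + 1
    return total


def parse_lssrad(text, smt=4):
    """Return one dict per SRAD found in `lssrad -av` style output.

    smt is an assumption supplied by the caller (4 for POWER7 SMT4).
    """
    srads = []
    ref1 = None
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] == "REF1":
            continue  # blank line or column header
        if len(fields) == 1:
            ref1 = int(fields[0])  # a line holding only the REF1 number
            continue
        logical = count_cpus(fields[2:])
        srads.append({
            "ref1": ref1,
            "srad": int(fields[0]),
            "mem_mb": float(fields[1]),  # lssrad reports megabytes
            "logical_cpus": logical,
            "cores": logical // smt,
        })
    return srads


if __name__ == "__main__":
    for s in parse_lssrad(SAMPLE):
        print(s)
```

A virtual machine that shows up in more than one SRAD, or worse in more than one REF1, is spread across the machine, which is the situation the rest of these conclusions try to avoid.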
1) Placement:
  • Find out the layout of your box's CPU and RAM and whether the RAM is evenly distributed across the available DIMMs
  • Find out the placement of your Virtual Machines (LPARs) with lssrad -av  - or - topas -M
2) SMT4:
  • Expect POWER7 SMT4 CPU use to “look” different
  • POWER5 & 6 have two equal threads
  • POWER7 shuts down threads 3 & 4, and even thread 2, when there are not enough processes running.
3) Entitlement:
  • Only set minimum Entitlements if you are forced to by very high consolidation of lots of small workloads.
  • Otherwise, set Entitlement based on regular monitoring of typical physical CPU use or, for production, on regular CPU peaks.
  • This stops the virtual machine unnecessarily getting forced off the CPU as Entitlement is used up.
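One way to turn the monitoring advice above into a number is sketched below. Treat it as a rough illustration only: the function name `suggest_entitlement`, the use of the median for "typical" use, the 95th percentile for "regular peaks", and the 0.1 floor are all assumptions made for this example, not values from the series.

```python
import math
import statistics


def suggest_entitlement(physc_samples, production=False, peak_quantile=0.95):
    """Suggest an Entitlement from monitored physical CPU use samples.

    Assumptions for this sketch: the median stands in for "typical" use,
    and a nearest-rank percentile stands in for "regular peaks".
    """
    if not physc_samples:
        raise ValueError("need at least one sample")
    ordered = sorted(physc_samples)
    if production:
        # Nearest-rank percentile: smallest sample covering peak_quantile.
        rank = max(1, math.ceil(peak_quantile * len(ordered)))
        target = ordered[rank - 1]
    else:
        target = statistics.median(ordered)
    # Round up to the next 0.1; the small epsilon absorbs float noise.
    rounded = math.ceil(target * 10 - 1e-9) / 10
    return max(0.1, rounded)  # 0.1 floor is an assumption of this sketch


if __name__ == "__main__":
    samples = [0.4, 0.5, 0.6, 0.7, 2.3]  # hypothetical physc readings
    print(suggest_entitlement(samples))
    print(suggest_entitlement(samples, production=True))
```

The samples would come from whatever regular monitoring you already run; the point is simply that the Entitlement follows measured use rather than a guess.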
4) Virtual Processor:
  • Set just a little larger than Entitlement.
  • Recommended 1 or 2 CPUs - unless it is a very large virtual machine in which case add only a few more.
  • Note: a high VP means unnecessary spreading across the machine which is best avoided for efficiency.
5) Start the important virtual machines first:
  • On machine reboot, start the larger important virtual machines first.
  • The HMC "start LPAR with the system" feature and HMC system profiles both help here.
  • This means they get better placement with local CPU and RAM and don't have to work around other ones.
6) To "unstick" virtual machine (LPAR) placement:
  • If you restart a virtual machine, as in AIX "shutdown -fr", or stop & restart it on the HMC, then the virtual machine will probably get exactly the same CPU and RAM placement as when it was last running. This feature helps consistency, i.e. the performance does not unexpectedly go up or down due to the restart.
  • If you want the Hypervisor to "rethink" the placement, you can force it to forget the older placement:
1.      On the HMC add a new tiny LPAR profile for this virtual machine with Entitlement=0.1, VP=1 & RAM=1GB and no virtual or physical adapters.
2.      Start the virtual machine with this tiny profile - it will fail to boot as it has no disks but no harm done.
3.      Shut it down and then start it with the regular profile.
4.      The Hypervisor will find the virtual machine is a lot larger (than the tiny profile) and will decide the best place for it.
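Steps 2 and 3 above can also be driven from the HMC command line. The sketch below only builds the command strings and does not run anything. It assumes the tiny profile has already been created (step 1), that the virtual machine is already shut down, and that the `chsysstate` options used here match your HMC level; please check them against your HMC documentation. The managed system, LPAR and profile names are hypothetical.

```python
import shlex


def unstick_commands(managed_system, lpar, tiny_profile, regular_profile):
    """Build the HMC commands for steps 2 and 3 of the unstick procedure."""

    def chsysstate(*args):
        base = ("chsysstate", "-r", "lpar", "-m", managed_system, "-n", lpar)
        # Quote every argument so names with spaces survive the shell.
        return " ".join(shlex.quote(a) for a in base + args)

    return [
        chsysstate("-o", "on", "-f", tiny_profile),      # step 2: start tiny
        chsysstate("-o", "shutdown", "--immed"),         # step 3: shut down
        chsysstate("-o", "on", "-f", regular_profile),   # step 3: start normal
    ]


if __name__ == "__main__":
    # All four names below are made up for illustration.
    for cmd in unstick_commands("Server-A", "mylpar", "tiny", "normal"):
        print(cmd)
```

Printing the commands first, rather than running them, gives you a chance to review them before pasting them into an HMC session.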
7) Drastic DLPAR changes can affect placement
  • If you regularly start/stop virtual machines and change their sizes a lot - keep an eye on the placement. It can get messy if you are using the bulk of your machine (and you should be!).
  • If you get a chance to cold start a now fragmented virtual machine - See 6) Unstick
8) System Firmware currency
  • Lots has been learnt in the first 18 months of POWER7 in areas like performance optimisation, RAS and, yes, a few bug fixes.
  • To get the benefits of these Hypervisor fixes, you have to update the System Firmware.
  • You may find machines purchased from mid-2011 onwards are already running the later Firmware.
  • The October 2011 new "C" models ship with a later 740 Firmware, which includes the fixes.
  • For earlier shipped machines, you are strongly recommended to upgrade to Firmware 730.
    • If you have Capacity Upgrade on Demand there is a bonus with improved VM placement with 730.
    • If you can't take the outage soon then at least update to 720_101 as a minimum for the fixes.
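The Firmware advice above boils down to a comparison of level numbers, which might be sketched as follows. This assumes the level appears somewhere in the text in the form release_servicepack (for example 720_101) and that a higher release number means a later release; the function names and the sample strings with an "AL" prefix are hypothetical.

```python
import re

MINIMUM = (720, 101)         # "at least 720_101" from the text above
RECOMMENDED_RELEASE = 730    # "strongly recommended to upgrade to 730"


def parse_level(text):
    """Extract (release, service_pack) from text such as 'AL730_049'."""
    match = re.search(r"(\d{3})_(\d{3})", text)
    if match is None:
        raise ValueError("no firmware level found in: " + repr(text))
    return int(match.group(1)), int(match.group(2))


def firmware_advice(level_text):
    """Classify a firmware level against the recommendations above."""
    level = parse_level(level_text)
    if level[0] >= RECOMMENDED_RELEASE:
        return "ok"
    if level >= MINIMUM:
        return "minimum met, plan the upgrade to 730"
    return "below minimum, update to at least 720_101"


if __name__ == "__main__":
    for sample in ("AL710_065", "AL720_101", "AL730_049"):  # made-up levels
        print(sample, "->", firmware_advice(sample))
```

The text to feed in would be whatever your system reports as the current System Firmware level.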
9) AIX currency
  • This was not covered by the original POWER7 affinity series but I thought I should add a comment or two as it affects performance optimization for POWER7 Affinity and related areas.
  • As with System Firmware, the same is true with AIX levels: there is better optimization and there are a few fixes for particular circumstances and taxing high performance workloads.
  • If you are on AIX 5.2 (oh dear me!) take a look at Versioned WPARs for AIX 5.2 for a serious performance boost and ditching that old hardware.
  • If you are on AIX 5.3 (tut tut) upgrade to AIX 6.1 as soon as possible (before April 2012) or look at Versioned WPARs for AIX 5.3 soon - either way for multi-threaded workloads on POWER7 it should be faster due to SMT=4.
  • For AIX 6.1 (cool) you should be on TL4 as a minimum and the latest service packs (obviously)
    • There are a few performance related APARs for AIX 6.1 TL5, TL6 & TL7 for specific workloads. Most users will not experience them. These should only be added after working with AIX Support via a PMR and perfPMR analysis. These are in the SRAD load balancing and CPU folding threshold calculation area.
  • If you are on AIX 7 (well done), it is still worth adding the latest TL1 and service packs.

I hope this helps you get the maximum from your POWER7 hardware and AIX operating system, thanks, Nigel Griffiths.


For more help:
https://www.ibm.com/developerworks/community/blogs/aixpert/entry/local_near_far_memory_part_2_virtual_machine_cpu_memory_lay_out3?lang=en
