Thursday, October 14, 2010

Performance Analysis and Optimization of MS Windows NT Server, Part 1

What Is Server Analysis and Optimization?

Server analysis and optimization is knowing what actions to take to improve system performance in response to demands on the system.
Server analysis and optimization begins with thoughtful and organized record keeping, done for the purpose of analyzing resource use to determine future demands on the system. Server analysis involves looking for the overuse of any hardware resource that causes a decrease in system performance. It also looks for the residual effect of bottlenecks: other hardware resources that are underused.
Server analysis and optimization involves:
  • Creating a baseline of current use.
  • Monitoring use over a period of time.
  • Analyzing data to find and resolve abnormalities in the system use.
  • Determining expected response times for specific numbers of users and system use.
  • Determining how the system should be used.
  • Determining when to upgrade the system, or when to add additional system resources.
A properly implemented server analysis and optimization strategy includes the tools and techniques to accomplish the monitoring and analysis of a system.

Windows NT Server Resources to Monitor

Server analysis and optimization begins by determining the ceiling throughput (for example, interactions per second) of each system resource as it is installed on the system and the network. Determining the throughput during installation establishes the allowable throughput for each resource as it is used.
A number of system resources need to be monitored when implementing a server analysis and optimization strategy. The following resources often have the most impact on server performance:
  • Memory
  • Processor
  • Disk subsystem
  • Network subsystem
When monitoring system resources, it is important to monitor not only each resource individually, but also the system as a whole. By monitoring the entire system, it is easier to detect problems that are a result of resource combinations. The use of one system resource can affect the performance of another, thus masking the usage and performance of the second resource. For example, when the disk subsystem is extremely busy, it is very common for it to fail to perform to the expected level. This failure to perform may result from a system that does not have enough RAM. The lack of adequate RAM may then result in excess paging, which lowers the disk subsystem throughput in response to system and user requests. Monitoring all four system resources provides a much clearer look at the effects that resource combinations have on each other.

Memory

Consider two main types of memory when you analyze server performance: random-access memory (RAM) and cache. Simply put, the more of each, the better.
Also consider other factors, such as the size and location of the paging file. For example, moving the paging file from the system partition to another partition is generally recommended for performance. However, doing so disables the Crashdump facility, because a memory dump can be written only when the paging file resides on the system partition.

Processor

The type of system processor, as well as the number of processors, affects the overall performance of the system. For example, a Digital Alpha AXP processor can provide better performance than an Intel 80486.
Windows NT Server supports symmetric multiprocessing, so if a system has multiple applications running concurrently, or applications that are multithreaded, the processing load is shared across all processors.

Disk Subsystem

Several factors affect disk subsystem performance, and each of these factors should be taken into consideration during analysis and optimization.
Type and Number of Controllers
The controller type and the number of controllers affect overall system responsiveness when information is read from or written to the disk drives. Installing multiple disk controllers can result in higher throughput. Note the throughput of the following controllers (a rough transfer-time comparison follows this list):
  • IDE controllers have a throughput of about 2.5 MB per second.
  • Standard SCSI controllers have a throughput of about 3 MB per second.
  • SCSI-2 controllers have a throughput of about 5 MB per second.
  • Fast SCSI-2 controllers have a throughput of about 10 MB per second.
  • PCI controller cards can transfer data at up to 40 MB per second.
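To put these figures in perspective, the following back-of-the-envelope sketch (Python, illustrative numbers only; it ignores seek time, bus contention, and caching) estimates how long each controller would need to move a 200 MB file:

    # Nominal controller throughputs from the list above (illustrative only).
    CONTROLLER_MBPS = {
        "IDE": 2.5,
        "Standard SCSI": 3.0,
        "SCSI-2": 5.0,
        "Fast SCSI-2": 10.0,
        "PCI controller card": 40.0,
    }
    FILE_SIZE_MB = 200  # example transfer size

    for controller, mbps in CONTROLLER_MBPS.items():
        # Transfer time = size / throughput; real transfers also pay for seeks.
        print(f"{controller:20s} {FILE_SIZE_MB / mbps:6.1f} seconds")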
Busmaster Controllers
Busmaster controllers have an on-board processor that handles all interrupts until data is ready to be passed to the CPU for processing. This spares the CPU from being interrupted for each piece of data.
Caching
Caching controllers improve disk responsiveness because data is cached on the controller itself rather than consuming system RAM or the processor's internal cache.
Controllers That Support RAID
Controllers that support hardware-level RAID (Redundant Array of Inexpensive Disks) can offer better performance than software implemented RAID. By implementing striping or striping with parity, disk performance may be improved. In one test, for example, writing a 200MB file to a stripe set (without parity) was 20 percent faster than writing to a single hard disk drive in the same system. The same result may not occur in all tests, so each system needs to be analyzed independently.
The Type of Work Being Performed
If the applications are disk-bound (many read and write requests), implementing the fastest disk subsystem provides the best performance.
For single processor systems, implementing a Fast SCSI-2 as the minimum controller base is generally recommended.
The Type of Drives Implemented
Disk performance is generally measured by disk access time. It is not uncommon to find hard disk drives with access times in the low teens of milliseconds or lower.
Implement drives that complement the rest of the architecture, such as the controller, and choose a manufacturer that supplies the fastest drives available for its systems.

Network Subsystem

Overall network performance and capacity may be affected by a number of factors. Consider each factor in its unique environment to determine whether or not it has an impact on network capacity and server performance.
Network Adapter Type
Implement a high bandwidth card (such as a 32-bit bus mastering card), trying to avoid programmed input/output (PIO) adapters, as they use the CPU to move data from the network adapter to RAM. Note the example transfer speeds of the following adapters:
  • 8-bit network adapters transfer up to 400 kilobytes per second (KBps).
  • 16-bit adapters transfer up to 800 KBps.
  • 32-bit adapters transfer up to 1.2 megabytes per second (MBps).
Multiple Network Adapters
Installing multiple network adapters is beneficial in a server environment because doing so allows the server to process network requests over multiple adapters simultaneously. If your network uses multiple protocols, consider placing each protocol on a different adapter. It is common to have all server-based traffic on a single adapter, for example, while performing host access using SNA on a different adapter.
Number of Users
Consider not only the number of users concurrently accessing a server, but also the number of inactive connections, because monitoring each connection requires processing time on the server.
Routers, Bridges, and Other Physical Network Components
Routers, bridges, and other physical network components affect performance of the network, as do data communications facilities.
Protocols in Use
Most protocols give similar performance, so consider the amount of traffic generated to perform a given function. Reducing the number of protocols installed can increase performance.
Additional Network Services in Use
Each service adds memory and processor overhead on the system. These services may include the following:
  • Services for Macintosh
  • RAS
  • DHCP
  • WINS
Applications in Use
Each application adds memory and processor overhead on the system. These applications may include the following:
  • Internet Services
  • Messaging applications
  • Microsoft® SQL Server™
  • Microsoft Systems Management Server
  • Microsoft SNA Server
Directory Services (Domain Model and Structure)
The following may affect network capacity and performance:
  • Number of users—Consider not only the number of users and objects in the domain, but the number of simultaneous logon requests validated by the domain controller or controllers.
  • Number of Backup Domain Controllers (BDCs)—The more domain controllers in a domain, the more domain account synchronization traffic is generated to assure all controllers are synchronized.
  • Proximity of BDCs to Primary Domain Controller (PDC) using WAN links—Domain account synchronization can use a large percentage of WAN bandwidth. Consider changing the ReplicationGovernor parameter to "schedule" the amount of bandwidth that the account synchronization process uses (a registry sketch follows this list).
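As a hedged sketch only (assuming Python with the standard winreg module, run locally on the BDC with administrative rights), the ReplicationGovernor value lives under the Netlogon service's Parameters key; back up the registry before changing it:

    # Sketch: set ReplicationGovernor (a percentage, default 100) on a BDC.
    import winreg

    KEY_PATH = r"SYSTEM\CurrentControlSet\Services\Netlogon\Parameters"

    def set_replication_governor(percent):
        """Limit account synchronization to `percent` of the default bandwidth."""
        if not 0 <= percent <= 100:
            raise ValueError("ReplicationGovernor must be between 0 and 100")
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH, 0,
                            winreg.KEY_SET_VALUE) as key:
            winreg.SetValueEx(key, "ReplicationGovernor", 0,
                              winreg.REG_DWORD, percent)

    set_replication_governor(50)  # use half the default synchronization bandwidth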

Performance Monitor Options

Performance Monitor allows:
  • Viewing data from multiple computers simultaneously.
  • Seeing how changes that are made affect the computer.
  • Changing charts of current activity while viewing them.
  • Exporting Performance Monitor data to spreadsheets or database programs, or using it as raw input for Microsoft Visual Basic® and C programs.
  • Triggering a program or procedure, or sending notices when a threshold is exceeded.
  • Logging data about various objects from different computers over time. These log files are used to record typical resource use, monitor a problem, or help in capacity planning.
  • Combining selected sections of several log files into a long-term archive.
  • Reporting on current activity or trends over time.
  • Saving different combinations of counter and option settings for quick starts and changes.
Note: Computer initialization startup activities and network traffic can interfere with testing. Wait until the computer settles before testing. Also, disconnect the computer from the network if network activity is not being tested. Network drivers may respond to network events even if they are not directed to the testing computer.

Objects in Performance Monitor

Performance Monitor measures the behavior of computer objects. The objects represent threads and processes, sections of shared memory, and physical devices. Performance Monitor collects data on activity, demand, and space used by the objects. Some objects, known as core objects, always appear in Performance Monitor; others appear only if the related service or process is installed.
Windows NT Performance Monitor Core Objects
Object name | Description
Cache | An area of physical memory that holds recently used data.
LogicalDisk | Disk partitions and other logical views of disk space.
Memory | Random-access memory used to store code and data.
Objects | Certain system software objects.
Paging File | File used to back up virtual memory allocations.
PhysicalDisk | Hardware disk unit (spindle or RAID device).
Process | Software object that represents a running program.
Processor | Hardware unit that executes program instructions.
Redirector | File system that diverts file requests to network servers.
System | Counters that apply to all system hardware and software.
Thread | The part of a process that uses the processor.

Some objects have multiple instances. Each instance of an object represents a component of the system. For example, a computer can have multiple disk drives. Each disk drive is an instance of the PhysicalDisk object.
When the computer being monitored has more than one component of the same object type, Performance Monitor displays multiple instances of the object in the Instance box of the Add to Chart (or to View, Log, or Report) dialog box. When appropriate, it also displays the _Total instance, which represents a sum of the values for all instances of the object. For example, if a computer has multiple physical disks, there are multiple instances of the PhysicalDisk object in the Add to Chart dialog box. The _Total would be an accumulation of all the individual instances.
Some objects are parts of other objects or are dependent upon other objects. The instances of these related objects are shown in the Instances box in the following format:
Parent object ==> Child object
where the child object is part of or is dependent upon the parent object. This makes it easier to identify the object.
Performance Monitor was designed to cause minimal impact on the operating system. Be aware, however, that setting high sample rates can have a negative impact on the performance of the computer.
Select object counters to customize each of the four Performance Monitor views: Chart, Log, Report, and Alert. These counter and option settings can be saved to a file, and different settings files can be designed for all monitoring tasks.
Note: Only active instances appear in the Instances box. A process must be started before it is seen in Performance Monitor. If logged data is being charted, only processes that were active when logging began appear in the Instances box.

Counters in Performance Monitor

Performance Monitor can collect, average, and display data from internal counters by using the Windows NT registry and the Performance Library DLLs. A counter defines the type of data that is available for a particular type of object.
Performance Monitor collects data on various aspects of hardware and software performance, such as use, demand, and available space. A Performance Monitor counter is activated by adding it to a chart or report or by adding an object to a log.
As identified on the previous page, there are objects for physical components such as Processors, Physical Disks, and Memory, and there are other objects, such as Process and Paging files. Each object has a set of counters defined for it. Counters in an object record the activity level of the object. Windows NT Server uses the following typographical convention to name a counter of a particular object:
object: counter
For example, the % Processor Time counter of the Process object would appear as:
Process: % Processor Time
to distinguish it from Processor: % Processor Time or Thread: % Processor Time.
Note: When a counter is selected in any view, Performance Monitor collects data for all counters of that object, but displays only the counter selected. This creates minimal overhead, because most overhead in Performance Monitor results from what is displayed on the screen or written to the hard disk.
There are three types of counters:
  • Instantaneous counters display the most recent measurement.
    For example, Process: Thread Count displays the number of threads found in the most recent measurement.
  • Averaging counters measure a value over time and display the average of the last two measurements. When these counters are started, there is a delay for the second measurement to be taken before any values are displayed.
    For example, Memory: Pages/sec, shows the average number of memory pages found in the last two reads.
  • Difference counters subtract the last measurement from the previous one and display the difference if it is positive. If it is negative, they display a zero.
    Performance Monitor does not include any difference counters in its basic set, but they may be included in other applications that use Performance Monitor, and they can also be written. For information about writing performance counters, see the Win32® Software Development Kit.
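The following minimal sketch (Python, with made-up sample values; not Performance Monitor's actual implementation) illustrates how each counter type derives a displayed value from raw measurements:

    # Hypothetical raw measurements, newest last.
    samples = [120, 140, 135, 180, 160]

    # Instantaneous: report the most recent measurement.
    instantaneous = samples[-1]

    # Averaging: average of the last two measurements (hence the initial
    # delay until a second measurement exists).
    averaging = (samples[-1] + samples[-2]) / 2

    # Difference: last measurement minus the previous one, floored at zero.
    difference = max(0, samples[-1] - samples[-2])

    print(instantaneous, averaging, difference)  # 160 170.0 0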
Some hardware and applications designed for Windows NT come with their own counters. Many of these extensible counters are installed automatically with the product, but some are installed separately. In addition, there are a few specialized counters on the Windows NT Resource Kit 4.0 compact disc that can be installed. See the product documentation and Performance Monitor Help for detailed instructions on adding extensible counters.
Tip Click the Explain button in the Add To dialog box to display the definition for each counter. The Explain button works only when current activity is monitored, not logs.

Performance Monitor Views

There are four ways to view data using Performance Monitor: chart, log, report, and alert.
The four views operate independently and concurrently, but only one can be viewed at a time. Each view gets data independently from the target computers, so looking at a counter in all four views requires four times the overhead of looking at the same counter in just one view. Luckily, this overhead is small, so concurrent use of views is not a problem. Use the View menu to specify a view in Performance Monitor.
Chart
A chart displays the value of the counter over time. Many counters may be charted at one time.
Report
A report shows the value of the counter. A report of all the counters in Performance Monitor can be created.
Alert
In this view, alerts are set on a counter. This causes an event to be displayed when the counter attains a specified value. Many alerts can be monitored at one time.
Log
In the Log view, the counters are recorded on disk for future analysis. Log files are fed back into Performance Monitor to create charts, reports, or alerts.
Performance Monitor Chart View
Customized charts that monitor the current performance of selected counters and instances are useful when:
  • Investigating why a computer or application is slow or inefficient.
  • Continuously monitoring systems to find intermittent performance problems.
  • Discovering why capacity needs to be increased.
Different graphs require different settings. Creating charts to reflect these settings requires selecting the computer to be monitored and adding the appropriate objects, counters, and instances. These selections can be saved under a filename for viewing whenever an update on their performance is needed.
To enhance the readability of graphs, vary the scale of the displayed information and the color, width, and style of the line for each counter. You can also modify these properties after a selection is added.
The scale of any displayed value can be changed so that it fits in the chart or so that it can be compared with another value. To make very large or small values noticeable, change the vertical maximum on the chart.
In addition, you can use Chart Options to customize charts and to change the method used for updating the chart values.
Note: Selecting a counter and then pressing CTRL+H highlights that counter on the chart.
Performance Monitor Report View
The Report view displays constantly changing counter and instance values for added objects. Values appear in columns for each instance. Report intervals are adjusted, snapshots printed, and data reported or displayed. Reports of averaged counters show the average value during the Time Window interval. Reports of instantaneous counters show the value at the end of the Time Window interval.
Creating reports using current activity can help gain a better understanding of object behavior. The Report view allows:
  • Creating a report on all the counters for a given object and then watching them change under various loads.
  • Creating reports to reflect the same information that is charting or to monitor other specific situations. These selections can then be saved under a filename and reused when an update on the same information is needed.
After selections are added to a report, the selections, listed by computer and object, appear in the report area, and Performance Monitor displays the changing values of the selections in the report.
Performance Monitor Alert View
The Alert view enables you to continue working while Performance Monitor tracks events and sends notification when requested. Use the Alert view to create an alert log that monitors the current performance of selected counters and instances for objects.
With the alert log, several counters can be measured at the same time. When a counter exceeds a given value, the date and time of the event are recorded in the Alert view. One thousand events are recorded, after which the oldest event is discarded when a new one is added. An event can also generate a network alert. When an event occurs, a specified program can be run every time or just the first time that it occurs.
Alert logs can be created to warn of problems in different situations. These selections can be saved under a filename and reused to see if the problem has been fixed.
An alert condition applies to the value of the counter over the time interval specified. The default time interval is five seconds. If an alert is set on Memory: Pages/sec > 50 using the default time interval, the average paging rate for a five-second period has to exceed 50 per second before the alert is triggered.
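A small sketch of this averaging behavior, using hypothetical Pages/sec readings, shows why a brief spike does not trigger the alert:

    # The alert fires only if the interval's average exceeds the threshold.
    def alert_triggered(samples, threshold=50):
        return sum(samples) / len(samples) > threshold

    interval = [80, 10, 95, 5, 30]    # Pages/sec readings within one interval
    print(alert_triggered(interval))  # False: the average is 44, despite spikes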
Note: Alerts cannot be set on two conditions of the same counter for the same instance. For example, an alert cannot be set to be triggered when Processor: %Processor Time on a single processor exceeds 90 percent and another to be triggered when it falls below 30 percent.
Also, an alert cannot be set on more than one instance of an object with the same name. For example, if two processes are running with the same name, an alert can only be set for the first instance of the process. Both instances will appear in the Instances box, but only data collected from the first instance will trigger the alert.
Performance Monitor Log View
Logging records information on the current activity of added objects for later viewing. Data can also be collected from multiple systems into a single log file. Log files contain detailed data for detecting performance problems or other detailed analysis. For server analysis and forecasting future resource allocation, logging allows the viewing of trends over a long period, and the appending or re-logging of files. Log file data is charted, reported, or exported to compare files or examine patterns.
Log view has a display area for listing objects and their corresponding computers. All counters and instances are logged for a selected object.
When logging is started, a log symbol with the changing total file size appears on the right side of the status bar.
Log files become more usable when bookmarks are added at various points while logging. With bookmarks, major points of interest can be highlighted, or the circumstances under which the file was created can be described. These locations are easily returned to when working with the log file. The Bookmark option becomes available when logging is started.
Note: Opening a log that is collecting data will stop the log and clear all counter settings. Performance Monitor does not allow peeking at the log from Chart or Report view because the views share the same data source. To peek at a running log, start a second copy of Performance Monitor, and set Data From to the running log.
No matter which view you use—Chart, Alert, Report, or Log—standard built-in features make Performance Monitor more flexible. Performance Monitor allows:
  • Using the Update Interval to determine how often performance is measured. There is a tradeoff between the precision of the data and Performance Monitor overhead.
  • Using the PRINT SCREEN key to save a bitmap image of the Performance Monitor screen. The image can then be printed or inserted into a document.
  • Clearing the Performance Monitor window, deleting a counter, or deleting the full screen.
  • Exporting the data in a tab-delimited (.tsv) or comma-delimited (.csv) text file to a spreadsheet or database program (a parsing sketch appears after this list).
For specific instructions on these topics, use Performance Monitor Help.
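As an illustration of the export option above, this sketch reads a comma-delimited export and averages each column; the file name is hypothetical, and the header handling assumes one header row followed by a date/time column, so adjust it to match your actual export:

    import csv

    with open("perfmon_export.csv", newline="") as f:
        reader = csv.reader(f)
        header = next(reader)              # counter names
        rows = [row for row in reader if row]

    # Skip the leading date/time column and average the numeric columns.
    for i, name in enumerate(header[1:], start=1):
        values = [float(row[i]) for row in rows]
        print(f"{name}: average {sum(values) / len(values):.2f}")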

Six-Step Process to Server Analysis and Optimization

Before starting server analysis and optimization, have a strategy or procedure to ensure all goals are accomplished. The following list of steps to follow when performing server analysis and optimization has been adapted from an industry standard strategy. Each step in this process is described in more detail in the upcoming modules.

Creating a Measurement Baseline

The first step in "A Windows NT Server Approach to Server Analysis and Optimization" involves creating a measurement baseline.
A measurement baseline is a collection of data that indicates how individual system resources, a collection of system resources, or the system as a whole is being used. This information is compared with later activity to help determine system usage and system response to that usage.
When creating a measurement baseline, start by identifying the resources that need to be measured. As a rule, monitor all four of the major server resources no matter which Windows NT Server environment (file and print server, application server, or domain server) is the focus. Although the implications in each server environment are different, include memory, processor, disk, and network objects in the baseline regardless of the environment.
Depending on the server environment, you may need to monitor additional resources and objects. The specific implication of resources in each environment is discussed in Modules 4, 5, and 6.
Some measurement tools are capable of analyzing the captured data and storing it in the format of the native tool. If a tool cannot provide or is not providing what is needed for analysis, export the data to another application. This new application could be a database application, such as Microsoft Access or Microsoft SQL Server, or a spreadsheet application such as Microsoft Excel.
Once a particular set of data has been captured, continue to capture it regularly and place it in the database. This provides the ability to analyze trends over time.
Listed below are general Performance Monitor objects that may be used to monitor the four server analysis and optimization resources.
Resources | Objects to include
Memory | Memory (include Cache in the Application Server environment)
Processor | Processor, System, Server Work Queues
Disk subsystem | PhysicalDisk, LogicalDisk
Network subsystem | Server, Network Segment, Network Interface
Optional objects | Application-specific objects, such as SQL Server, WINS Server, Browser, and RAS

Using Performance Monitor to Create a Measurement Baseline

As discussed in the previous module, Performance Monitor performs data collection and analysis. It can assist with server analysis and optimization in the following two ways:
  • Creating a measurement baseline.
  • Isolating and gathering data to be placed into a database.
Performance Monitor uses objects and counters to associate statistical information with monitored components. The important features of Performance Monitor for server analysis and optimization are logging, re-logging, and appending log files.
Prior to logging, select a set of objects to log. For server analysis, it is generally recommended to log the following:
  • System
  • Processor
  • Memory
  • Logical disk
  • Physical disk (if using RAID)
  • Server
  • Cache
  • Network adapter
  • Network segment activity on at least one server in the segment
If you are monitoring RAID disks, be sure to start diskperf with the -ye option.
When re-logging, increase the log update Time Interval to reduce the amount of data saved. If the original log file is recorded at 60-second intervals, and the new file is recorded at 600-second intervals (which is fine for most server analysis uses), the new file will be about one-tenth the size of the original log file. To re-log, on the Options menu, click Data From and select the original log file as the input, then log to a new file at the larger interval.
Consider appending log files to a master log file to create a single log archive. When re-logging, use the name of the archive log file. The new data will be appended at the end. The format of an archive log file is identical to a normal log file. Bookmarks are automatically inserted to mark the start of each appended log to ease browsing of the archive log file.
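Conceptually, re-logging at a tenfold interval and appending to an archive behave like the sketch below; the file names, text format, and bookmark line are illustrative stand-ins, not Performance Monitor's real log format:

    # Keep every tenth sample: 60-second data becomes 600-second data.
    def relog(samples, factor=10):
        return samples[::factor]

    def append_to_archive(archive_path, lines):
        with open(archive_path, "a") as archive:
            archive.write("# --- bookmark: appended log ---\n")  # mimic the auto-bookmark
            archive.writelines(line + "\n" for line in lines)

    minute_samples = [f"sample {i}" for i in range(600)]  # ten hours at 60 seconds
    ten_minute = relog(minute_samples)                    # 60 entries, about 1/10 the size
    append_to_archive("master.log", ten_minute)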
Take measurements over a week or more to get a complete measurement baseline. As previously mentioned, concentrate on the periods of peak activity—the baseline will indicate these periods.
Using Performance Monitor for Automating Data Collection
The Performance Monitor Service utility provided in the Windows NT Resource Kit can be used to automate monitoring. It creates log files in the same format that Performance Monitor does. To use it, first use Performance Monitor to specify the data to be collected. Set the update Time Interval option to the desired frequency for data collection, name the log file, and save the settings in a Performance Monitor workspace settings file. Then configure the Performance Monitor Service to start automatically when the system boots.
Note: Performance Monitor log files can be quite large in size. Make sure adequate disk resources are available for storage of the log file or files. Identify the data that will help in server optimization. Spend time analyzing this data. This can help prevent overloading the system, or prevent you from being overwhelmed by the amount of data.
Be sure to create this database on a computer that is not being monitored. If the database is on the same computer, it affects the data being measured.

Establishing a Database of Measurement Information

The second step in "A Windows NT Server Approach to Server Analysis and Optimization" is to establish a database of measurement information. This step involves collecting information over a period of time and adding that information to a database for the purposes of analyzing past performance and measuring trends over time.
Information in a database is measurable, manageable, and accessible for analysis. Database utilities greatly complement the data collection utilities. Data collection utilities gather large amounts of information; use the database utilities to organize the information into manageable and meaningful subsets. Once data has been collected from all four major resources and added to the database, use the database utility to analyze and pinpoint specific areas of interest or concern, such as the disk subsystem.

Creating a Database Using Different Applications

To create a database of measurement information for a Windows NT Server system, numerous applications may be used, such as:
  • Performance Monitor
  • Microsoft Excel
  • Microsoft SQL Server
  • Microsoft Access
  • Microsoft FoxPro®
As mentioned earlier in this module, Performance Monitor is an integrated tool for collecting measurement data for a Windows NT–based system. The data is collected and saved in log files. These log files, representing data that is collected over time, are displayed as charts or reports within Performance Monitor, or can be exported to other applications.
Microsoft Excel can be used to import the data from Performance Monitor log files. The data can then be manipulated and analyzed to identify trends and system bottlenecks. You can use the Microsoft Excel macro language or Microsoft Visual Basic to automate the data-analysis process.
Microsoft database applications, such as Microsoft Access, Microsoft FoxPro, and Microsoft SQL Server can be used to import and store large amounts of management data for further analysis using complex searches and queries. Once the data from Performance Monitor is imported into a database application, numerous methods of analysis are available.
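As a minimal sketch of this workflow, using SQLite as a stand-in for the database products named above, counter samples can be stored and queried for trends:

    import sqlite3

    conn = sqlite3.connect("baseline.db")
    conn.execute("""CREATE TABLE IF NOT EXISTS samples (
                        taken_at TEXT, computer TEXT,
                        counter TEXT, value REAL)""")

    # These rows would normally come from a parsed Performance Monitor export.
    conn.executemany(
        "INSERT INTO samples VALUES (?, ?, ?, ?)",
        [("1997-05-01 09:00", "SERVER1", "Memory: Pages/sec", 3.2),
         ("1997-05-01 09:10", "SERVER1", "Memory: Pages/sec", 7.8)])
    conn.commit()

    # Trend query: average paging rate per day.
    query = """SELECT substr(taken_at, 1, 10), avg(value)
               FROM samples WHERE counter = 'Memory: Pages/sec'
               GROUP BY substr(taken_at, 1, 10)"""
    for row in conn.execute(query):
        print(row)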
Although the actual applications and methods that are used vary, what is crucial is that data is collected over time, and is saved for later analysis. The process of analyzing the data is covered in the "Performance Analysis, Forecasting, and Record Keeping" module.

Windows NT Server Environments

Before analysis and optimization on a Windows NT Server can begin, determine the type of environment being analyzed. Windows NT Server environments generally fit into one of three categories: file and print server, application server, or domain server. Each category involves different monitoring considerations and different ways to set expectations when performing server analysis and optimization.
File and Print Server
A file and print server is usually accessed by users for data retrieval and document storage, and occasionally for loading application software over the network.
Application Server
An application server is accessed by users in a client/server environment. The server runs an applications engine that users access using a front-end application.
Domain Server
A domain server is a server that generates data transfer between itself and other servers. A primary domain controller, for example, synchronizes the accounts database with backup domain controllers, or a WINS Server replicates its database with its replication partner. Domain servers also validate user logon requests.

Determining Workload Characterization

Before expectations can be set for a system, it is necessary to know what is being requested of the system. This process is called workload characterization. A workload unit is a list of service requests made on the system or on a specific resource on the system. Examples of workload units are the number of disk access attempts per second, the number of bytes transferred per second, or the process of receiving data from a server (the client sending a request over the network to the server, the server responding over the network to the client).
Determining workload characterization requires understanding what is happening in a specific environment. In a file and print server environment, the area of most concern is disk I/O or the number of users accessing a server, whereas in an application server, the area of most concern is how much memory an application is using. That is not to say that memory usage is not important on a file and print server; rather, concentrate on the device that has the best chance of becoming a system bottleneck.
In a Windows NT Server environment, the two most common workload characteristics are the number of users the system can support and the expected response time for a specific transaction or task (such as copying a file from the server) given a certain number of users on a specific set of hardware.
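Deriving workload units from a measurement window is simple arithmetic, as in this sketch with hypothetical counts:

    WINDOW_SECONDS = 3600          # one hour of observation
    disk_accesses = 90_000         # disk access attempts during the window
    bytes_moved = 5_400_000_000    # bytes transferred during the window

    # Workload units: rates per second over the observation window.
    print(f"Disk accesses/sec: {disk_accesses / WINDOW_SECONDS:.1f}")  # 25.0
    print(f"Bytes/sec: {bytes_moved / WINDOW_SECONDS:,.0f}")           # 1,500,000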
Determine what is important to each system by the type of work being performed. This is essential to proper server analysis and optimization.

System Bottlenecks

During the process of determining workload characterization, it is possible to encounter a resource that is not performing properly. The response to file access requests, for example, may be much too long for the number of users accessing the server. In this case, a symptom of a bottleneck has been detected.
A bottleneck is the part of the system that is currently restricting workflow. Generally, it is the over-consumption of a specific resource. It may be that the disk controller or drive is extremely slow accessing data, or that the processor is running at 100 percent utilization, or that too many active processes need access to RAM. Whatever is causing system responsiveness to suffer is the bottleneck.
It is very common that once one bottleneck has been identified and solved, another bottleneck appears. The new bottleneck was either unnoticed because of the severity of the previous one, or was created by solving it: removing the initial bottleneck may place more demand on another resource, causing that resource to become the restriction to workflow. Bottleneck detection is the process of isolating the hardware components that restrict the flow of your work.
System bottlenecks generally appear within the four major server analysis and optimization resources introduced in "The Basics of Server Analysis and Optimization" module: memory, processor, the disk subsystem, and the network subsystem. Within a Windows NT environment, use Performance Monitor to monitor current activity to determine if any system bottlenecks are present.
Note: After successfully identifying and resolving system bottlenecks, be sure to repeat steps one and two of the "Windows NT Server Approach to Server Analysis and Optimization." Do this before analyzing for capacity performance and expected system use.
Using Performance Monitor to Chart Bottlenecks
Performance Monitor collects data about objects (system resources) and counters (attributes or statistical information that is gathered on an object). This information helps to isolate and identify bottlenecks.
Recall that when adding objects to a log, all counters for the selected objects are collected automatically.
To identify statistical information for each of the individual attributes, use the data from the log file, and view it in a chart or report format. Viewing the information this way allows the selection of individual counters for each of the captured objects.

Finding Memory Bottlenecks

The most common resource bottleneck within Windows NT Server is memory—specifically RAM (random-access memory). If only one thing is done to improve performance in a server, it should be the addition of memory.
Paged and Non-paged RAM
RAM in the Windows NT operating system is divided into two categories: paged and non-paged. Paged RAM is virtual memory, where all applications believe they have a full range of memory addresses available. Windows NT does this by giving each application a private memory range called a virtual memory space and by mapping that virtual memory to physical memory.
Non-paged RAM cannot use this configuration. Data placed into non-paged RAM must remain in memory and cannot be written to or retrieved from disk. For example, data structures used by interrupt routines or those that prevent multiprocessor conflicts within the operating system use non-paged RAM.
Virtual Memory System
The virtual memory system in Windows NT 4.0 combines physical memory, the file system cache, and disk into an information storage and retrieval system. The system stores program code and data on disk until it is needed, and then moves it into physical memory. Code and data no longer in active use is written back to disk. However, when a computer does not have enough memory, code and data must be written to and retrieved from the disk more frequently—a slow, resource-intensive process that can become a system bottleneck.
Hard Page Faults
The best indicator of a memory bottleneck is a sustained, high rate of hard page faults. Hard page faults occur when the data a program needs is not found in its working set (the physical memory visible to the program) or elsewhere in physical memory, and must be retrieved from disk. Sustained hard page fault rates—over five per second—are a clear indicator of a memory bottleneck.
Note: For information on virtual memory, see the Supporting Microsoft Windows NT 4.0 Core Technologies course.
Use the following list of Performance Monitor memory counters to determine if RAM is a bottleneck in the system:
  • Pages/sec—This is the number of requested pages that were not immediately available in RAM, and thus had to be accessed from the disk, or had to be written to the disk to make room in RAM for other pages. Generally, if this value has extended periods with the number of pages per second over five, memory may be a bottleneck in the system.
  • Available Bytes—This indicates the amount of available physical memory. It will normally be low, as the Windows NT Disk Cache Manager uses extra memory for caching and then returns it when requests for memory occur. However, if this value is consistently below 4 MB on a server, it is an indication that excessive paging is occurring.
  • Committed Bytes—This indicates the amount of virtual memory that has been committed to either physical RAM for storage, or to pagefile space. If the amount of committed bytes is larger than the amount of physical memory, it may indicate that more RAM is required.
  • Pool Nonpaged Bytes—This indicates the amount of RAM in the Non-paged pool system memory area where space is acquired by operating system components as they accomplish their tasks. If the Pool Nonpaged Bytes value has a steady increase without a corresponding increase in activity on the server, it may indicate that a process that is running has a memory leak, and it should be monitored closely.
Counter | Acceptable average range | Desire high or low value | Action
Pages/sec | 0–20 | Low | Find the process that is causing paging. Add RAM.
Available Bytes | Minimum of 4 MB | High | Find the process using RAM. Add RAM.
Committed Bytes | Less than physical RAM | Low | Find the process using RAM. Add RAM.
Pool Nonpaged Bytes | Remains steady, no increase | Not applicable | Check for memory leak in application.
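The table's rules of thumb translate directly into checks like this sketch; the dictionary keys and values are hypothetical, and the averages would come from your own baseline logs:

    def check_memory(avg):
        findings = []
        if avg["pages_per_sec"] > 20:
            findings.append("Sustained paging: find the paging process or add RAM.")
        if avg["available_bytes"] < 4 * 1024 * 1024:
            findings.append("Available Bytes under 4 MB: excessive paging likely.")
        if avg["committed_bytes"] > avg["physical_ram"]:
            findings.append("Committed Bytes exceed physical RAM: add RAM.")
        return findings or ["No memory bottleneck indicated."]

    averages = {"pages_per_sec": 27.5,            # all values are examples
                "available_bytes": 3_500_000,
                "committed_bytes": 70_000_000,
                "physical_ram": 64 * 1024 * 1024}
    print("\n".join(check_memory(averages)))      # flags all three conditions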

Finding Processor Bottlenecks

Just about everything that occurs on a server involves the CPU. The processor on an application server is generally busier than the processor on a file and print server. As a result, the processor activity and what is considered normal are different between the two types of servers.
Two of the most common causes of CPU bottlenecks are CPU-bound applications and drivers, and excessive interrupts that are generated by inadequate disk or network subsystem components.
Monitor the following Performance Monitor processor counters to help determine if the processor is a bottleneck:
  • % Processor Time—This measures the amount of time the processor is busy. When a processor is consistently running over 75 percent processor usage, the processor has become a system bottleneck. Analyze processor usage to determine what is causing the processor activity. This is accomplished by monitoring individual processes. If the system has multiple processors, then monitor the counter "System: % Total Processor Time."
  • % Privileged Time—This measures the time the processor spends performing operating system services.
  • % User Time—This measures the time the processor spends performing user services, such as running a word processor.
  • Interrupts/sec—This is the number of interrupts the processor is servicing from applications or from hardware devices. Windows NT Server can handle thousands of interrupts per second. However, if the number of interrupts consistently exceeds 1,000 on an 80486/66-based system, or 3,500 on a Pentium 90 PCI bus system, a hardware error or interrupt conflict with devices may be occurring. For example, if a conflict occurs between a hard disk controller and a network adapter card, monitor the disk controller and network adapter card to see if excessive requests are being generated. This is done by monitoring the queue lengths for the physical disk and network interface. Generally, if the queue length is greater than two requests, check for slow disk drives or network adapters that could be causing the queue length backlog.
  • System: Processor Queue Length—This is the number of requests the processor has in its queue. It indicates the number of threads that are ready to be executed and are waiting for processor time. Generally, a processor queue length that is consistently higher than two may indicate congestion. Further analysis of the individual processes making requests on the processor is required to determine what is causing the congestion.
  • Server Work Queues: Queue Length—This is the number of requests in the queue for the selected processor. A consistent queue of over two indicates processor congestion.
Counter | Acceptable average range | Desire high or low value | Action
% Processor Time | Less than 75% | Low | Find the process using excessive processor time. Upgrade or add another processor.
% Privileged Time | Less than 75% | Low | Find the process using excessive processor time. Upgrade or add another processor.
% User Time | Less than 75% | Low | Find the process using excessive processor time. Upgrade or add another processor.
Interrupts/sec | Depends on processor | Low | Find the controller card generating interrupts.
System: Processor Queue Length | Less than two | Low | Upgrade or add an additional processor.
Server Work Queues: Queue Length | Less than two | Low | Find the process using excessive processor time. Upgrade or add another processor.
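Because the guidance concerns consistent usage rather than momentary peaks, a sliding-window check like this sketch (hypothetical % Processor Time samples taken every five seconds) separates sustained saturation from harmless spikes:

    def sustained_over(samples, threshold=75.0, window=12):
        """True if any `window` consecutive samples average above `threshold`."""
        return any(sum(samples[i:i + window]) / window > threshold
                   for i in range(len(samples) - window + 1))

    cpu = [95, 40, 30, 88, 92, 25, 97, 91, 90, 85, 89, 20]  # one minute of samples
    print(sustained_over(cpu))  # False: spikes to 97, but the average stays under 75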

Another tool that can be used when looking for bottlenecks is Windows NT Task Manager. Among other capabilities, Task Manager shows the processor and memory usage of each running process.
If it is determined that the processor is a system bottleneck, a number of actions can be performed to improve performance. These include the following:
  • Add a faster processor if the system is a file and print server.
  • Add multiple processors for application servers, especially if the application is multithreaded.
  • Off-load processing to another system in the network (either users, applications, or services).

Finding Disk Bottlenecks

Disks store programs and the data that programs process. When you are waiting for a computer to respond, the disk is frequently the bottleneck. The disk subsystem can be the most important aspect of I/O performance, but problems can be hidden by other factors, such as a lack of memory.
Performance Monitor disk counters are available with the LogicalDisk and PhysicalDisk objects. LogicalDisk monitors logical partitions of physical drives. It is useful for determining which partition is causing the disk activity, possibly indicating the application or service that is generating the requests. PhysicalDisk monitors individual hard disk drives and is useful for monitoring disk drives as a whole.
Performance Monitor disk counters, however, are not enabled by default and must be enabled manually.
To activate disk performance statistics on the local computer
  1. Start a command prompt, and type diskperf -y
  2. Restart the computer.
To activate disk performance on a remote computer called Server1
  1. Start a command prompt, and type diskperf -y \\server1
  2. Restart the remote computer.
If using a RAID implementation, start diskperf with the -ye parameter to get enhanced counters.
When analyzing disk subsystem performance and capacity, monitor the following Performance Monitor disk subsystem counters for bottlenecks:
  • % Disk Time—This indicates the amount of time that the disk drive is busy servicing read and write requests. If this is consistently close to 100 percent, the disk is being used very heavily. Monitoring of individual processes will help determine which process or processes are making the majority of the disk requests.
  • Disk Queue Length—Indicates the number of pending disk I/O requests for the disk drive. If this value is consistently over two, it indicates congestion.
  • Avg. Disk Bytes/Transfer—The average number of bytes transferred to or from the disk during write or read operations. The larger the transfer size, the more efficient the system is running.
  • Disk Bytes/sec—This is the rate bytes are transferred to or from the disk during write or read operations. The higher the average, the more efficient the system is running.
Counter | Acceptable average range | Desire high or low value | Action
% Disk Time | Under 50% | Low | Monitor to see if paging is occurring. Upgrade disk subsystem.
Disk Queue Length | 0–2 | Low | Upgrade disk subsystem.
Avg. Disk Bytes/Transfer | Depends on subsystem | High | Upgrade disk subsystem.
Disk Bytes/sec | Depends on subsystem | High | Upgrade disk subsystem.
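The two efficiency counters are related by simple division, as this sketch with illustrative values shows; Disk Transfers/sec is the companion counter that counts read and write operations:

    disk_bytes_per_sec = 1_200_000   # Disk Bytes/sec (example)
    transfers_per_sec = 150          # Disk Transfers/sec (example)

    # Avg. Disk Bytes/Transfer = throughput divided by operation rate.
    avg_bytes_per_transfer = disk_bytes_per_sec / transfers_per_sec
    print(f"{avg_bytes_per_transfer:.0f} bytes/transfer")  # 8000: small, chatty I/O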

If you determine that the disk subsystem is a system bottleneck, a number of solutions are possible. These solutions include the following:
  • Add a faster controller, such as Fast SCSI-2, or an on-board caching controller.
  • Add more disk drives in a RAID environment. This spreads the data across multiple physical disks and improves performance, especially during reads.
  • Offload processing to another system in the network (either users, applications, or services).

Finding Network Bottlenecks

Network bottlenecks are one of the more difficult areas to monitor due to the complexity of most networks today. As outlined in the "The Basics of Server Analysis and Optimization" module, a number of different issues can affect the performance of the network. While monitoring the network, a number of different objects and counters can be monitored, such as server, redirector, network segment, and protocols. Determining which ones to monitor depends upon the environment. Below are commonly monitored counters. Use them to form an overall picture of how the network is being used and to help in attempts to uncover bottlenecks.
  • Server: Bytes Total/sec—This is the number of bytes the server has sent and received over the network. It indicates how busy the server is for transmission and reception of data.
  • Server: Logon/sec—This is the number of logon attempts for local authentication, over-the-network authentication, and service accounts in the last second. This counter is beneficial on a domain controller to determine the amount of logon validation occurring.
  • Server: Logon Total—This is the number of logon attempts for local authentication, over-the-network authentication, and service accounts since the computer was last started. This counter is beneficial on a domain controller to determine the amount of logon validation occurring.
  • Network Segment: % Network utilization—This is the percentage of the network bandwidth in use for the local network segment. This can be used to monitor the effect of different network operations on the network, such as user logon validation or domain account synchronization.
    Note: The Network Segment counters are added when the Network Monitor Agent is added through Network Services in Control Panel. When Performance Monitor is actively monitoring Network Segment counters, it places the adapter card into promiscuous mode. While in promiscuous mode, the network adapter card accepts and processes all network traffic, not just traffic destined for itself. This should only be done occasionally and not left for extended periods, as the processing of all network traffic will affect the performance of the system running the Network Monitor Agent software.
  • Network Interface: Bytes Sent/sec—This is the number of bytes sent using this network adapter card.
  • Network Interface: Bytes Total/sec—This is the number of bytes sent and received using this network adapter card.
    Note: The Network Interface counters are added to a TCP/IP host when the SNMP Service is added. These may be added using the Network services in Control Panel.
Counter | Acceptable average range | Desire high or low value | Action
Bytes Total/sec | Function of number of NICs and protocols used | High | Further analysis to determine cause of problem. Add another adapter.
Logon/sec | Not applicable | High | If logon validation is not completing, add additional domain controllers.
Logon Total | Not applicable | High | If logon validation is not completing, add additional domain controllers.
Network Segment: % Network utilization | Generally lower than 30%, though switched networks can achieve higher use | Low | Segment the network. Limit the protocols in use.
Network Interface: Bytes Sent/sec | Function of NIC and protocol or protocols | High | Upgrade network adapter/physical network.
Network Interface: Bytes Total/sec | Function of NIC and protocol or protocols | High | Upgrade network adapter/physical network.

By viewing the above counters, it is possible to see the amount of activity on the server for logon requests and data access. If monitoring these or other counters determines that the network subsystem is a bottleneck, numerous actions can help alleviate it. These actions include the following:
  • Improve the hardware of the server by:
    • Adding an additional network adapter.
    • Upgrading to a better performing adapter.
    • Upgrading to better performing routers and bridges.
  • Add more servers to the network, thereby distributing the processing load.
  • Check and improve the physical layer components, such as routers.
  • Segment the network to isolate traffic to appropriate segments.
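As a rough sketch, segment utilization can also be estimated from the byte counters when the Network Segment object is not installed; the bandwidth and traffic figures below are illustrative:

    BANDWIDTH_BITS_PER_SEC = 10_000_000   # 10 Mbps Ethernet segment (assumed)
    bytes_total_per_sec = 450_000         # Server: Bytes Total/sec (example)

    # Utilization = bits moved per second as a share of raw bandwidth.
    utilization = bytes_total_per_sec * 8 / BANDWIDTH_BITS_PER_SEC * 100
    print(f"~{utilization:.0f}% of segment bandwidth")  # ~36%: above the 30% guideline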

Monitoring Network Protocols

In addition to objects and counters, it is also important to monitor how network protocols affect the network; protocols affect the number of broadcast datagrams being generated and the number of retransmissions occurring. By monitoring the appropriate counters for the protocols in the environment, you can form a clear picture of how each protocol uses network bandwidth.
NetBEUI and NWLink
Both NetBEUI and NWLink have similar counters. The following are three common counters for monitoring:
  • Bytes Total/sec—This is the total number of bytes sent in frames (data packets) and datagrams (such as broadcasts and acknowledgments).
  • Datagrams/sec—The number of non-guaranteed datagrams (broadcasts and acknowledgments) sent and received on the network.
  • Frames/sec—The number of data packets that have been sent and received on the network.
Counter | Acceptable average range | Desire high or low value | Action
Bytes Total/sec | Function of number of NICs and activity | High | Upgrade NIC; add additional NIC.
Datagrams/sec | Function of activity | High | Monitor processes to determine if one is causing excessive datagrams.
Frames/sec | Function of activity | High | Reduce broadcast traffic.

TCP/IP
TCP/IP counters are added to a system when both the TCP/IP protocol and the SNMP Service have been installed. The SNMP Service contains the following objects and counters for TCP/IP-related protocols:
  • TCP Segments/sec—The number of TCP segments (frames) that are sent and received over the network.
  • TCP Segments Retransmitted/sec—The number of segments (frames) that are retransmitted on the network.
  • UDP Datagrams/sec—The number of UDP datagrams (such as broadcasts) that are sent and received.
  • Network Interface: Output Queue Length—The length of the output packet queue (in packets). Generally, a queue longer than two indicates congestion, and analysis of the network structure to determine the cause is necessary.
Counter | Acceptable average range | Desire high or low value | Action
TCP Segments/sec | Function of activity | High | Reduce broadcast traffic. Segment network.
TCP Segments Retransmitted/sec | Not applicable | Low | Upgrade physical hardware. Segment network.
UDP Datagrams/sec | Function of activity | Low | Reduce broadcasts.
Network Interface: Output Queue Length | Less than two | Low | Upgrade NIC, add additional NIC, verify physical network components.
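A quick health check built from the two TCP counters above is the retransmission ratio, sketched here with hypothetical rates:

    segments_per_sec = 850.0         # TCP Segments/sec (example)
    retransmitted_per_sec = 42.5     # TCP Segments Retransmitted/sec (example)

    # The share of TCP traffic that had to be sent again.
    ratio = retransmitted_per_sec / segments_per_sec * 100
    print(f"{ratio:.1f}% retransmitted")  # 5.0%: investigate the physical network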

Memory Performance Testing

1. Executive Summary
Kingston Technology Co. contracted Mindcraft, an independent testing lab, to perform a quantitative performance study that answered the following questions:
  1. How much does Web server performance improve as you add more memory?
    The essential graphic to answer this question is a chart showing Web server operations per second as a function of the amount of memory. In another chart, we show the effect the amount of memory has on response time.
  2. How much more can an application server do with additional memory?
    We answer this question in a chart showing the improvement of application server performance as a function of the amount of memory in the server.
  3. How much can an LDAP (Lightweight Directory Access Protocol) directory server do with more memory?
    We chart the results by showing performance as a function of memory size.

2. Scope of Work

The goal of this benchmark was to show the performance benefits of more memory in Windows NT-based servers. We answer the following questions for the applications we tested:
  • How much more work can an application do with each memory configuration?
  • When is it better to add memory to a system than to add processors?
  • How many users can the system support with a given amount of memory?

Server Computer

For testing purposes, we selected a Compaq ProLiant 5000. This system uses 200MHz Pentium Pro CPUs and can be configured with one to four processors. We began testing with 64MB of memory and expanded up to 1GB.
  • Web Server Testing
    Mindcraft used SPECweb96 to test Web server performance. The working file set size was 622MB. We used Microsoft's IIS (Internet Information Server) 4.0 Web server software.
  • Application Server Testing
    Mindcraft used the Ziff-Davis ServerBench benchmark to test application server performance as a function of memory size.
    Mindcraft set up the machines to use Windows NT 4.0 as the client operating system. They used 24 client systems simulating a total of 72 clients.
  • Directory Server Testing
    Mindcraft used DirectoryMark to test the performance of an LDAP (Lightweight Directory Access Protocol) directory server.
    Mindcraft used directory sizes of 50,000, 200,000, and 400,000 entries.

3. What We Found

Web Server

  • Double your memory, cut your response time by more than half.
    A web server houses HTML files that are transferred to clients in response to HTTP requests. It can also use applications to generate HTML code. The bottom line is, people don't want to wait around for a slow web page. While performing this benchmark test we found that increasing the system's memory from 128MB to 256MB, with the system running one CPU, cut server response time by 63%. Doubling memory again, from 256MB to 512MB, improved response time by a further 59%.
    When running a server with two CPUs, upgrading system memory from 512MB to 1GB improved response time by 36%.
    From these results we can say:
    Double your memory = cut your response time in half (see the worked example at the end of this section)
    Specweb96 Response Time Chart
    Cc750878.mind01(en-us,TechNet.10).gif
  • Increase your server performance by over 500%
    We also measured how many operations per second can be performed on the server as the amount of RAM is increased.
    This benchmark shows that web server performance increases by 540% when a system running one CPU is upgraded from 128MB to 512MB.
    When upgrading from 512MB to 1GB on a server running two CPUs, performance increases by 51%.
    Specweb96 Operations per second
    Cc750878.mind02(en-us,TechNet.10).gif
    It is important to notice in both charts that a system with 512MB performs about the same whether it runs one CPU or two. This indicates that memory plays the key role in the performance increase. In this case, these charts show that increasing the amount of RAM on the server is recommended over increasing the number of CPUs.
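The percentage cuts quoted above compound across memory doublings. The following is a quick, purely illustrative calculation (the 100 ms baseline is a hypothetical number, not a measured result) showing how those cuts translate into absolute response times:

```python
# Illustrative arithmetic only: the baseline response time is a
# hypothetical value, not a figure from the benchmark.
def after_cut(response_time: float, cut_pct: float) -> float:
    """Response time remaining after a percentage reduction."""
    return response_time * (1 - cut_pct / 100)

t_128mb = 100.0                    # hypothetical baseline, in milliseconds
t_256mb = after_cut(t_128mb, 63)   # 128MB -> 256MB: 63% cut => 37.0 ms
t_512mb = after_cut(t_256mb, 59)   # 256MB -> 512MB: 59% cut => ~15.2 ms
print(t_256mb, round(t_512mb, 1))  # 37.0 15.2

# Each doubling leaves well under half of the previous response time,
# which is where the "cut your response time in half" rule of thumb
# comes from.
```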

Application Server

  • Support three to ten times as many clients by increasing your memory
    We tested the application server performance by adding clients in sets of four. We started testing with 1 client and increased the load to a maximum of 72 clients accessing the server at one time.
    Tests demonstrate that increasing your memory in an application server significantly improves the application server performance.
    By increasing base memory from 128MB to 256MB the server was able to support five times as many people before the transactions per second dropped. By increasing memory from 256MB to 512MB the server was able to support ten times as many people.
    So by doubling the server memory a user can dramatically increase the number of clients supported on an application server.
    ServerBench Score
    Cc750878.mind03(en-us,TechNet.10).gif
    This chart illustrates that the number of clients supported on a server is directly related to the amount of RAM installed in the system.
  • Increase the system memory on an application server = increase the number of clients supported
    We also found that with 512MB, a performance drop takes place at the same point whether the server runs one CPU or two (see chart below). Once again, this shows that memory plays a very important role on application servers.
    When running two CPUs, the largest performance increase happens when 1GB of RAM is added to the system. At this point the server supports the largest number of clients without a significant performance drop.
    Cc750878.mind04(en-us,TechNet.10).gif

Directory Server

We tested three database sizes:
Cc750878.mind05(en-us,TechNet.10).gif
  • Double your memory, increase your directory server performance by an average of 1000%
    In our first scenario, a 110MB database, we found that increasing the base memory from 128MB to 256MB gives the server a performance increase of 947%. This is a dramatic improvement for clients making requests to the server.
    50,000 entry database scenario (110MB database)
    Cc750878.mind06(en-us,TechNet.10).gif
    In our second scenario, a 375MB database, we found that doubling memory from 256MB to 512MB improves access request time by 3000%.
    200,000 entry database scenario (375MB database)
    Cc750878.mind07(en-us,TechNet.10).gif
    Finally, in our third scenario, a 690MB database, doubling memory from 512MB to 1GB gives a performance increase of 248% in system response time.
    400,000 entry database scenario (690MB database)
    Cc750878.mind08(en-us,TechNet.10).gif

4. Mindcraft Certification

Mindcraft was commissioned by Kingston Technology Company to produce an independent and unbiased assessment of the benefits of memory on Windows NT Servers.
The tests were conducted in the second quarter of 1998.
These tests should be reproducible by others who use the same test lab and computer software configurations used in this benchmark.

5. About Mindcraft

Mindcraft is a service-oriented, independent test lab. The company was founded in 1985 to provide high-quality services and products to vendors and end users who want to test software, systems, and network products.
Mindcraft is the largest Accredited POSIX Testing Laboratory in the world. As part of their accreditation by the National Voluntary Laboratory Accreditation Program (NVLAP, part of the United States National Institute of Standards and Technology), they have developed a rigorous quality system that meets international standards.
For more information about Mindcraft go to: http://www.mindcraft.com/

6. About Kingston

Kingston Technology is the world's largest independent manufacturer of memory products for servers, workstations, desktops, portables, and electronic devices. Over the last ten years, Kingston has diversified its product lines to include processor upgrades, flash memory, networking hardware, and storage products. With strictly regulated ISO-registered facilities in the United States (ISO 9001), Ireland (ISO 9002), and Taiwan (ISO 9002), Kingston markets its products through an extensive worldwide network of distributors, major reseller chains, and independent dealers.
In August of 1996, Kingston became part of SOFTBANK Corp. SOFTBANK Holdings Inc. is the holding company for all of SOFTBANK Corporation's U.S.-based activities. Its major operating companies include Ziff-Davis, Kingston Technology Company, SOFTBANK Content Services, and UTStarcom. SOFTBANK is the largest shareholder of Yahoo! and E*Trade, as well as a minority investor in The Rights Exchange, GeoCities, CyberCash, First Virtual Holdings, and E-LOAN. In addition, through affiliated venture funds in the U.S. and Japan, the SOFTBANK Group has made more than 70 investments in Internet companies. Access the SOFTBANK website at http://www.softbank.com/. Visit the Kingston home page on the Internet at http://www.kingston.com/.

Load balancing algorithms

You can select a round-robin algorithm or a biasing algorithm for load balancing. You select an algorithm using the Configuration Tool; see Configuring the Workload Manager.

The round-robin algorithm

The round-robin algorithm assumes that all CICS regions are equally valid for selection. When the Client daemon is initially started, it reads from the configuration file a list of all possible CICS regions to which any ECI or EPI request can be sent.
The Workload Manager also records the last region selected. When a new ECI or EPI request is made, the next region in the list is selected as the target; when the end of the list is reached, selection loops around to the first region.
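A minimal sketch of this selection logic follows. The region names and the class shape are hypothetical; the real Client daemon reads its region list from the configuration file.

```python
# A minimal sketch of round-robin region selection. The region names
# are hypothetical examples.
class RoundRobinBalancer:
    def __init__(self, regions):
        self.regions = list(regions)  # all candidate CICS regions
        self.last = -1                # index of the last region selected

    def next_region(self):
        """Return the next region, looping back to the first at the end."""
        self.last = (self.last + 1) % len(self.regions)
        return self.regions[self.last]

wlm = RoundRobinBalancer(["CICSA", "CICSB", "CICSC"])
print([wlm.next_region() for _ in range(5)])
# ['CICSA', 'CICSB', 'CICSC', 'CICSA', 'CICSB']
```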

The biasing algorithm

The biasing algorithm provides a way of balancing workload by specifying that workload distribution should favor particular regions. For example, if there are two regions with a bias of 75 and 25, program requests are sent in a ratio of 3:1 to the first region.
If a region fails, the internal biasing calculation changes. If two regions are available, one with a bias of 100 and the other with a bias of 0, all requests are sent to the first region. If the first region becomes unavailable, all requests are directed to the second region. A bias value of 0 is a special case, meaning use only if no other region is available.
The biasing algorithm works only for ECI calls. If you try to run an EPI application whilst the biasing algorithm is selected, the round-robin algorithm is used instead.
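The internal biasing calculation itself is not spelled out here, so the following is only one plausible reading of the behaviour described above: a weighted-random pick in which a region's chance of selection is proportional to its bias, with bias-0 regions held back as a last resort. All names are hypothetical.

```python
# One plausible sketch of bias-based selection (an assumption; the
# Workload Manager's actual internal calculation is not documented here).
import random

def select_region(biases, available):
    """Pick an available region with probability proportional to its bias.

    Regions with bias 0 are used only when no region with a positive
    bias is available.
    """
    candidates = {r: b for r, b in biases.items() if r in available and b > 0}
    if not candidates:
        # "A bias value of 0 ... meaning use only if no other region
        # is available."
        candidates = {r: 1 for r, b in biases.items()
                      if r in available and b == 0}
    regions = list(candidates)
    weights = [candidates[r] for r in regions]
    return random.choices(regions, weights=weights, k=1)[0]

# Two regions with biases 75 and 25: requests go roughly 3:1 to the first.
print(select_region({"CICSA": 75, "CICSB": 25}, {"CICSA", "CICSB"}))
# Bias 100/0 with the first region down: everything goes to the second.
print(select_region({"CICSA": 100, "CICSB": 0}, {"CICSB"}))  # -> CICSB
```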