Channel: Kevin Holman's System Center Blog

System Center Operations Manager SDK service failed to register an SPN



 

 

Have you seen this event in your RMS OpsMgr event logs?

 

Event Type:      Warning

Event Source:   OpsMgr SDK Service

Event Category:            None

Event ID:          26371

Date:                12/13/2007

Time:                2:58:24 PM

User:                N/A

Computer:         RMSCOMPUTER

Description:

The System Center Operations Manager SDK service failed to register an SPN. A domain admin needs to add MSOMSdkSvc/rmscomputer and MSOMSdkSvc/rmscomputer.domain.com to the servicePrincipalName of DOMAIN\sdkaccount

 

This seems to appear in the RC1-SP1 build of OpsMgr.

 

Every time the SDK service starts, it tries to update the SPNs on the AD account that the SDK service runs under.  It fails because, by default, a user cannot update its own SPNs.  Therefore we see this error logged.

 

If the SDK account is a domain admin, it does not fail, because a domain admin has the necessary rights.  Obviously, we don't want the SDK account to be a domain admin; that isn't required, nor is it a best practice.

 

Therefore, to resolve this error, we need to grant the SDK service account rights to update the SPN.  The easiest way is to go to the user account object for the SDK account in AD and grant SELF full control.

 

A better, more granular way – is to only grant SELF the right of modifying the SPN:

 

  • Run ADSIEdit as a domain admin.
  • Find the SDK domain account, right-click, Properties.
  • Select the Security tab, click Advanced.
  • Click Add.  Type "SELF" in the object box.  Click OK.
  • Select the Properties tab.
  • Scroll down and check the "Allow" box for "Read servicePrincipalName" and "Write servicePrincipalName".
  • Click OK three times.
  • Restart your SDK service.  If AD has replicated from where you made the change, all should be resolved.

To check SPNs:

The following command will show all the HealthService SPNs in the domain:

    Ldifde -f c:\ldifde.txt -t 3268 -d DC=DOMAIN,DC=COM -r "(serviceprincipalname=MSOMHSvc/*)" -l serviceprincipalname -p subtree


To view SPNs for a specific server:

    setspn -L <servername>

 
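Alternatively, a domain admin can register the SPNs manually, exactly as the event text asks. A sketch, using the account and host names from the example event above (substitute your own):

```shell
REM Run as a domain admin; the account and host names below are the ones
REM from the example event - replace them with your SDK account and RMS name.
setspn -A MSOMSdkSvc/rmscomputer DOMAIN\sdkaccount
setspn -A MSOMSdkSvc/rmscomputer.domain.com DOMAIN\sdkaccount

REM Verify the result:
setspn -L DOMAIN\sdkaccount
```

Note that this only adds the missing SPNs; since the service re-attempts registration at every start, granting SELF write access as described above is what stops the warning from recurring.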

 


OpsMgr security account rights mapping - what accounts need what privileges?


 

Do you ever wish you had a list of the rights needed to install OpsMgr on each server role?  Or what each service account needs for steady state?  Or how about ongoing support... for your Admin group - to have enough rights in SQL to support OpsMgr?

I have created a spreadsheet of the typical security accounts, and what rights they need on each server role, or database. 

 

Attachment is below:

 

 

Using OpsMgr for intrusion detection and security hardening


Here is an interesting little concept of how to use OpsMgr.

Because I have a lab that is exposed to the internet over port 3389, I get a LOT of hacking attempts on it.  Mostly the source is bots running on other compromised systems.  These bots just do brute-force attacks against the typical admin accounts and passwords via RDP.  In this article, I am going to show how OpsMgr can not only alert on this condition, but also respond by configuring the Windows Firewall to block these attacks.

 

I will start by analyzing the Server 2008 event that occurs when someone tries to attack using my “Administrator” account:

 

Log Name:          Security
Source:              Microsoft-Windows-Security-Auditing
Date:                  7/14/2009 12:44:05 PM
Event ID:            4625
Task Category:   Account Lockout
Level:                  Information
Keywords:          Audit Failure
User:                   N/A
Computer:           terminalserver.domain.com

Description:   An account failed to log on.

Subject:
    Security ID:             SYSTEM
    Account Name:        TERMINALSERVER$
    Account Domain:     DOMAIN
    Logon ID:                 0x3e7

Logon Type:            10

Account For Which Logon Failed:
    Security ID:             NULL SID
    Account Name:        administrator
    Account Domain:     TERMINALSERVER

Failure Information:
    Failure Reason:        Account locked out.
    Status:                      0xc0000234
    Sub Status:               0x0

Process Information:
    Caller Process ID:          0x14f0
    Caller Process Name:    C:\Windows\System32\winlogon.exe

Network Information:
    Workstation Name:    TERMINALSERVER
    Source Network Address:    10.10.10.1
    Source Port:        1261

Detailed Authentication Information:
    Logon Process:           User32
    Authentication Package:    Negotiate
    Transited Services:    -
    Package Name (NTLM only):    -
    Key Length:        0

 

So… for starters, I want to alert on this condition… when ANYONE is trying multiple times… to RDP into the server, with a disabled account, non-existent account, or valid account, but bad password.  Therefore – I will create a monitor:  Windows Events > Repeated Event Detection > Timer Reset.

The idea here is to only respond when multiple bad passwords are entered in a short time period…. representing an attack.  (I don't want to lock out or block access from my normal users who sometimes mis-type their password on a couple attempts.)

So I create the monitor, target “Windows Server Operating System”, set it to “Security” for the Parent Monitor, and UNCHECK the box enabling it.  (I will later override this monitor and ONLY enable it for my entry terminal server.)

I create my event expression for the security event log, event 4625, and I only want the Logon Type of 10, which is from RDP:

 


 

 

Next – I will set up my monitor, to Trigger on Count (of events), Sliding.  Compare count will be set to 5 (events) within a 3 minute interval.  Therefore, as soon as 5 events are captured, in ANY sliding 3 minute “window”, the monitor will change state.

 


 

Next, since my goal is really to execute a script/command/response (not really a state change), I will set the timer reset to reset the state back to healthy after 2 minutes.  This frees the workflow up to block any other source IPs which might attack soon after.

 


 

I don't want to impact availability data, which assumes critical state = unavailable, so I will use a Warning state:

 


 

Now I will enable a unique alert for this condition.  I want a critical, high-priority alert in this case, and I will set this NOT to close the alert when we auto-resolve the state on the timer.  I will also customize the alert description, to give me a richer alert based on the event details and my custom response.  I talk more about these event parameters HERE.  I will be adding:

 

$Data/Context/Context/DataItem/Params/Param[6]$ typed a bad password accessing directly from computer: $Data/Context/Context/DataItem/Params/Param[14]$ from IP: $Data/Context/Context/DataItem/Params/Param[20]$
The Windows Firewall will be modified to block this IP address in response to this monitor state.

 


 

 

Next – I will go back and find my monitor, and add a Recovery for the Warning State:

 


 

I will choose Run Command and give it the name “Modify Windows Firewall”.

 


 

Next – for the command – I am going to run Netsh.exe which can configure the Windows Firewall running on the terminal server.  Here is the command:

 

C:\Windows\System32\netsh.exe

advfirewall firewall set rule name="Block RDP" new remoteip=$Data/StateChange/DataItem/Context/DataItem/Context/DataItem/Params/Param[20]$

 

$Data/StateChange/DataItem/Context/DataItem/Context/DataItem/Params/Param[20]$ is based on an event parameter of the Server 2008 event, which I will pass to the command; it gathers the IP address of the attacker and passes it to the command which configures the firewall rule.  Getting this variable was the most complicated part for me.  Marius talked about how to derive this variable HERE.  Just understand that the variables you use in an alert description are not the same as those used in a diagnostic or recovery.

 


 

Cool:

 


 

 

My Netsh.exe command modifies an existing custom rule in the Windows Firewall, so I need to make sure I create that and name it “Block RDP”.
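Creating that rule up front could look like this (a sketch: seeding the scope with a TEST-NET placeholder address is my own assumption, so the block rule does not affect all RDP traffic before the first recovery runs):

```shell
REM One-time setup: create the inbound block rule the recovery will modify.
REM Do NOT use remoteip=any here - that would block all RDP traffic.
REM 192.0.2.1 (TEST-NET) is a harmless placeholder of my choosing.
netsh advfirewall firewall add rule name="Block RDP" dir=in action=block protocol=TCP localport=3389 remoteip=192.0.2.1

REM The recovery then replaces the scope with the attacker's address, e.g.:
netsh advfirewall firewall set rule name="Block RDP" new remoteip=10.10.10.1
```

Because `set rule ... new remoteip=` replaces the rule's scope rather than appending to it, each run of the recovery blocks only the most recent attacker.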

Now I will override this monitor and enable it for my published terminal server, and then test it by attempting to log into my terminal server via RDP 5 times in a short period, using a disabled account.  This will write an event to the security event log for each attempt, and eventually trip the repeated event detection monitor.

 

Alert generates:


 

Monitor changes state:


 

Recovery runs:

 


 

Windows Firewall rule gets modified:

 


 

Attack is stopped.

Pretty cool, eh? 

Rare gateway / certificate issue – Event 20077 - the certificate cannot be queried for property information


I was installing a gateway in a locked down DMZ environment today, and ran across an issue getting my certificates to work.

My DMZ based gateway has NO access to browse the Enterprise CA’s website, so I had to request and issue my certificates, and export them all manually.  When trying to use the certificate for the GW – I was getting this event during Health Service startup in the OpsMgr log:

Event Type:    Error
Event Source:    OpsMgr Connector
Event Category:    None
Event ID:    20077
Date:        2/5/2011
Time:        1:48:35 PM
User:        N/A
Computer:    DMZGW1
Description:
The certificate specified in the registry at HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft Operations Manager\3.0\Machine Settings cannot be used for authentication, because the certificate cannot be queried for property information.  The specific error is 0x80092004(%3).
This typically means that no private key was included with the certificate.  Please double-check to ensure the certificate contains a private key.

I was using the following documentation:

How to Obtain a Certificate Using Windows Server 2008 Enterprise CA in Operations Manager 2007

 

The only difference was that I could not submit the request and directly import it using the machine in the DMZ.  Instead, I used my desktop to submit the request to the CA and then downloaded a copy of it.  This downloaded copy was a .CER file.

It imported just fine in the computer personal store – but would not work – giving the error event above.

 

After a little digging, I found an internal article with the following resolution:

  • Open the Certificates MMC snap-in for the "Computer account".
  • Double-click the certificate in question.
  • Go to the "Details" tab.
  • Scroll down until you find the "Thumbprint" field.
  • Copy the value and paste it into a text editor like Notepad; it typically looks like: fb 5a d6 35 50 84 fd 6c ec ca b8 47 2a 36 94 d6 63 15 d3 be
  • Run: certutil.exe -repairstore My "thumbprint"
  • In the above example, the command would be: certutil.exe -repairstore My "fb 5a d6 35 50 84 fd 6c ec ca b8 47 2a 36 94 d6 63 15 d3 be"
  • Once this is done, on opening the certificate you should see the text "You have a private key that corresponds to this certificate."
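To confirm the repair took effect from the command line, you can inspect the certificate by its thumbprint (a sketch using the example thumbprint from the steps above; the exact output text varies by Windows version):

```shell
REM Inspect the certificate in the computer's personal store. With a working
REM private key, certutil's output should include a key-container reference
REM and an encryption test result rather than "no private key" errors.
certutil.exe -store My "fb 5a d6 35 50 84 fd 6c ec ca b8 47 2a 36 94 d6 63 15 d3 be"
```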

 

After doing this, sure enough, I verified that the certificate in my computer personal store now shows the expected “You have a private key that corresponds to this certificate.”


 

Then I re-imported my trusted root certificate chain, bounced the Health Service on the gateway, and it all worked perfectly.

 

I don’t expect this to be a common issue, but figured it worthy of writing up in case others run into this situation.

A list of all possible security events in the Windows Security Event Log


Update: Automating Run As Account distribution dynamically


 

Just an FYI: I have updated the automatic Run As account distribution script I published, to make it more reliable in large environments, limit resource usage, decrease the chance of a timeout, and add better debug logging.

 

Get the script and read more here:

Automating Run As Account Distribution – Finally!

 

I also published this script in a simple management pack with a rule, which will run the script once a day in your management group.  It targets the All Management Servers Resource Pool so this will have high availability and only run on the single Management Server that is hosting that object.

 

Get the Management Pack here:

https://gallery.technet.microsoft.com/Management-Pack-to-06730af3

How SQL database free space monitoring works in the SQL management pack


 


 

This is based on version 6.6.4.0 of the SQL MP.

 

First – understand the SQL MP discovers the following items:

  • SQL Database
  • SQL DB File Group
  • SQL DB File
  • SQL DB Log File

The Database hosts the DB File Group, which hosts the DB File.

The Database also hosts the DB Log File.

 

Let’s start with free space monitoring of the DB file; this is the lowest level of monitoring.

There are unit monitors that directly target the “SQL Server 2012 DB File” class.

The monitor for space is called: “DB File Space”   (Microsoft.SQLServer.2012.Monitoring.DBFileSpaceMonitor)

 


 

This runs every 15 minutes, and accepts default thresholds of 10% (critical) and 20% (warning).  This monitor does not generate alerts; it simply rolls up state.  The reason is that you can have multiple files in a file group for a DB, and a single full file is not necessarily an issue.

 

Microsoft.SQLServer.2012.Monitoring.DBFileSpaceMonitor uses the Microsoft.SQLServer.2012.DBFileSizeMonitorType.

Microsoft.SQLServer.2012.DBFileSizeMonitorType uses the Microsoft.SQLServer.2012.DBFileSizeRawPerfProvider datasource.

The Microsoft.SQLServer.2012.DBFileSizeRawPerfProvider datasource runs GetSQL2012DBFilesFreeSpace.vbs with the following parameters from the monitor configuration:

    "$Config/ConnectionString$" "$Config/ServerName$" "$Config/SqlInstanceName$" "$Target/Host/Host/Host/Property[Type="SQL!Microsoft.SQLServer.DBEngine"]/TcpPort$"

 

This script checks many configuration settings about the individual DB file – then rolls up a health state after complete.

 

Scenario: Autogrow is enabled

  • If autogrow is enabled for the DB file, the script checks the DB setting for FileMaxSize to be set.
  • If FileMaxSize is set – this is considered the upper limit to threshold against. (unless logical disk size is smaller than FileMaxSize)
  • If FileMaxSize is NOT set (Unlimited) then the logical disk size is considered the upper limit.

Scenario: Autogrow is NOT enabled:

  • If autogrow is not enabled, then the file size is considered the max file size and this value is used for threshold comparison.
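These two scenarios boil down to a small calculation. Here is my own simplified sketch of it in shell (the actual logic lives in GetSQL2012DBFilesFreeSpace.vbs and checks more settings); all sizes are in MB:

```shell
# effective_max <autogrow 0|1> <FileMaxSize, 0 = unlimited> <disk_size> <file_size>
# Returns the upper limit that the free-space threshold is compared against.
effective_max() {
  autogrow=$1; file_max=$2; disk=$3; size=$4
  if [ "$autogrow" -eq 1 ]; then
    if [ "$file_max" -gt 0 ] && [ "$file_max" -lt "$disk" ]; then
      echo "$file_max"   # FileMaxSize is set and fits on the logical disk
    else
      echo "$disk"       # unlimited growth: the logical disk is the ceiling
    fi
  else
    echo "$size"         # no autogrow: current file size is the ceiling
  fi
}

# pct_free <used_space> <effective_max>  -> integer percent of the ceiling still free
pct_free() {
  echo $(( ($2 - $1) * 100 / $2 ))
}

# Example: autogrow on, FileMaxSize 100 MB, 500 MB disk, 95 MB file, 90 MB used:
pct_free 90 "$(effective_max 1 100 500 95)"   # prints 10 - right at the critical threshold
```

Note how the ceiling shifts per scenario: 100 MB when FileMaxSize is set, 500 MB when growth is unlimited, and the current 95 MB file size when autogrow is off.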

 

The DB files will be healthy or unhealthy based on this calculation. Again – no alerts yet.

Next – all the discovered DB file monitors roll their health state up one level to the monitor “DB File Space (rollup)”


 

This is a rollup dependency monitor targeting the file group object, with a “best state” rollup policy, which means that if ANY child DB file has free space, the rollup is healthy.  That makes sense.


 

This monitor DOES generate alerts named “File Group is Running out of space”


 

This monitor rolls up health to “DB File Group Space” monitor.


 

This is an Aggregate monitor with a “Worst state of any member” policy.  It is used for rollup only.


 

This monitor rolls up health to the “DB File Group Space (rollup)” monitor


 

This is a rollup dependency monitor targeting the database object, with a “worst state” rollup policy, which means that if ANY file group is unhealthy, we consider the DB unhealthy.

 

This rolls up to the “DB Space” monitor, which is an Aggregate rollup monitor to roll health to the DB object.


 

 

SUMMARY of DB file monitoring:

  • The ACTUAL space monitoring in the SQL MP is done at the individual DB file level.
  • Alerting is done at the DB File GROUP level based on a “best of” rollup.
  • Everything else is designed to roll the health up correctly from DB file to File Group, and from File Group to Database object.

 

Log file free space monitoring:

This works EXACTLY like DB file space monitoring, except it is less complicated: there is no concept of a “filegroup”, so the log file monitor rolls up to the DB object with a single dependency (rollup) monitor, which is also where the alerts are generated.


 

 

Now, if you DO use autogrow, and you place multiple DB files or log files on the SAME logical disk – the management pack does NOT take that into account, so your individual DB and log file monitors might not trigger because they individually are not above the threshold, but cumulatively they could fill the disk. This is why the Base OS disk free space monitoring is still critical for SQL volumes.  This is documented in the MP guide.

 

 

Alternatives:

IF, for some reason, a customer did not want to discover DB files and file groups, and ONLY wanted total database space calculated, there is a disabled monitor targeting the DB object for the database, and one for the log file.  You could optionally disable the discovery of DB files and file groups, and have a MUCH simpler design (although potentially not quite as actionable).


A customer might take this approach if they have a VERY large SQL environment and want to reduce scale impact by not discovering DB file groups and DB files.  Additionally, this reduces all the performance collection impact which would otherwise be collecting data for all those individual objects.

Another reason to take this approach is if you have a HUGE SQL server with a LOT of databases and DB files.  The number of scripts running on that server could be VERY large and very impactful.  You could selectively disable the discoveries for that server, run Remove-SCOMDisabledClassInstance to clean the instances out of SCOM, and then enable just the simpler database-level monitors.

If you don’t NEED monitoring of individual files and file groups, this approach makes some sense.

MP Update: SCCM 2012 MP version 5.00.8239.1008


 

The System Center Config Mgr 2012 MP has been updated.

Unfortunately, what changed from the previous release is undocumented.

 

We can make some guesses, however from the supported configurations page in the guide:

    Configuration                                                               Support
    System Center 2012 Configuration Manager Service Pack (SP2) CU3 or later    Yes
    System Center 2012 R2 Configuration Manager CU3 or later                    Yes
    System Center Configuration Manager 1602 or later                           Yes
    Configuration Manager 2007                                                  Not supported

 

The previous ConfigMgr 2012 MP supported SCCM 2012 and SCCM 2012 SP1.  It was never updated for SCCM 2012 SP2 or SCCM 2012 R2, but it worked for those versions.

However – this new MP now explicitly states support for:

  • SCCM 2012 SP2 CU3+
  • SCCM 2012 R2 CU3+
  • SCCM build 1602+

Writing a custom class for your network devices


 


 

While there is built in monitoring for network devices in SCOM – there are scenarios where we want to create custom classes for specific network device types.  Perhaps you want to create your own SNMP based polling monitors, and run them against specific network device types, such as a specific firewall brand, or router.

Creating your custom class is quite simple – based on a common System OID that the devices will share.

This concept was documented by Daniele Grandini: https://nocentdocent.wordpress.com/2013/05/21/discovery-identifying-the-device-snmp-mp-chap-2-sysctr-scom/  I am simply taking it a step further by publishing a full example MP for you to work from.
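To find the System OID to key your class on, you can query sysObjectID (1.3.6.1.2.1.1.2.0) from one of the devices. A sketch using the net-snmp command-line tools (the hostname and community string below are placeholders):

```shell
# Query sysObjectID from a device; replace the host and community string.
snmpget -v 2c -c public firewall1.example.com 1.3.6.1.2.1.1.2.0

# SCOM stores this same value on the Node class as the SystemObjectID
# property, which is what the discovery's filter expression below
# compares against $Config/OID$.
```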

 

In the first step – we need to define the MP manifest, and add a reference to the System.NetworkManagement.Library since we will be targeting the “Node” class from that MP:

 

<Manifest>
  <Identity>
    <ID>Example.Network</ID>
    <Version>1.0.0.2</Version>
  </Identity>
  <Name>Example Network</Name>
  <References>
    <Reference Alias="Network">
      <ID>System.NetworkManagement.Library</ID>
      <Version>7.1.10226.0</Version>
      <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
    </Reference>
    <Reference Alias="Windows">
      <ID>Microsoft.Windows.Library</ID>
      <Version>6.0.4837.0</Version>
      <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
    </Reference>
    <Reference Alias="System">
      <ID>System.Library</ID>
      <Version>6.0.4837.0</Version>
      <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
    </Reference>
    <Reference Alias="SC">
      <ID>Microsoft.SystemCenter.Library</ID>
      <Version>6.0.4837.0</Version>
      <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
    </Reference>
    <Reference Alias="Health">
      <ID>System.Health.Library</ID>
      <Version>6.0.4837.0</Version>
      <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
    </Reference>
  </References>
</Manifest>

 

Next, we will define our class.  We will use Node as the Base Class.

 

<TypeDefinitions>
  <EntityTypes>
    <ClassTypes>
      <ClassType ID="Example.Network.Device" Accessibility="Public" Abstract="false" Base="Network!System.NetworkManagement.Node" Hosted="false" Singleton="false" Extension="false" />
    </ClassTypes>
  </EntityTypes>

 

Then – a datasource module that will be used for each discovery for a unique device type.  We try to make datasource modules reusable – and have each workflow simply pass the necessary items instead of hard coding them:

 

<ModuleTypes>
  <DataSourceModuleType ID="Example.Network.Device.Discovery.DS" Accessibility="Internal" Batching="false">
    <Configuration>
      <xsd:element minOccurs="1" name="IntervalSeconds" type="xsd:integer" />
      <xsd:element minOccurs="0" name="SyncTime" type="xsd:string" />
      <xsd:element name="OID" type="xsd:string" />
      <xsd:element name="DisplayName" type="xsd:string" />
      <xsd:element name="Model" type="xsd:string" />
      <xsd:element name="Vendor" type="xsd:string" />
    </Configuration>
    <OverrideableParameters>
      <OverrideableParameter ID="IntervalSeconds" ParameterType="int" Selector="$Config/IntervalSeconds$"/>
      <OverrideableParameter ID="SyncTime" ParameterType="string" Selector="$Config/SyncTime$"/>
    </OverrideableParameters>
    <ModuleImplementation Isolation="Any">
      <Composite>
        <MemberModules>
          <DataSource ID="Scheduler" TypeID="System!System.Discovery.Scheduler">
            <Scheduler>
              <SimpleReccuringSchedule>
                <Interval>$Config/IntervalSeconds$</Interval>
                <SyncTime>$Config/SyncTime$</SyncTime>
              </SimpleReccuringSchedule>
              <ExcludeDates />
            </Scheduler>
          </DataSource>
          <ConditionDetection ID="MapToDiscovery" TypeID="System!System.Discovery.FilteredClassSnapshotDataMapper">
            <Expression>
              <SimpleExpression>
                <ValueExpression>
                  <Value>$Target/Property[Type="Network!System.NetworkManagement.Node"]/SystemObjectID$</Value>
                </ValueExpression>
                <Operator>Equal</Operator>
                <ValueExpression>
                  <Value Type="String">$Config/OID$</Value>
                </ValueExpression>
              </SimpleExpression>
            </Expression>
            <ClassId>$MPElement[Name='Example.Network.Device']$</ClassId>
            <InstanceSettings>
              <Settings>
                <Setting>
                  <Name>$MPElement[Name='System!System.Entity']/DisplayName$</Name>
                  <Value>$Config/DisplayName$</Value>
                </Setting>
                <Setting>
                  <Name>$MPElement[Name='Network!System.NetworkManagement.Node']/DeviceKey$</Name>
                  <Value>$Target/Property[Type="Network!System.NetworkManagement.Node"]/DeviceKey$</Value>
                </Setting>
                <Setting>
                  <Name>$MPElement[Name='Network!System.NetworkManagement.Node']/Model$</Name>
                  <Value>$Config/Model$</Value>
                </Setting>
                <Setting>
                  <Name>$MPElement[Name='Network!System.NetworkManagement.Node']/Vendor$</Name>
                  <Value>$Config/Vendor$</Value>
                </Setting>
              </Settings>
            </InstanceSettings>
          </ConditionDetection>
        </MemberModules>
        <Composition>
          <Node ID="MapToDiscovery">
            <Node ID="Scheduler" />
          </Node>
        </Composition>
      </Composite>
    </ModuleImplementation>
    <OutputType>System!System.Discovery.Data</OutputType>
  </DataSourceModuleType>
</ModuleTypes>

 

The above datasource is probably the most complicated part of this.  We are creating a composite DS, combining the System.Discovery.Scheduler module, with the System.Discovery.FilteredClassSnapshotDataMapper module.

The scheduler is simple – we pass in the interval.

The System.Discovery.FilteredClassSnapshotDataMapper is more complicated – it basically allows you to create a filtered discovery of existing objects, based on an expression matching on a class property.  In this case, if the System OID equals the specific OID we pass in from the discovery, it is a match and we will create an instance of the class.  Since all your desired network devices will share a common System OID, this is the perfect property to match on.

In this DS, I also included the ability to pass the Model and Vendor – you can inherit whatever is present from the Node property if the discovered network device is CERTIFIED, or provide your own custom ones in the discovery, if GENERIC.

 

Last – we define our discovery.

 

<Discoveries>
  <Discovery ID="Example.Network.Device.Discovery" Enabled="true" ConfirmDelivery="false" Remotable="true" Priority="Normal" Target="Network!System.NetworkManagement.Node">
    <Category>Discovery</Category>
    <DiscoveryTypes>
      <DiscoveryClass TypeID="Example.Network.Device" />
    </DiscoveryTypes>
    <DataSource ID="DS" TypeID="Example.Network.Device.Discovery.DS">
      <IntervalSeconds>14400</IntervalSeconds>
      <SyncTime />
      <OID>.1.3.6.1.4.1.8072.3.2.10</OID>
      <DisplayName>$Target/Property[Type="Network!System.NetworkManagement.Node"]/sysName$</DisplayName>
      <Model>$Target/Property[Type="Network!System.NetworkManagement.Node"]/Model$</Model>
      <Vendor>$Target/Property[Type="Network!System.NetworkManagement.Node"]/Vendor$</Vendor>
    </DataSource>
  </Discovery>
</Discoveries>

 

The discovery is simple – it calls the datasource module and passes in the necessary parameters.  Each discovery will specify the class type we are trying to discover and a System OID for the device type, map the existing display name, and then include the model and vendor.  You can hard code the model and vendor as text in each discovery if desired.  The OID in my example is for a Linux system; you will need to change this.

You should add multiple discoveries for each different class type you want to create.  These can be placed in unique MP’s for each network device type, or combine them all into one MP, up to you.
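As a sketch of what a second discovery could look like – here is a hypothetical one for a Cisco router class.  The class ID and OID below are illustrative only (substitute the System OID your devices actually report), and this assumes you have also defined a corresponding Example.Network.CiscoRouter class type as shown earlier:

```xml
<Discovery ID="Example.Network.CiscoRouter.Discovery" Enabled="true" ConfirmDelivery="false" Remotable="true" Priority="Normal" Target="Network!System.NetworkManagement.Node">
  <Category>Discovery</Category>
  <DiscoveryTypes>
    <DiscoveryClass TypeID="Example.Network.CiscoRouter" />
  </DiscoveryTypes>
  <DataSource ID="DS" TypeID="Example.Network.Device.Discovery.DS">
    <IntervalSeconds>14400</IntervalSeconds>
    <SyncTime />
    <!-- Illustrative System OID only - replace with the OID your devices report -->
    <OID>.1.3.6.1.4.1.9.1.1045</OID>
    <DisplayName>$Target/Property[Type="Network!System.NetworkManagement.Node"]/sysName$</DisplayName>
    <!-- Model and Vendor hard coded as text this time, instead of mapped from Node -->
    <Model>Example Router Model</Model>
    <Vendor>Cisco</Vendor>
  </DataSource>
</Discovery>
```

Because the datasource module is reusable, each additional device type only requires a new class type and a new discovery like this one.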

 

You can download a copy of the entire example mp here:

https://gallery.technet.microsoft.com/SCOM-Custom-Network-device-b2b16959

SQL MP Run As Accounts – NO LONGER REQUIRED


 

image             image

 

Over the years I have written many articles dealing with RunAs accounts.  Specifically, the most common need is for monitoring with the SQL MP.  I have explained the issues and configurations in detail here:  Configuring Run As Accounts and Profiles in OpsMgr – A SQL Management Pack Example

 

Later, I wrote an automation solution to script the biggest pain point of RunAs accounts:  distributing them, here:  Automating Run As Account Distribution – Finally!  Then – took it a step further, and built this automation into a management pack here:  Update-  Automating Run As Account distribution dynamically

 

Now – I want to show a different approach to configuring monitoring for the SQL MP, which might make life a lot simpler for SCOM admins, and SQL teams.

 

What if I told you – there was a way to not have to mess with RunAs accounts and the SQL MP at all?  No creating the accounts, no distributing them, no associating them with the profiles – none of that?    Interested?   Then read on.

 

The big challenge in SQL monitoring is that the SCOM agent runs as LocalSystem for the default agent action account.  However, LocalSystem does not have full rights to SQL server, and should not ever be granted the SysAdmin role in SQL.  This is because the LocalSystem account is quite easy to impersonate to anyone who already has admin rights to the OS.

We can solve this challenge, by introducing Service SID’s.  SQL already uses Service Security Identifiers (SID’s) to grant access for the service running SQL server, to the SQL instance.  You can read more about that here:  https://support.microsoft.com/en-us/kb/2620201

Service SID’s were introduced in Windows Server 2008 and later.

 

We can do the same thing for the SCOM Healthservice.  This idea was brought to me by a fellow MS consultant – Ralph Kyttle.  He pointed out, this is exactly how OMS works to gather data about SQL server.  We have an article describing this recommended configuration here:  https://support.microsoft.com/en-us/kb/2667175

 

Essentially – this can be accomplished in two steps:

  1. Enable the HealthService to be able to use a service SID.
  2. Create a login for the HealthService SID to be able to access SQL server.

 

That’s it!
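For reference – outside of my management pack, the two steps above can also be performed manually on each monitored SQL server.  This is a sketch based on the KB article referenced above; it assumes the default agent service name (HealthService) and SQL Server 2012 or later (on older versions, use sp_addsrvrolemember instead of ALTER SERVER ROLE):

```powershell
# Step 1: Enable the service SID on the SCOM agent HealthService,
# then restart the agent so the SID is present in its token:
sc.exe sidtype HealthService unrestricted
Restart-Service HealthService

# Step 2: Create the login in SQL and grant the SysAdmin role.
# T-SQL shown as a here-string for clarity:
$sql = @"
CREATE LOGIN [NT SERVICE\HealthService] FROM WINDOWS;
ALTER SERVER ROLE [sysadmin] ADD MEMBER [NT SERVICE\HealthService];
"@
# Run $sql in SQL Management Studio, or via Invoke-Sqlcmd if that module is available:
# Invoke-Sqlcmd -ServerInstance localhost -Query $sql
```

My management pack simply automates these same two steps via the monitor recovery and tasks described below.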

This creates a login in SQL, and allows the SCOM agent to be able to monitor SQL server, without having to maintain another credential, deal with password changes, and removes the security concern of a compromised RunAs account being able to access every SQL server in the company!  No more configuration, no more credential distribution.

 

I even wrote a Management Pack to make setting this initial configuration up much simpler.  Let me demonstrate:

 

First, we need to ensure that all SCOM agents, where SQL is discovered – have the service SID enabled.  I wrote a monitor to detect when this is not configured, and targeted the SQL SEED classes:

image

 

This monitor will show a warning state when the Service SID is not configured, and will generate a warning alert:

 

image

 

The monitor has a script recovery action, which is disabled by default.  You can enable this and it will automatically configure this as soon as SQL is detected, and will restart the agent.

 

image

 

Alternatively – I wrote two tasks you can run – the second one configures the service SID, but will wait for the next reboot (or service restart) before this actually becomes active.  The first task configures the service AND then restarts the agent Healthservice:

 

image

 

Here is what it looks like in action:

 

image

 

So – once that is complete – we can create the login for SQL.

If you switch to the SQL instances view, or a Database Engine view – you will see a new task show up which will create a SQL login for the HealthService.

 

image

 

If you run this task, and don’t have rights to the SQL server – you will get this:

 

image

 

Have your SQL team run the task and provide a credential to the task that will be able to create a login and assign the necessary SysAdmin role to the service:

 

image

 

Voila!

 

image

 

What this actually does – is create this login on the SQL server and set it to SysAdmin role:

 

image

 

All of these activities are logged for audit in the Task Status view:

 

image

 

Now – as new SQL servers are added over time – the Service SID can automatically be configured using the recovery, and the SQL team will just need to add the HealthService login as part of their build configuration, or run this task one time for each new SQL server to enable it for monitoring.

 

I find this to be much simpler than dealing with RunAs accounts, and it appears to be a more secure solution as well.  I welcome any feedback on this approach, or for my Management Pack Addendum.

 

I have included my SQL RunAs Addendum MP’s to be available below:

 

https://gallery.technet.microsoft.com/SQL-Server-RunAs-Addendum-0c183c32


UR9 for SCOM 2012 R2 – Step by Step


image48

 

This is an updated article replacing the original – to include the deployment of the Linux MP’s which shipped later.  Since Microsoft changed blog platforms over to WordPress – it will not allow me to update the previous one.

 

NOTE:  I get this question every time we release an update rollup:   ALL SCOM Update Rollups are CUMULATIVE.  This means you do not need to apply them in order; you can always just apply the latest update.  If you have deployed SCOM 2012 R2 and never applied an update rollup – you can go straight to the latest one available.  If you applied an older one (such as UR3) you can always go straight to the latest one!

 

 

KB Article for OpsMgr:  https://support.microsoft.com/en-us/kb/3129774

KB article for Linux updates:  https://support.microsoft.com/en-us/kb/3141435

Download catalog site:  http://catalog.update.microsoft.com/v7/site/Search.aspx?q=3129774

 

Key fixes:

  • SharePoint workflows fail with an access violation under APM
    A certain sequence of the events may trigger an access violation in APM code when it tries to read data from the cache during the Application Domain unload. This fix resolves this kind of behavior.
  • Application Pool worker process crashes under APM with heap corruption
    During the Application Domain unload two threads might try to dispose of the same memory block leading to DOUBLE FREE heap corruption. This fix makes sure that memory is disposed of only one time.
  • Some Application Pool worker processes become unresponsive if many applications are started under APM at the same time
    Microsoft Monitoring Agent APM service has a critical section around WMI queries it performs. If a WMI query takes a long time to complete, many worker processes are waiting for the active one to complete the call. Those application pools may become unresponsive, depending on the wait duration. This fix eliminates the need for the WMI query and significantly improves the performance of this code path.
  • MOMAgent cannot validate RunAs Account if only RODC is available
    If only a read-only domain controller (RODC) is available, the MOMAgent cannot validate the RunAs account. This fix resolves this issue.
  • Missing event monitor does not warn within the specified time range in SCOM 2012 R2 the first time after restart
    When you create a monitor for a missed event, the first alert takes twice the amount of time specified in the monitor. This fix resolves the issue, and the alert is generated in the time specified.
  • SCOM cannot verify the User Account / Password expiration date if it is set by using Password Setting object
    Fine grained password policies are stored in a different container from the user object container in Active Directory. This fix resolves the problems in computing resultant set of policy (RSOP) from these containers for a user object.
  • SLO Detail report displays histogram incorrectly
    In some specific scenarios, the representation of the downtime graph is not displayed correctly. This fix resolves this kind of behavior.
  • APM support for IIS 10 and Windows Server 2016
    Support of IIS 10 on Windows Server 2016 is added for the APM feature in System Center 2012 R2 Operations Manager. An additional management pack Microsoft.SystemCenter.Apm.Web.IIS10.mp is required to enable this functionality. This management pack is located in %SystemDrive%\Program Files\System Center 2012 R2\Operations Manager\Server\Management Packs for Update Rollups alongside its dependencies after the installation of Update Rollup 9.
    Important Note One dependency is not included in Update Rollup 9 and should be downloaded separately:

    Microsoft.Windows.InternetInformationServices.2016.mp

  • APM Agent Modules workflow fail during workflow shutdown with Null Reference Exception
    The Dispose() method of Retry Manager of APM connection workflow is executed two times during the module shutdown. The second try to execute this Dispose() method may cause a Null Reference Exception. This fix makes sure that the Dispose() method can be safely executed one or more times.
  • AEM Data fills up SCOM Operational database and is never groomed out
    If you use SCOM’s Agentless Exception Monitoring to examine application crash data and report on it, the data never grooms out of the SCOM Operational database. The problem with this is that soon the SCOM environment will be overloaded with all the instances and relationships of the applications, error groups, and Windows-based computers, all of which are hosted by the management servers. This fix resolves this issue. Additionally, the following management packs must be imported in the following order:
    • Microsoft.SystemCenter.ClientMonitoring.Library.mp
    • Microsoft.SystemCenter.DataWarehouse.Report.Library.mp
    • Microsoft.SystemCenter.ClientMonitoring.Views.Internal.mp
    • Microsoft.SystemCenter.ClientMonitoring.Internal.mp
  • The DownTime report from the Availability report does not handle the Business Hours settings
    In the downtime report, the downtime table was not considering the business hours. This fix resolves this issue and business hours will be shown based on the specified business hour values.
    The updated RDL files are located in the following location:

    %SystemDrive%\Program Files\Microsoft System Center 2012 R2\Operations Manager\Server\Reporting

    To update the RDL file, follow these steps:

    1. Go to http://MachineName/Reports_INSTANCE1/Pages/Folder.aspx, where MachineName is your Reporting Server.
    2. On this page, go to the folder to which you want to add the RDL file. In this case, click Microsoft.SystemCenter.DataWarehouse.Report.Library.
    3. Upload the new RDL files by clicking the upload button at the top. For more information, see https://msdn.microsoft.com/en-us/library/ms157332.aspx.
  • Adding a decimal sign in an SLT Collection Rule SLO in the ENU Console on a non-ENU OS does not work
    You run the System Center 2012 R2 Operations Manager Console in English on a computer that has the language settings configured to use a non-English (United States) language that uses a comma (,) as the decimal sign instead of a period (.). When you try to create Service Level Tracking, and you want to add a Collection Rule SLO, the value you enter as the threshold cannot be configured by using a decimal sign. This fix resolves the issue.
  • SCOM Agent issue while logging Operations Management Suite (OMS) communication failure
    An issue occurs when OMS communication failures are logged. This fix resolves this issue.

 

Issues that are fixed in the UNIX and Linux management packs

 

  • Discovery of Linux computers may fail for some system locales
    Using the Discovery Wizard or Windows PowerShell cmdlets to discover Linux computers may fail during the final Agent Verification step for computers that have some system locales, such as zh_TW.UTF-8. The scxadmin command that is used to restart the agent during the discovery process did not correctly handle Unicode text in the standard output of the service command.
  • The UNIX/Linux Agent intermittently closes connections during TLS handshaking
    Symptoms include the following:
    • Failed heartbeats for UNIX or Linux computers, especially when the SSLv3 protocol is disabled on the Management Servers.
    • Schannel errors in the System log that contain text that resembles the following:

      A fatal error occurred while creating an SSL client credentials. The internal error state is 10013.

    • WS-Management errors in the event log that contain text that resembles the following:

      WSManFault
      Message = The server certificate on the destination computer (<UNIX/LINUX-COMPUTER-NAME>) has the following errors:
      Encountered an internal error in the SSL library.
      Error number: -2147012721 0x80072F8F
      A security error occurred

 

 

Lets get started.

From reading the KB article – the order of operations is:

  1. Install the update rollup package on the following server infrastructure:
    • Management servers
    • Gateway servers
    • Web console server role computers
    • Operations console role computers
  2. Apply SQL scripts.
  3. Manually import the management packs.
  4. Update Agents

Now, NORMALLY we need to add another step – if we are using Xplat monitoring – we need to update the Linux/UNIX MP’s and agents.  The UR8 and UR9 rollups for SCOM 2012 R2 did not include Linux updates themselves; the updated Linux MP’s shipped shortly after UR9 and are covered in step 5 below.

 

 

 

1.  Management Servers

image

Since there is no RMS anymore, it doesn’t matter which management server I start with.  There is no need to begin with the server that holds the RMS Emulator (RMSe) role.  I simply make sure I only patch one management server at a time to allow for agent failover without overloading any single management server.

I can apply this update manually via the MSP files, or I can use Windows Update.  I have 3 management servers, so I will demonstrate both.  I will do the first management server manually.  This management server holds 3 roles, and each must be patched:  Management Server, Web Console, and Console.

The first thing I do when I download the updates from the catalog, is copy the cab files for my language to a single location:

Then extract the contents:

image

Once I have the MSP files, I am ready to start applying the update to each server by role.

***Note:  You MUST log on to each server role as a Local Administrator, SCOM Admin, AND your account must also have System Administrator (SA) role to the database instances that host your OpsMgr databases.

My first server is a management server, and the web console, and has the OpsMgr console installed, so I copy those update files locally, and execute them per the KB, from an elevated command prompt:

image

This launches a quick UI which applies the update.  It will bounce the SCOM services as well.  The update usually does not provide any feedback that it had success or failure. 

I got a prompt to restart:

image

I choose yes and allow the server to restart to complete the update.

 

You can check the application log for the MsiInstaller events to show completion:

Log Name:      Application
Source:        MsiInstaller
Date:          1/27/2016 9:37:28 AM
Event ID:      1036
Description:
Windows Installer installed an update. Product Name: System Center Operations Manager 2012 Server. Product Version: 7.1.10226.0. Product Language: 1033. Manufacturer: Microsoft Corporation. Update Name: System Center 2012 R2 Operations Manager UR9 Update Patch. Installation success or error status: 0.

You can also spot check a couple DLL files for the file version attribute. 

image

Next up – run the Web Console update:

image

This runs much faster.   A quick file spot check:

image

Lastly – install the console update (make sure your console is closed):

image

A quick file spot check:

image

 

 

Additional Management Servers:

image

I now move on to my additional management servers, applying the server update, then the console update and web console update where applicable.

On this next management server, I will use the example of Windows Update as opposed to manually installing the MSP files.  I check online, and make sure that I have configured Windows Update to give me updates for additional products: 

image

The applicable updates show up under optional – so I tick the boxes and apply these updates.

After a reboot – go back and verify the update was a success by spot checking some file versions like we did above.

 

 

Updating Gateways:

image

I can use Windows Update or manual installation.

image

The update launches a UI and quickly finishes.

Then I will spot check the DLL’s:

image

I can also spot-check the \AgentManagement folder, and make sure my agent update files are dropped here correctly:

image

 

***NOTE:  You can delete any older UR update files from the \AgentManagement directories.  The UR’s do not clean these up and they provide no purpose for being present any longer.

 

 

 

2. Apply the SQL Scripts

In the path on your management servers, where you installed/extracted the update, there are two SQL script files: 

%SystemDrive%\Program Files\Microsoft System Center 2012 R2\Operations Manager\Server\SQL Script for Update Rollups

(note – your path may vary slightly depending on whether you have an upgraded environment or a clean install)

image

First – let’s run the script to update the OperationsManager database.  Open a SQL management studio query window, connect it to your Operations Manager database, and then open the script file.  Make sure it is pointing to your OperationsManager database, then execute the script.

You should run this script with each UR, even if you ran it for a previous UR.  The script body can change, so as a best practice, always re-run it.

image

Click the “Execute” button in SQL mgmt. studio.  The execution could take a considerable amount of time and you might see a spike in processor utilization on your SQL database server during this operation.  I have had customers state this takes from a few minutes to as long as an hour. In MOST cases – you will need to shut down the SDK, Config, and Monitoring Agent (healthservice) on ALL your management servers in order for this to be able to run with success.

You will see the following (or similar) output:

image47

or

image

IF YOU GET AN ERROR – STOP!  Do not continue.  Try re-running the script several times until it completes without errors.  In a production environment, you almost certainly have to shut down the services (sdk, config, and healthservice) on your management servers, to break their connection to the databases, to get a successful run.

Technical tidbit:   Even if you previously ran this script in UR1, UR2, UR3, UR4, UR5, UR6, UR7, or UR8, you should run this again for UR9, as the script body can change with updated UR’s.

image

Next, we have a script to run against the warehouse DB.  Do not skip this step under any circumstances.    From:

%SystemDrive%\Program Files\Microsoft System Center 2012 R2\Operations Manager\Server\SQL Script for Update Rollups

(note – your path may vary slightly depending on whether you have an upgraded environment or a clean install)

Open a SQL management studio query window, connect it to your OperationsManagerDW database, and then open the script file UR_Datawarehouse.sql.  Make sure it is pointing to your OperationsManagerDW database, then execute the script.

If you see a warning about line endings, choose Yes to continue.

image

Click the “Execute” button in SQL mgmt. studio.  The execution could take a considerable amount of time and you might see a spike in processor utilization on your SQL database server during this operation.

You will see the following (or similar) output:

image

 

 

 

3. Manually import the management packs

image

There are 55 management packs in this update!   Most of these we don’t need – so read carefully.

The path for these is on your management server, after you have installed the “Server” update:

\Program Files\Microsoft System Center 2012 R2\Operations Manager\Server\Management Packs for Update Rollups

However, the majority of them are Advisor/OMS, and language specific.  Only import the ones you need, and that are correct for your language.  I will remove all the MP’s for other languages (keeping only ENU), and I am left with the following:

image

 

What NOT to import:

The Advisor MP’s are only needed if you are using the Microsoft Operations Management Suite cloud service (previously known as Advisor, and Operational Insights).

The APM MP’s are only needed if you are using the APM feature in SCOM.

Note the APM MP with a red X.  This MP requires the IIS MP’s for Windows Server 2016 which are in Technical Preview at the time of this writing.  Only import this if you are using APM *and* you need to monitor Windows Server 2016.  If so, you will need to download and install the technical preview editions of that MP from https://www.microsoft.com/en-us/download/details.aspx?id=48256

The TFS MP bundle is only used for specific scenarios, such as DevOps scenarios where you have integrated APM with TFS, etc.  If you are not currently using these MP’s, there is no need to import or update them.  I’d skip this MP import unless you already have these MP’s present in your environment.

However, the Image and Visualization libraries deal with Dashboard updates, and these always need to be updated.

I import all of these shown without issue.

 

 

4.  Update Agents

image43_thumb

Agents should be placed into pending actions by this update for any agent that was not manually installed (remotely manageable = yes):  

 

On the management servers where I used Windows Update to patch, their agents did not show up in this list.  Only agents reporting to a management server I patched manually showed up in this list.  FYI – the experience is NOT the same when using Windows Update vs. manual installation.  If yours don’t show up – you can try running the update for that management server again – manually.

image

 

If your agents are not placed into pending management – this is generally caused by not running the update from an elevated command prompt, or having manually installed agents which will not be placed into pending.

In this case – my agents that were reporting to a management server that was updated using Windows Update – did NOT place agents into pending.  Only the agents reporting to the management server for which I manually executed the patch worked.

I manually re-ran the server MSP file on these management servers, from an elevated command prompt, and they all showed up:

 

 image

 

You can approve these – which will result in a success message once complete:

 

 image

 

Soon you should start to see PatchList getting filled in from the Agents By Version view under Operations Manager monitoring folder in the console:

 

image

 

 

  5.  Update Unix/Linux MPs and Agents

image

The current Linux MP’s can be downloaded from:

https://www.microsoft.com/en-us/download/details.aspx?id=29696

 

7.5.1050.0 is current at this time for SCOM 2012 R2 and these shipped shortly after UR9. 

****Note – take GREAT care when downloading – that you select the correct download for SCOM 2012 R2.  You must scroll down in the list and select the MSI for 2012 R2:

 

 

image

 

Download the MSI and run it.  It will extract the MP’s to C:\Program Files (x86)\System Center Management Packs\System Center 2012 R2 Management Packs for Unix and Linux\

Update any MP’s you are already using.   These are mine for RHEL, SUSE, and the Universal Linux libraries. 

 

image

 

You will likely observe VERY high CPU utilization of your management servers and database server during and immediately following these MP imports.  Give it plenty of time to complete the process of the import and MPB deployments.

 

Next – you need to restart the “Microsoft Monitoring Agent” service on any management servers which manage Linux systems.  I don’t know why – but my MP’s never drop/update in the \Program Files\Microsoft System Center 2012 R2\Operations Manager\Server\AgentManagement\UnixAgents\DownloadedKits folder until this service is restarted.
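A quick way to do that, assuming the default service naming (the Microsoft Monitoring Agent runs as the HealthService service):

```powershell
# Restart the Microsoft Monitoring Agent on this management server
# so the updated UNIX/Linux agent kits get dropped into the DownloadedKits folder:
Restart-Service -Name HealthService
```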

 

Next up – you would upgrade your agents on the Unix/Linux monitored agents.  You can now do this straight from the console:

image 

image

 

You can input credentials or use existing RunAs accounts if those have enough rights to perform this action.

Finally:

 

image

 

 

6.  Update the remaining deployed consoles

image

This is an important step.  I have consoles deployed around my infrastructure – on my Orchestrator server, SCVMM server, on my personal workstation, on all the other SCOM admins on my team, on a Terminal Server we use as a tools machine, etc.  These should all get the matching update version.

 

 

 

Review:

Now at this point, we would check the OpsMgr event logs on our management servers, check for any new or strange alerts coming in, and ensure that there are no issues after the update.

image

Known issues:

See the existing list of known issues documented in the KB article.

1.  Many people are reporting that the SQL script is failing to complete when executed.  You should attempt to run this multiple times until it completes without error.  You might need to stop the Exchange correlation engine, stop all the SCOM services on the management servers, and/or bounce the SQL server services in order to get a successful completion in a busy management group.  The errors reported appear as below:

——————————————————
(1 row(s) affected)
(1 row(s) affected)
Msg 1205, Level 13, State 56, Line 1
Transaction (Process ID 152) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
Msg 3727, Level 16, State 0, Line 1
Could not drop constraint. See previous errors.
——————————————————–
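The advice above – just re-run the script until the deadlock victim error goes away – is a plain retry loop. As an illustration only (this is my sketch, not part of the official update process), the pattern looks like this:

```python
import time

def run_with_retry(action, attempts=5, delay_seconds=0):
    """Re-run an action (e.g. executing the UR SQL script) until it
    succeeds, or give up after the configured number of attempts."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except RuntimeError:  # stand-in for a SQL deadlock error
            if attempt == attempts:
                raise
            time.sleep(delay_seconds)

# Demo: an action that "deadlocks" twice, then succeeds on the third try.
state = {"calls": 0}
def flaky_script():
    state["calls"] += 1
    if state["calls"] < 3:
        raise RuntimeError("Transaction was deadlocked... Rerun the transaction.")
    return "completed"

print(run_with_retry(flaky_script))  # completed
```

In a busy management group, stopping the SCOM services (and the Exchange correlation engine) before each retry reduces the contention that causes the deadlock in the first place.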

You probably have a ton of old event data in your Data Warehouse


 

image

 

Prior to SCOM 2012 R2 UR7, we had an issue where we did not groom out old data from the Event Parameter and Event Rule tables in the DW.  This will show up as these tables growing quite large, especially the event parameter tables.  They will never groom out the old, orphaned data.

It isn’t a big deal, but if you’d like to free up some space in your Data Warehouse database – read on.

 

I’ll just go out and say that ANYONE who ever ran a SCOM management group prior to SCOM 2012 R2 UR7, is affected.  How much just depends on how many events you were collecting and shoving into your DW.

Once you apply UR7 or later, this issue stops, and the normal grooming will groom out the data as events get groomed.  HOWEVER – we will never go back and clean out the old, already orphaned event parameters and event rules.

 

Nicole was the first person I saw write about this issue:

https://blogs.msdn.microsoft.com/nicole_welch/2016/01/07/scom-2012-large-event-parameter-tables/

 

Essentially – to know if you are affected, there are some SQL statements you can run… but I wrote my own.  These take a long time to run – but they give you an idea of how many events are in scope to be groomed.

 

SELECT count(*) FROM event.vEventParameter ep
WHERE ep.EventOriginId NOT IN (SELECT DISTINCT EventOriginId FROM event.vEvent)

SELECT count(*) FROM event.vEventRule er
WHERE er.EventOriginId NOT IN (SELECT DISTINCT EventOriginId FROM event.vEvent)
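Those queries simply count child rows whose parent event is gone – the NOT IN shape is the whole trick. A small demonstration of the same shape against an in-memory SQLite database (table and column names here are simplified stand-ins for the DW views, not the real schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Simplified stand-ins for event.vEvent and event.vEventParameter
cur.execute("CREATE TABLE vEvent (EventOriginId INTEGER)")
cur.execute("CREATE TABLE vEventParameter (EventOriginId INTEGER, ParameterValue TEXT)")

cur.executemany("INSERT INTO vEvent VALUES (?)", [(1,), (2,)])
cur.executemany("INSERT INTO vEventParameter VALUES (?, ?)",
                [(1, "a"), (2, "b"), (3, "orphan"), (4, "orphan")])

# Same shape as the DW query: parameters whose parent event no longer exists
orphan_count = cur.execute(
    "SELECT COUNT(*) FROM vEventParameter "
    "WHERE EventOriginId NOT IN (SELECT DISTINCT EventOriginId FROM vEvent)"
).fetchone()[0]

print(orphan_count)  # 2
```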

 

 

Nicole has a stored procedure listed on her site – run that to create the stored proc, then call it with a “max rows to groom” parameter.   It works well and I recommend it.

 

Alternatively – you can just run this as a straight SQL query.  I will post that below:

I set MaxRowsToGroom hard coded to 1,000,000 rows.  I found this runs pretty quick and doesn’t use a lot of transaction log space.  You can adjust this depending on how much cleanup you need to do if you prefer the query approach, or just use the stored proc and the loop command in the blog post linked above.

 

UPDATE 5/26:  I changed the script to allow for customers who have multiple Event Parameter tables and were running into the error:

Msg 512, Level 16, State 1, Line 7

Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.

 

I don’t have a DB with multiple Event Param tables – however I simply dumped the TableGuids to a temp table and looped through each row in the temp table.  This should work.

 

ALWAYS TAKE A BACKUP OF YOUR DATABASE FIRST!

 

DECLARE @MaxRowsToGroom int
       ,@RowsDeleted int

SET NOCOUNT ON;
SET @MaxRowsToGroom = 1000000

DECLARE @RuleTableName sysname
       ,@DetailTableName sysname
       ,@ParamTableName sysname
       ,@DatasetId uniqueidentifier = (SELECT DatasetId FROM StandardDataset WHERE SchemaName = 'Event')
       ,@TableGuid uniqueidentifier
       ,@Statement nvarchar(max)
       ,@SchemaName sysname = 'Event'

IF OBJECT_ID('tempdb..#Tables') IS NOT NULL
  DROP TABLE #Tables

SELECT RowNum = ROW_NUMBER() OVER(ORDER BY TableGuid)
      ,TableGuid
INTO #Tables
FROM StandardDatasetTableMap
WHERE DatasetId = (SELECT DatasetId FROM StandardDataset WHERE SchemaName = 'Event')

DECLARE @MaxRownum int
SET @MaxRownum = (SELECT MAX(RowNum) FROM #Tables)

DECLARE @Iter int
SET @Iter = (SELECT MIN(RowNum) FROM #Tables)

WHILE @Iter <= @MaxRownum
BEGIN
  SET @TableGuid = (SELECT TableGuid FROM #Tables WHERE RowNum = @Iter)

  BEGIN TRAN

  SELECT TOP 1 @RuleTableName = BaseTableName + '_' + REPLACE(CAST(@TableGuid AS varchar(50)), '-', '')
  FROM StandardDatasetAggregationStorage
  WHERE (DatasetId = @DatasetId)
    AND (AggregationTypeId = 0)
    AND (DependentTableInd = 1)
    AND (TableTag = 'Rule')

  SET @Statement = 'DELETE TOP (' + CAST(@MaxRowsToGroom AS varchar(15)) + ')'
                 + ' FROM ' + QUOTENAME(@SchemaName) + '.' + QUOTENAME(@RuleTableName)
                 + ' WHERE (EventOriginId NOT IN (SELECT EventOriginId FROM Event.vEvent)) '
  EXECUTE (@Statement)

  SELECT TOP 1 @ParamTableName = BaseTableName + '_' + REPLACE(CAST(@TableGuid AS varchar(50)), '-', '')
  FROM StandardDatasetAggregationStorage
  WHERE (DatasetId = @DatasetId)
    AND (AggregationTypeId = 0)
    AND (DependentTableInd = 1)
    AND (TableTag = 'Parameter')

  SET @Statement = 'DELETE TOP (' + CAST(@MaxRowsToGroom AS varchar(15)) + ')'
                 + ' FROM ' + QUOTENAME(@SchemaName) + '.' + QUOTENAME(@ParamTableName)
                 + ' WHERE (EventOriginId NOT IN (SELECT EventOriginId FROM Event.vEvent)) '
  EXECUTE (@Statement)

  SET @RowsDeleted = @@ROWCOUNT

  COMMIT

  SET @Iter = @Iter + 1
  SELECT @RowsDeleted AS RowsDeleted
END

IF OBJECT_ID('tempdb..#Tables') IS NOT NULL
  DROP TABLE #Tables
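The key design choice in that script – deleting at most @MaxRowsToGroom rows per pass so a single transaction stays small – is a standard batched-delete pattern. A hypothetical Python/SQLite sketch of just that loop (not the real DW schema, simplified names are mine):

```python
import sqlite3

BATCH = 3  # analogous to @MaxRowsToGroom (1,000,000 in the real script)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vEvent (EventOriginId INTEGER)")
conn.execute("CREATE TABLE vEventParameter (EventOriginId INTEGER)")
conn.execute("INSERT INTO vEvent VALUES (1)")
conn.executemany("INSERT INTO vEventParameter VALUES (?)",
                 [(i,) for i in range(1, 11)])  # rows 2..10 are orphans

# Delete orphans in small batches; each batch commits on its own,
# which keeps the transaction log from ballooning.
while True:
    cur = conn.execute(
        "DELETE FROM vEventParameter WHERE rowid IN ("
        " SELECT rowid FROM vEventParameter"
        " WHERE EventOriginId NOT IN (SELECT EventOriginId FROM vEvent)"
        " LIMIT ?)", (BATCH,))
    conn.commit()
    if cur.rowcount < BATCH:  # a short (or empty) batch means we are done
        break

remaining = conn.execute("SELECT COUNT(*) FROM vEventParameter").fetchone()[0]
print(remaining)  # 1
```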

 

 

I do recommend you clean this up.  It doesn’t hurt anything sitting there, other than potentially making any event based reports run slower, but the big impact to me is just dealing with such a large DW, backups, restores, and cost of ownership of a database that big, for little reason.

Make sure you update statistics when you are done – and consider a full DBReindex as well.   To update statistics – run:    exec sp_updatestats

 

 

Here is an example of my before and after:

Before:

image

 

After:

image

 

Trimmed from 3.3 GB to 117 MB!!!!!   If this were a large production environment, this could be a substantial amount of data.

 

 

And remember – most collected events are worthless to begin with.  As a tuning exercise – I recommend disabling MOST of the out of the box event collections, and also reduce your event retention in the DW:

https://blogs.technet.microsoft.com/kevinholman/2009/11/25/tuning-tip-turning-off-some-over-collection-of-events/

https://blogs.technet.microsoft.com/kevinholman/2010/01/05/understanding-and-modifying-data-warehouse-retention-and-grooming/

Monitoring a file hash using SCOM


 

I had an interesting customer request recently – to monitor for a specific system file, and make SURE it is not a modified/threat file.

 

You can use this as a simple example of a two-state timed script monitor (using vbscript) which demonstrates script arguments, logging, alerting, propertybag outputs, etc.

 

In this case – there is a file located at %windir%\system32\sethc.exe

This is the “Sticky Keys” UI that pops up when you press the Shift key 5 times.  There are several articles out there on how to create a “back door” by swapping this file out with cmd.exe, which opens a command prompt without logging into a system, if you have access to the console.

In this case – the customer wanted to monitor for any changes to this file. 

I started by writing a script using VBScript, so it will work on Server 2003, 2008, 2008R2, 2012, and 2012R2.  The script calls CertUtil.exe, which will generate the hash for any file.  Then the script compares this file hash to a list of “known good” hashes.

The script accepts two arguments, the filepath location, and the comma separated list of known good hashes.

 

'
' File Hash monitoring script
' Kevin Holman
' 5/2016
'
Option Explicit

dim oArgs, filepath, paramHashes, oAPI, oBag, strCommand, oShell
dim strHashCmd, strHashLine, strHashOut, strHash, HashesArray, Hash, strMatch

'Accept arguments for the file path, and known good hashes in comma delimited format
Set oArgs = wscript.arguments
filepath = oArgs(0)
paramHashes = oArgs(1)

'Load MOMScript API and PropertyBag function
Set oAPI = CreateObject("MOM.ScriptAPI")
Set oBag = oAPI.CreatePropertyBag()

'Log script event that we are starting task
Call oAPI.LogScriptEvent("filehashcheck.vbs", 3322, 0, "Starting hashfile script with filepath: " & filepath & " with known good hashes: " & paramHashes)

'Build the command to run for CertUtil
strCommand = "%windir%\system32\certutil.exe -hashfile " & filepath

'Create the Wscript Shell object and execute the command
Set oShell = WScript.CreateObject("WScript.Shell")
Set strHashCmd = oShell.Exec(strCommand)

'Parse the output of CertUtil and keep only the line with the hash
Do While Not strHashCmd.StdOut.AtEndOfStream
  strHashLine = strHashCmd.StdOut.ReadLine()
  If Instr(strHashLine, "SHA") Then
    'skip
  ElseIf Instr(strHashLine, "CertUtil") Then
    'skip
  Else
    strHashOut = strHashLine
  End If
Loop

'Remove spaces from the hash
strHash = Replace(strHashOut, " ", "")

'Split the comma separated hashlist parameter into an array
HashesArray = split(paramHashes, ",")

'Loop through the array and see if our file hash matches any known good hash
For Each Hash in HashesArray
  'wscript.echo Hash
  If strHash = Hash Then
    'wscript.echo "Match found"
    Call oAPI.LogScriptEvent("filehashcheck.vbs", 3323, 0, "Good match found.  The file " & filepath & " was found to have hash " & strHash & " which was found in the supplied known good hashes: " & paramHashes)
    Call oBag.AddValue("Match","GoodHashFound")
    Call oBag.AddValue("CurrentFileHash",strHash)
    Call oBag.AddValue("FilePath",filepath)
    Call oBag.AddValue("GoodHashList",paramHashes)
    oAPI.Return(oBag)
    wscript.quit
  Else
    'wscript.echo "Match not found"
    strMatch = "missing"
  End If
Next

'If we get to this part of the script a hash was not found.  Output a bad propertybag
If strMatch = "missing" Then
  Call oAPI.LogScriptEvent("filehashcheck.vbs", 3324, 2, "This file " & filepath & " does not match any known good hashes.  It was found to have hash " & strHash & " which was NOT found in the supplied known good hashes: " & paramHashes)
  Call oBag.AddValue("Match","HashNotFound")
  Call oBag.AddValue("CurrentFileHash",strHash)
  Call oBag.AddValue("FilePath",filepath)
  Call oBag.AddValue("GoodHashList",paramHashes)
  oAPI.Return(oBag)
End If

wscript.quit
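CertUtil -hashfile defaults to SHA-1, so the known-good list you supply is a comma separated set of SHA-1 hex digests. To generate those digests (or to sanity-check the comparison logic) outside of SCOM, a minimal Python equivalent could look like this – the function name and file here are mine, not part of the MP:

```python
import hashlib
import os
import tempfile

def check_file_hash(filepath, known_good_csv):
    """Hash a file with SHA-1 (CertUtil's default algorithm) and compare it
    against a comma separated list of known good hex digests."""
    sha1 = hashlib.sha1()
    with open(filepath, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            sha1.update(chunk)
    digest = sha1.hexdigest()
    known_good = [h.strip().lower() for h in known_good_csv.split(",")]
    return digest in known_good, digest

# Demo against a throwaway file rather than sethc.exe
tmp = tempfile.NamedTemporaryFile(delete=False)
tmp.write(b"hello")
tmp.close()

matched, digest = check_file_hash(
    tmp.name,
    "aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d,81de6ab557b31b8c34800c3a4150be6740ef445a")
os.unlink(tmp.name)
print(matched)  # True
```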

 

I then put this script into a two-state monitor targeting Windows Server OperatingSystem, so every monitored server will run it once a day, and check to see if the supplied file is correct, or if a vulnerability might exist.

 

Here is the Monitor example:

 

<UnitMonitor ID="Custom.HashFile.CompareHash.Monitor" Accessibility="Public" Enabled="true" Target="Windows!Microsoft.Windows.Server.OperatingSystem" ParentMonitorID="Health!System.Health.SecurityState" Remotable="true" Priority="Normal" TypeID="Windows!Microsoft.Windows.TimedScript.TwoStateMonitorType" ConfirmDelivery="false">
  <Category>SecurityHealth</Category>
  <AlertSettings AlertMessage="Custom.HashFile.CompareHash.Monitor.AlertMessage">
    <AlertOnState>Warning</AlertOnState>
    <AutoResolve>true</AutoResolve>
    <AlertPriority>Normal</AlertPriority>
    <AlertSeverity>Warning</AlertSeverity>
    <AlertParameters>
      <AlertParameter1>$Target/Host/Property[Type="Windows!Microsoft.Windows.Computer"]/NetworkName$</AlertParameter1>
      <AlertParameter2>$Data/Context/Property[@Name='FilePath']$</AlertParameter2>
      <AlertParameter3>$Data/Context/Property[@Name='CurrentFileHash']$</AlertParameter3>
      <AlertParameter4>$Data/Context/Property[@Name='GoodHashList']$</AlertParameter4>
    </AlertParameters>
  </AlertSettings>
  <OperationalStates>
    <OperationalState ID="GoodHashFound" MonitorTypeStateID="Success" HealthState="Success" />
    <OperationalState ID="HashNotFound" MonitorTypeStateID="Error" HealthState="Warning" />
  </OperationalStates>
  <Configuration>
    <IntervalSeconds>86321</IntervalSeconds>
    <SyncTime />
    <ScriptName>FileHashCheck.vbs</ScriptName>
    <Arguments>filepath hashlist</Arguments>
    <ScriptBody><![CDATA['
' File Hash monitoring script
' Kevin Holman
' 5/2016
'
Option Explicit

dim oArgs, filepath, paramHashes, oAPI, oBag, strCommand, oShell
dim strHashCmd, strHashLine, strHashOut, strHash, HashesArray, Hash, strMatch

'Accept arguments for the file path, and known good hashes in comma delimited format
Set oArgs = wscript.arguments
filepath = oArgs(0)
paramHashes = oArgs(1)

'Load MOMScript API and PropertyBag function
Set oAPI = CreateObject("MOM.ScriptAPI")
Set oBag = oAPI.CreatePropertyBag()

'Log script event that we are starting task
Call oAPI.LogScriptEvent("filehashcheck.vbs", 3322, 0, "Starting hashfile script with filepath: " & filepath & " with known good hashes: " & paramHashes)

'Build the command to run for CertUtil
strCommand = "%windir%\system32\certutil.exe -hashfile " & filepath

'Create the Wscript Shell object and execute the command
Set oShell = WScript.CreateObject("WScript.Shell")
Set strHashCmd = oShell.Exec(strCommand)

'Parse the output of CertUtil and keep only the line with the hash
Do While Not strHashCmd.StdOut.AtEndOfStream
  strHashLine = strHashCmd.StdOut.ReadLine()
  If Instr(strHashLine, "SHA") Then
    'skip
  ElseIf Instr(strHashLine, "CertUtil") Then
    'skip
  Else
    strHashOut = strHashLine
  End If
Loop

'Remove spaces from the hash
strHash = Replace(strHashOut, " ", "")

'Split the comma separated hashlist parameter into an array
HashesArray = split(paramHashes, ",")

'Loop through the array and see if our file hash matches any known good hash
For Each Hash in HashesArray
  If strHash = Hash Then
    Call oAPI.LogScriptEvent("filehashcheck.vbs", 3323, 0, "Good match found.  The file " & filepath & " was found to have hash " & strHash & " which was found in the supplied known good hashes: " & paramHashes)
    Call oBag.AddValue("Match","GoodHashFound")
    Call oBag.AddValue("CurrentFileHash",strHash)
    Call oBag.AddValue("FilePath",filepath)
    Call oBag.AddValue("GoodHashList",paramHashes)
    oAPI.Return(oBag)
    wscript.quit
  Else
    strMatch = "missing"
  End If
Next

'If we get to this part of the script a hash was not found.  Output a bad propertybag
If strMatch = "missing" Then
  Call oAPI.LogScriptEvent("filehashcheck.vbs", 3324, 2, "This file " & filepath & " does not match any known good hashes.  It was found to have hash " & strHash & " which was NOT found in the supplied known good hashes: " & paramHashes)
  Call oBag.AddValue("Match","HashNotFound")
  Call oBag.AddValue("CurrentFileHash",strHash)
  Call oBag.AddValue("FilePath",filepath)
  Call oBag.AddValue("GoodHashList",paramHashes)
  oAPI.Return(oBag)
End If

wscript.quit]]></ScriptBody>
    <TimeoutSeconds>60</TimeoutSeconds>
    <ErrorExpression>
      <SimpleExpression>
        <ValueExpression>
          <XPathQuery Type="String">Property[@Name='Match']</XPathQuery>
        </ValueExpression>
        <Operator>Equal</Operator>
        <ValueExpression>
          <Value Type="String">HashNotFound</Value>
        </ValueExpression>
      </SimpleExpression>
    </ErrorExpression>
    <SuccessExpression>
      <SimpleExpression>
        <ValueExpression>
          <XPathQuery Type="String">Property[@Name='Match']</XPathQuery>
        </ValueExpression>
        <Operator>Equal</Operator>
        <ValueExpression>
          <Value Type="String">GoodHashFound</Value>
        </ValueExpression>
      </SimpleExpression>
    </SuccessExpression>
  </Configuration>
</UnitMonitor>

 

Lastly – I create an override for the monitor – which allows you to specify the file, and the known good hash list, which appears like this:

 

image

 

When a bad hash is detected – we generate an alert:

 

image

 

And Health Explorer provides good context:

 

image

 

We also do logging for the script when it starts, and the output:

 

Log Name:      Operations Manager
Source:        Health Service Script
Date:          5/26/2016 11:45:55 AM
Event ID:      3324
Task Category: None
Level:         Warning
Keywords:      Classic
User:          N/A
Computer:      WINS2008R2.opsmgr.net
Description:
filehashcheck.vbs : This file C:\Windows\system32\sethc.exe does not match any known good hashes.  It was found to have hash 0f3c4ff28f354aede202d54e9d1c5529a3bf87d8 which was NOT found in the supplied known good hashes: 167891d5ef9a442cce490e7e317bfd24a623ee12,81de6ab557b31b8c34800c3a4150be6740ef445a

 

 

The download of the complete management pack is available at:

https://gallery.technet.microsoft.com/Management-Pack-to-Monitor-153d8cfa

How to change the SCOM agent heartbeat interval in PowerShell


 

Perhaps you have a special group of servers that are on poorly connected network segments, but most of your servers are in datacenters.  You may want to set the default heartbeat interval higher for these specific agents, so they are less likely to create heartbeat failures.  You can do this easily in the UI, but there isn’t a simple cmdlet to do this for a group of agents.

Here is a method you can use:

$agent = Get-SCOMAgent | where {$_.DisplayName -eq 'yourspecialsnowflake.domain.com'}
$agent.HeartbeatInterval = 360
$agent.ApplyChanges()

 

In this example – you might set this differently for all the agents in your DMZ domain:

$agents = Get-SCOMAgent | where {$_.Domain -eq 'DMZ'}
foreach ($agent in $agents)
{
  $agent.HeartbeatInterval = 360
  $agent.ApplyChanges()
}

