20110301: MI token bloat

This document addresses concerns and issues connected with a MI user or computer having a large number of groups in its logon token. For more technical detail, see the resources in the More Info section at the bottom of this document.

Background

The legacy Windows logon token design and the standard Kerberos maximum token size together impose an upper limit on the size of a Windows logon token. When more data needs to be added to a logon token that is already at this limit, logon fails for that user.

The Windows logon token contains a list of security identifiers (SIDs) which record what groups the user is a member of and what privileges the user has on the local computer. A Windows logon token is created on *each* Windows computer that a Windows user interacts with, and each of these tokens can have a different list of SIDs, depending on the privileges and groups present only on that local computer.

Nested groups complicate the issue further. The SID of each nested group the user is a member of is also included in the logon token, so you can’t just look at the direct memberships of a given user to tell whether that user could have a problem; you need the effective group memberships. Additionally, the type of logon and the type of groups affect the logon token size.

So a given Windows user doesn’t necessarily have a single logon token size at any given moment; the size depends on where the user is logging in, and a user might exceed the limit on one computer but not on another. However, the size doesn’t typically vary a great deal, so generalizations can usually be made. One generalization often made is that a Windows user can be a member of up to 1024 groups before encountering this issue.
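To illustrate why direct memberships alone are not enough, here is a minimal sketch of expanding a user's direct groups into effective memberships by following group nesting. The function name, the dictionary layout, and the sample group names are all hypothetical; this is not the effectiveMemberships code, only an illustration of the idea.

```python
from collections import deque


def effective_groups(direct_groups, member_of):
    """Return every group reachable from the direct memberships via nesting.

    direct_groups: groups the user is a direct member of.
    member_of: hypothetical mapping of group -> groups it is nested inside.
    """
    seen = set(direct_groups)
    queue = deque(direct_groups)
    while queue:
        group = queue.popleft()
        for parent in member_of.get(group, ()):
            # The 'seen' check also protects against circular nesting.
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


# Hypothetical sample data, for illustration only.
nesting = {
    "course_a": ["all_students"],
    "all_students": ["all_affiliates"],
    "staff_x": ["all_affiliates"],
}
groups = effective_groups(["course_a", "staff_x"], nesting)
```

In this sample the user has two direct memberships but four effective ones, and it is the effective set that contributes SIDs to the logon token.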

Do we have this problem? (or Monitoring the Presence of Problems)

Using the formula documented in KB327825, the standard 12000 byte limit minus the typical 1200 byte overhead leaves 10800 bytes for group memberships. Over 99.9% of groups in MI are universal groups, which means that over 99.9% of MI groups count 8 bytes each against the limit. A MI user can therefore have roughly 1350 group memberships (10800 / 8) before bumping into the token limit. Kerberos places an additional limit of 1024 on the number of groups. Whether 1350 or 1024, this is a very large number of groups: a given MI user would need to be in almost 2% of all existing MI groups. Since 71% of all MI groups are course groups, with very structured membership, it is quite unlikely that many MI users are over this limit, but it is remotely possible.
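The arithmetic above can be sketched in a few lines. The constants come from the figures quoted in this section (12000 byte limit, 1200 byte overhead, 8 bytes per typical MI group); the function and constant names are illustrative only.

```python
LIMIT_BYTES = 12000     # standard token size limit quoted above
OVERHEAD_BYTES = 1200   # typical fixed overhead quoted above
MI_GROUP_BYTES = 8      # cost of a typical (universal) MI group
GROUP_COUNT_LIMIT = 1024  # separate cap on the number of groups


def estimate_token_size(mi_groups):
    """Approximate token size for a user with only typical MI groups."""
    return OVERHEAD_BYTES + MI_GROUP_BYTES * mi_groups


def max_mi_groups():
    """Largest number of typical MI groups that fits under both limits."""
    by_size = (LIMIT_BYTES - OVERHEAD_BYTES) // MI_GROUP_BYTES
    return min(by_size, GROUP_COUNT_LIMIT)


size_limit_only = (LIMIT_BYTES - OVERHEAD_BYTES) // MI_GROUP_BYTES
```

Here `size_limit_only` evaluates to 1350, while `max_mi_groups()` returns 1024 because the group count limit is reached first.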

This analysis is based on the premise that a MI user account is used on a computer/service based in the netid.washington.edu domain. If the MI user account is used on a computer/service in another domain, then our ability to predict the likelihood is reduced. This is because the MI user account may have group memberships in that other domain, and those memberships are likely to be the more costly kind, five times more costly than the typical MI group membership (40 bytes rather than 8). So when a MI user account is used across a trust, there is more of a possibility that a MI user with MI group memberships numbering in the hundreds might run into a problem, but it’s still expected to be pretty uncommon.
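As a rough sketch of the cross-domain case, the following estimates how many of the costlier memberships fit once a user's MI memberships are accounted for. It assumes the 8 byte and 40 byte costs discussed above and ignores the separate cap on group count; the function name is hypothetical.

```python
LIMIT_BYTES = 12000
OVERHEAD_BYTES = 1200
MI_GROUP_BYTES = 8        # typical MI group membership
OTHER_DOMAIN_BYTES = 40   # five times the typical MI cost


def other_domain_headroom(mi_groups):
    """How many 40-byte memberships still fit alongside mi_groups MI groups."""
    remaining = LIMIT_BYTES - OVERHEAD_BYTES - MI_GROUP_BYTES * mi_groups
    return max(remaining // OTHER_DOMAIN_BYTES, 0)


headroom_for_300 = other_domain_headroom(300)
```

Under these assumptions, a user with 300 MI group memberships has room for 210 memberships of the costlier kind, compared with 270 for a user with no MI groups at all.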

More broadly, MI code called effectiveMemberships will be able to determine the approximate logon token size for all MI users, making a minimum of assumptions/generalizations. This code is written in-house and may be of great interest elsewhere, since nothing quite like it exists. Once this code is available (it is in development as of 3/8/11), we will be able to run it periodically to monitor the status in the future.

Based on a one-time report run early in March 2011 using the Groups Service as a data source, no UW NetID had more than 450 effective group memberships, so no existing MI users have this problem. 99.98% of UW NetIDs had fewer than 100 effective group memberships, and of those with more than 100, only 8 had more than 200. In general, then, this doesn’t appear to be a problem in need of an urgent solution unless group use increases a great deal. For the 8 UW NetIDs with the most effective group memberships, a very high percentage of those memberships come from u_netid_* groups.
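A report of this kind could be summarized along the following lines. This is only a sketch: the input format (a mapping of UW NetID to effective membership count), the function name, and the sample values are all hypothetical and are not taken from the actual report.

```python
def summarize_memberships(counts, warn_at=200):
    """Summarize effective membership counts.

    counts: hypothetical mapping of user id -> effective group count.
    warn_at: users strictly above this count are flagged for review.
    """
    total = len(counts)
    under_100 = sum(1 for n in counts.values() if n < 100)
    flagged = sorted(
        (user for user, n in counts.items() if n > warn_at),
        key=lambda user: counts[user],
        reverse=True,
    )
    return {
        "max": max(counts.values(), default=0),
        "share_under_100": under_100 / total if total else 0.0,
        "flagged": flagged,
    }


# Made-up sample values, for illustration only.
sample = {"netid_a": 12, "netid_b": 95, "netid_c": 230, "netid_d": 410}
report = summarize_memberships(sample)
```

The flagged list is ordered from most to fewest memberships, so the accounts closest to the limit appear first.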

That said, the Groups Service is an interface without any checks on the number of group memberships, so it could be used to temporarily deny service to specific MI users or computers by adding them to too many groups. When those MI users or computers represent the MI service itself or widely used services within MI, the impact of malicious or accidental actions via the Groups Service is high. So while it’s unlikely that MI will inadvertently run into a token bloat problem any time soon via normal use, there is a targeted risk for sensitive MI users and computers.

What We Can Do to Fix or Minimize This Problem

Fundamentally, there is a technology design problem here which is triggered by a common and greatly useful functionality, i.e. groups. It’s unlikely Microsoft will expend a great deal of energy changing the technology design on our behalf, and it’s unreasonable to halt group use in MI. So the available solutions are similar to those for most resource exhaustion problems, put succinctly in the common saying: reduce, reuse, recycle.

There are a number of possible actions which may help to reduce the criticality of this potential issue. Among them are:

The UW Groups service could enable a MI affiliate, along with adoption of this affiliate and deployment of near real-time group sync
In concert with the above item, identifying groups which have no value in MI and removing their MI affiliate status would reduce overall MI user token size. An example of a set of groups already identified is the u_netid_* groups, which represent Shared UW NetID owners
Addition of self-removal features in the UW Groups service would reduce the friction involved in deprovisioning undesired group memberships on your own MI users
Addition of further membership validation features in the UW Groups service could help to prevent individual MI users from going over a certain threshold
Analysis of how nested groups are used in emerging institutional and data-driven groups, and of their impact on this problem, may help guide the formation of those groups to reduce that impact, whether by reducing the use of nested groups, by not enabling MI affiliate status on nested groups where it is not necessary (along with flattening via the MI Group Sync agent), or by other changes
Provide a way to protect system critical MI users and computers from being added to too many groups or require approval for adding these restricted MI users and computers to groups. Critical MI users and computers might include things like eadm users, domain controllers, Exchange and SharePoint computers and other MI users that have the potential to impact large numbers of users
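The membership validation and protection ideas in the list above could take roughly the following shape. This is a hedged sketch only: the function, the threshold of 500, and the principal names are hypothetical, and nothing here reflects an existing UW Groups service feature.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    NEEDS_APPROVAL = "needs_approval"
    DENY = "deny"


def check_membership_add(principal, current_count, groups_added,
                         protected, threshold=500):
    """Decide whether a membership addition should proceed.

    current_count: the principal's effective group count before the change.
    groups_added: effective groups the change would add (including nesting).
    protected: set of system-critical principals (hypothetical list).
    threshold: hypothetical cap, chosen well under the 1024 group limit.
    """
    if current_count + groups_added > threshold:
        return Decision.DENY
    if principal in protected:
        return Decision.NEEDS_APPROVAL
    return Decision.ALLOW


# Hypothetical protected principals, for illustration only.
critical = {"dc01$", "exchange01$"}
result = check_membership_add("dc01$", 40, 3, critical)
```

Checking the threshold first means that even an approved change to a protected principal cannot push it over the cap.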

Next Steps

MI will put effectiveMemberships in place and run it on some periodic basis (TBD), giving us an ongoing sense of the criticality of this problem and a way to monitor its status in the future.

Which of the additional actions will be the most cost-effective is unknown, partly because the effort needed to realize them is unknown. Since this does not generally appear to be an issue, only the least costly actions and those that reduce the riskiest outcomes should be pursued at this time. Two actions that may meet these criteria are:

Provide protection for system critical MI users and computers
Deprovision to MI the subset of undesired group memberships which is least costly to deprovision, e.g. u_netid_* groups

More Info

Some great technical resources:

Good blog summary: http://derek858.blogspot.com/2010/09/alka-seltzer-for-your-windows-token.html

Outcome

Periodically, the Groups Service runs a report on total group memberships by user. The MI service has not completed effectiveMemberships.