IT Connect

Information technology tools and resources at the UW

Identity Data Mapping

This document is intended for IT Professionals seeking to understand how “whitepage” identity data is mapped to the NETID domain.

NOTE: This document has been updated to reflect a change being implemented on March 1, 2017. Until that time, the “Preferred Name” values are not used, and the pdsDisplayName value has some casing (which used to be documented here).

Background

Microsoft Identity Manager (MIM) is one of two Microsoft Infrastructure components which can set directory information on NETID user accounts. This document only covers MIM behavior. The NETID Kiwi component also sets some directory information, upon Kiwi events. The reader should consult the NETID Kiwi documentation to be aware of changes which might intersect with what is represented here.

Table Legend

The following table shows which PDS attributes are connected to which NETID domain Active Directory (AD) attributes, but it doesn’t tell the full story. In most cases a direct mapping is involved, meaning that the PDS attribute value is set unchanged on the AD attribute. The direct mappings have normal text. In the remaining cases, a more complicated algorithm is used to determine the NETID value based on input from the PDS value(s). These more complex mappings are in italics and explained below.

Publishing Flag and Multiple Affiliations

In almost every case, you’ll see that a PDS attribute with “publish” in the name is present. These “publish” attributes are sometimes called publishing flags. They govern whether the individual has allowed a set of information about them to be published. If the individual has chosen to allow publishing, then we make use of that information. However, in some cases it’s more complicated, because there are multiple publishing flags and potentially several sets of personal data depending on the affiliation of the person. In the case where a person is both staff and student, the staff personal information is preferred. If only one of the two flags permits publishing, then only that set of information is used.
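
The staff-over-student preference described above can be sketched roughly as follows. This is only an illustration, not the MIM implementation: the function name is hypothetical, the publish flags are modelled as booleans (the real PDS encoding may differ), and falling back to student data when the staff value is empty is an assumption.

```python
from typing import Optional


def pick_published_value(
    ewp_publish: bool,
    ewp_value: Optional[str],
    swp_publish: bool,
    swp_value: Optional[str],
) -> Optional[str]:
    """Return the staff value if publishable, else the student value, else None.

    Assumption: uwEWPPublish / uwSWPPublish are treated as plain booleans.
    """
    if ewp_publish and ewp_value:
        return ewp_value  # staff (employee) data is preferred
    if swp_publish and swp_value:
        return swp_value  # otherwise fall back to student data
    return None  # nothing the person has allowed to be published
```

For example, a person with both affiliations who only permits publishing of student data would get the student value, and a person who permits neither would get no value at all.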

PDS | Active Directory (NETID)
eduPersonAffiliation | eduPersonAffiliation
uidNumber | uidNumber
uwRegID | uwRegID
uwPriorRegID | uwPriorRegID
uwPriorNetID | uwPriorNetID
uwWPPublish | extensionAttribute1
uwTest | uwTest
uwNetID, displayName, uwSWPPublish + uwSWPName, uwEWPPublish + uwEWPName, UWWI: uwDisplayNameOverride | displayName
uwSWPPublish + uwSWPName, uwEWPPublish + uwEWPName, uwPersonPreferredFirst, MI: uwDisplayNameOverride | givenName
uwNetID, displayName, uwSWPPublish + uwSWPName, uwEWPPublish + uwEWPName, uwPersonPreferredSurname, MI: uwDisplayNameOverride | sn
uwSWPPublish + uwSWPName, uwEWPPublish + uwEWPName, uwPersonPreferredMiddle, MI: uwDisplayNameOverride | initials
uwEWPPublish, uwEWPTitle1, uwSWPPublish, uwSWPClass | title
uwEWPPublish, uwEWPPhone1, uwSWPPublish, uwSWPPhone | telephoneNumber
uwEWPPublish, uwEWPDept1, uwSWPPublish, uwSWPDept1 | department
uwEWPPublish, uwEWPAddr1 | streetAddress
uwEWPPublish, uwEWPFacsimile | facsimileTelephoneNumber
uwEWPPublish, uwEmployeeMailstop | physicalDeliveryOfficeName

Complex Attribute Mappings

For naming attributes, a complicated algorithm is used to determine the NETID value based on input from the data sources available. The more complex mappings are italicized in the mapping table above.

Name Attributes

For the purposes of this discussion, the Microsoft Infrastructure “Name Attributes” are:

  • givenName
  • initials
  • sn
  • displayName

Exchange, SharePoint, Skype for Business, and other applications in the Office 365 suite leverage the Microsoft Infrastructure name attributes. There are many other applications which also do their UW NetID identity integration via Microsoft Infrastructure user accounts.

Name Sources

There are many possible data sources for name information (in descending order of priority):

  • NETID’s custom uwDisplayNameOverride
  • Preferred Name
    • uwPersonPreferredFirst
    • uwPersonPreferredMiddle
    • uwPersonPreferredSurname
  • uwEWPName
  • uwSWPName
  • uwRegisteredName
    • uwRegisteredFirstMiddle
    • uwRegisteredSurname
  • displayName (from PDS)
  • uwNetID

The uwDisplayNameOverride attribute comes from NETID, and represents an option of last resort for cases where there is a clear business need to override all other data sources. It requires approval from the Microsoft Infrastructure service manager.

Preferred Name, available via a self-service management portal at https://identity.uw.edu, provides a central authority for UW identities to assert name values of their preference for use in UW systems.

The uwEWPName value comes from the Employee Self-Service (ESS) application and is only available to current employees. That data source does minimal data input validation so the formatting is not consistent.

The uwSWPName value comes from the UW Registrar (for students). There is no digital interface to make changes; you must go in person to their offices. This option is only available to current students, not former students.

The uwRegistered* attributes usually reflect the official name of record, but are case insensitive and there is no clear authority to publish that data, so using it carries an associated privacy risk. For this reason, we do not use the uwRegistered* attributes.

The displayName source from PDS is currently the only data source available for non-personal UW NetIDs. This data source is *ONLY* used if the corresponding account is not a personal UW NetID. For non-personal UW NetIDs this value comes from the AVF source system. If you want to change this value for a non-personal UW NetID, you can use https://uwnetid.washington.edu/manage and the Name field exposed there.

The uwNetID value provides a default value if no other data source is present.

The availability of each of the name attributes is dependent on a variety of factors. For example, the affiliation of the UW NetID can limit what is possible, e.g. only students can have uwSWPName values. Whether the corresponding publish flag is set (uwEWPPublish and uwSWPPublish) restricts our ability to use the corresponding name value (uwEWPName and uwSWPName). If a UW NetID has not elected to have their information available in one of the information sets (Employee or Student), or the corresponding publish flag indicates we can’t use that data, then that corresponding data will be treated as not available.

Preferred Data Source For Each Name Attribute

The following list shows the preferred data sources if “available” in order, from most preferred (on the left) to least preferred (on the right):

  • givenName <- uwDisplayNameOverride, uwPersonPreferredFirst, uwEWPName, uwSWPName
  • sn <- uwDisplayNameOverride, uwPersonPreferredSurname, uwEWPName, uwSWPName, pdsDisplayName, uwNetID
  • initials <- uwDisplayNameOverride, uwPersonPreferredMiddle, uwEWPName, uwSWPName
  • displayName <- uwDisplayNameOverride, uwEWPName, uwSWPName, pdsDisplayName, uwNetID

For example, the sn attribute uses the uwDisplayNameOverride source if it is available. If not, it uses uwPersonPreferredSurname; failing that, the uwEWPName or uwSWPName source; failing that, the pdsDisplayName source; and finally the uwNetID source.

Note that it is possible that givenName and initials are not set at all because there are no available data sources. Also note that sn and displayName are always set because there is always a uwNetID value.
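
The fallback order above amounts to "take the first source that is available". A minimal sketch, assuming a hypothetical dictionary of sources that has already been filtered by affiliation and publish flags (the function name and the dictionary shape are illustrative only):

```python
from typing import Dict, List, Optional

# Preference order per AD name attribute, copied from the list above.
PREFERENCE: Dict[str, List[str]] = {
    "givenName": ["uwDisplayNameOverride", "uwPersonPreferredFirst",
                  "uwEWPName", "uwSWPName"],
    "sn": ["uwDisplayNameOverride", "uwPersonPreferredSurname",
           "uwEWPName", "uwSWPName", "pdsDisplayName", "uwNetID"],
    "initials": ["uwDisplayNameOverride", "uwPersonPreferredMiddle",
                 "uwEWPName", "uwSWPName"],
    "displayName": ["uwDisplayNameOverride", "uwEWPName", "uwSWPName",
                    "pdsDisplayName", "uwNetID"],
}


def first_available_source(ad_attribute: str, available: Dict[str, str]) -> Optional[str]:
    """Return the name of the most preferred source that has a non-empty value."""
    for source in PREFERENCE[ad_attribute]:
        if available.get(source):
            return source
    return None  # possible for givenName and initials, per the note above
```

With only a uwNetID present, this returns uwNetID for sn and displayName and nothing for givenName and initials, which matches the note above.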

Additional Processing For uwEWPName and uwSWPName

For the uwEWPName and uwSWPName sources some processing is required to determine what name values are used. This processing is equivalent to:

  1. A value is taken from the data source
  2. The value is parsed. That value could be formatted as either:
    (a) “Last Name, First Name Middle Name” or
    (b) “First Name Middle Name Last Name”.
    If a comma is present, then the first form (a) is assumed. Otherwise the second form (b) is assumed.
    The surname is then extracted from the value using the form already determined. This may result in incorrect values for surname if the surname is composed of multiple parts separated by spaces. To mitigate this, if the second form is encountered, the last part of the name is compared to the uwRegisteredSurname and if they match (case insensitive) then that portion of the name is deemed the sn.
    After parsing the sn value, the remaining portion of the name is parsed for givenName and initials. The first word of the remainder is assumed to be the givenName, and the first character of what remains after that is assumed to be the middle initial.
  3. The displayName is set to givenName + “ ” + (initials + “. ”) + sn, where any empty value is dropped along with its corresponding space.

Note that multipart first names (e.g. Mary Anne) will be parsed incorrectly under this scheme. Also note that additional unexpected text can cause incorrect parsing (e.g. Joe Johnson Jr.).
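
The parsing steps above can be approximated with the sketch below. It is not the production logic: the function name is hypothetical, whitespace is collapsed before parsing as a simplification, and the middle initial is read as the first character of the word following the givenName.

```python
from typing import Dict, Optional


def parse_whitepages_name(value: str, registered_surname: Optional[str] = None) -> Dict[str, str]:
    """Split a uwEWPName / uwSWPName style value into name attributes."""
    value = " ".join(value.split())  # collapse repeated whitespace
    reg = " ".join((registered_surname or "").split())
    if "," in value:
        # Form (a): "Last Name, First Name Middle Name"
        sn, rest = (part.strip() for part in value.split(",", 1))
    elif reg and value[-len(reg):].lower() == reg.lower() and value[:-len(reg)].endswith(" "):
        # Form (b) where the tail matches uwRegisteredSurname (case insensitive),
        # which keeps a multi-part surname together.
        sn, rest = value[-len(reg):], value[:-len(reg)].strip()
    else:
        # Form (b) fallback: the last word is taken as the surname.
        rest, _, sn = value.rpartition(" ")
    words = rest.split()
    given = words[0] if words else ""
    initial = words[1][0] if len(words) > 1 else ""
    parts = [given, initial + "." if initial else "", sn]
    return {
        "givenName": given,
        "initials": initial,
        "sn": sn,
        "displayName": " ".join(p for p in parts if p),
    }
```

Calling parse_whitepages_name("Mary Anne Smith") yields a givenName of Mary and an initial of A, which is the multipart first name problem noted above.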

Additional Processing for pdsDisplayName

For the pdsDisplayName source, the processing is equivalent to:

  1. The value is taken from the data source
  2. If there is a space in the value, the last word becomes the sn
  3. The value is the displayName

Additional Processing for uwDisplayNameOverride

For the uwDisplayNameOverride source, the processing is equivalent to:

  1. A value is taken from the data source
  2. The value is the displayName
  3. If there is a uwRegisteredSurname value and it matches (case insensitive) a substring in the Name value, then that substring is the sn. Otherwise, the last word in the value is the sn.
  4. If there are two or more words in the value, then the first word is the givenName.
  5. If there are three or more words in the value, then the first character of the second word is the initials.
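
The override steps can be sketched in the same spirit. The function name is hypothetical, and the substring lookup assumes that lowercasing does not change the length of the string (true for ASCII names):

```python
from typing import Dict, Optional


def map_display_name_override(value: str, registered_surname: Optional[str] = None) -> Dict[str, str]:
    """Derive name attributes from a uwDisplayNameOverride value."""
    result = {"displayName": value}
    words = value.split()
    reg = (registered_surname or "").strip()
    # Case-insensitive search for uwRegisteredSurname inside the value.
    idx = value.lower().find(reg.lower()) if reg else -1
    if idx >= 0:
        result["sn"] = value[idx:idx + len(reg)]  # keep the casing used in the value
    elif words:
        result["sn"] = words[-1]  # otherwise the last word is the sn
    if len(words) >= 2:
        result["givenName"] = words[0]
    if len(words) >= 3:
        result["initials"] = words[1][0]
    return result
```

Note that a single-word override sets only displayName and sn, so givenName and initials would be left unset by this source.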

A Word About Microsoft Infrastructure Identity Name Design

The approach to name data at the UW is complicated because there are many different user populations with many different data sources behind each identity population. And of course, each of those data sources has different methods to make changes to the data. This means that any given application (and infrastructure like ours) must make a number of complex decisions about which name data to use, which can be especially complicated when a given identity has multiple affiliations. In contrast, the Preferred Name data source is unique in that it is a single central authority for name data for UW identities, and provides a self-service mechanism to effect changes.

Because of this complex background, Microsoft Infrastructure has always documented the algorithm behind our naming logic, so everyone can understand what we are doing and how they might change what they see. Until the Preferred Name data source emerged (and we adopted it), there were a number of scenarios where there was literally nothing you could do to change the display name on an identity. Fortunately, that is no longer the case.

You’ll note that we’ve preferred data sources where the user has the ability to directly change the value. You’ll also note that we’ve chosen to “normalize” naming data which has poor input validation so there is consistency in the resulting values.

There are differing opinions about these choices, but the fact remains that, given the history of naming data here at the UW, the available options are all poor. We must balance privacy and disparate data sources, and correct for consistency.

Fortunately, with the emergence of the Preferred Name data source, the problems left are:

  1. There is no syntax or formatting enforcement for many of the data sources, which means we must make best guesses on how to correctly parse data that can be formatted an infinite number of ways. Fortunately, this issue is somewhat mitigated by Preferred Name.