Tatiana Slepukhin-Zamachnaia

In this article you are going to learn the Rules of Retention, also called the Principles of Retention, in M365. It is one of the most important topics you need to understand and memorize. Without knowing these Principles, you risk getting your organization into trouble.


Why does M365 need Rules of Retention?


In M365 we have different types of documents in different repositories, and these documents have different retention requirements.


If you publish different Retention Policies to a SharePoint Site, which Retention Policy wins? The winning policy is determined by the Rules of Retention.


Before we proceed, it’s important to understand that we have Retention, which keeps the content, and Deletion, which determines when the content is deleted. Retention and Deletion actions are both part of Retention Policies.


Retention Policies in Microsoft 365 aren’t just about keeping content safe—they also control when content should be deleted. Think of Retention as preserving content for a set period and Deletion as a controlled way to remove it once it’s no longer needed.


Once deletion is included, the competing policies applied to a SharePoint Site might look like this: multiple Retention Policies as well as multiple Deletion Policies.

M365 has a specific order of processing the Retention Policies. All Retention Policies are orchestrated according to specific Rules. There is an algorithm behind it, and you need to know all the Rules – otherwise your retention might fail.   


There are quite a few rules of Retention, and I will provide the clues that will help you memorize them quickly and without too much effort.


Rule # 1

Retention wins over Deletion


Let’s say you have a SharePoint Site with two Retention Policies. One says retain for 5 years and the other says delete in 3 years. If we were to respect the Deletion Policy, we would lose the documents early. The first rule of Retention ensures that we do NOT lose the documents; M365 would rather keep them. Whatever is lost is lost, so M365 plays it safe.


Rule # 2

Longest Retention Wins


If we have two Retention Policies with different retention periods, the longest Retention Policy will win.


Another way to look at it is, again, keeping the items safe. The intent of both Policies is to retain, so we respect the longest retention period.

So, the Retain for 7 Years Policy wins over Retain for 5 years Policy.   

 

Rule # 3

Shortest Deletion Policy Wins


Let's consider two different Deletion Policies:

One is Delete in 2 years; the other one is Delete in 3 years.


Since both are Deletion Policies, the intent for each policy is to Delete. And so when the Deletion Policies compete, the shortest Deletion Policy will win.


In this example, the winning policy is the Delete in 2 years policy.


 

OK, you might be confused about the last rule. How come we need to keep content safe, yet the shortest Deletion wins? The answer depends on which actions are competing: Retention or Deletion.


Retention Over Deletion --> Keep it Safe: Retention wins over Deletion.

Retention Over Retention --> Both policies intend to retain, so this is also Keep it Safe: the longest Retention wins.

Deletion Over Deletion --> Both policies intend ONLY to delete, so the shortest Deletion wins.
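The three rules above can be condensed into a small decision function. The following Python sketch only illustrates the logic as described in this article; it is not how Microsoft 365 is implemented. The `Policy` class and the `resolve` function are hypothetical names, and the sketch deliberately ignores Rules 4 to 6.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Policy:
    """Hypothetical model of one policy: an action and a period in years."""
    action: str  # "retain" or "delete"
    years: int


def resolve(policies: List[Policy]) -> Optional[Policy]:
    """Pick the winning policy using Rules 1-3 only."""
    retains = [p for p in policies if p.action == "retain"]
    deletes = [p for p in policies if p.action == "delete"]
    if retains:
        # Rule 1: any retention beats every deletion.
        # Rule 2: among retentions, the longest period wins.
        return max(retains, key=lambda p: p.years)
    if deletes:
        # Rule 3: among deletions only, the shortest period wins.
        return min(deletes, key=lambda p: p.years)
    return None


print(resolve([Policy("retain", 5), Policy("delete", 3)]))
print(resolve([Policy("retain", 7), Policy("retain", 5)]))
print(resolve([Policy("delete", 2), Policy("delete", 3)]))
```

The three calls mirror the examples used for Rule 1, Rule 2, and Rule 3.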



Rule # 4

Explicit over Implicit


So, here we have the Retention Policies in MS Purview that were published to SharePoint Site. These policies were automatically applied to the Items within the SharePoint Site. And this application of Retention is called Implicit.

Now suppose some users are not happy with the auto-applied policies. They can manually apply Retention Labels that were published to the SharePoint Site.


Manual application of the Retention Labels is called Explicit. And so, manual or explicit application of Retention always wins.



First user applies a Label that says Retain an item for 5 years. If you recall what you just learned, normally the longest retention period wins. However, since she explicitly applies Retention, in this case she will override implicit Retention regardless of the duration of Retention.


Second user applies a label that says Delete after 10 years. Normally the shortest Deletion period would win, but since he explicitly applies this Label, the number of years is irrelevant.


Third user applies a label that says Delete after 2 years to a document that was supposed to be retained for 5 years. As you recall, Retention wins over Deletion (the Keep it Safe rule). But again, since this is an explicit manual action, it overrides the Keep it Safe rule, because the 5-year retention was applied implicitly.


Rule # 5

Narrow over Wide


In Microsoft 365, we have the following scopes of Retention Policies:

  • Org-Wide Retention Policies: These policies cover the entire organization, applying to all SharePoint Sites, OneDrive accounts, and other content locations.

  • Site-Specific Retention Policies: These apply to individual SharePoint Sites, providing a narrower scope than Org-Wide Policies.

Example:

Mailbox-Specific Retention Policies (Exchange Online): In Exchange Online, you can assign Retention Policies directly to individual mailboxes, which take precedence over broader Org-Wide Policies for those mailboxes.

This Rule of Retention ensures that policies with a narrower focus take precedence over broader settings so that specific needs are prioritized.

Additionally, you can apply Retention Labels to individual libraries. If you have a Retention Policy applied to an entire SharePoint Site, a Retention Label applied at the library level will override the Site Retention Policy. In this case, the Retention Label follows the Explicit over Implicit rule we discussed earlier.


Rule # 6

Hold Wins over Deletion


This rule is for Legal Holds or eDiscovery holds.

Here’s how it works: if there’s a deletion policy in place, say to delete files after three years, and that content is placed under a legal hold, the hold takes priority. This pauses the deletion policy, keeping the content preserved as long as the hold is active.

To sum it up:

  1. Holds always take priority over deletion.

  2. Holds pause any deletion policy, ensuring content stays protected for legal or compliance needs.
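As a rough illustration of Rule # 6, the check below treats an active hold as a switch that blocks deletion no matter how old the content is. This is a simplified sketch in Python, not Purview's actual behavior in code form; the function name `deletion_due` and its parameters are hypothetical.

```python
from datetime import date


def deletion_due(created: date, delete_after_years: int,
                 on_hold: bool, today: date) -> bool:
    """Return True only if the deletion period has passed and no hold is active."""
    if on_hold:
        # Rule 6: an active hold pauses deletion entirely.
        return False
    try:
        due = created.replace(year=created.year + delete_after_years)
    except ValueError:
        # Feb 29 in a non-leap target year: this sketch falls back to Feb 28.
        due = created.replace(year=created.year + delete_after_years, day=28)
    return today >= due


# A file created in 2020 with a "delete after three years" policy, checked in 2024.
print(deletion_due(date(2020, 1, 1), 3, on_hold=False, today=date(2024, 1, 1)))
print(deletion_due(date(2020, 1, 1), 3, on_hold=True, today=date(2024, 1, 1)))
```

Once the hold is released, the same check with `on_hold=False` returns the normal result, which matches the idea that the hold pauses the deletion policy rather than cancelling it.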


Microsoft provides a Retention Flowchart to visualize these rules, but I found it slightly confusing, particularly for people who do not yet know all the Rules of Retention well. That’s part of why I created this article and the video, which you can find below. Regardless, you’re now well-equipped to manage retention in Microsoft 365 with confidence.

Source: Microsoft

I suggest that you consider creating a custom diagram to reflect the specific hierarchy of the retention policies and labels published within your M365 tenant, especially if you manage complex retention needs.




 

Tatiana Slepukhin-Zamachnaia

Updated: Dec 1, 2024

DLP Limitations with Non-Text Data


So, your organization has set up Data Loss Prevention (DLP) Policies to protect sensitive information. Awesome!


But here’s the problem: what if someone takes a snapshot of sensitive information and then exfiltrates it? Another issue involves PDF files.


Now, there are two kinds of PDF files—or rather, two types of information they can contain.

First, there are text-based PDFs, where the text is digitally encoded as characters, making it selectable and searchable. Then, there are image-based PDFs, where text is stored as graphical representations, essentially images of text.


You can test if a PDF is text-based by selecting the text with your cursor. If you can highlight it, it’s text-based. Another way to confirm is by copying and pasting the text into a text editor.

Your DLP Policies can handle text-based PDFs but won’t work on image-based PDFs without OCR enabled.


Even with text-based PDFs, if the file uses proprietary text encoding, it could make the text less accessible to DLP tools.


Insiders can be clever. If they’re bad actors, they’ll probably know these limitations and won’t hesitate to use images to bypass information protection.


OCR to the rescue


Here’s where OCR (Optical Character Recognition) comes to the rescue!


By enabling OCR in Microsoft Purview, your Data Loss Prevention (DLP) Policies gain the ability to detect sensitive information within image-based PDFs and images.   


With OCR enabled, Microsoft Purview can extract and analyze text embedded in images, making it nearly impossible for bad actors to slip through unnoticed.


OCR is supported for the following M365 workloads:

  • Exchange

  • Teams

  • SharePoint

  • OneDrive for Business

  • Windows Devices


Currently the following file types are supported: JPEG, JPG, BMP and PNG.


Keep in mind, though, that OCR cannot read handwritten text—it can only recognize machine-typed text or printed text in images.

 

Configure OCR


Go to the Microsoft Purview Portal, select Settings, and then select Optical Character Recognition (OCR).


You can see that the option to enable OCR is greyed out in my tenant. This is because billing is not set up—I’m using a free Developer Tenant for this video. Normally, an organization would have billing set up, and you could enable it here.

 

OCR Estimates


Microsoft provides a free estimate tool, which is very handy, particularly if you have a lot of images.


You can try it for free by clicking this button.

When the estimates are available, you will see the "View estimations" button:



Click on it to go to the Estimates dashboard:


I have 220 images in my Tenant, so the estimated charges are 0.22 dollars:


An important note about image-based PDF files: at present, estimates for these files are not supported in SharePoint and OneDrive. Additionally, keep in mind that each page within a PDF file is counted as one distinct image. So, if you have one image-based PDF file that contains 90 pages, you will be charged for 90 images.


Microsoft Purview’s OCR charges are based on the number of unique images scanned. Once scanned, the results are reused, regardless of how many policies, users, or activities involve the image, ensuring no duplicate charges.
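To make the billing logic concrete, here is a small Python sketch of the arithmetic. The per-image rate is an assumption inferred from my own estimate above (220 images for 0.22 dollars), so check Microsoft's current pricing before relying on it. The function names are hypothetical and nothing here calls Purview.

```python
from decimal import Decimal

# Assumption: rate inferred from the example above (220 images -> 0.22 dollars).
# Check Microsoft's current pricing before relying on it.
RATE_PER_IMAGE = Decimal("0.001")


def billable_images(unique_image_ids, pdf_page_counts=()):
    """Count billable scans: each unique image once, each PDF page as one image."""
    return len(set(unique_image_ids)) + sum(pdf_page_counts)


def estimated_cost(unique_image_ids, pdf_page_counts=()):
    """Multiply the billable scan count by the assumed per-image rate."""
    return billable_images(unique_image_ids, pdf_page_counts) * RATE_PER_IMAGE


images = [f"img-{n}" for n in range(220)]
print(estimated_cost(images))        # 220 unique images
print(estimated_cost(images, [90]))  # plus one 90-page image-based PDF
```

Duplicate IDs collapse in the `set`, which reflects the point that an image already scanned is not charged again.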


Once you start the estimation process, estimates will be calculated daily until you explicitly stop it. The caveat here is that OCR and the OCR Cost Estimator can’t run simultaneously. So, if you’ve already enabled OCR and rely on it, make sure to stop the estimation process first.


Select More options here, and then click on Stop estimation.


You can always restart the estimation process; however, make sure to download the current report first.


When you start a new estimation, all existing data on the dashboards will be wiped out.



To download the current estimates, go to the Estimates dashboard and select Download Report to save the data in CSV format.


Here is an example of the CSV file:



OCR Limitations


Be aware of the following OCR limitations:

  • The maximum supported image size is 50 MB.

  • The minimum image dimensions are 50 x 50 pixels, and the maximum dimensions are 16K x 16K pixels.

  • Zipped archives cannot be scanned.

  • OCR cannot scan images embedded within Microsoft Word documents.

Some of you might be wondering: after enabling OCR, do you need to modify your existing DLP Policies? The answer is no— existing DLP Policies will automatically start scanning images.

 

OCR PowerShell Commands

 

There are OCR PowerShell cmdlets available, but at this time, Microsoft doesn’t offer any documentation on their usage—I’m sure it’s coming. In the meantime, I’ll show you a trick to find PowerShell cmdlets for newly released features or any features, for that matter.


Connect to MS Purview:

Import-Module ExchangeOnlineManagement
Connect-IPPSSession

Then, try searching for anything related to OCR, like this:

Get-Command | Where-Object {$_.Name -like "*OCR*"}

You’ll get some unrelated commands that include “OCR” in their names, but you’ll also see the following relevant ones:


Fetch current OCR Configuration/Settings:

Get-OcrConfiguration 

Create new OCR Configuration and configure settings:

New-OcrConfiguration 

Remove the current OCR Configuration:

Remove-OcrConfiguration

Modify an existing OCR Configuration:

Set-OcrConfiguration 

When creating an OCR configuration, you can enable OCR for specific locations or exclude certain locations. This is very important if you have a lot of images in your tenant, as OCR can quickly become expensive. While I showed you cost estimates for my demo tenant, it’s essentially empty. In a real organization, with many images, OCR bills can pile up fast.


For example, let’s say you have a SharePoint site where users store images of their pets or photos from corporate parties and social events. These could add up to gigabytes of images that you don’t want to scan. Who cares if someone exfiltrates a photo of your manager’s puppy?


Using either New-OcrConfiguration or Set-OcrConfiguration, you can specify Exchange, SharePoint, or OneDrive locations to include, or you can exclude specific locations using the Exception parameter, such as SharePointLocationException.


You can then extract the Locations or Exceptions using the following:

$arrayValues = (Get-Command Get-OcrConfiguration).Parameters.SharePointLocations
$arrayValues[0]

Note that you will get a NullReference Exception if you don’t have OCR Configured yet.

 

Now that you know how to configure OCR and control its costs, it’s time to configure and optimize your settings. Focus on scanning only the locations that matter and avoid wasting money on unnecessary scans.



Tatiana Slepukhin-Zamachnaia

You may have heard that you can restore a SharePoint Library or read an article mentioning this option. But when you try to find it, you realize it's not available. In this article, I'll show you how to locate this option.

Here is a screenshot of the library that I messed up, and now I need to restore this entire Library. Let’s go to the settings and look at the menu. Take a look below the 'Library Settings' option: there’s nothing there. The 'Restore this Library' option is supposed to appear right below 'Library Settings,' but it is missing.

Now watch this:

We click on 'Library Settings' to open the settings.

Then click “More library settings”.



The settings page opens, but we won’t find our option here either. Instead, we’ll locate our library in the left panel—it’s 'Version Demo 2'—and navigate back to it. Let’s click on it.



We’re now back in the library where we started. But this time, if we go to the settings, the 'Restore this library' option has appeared right under 'Library Settings.'



Let’s select 'Restore this library,' and as you can see, we’re now on the restore page.


If you decide to restore the library later and click Cancel, when you return to the library, you’ll have to go through the same acrobatics we just did to navigate back to the 'Restore' page. As you will see, the 'Restore this library' option has disappeared again.


If you repeat the steps, you’ll notice that this behavior is consistent. You’ll have to use this workaround every time you need to restore the library, unless Microsoft fixes it.


Now that you know how to find this option, I caution you to use it carefully until you fully understand how it works. I’m preparing an article where I’ll show you how easy it is to mess up a library and lose all your files if you're not mindful of some quirks the Restore option has.  


Watch the video that shows this behavior step-by-step:


