
Pausing Jobs and How This Could Affect Deduplication


Michael Hadder


If you ever need to pause a job, you should know how this can affect deduplication. In general, pausing a job has no effect on the job itself, but that can change depending on which other jobs are running or are started while it is paused. Here is an example of what can happen:

  • We start a job called Job A. Then we pause Job A and start another job (Job B) in the same custodian.
  • We are deduplicating at the custodian level.
  • Let's say Job A processed 10 files before it was paused, and one of those files also appears in Job B. Job B will list its copy as a duplicate of the item in Job A.
  • We let Job B finish, then unpause Job A and let it finish.
  • The last item in Job A is a duplicate of an item in Job B, so Job A will deduplicate it out, keeping the item from Job B.
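The sequence above can be sketched as a small simulation. This is only an illustrative model of "first job to process an item becomes the reference" and not the product's actual implementation; the `CustodianIndex` class, its `process` method, and the file names and hashes are all hypothetical.

```python
class CustodianIndex:
    """Hypothetical custodian-level dedup index.

    The first (job, item) to present a given content hash becomes the
    reference; any later item with the same hash is marked a duplicate.
    """

    def __init__(self):
        self.reference = {}  # content hash -> (job, item name) seen first

    def process(self, job, item, content_hash):
        ref = self.reference.setdefault(content_hash, (job, item))
        if ref == (job, item):
            return "kept"
        return f"duplicate of {ref[0]}/{ref[1]}"


index = CustodianIndex()
log = []

# Job A processes a file, then is paused.
log.append(("A", "a1.doc", index.process("A", "a1.doc", "hash-x")))

# Job B runs to completion while Job A is paused.
log.append(("B", "b1.doc", index.process("B", "b1.doc", "hash-x")))
log.append(("B", "b2.doc", index.process("B", "b2.doc", "hash-y")))

# Job A is unpaused and processes its last item.
log.append(("A", "a2.doc", index.process("A", "a2.doc", "hash-y")))

for job, item, status in log:
    print(job, item, status)
```

In this model each job ends up with one item kept and one item marked as a duplicate of the other job, which is the cross-linking the example describes.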

In this case, each job has an item that was deduplicated against the other. Whichever job processes an item first becomes the reference, and every other job marks its copy as a duplicate, regardless of which job finishes first or whether a job was paused. Let's go over another scenario:

  • Let's start Job A and let it get through the same 10 documents again, then pause it.
  • Now we start Job B and let it finish. One of Job B's items is a duplicate of an item that Job A has already processed, so it gets deduplicated.
  • Now we delete Job A. What happens to the item in Job B? It is still marked as a duplicate, and will still be deduplicated.

Now we have a problem: an item is still marked as a duplicate even though the job holding the original has been deleted. If we had deleted Job A before starting Job B, this would not have happened, but because the jobs were active at the same time, they are now "linked" by their duplicate items, and deleting one job potentially means re-running the other.
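One way to reason about which jobs might need re-running is to look for items whose reference job no longer exists. The sketch below assumes you have some per-item record of which job an item was deduplicated against; the `Item` record, its `duplicate_of_job` field, and the `orphaned_duplicates` helper are hypothetical and do not correspond to any real export format or API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Item:
    job: str                                # job the item belongs to
    name: str                               # file name (illustrative)
    duplicate_of_job: Optional[str] = None  # None means the item was kept


def orphaned_duplicates(items, existing_jobs):
    """Return items marked as duplicates of a job that no longer exists."""
    return [
        item
        for item in items
        if item.duplicate_of_job is not None
        and item.duplicate_of_job not in existing_jobs
    ]


# Job B finished while Job A was paused; Job A was then deleted.
items = [
    Item("B", "report.docx", duplicate_of_job="A"),
    Item("B", "notes.txt"),
]
existing_jobs = {"B"}

orphans = orphaned_duplicates(items, existing_jobs)
for item in orphans:
    print(f"{item.job}/{item.name} is still a duplicate of deleted job {item.duplicate_of_job}")
```

Any job that owns an item returned by this check is a candidate for re-running after the other job is deleted.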

 

 


Archived

This topic is now archived and is closed to further replies.
