

This research paper from Microsoft on Farsite claims 'up to 50%' savings from deduplication with a convergent file system - but that was measured across 500 computers in a corporate environment, back in 2002. Users now store far more photos and far more of their own video, and any content that is DRM'd is also unique per user. There is nothing 'finally' about this additional information: the discussion and criticism of the claims on Twitter already assumed convergent encryption, with the key derived from the content. Plenty remains unanswered - such as how an 'intelligent cache' makes 'unlimited' storage available offline. I really wish these guys would release a research paper with their results, or put more detail on their website, before making such bold claims in public.
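For anyone unfamiliar with the scheme being discussed: in convergent encryption the key is a hash of the plaintext itself, so two users who encrypt the same file independently produce the same ciphertext, and the server can dedupe without ever holding a key. A minimal sketch, using SHA-256 for key derivation and a toy hash-based keystream in place of a real cipher (a production system would use something like AES-CTR):

```python
import hashlib

def _keystream(key: bytes, length: int) -> bytes:
    # Expand the key into a keystream. Toy construction for illustration
    # only; a real system would use a proper cipher mode such as AES-CTR.
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def convergent_encrypt(plaintext: bytes) -> tuple[bytes, bytes]:
    # The key is derived from the content, so identical plaintexts from
    # different users yield identical ciphertexts -- the property that
    # lets the storage provider dedupe across users.
    key = hashlib.sha256(plaintext).digest()
    stream = _keystream(key, len(plaintext))
    ciphertext = bytes(p ^ k for p, k in zip(plaintext, stream))
    return key, ciphertext

def convergent_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    # XOR stream cipher: decryption is the same operation as encryption.
    stream = _keystream(key, len(ciphertext))
    return bytes(c ^ k for c, k in zip(ciphertext, stream))

# Two users storing the same file produce byte-identical ciphertexts,
# so the server stores one copy; each user keeps only the small key.
k1, c1 = convergent_encrypt(b"same song bytes")
k2, c2 = convergent_encrypt(b"same song bytes")
assert k1 == k2 and c1 == c2
assert convergent_decrypt(k1, c1) == b"same song bytes"
```

The flip side, which the criticism above hinges on, is that any content unique to one user (DRM'd, tagged, self-ripped with odd settings) gains nothing from this scheme.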

The computation involved in keying media on the fly as it is being downloaded is not insignificant at volume, and the added pain of storing everyone's unique keys also discourages this approach. At worst you'll see different keys used per region or datacenter, or perhaps key rotation on a scale of weeks to months. Some media (both DRM'd and not) will be trivially unique because of metadata like purchaser info or music tags. In some cases that makes only the first block unique while all later blocks dedupe; in other cases you need to be somewhat content-aware so you can treat the header data separately from the actual media data. Being content-aware also lets you catch a lot of data people ripped themselves using standard settings. You can save on operating system and application files, but it isn't 60%.
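The "first block unique, later blocks deduped" point is easy to see with block-level hashing: split each file into fixed-size blocks and index them by hash, so two files that differ only in a metadata header still share every block after the first. A small sketch (the tiny block size and the fake purchaser-tag header are illustrative assumptions; real systems use block sizes on the order of kilobytes):

```python
import hashlib

BLOCK_SIZE = 4  # tiny for illustration; real systems use e.g. 64 KiB

def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    # Hash each fixed-size block; the store keeps one copy per unique hash.
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]

# Two copies of the "same" track that differ only in a purchaser tag:
file_a = b"TAGA" + b"musicmusicmusic"
file_b = b"TAGB" + b"musicmusicmusic"

hashes_a = block_hashes(file_a)
hashes_b = block_hashes(file_b)

# Only the first block (the metadata header) differs; every later
# block hashes identically and can be stored once for both users.
shared = sum(1 for x, y in zip(hashes_a, hashes_b) if x == y)
assert hashes_a[0] != hashes_b[0]
assert shared == len(hashes_a) - 1
```

Being content-aware goes one step further: instead of relying on fixed block boundaries lining up, the system parses the container format and hashes the header and the media payload as separate units.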
