\documentclass[10pt]{article}
\usepackage{amsmath}
\usepackage{graphicx}
\usepackage{multicol}
\usepackage{vmargin}
%\addtolength{\textwidth}{\marginparwidth}
\setmarginsrb{2cm}{2cm}{2cm}{2cm}{0pt}{0mm}{0pt}{20mm}
\title{Cryptanalysis of Contents Scrambling System}
\author{Frank A. Stevenson \texttt{frank@funcom.com}}
\date{8th November 1999\\Updated 13 Nov.}
\begin{document}
\maketitle
\abstract{CSS is a scrambling system used in the distribution for movies on DVD (Digital Versatile Disc) a high capacity CD like storage system. Its main purpose is to prevent the unauthorized duplication of disc contents. This is achieved through encrypting the files, and storing keys in hardware. Here we will describe the system, and show that even if the keys can be securely stored in hardware, the data will not be protected from unauthorized copying. Severe weaknesses in the ciphers effectively voids the need for the hardware keys when decrypting the content.
Conversion from HTML to PostScript has been done by Thorsten Fischer \texttt{frosch@cs.tu-berlin.de}. This document is available at \texttt{http://www.derfrosch.de/}. The text has been adapted to suit the PostScript format - there were no changes of the content. The original document may be found at \texttt{http://crypto.gq.nu/.}}
\begin{multicols}{2}
\section*{General Disclaimer}
This information is provided as is, with no warranties on its accuracy or usability. It is based on a piece of source code claiming to be the css algorithms, and which have since been confirmed to interoperate with the CSS system. The author has not read any official CSS documentation, and any errors in the terminology is a result of this. This information has not to the knowledge of the author been made available through breaches of the DVD consortium Non Disclosure Agreement.
\section{System Overview}
Every DVD player is equipped with a small set of player keys. When presented with a new disc, the player will attempt to decrypt the contents with the set of keys it possesses. Every disk has a disk key data block that is organized as follows:
\begin{itemize}
\item 5 bytes hash of decrypted disk key ($hash$)
\item disk key encrypted with player key~1 ($dk_{1}$)
\item disk key encrypted with player key~2 ($dk_{2}$)
\item \ldots
\item disk key encrypted with player key~409 ($dk_{409}$)
\end{itemize}
Suppose the player has a valid key for slot 213, it will calculate
\begin{equation}\label{eqn1}
K_{d} = D_{A} (dk_{213}, K_{p213})
\end{equation}
To verify that Kd is correct, the following check is done, if the check fails, it will try the next player key.
\begin{equation}\label{eqn2}
K_{d} = D_{A} (hash, K_{d})
\end{equation}
An obvious weakness stems from this check, by trying all $2^{40}$ possible $K_{d}$, disk key can be deduced without knowing any valid player key. As will be shown later, this attack can be carried out with a complexity of $2^{25}$, making such an attack feasible in runtime applications. Another obvious attack is that by having 1 working player key, other player keys can be derived through a similar search. This can be done offline, also keys obtained from the former attack can be used as a starting point.
To decrypt the contents an additional key $tk$ - the title key is decrypted with the now decrypted and verified disk key.
\begin{equation}\label{eqn3}
K_{t} = D_{B} (tk, K_{d})
\end{equation}
Each sector of the data files is the optionally encrypted by a key that is derived from $K_{t}$ by exclusive or of specified bytes from the unencrypted first 128 bytes of the 2048 bytes sector. The decryption is done with the CSS stream cipher primitive described in section~\ref{streamprimitive}.
\section{CSS streamcipher primitive}\label{streamprimitive}
The CSS streamcipher is a very simplistic one, based on 2 LFSRs being added together to produce output bytes. There is no truncation, both LFSR are clocked 8 times for every byte output, and there are 4 ways of combining the output of the LFSRs to an output byte. These four modes are just settings on 2 inverter switches, and the modes operation are used for the following purposes.
\begin{enumerate}
\item Authentication to DVD drive (not discussed)
\item Decryption of Disk key ($D_{A}$)
\item Decryption of Title key ($D_{B}$)
\item Decryption of data blocks.
\end{enumerate}
LFSR1: 17 bits ? taps, and is initialized by the 2 first bytes of key, and setting the most significant bit to 1 to prevent null cycling.
LFSR2: 25 bits 4 taps, is initialized with byte 3,4,5 of the key shifting all but the 3 least significant bits up 1 position, and setting bit 4 to prevent null cycling.
As new bits are clocked into the LFSRs, the same bits are clocked in with reversed order to the two LFSRs output bytes. (With optional inversion of bits.)
The output of LFSR1 is $O_{1}(1)$, $O_{1}(2)$, $O_{1}(3)$\ldots
Likewise LFSR2 produces $O_{2}(1)$, $O_{2}(2)$, $O_{2}(3)$\ldots
These two streams are combined through 8 bits addition with carry carried over to the next output. The carry bit is zero at start of stream.
\begin{equation}
\begin{split}
O(i) = O_{1}(i) + O_{2}(i) + c,\\ \text{where $c$ is carry bit from $O(i-1)$}
\end{split}
\end{equation}
This streamcipher is very weak, a trivial $2^{16}$ attack is possible with output bytes known for $i = \{1,2,3,4,5,6\}$. Guess the initial state of LFSR1, and clock out 4 bytes. $O_{2}(1)$, $O_{2}(2)$, $O_{2}(3)$, $O_{2}(4)$ can then be uniquely determined, and from them the state at $i=4$ is fully known. The guess on LFSR1 can then be verified by clocking out 2 or more bytes of the cipher and comparing the result.
Another important attack is the case when only $O(i)$ for $i = \{1,2,3,4,5\}$ is known. Guess the initial state of LFSR1, and clock out 3 bytes. Now $O_{2}(1)$,
$O_{2}(2)$ and $O_{2}(3)$ can be found as in the above attack. This will reveal all but the most significant bit of LFSR2s state at $i=3$. If both possible settings for MSB is tried, and LFSR2 is clocked backwards 24 steps, a state where bit 4 is set at i=1 can always be found. (This is stated without proof). Select the setting of the most significant bit for LFSR2 such that LFSR2 is in a legal state at $i=1$, and clock out two more bytes to verify the guess of LFSR1. For some values of $O(i = \{1,2,3,4,5\})$ multiple start states can be found, and for others none. Selecting the correct start state is not a problem, as this attack is used in situations where only the first five output bytes are of significance (encryption of keys).
\section{CSS mangling step}
When the CSS streamcipher is used to encrypt keys such as in $D_{A} (data,key)$ and $D_{B}(data,key)$, an additional mangling step is performed on the data. This cipher is best illustrated with the following block diagram.
\begin{itemize}
\item $A(1,2,3,4,5)$ are the input bytes (data)
\item $C(1,2,3,4,5)$ are the output bytes (data)
\item $k_{i} = O(i)$ where $O(i=\{1,2,3,4,5\})$ is streamcipher output from key
\item $B(1,2,3,4,5)$ are temporary stages
\end{itemize}
The cipher is evaluated top down, with exceptions indicated by an arrow.
\resizebox*{\columnwidth}{!}{\includegraphics[angle=0]{mangle.ps}}
Examples of evaluating cipher:
\begin{itemize}
\item $B(j)$ = xor $(F (A(j)), A (j-1), k_{j})$ for $j = \{2,3,4,5\}$
\item $B(1)$ = xor $(F (A(1)), B(5), k_{1})$
\item $C(j)$ = xor $(F (B(j)), B(j-1), k_{j})$ for $j = \{2,3,4,5\}$
\item $C(1)$ = xor $(F (B(1)), k_{1})$
\end{itemize}
$F$ is a function, defined by a byte permutation table. With known cipher and plaintext, the whole cipher unravels with a minimal amount of work. Here is how:
\begin{itemize}
\item Make a guess on $k_{5}$
\item $B(5) =$ xor $(F (A(5)), A(4), k_{5})$
\item $B(4) =$ xor $(F (B(5)), C(5), k_{5})$
\item $k_{4} =$ xor $(F (A(4)), A(3), B(4))$
\item $B(3) =$ xor $(F (B(4)), C(4), k_{4})$
\item $k_{3} =$ xor $(F (A(3)), A(2), B(3))$
\item $B(2) =$ xor $(F (B(3)), C(3), k_{3})$
\item $k_{2}$ = xor $(F (A(2)), A(1), B(2))$
\item $B(1) =$ xor $(F (B(2)), C(2), k_{2})$
\item $k_{1} =$ xor $(F (A(1)), B(5), B(1))$
\item verify by checking $C(1) = $ xor $(F (B(1), k_{1})$
\end{itemize}
Thus by trying 256 possibilities, we can recover 5 output bytes from the CSS streamcipher, and so recover the key by using the five known output bytes. This attack can be put to immediate use for recovering other player keys as in the notes to eqn.~\ref{eqn2},~\ref{eqn3}. Even if the player key is not recovered through the reversal of the stream cipher, the output of the streamcipher is known, and will still be usefull for decrypting disks that employ other player keys.
\section{Attacking the hash of the disk key}
Reversing the hash at the start of the disc key block is somewhat more complicated. From [\ref{eqn2}] we see that only the hash value is known, the problem is finding a disk key such that the decrypted hash equals the disk key itself. An attack of complexity $2^{25}$ proceeds as follows.
First some aspects on the value of $k_{2}$ will have to be considered. $A(1)$ and $A(2)$ is known, and a table can be build by running through every possible combination of $k_{2}$ and $B(1)$ and calculate the resulting $C(2)$. When trying to build a table of possible values $k_{2}$ of indexed by $C(2)$ and $B(1)$ there will be many values that map to the same set of indices, so a the table must be able to hold several values of $k_{2}$ in each location.
Guess the start state of LFSR1, calculate $O1 (i = \{1,2,3,4,5\})$. Next guess $B(1)$ and complete the following calculations:
\begin{itemize}
\item $k_{1}$ = xor $(F (B(1)), C(1))$\\$C(1,2)$ is known, they are the start state of LFSR1
\item $B(5) =$ xor $(F (A(1)), B(1), k1)$
\item $k_{5} =$ xor $(F (A(5)), A(4), B(5))$
\item Through the table indexed by $C(2)$ and $B(1)$ all permissible $k_{2}$ can be found, there can be from $0-8$, on average $1$. For all permissible $k_{2}$ calculate:
\begin{itemize}
\item $O_{2}(1)$, $O_{2}(2)$, and 2 possible $O_{2}(5)$. This is possible since $k_{1,2,5}$ are found.
\item For every legal initial state of LFSR2 there exists a one to one mapping to $O_{2}(1,2,5)$, by generating a table with $2^{24}$ entries the start state of LFSR2 can be found. Thus $C(1,2,3,4,5)$ is potentially known.
\item $B(4) =$ xor $(F (B(5)), C(5), k_{5})$
\item $k_{4} =$ xor $(F (A(4)), A(3), B(4))$
\item $B(3) =$ xor $(F (B(4)), C(4), k_{4})$
\item $k_{3} =$ xor $(F (A(3)), A(2), B(3))$
\item $B(2) =$ xor $(F (B(3)), C(3), k_{3})$
\item verify $k_{2} =$ xor $(F (A(2)), A(1), B(2))$, this holds for $\frac{1}{256}$ tries ($2^{17}$ altogether) and if the test holds, the key $C(1,2,3,4,5)$ can be tested by eqn.~\ref{eqn2}. If eqn.~\ref{eqn2} holds, then a key has been found that will satisfy the hash. From experience it is possible to find from zero to a few such keys to any given hash value. When multiple disc keys are found trial decryption of the files will eliminate the false keys.
\end{itemize}
\end{itemize}
This attack when implemented on a Pentium~III running 450 MHz, will recover a disk key from the hash alone in less than 18 seconds. This is clearly much less than what is to be expected of a 40 bits cipher.
\section{Conclusion}
The author has through email correspondence learned that attacks as described at~(\ref{streamprimitive}) have indeed been carried out by brute force prior to this analysis. CSS was designed with a 40 bit keylength to comply with US government export regulation, and as such it easily compromised through brute force attacks (such are the intentions of export control). Moreover the 40 bits have not been put to good use, as the ciphers succumb to attacks with much lower computational work than which is permitted in the export control rules. Whether CSS is a serious cryptographic cipher is debatable. It has been clearly been demonstrated that its strength does not match the keylength. If the cipher was intended to get security by remaining secret, this is yet another testament to the fact that security through obscurity is an unworkable principle.
\section{Further information}
I have collected links to posts that were made to the Livid project mailing list. These include the original anonymous posting of the CSS algorithm, as well as full C source code for the attacks I outline here:\\
\texttt{http://crypto.gq.nu/livid.html}
\end{multicols}
\end{document}