| 
 | |
|           DVD subtitles
 | |
|          ---------------
 | |
| 
 | |
| 
 | |
|   0. Introduction
 | |
|   1. Basics
 | |
|   2. The data structure
 | |
|   3. Reading the control header
 | |
|   4. Decoding the graphics
 | |
|   5. What I do not know yet / What I need
 | |
|   6. Thanks
 | |
|   7. Changes
 | |
| 
 | |
| 
 | |
| 
 | |
| 
 | |
| 
 | |
| The latest version of this document can be found here:
 | |
| http://www.via.ecp.fr/~sam/doc/dvd/
 | |
| 
 | |
| 
 | |
| 
 | |
| 
 | |
| 
 | |
| 0. Introduction
 | |
| 
 | |
|   One of the last things we missed in DVD decoding under my system was the
 | |
| decoding of subtitles. I found no information on the web or Usenet about them,
 | |
| apart from a few words on them being run-length encoded in the DVD FAQ.
 | |
| 
 | |
|   So we decided to reverse-engineer their format (it's completely legal in
 | |
| France, since we did it for interoperability purposes), and managed to get
 | |
| almost all of it.
 | |
| 
 | |
| 
 | |
| 
 | |
| 
 | |
| 
 | |
| 1. Basics
 | |
| 
 | |
|   DVD subtitles are hidden in private PS packets (0x000001ba), just like AC3
 | |
| streams are.
 | |
| 
 | |
|   Within the PS packet, there are PES packets, and as with AC3, the ones
 | |
| containing subtitles start with a 0x000001bd header.
 | |
|   Where AC3 uses an ID like (0x80 + x), subtitles use an ID equal to
 | |
| (0x20 + x), where x is the subtitle number. Thus there seem to be only
 | |
| 16 possible different subtitles on a DVD (my Taxi Driver copy has 16).
 | |
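A minimal Python sketch of the ID rule above. The function name is hypothetical, and the 0x20..0x2f range simply reflects the 16 subtitles assumed in this section; how the ID byte is located inside the PES payload is not covered here.

```python
def subtitle_number(stream_id: int):
    """Return x if stream_id is a subtitle ID (0x20 + x), else None.

    Assumes only 16 subtitle IDs exist, as guessed in this document.
    """
    if 0x20 <= stream_id <= 0x2f:
        return stream_id - 0x20
    return None
```

For instance, `subtitle_number(0x23)` gives 3, while an AC3 ID such as 0x80 gives `None`.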
| 
 | |
|   I'll suppose you know how to extract AC3 from a DVD, and jump to the
 | |
| interesting part of this documentation. Anyway you're unlikely to have
 | |
| understood what I said without already being familiar with MPEG2.
 | |
| 
 | |
| 
 | |
| 
 | |
| 
 | |
| 
 | |
| 2. The data structure
 | |
| 
 | |
| A subtitle packet, after its parts have been collected and appended, looks
 | |
| like this :
 | |
| 
 | |
|    +----------------------------------------------------------+    
 | |
|    |                                                          |
 | |
|    |   0    2                                         size    |
 | |
|    |   +----+------------------------+-----------------+      |
 | |
|    |   |size|       data packet      |     control     |      |
 | |
|    |   +----+------------------------+-----------------+      |
 | |
|    |                                                          |
 | |
|    |                     a subtitle packet                    |
 | |
|    |                                                          |
 | |
|    +----------------------------------------------------------+    
 | |
| 
 | |
| size is a 2-byte word; the data packet and the control packet may have any size.
 | |
| 
 | |
| 
 | |
| Here is the structure of the data packet :
 | |
| 
 | |
|    +----------------------------------------------------------+    
 | |
|    |                                                          |
 | |
|    |   2    4                                        S0+2     |
 | |
|    |   +----+------------------------------------------+      |
 | |
|    |   | S0 |                  data                    |      |
 | |
|    |   +----+------------------------------------------+      |
 | |
|    |                                                          |
 | |
|    |                      the data packet                     |
 | |
|    |                                                          |
 | |
|    +----------------------------------------------------------+    
 | |
| 
 | |
| S0, the data packet size, is a 2-byte word.
 | |
| 
 | |
| 
 | |
| Finally, here's the structure of the control packet :
 | |
| 
 | |
|    +----------------------------------------------------------+    
 | |
|    |                                                          |
 | |
|    | S0+2  S0+4                                 S1       size |
 | |
|    |   +----+---------+---------+--+---------+--+---------+   |
 | |
|    |   | S1 |ctrl seq |ctrl seq |..|ctrl seq |ff| end seq |   |
 | |
|    |   +----+---------+---------+--+---------+--+---------+   |
 | |
|    |                                                          |
 | |
|    |                     the control packet                   |
 | |
|    |                                                          |
 | |
|    +----------------------------------------------------------+    
 | |
| 
 | |
| To summarize :
 | |
| 
 | |
|  - S1, at offset S0+2, the position of the end sequence
 | |
|  - several control sequences
 | |
|  - the 'ff' byte
 | |
|  - the end sequence
 | |
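The three pictures above can be turned into a short Python sketch that cuts a reassembled packet into its parts. It is only an illustration: the function name is hypothetical, and it assumes that size and S0 are stored most significant byte first, like S1 in the control header example of section 3.

```python
import struct


def split_subtitle_packet(packet: bytes):
    """Split a reassembled subtitle packet into (data, control, end_seq)."""
    # Assumption: size and S0 are big-endian 2-byte words.
    size, s0 = struct.unpack_from(">HH", packet, 0)
    if size != len(packet):
        raise ValueError("size word does not match the packet length")
    control_start = s0 + 2                    # the control packet begins here
    (s1,) = struct.unpack_from(">H", packet, control_start)
    if not control_start + 2 <= s1 <= size:
        raise ValueError("S1 points outside the control packet")
    data = packet[4:control_start]            # bytes after the S0 word
    control = packet[control_start + 2:s1]    # control sequences and 'ff' byte
    end_seq = packet[s1:size]                 # the end sequence
    return data, control, end_seq
```

The function returns plain byte strings, so each part can be handed to a separate parser.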
| 
 | |
| 
 | |
| 
 | |
| 
 | |
| 
 | |
| 3. Reading the control header
 | |
| 
 | |
| The first thing to read is the control sequences. There are several
 | |
| types of them, and each type is determined by its first byte. As far
 | |
| as I know, each type has a fixed length.
 | |
| 
 | |
|  * type 0x01 : '01' - 1 byte
 | |
|   it seems to be an empty control sequence.
 | |
| 
 | |
|  * type 0x03 : '03wxyz' - 3 bytes
 | |
|   this one has the palette information ; it basically says 'encoded color 0
 | |
|  is the wth color of the palette, encoded color 1 is the xth color', and so on.
 | |
| 
 | |
|  * type 0x04 : '04wxyz' - 3 bytes
 | |
|   I *think* this is the alpha channel information ; I only saw values of 0 or f
 | |
|  for those nibbles, so I can't really be sure, but it seems plausible.
 | |
| 
 | |
|  * type 0x05 : '05xxxXXXyyyYYY' - 7 bytes
 | |
|   the coordinates of the subtitle on the screen :
 | |
|    xxx is the first column of the subtitle
 | |
|    XXX is the last column of the subtitle
 | |
|    yyy is the first line of the subtitle
 | |
|    YYY is the last line of the subtitle
 | |
|   thus the subtitle's size is (XXX-xxx+1) x (YYY-yyy+1)
 | |
| 
 | |
|  * type 0x06 : '06xxxxyyyy' - 5 bytes
 | |
|   xxxx is the position of the first graphic line, and yyyy is the position of
 | |
|  the second one (the graphics are interlaced, so it helps a lot :p)
 | |
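The list above maps naturally onto a small parser. The following Python sketch is an illustration only: the function name and the dictionary keys are hypothetical, the multi-byte positions of type 0x06 are assumed to be stored most significant byte first, and any type not listed above is rejected because its length is unknown.

```python
def parse_control_sequences(control: bytes):
    """Parse control sequences up to the 'ff' byte; return a dict of fields."""
    info = {}
    pos = 0
    while pos < len(control):
        kind = control[pos]
        if kind == 0xff:              # 'ff' byte closing the control sequences
            break
        if kind == 0x01:              # seemingly empty sequence
            pos += 1
        elif kind in (0x03, 0x04):    # four nibbles w, x, y, z
            b1, b2 = control[pos + 1], control[pos + 2]
            nibbles = [b1 >> 4, b1 & 0xf, b2 >> 4, b2 & 0xf]
            info["palette" if kind == 0x03 else "alpha"] = nibbles
            pos += 3
        elif kind == 0x05:            # xxx XXX yyy YYY, 12 bits each
            b = control[pos + 1:pos + 7]
            x0 = (b[0] << 4) | (b[1] >> 4)
            x1 = ((b[1] & 0xf) << 8) | b[2]
            y0 = (b[3] << 4) | (b[4] >> 4)
            y1 = ((b[4] & 0xf) << 8) | b[5]
            info["columns"] = (x0, x1)
            info["lines"] = (y0, y1)
            info["size"] = (x1 - x0 + 1, y1 - y0 + 1)
            pos += 7
        elif kind == 0x06:            # positions of the two interlaced pictures
            first = int.from_bytes(control[pos + 1:pos + 3], "big")
            second = int.from_bytes(control[pos + 3:pos + 5], "big")
            info["offsets"] = (first, second)
            pos += 5
        else:
            raise ValueError(f"unknown control sequence type 0x{kind:02x}")
    return info
```

The input is expected to be the bytes that follow the S1 word, up to and including the 'ff' byte.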
| 
 | |
| The end sequence has this structure:
 | |
| 
 | |
|  xxxx yyyy 02 ff (ff)
 | |
| 
 | |
|  it ends with 'ff' or 'ffff', to make the whole packet have an even length.
 | |
| 
 | |
| FIXME: I absolutely don't know what xxxx is. I suppose it may be some date
 | |
| information since I found it nowhere else, but I can't be sure.
 | |
| 
 | |
|  yyyy is equal to S1 (see picture).
 | |
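Since yyyy repeats S1, the end sequence can at least be used as a sanity check. A minimal Python sketch, assuming big-endian words; the function name is hypothetical, and xxxx is returned untouched because its meaning is unknown.

```python
def read_end_sequence(end_seq: bytes, s1: int) -> int:
    """Check an end sequence 'xxxx yyyy 02 ff (ff)' and return xxxx."""
    if len(end_seq) not in (6, 7):
        raise ValueError("unexpected end sequence length")
    xxxx = int.from_bytes(end_seq[0:2], "big")
    yyyy = int.from_bytes(end_seq[2:4], "big")
    if yyyy != s1:
        raise ValueError("yyyy differs from S1")
    # The sequence must finish with 02 followed by one or two ff bytes.
    if end_seq[4] != 0x02 or any(b != 0xff for b in end_seq[5:]):
        raise ValueError("end sequence does not finish with 02 ff (ff)")
    return xxxx
```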
| 
 | |
| 
 | |
| Example of a control header :
 | |
| ----
 | |
| 0A 0C 01 03 02 31 04 0F F0 05 00 02 CF 00 22 3E 06 00 06 04 E9 FF 00 93 0A 0C 02 FF
 | |
| ----
 | |
| Let's decode it. First of all, S1 = 0x0a0c.
 | |
| 
 | |
| The control sequences are :
 | |
|  01
 | |
|    Nothing to say about this one
 | |
|  03 02 31
 | |
|    Color 0 is 0, color 1 is 2, color 2 is 3, and color 3 is 1.
 | |
|  04 0F F0
 | |
|    Colors 0 and 3 are transparent, and colors 1 and 2 are opaque (not sure of this one)
 | |
|  05 00 02 CF 00 22 3E
 | |
|    The first column is 0x000, the last one is 0x2cf, the first line is 0x002, and
 | |
|    the last line is 0x23e. Thus the subtitle's size is 0x2d0 x 0x23d.
 | |
|  06 00 06 04 E9
 | |
|    The first encoded image starts at offset 0x0006, and the second one starts at 0x04e9.
 | |
| 
 | |
| And the end sequence is :
 | |
|   00 93 0A 0C 02 FF
 | |
|    Which means... well, not much for now. We can at least verify that S1 (0x0a0c) is
 | |
|    there.
 | |
| 
 | |
| 
 | |
| 
 | |
| 
 | |
| 
 | |
| 4. Decoding the graphics
 | |
| 
 | |
|    The graphics are rather easy to decode (at least, when you know how to do it - it
 | |
|   took us one whole week to figure out what the encoding was :p).
 | |
| 
 | |
|    The picture is interlaced; for instance, a 40-line picture is stored like this:
 | |
| 
 | |
|   line 0  ---------------#----------
 | |
|   line 2  ------#-------------------
 | |
|    ...
 | |
|   line 38 ------------#-------------
 | |
|   line 1  ------------------#-------
 | |
|   line 3  --------#-----------------
 | |
|    ...
 | |
|   line 39 -------------#------------
 | |
| 
 | |
|    When decoding you should get:
 | |
| 
 | |
|   line 0  ---------------#----------
 | |
|   line 1  ------------------#-------
 | |
|   line 2  ------#-------------------
 | |
|   line 3  --------#-----------------
 | |
|    ...
 | |
|   line 38 ------------#-------------
 | |
|   line 39 -------------#------------
 | |
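The reordering shown above is a plain interleave of the two stored pictures. A minimal Python sketch, assuming each picture has already been decoded into a list of lines (the function name is hypothetical):

```python
def weave(even_lines, odd_lines):
    """Interleave two pictures: even_lines holds lines 0, 2, ...; odd_lines 1, 3, ..."""
    picture = []
    for i in range(max(len(even_lines), len(odd_lines))):
        if i < len(even_lines):
            picture.append(even_lines[i])
        if i < len(odd_lines):
            picture.append(odd_lines[i])
    return picture
```

With an odd number of lines, the first list is simply one element longer than the second.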
| 
 | |
|    Computers with weak processors could, for instance, decode only the even
 | |
|   lines to save some time.
 | |
| 
 | |
| 
 | |
|    The graphics are run-length encoded, with the following alphabet:
 | |
| 
 | |
|    0xf
 | |
|    0xe
 | |
|    0xd
 | |
|    0xc
 | |
|    0xb
 | |
|    0xa
 | |
|    0x9
 | |
|    0x8
 | |
|    0x7
 | |
|    0x6
 | |
|    0x5
 | |
|    0x4
 | |
|    0x3-
 | |
|    0x2-
 | |
|    0x1-
 | |
|    0x0f-
 | |
|    0x0e-
 | |
|    0x0d-
 | |
|    0x0c-
 | |
|    0x0b-
 | |
|    0x0a-
 | |
|    0x09-
 | |
|    0x08-
 | |
|    0x07-
 | |
|    0x06-
 | |
|    0x05-
 | |
|    0x04-
 | |
|    0x03--
 | |
|    0x02--
 | |
|    0x01--
 | |
|    0x0000
 | |
| 
 | |
|    '-' stands for any other nibble. Once a sequence X of this alphabet has
 | |
|   been read, the pixels can be displayed : (X >> 2) is the number of pixels
 | |
|   to display, and (X & 0x3) is the color of the pixel.
 | |
| 
 | |
|    For instance, 0x23 means "8 pixels of color 3".
 | |
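Reading one sequence of the alphabet boils down to pulling nibbles until the value is large enough for its length. The Python sketch below is one way to write it, not necessarily how the enclosed source does it; the function name is hypothetical and positions are counted in nibbles, the high nibble of each byte coming first.

```python
def read_code(data: bytes, pos: int):
    """Read one sequence X starting at nibble index pos; return (X, next_pos)."""
    def nibble(i):
        byte = data[i // 2]
        return byte >> 4 if i % 2 == 0 else byte & 0xf

    x = nibble(pos)
    length = 1
    # 1 nibble: 0x4..0xf, 2 nibbles: 0x1-..0x3-, 3 nibbles: 0x04-..0x0f-,
    # 4 nibbles: everything else (0x01--..0x03-- and 0x0000).
    for limit in (0x4, 0x10, 0x40):
        if x >= limit:
            break
        x = (x << 4) | nibble(pos + length)
        length += 1
    return x, pos + length
```

The pixel count and color then follow from the formulas above: `x >> 2` and `x & 0x3`. Four-nibble values from 0x0001 to 0x00ff are not part of the alphabet listed here; the sketch returns them as-is and leaves their handling to the caller.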
| 
 | |
|    "0000" has a special meaning : it's a carriage return. The decoder should
 | |
|   do a carriage return when reaching the end of the line, or when encountering
 | |
|   this "0000" sequence. When doing a carriage return, the parser should skip
 | |
|   to the next even nibble position, i.e. the next byte boundary (a line cannot
 | |
|   start in the middle of a byte).
 | |
| 
 | |
|    After a carriage return, the parser should read the next line from the
 | |
|   other interlaced picture, swapping pictures like this after each carriage
 | |
|   return.
 | |
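Putting the carriage return and the picture swapping together gives the following Python sketch. It is an illustration under several assumptions of mine: the function name is hypothetical, the two offsets are byte indices into the buffer passed in, line 0 comes from the first picture, lines cut short by "0000" are padded with color 0, and runs that overshoot the width are clipped.

```python
def decode_picture(data: bytes, offsets, width: int, height: int):
    """Decode an interlaced RLE picture into a list of rows of color indices."""
    def nibble(i):
        byte = data[i // 2]
        return byte >> 4 if i % 2 == 0 else byte & 0xf

    def read_code(pos):
        x, length = nibble(pos), 1
        for limit in (0x4, 0x10, 0x40):
            if x >= limit:
                break
            x = (x << 4) | nibble(pos + length)
            length += 1
        return x, pos + length

    # One nibble cursor per interlaced picture (even lines, odd lines).
    cursors = [offsets[0] * 2, offsets[1] * 2]
    rows = []
    for y in range(height):
        field = y & 1                       # swap pictures after each line
        pos = cursors[field]
        row = []
        while len(row) < width:             # end of line acts as a carriage return
            x, pos = read_code(pos)
            if x == 0:                      # "0000": explicit carriage return
                break
            row.extend([x & 0x3] * (x >> 2))
        row = (row + [0] * width)[:width]   # assumption: pad with 0, clip overshoot
        cursors[field] = pos + (pos & 1)    # skip to the next byte boundary
        rows.append(row)
    return rows
```

Rows come out already in display order, so no separate reordering pass is needed. Decoding only the even lines amounts to skipping the iterations where `field` is 1.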
| 
 | |
|    Perhaps I don't explain this very well, so you'd better have a look at
 | |
|   the enclosed source.
 | |
| 
 | |
| 
 | |
| 
 | |
| 
 | |
| 
 | |
| 5. What I do not know yet / What I need
 | |
| 
 | |
| I don't know what's in the end sequence yet.
 | |
| 
 | |
| Also, I don't know exactly when to display subtitles, and when to remove them.
 | |
| 
 | |
| I don't know if there are other types of control sequences (in my programs I consider
 | |
| 0xff as a control sequence type, as well as 0x02. I don't know if it's correct or not,
 | |
| so please comment on this).
 | |
| 
 | |
| I don't know what the "official" color palette is.
 | |
| 
 | |
| I don't know how to handle transparency information.
 | |
| 
 | |
| I don't know if this document is generic enough.
 | |
| 
 | |
| So what I need is you :
 | |
| 
 | |
|  - if you can, patch this document or my programs to fix strange behaviour with your subtitles.
 | |
| 
 | |
|  - send me your subtitles (there's a program to extract them enclosed) ; the first 10 KB
 | |
|   of subtitles in a VOB should be enough, but it would be cool if you sent me one subtitle
 | |
|   file per language.
 | |
| 
 | |
| 
 | |
| 
 | |
| 
 | |
| 
 | |
| 6. Thanks
 | |
| 
 | |
|  Thanks to Michel Lespinasse <walken@via.ecp.fr> for his great help in understanding
 | |
| the RLE stuff, and for all the ideas he had.
 | |
| 
 | |
|  Thanks to mass (David Waite) and taaz (David I. Lehn) from irc at
 | |
| openprojects.net for sending me their subtitles.
 | |
| 
 | |
| 
 | |
| 
 | |
| 
 | |
| 
 | |
| 7. Changes
 | |
| 
 | |
|  20000116: added the 'changes' section.
 | |
|  20000116: added David Waite's and David I. Lehn's name.
 | |
|  20000116: changed "x0" and "x1" to "S0" and "S1" to make it less confusing.
 | |
| 
 | |
| 
 | |
| 
 | |
| 
 | |
| -- 
 | |
| Paris, January 16th 2000
 | |
| Samuel Hocevar <sam@via.ecp.fr>
 |