Ripping DVD to Matroska and H.264

Introduction
Matroska is an open source, open standards multimedia container format, similar to AVI in concept, but far more advanced in practice. This guide will introduce you to some of the tools available to create, edit, and play Matroska files.

The man pages of the programs used provide excellent documentation, but this guide will kick-start you into making high-quality rips that keep the original subtitle and audio streams while transcoding the MPEG-2 video to H.264, a codec also used on Blu-ray discs, to reduce size while preserving as much of the video quality as possible.

Matroska is supported in many video players such as VLC, MPlayer and Totem.

All of the commands involved in the ripping process are given with bash variables, so that they can be easily copy-pasted into bash scripts, automating the ripping process for highest convenience.

Practical Uses
If you have ever encoded movies into an AVI, and been slightly frustrated with the format's limitations, then Matroska is probably what you are looking for. Matroska boasts advanced features over AVI and WMV such as being able to include detailed chapter information, sub-chapters, titles, multiple audio and video streams, menus, subtitles, attachments and custom metadata.

One practical application: you have a DVD you want to watch on your computer, and you'd like to keep as much of the original as possible (title, subtitles, audio tracks, chapter selection) while re-encoding the video to something else (MPEG-4 or Theora, for example) to save space.

DVD to Matroska
There are a few tools that you can use to complete every step, from ripping to watching. This howto focuses on mplayer/mencoder/transcode to rip the DVD, mkvmerge (part of mkvtoolnix) to build the Matroska file, and VLC to watch the movie.

Rip VOB to Hard Drive
The first step is to get the video files that you want to encode off the DVD. It's generally a good idea to rip the movie to the hard drive first, because encoding goes much faster when reading from the hard drive than when reading directly from the DVD drive.

This HOWTO is based on a full-length feature movie. Ripping TV DVDs isn't any more difficult: you just have to find the actual title number yourself, whereas with movies most ripping tools simply pick the largest title, which is almost always the movie.

For our example, we'll use MPlayer to rip the DVD, since it takes one very simple step and nearly always calculates the start and stop points of the movie correctly (instead of messing with various VOB-ripping options). Other tools for getting the VOB file to the hard drive are vobcopy, dvdbackup and dvdcpy. Once you've emerged MPlayer with the correct DVD support, and with your DVD in the drive, first make sure you can play the movie:
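A minimal sketch of the playback check, assuming the drive is /dev/dvd and the movie is title 1. Both values, and the function name play_title, are placeholders chosen for illustration:

```shell
# Assumed values: adjust the device node and title number for your disc.
DVDDEVICE=/dev/dvd
TITLE=1

# Play one DVD title to confirm it is the one you want to rip.
play_title() {
    mplayer -dvd-device "$DVDDEVICE" "dvd://$TITLE"
}
```

Call play_title with the disc in the drive. If the wrong title plays, change TITLE and try again.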


Once you have found the number of the title, you're ready to rip it straight to a .VOB.
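As a hedged sketch, the rip can be done with MPlayer's -dumpstream and -dumpfile options. The device, title number and directory below are assumptions; ${RIPPATH} ends in a slash so that ${RIPPATH}movie.vob expands to a valid path:

```shell
# Assumed values: adjust for your system.
DVDDEVICE=/dev/dvd
TITLE=1
RIPPATH="$HOME/rip/"

# Dump the raw stream of one title into ${RIPPATH}movie.vob.
rip_title() {
    mkdir -p "$RIPPATH" &&
    mplayer -dvd-device "$DVDDEVICE" "dvd://$TITLE" \
        -dumpstream -dumpfile "${RIPPATH}movie.vob"
}
```

Run rip_title once the right title number is known.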


Extracting the VOB file will probably take a few minutes and will likely produce very little output, so be patient. In some cases, it also takes a while to get the CSS keys. Note that the ripped VOB file takes up a lot of space for a full-length movie. The VOB file you just ripped still contains all the audio, video and subtitle tracks the original DVD had. We only need the actual DVD for a few more seconds.

Chapters and Subtitle Information
We need physical access to the DVD disc to extract chapter information and color information for the subtitles. Getting the chapter information is very simple, thanks to a tool called dvdxchap, which is part of the ogmtools package (built with the 'dvd' USE flag), so install that package first.

Then run the program against your DVD device and dump the output to a text file:
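A sketch of the chapter dump, assuming the same placeholder device, title number and ripping directory as elsewhere in this guide. The function name is made up for illustration:

```shell
# Assumed values: adjust for your system.
DVDDEVICE=/dev/dvd
TITLE=1
RIPPATH="$HOME/rip/"

# Write the chapter list of the chosen title to chapters.txt.
dump_chapters() {
    dvdxchap -t "$TITLE" "$DVDDEVICE" > "${RIPPATH}chapters.txt"
}
```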

If you look at chapters.txt, you'll see that each chapter is described by a pair of lines: a CHAPTERnn= line holding the time index and a CHAPTERnnNAME= line holding the chapter name. The time index is in the format hours:minutes:seconds.milliseconds. The chapter name can be whatever you like, so feel free to edit it, or create a custom chapter list of your own.

When we later rip the subtitles, we need information about their placement and color palette. This information is stored in vts_01_0.ifo, so we copy this file to our ripping directory for later use.
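A small sketch of the copy step. The helper name copy_ifo is hypothetical, and the mount point in the example call is an assumption; the file name may also appear in upper case on a mounted disc:

```shell
# Assumed ripping directory, with trailing slash.
RIPPATH="$HOME/rip/"

# copy_ifo SRC: copy the title set's IFO file into the ripping directory
# under the lower-case name used in the later steps.
copy_ifo() {
    cp "$1" "${RIPPATH}vts_01_0.ifo"
}

# Example call, assuming the disc is mounted at /mnt/dvd:
#   copy_ifo /mnt/dvd/VIDEO_TS/VTS_01_0.IFO
```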

Note that the mount point of the DVD differs between systems, so adjust the source path accordingly. For the rest of the ripping process, the DVD disc is no longer needed. If the subtitles on the disc are correctly mastered, though, it can be more convenient to browse the available subtitle and audio tracks (see below) using the MPlayer GUI than to do everything from the command line.

Subtitles
In order to rip subtitles, we have to know the numbers of the subtitle tracks that we want to rip. If the disc is correctly mastered, you can switch between the available subtitle tracks in the mplayer GUI (the default key for changing subtitles is 'j'). You can get a hint as to which subtitle languages are available by issuing the following command and examining its output:
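One way to get such a listing, sketched here under assumptions, is MPlayer's -identify option, which prints lines such as ID_SID_0_LANG=en and ID_AID_128_LANG=en. The device and title values and both function names are placeholders:

```shell
# Assumed values: adjust for your system.
DVDDEVICE=/dev/dvd
TITLE=1

# Keep only the subtitle (SID) and audio (AID) language lines.
filter_language_ids() {
    grep -E '^ID_(SID|AID)_[0-9]+_LANG='
}

# Query the disc without playing anything and list the track languages.
list_dvd_languages() {
    mplayer -dvd-device "$DVDDEVICE" "dvd://$TITLE" \
        -identify -frames 0 -vo null -ao null 2>/dev/null |
        filter_language_ids
}
```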

Many times, however, the language information reported this way is wrong, which in the case of this example becomes apparent later in the mplayer output: the VOB file actually contains 12 subtitles. To identify which language corresponds to which subtitle number, play the VOB file with each subtitle number selected in turn:
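A minimal sketch for checking one subtitle track by eye, using MPlayer's -sid option. The path and the function name are assumptions:

```shell
# Assumed ripping directory, with trailing slash.
RIPPATH="$HOME/rip/"

# preview_subtitle SID: play the ripped VOB with one subtitle track shown.
preview_subtitle() {
    mplayer "${RIPPATH}movie.vob" -sid "$1"
}
```

For example, preview_subtitle 3 plays the movie with subtitle track 3 enabled, so you can note its language.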

Once you know which subtitle tracks to rip, use the following commands to extract the subtitles (tccat and tcextract are part of the transcode package, and subtitle2vobsub can be found in the subtitleripper package):
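The extraction takes the track ID as ${HEXSID}, which is 0x20 plus the SID. The conversion is easy to get wrong by hand, so here is a small sketch of the arithmetic; the helper name sid_to_hexsid is made up for this example:

```shell
# sid_to_hexsid SID: print 0x20 + SID in hexadecimal.
sid_to_hexsid() {
    printf '0x%x\n' "$((32 + $1))"
}

SID=18
HEXSID=$(sid_to_hexsid "$SID")
echo "SID ${SID} -> HEXSID ${HEXSID}"   # prints: SID 18 -> HEXSID 0x32
```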

Note that we are using the vts_01_0.ifo copied in the previous step. Running these commands for several SIDs gathers all the subtitles in the vobsubs.sub and vobsubs.idx files, which makes it more convenient to automate the bitrate calculation later.

${HEXSID} is the hexadecimal number 0x20 (decimal 32) plus the SID. For SIDs 0 to 9, this translates into HEXSID 0x20 to 0x29. For higher SIDs, you have to work a little harder: SID 18 (hexadecimal 0x12) corresponds to HEXSID 0x32. Note: some versions of tcextract (v1.0.3, for instance) seem to use decimal numbers instead of the hexadecimal numbers described here.

It is always a good idea to check that you extracted the right subtitles. You can do this by converting the extracted subtitles to PGM images in a subdirectory and looking at them.

We create a subdirectory for this extraction because a huge number of individual pgm files are created in the process, and we want somewhere to put them where they don't litter too much. Once you have checked the pgm files, you can delete them.

Alternative to transcode and subtitleripper: If you have a recent version of mencoder and don't want to install the additional tools, you can also extract the subtitles with mencoder:
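A hedged sketch of such a call, using mencoder's -vobsubout and -vobsuboutindex options and throwing the video output away. The path and function name are assumptions, and the exact options may need adjusting for your mencoder version:

```shell
# Assumed ripping directory, with trailing slash.
RIPPATH="$HOME/rip/"

# extract_vobsub SID INDEX: append subtitle track SID from the VOB to
# ${RIPPATH}subs.idx / subs.sub, stored under position INDEX.
extract_vobsub() {
    mencoder "${RIPPATH}movie.vob" \
        -sid "$1" -vobsubout "${RIPPATH}subs" -vobsuboutindex "$2" \
        -nosound -ovc copy -o /dev/null
}
```

For example, extract_vobsub 0 0 followed by extract_vobsub 3 1 would store subtitle tracks 0 and 3 as the first and second entries.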

This produces the two vobsub files subs.idx and subs.sub. If you call mencoder for different subtitle SIDs with the same base file name "subs", it will collect all the subtitles in these two files. MPlayer also has a -dumpsub option, but this option didn't work for me.

Audio
Identify the audio tracks that you are interested in ripping. As with the subtitles, a look at the output of mplayer gives you a hint about the audio track IDs (AIDs) of interest. In our example, that output shows that AIDs start at 128.

Listen to each audio track you are interested in by playing the VOB file with that AID selected.

For each audio track that you want to rip, issue:
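A sketch of both audio steps, using MPlayer's -aid, -dumpaudio and -dumpfile options. The path, the function names and the output file naming are assumptions:

```shell
# Assumed ripping directory, with trailing slash.
RIPPATH="$HOME/rip/"

# preview_audio AID: play the VOB with one audio track selected.
preview_audio() {
    mplayer "${RIPPATH}movie.vob" -aid "$1"
}

# dump_audio AID: write that audio track, untouched, to audioAID.ac3.
dump_audio() {
    mplayer "${RIPPATH}movie.vob" -aid "$1" \
        -dumpaudio -dumpfile "${RIPPATH}audio$1.ac3"
}
```

For example, preview_audio 128 plays the first track and dump_audio 128 writes it to ${RIPPATH}audio128.ac3.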

This assumes that you are ripping AC3 tracks (tracks can be in other formats, e.g. DTS). Using the mplayer GUI, you can play the resulting AC3 file. Write down the length of the track in hours, minutes and seconds. This will help when we later calculate the bitrate of the transcoded video stream.

Since the Matroska container accepts many audio formats, the audio stream can be transcoded to another format with lower bitrate. Keep in mind that it's best to keep conversion from one lossy format to another to a minimum, however.

Alternative to audio extraction: If you don't want to transcode the audio tracks (and don't need them to calculate the bitrate), you might also skip extracting these tracks and leave them inside the vob file: mkvmerge is able to fetch them from there.

Preparing to Transcode
Before transcoding, we want to determine how the video should be cropped, and calculate the bitrate of the transcoded video stream.

Cropping
To determine cropping parameters automatically, issue:
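A sketch using MPlayer's cropdetect filter, which prints lines ending in something like "(-vf crop=720:560:0:6)". The -frames limit, the path and both function names are assumptions added for illustration:

```shell
# Assumed ripping directory, with trailing slash.
RIPPATH="$HOME/rip/"

# Print the last crop=... value found in cropdetect output on stdin.
last_crop() {
    sed -n 's/.*-vf crop=\([0-9:]*\)).*/\1/p' | tail -n 1
}

# Run cropdetect over a short stretch of the movie, without showing
# video or playing audio, and print the detected parameters.
detect_crop() {
    mplayer "${RIPPATH}movie.vob" -sb 50000000 -frames 500 \
        -vf cropdetect -vo null -ao null 2>/dev/null | last_crop
}
```

With this in place, CROP=$(detect_crop) stores the result in the ${CROP} variable used later.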

The -sb 50000000 part is included to skip the first ~50MB of VOB data, since movies sometimes contain a title screen that has a larger frame than the movie itself. To minimize wasting space on encoding black borders, we want to crop as tightly as possible. Record the detected cropping parameters (for example 720:560:0:6).

Bitrate
An excellent bitrate calculator that suits our needs can be found at Marc Rintsch's homepage. Extract bitrate.py from the downloaded tar file and run it:

Include on the command line all files that should be merged with the video file into the resulting MKV container file (including the different audio tracks, for example). ${TARGETSIZE} sets the target size in MB of the resulting MKV file. A target size of 2240 will result in MKV files small enough to fit two movies on a single-layered DVD, with some room (about 30MB) for errors. According to the Doom9 Codec shoot-out 2005, encoding the video with the x264 codec at these bitrates (equivalent to 3 CDs, as mentioned in the codec shoot-out) will result in quality that is very close to the original VOB file. ${DURATION} is the duration of the movie (e.g. 1:23:45 for 1 hour, 23 minutes and 45 seconds) that we found earlier using the mplayer GUI. Write down the bitrate in kilobits per second (kbit/s).
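To see roughly what such a calculation involves, here is a simplified sketch of the arithmetic. It is not bitrate.py: it ignores container overhead, assumes 1 MB = 1024 * 1024 bytes and 1 kbit = 1000 bits, and the function names and the sample audio size are made up for illustration:

```shell
# hms_to_seconds H:MM:SS: convert a duration to seconds.
hms_to_seconds() {
    h=${1%%:*}; rest=${1#*:}
    m=${rest%%:*}; s=${rest#*:}
    # Strip one leading zero so values like "08" are not read as octal.
    h=${h#0}; m=${m#0}; s=${s#0}
    echo $(( ${h:-0} * 3600 + ${m:-0} * 60 + ${s:-0} ))
}

# video_bitrate_kbps TARGET_MB OTHER_BYTES DURATION
# OTHER_BYTES is the combined size of the audio and subtitle files.
video_bitrate_kbps() {
    target_mb=$1
    other_bytes=$2
    seconds=$(hms_to_seconds "$3")
    echo $(( (target_mb * 1024 * 1024 - other_bytes) * 8 / seconds / 1000 ))
}

# Example: 2240 MB target, 281400000 bytes of audio, 1:23:45 running time.
video_bitrate_kbps 2240 281400000 1:23:45   # prints 3291
```

Because the overhead of the container is left out, the real calculator should give a slightly lower figure.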

Transcode to H.264
Matroska actually supports a lot of codecs that can be wrapped in an .mkv file. See mkvmerge --list-types for the full list.

For this example, we will be transcoding the video to H.264. This is a more modern codec than Xvid or DivX and gives higher quality when comparing video streams encoded at equivalent bitrates. Make sure that MPlayer was emerged with x264 support enabled. With ${CROP} holding our cropping parameters, ${BITRATE} set to the desired bitrate (an integer), and threads=# set for multi-core CPUs, in addition to our previous variables, issue the following commands for 2-pass encoding (the first pass only analyzes the video, so we discard its output):
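A minimal sketch of a 2-pass x264 encode with mencoder for an NTSC source. The x264 options are deliberately reduced to bitrate, threads and pass, so this is not a tuned option set; the CROP and BITRATE values, the output file name and the function name are placeholders:

```shell
# Assumed values: replace with your own results from the earlier steps.
RIPPATH="$HOME/rip/"
CROP=720:560:0:6
BITRATE=3291

# encode_pass N: run x264 pass N (1 or 2). Pass 1 only gathers statistics,
# so its video output goes to /dev/null.
encode_pass() {
    if [ "$1" -eq 1 ]; then
        out=/dev/null
    else
        out="${RIPPATH}video.264"
    fi
    mencoder "${RIPPATH}movie.vob" \
        -vf "pullup,softskip,crop=${CROP},harddup" -ofps 24000/1001 \
        -nosound -ovc x264 \
        -x264encopts "bitrate=${BITRATE}:threads=auto:pass=$1" \
        -of rawvideo -o "$out"
}
```

Run encode_pass 1 && encode_pass 2 from the same directory, since the second pass reads the statistics file written by the first.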

The encoding parameters used give a nice tradeoff between encoding speed and quality, but they can be adjusted for greater encoding speed or higher video quality by following the advice on e.g. mplayerhq.hu (scroll to the bottom of the page if you are impatient). The settings above are valid for most NTSC streams. In the case of PAL streams, the pullup,softskip filters should be omitted.

Encoding takes a while. An average movie will take 3-5 hours on the first pass, and the second pass will take a little longer. Note that one of the advantages of the Matroska container is that it supports software scaling of video files, so there is no need to scale your video for the sole purpose of correcting the aspect when encoding: you can specify the display aspect when muxing the file. The mencoder commands given will, however, scale the image to a 1:1 aspect ratio.

Contrary to what the .264 file extension suggests, mencoder by default saves its output, audio tracks included, in an AVI container. There is no harm in that, but if you want a plain H.264 stream, add something like "-nosound -of rawvideo". This should not cause any synchronisation problems between video and audio.

Merge Multimedia Streams
To merge the video, audio and subtitle streams and the chapter information, we will be using mkvmerge, which is part of the mkvtoolnix set of tools. Be sure to enable the wxwindows USE flag when emerging mkvtoolnix, in order to build the Matroska merging GUI (mmg).

A basic merge would look something like this:
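As a sketch, assuming one raw H.264 video file, one AC3 track and one VobSub index from the earlier steps. The file names, the title, the language codes and the function name are placeholders, and the 24000/1001 frame rate applies to the NTSC example only:

```shell
# Assumed values: adjust paths, title and languages for your rip.
RIPPATH="$HOME/rip/"
MOVIETITLE="My Movie"

# Mux video, audio, subtitles and chapters into one Matroska file.
merge_mkv() {
    mkvmerge -o "${RIPPATH}movie.mkv" \
        --title "$MOVIETITLE" \
        --chapters "${RIPPATH}chapters.txt" \
        --default-duration 0:24000/1001fps "${RIPPATH}video.264" \
        --language 0:eng "${RIPPATH}audio.ac3" \
        --language 0:eng "${RIPPATH}vobsubs.idx"
}
```

Each --language option applies to the input file that follows it, and track ID 0 refers to the first track of that file.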

You must specify the frames per second of the video, or else mkvmerge will default to 25 fps. This is only necessary for elementary H.264 streams; if your video is already in a container, disregard the --default-duration option.

Hint for fetching tracks directly from the vob file: Use, for example, an option like "-a 0,1 ${RIPPATH}movie.vob" to add the first two audio tracks in the vob file. Try mkvmerge -i ${RIPPATH}movie.vob to get a list of the track IDs. You might want to prefix this option with something like "--language 0:ger --language 1:eng" to get the labels right. The language option can be prepended to both audio and subtitle inputs.

While mkvmerge can be used directly from the command line, mmg makes the job a lot easier and continuously shows the resulting command line, for the day that you want to go hardcore. Add the .264, ac3 and idx files from earlier steps to the list of input files in mmg. There are a lot of options to set, and the more information you provide, the nicer the resulting Matroska file, of course. Be sure to specify at least the track names, track languages, file/segment title and chapter file.

Watching Movies
There are a lot of media players out there, but when it comes to Matroska support, VLC (VideoLAN Client) is head and shoulders above the rest. Its main outstanding feature is chapter support, meaning you can jump ahead and back from one chapter to another. If you're not interested in that, MPlayer and Xine can both play Matroska audio and video files as well. Just remember to add matroska to your USE flags when emerging the media players. Please note that as of mplayer-1.0_rc1_p20070824, the matroska USE flag was removed (Matroska support is built by default). Xine has no matroska USE flag either, but Matroska files play fine in kaffeine using xine-lib-1.1.15-r1.

MPlayer Chapter Support
The MPlayer developers have added Matroska chapter support, and it's now available in an ebuild snapshot. Just emerge media-video/mplayer-1.0_pre20060810. To skip chapters, use @ for the next chapter, and ! to go back a chapter.

Graphical and command-line interfaces

 * OGMRip (media-video/ogmrip) -- Graphical frontend and libraries for ripping DVDs and encoding to AVI/OGM/MKV/MP4

This HOWTO is aimed at technically oriented users. However, OGMRip might suit your needs if you are looking for a clean, HIG-compliant GTK+2 GUI frontend to MPlayer that can encode video to H.264 and audio to AAC and put everything into a Matroska container (this listing is far from exhaustive). Please note that shRip is OGMRip's CLI interface. Because of the inherent limits of a GUI, you will of course not find all the options that MPlayer provides. Still, OGMRip is worth a try if you feel rusty about all the available encoding terminology and options.

As of v. 0.12.0, OGMRip has introduced "profiles", i.e. predefined settings for specific tasks that you can use or create. The best part of the profiles is that there is a "User" Video Quality setting, which can be adjusted manually. It seems that even the lowest ("Normal") OGMRip default Quality settings are "Extreme" for my hardware, an Intel(R) Pentium(R) M processor 1.60GHz and 482MB Memory. In case you have "hardware issues", in OGMRip you can manually set most of the H.264 encoding options suggested in this HOWTO.


 * shRip (media-video/shrip) -- Command line tool for ripping DVDs and encoding to AVI/OGM/MKV/MP4


 * mkvtoolnix (media-video/mkvtoolnix) -- Tools to create, alter, and inspect Matroska files

A tool that may help you deal with .mkv files is the mkvtoolnix package. To build the tools you want, simply adjust your USE flags appropriately in package.use before emerging the package.