Principal Investigator/Program Director Williams,Robert W.
4: Streaming video
Introduction. Over the last three years computers and connections to the Internet have improved by at least an order of magnitude. A personal Road Runner cable modem connection <www.rr.com> of the type now widely available to home customers runs at ~3000 Kbps (Hamilton, 1999), approximately 54 times faster than a 56 Kbps modem. This is also twice the speed of a standard T1 connection, which runs at 1500 Kbps. Peak download speeds with a cable modem can reach 7 Mbps. This is enough bandwidth for high quality streaming video. In fact, cable modem speeds now often exceed the bandwidth of major university server connections, an odd and ironic reversal of fortune, and one that highlights the importance of university network upgrades such as the Internet 2 initiative. Similar changes are occurring on the hardware and video fronts. Extremely fast computers and video accelerators are now routinely used to play complex and demanding interactive Internet video games such as Quake and Myth. Many home computers are now more powerful than UNIX workstations that are only 1 or 2 years old. Computers equipped with 10 to 20 GB drives and CPUs capable of processing nearly a billion floating point operations per second are not on the horizon; they are already in the living room. A terabyte of hard disk capacity, enough to hold one million 1 MB images of the type we have put into the MBL, now costs just under $10,000 ($212 per 25 GB IBM DeskStar drive).
Even in a computer industry that is used to fast progress, this is blistering progress. The opportunities in terms of video microscopy are enormous, and in our opinion the major challenge is not in developing new technology, but in deciding the best way to take advantage of what is already available. This is particularly the case when it comes to streaming video. In comparison to the equipment demands of a typical Internet game aficionado, what we propose to do in terms of streaming video and in terms of controlling three axes of movement of a single microscope is a simple task. The bottlenecks for the iScope project for the foreseeable future will not be disk space, computer hardware, or connectivity. The main problems will center on our ability to process suitable tissue for imaging, to photograph that tissue, and to get those images or movies onto a web server. We spent a great deal of effort in assembling the images in the 1999 edition of the MBL. Compared with any other image database of which we are aware, it is an extremely large and well structured collection. Yet all of these files on the web server (post-compression) amount to only 2 GB of images, about $10 of disk space, and an amount that fits on one side of a DVD-RAM disk.
Introduction to streaming video.
Streaming video is a technology intended for real-time broadcast of compressed video over the Internet. From the client's point of view, the spatial and temporal resolution of the video depends on the quality of the connection. The server polls the client and, based on the connection speed, serves a video stream at an appropriate rate and resolution. In the context of microscope control, all of the video must be real time rather than prerecorded. Given the speed of current G3 and G4 processors, providing real-time processed video is not difficult. An important consideration is the quality of the network connection of the streaming server.
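The rate-selection step described above can be sketched as follows. This is only an illustration of the idea, not the logic of the QuickTime streaming server; the `Rendition` table, the function name `choose_rendition`, and every bit rate in the table are hypothetical.

```python
from typing import NamedTuple, Sequence


class Rendition(NamedTuple):
    width: int
    height: int
    fps: int
    kbps: int  # bandwidth the stream needs (hypothetical values)


# Hypothetical ladder of streams offered by the server.
RENDITIONS = [
    Rendition(160, 120, 10, 40),
    Rendition(320, 240, 15, 300),
    Rendition(640, 480, 30, 1500),
]


def choose_rendition(client_kbps: float,
                     renditions: Sequence[Rendition] = RENDITIONS) -> Rendition:
    """Return the richest stream that fits the measured client bandwidth.

    If no stream fits, fall back to the lowest-bandwidth stream.
    """
    fitting = [r for r in renditions if r.kbps <= client_kbps]
    if fitting:
        return max(fitting, key=lambda r: r.kbps)
    return min(renditions, key=lambda r: r.kbps)
```

With this table, a 56 Kbps modem client would receive the 160 x 120 stream and a 3000 Kbps cable modem client the 640 x 480 stream.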
All the necessary hardware and software to upgrade the iScope to a true streaming server is in place. We have a G3 that runs the new Mac OS X, a UNIX operating system with a Macintosh-like shell. OS X comes with a custom QuickTime 4 streaming server. The QT4 streaming architecture has an impressive feature set, and we will be able to simultaneously deliver streams of video at multiple resolutions. The QT4 streaming system will allow us to deliver at least 10 fps at a resolution of 160 x 120 pixels even to clients on 56 Kbps connections. At the other end of the spectrum, the full MiniDV camera output amounts to 3.6 MB/sec, and it is likely that this bandwidth will soon be available to many of our clients, giving them the same video quality that we have at the camera. Letters in the appendix emphasize the enthusiasm of senior network and computer administrators of UT Memphis for this virtual microscopy lab. They view this project as an extremely powerful teaching tool, and they have guaranteed us their full cooperation in ensuring that our streaming servers will have the most suitable connections to the Internet.
Progressive video will be handled by a conventional Apache web server running on a Macintosh G3. Four streaming servers will be put on the UT Memphis fiber optic backbone using 100BaseT or higher bandwidth (see appendix letters). There will be two sources of video: html progressive video and live streaming video. Both will be provided in QT4 format. Gamma will be customized for either Windows or Macintosh operating systems.
There are numerous special requirements for effective streaming microscopy on the web. Not only must the video system provide a rapid update from the camera, but the web microscopist needs to be able to rapidly control the movement of the stage. Achieving fluid two-way communication is a challenge. For example, in our current implementation, the duty cycle of the system (that is, the time between initiating a request to move the stage and the time at which the video is actually refreshed on the client's web browser) is approximately 10 to 20 seconds, even on an in-house connection. This lag is primarily due to the fact that we have not yet implemented a streaming video server. The current system is a standard web server that orders new video frames from the camera at 10-second intervals. For true streaming video, faster low-resolution video is preferable to slow high-resolution video during the navigational stage. A relatively large field of view is also preferable to a small high-magnification view. For this reason, each microscope used for streaming video will be equipped with two video ports and two digital cameras. One camera will be optimized for navigation and the other for analysis and image capture.
Video format and distortion.
Aspect ratios of different video formats vary considerably (4:3 for NTSC, 16:9 for HDTV, 2.21:1 for Cinemascope). The majority of images will be acquired and processed using the CCIR601 format: a 720 x 486 pixel, 30 fps digital video format. This format has pixels with a ratio of 1:1.08. Aspect ratio is not distorted on analog playback. However, all standard computer monitors display square pixels and as a consequence, if uncorrected, DV format appears wider than expected. Non-square pixel correction will be performed as a batch process using Media Cleaner Pro 4. Archival material will be retained in the original miniDV format, but through-focus series will be processed and uploaded to our servers as 640 x 480 pixel movies.
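The arithmetic behind the non-square pixel correction can be sketched as below. The sketch assumes only that the corrected frame should fill a 4:3 display with square pixels; the function names are hypothetical, and how the 486 stored lines are reduced to 480 (cropping or rescaling) is not specified in the text, so it is left out.

```python
from fractions import Fraction


def square_pixel_width(height: int,
                       display_aspect: Fraction = Fraction(4, 3)) -> int:
    """Width, in square pixels, of a frame with the given height and aspect."""
    return round(height * display_aspect)


def horizontal_scale(stored_width: int, height: int,
                     display_aspect: Fraction = Fraction(4, 3)) -> Fraction:
    """Factor by which stored columns must be resampled for square pixels."""
    return Fraction(square_pixel_width(height, display_aspect), stored_width)


if __name__ == "__main__":
    print(square_pixel_width(486))     # width needed for 486 lines
    print(square_pixel_width(480))     # width needed for 480 lines
    print(horizontal_scale(720, 486))  # horizontal resampling factor
```

Under this assumption a 720 x 486 frame is narrowed horizontally rather than stretched, which is consistent with uncorrected DV appearing too wide on a computer monitor.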
Navigational aids for streaming videomicroscopy.
The current iScope provides x.y.z field coordinates relative to an arbitrarily defined 0.0.0 point. We will need to integrate the Slide-Coordinate database with the iScope code in order to have a consistently defined 0.0.0 point of reference. The system of fiducial marks and coordinate scheme for each slide is one of the key aims of Project 1 (MBL) and is taken up elsewhere in more detail. Rather than displaying the x.y.z coordinates in a separate part of the web browser, we will merge the coordinates and a calibration scale onto an alpha channel of the streaming video. A simple click will either add or subtract this alpha channel from the client's view of the section. Lumsdon et al. (1995) have described a simple analog video mixer that allowed them to align video data with electrophysiological recordings. Our solution will be along the same lines, but we will be able to exploit the alpha channel to do this interactively using a web interface. The QuickTime streaming server supports an 8-bit alpha channel.
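The overlay idea can be illustrated with standard 8-bit alpha compositing. This is a generic sketch of the blending arithmetic on grey-level samples, not QuickTime code; the function names and the `show_overlay` toggle (standing in for the client's click) are hypothetical.

```python
from typing import List

Frame = List[List[int]]  # rows of 8-bit samples, 0..255


def composite(video: int, overlay: int, alpha: int) -> int:
    """Blend one overlay sample over one video sample with 8-bit alpha."""
    if not 0 <= alpha <= 255:
        raise ValueError("alpha must be in 0..255")
    # alpha = 0 keeps the video sample, alpha = 255 shows only the overlay;
    # the +127 rounds to the nearest integer.
    return (overlay * alpha + video * (255 - alpha) + 127) // 255


def composite_frame(video: Frame, overlay: Frame, alpha: Frame,
                    show_overlay: bool = True) -> Frame:
    """Return the frame with the coordinate/scale overlay switched on or off."""
    if not show_overlay:
        return [row[:] for row in video]
    return [
        [composite(v, o, a) for v, o, a in zip(v_row, o_row, a_row)]
        for v_row, o_row, a_row in zip(video, overlay, alpha)
    ]
```

Because the overlay is kept separate from the video samples, switching it off returns the unmodified image of the section.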
We plan to acquire essentially all movies in the miniDV format. The choice of this format is a crucial design decision and one that needs to be justified. Our first justification is that this format is of significantly higher image quality than the conventional NTSC video currently used in most laboratories. It is also of higher quality than the Hi8 or S-VHS standards. A major problem with NTSC and PAL video is that both are interlaced formats. The two fields that make up a frame can give rise to a comb artifact. This artifact is objectionable when NTSC signals are displayed on non-interlaced computer monitors. The raw NTSC format also takes up a huge amount of space: 27 MB per second of video. In contrast, the miniDV format incorporates progressive scan, non-interlaced acquisition at 30 fps. The resolution is also more than adequate for the task at hand: at the video magnification at which the majority of image stacks will be acquired, 720 x 486 pixels/frame will be able to capture even the finest details. For example, with the current iScope and a 40x objective, individual pixels have dimensions of about 0.15 µm, much finer than the resolving power of the objective. This ensures a high modulation transfer from the image plane to the CCDs and, ultimately, to the display.
Like NTSC video, the miniDV format has an aspect ratio of 4:3, and is therefore well matched to display on conventional televisions and most computer monitors. This aspect ratio is more suited for video microscopy (although 1:1 would be ideal) than the 16:9 ratio of digital TV.
Of greatest significance, miniDV is digital from its inception in the camera and can be copied from generation to generation without loss. The data stream output from the camera is fixed at a constant 3.6 MB/sec, a tractable rate that can be handled without interruption by modern computer hardware and disk drives. This 3.6 MB/sec is fed via the now ubiquitous Firewire/IEEE1394/I-Link directly to a computer. Firewire cables can be up to 10 meters in length, allowing us to separate microscopes from computers if we need to.
Cost is a major factor that has driven our decision to adopt the MiniDV format. Very high quality 3-CCD cameras can be purchased for approximately $4,500. The camera we are currently using is an extremely versatile unit with "progressive scan." Each frame is recorded as a single non-interlaced image. Other advantages include excellent sensitivity (allowing us to run the tungsten bulbs at a lower voltage), built-in color correction, control of functions from the computer, and trouble-free constant operation in camera mode. One important practical advantage of the Canon XL-1 is that the lens can be dismounted and, once a custom adapter piece has been machined, the CCDs can be placed directly at the image plane of the objective. This simple arrangement can result in extraordinarily high quality.
DV is usually used as the high quality source video. DV material is subsequently compressed for delivery in different formats; for example as QuickTime, MPEG, RealG2 SureStream, or Windows Media ASF format.
The particular advantage of the Canon XL-1 is that it has interchangeable lenses. We have machined a special adapter for the XL-1 that allows it to be mounted directly to our Zeiss Universal.
The DV format defines image size, frame rate, and compression. Raw DV files will be available (suffix .dv). Media Cleaner can encapsulate DV movies as QT movies. This may be a good option for ftp site storage. QT-encapsulated movies will not play smoothly except on the latest generation of PCs. The encapsulated movies cannot be recorded directly to a MiniDV camcorder.
Other format contenders: DVD-video (the video available on disc in video stores) uses MPEG compression
An uncompressed or source NTSC signal requires a bandwidth of approximately 27 MB/sec. In contrast, the higher quality miniDV format requires only 3.6 MB/sec. This trick is accomplished by an extremely efficient compression method carried out in the camera. The miniDV format uses 4:1:1 or 4:2:0 color subsampling compression. The luminance signal is not subsampled at all, but the chrominance channel has either one-quarter or one-half the original spatial resolution. This can result in subtle color aliasing artifacts. However, in the context of imaging either Nissl-stained or immunohistologically stained specimens that are (or will be) part of the MBL, and that will be used for data acquisition and analysis, the subtle color aliasing artifact is not important when weighed against the numerous advantages of this digital format.
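The effect of the two subsampling schemes on the number of stored chrominance samples can be worked through for a 720 x 486 frame. The sketch below uses the standard definitions (4:1:1 keeps every fourth chroma sample horizontally; 4:2:0 keeps every second sample both horizontally and vertically); the function name is hypothetical.

```python
def chroma_samples(width: int, height: int, scheme: str) -> int:
    """Samples stored per chrominance channel for one frame."""
    if scheme == "4:1:1":
        # One-quarter horizontal resolution, full vertical resolution.
        return (width // 4) * height
    if scheme == "4:2:0":
        # One-half resolution in each direction.
        return (width // 2) * (height // 2)
    raise ValueError(f"unsupported scheme: {scheme}")


if __name__ == "__main__":
    luma = 720 * 486  # luminance is not subsampled
    for scheme in ("4:1:1", "4:2:0"):
        chroma = chroma_samples(720, 486, scheme)
        print(scheme, chroma, chroma / luma)
    # Ratio of the two data rates quoted in the text.
    print(27 / 3.6)
```

Either way, each chrominance channel carries one quarter as many samples as the luminance channel; the schemes differ only in how that loss is divided between the horizontal and vertical directions.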
Use of an alpha channel.
An alpha channel is often used in video applications to apply text and graphic overlays onto a video source. In the context of this project, we will be using an alpha channel to label each frame with a Z-axis position as well as with calibration. We have experimented with various ways to use alpha channels with QT4 movies (see <nervenet.org/mbl/mbl.html>).
The real success of several parts of this project depends on developing effective, semiautomatic processing of image stacks. We will rely very heavily on a marvelous utility program called Media Cleaner Pro 4, which includes very powerful batch operations. It can readily handle batch lists of up to 1000 files. Media Cleaner outputs a batch log file that can be used to verify progress and the settings that are being applied.
The main bottleneck will involve those steps that require human intervention. The only significant computational bottleneck is the asymmetric compression of image stacks for QT4 web publication. The Sorenson Video codec takes up to 2 seconds to compress each frame. It will therefore take 1 to 2 minutes to compress each through-focus master clip to the version for web distribution.
Streaming video versus On-demand video.
There are two types of streaming video: "true streaming" and "HTTP streaming." HTTP streaming is also referred to as "progressive download" streaming or "on-demand video," and this method is by far the most appropriate for acquiring and playing QT movies in the MBL collection. Neuroscientists and stereologists are far more likely to run through the image stack forward and backward, effectively changing the focal plane in the tissue. The time dimension has been converted to the third spatial dimension, and it is important to be able to move up and down with equal speed and equal image quality. In contrast, true streaming is essentially a live video feed from the iScopes to the client through a QT intermediary. This method requires that the bandwidth of the media feed from the microscope match that of the viewer's connection. This process requires special streaming server protocols. We will provide both types of video, but using entirely different servers.
Keyframes or not.
In a through-focus series at 1-µm steps there is a great deal of z-axis image redundancy that can be stripped away without significantly reducing image quality. This "temporal" or interframe compression can be carried out using several compression-decompression utilities: Indeo (an Intel Co. method), the Sorenson Video codec (our current preferred codec for QuickTime 4), or MPEG-2 (used for full broadcast quality DVD-Video). To maintain high quality throughout the Z-axis set we plan to set keyframes every 5 frames, or every 5 µm. A keyframe is a frame that is spared from interframe compression during the production of the final output movie. Most compression methods assume that the movie will be played in a preferred direction, moving forward in time. But for a through-focus series, there is no preferred direction, and as mentioned above, it is critical that our video microscopists are able to focus up and down with equal speed and image quality. Finding the optimal compression method that meets this design criterion and that leaves us with z-stacks that are under 2 MB is our goal. Given that the original unprocessed stack is under 7.2 MB, we may be able to achieve compact files just using a mild JPEG compression on each frame. An example of this type of compression is shown on our web site at <nervenet.org/mbl/mbl.html>. This simple approach would be ideal because each frame would have precisely the same image quality. Whatever solution we strike upon, we are not making an irrevocable decision; all stacks will be archived in their original full miniDV resolution.
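The cost of focusing in either direction through a keyframed stack can be estimated with a simplified model. The sketch assumes that every non-key frame is coded as a difference from the frame immediately before it, so that showing any frame means decoding forward from the preceding keyframe; real codecs differ in detail, and the function names are hypothetical.

```python
from typing import List


def keyframe_indices(n_frames: int, interval: int = 5) -> List[int]:
    """Indices of the keyframes when one is placed every `interval` frames."""
    return list(range(0, n_frames, interval))


def frames_to_decode(target: int, interval: int = 5) -> int:
    """Frames the decoder must process to display frame `target`.

    Decoding starts at the keyframe at or before `target` and runs forward,
    whichever direction the viewer is focusing in.
    """
    return target % interval + 1


if __name__ == "__main__":
    stack = 40  # frames in one through-focus series
    print(keyframe_indices(stack))
    worst = max(frames_to_decode(i) for i in range(stack))
    print("worst case frames decoded per displayed frame:", worst)
```

In this model the worst case equals the keyframe interval, so a short interval bounds the penalty for stepping backward. Compressing each frame independently, as with per-frame JPEG, corresponds to an interval of 1.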
Construction of through-focus image stacks.
Each through-focus movie will be 40 to 50 frames. Thirty to 40 frames (enough to extend through the entire section) will be devoted to the stack itself. The remaining frames will be appended JPEG frames. The appended frames will consist of the following:
1. A generic MBL title frame with support credit and web address;
2. Synopsis of case data from our phenotype databases;
3. Image of whole slide;
4. Image of section with stack target marked and precise acquisition target coordinates;
5. Data on time and microscope/video set-up used to acquire QT movie;
6. Full frame grid-type calibration standard;
7. Edge-only marginal calibration;
8. Grey scale gradients;
9. Standard color bar;
10. Counting frame (50 x 50 µm);
11. Counting frame (25 x 25 µm).
A small calibration "watermark" will be placed on every frame. Each frame will be labeled with its z-axis position (corrected for Snell's law).
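One common form of this correction is the paraxial approximation, in which the focal plane moves through the specimen by the stage displacement multiplied by the ratio of refractive indices. The text does not say which formula will be used, so the sketch below is an assumption, as are the function name and the example refractive index of 1.52.

```python
def corrected_z(stage_um: float, n_specimen: float,
                n_immersion: float = 1.0) -> float:
    """Approximate focal displacement inside the specimen, in micrometers.

    Paraxial (small-angle) consequence of Snell's law: the focal plane moves
    by the stage displacement times n_specimen / n_immersion.
    n_immersion = 1.0 corresponds to a dry objective.
    """
    if n_specimen <= 0 or n_immersion <= 0:
        raise ValueError("refractive indices must be positive")
    return stage_um * n_specimen / n_immersion


if __name__ == "__main__":
    # Hypothetical example: 1 µm stage steps, dry objective, mounting
    # medium with an assumed refractive index of 1.52.
    for step in range(4):
        print(step, round(corrected_z(step, 1.52), 2))
```

When the immersion medium matches the specimen, the ratio is 1 and the label equals the stage reading.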
The data rate of the DV format is 3.6 MB/sec. This is a real-time 1:8 compression relative to a corresponding NTSC signal that is implemented by the camera hardware. A two-sided DVD-RAM disk with a capacity of 5.2 GB can hold up to 1400 seconds, or 24 minutes, of material. A single DVD-RAM disk could therefore comfortably fit well over 1200 original quality through-focus series. Access to individual files on such a disk is rapid. A single miniDV tape could store several hours' worth of original quality QT movies.
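The storage figures above follow from simple arithmetic, sketched below. The sketch assumes 1 GB = 1000 MB and a stack of 35 frames (the middle of the 30 to 40 frame range); the function names are hypothetical.

```python
DV_RATE_MB_PER_S = 3.6  # camera data rate
FPS = 30                # frames per second


def seconds_of_dv(capacity_gb: float,
                  rate_mb_per_s: float = DV_RATE_MB_PER_S) -> float:
    """Seconds of DV material that fit in the given capacity (1 GB = 1000 MB)."""
    return capacity_gb * 1000 / rate_mb_per_s


def stacks_per_disk(capacity_gb: float, frames_per_stack: int,
                    fps: int = FPS,
                    rate_mb_per_s: float = DV_RATE_MB_PER_S) -> int:
    """Whole through-focus stacks that fit in the given capacity."""
    total_frames = seconds_of_dv(capacity_gb, rate_mb_per_s) * fps
    return int(total_frames // frames_per_stack)


if __name__ == "__main__":
    secs = seconds_of_dv(5.2)
    print(round(secs), "seconds, about", round(secs / 60), "minutes")
    print(stacks_per_disk(5.2, 35), "stacks of 35 frames")
```

A 5.2 GB disk therefore holds a little over 1400 seconds of DV, or roughly 1200 stacks of 35 frames.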
The QuickTime multimedia architecture as an effective vehicle for delivering through-focus image stacks. This architecture is currently the best choice for cross-platform compatibility, whether for streaming video, html progressive movie delivery, or for producing CD-ROM or DVD-RAM disks.
The challenge of developing batch acquisition and processing procedures to automatically generate, archive, and compress QT4 movies. The compression-decompression codec that we will use in conjunction with QT4 movies is the Sorenson Video Pro edition. This is a highly asymmetric codec; that is, the compression is much more computationally intensive than the decompression. This is advantageous from the point of view of achieving fast playback of movies, but it means that the rate at which movies can be compressed is as slow as 1 sec per frame even with extremely fast microcomputers (G3). It will therefore require approximately an order of magnitude more time to compress a through-focus series than it does to acquire the stack. We do not yet know what the optimal match will be between stack acquisition and stack compression. Even in a worst case scenario we do not expect it to take more than 5 minutes to compress the 30 to 40 frames that will be acquired at each site. We think that we should easily be able to acquire 20 to 30 movies per hour. The highest sampling density that we will use to "scan" a slide will be a 1 mm grid pattern. This relatively fine sampling grid will generate between 200 and 300 sample sites per slide, a number which can be acquired in half a day.
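The throughput estimates in this paragraph can be checked with a short calculation. The sketch uses only figures quoted in this section (30 to 40 frames per stack, 1 to 2 sec of compression per frame, 20 to 30 movies per hour, 200 to 300 sites per slide); the function names are hypothetical.

```python
def compress_minutes(frames: int, sec_per_frame: float) -> float:
    """Minutes needed to compress one through-focus stack."""
    return frames * sec_per_frame / 60


def hours_per_slide(sites: int, movies_per_hour: float) -> float:
    """Hours needed to acquire one movie at every sample site on a slide."""
    return sites / movies_per_hour


if __name__ == "__main__":
    # Compression time per stack, fastest and slowest quoted cases.
    print(compress_minutes(30, 1), compress_minutes(40, 2))
    # Acquisition time per slide, best and worst quoted cases.
    print(hours_per_slide(200, 30), hours_per_slide(300, 20))
```

On these figures, compression of a single stack stays well under the 5 minute worst case, and acquisition at the densest grid takes between roughly 7 and 15 hours per slide.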
Project 3: Neurocartographer and Segmentation of the MBL