PROFILES: Case Study: High-Performance Computing
DNA Productions, Inc.
Background
DNA Productions, Inc. has been creating award-winning 2D and 3D animated projects in Dallas, Texas since 1987. Examples of DNA's projects include the Oscar-nominated feature film "Jimmy Neutron: Boy Genius" for Paramount and Nickelodeon, and the television series "The Adventures of Jimmy Neutron, Boy Genius", which currently airs on Nickelodeon. When DNA began development on a new animated feature film called "The Ant Bully," it made novel and interesting use of render farm and workflow systems built on top of open-source Grid Engine 6.
Quick Facts
- Business: DNA Productions, Dallas, TX; 3D animated projects and feature films
- CPUs managed by Grid Engine: 1,400
- Storage: 73 TB Isilon clustered storage
- Average number of SGE jobs per day: 150,000
- Average number of "shots" per week: 100
Seriously Cool: Grid Engine prolog and epilog scripts report job parameters, start times, job array task index, usage data and exit status information to a central SQL database. The data feeds web dashboards that give producers and animators "percentage shot complete" and "percentage frame complete" figures. The job-related information retained in the SQL database is detailed enough to allow 100% accurate re-execution of any prior job, for any reason.
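Dashboard figures like these can come from simple aggregate queries. The sketch below is a minimal illustration, assuming a hypothetical layer_renders table with one row per shot/frame/layer render and a finished flag set when the epilog reports completion. It uses Python's built-in sqlite3 module as a stand-in for whatever SQL server DNA actually runs; none of the table or column names come from the source.

```python
import sqlite3

# Hypothetical table: one row per (shot, frame, layer) render task.
# "finished" is set to 1 when the task's epilog reports completion.
SCHEMA = """
CREATE TABLE layer_renders (
    shot     TEXT    NOT NULL,
    frame    INTEGER NOT NULL,
    layer    TEXT    NOT NULL,
    finished INTEGER NOT NULL DEFAULT 0
)
"""


def percent_shot_complete(conn, shot):
    """Percentage of all layer renders in a shot that have finished (None if none exist)."""
    row = conn.execute(
        "SELECT 100.0 * SUM(finished) / COUNT(*) FROM layer_renders WHERE shot = ?",
        (shot,),
    ).fetchone()
    return row[0]


def percent_frame_complete(conn, shot, frame):
    """Percentage of one frame's layers that have finished (None if none exist)."""
    row = conn.execute(
        "SELECT 100.0 * SUM(finished) / COUNT(*) FROM layer_renders"
        " WHERE shot = ? AND frame = ?",
        (shot, frame),
    ).fetchone()
    return row[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    shot = "xy_1_100_030_00_v005"
    # Two frames with two layers each; three of the four renders are done.
    conn.executemany(
        "INSERT INTO layer_renders VALUES (?, ?, ?, ?)",
        [(shot, 1, "light", 1), (shot, 1, "shadow", 1),
         (shot, 2, "light", 1), (shot, 2, "shadow", 0)],
    )
    print(percent_shot_complete(conn, shot))      # 75.0
    print(percent_frame_complete(conn, shot, 2))  # 50.0
```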
Animation Workload and Grid Engine Workflow
The central work unit on the render farm is a "shot". The entire feature film can be broken down into a long series of required shots. A shot contains a number of frames, and each frame has roughly 10 layers representing characteristics such as lighting, shadows and texture maps. The number of frames per shot varies from dozens to thousands. For this particular feature there are 1,600 shots with an average length of 86 frames.
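To give a sense of scale, the figures above can be multiplied out. This is only an estimate: "roughly 10" layers is taken as exactly 10, and the result covers a single complete render of the film, whereas production re-renders shots many times.

```python
# Figures from the text: 1,600 shots averaging 86 frames, roughly 10 layers per frame.
SHOTS = 1600
AVG_FRAMES_PER_SHOT = 86
LAYERS_PER_FRAME = 10  # "roughly 10", so the layer total is an estimate


def render_units(shots, frames_per_shot, layers_per_frame):
    """Return (total frames, total layer renders) for one complete pass of the film."""
    frames = shots * frames_per_shot
    return frames, frames * layers_per_frame


frames, layers = render_units(SHOTS, AVG_FRAMES_PER_SHOT, LAYERS_PER_FRAME)
print(f"{frames:,} frames, about {layers:,} layer renders per complete pass")
# prints: 137,600 frames, about 1,376,000 layer renders per complete pass
```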
Each shot, with all of its frames and layers, can be rendered independently, and during the course of production a shot may be rendered and re-rendered many times. It is also possible to redo just a single layer or frame within a shot.
Shots are rigidly named (example: 'xy_1_100_030_00_v005') and the naming scheme is 100% consistent and enforced across all groups and departments. This extends to the naming of the Grid Engine jobs and their output directories as well as to the physical layout of the multi-terabyte Isilon clustered storage system.
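Because names are generated and checked by software, a convention like this is easy to enforce mechanically. The following is a sketch only: the field structure (a two-letter prefix, four numeric fields and a three-digit version suffix) is inferred from the single example name above, and treating the trailing "v005" as a version number is an assumption, not something the source states.

```python
import re

# Hypothetical pattern inferred from one example name: a two-letter prefix,
# four numeric fields, then a three-digit version suffix.
SHOT_NAME = re.compile(r"(?P<base>[a-z]{2}(?:_\d+){4})_v(?P<version>\d{3})")


def validate_shot_name(name):
    """Return (base, version) or raise ValueError if the name breaks the convention."""
    m = SHOT_NAME.fullmatch(name)
    if m is None:
        raise ValueError(f"non-conforming shot name: {name!r}")
    return m.group("base"), int(m.group("version"))


def next_version(name):
    """Name for the next version of the same shot, e.g. v005 -> v006."""
    base, version = validate_shot_name(name)
    return f"{base}_v{version + 1:03d}"


print(validate_shot_name("xy_1_100_030_00_v005"))  # ('xy_1_100_030_00', 5)
print(next_version("xy_1_100_030_00_v005"))        # xy_1_100_030_00_v006
```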
The storage layout is similar to what this author has seen at scientific organizations involved in massive genome sequencing efforts. At a sequencing facility, knowing the name of a contig, experiment or clone ID allows someone to traverse the shared filesystem efficiently to find or load the relevant raw and derived data. At DNA Productions, the Isilon NAS directory structure is laid out so that someone who knows only a shot name can find exactly the files, textures, metadata or media required. This lets multiple, disconnected workflow and production systems know *exactly* where to read and write files.
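The idea of deriving every location from the shot name alone can be sketched as a pair of pure functions. The mount point, subdirectory names, per-layer directories and file naming (including the .exr extension) below are all hypothetical; the source does not publish DNA's actual directory layout.

```python
from pathlib import PurePosixPath

# Hypothetical mount point and directory names; the real layout is not published.
STORAGE_ROOT = PurePosixPath("/mnt/isilon/show")
SUBDIRS = ("renders", "textures", "metadata", "media")


def shot_paths(shot_name, root=STORAGE_ROOT):
    """Map each data category to its directory, derived from the shot name alone."""
    shot_dir = root / "shots" / shot_name
    return {sub: shot_dir / sub for sub in SUBDIRS}


def layer_frame_path(shot_name, layer, frame, root=STORAGE_ROOT):
    """Deterministic output file for one layer of one frame (hypothetical naming)."""
    renders = shot_paths(shot_name, root)["renders"]
    return renders / layer / f"{shot_name}.{layer}.{frame:04d}.exr"


print(layer_frame_path("xy_1_100_030_00_v005", "shadow", 12))
# /mnt/isilon/show/shots/xy_1_100_030_00_v005/renders/shadow/xy_1_100_030_00_v005.shadow.0012.exr
```

Since the functions never touch the filesystem, any tool (a submission wrapper, a dashboard, a compositing script) can compute the same paths independently.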
The render farm nodes started out running Fedora Core 2 Linux with a 2.6.x kernel, although newly acquired 64-bit AMD systems run Fedora Core 4. The workstations and render farm nodes all have gigabit connections to the core network and shared storage. There is no routing, DNS or network topology difference between an animator's Linux workstation and a cluster node, which allows production workstations to join the rendering grid as needed.
The production applications are nearly all commercial. Some are FLEXlm licensed and others use proprietary license server systems. Example commercial applications include Maya, Houdini, Massive and the Pixar RenderMan tools. All of the tools are well wrapped for submission to Grid Engine. Licenses are not exclusive to the cluster, as many of the same applications also run on workstations.
Grid Engine Configuration
When it was first set up, the Grid Engine 6.0u4 installation was effectively a default install with FIFO scheduling. The only major changes from the built-in defaults were the use of classic spooling, a "job_load_adjustments np_load_avg=1.00" scheduler parameter tweak and a hard-coded "maxujobs=40" limit.
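The job_load_adjustments setting exists because a freshly started job takes time to show up in a host's reported load; the scheduler temporarily adds the configured amount for each newly started job and decays it linearly to zero over load_adjustment_decay_time (0:7:30 by default). The sketch below is a simplified model of that behaviour, ignoring details such as per-slot and per-processor normalization, meant only to show what an adjustment of 1.00 implies: each new job is initially counted as a full unit of load.

```python
# Simplified model of Grid Engine's job_load_adjustments: every job started on a
# host adds `adjustment` to the load the scheduler assumes for that host, and the
# extra amount decays linearly to zero over load_adjustment_decay_time.
DEFAULT_DECAY_SECONDS = 7 * 60 + 30  # 0:7:30, the default decay time


def adjusted_load(reported_load, seconds_since_starts, adjustment=1.00,
                  decay_seconds=DEFAULT_DECAY_SECONDS):
    """Load the scheduler would assume for a host, given recent job start ages."""
    extra = 0.0
    for elapsed in seconds_since_starts:
        if 0 <= elapsed < decay_seconds:
            extra += adjustment * (1.0 - elapsed / decay_seconds)
    return reported_load + extra


# Two jobs just dispatched to an almost idle host: it now looks heavily loaded.
print(adjusted_load(0.2, [0, 0]))  # 2.2
```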
Since then, the configuration has gone through several rounds of tuning and enhancement. There are currently 10 cluster queues, 14 host groups and 6 custom complex resources. Resource allocation is based primarily on the Functional Policy.
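Under the Functional Policy, entities such as users, projects or departments receive functional tickets in proportion to their configured shares, and an entity's tickets are divided among its jobs. The sketch below is a deliberately simplified, single-category version with hypothetical department names and share values; the real scheduler combines several weighted categories and applies further ordering rules, so treat this as an illustration of the proportional idea only.

```python
from collections import Counter


def functional_tickets(total_tickets, shares, job_owners):
    """Simplified, single-category functional policy.

    shares:     {entity: functional shares}, e.g. per department (hypothetical)
    job_owners: {job_id: entity}
    Only entities that currently have jobs compete; each entity's tickets are
    split evenly among its jobs.
    """
    jobs_per_entity = Counter(job_owners.values())
    active_shares = sum(shares.get(entity, 0) for entity in jobs_per_entity)
    if active_shares == 0:
        return {job: 0.0 for job in job_owners}
    tickets = {}
    for job, entity in job_owners.items():
        entity_tickets = total_tickets * shares.get(entity, 0) / active_shares
        tickets[job] = entity_tickets / jobs_per_entity[entity]
    return tickets


# "lighting" holds 3 of 4 active shares but has two jobs, so each gets 375.
print(functional_tickets(1000, {"lighting": 3, "fx": 1, "layout": 2},
                         {101: "lighting", 102: "lighting", 103: "fx"}))
# {101: 375.0, 102: 375.0, 103: 250.0}
```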
A maxujobs limit, now set to 100, is still applied during the day, but it can automatically grow to as much as 250 at night depending on system load. The POSIX Priority policy is used at submission time to help rank jobs by importance. Some jobs are submitted automatically with hold conditions that prevent them from running until the evening.
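The source does not say how the nightly increase is computed. One hypothetical policy, sketched below, keeps the daytime cap fixed and, outside assumed day hours of 08:00 to 20:00, scales the cap between 100 and 250 according to how idle the farm is. A cron job could write the resulting value into a scheduler configuration file and apply it with qconf -Msconf; the hours and the load formula are illustrative only.

```python
def maxujobs_for(hour, cluster_load, day_limit=100, night_limit=250,
                 day_start=8, day_end=20):
    """Hypothetical policy: fixed per-user cap by day, load-scaled cap at night.

    hour:         local hour of day, 0-23
    cluster_load: average normalized farm load (0.0 = idle, 1.0 = fully busy)
    """
    if day_start <= hour < day_end:
        return day_limit
    # At night, raise the cap toward night_limit as the farm becomes idle.
    idle = min(max(1.0 - cluster_load, 0.0), 1.0)
    return day_limit + round((night_limit - day_limit) * idle)


for hour, load in [(14, 0.2), (23, 0.5), (2, 0.0)]:
    print(hour, load, maxujobs_for(hour, load))
# 14 0.2 100
# 23 0.5 175
# 2 0.0 250
```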
Workflow
DNA Productions has integrated two very interesting systems with Grid Engine. The first is a set of graphical wrapper tools that automate job submission. These wrappers completely hide the standard SGE binaries ("qsub", "qrsh") from users, allowing a production member to, for instance, take a scene description file and send it out for rendering.
The GUI wrapper automatically creates shell scripts containing the embedded qsub and job array commands and submits them to the cluster. Because the qsub commands are generated programmatically, without user interaction, they can rigidly enforce the standard job naming and output location conventions and can include embedded resource requests for items such as software license tokens. The wrappers provide two key advantages: they hide a significant amount of cluster-related complexity from end users, and they ensure that jobs are submitted in a uniform way that is consistent with production guidelines and workflow requirements.
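A wrapper of this kind boils down to turning a few production inputs into a fixed qsub command line. The sketch below builds such a command; the license complex name ("rman_lic"), the script name and the log directory are hypothetical. The -a option (earliest start time) is shown as one possible way to keep a job from running until evening; the source does not say which hold mechanism DNA uses.

```python
import shlex


def build_qsub_command(shot, script, frames, output_dir,
                       license_resource=None, priority=0, start_after=None):
    """Build a qsub argument list that follows the shot naming conventions.

    frames:           number of frames; becomes array task ids 1..frames
    license_resource: hypothetical consumable complex holding license tokens
    priority:         POSIX priority for qsub -p (users may only lower it;
                      managers and operators may raise it)
    start_after:      optional [[CC]YY]MMDDhhmm[.SS] time for qsub -a
    """
    if frames < 1:
        raise ValueError("a shot needs at least one frame")
    cmd = ["qsub",
           "-N", shot,                          # job name is the shot name
           "-t", f"1-{frames}",                 # one array task per frame
           "-o", output_dir, "-e", output_dir,  # fixed output location
           "-p", str(priority)]
    if license_resource:
        cmd += ["-l", f"{license_resource}=1"]  # one license token per task
    if start_after:
        cmd += ["-a", start_after]              # not eligible before this time
    cmd.append(script)
    return cmd


cmd = build_qsub_command("xy_1_100_030_00_v005", "render_shot.sh", 86,
                         "/mnt/isilon/show/shots/xy_1_100_030_00_v005/logs",
                         license_resource="rman_lic", priority=-100,
                         start_after="10251900")
print(shlex.join(cmd))
```

Each array task would then map SGE_TASK_ID to its frame number. Returning an argument list rather than a shell string avoids quoting problems; shlex.join (Python 3.8+) gives a printable form for logs.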
The characteristics of the jobs themselves are also interesting. Job runtimes range from several minutes to at most a few hours, except for special cases such as simulation runs and other special effects or testing work. The average job array contains roughly 10 tasks.
From a Grid Engine perspective, this means there is enough "churn" in the system (running jobs completing and draining out) for the configured resource allocation policies to take effect quickly. In other words, producers or department heads can easily pick out and prioritize pending Grid Engine jobs associated with critical shots, and those jobs reach free slots soon afterwards.
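The churn argument can be made roughly quantitative. Assuming every slot is busy, run times are exponentially distributed with mean T, and each freed slot goes to the newly prioritized job, the expected wait for the first free slot out of N busy slots is T/N, and the wait for k slots is the sum of T/(N - i) for i = 0..k-1. With a hypothetical 30-minute mean runtime and one slot per CPU on the 1,400 CPUs, the first slot frees after about 1.3 seconds on average, and a 10-task array gets all its slots in roughly 13 seconds.

```python
def expected_wait(n_slots, mean_runtime, k=1):
    """Expected time until k of n_slots busy slots have freed up.

    Assumes all slots are busy, run times are exponentially distributed with
    mean mean_runtime (so remaining run times are too), and each freed slot
    is taken by the newly prioritized job, leaving one fewer competing job.
    """
    if not 1 <= k <= n_slots:
        raise ValueError("k must be between 1 and n_slots")
    # The first of m independent exponential run times ends after mean_runtime / m.
    return sum(mean_runtime / (n_slots - i) for i in range(k))


# Hypothetical: 1,400 single-CPU slots, 30-minute mean run time, 10-task array.
print(round(expected_wait(1400, 30 * 60), 2))        # 1.29
print(round(expected_wait(1400, 30 * 60, k=10), 1))  # 12.9
```

Real render times are not exponential, so the numbers are only indicative; the point is that the wait scales with T/N, which stays small when jobs are short and the farm is large.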
The second integrated system is very impressive and as yet unnamed.
Grid Engine prolog and epilog scripts pipe job submission and execution information into a SQL database before each SGE job or array task starts and after it finishes. During job dispatch, the prolog script connects to the SQL database and persistently stores specific submission-related information (Job_ID, start time, host, task_id and log path). As each task completes, the epilog script stores the Job_ID, end time and array task_id. Augmenting the prolog and epilog records are database entries made by DNA's "farmwrapper" job submission engine. The end result is that the database captures enough information for any job to be re-created exactly and resubmitted to the render farm.
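A minimal sketch of such a prolog/epilog pair, assuming Python on the execution hosts and using sqlite3 in place of DNA's SQL server; the table layout, the database path and the start/end command-line convention are hypothetical. It reads JOB_ID, SGE_TASK_ID, HOSTNAME and SGE_STDOUT_PATH, which Grid Engine sets in the job environment and which should also be visible to prolog and epilog scripts. The script would be hooked in through the prolog and epilog attributes of the queue or cluster configuration.

```python
import os
import sqlite3
import sys
import time

# sqlite3 stands in for the site's SQL server; table and column names are hypothetical.
SCHEMA = """
CREATE TABLE IF NOT EXISTS task_runs (
    job_id     TEXT NOT NULL,
    task_id    TEXT NOT NULL,
    host       TEXT,
    log_path   TEXT,
    start_time REAL,
    end_time   REAL,
    PRIMARY KEY (job_id, task_id)
)
"""


def _task_key(env):
    # SGE sets SGE_TASK_ID to "undefined" for jobs that are not array jobs.
    return env["JOB_ID"], env.get("SGE_TASK_ID", "undefined")


def record_start(conn, env, now=None):
    """Prolog side: store job id, task id, host, log path and start time."""
    job_id, task_id = _task_key(env)
    conn.execute(
        "INSERT OR REPLACE INTO task_runs"
        " (job_id, task_id, host, log_path, start_time) VALUES (?, ?, ?, ?, ?)",
        (job_id, task_id, env.get("HOSTNAME"), env.get("SGE_STDOUT_PATH"),
         time.time() if now is None else now),
    )
    conn.commit()


def record_end(conn, env, now=None):
    """Epilog side: stamp the end time on the row written by the prolog."""
    job_id, task_id = _task_key(env)
    conn.execute(
        "UPDATE task_runs SET end_time = ? WHERE job_id = ? AND task_id = ?",
        (time.time() if now is None else now, job_id, task_id),
    )
    conn.commit()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical convention: prolog runs "task_hook.py start", epilog "task_hook.py end".
    db = sqlite3.connect("/tmp/farm_jobs.db")  # hypothetical location
    db.execute(SCHEMA)
    (record_start if sys.argv[1] == "start" else record_end)(db, os.environ)
```

Because a prolog that exits with a non-zero status can put the queue or job into an error state, a production version would need a deliberate policy for an unreachable database rather than simply crashing.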