
Designing an Information Appliance

Marketing has to explore the feasibility of requests for new products coming in from customer surveys and the field.  Product engineering and the CTO's office have to develop prototypes of this new technology for demonstration at major trade shows.  During this period, feasibility, conformance to market requirements and product specifications have to be designed and verified.  For this purpose it is essential to combine timing and functional modeling with true data streams to explore the system.  Companies such as Nokia generate 120 products out of just 5-10 platforms, so the robustness and flexibility of these platforms are critical.  In addition, companies are developing platforms that can be used for a variety of applications.  The OMAP platform from TI and the Geode from National are examples of such generic platforms that customers can adapt for a variety of applications in the consumer and wireless markets.

Semiconductor companies must demonstrate the superiority of their platforms over those of their rivals.  Most do that today using relationships and data sheets.  Most of these are not dynamic and cannot capture the differentiating value.  Customers need an evaluation platform to experiment with the performance and functionality of these platforms or ICs against their own application and traffic requirements.  The VisualSim models provide an effective evaluation platform and can replace the static reference designs available on semiconductor and system companies' Web sites.  Customers could be on your Web site exploring new technology the same way you are currently on our Web site viewing and modifying these models.

Project Objectives

To illustrate these requirements, a Personal Digital Assistant (PDA) that contains smart media and a wireless connection is considered.  The application selected to evaluate the system is an MPEG stream.  The model uses information that is normally available on semiconductor companies' data sheets to size the individual components.  The model studies the latency and image quality for the rendering of an MPEG4 image on a CRT.

The MPEG4 data can arrive from two sources: the memory stick/hard drive and the wireless interface.  The data flows from the source to the CRT in the following manner:
  • PCI->PCI Bridge->Bus->CPU->Bus->Video Processor->Bus->Cache/SRAM->Bus->PCI Bridge->PCI.  
The images are decoded and displayed by the Video Processor.  The purpose of this experiment is to
  • Size the hardware components such as CPU, Bus and Video Processor
  • Determine if the Quantization algorithm must be processed on the CPU or on a dedicated Video Processor
  • Determine the degradation level of the output picture due to system latency and compression quality
Some of the critical issues raised are:
  • Contention for the CPU Bus by different hardware resources.
  • Utilization and buffering at hardware components such as Bus and CPU.
  • Maximum image arrival rate
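
As a first, rough look at the contention issue, the data path listed above can be written down as a sequence and the visits to each resource counted.  This is only an illustrative sketch in Python, not part of the VisualSim model; the list and function names are our own.

```python
from collections import Counter

# Data path copied from the flow listed above (source to CRT).
DATA_PATH = [
    "PCI", "PCI Bridge", "Bus", "CPU", "Bus", "Video Processor",
    "Bus", "Cache/SRAM", "Bus", "PCI Bridge", "PCI",
]


def visits_per_resource(path):
    """Count how many times one frame occupies each resource on the path."""
    return Counter(path)


if __name__ == "__main__":
    for resource, visits in visits_per_resource(DATA_PATH).most_common():
        print(f"{resource}: {visits}")
```

Each frame crosses the Bus four times on this path, which is why the Bus is the first place to look for contention.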

Capabilities Demonstrated

This model demonstrates the separation methodology and also shows how customers can use the Web as a medium for transmitting technical information in a dynamic manner.  This model combines the software application with the hardware.  In addition, this model displays the original transmitted image and the image after decoding, thus showing the superiority of the particular implementation of the Vector Quantization algorithm for HDTV design.  This model can also be expanded extensively to explore numerous other trade-offs, such as image ratio versus CPU cycles consumed, to determine the optimum software execution platform (CPU vs. custom processor), and to reduce power by sequencing tasks to optimize the switching functions.

Model Development Statistics

Define data flow through the system = 2 days
Number of unique blocks required to create the model = 6
Time to do the initial model construction = 1 day
Model analysis and refinement = 3 days
Documentation = 0.5 day

Model Construction

This VisualSim system model consists of three sections: workload generator, behavior description and architecture description.  These sections are connected using the Virtual method provided by VisualSim.  In the diagram, the mapping between the behavior and the architecture is done using the Virtual Execution method.

In this model, the transactions generated are the incoming MPEG frames.  There are two generators in the hierarchical block shown as the Workload_Gen: one for the Antenna and one for the hard drive.  The transactions are generated in a pattern described by a Poisson distribution.  A refined model can use the actual arrival rate of the MPEG streams by capturing an arrival stream and feeding that file as the input.  The Workload_Gen has one parameter, Transaction.  This parameter can be modified to increase or reduce the number of frames that are transmitted in any second.

Each transaction is considered a Data Structure and contains multiple fields.  These fields carry the information that is required for the simulation.  The list of Data Structure fields can be seen in the Transaction output from the Dual Processor Model.  The MPEG data is not actually transmitted over the entire simulation; rather, a representation in the form of the frame size is sent.  The frame can be encapsulated as an object field in the data structure to be used by any part of the design that evaluates the algorithm.  In this example the image processing algorithm is evaluated in two locations: at the behavior and at the Video Processor.  In this example, the parameter Transaction is said to be exported to the upper layer because it is made common to the entire layer of this system.  This parameter can be made global and evaluated at simulation time, or it can be specified at any level of the hierarchy.
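
The workload generation described above can be sketched outside VisualSim in a few lines of Python.  This is a minimal illustration under our own assumptions: the field names, the default frame size and the function names are hypothetical and do not reflect the actual Data Structure fields or the Workload_Gen block.

```python
import random
from dataclasses import dataclass


@dataclass
class Transaction:
    """Hypothetical data structure; the real field list is defined by the model."""
    arrival_time: float  # seconds since the start of the simulation
    source: str          # "antenna" or "hard_drive"
    frame_size: int      # bytes; stands in for the MPEG payload
    memory_select: int   # 0 or 1, chosen at random


def generate_transactions(rate_per_s, duration_s, source, rng, frame_size=4096):
    """Poisson arrivals: gaps between frames are exponential with mean 1/rate."""
    transactions = []
    t = rng.expovariate(rate_per_s)
    while t < duration_s:
        transactions.append(
            Transaction(t, source, frame_size, rng.randint(0, 1)))
        t += rng.expovariate(rate_per_s)
    return transactions


def workload_gen(rate_per_s, duration_s, seed=0):
    """Merge the antenna and hard-drive streams in arrival order."""
    rng = random.Random(seed)
    merged = (generate_transactions(rate_per_s, duration_s, "antenna", rng)
              + generate_transactions(rate_per_s, duration_s, "hard_drive", rng))
    return sorted(merged, key=lambda tr: tr.arrival_time)


if __name__ == "__main__":
    for tr in workload_gen(rate_per_s=30.0, duration_s=0.2):
        print(tr)
```

Here `rate_per_s` plays the role the text assigns to the Transaction parameter: raising it increases the number of frames sent per second.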

The output from the Workload_Gen is fed into the behavior portion of the design.  The behavior describes the flow of data through the appliance.  As the execution is the same for the data generated from the antenna and the hard drive, the flow is also the same.  There is one decision point in this flow: it determines whether the data needs to be retrieved from the cache or from memory.  One of the fields of the data structure contains a random value of 0 or 1.  If the field value is 0, the data is acquired from the Cache; if the value is 1, the data is acquired from the SDRAM.  Each item in the behavior flow is mapped to the respective hardware or software on the architecture.  When data arrives at the PCI behavior block, a request is sent to the PCI architecture block, where the execution occurs.  The behavior simply defines the functionality, while the actual execution is performed on the architecture blocks.  The behavior is described using the Mapper_Adv SmartBlock.
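
The single decision in the behavior flow can be illustrated as follows.  This is a plain Python sketch of the idea, not the Mapper_Adv SmartBlock; the function names are hypothetical, and the resource sequence is taken from the data path given under Project Objectives.

```python
def memory_target(field_value):
    """Select the memory resource from the random data-structure field."""
    if field_value == 0:
        return "Cache"
    if field_value == 1:
        return "SDRAM"
    raise ValueError(f"expected 0 or 1, got {field_value!r}")


def behavior_flow(field_value):
    """Ordered list of architecture resources that one transaction requests."""
    return [
        "PCI", "PCI Bridge", "Bus", "CPU", "Bus", "Video Processor",
        "Bus", memory_target(field_value), "Bus", "PCI Bridge", "PCI",
    ]


if __name__ == "__main__":
    print(behavior_flow(0))
    print(behavior_flow(1))
```

The flow is identical for both sources; only the memory step differs from one transaction to the next.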

The architecture elements are defined using the Scheduler SmartBlock.  The Video Processor is refined further in that the actual vector quantization algorithm is implemented.  The processing requirements at each entity are determined by the size of the incoming frame and the clock speed of that block.  The connections between the architecture blocks are for statistics gathering and do not impact the simulation.
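
One simple way to relate frame size and clock speed to processing time is shown below.  The linear relation and the `cycles_per_byte` factor are our assumptions for illustration; the source does not state the exact formula used by the Scheduler SmartBlock.

```python
def processing_time_s(frame_size_bytes, clock_hz, cycles_per_byte=1.0):
    """Time a block is busy with one frame, assuming cost grows linearly
    with frame size (cycles_per_byte is a hypothetical tuning factor)."""
    if clock_hz <= 0:
        raise ValueError("clock_hz must be positive")
    return frame_size_bytes * cycles_per_byte / clock_hz


if __name__ == "__main__":
    # Illustrative values only, not taken from any data sheet.
    print(processing_time_s(frame_size_bytes=4096, clock_hz=100e6))
    print(processing_time_s(frame_size_bytes=4096, clock_hz=100e6,
                            cycles_per_byte=8.0))
```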

The results are gathered in two locations: the Result hierarchical block and the Timeline plotter.  The output from the bottom-right port of the architecture blocks is the statistics output.  The data from this port is captured and displayed using the plotter in the Result block.  All of the SmartBlocks have statistics generation that can be used to generate data on the fly.  This eliminates the need to perform complex analysis outside of the environment.  The middle-right port reports the time during which the block is in use, which can be plotted on a timeline.
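
To make the kind of statistics involved concrete, the sketch below computes utilization, average queue occupancy and busy intervals for a single first-come, first-served resource.  It is a stand-alone approximation written for this explanation, not the SmartBlock statistics code, and it assumes every job finishes before the end of the observation window.

```python
def fifo_block_stats(arrivals, service_times, sim_end):
    """Statistics for one FIFO resource.

    arrivals must be sorted in ascending order; all jobs are assumed to
    complete before sim_end.
    """
    busy_time = 0.0
    total_wait = 0.0   # equals the time integral of the waiting-queue length
    free_at = 0.0
    timeline = []      # (start, end) busy intervals, as on a timeline plot
    for arrival, service in zip(arrivals, service_times):
        start = max(arrival, free_at)
        free_at = start + service
        busy_time += service
        total_wait += start - arrival
        timeline.append((start, free_at))
    return {
        "utilization": busy_time / sim_end,
        "mean_queue_length": total_wait / sim_end,
        "timeline": timeline,
    }


if __name__ == "__main__":
    print(fifo_block_stats([0.0, 1.0, 2.0], [2.0, 2.0, 2.0], sim_end=10.0))
```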

Results

There are a number of results that are generated from this simulation:
  • Timeline plot showing the execution of the hardware elements over the period of the simulation
  • Component utilization for the CPU, Video processor, Bus, Bridge and PCI
  • Queue Occupancy for the same elements during the simulation.
  • Original image and the quantized image to compare the quality.  The top is the original image while the lower one is the quantized image.
Image Viewers- There is considerable degradation in the quality of the output image after Quantization has been performed.  Several options are available: include additional digital packing techniques to compensate for the number of lost bits, use a smaller image block size (4:2 to 2:1), or increase the number of frames buffered before the display.  Increasing the block height and width degrades the image further, while reducing the block size substantially increases the quality.  In this manner the algorithm can be fine-tuned.  Changing the block size affects the bus and Video Processor utilization.
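
The effect of block size on quality and workload can be illustrated with a much simpler stand-in for vector quantization: replacing every block by its mean value.  This is not the algorithm implemented in the model, and reading "4:2" and "2:1" as block width:height in pixels is our assumption.

```python
def block_mean_quantize(image, block_w, block_h):
    """Replace each block_w x block_h tile of a 2-D list by its mean."""
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for top in range(0, height, block_h):
        for left in range(0, width, block_w):
            rows = range(top, min(top + block_h, height))
            cols = range(left, min(left + block_w, width))
            total = sum(image[r][c] for r in rows for c in cols)
            mean = total / (len(rows) * len(cols))
            for r in rows:
                for c in cols:
                    out[r][c] = mean
    return out


def mean_squared_error(a, b):
    """Average squared pixel difference between two equally sized images."""
    n = len(a) * len(a[0])
    return sum((x - y) ** 2
               for row_a, row_b in zip(a, b)
               for x, y in zip(row_a, row_b)) / n


def blocks_per_frame(width, height, block_w, block_h):
    """Number of blocks to process per frame (ceiling division per axis)."""
    across = (width + block_w - 1) // block_w
    down = (height + block_h - 1) // block_h
    return across * down


if __name__ == "__main__":
    # Synthetic 8x8 test pattern, not real image data.
    image = [[(r * 8 + c * 3) % 17 for c in range(8)] for r in range(8)]
    for bw, bh in [(4, 2), (2, 1)]:
        error = mean_squared_error(image, block_mean_quantize(image, bw, bh))
        print(bw, bh, blocks_per_frame(8, 8, bw, bh), error)
```

With nested block grids like these, the smaller block can never give a larger error, but it quadruples the number of blocks per frame.  That is the quality-versus-processing trade-off in miniature.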

Utilization Graph- All of the components are heavily underutilized except for the CPU Bus.  This is to be expected, as the CPU Bus is accessed at least 4 times by each flow.  Additional trade-off analysis can be performed to determine if the video processor can be eliminated.  This is a common problem faced by appliance makers: eliminating the application processor and sharing the load between the DSP and the CPU.  The appliance platform has a large amount of headroom, and a number of new features can be added to this product without modifying the underlying hardware.  This is important where the product is being sold as a generic development platform or where multiple products emanate from a single platform.

Timeline Plot- The timeline plot indicates that the instructions are accessing the SRAM at a much higher rate than the cache.  This could be a reason for the image quality degradation.  As the utilization graph indicates, the cache speed can stay the same, but the cache hit ratio must be increased.  It is possible to determine the optimum cache hit ratio.
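
One common way to reason about the hit ratio is the weighted-average access time.  The sketch below uses that textbook relation to find the hit ratio needed for a target average access time; it is not the method used inside the model, and the numbers in the usage example are invented for illustration.

```python
def average_access_time(hit_ratio, cache_time, memory_time):
    """Weighted average of cache and memory access times."""
    return hit_ratio * cache_time + (1.0 - hit_ratio) * memory_time


def required_hit_ratio(target_time, cache_time, memory_time):
    """Smallest hit ratio whose average access time meets target_time."""
    if not cache_time <= target_time <= memory_time:
        raise ValueError("target must lie between cache and memory times")
    if memory_time == cache_time:
        return 0.0
    return (memory_time - target_time) / (memory_time - cache_time)


if __name__ == "__main__":
    # Hypothetical timings in nanoseconds.
    ratio = required_hit_ratio(target_time=20.0, cache_time=10.0,
                               memory_time=50.0)
    print(ratio, average_access_time(ratio, 10.0, 50.0))
```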

Refinement Opportunities

There are a number of refinement opportunities for this model that would yield more analysis reports and thus support more detailed decisions.  Some are described below:
  • Correlate the change in the block size for the vector quantization to the number of processing cycles consumed at the Video Processor.  When the value is changed from 4:2 to 2:1, the quality of the image improves substantially, but the processing multiplies.  These two effects need to be tied together.
  • Add further refinement in the flow to include the aspect of error correction and A/D for the MPEG stream coming over the antenna.  This will require additional processing and maybe some lookup table activity.
  • Capture additional data associated with contention on the PCI bridge and the CPU Bus.  This is important.  As is evident, the data makes multiple trips to the bus, and this can be a major cause of overload.  One of the team members did an MPEG encoder/decoder chip design for a customer.  By performing a quick analysis using a similar model, the bus loading was determined to be 125% of the bus bandwidth.  The project was mid-way through the Verilog coding phase.  The original analysis was done using Excel.  The multiple loading of the same data on the bus was not properly analyzed in the analytical method.  This is addressed very well with VisualSim.
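
The bus loading issue in the last item comes down to simple arithmetic that is easy to overlook: the same data must be counted once for every trip it makes across the bus.  The sketch below shows the calculation with invented numbers; these are not the figures from the customer project mentioned above.

```python
def bus_load(frames_per_s, frame_size_bytes, bus_trips,
             bus_bandwidth_bytes_per_s):
    """Fraction of bus bandwidth consumed; values above 1.0 mean overload."""
    traffic = frames_per_s * frame_size_bytes * bus_trips
    return traffic / bus_bandwidth_bytes_per_s


if __name__ == "__main__":
    # Hypothetical workload: 30 frames/s of 100,000 bytes on a 7.5 MB/s bus.
    single = bus_load(30, 100_000, bus_trips=1,
                      bus_bandwidth_bytes_per_s=7_500_000)
    repeated = bus_load(30, 100_000, bus_trips=4,
                        bus_bandwidth_bytes_per_s=7_500_000)
    print(f"counted once: {single:.0%}, counted per trip: {repeated:.0%}")
```

Counting the data only once makes the bus look comfortably loaded, while counting each of the four trips reveals an overload.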

Mirabilis Design© Copyright 2006, All Rights Reserved.