This article describes an optimization I found for the most commonly used adaptive
Witten-Neal-Cleary implementation (also known as "the finite-precision algorithm").
For details (including source code samples), you can refer to
Mark Nelson's article on arithmetic coding. Below I use the same terminology as
that article.
In the practical implementations I've seen, the cumulative symbol frequency table (a component
of the algorithm) is represented by an array of integers containing, for each symbol, the sum
of the frequencies of all symbols with smaller indices. The symbols are ordered either arbitrarily
(for instance, by their "values") or by their (non-cumulative) frequencies. Encoding a regular
symbol requires determining its frequency interval; decoding requires finding the frequency
interval that contains a given point. Both operations are followed by an update of the adaptive
model, which is commonly done by increasing the symbol's non-cumulative frequency (that is,
increasing the cumulative frequencies of all symbols whose index is not less than that of the
symbol just encoded/decoded). The first operation is fast, and the second can be made fast enough
by a binary search in the array (which is monotone; I've noticed this is not done in the sample
referenced above), but the third requires a considerable number of increments. Sorting the array
by frequency helps only for small arrays (a small total number of symbols) and an extremely
non-uniform distribution of symbols.
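For concreteness, here is a minimal sketch of that straightforward representation (the names
NaiveFreqModel, Interval, Find and Update are mine, not taken from the referenced sample);
the loop in Update is exactly the cost the rest of this article removes:

#include <cstddef>
#include <vector>

// Straightforward cumulative table: cum[i] = sum of the frequencies of symbols 0..i-1,
// cum[n] = total. All names here are illustrative, not taken from the referenced sample.
struct NaiveFreqModel
{
    std::vector<unsigned> cum;                         // size n + 1, non-decreasing

    explicit NaiveFreqModel(std::size_t n) : cum(n + 1)
    {
        for (std::size_t i = 0; i <= n; ++i)
            cum[i] = (unsigned)i;                      // every symbol starts with frequency 1
    }

    // Encoding: the interval of symbol `sym` is [cum[sym], cum[sym + 1]) -- O(1).
    void Interval(std::size_t sym, unsigned &lo, unsigned &hi) const
    {
        lo = cum[sym];
        hi = cum[sym + 1];
    }

    // Decoding: find the symbol whose interval contains `point`.
    // O(log n) by binary search, since the array is monotone.
    std::size_t Find(unsigned point) const
    {
        std::size_t lo = 0, hi = cum.size() - 1;       // invariant: cum[lo] <= point < cum[hi]
        while (hi - lo > 1)
        {
            std::size_t mid = (lo + hi) / 2;
            if (cum[mid] <= point) lo = mid; else hi = mid;
        }
        return lo;
    }

    // Model update: every cumulative entry above `sym` must grow -- O(n) increments.
    void Update(std::size_t sym, unsigned increment = 1)
    {
        for (std::size_t i = sym + 1; i < cum.size(); ++i)
            cum[i] += increment;
    }
};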
So I suggest using a tree-like representation of the frequency information (stored in an
array). Assume we have 8 symbols in total: 0, 1, ..., 7, with cumulative frequencies
Q[0] = 0, Q[1], ..., Q[7], Q[8] = (the sum of all non-cumulative frequencies). We can break the
whole frequency interval (0, Q[8]) at Q[4] into 2 subintervals (of lengths Q[4] and Q[8] - Q[4]),
then do the same with the intervals obtained (using Q[2] and Q[6]), and so on. We thus
obtain a binary tree of "breaking points", each node of which represents the splitting of one
interval into two. Let each node store the breaking location counted from the lower bound of
its interval. If we now store the resulting tree in an array, placing each node at the index
its breaking point has in Q[], we obtain the following array A[]:
A[0] = 0, A[1] = Q[1], A[2] = Q[2], A[3] = Q[3] - Q[2], A[4] = Q[4], A[5] = Q[5] - Q[4],
A[6] = Q[6] - Q[4], A[7] = Q[7] - Q[6] (and, if desired, A[8] = Q[8]).
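The same layout can be produced mechanically: node i splits the interval whose lower bound is
Q[i - lowbit(i)], where lowbit(i) is the lowest set bit of i, so A[i] = Q[i] - Q[i - lowbit(i)].
(That closed form is my restatement of the recursive construction above, not a claim from the
original.) A tiny sketch that builds A[] this way for the all-ones initial frequencies and
prints the result:

#include <cstdio>

// Build A[] from Q[] for n = 8 symbols with all frequencies equal to 1 (so Q[i] = i).
// A[i] = Q[i] - Q[i - lowbit(i)], where lowbit(i) is the lowest set bit of i.
int main()
{
    const unsigned n = 8;
    unsigned Q[n + 1] = { 0, 1, 2, 3, 4, 5, 6, 7, 8 };
    unsigned A[n + 1];

    A[0] = 0;                                   // unused (or it can hold Q[8], see below)
    for (unsigned i = 1; i <= n; ++i)
        A[i] = Q[i] - Q[i - (i & (0u - i))];    // 0u - i == -i in unsigned arithmetic

    for (unsigned i = 0; i <= n; ++i)
        std::printf("%u ", A[i]);               // prints: 0 1 2 1 4 1 2 1 8
    std::printf("\n");
    return 0;
}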
Now, obtaining the interval bounds and updating the model while encoding/decoding can be done
simultaneously, in a single tree traversal. To pass a given symbol while encoding, binary-search
its index among the node indices "0, 1, ..., 7, 8" (comparing with "8" first, then "4", and so on),
incrementing A[k] whenever the descent goes left at node "k". To pass a given point while decoding,
binary-search the point in A[], subtracting A[k] from the point whenever the descent goes right at
node "k" (and incrementing A[k] whenever it goes left). The initial state of A[] is not hard to
determine (in our case it is "0, 1, 2, 1, 4, 1, 2, 1, 8"), and overflow prevention (halving the
frequencies) is also trivial. Q[8] can be stored in A[0] instead of A[8].
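To make the two passes concrete, here is a compilable sketch with the same interface as the usage
example below (Initialize, GetTotalFrequency, PassByIndex, PassByPoint, Uninitialize, dwIncrement).
The body is my reconstruction of the descent just described, not the attached CFreqModIncr source:
it keeps the total in A[N] rather than A[0], rounds the symbol count up to a power of two with
zero-frequency padding, and leaves out the overflow halving.

#include <cstddef>
#include <vector>

typedef unsigned int DWORD;    // stand-in for the Windows type used in the usage code

// My reconstruction of the described model (not the attached CFreqModIncr source).
class CFreqModIncrSketch
{
public:
    DWORD dwIncrement;                       // amount added to a symbol's frequency per pass

    CFreqModIncrSketch() : dwIncrement(1), N(0) {}

    // n symbols, each starting with frequency 1. The count is rounded up to a power of
    // two; the padding symbols get frequency 0 and are never produced by decoding.
    void Initialize(DWORD n)
    {
        N = 1;
        while (N < n) N <<= 1;
        A.assign(N + 1, 0);
        for (DWORD i = 1; i <= N; ++i)       // A[i] = Q[i] - Q[i - lowbit(i)], Q[j] = min(j, n)
        {
            DWORD a = i - (i & (0u - i));
            A[i] = (i < n ? i : n) - (a < n ? a : n);
        }
    }

    void Uninitialize() { A.clear(); N = 0; }

    DWORD GetTotalFrequency() const { return A[N]; }   // Q[N]; it could live in A[0] instead

    // Encoding pass: return the (pre-update) interval [dwLeft, dwRight) of symbol
    // dwIndex, incrementing A[k] at every node where the descent goes left.
    void PassByIndex(DWORD dwIndex, DWORD &dwLeft, DWORD &dwRight)
    {
        dwLeft = 0; dwRight = A[N];
        A[N] += dwIncrement;                            // the total grows as well
        DWORD a = 0;                                    // lower symbol bound of the current interval
        for (DWORD w = N; w > 1; w >>= 1)
        {
            DWORD k = a + (w >> 1);                     // breaking node of (Q[a], Q[a + w])
            if (dwIndex < k) { dwRight = dwLeft + A[k]; A[k] += dwIncrement; }  // left half
            else             { dwLeft += A[k];          a = k; }               // right half
        }
    }

    // Decoding pass: return the symbol whose (pre-update) interval [dwLeft, dwRight)
    // contains dwPoint, updating the model during the same descent.
    DWORD PassByPoint(DWORD dwPoint, DWORD &dwLeft, DWORD &dwRight)
    {
        dwLeft = 0; dwRight = A[N];
        A[N] += dwIncrement;
        DWORD a = 0;
        for (DWORD w = N; w > 1; w >>= 1)
        {
            DWORD k = a + (w >> 1);
            if (dwPoint < dwLeft + A[k]) { dwRight = dwLeft + A[k]; A[k] += dwIncrement; }
            else                         { dwLeft += A[k];          a = k; }
        }
        return a;
    }

    // Overflow prevention (halving all frequencies) is omitted here; it can be done by
    // converting A[] back to plain frequencies, halving them, and rebuilding in O(n).

private:
    DWORD N;                    // symbol count rounded up to a power of two
    std::vector<DWORD> A;       // A[1..N]: breaking offsets; A[N] doubles as the total
};

With this layout each pass touches at most log2(N) + 1 entries of A[], for the interval lookup
and the model update together.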
I'm attaching a code snippet containing the 'CFreqModIncr' class, which implements all this
functionality (sorry, I didn't have time to put together a complete project demonstrating it :)
(maybe someone will do it "for me and others"? ;).
CFreqModIncr model;
model.Initialize(11);            // 11 symbols, all starting with frequency 1
model.dwIncrement = 1;           // amount added to a symbol's frequency per pass
// Encoder side: dwIndex is the symbol being encoded; dwLeft, dwRight and dwTotal
// are fed to the arithmetic encoder.
DWORD dwLeft, dwRight, dwTotal = model.GetTotalFrequency();
model.PassByIndex(dwIndex, dwLeft, dwRight);
...
...
// Decoder side: dwPoint is the value extracted from the arithmetic decoder.
DWORD dwLeft, dwRight, dwTotal = model.GetTotalFrequency();
DWORD dwIndex = model.PassByPoint(dwPoint, dwLeft, dwRight);
...
...
model.Uninitialize();