您的位置:首页

关于HuffmanCoding的简单分析

2017-07-09 08:18 232 查看

1.what's problem we faced?

/**

* Q: what's problem we faced?

*

* A: Data compression is still a problem, even now. we want to compress

* the space of data. This desire is more and more stronger when we

* need to deal with some operation about data transmission. Before

* we start this article, it may be helpful if you try to provide a valid way

* to compress data . I tried, but failed obviously. That why I write this

* article. ^_^

*/

2. How can I solve it?

/**

* Q: How can I solve it?

*

* A: Where have problem is where have an answer, although it not always

* the best one. In 1951, a algorithm was introduced by David A. Huffman.

* It is different from the normal code and is a variable length code, which

* have different length of code for different symbol. Now, there are two

* problems:

*

* No.1: is variable length code possible? How can we know the length

* of current symbol?

*

* The answer is prefix code. Think about this, a tree like following:

*

*

* O

* 1 / \ 0

* O O

* 1 / \ 0 c

* O O

* a b

*

* This is a simple binary tree. There are three leaf node: a, b ,and c.we

* label all of left branch as 1, and all of right branch as 0. So if we want

* to arrive the leaf node a, the path is 11. In a similar way, we can get

* all of nodes:

* a : 11

* b : 10

* c : 0

*

* By accident, we get a variable length code.

*

*

* No.2: How can we use variable length code to compress a series of symbol?

*

* Now that we have a ability about variable length code. Some funny thing

* will happen. Image this, In a data, which consist of a series of symbols,

* some of symbols have occur at high proportion. some of symbols has occur

* at low proportion. If we use some shorter code to indicate those symbols

* which have a high proportion, the space of data will smaller than ever.

* That is what we want.

*

* Now, we have been know that we could compress a data by use variable length

* code. However, the next problem is what kind of variable length code is what we

* want. what kind of code is optimal ?

*/

3. What is HuffmanCoding ?

/**

* Q: What is HuffmanCoding ?

*

* A:Now,the problem is how can I create a optimal tree ? Do you have any idea?

* Huffman was introduced a algorithm. It is looks like greedy algorithm. It is may

* be simple, but the result is valid( this will be demonstrated below). The simplest

* construction algorithm use a priority queue where the node with lowest probability

* is given highest priority, the steps as following:

*

* 1. create a leaf node for each symbol, and add it to the priority queue.

* 2. while there is more than one node in the queue:

* 1. remove two nodes that have the highest priority.

* 2. create a new node as the parent node of the two nodes above. the

* probability of this one is equal to the sum of the two nodes' probabilities.

* 3. add the new node to the queue.

* 3. the remaining node is the root of this tree. Read it's code as we do above.

*

*/

4. is it optimal ?

/**

* Q: is it optimal ?

*

* A: Hard to say. I haven't a valid method to measure this. About this issue, it is necessary to hear

* about other people's advice. I believe there must be some exciting advice. By the way, this article

* is just talk about compress of independent symbol, another important issue is about related symbol.

* That maybe a serious problem.

*

*/

5. source code

/**
*    Here is an simple example
*/

#include <stdio.h>
#include <iostream>

/**
*    In a Huffman tree, some of nodes is valid symbol, and other is a combine node, which
*    haven't a valid symbol. we need to label it in our nodes.
*/
enum ELEM_TYPE {
ET_VALID,
ET_INVALID,
ET_MAX,
};

typedef int    INDEX;

/**
*    this is a container, we push all of element to it, and pop element by a priority. It is
*    a class template since we don't know the type of data element.
*/
template <class ELEM>
class Container {
public:
Container( int capacity);
~Container( );
/*
*    push a element to this container.
*/
bool push( ELEM item);
/*
*    pop a element from this container, the smallest one have the most priority.
*    Of course, the element must have provide a reload function for operator '<'.
*/
bool pop( ELEM &item );

private:
bool _find_idle( INDEX &num);
bool _set_elem( INDEX num, ELEM &elem);
bool _get_elem( INDEX num, ELEM &elem);

ELEM                *ele;
ELEM_TYPE    *stat;
int                        cap;
};

template <class ELEM>
Container<ELEM>::Container(  int capacity)
{
this->ele = new ELEM[capacity] ;
this->stat = new ELEM_TYPE[capacity];

int        i;
for( i=0; i<capacity; i++)
this->stat[i] = ET_INVALID;

this->cap = capacity ;
}

template <class ELEM>
Container<ELEM>::~Container(  )
{
if( this->ele!=NULL )
delete []this->ele;

if( this->stat!=NULL )
delete []this->stat;

this->cap = 0;
}

template <class ELEM>
bool Container<ELEM>::push( ELEM item)
{
INDEX        num = -1;

if( (!this->_find_idle( num))
||(!this->_set_elem( num, item)))
return false;

return true;
}

template <class ELEM>
bool Container<ELEM>::pop( ELEM &item )
{
INDEX    i = 0;
INDEX    Min;

/*
*    find the first valid element.
*/
while( (this->stat[i]!=ET_VALID)
&&( i<this->cap))
i++;

for( Min = i ; i<this->cap; i++)
{
if(  ( this->stat[i]==ET_VALID)
&&( this->ele[i]<this->ele[Min]))
{
Min = i;
}
}

return this->_get_elem( Min, item);
}

template <class ELEM>
bool Container<ELEM>::_find_idle( INDEX &num)
{
INDEX        i;
for( i=0; i<this->cap; i++)
{
if( this->stat[i]==ET_INVALID )
{
num = i;
return true;
}
}

return false;
}

template <class ELEM>
bool Container<ELEM>::_set_elem( INDEX num, ELEM &elem)
{
if( (num>=this->cap)
||(num<0) )
return false;

this->stat[num] = ET_VALID;
this->ele[num] = elem;

return true;
}

template <class ELEM>
bool Container<ELEM>::_get_elem( INDEX num, ELEM &elem)
{
if( (num<0)
||(num>=this->cap))
return false;

this->stat[num] = ET_INVALID;
elem =  this->ele[num];

return true;
}

/**
*    define a type of symbol. It will be used to record all information about a symbol.
*/
typedef char SYMINDEX;
typedef int SYMFRE;

class Symbol {
public:
/*
*    In the Huffman tree, we need to compute the sum of two child symbol.
*    For convenience,build a reload function is necessary.
*/
Symbol operator + ( Symbol &s);
SYMINDEX        sym;
SYMFRE            freq;
};

Symbol Symbol::operator +( Symbol &s)
{
Symbol        ret;
ret.sym = '\0';
ret.freq = this->freq + s.freq;
return ret;
}

/**
*    define a node of binary tree. It will be used to create a Huffman tree.
*/
class HTreeNode {
public:
/*
*    In the container, we need compare two nodes. So this node must
*    provide a reload function about '<'.
*/
bool operator< ( HTreeNode &n);

HTreeNode        *lchild;
HTreeNode        *rchild;
Symbol                sym;
};

bool HTreeNode::operator < ( HTreeNode &n)
{

return this->sym.freq<n.sym.freq?

true: false;
}

/**
*    This is the core structure. It will build a Huffman coding based on our input symbol.
*/
class HuffmanCoding {
public:
HuffmanCoding( );
~HuffmanCoding( );
bool Set( Symbol s[], int num);
bool Work( void);

private:
/*
*    create a Huffman tree.
*/
bool CreateTree(Symbol s[], int num );
bool DestroyTree( );
/*
*    read Huffman coding from a Huffman tree.
*/
bool ReadCoding( );
bool TravelTree( HTreeNode *parent, char *buf, INDEX cur);

Symbol                *sym ;
int                        sym_num ;
HTreeNode        *root ;
};

HuffmanCoding::HuffmanCoding( )
{
this->sym = NULL;
this->sym_num = 0;
this->root = NULL;
}

HuffmanCoding::~HuffmanCoding( )
{
if( this->sym!=NULL)
delete []this->sym;

this->sym_num = 0;
this->DestroyTree( );
}

/**
*    receive data from outside. Actually, this function is not necessary.But for make the
*    algorithm looks like more concise,maybe this function is  necessary.
*/
bool HuffmanCoding::Set( Symbol s [ ], int num)
{
this->DestroyTree( );

this->sym = new Symbol[num];
for( int i=0; i<num; i++)
this->sym[i] = s[i];

if( NULL!=this->sym)
{
this->sym_num = num;
return true;
}
else
{
this->sym_num = 0;
return false;
}
}
/**
*    The core function. In this function, we create a Huffman tree , then read it.
*/
bool HuffmanCoding::Work( void)
{

//Create a Huffman tree
if( !this->CreateTree( this->sym, this->sym_num))
return false;
//read Huffman coding
if( !this->ReadCoding( ))
return false;

return true;
}

bool HuffmanCoding::CreateTree( Symbol s[], int num)
{
/*
*    create a priority tank. It always pop the element of the highest priority in the tank.
*/
Container<HTreeNode>	tank(num);
for( int i=0; i<this->sym_num; i++)
{
HTreeNode        node;
node.lchild = NULL;
node.rchild = NULL;
node.sym = s[i];
tank.push( node);
}
/*
*    always pop two nodes, if fail, that's means there is only one node remain and it
*    is the root node of this Huffman tree.
*/
HTreeNode        node1;
HTreeNode        node2;
while(  tank.pop( node1)
&& tank.pop( node2) )
{
HTreeNode        parent;
parent.lchild = new HTreeNode;
parent.rchild = new HTreeNode;
*parent.lchild = node1;
*parent.rchild = node2;
parent.sym = node1.sym + node2.sym;
/*
*    push new node to the tank.
*/
tank.push( parent);
}

this->root = new HTreeNode(node1);

return true;
}

bool HuffmanCoding::DestroyTree( )
{

return false;
}

bool HuffmanCoding::ReadCoding( )
{
char        *code;
code = new char[this->sym_num + 1];
/*
*    travel the Huffman tree and print the code of all valid symbols.
*/
this->TravelTree( this->root, code, 0);

delete []code;

return true;
}

#define        LCHAR    '1'
#define        RCHAR    '0'

bool HuffmanCoding::TravelTree( HTreeNode *parent, char *buf, INDEX cur)
{
buf[cur] = '\0';
if( (parent->lchild==NULL)
&&(parent->rchild==NULL) )
{//end node
printf("[ %c] : %s\n", parent->sym.sym, buf);
}

if( parent->lchild!=NULL )
{
buf[cur] = LCHAR;
this->TravelTree( parent->lchild, buf, cur + 1);
}

if( parent->rchild!=NULL )
{
buf[cur] = RCHAR;
this->TravelTree( parent->rchild, buf, cur + 1);
}

return true;
}

static Symbol	sArr[ ] = {
{ '0', 0},
{ '1', 1},
{ '2', 2},
{ '3', 3},
{ '4', 4},
{ '5', 5},
{ '6', 6},
{ '7', 7},
{ '8', 8},
{ '9', 9},
};

int main()
{
HuffmanCoding    hcoding;
hcoding.Set( sArr, 10);
hcoding.Work( );

return 0;
}
内容来自用户分享和网络整理,不保证内容的准确性,如有侵权内容,可联系管理员处理 点击这里给我发消息
标签: