A pattern-driven Huffman encoding and positional encoding for DNA compression

Published

30-06-2025

Keywords:

Compression Ratio, Deoxyribonucleic Acid, Huffman Coding, Positional Encoding Technique

Section

Research article

Authors

  • Arunachalaprabu G, Research Scholar in Computer Science, Thanthai Periyar Government Arts & Science College (Autonomous), Affiliated to Bharathidasan University, Tiruchirappalli, Tamilnadu, India.
  • Fathima Bibi K, Assistant Professor in Computer Science, Thanthai Periyar Government Arts & Science College (Autonomous), Affiliated to Bharathidasan University, Tiruchirappalli, Tamilnadu, India.

Abstract

Researchers in bioinformatics, biology, biotechnology, and the medical sciences who analyze genetic data face significant challenges in storing and manipulating large datasets. Compression algorithms address this by reducing the number of bits required to represent nucleotide bases, which lowers storage requirements. The Pattern-driven Huffman Encoding and Positional Encoding for DNA Compression (P2DNAComp) algorithm is designed to compress both repetitive and non-repetitive patterns of bases within DNA sequences, so it adapts to the different pattern types found in genomic data. P2DNAComp reads the sequences and constructs a symbol table that records the positional values of repeated patterns. Using Huffman coding, the algorithm determines the optimal bit representation for each repeated pattern to maximize storage efficiency. For non-repetitive patterns, a coded table is created to store positional values. A positional encoding technique is then applied to minimize the number of bits needed to represent these values: the maximum positional value is set as the upper limit, and the minimum number of bits required is computed using a binary logarithm function. The final compressed sequence is generated by encoding both repetitive and non-repetitive patterns. The performance of P2DNAComp was evaluated on standard datasets from the GenBank database in terms of compression ratio, compression and decompression time, and compression gain. The algorithm achieved an average compression ratio of 1.09 bits per base (bpb), an average compression gain of 86.279%, and average compression and decompression times of 0.547 and 0.563 seconds, respectively.
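
The two building blocks named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact formula or table layout, so the sketch assumes positions are non-negative integers and that the bit width is the smallest fixed width able to hold the maximum position, which equals ceil(log2(max + 1)). The function names `min_position_bits` and `huffman_codes`, and the sample patterns in the usage note, are hypothetical.

```python
import heapq


def min_position_bits(max_position: int) -> int:
    """Smallest fixed width (in bits) that can hold every value 0..max_position.

    Assumption: this stands in for the abstract's "binary logarithm" step.
    """
    if max_position < 0:
        raise ValueError("max_position must be non-negative")
    # For n > 0, n.bit_length() == floor(log2(n)) + 1 == ceil(log2(n + 1)).
    # Use at least one bit so that position 0 is still representable.
    return max(1, max_position.bit_length())


def huffman_codes(frequencies: dict[str, int]) -> dict[str, str]:
    """Build a prefix-free code table from pattern frequencies (textbook Huffman)."""
    if not frequencies:
        return {}
    if len(frequencies) == 1:
        # A single pattern still needs one bit per occurrence.
        return {next(iter(frequencies)): "0"}

    # Heap entries: (weight, tie_breaker, {pattern: code_so_far}).
    # The unique tie_breaker keeps the dictionaries from ever being compared.
    heap = [(w, i, {p: ""}) for i, (p, w) in enumerate(frequencies.items())]
    heapq.heapify(heap)
    next_id = len(heap)

    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merge the two lightest subtrees, prefixing their codes with 0 and 1.
        merged = {p: "0" + c for p, c in left.items()}
        merged.update({p: "1" + c for p, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1

    return heap[0][2]
```

For example, `min_position_bits(1000)` gives a 10-bit field, and `huffman_codes({"ACGT": 5, "TTAA": 2, "GGCC": 1})` assigns the most frequent pattern a 1-bit code and the other two patterns 2-bit codes.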

How to Cite

G, A., & Bibi K, F. (2025). A pattern-driven Huffman encoding and positional encoding for DNA compression. The Scientific Temper, 16(06), 4456–4467. Retrieved from https://scientifictemper.com/index.php/tst/article/view/2054
